* [PATCH bpf-next v2 0/7] BPF support for global data
From: Daniel Borkmann @ 2019-02-28 23:18 UTC
  To: ast
  Cc: bpf, netdev, joe, john.fastabend, tgraf, yhs, andriin,
	jakub.kicinski, lmb, Daniel Borkmann

This series is a major rework of the previously submitted libbpf
patches [0] in order to add global data support for BPF. Based
upon feedback from the LPC discussions [1], the kernel has been
extended with proper infrastructure that allows for full
.bss/.data/.rodata sections on the BPF loader side. That support
is then also added to libbpf in this series, which allows for
more natural, C-like programming of BPF programs. For more
information on the loader, please refer to the 'bpf, libbpf:
support global data/bss/rodata sections' patch in this series.
Joint work with Joe Stringer.
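
As a quick taste of the programming model this enables, a
minimal, illustrative example (the program and names are made
up for this cover letter; the real tests live in the selftest
patches):

  #include <linux/bpf.h>
  #include <linux/pkt_cls.h>
  #include "bpf_helpers.h"

  static       __u64 pkt_count;         /* ends up in .bss    */
  static       __u64 mark_base  =  42;  /* ends up in .data   */
  static const __u64 pkt_limit  = 100;  /* ends up in .rodata */

  SEC("classifier")
  int cls_main(struct __sk_buff *skb)
  {
      if (pkt_count < pkt_limit)
          pkt_count++;
      skb->mark = (__u32)(mark_base + pkt_count);
      return TC_ACT_OK;
  }

  char _license[] SEC("license") = "GPL";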

Thanks a lot!

  v1 -> v2:
    - Instead of 32-bit static data, implement full global
      data support.

  [0] https://patchwork.ozlabs.org/cover/1040290/
  [1] http://vger.kernel.org/lpc-bpf2018.html#session-3

Daniel Borkmann (5):
  bpf: implement lookup-free direct value access
  bpf: add program side {rd,wr}only support
  bpf, obj: allow . char as part of the name
  bpf, libbpf: support global data/bss/rodata sections
  bpf, selftest: test {rd,wr}only flags and direct value access

Joe Stringer (2):
  bpf, libbpf: refactor relocation handling
  bpf, selftest: test global data/bss/rodata sections

 include/linux/bpf.h                           |  24 ++
 include/linux/bpf_verifier.h                  |   4 +
 include/uapi/linux/bpf.h                      |  16 +-
 kernel/bpf/arraymap.c                         |  35 +-
 kernel/bpf/core.c                             |   3 +-
 kernel/bpf/disasm.c                           |   5 +-
 kernel/bpf/hashtab.c                          |   2 +-
 kernel/bpf/local_storage.c                    |   2 +-
 kernel/bpf/lpm_trie.c                         |   2 +-
 kernel/bpf/queue_stack_maps.c                 |   3 +-
 kernel/bpf/syscall.c                          |  35 +-
 kernel/bpf/verifier.c                         | 103 ++++--
 tools/bpf/bpftool/xlated_dumper.c             |   3 +
 tools/include/linux/filter.h                  |  19 +-
 tools/include/uapi/linux/bpf.h                |  16 +-
 tools/lib/bpf/libbpf.c                        | 321 ++++++++++++++----
 tools/testing/selftests/bpf/bpf_helpers.h     |   2 +-
 .../selftests/bpf/progs/test_global_data.c    |  61 ++++
 tools/testing/selftests/bpf/test_progs.c      |  50 +++
 tools/testing/selftests/bpf/test_verifier.c   |  40 ++-
 .../selftests/bpf/verifier/array_access.c     | 159 +++++++++
 .../bpf/verifier/direct_value_access.c        | 170 ++++++++++
 22 files changed, 955 insertions(+), 120 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_global_data.c
 create mode 100644 tools/testing/selftests/bpf/verifier/direct_value_access.c

-- 
2.17.1


* [PATCH bpf-next v2 1/7] bpf: implement lookup-free direct value access
From: Daniel Borkmann @ 2019-02-28 23:18 UTC
  To: ast
  Cc: bpf, netdev, joe, john.fastabend, tgraf, yhs, andriin,
	jakub.kicinski, lmb, Daniel Borkmann

This generic extension to BPF maps allows for directly loading
an address residing inside a BPF map value with a single BPF
ldimm64 instruction.

The idea is similar to what BPF_PSEUDO_MAP_FD does today: it is
a special src_reg flag for the ldimm64 instruction indicating
that the first part of the double insn's imm field holds a file
descriptor, which the verifier then replaces with the full 64-bit
address of the map spread across both imm parts.

For the newly added BPF_PSEUDO_MAP_VALUE src_reg flag, the idea
is similar: the first part of the double insn's imm field is
again a file descriptor corresponding to the map, and the second
part of the imm field is an offset into the value. The verifier
will then replace both imm parts with an address that points
into the BPF map value, for maps that support this operation.
BPF_PSEUDO_MAP_VALUE is a distinct flag since with
BPF_PSEUDO_MAP_FD alone we could not distinguish a load of the
map pointer from a load of the map's value at offset 0.
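
For illustration, the raw encoding of such a load could be
hand-constructed as below (a sketch only; map_fd and value_off
are placeholders, and the rewrite itself happens inside the
verifier at program load time):

  struct bpf_insn insn[2] = {
      {
          .code    = BPF_LD | BPF_DW | BPF_IMM,
          .dst_reg = BPF_REG_1,
          .src_reg = BPF_PSEUDO_MAP_VALUE,
          .imm     = map_fd,     /* insn[0].imm: fd of the map */
      },
      {
          .imm     = value_off,  /* insn[1].imm: offset into value */
      },
  };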

This allows for efficiently retrieving the address of a map
value memory area without having to issue a helper call, which
would need to prepare registers according to the calling
convention, would require an extra NULL test, and would need an
additional instruction to add the offset to the value base
pointer.
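
For comparison, a sketch in test_verifier style of the sequence
this replaces, assuming a BPF_LD_MAP_VALUE(DST, MAP_FD, OFF)
convenience macro analogous to BPF_LD_MAP_FD (map_fd and off
are placeholders):

  /* Conventional access: helper call plus mandatory NULL test. */
  struct bpf_insn via_helper[] = {
      BPF_ST_MEM(BPF_W, BPF_REG_10, -4, 0),   /* key = 0 */
      BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
      BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -4),
      BPF_LD_MAP_FD(BPF_REG_1, map_fd),
      BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
                   BPF_FUNC_map_lookup_elem),
      BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),  /* NULL test */
      BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, off),
      BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
      BPF_EXIT_INSN(),
  };

  /* Direct value access: one ldimm64 yields the same pointer. */
  struct bpf_insn via_direct[] = {
      BPF_LD_MAP_VALUE(BPF_REG_0, map_fd, off),
      BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
      BPF_EXIT_INSN(),
  };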

The verifier then treats the destination register as
PTR_TO_MAP_VALUE with constant reg->off taken from the
user-passed offset in the second imm field, and guarantees that
it is within bounds of the map value. Any subsequent operations
are then treated as typical map value handling, with nothing
else needed for verification.

The two map operations for direct value access have been added
to the array map for now. Other map types could be supported in
the future depending on the use case. The main use case for this
commit is to allow BPF loader support for global variables that
reside in the .data/.rodata/.bss sections, such that their
addresses can be loaded directly with minimal additional
infrastructure. Loader support is added to the libbpf library in
subsequent commits.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/bpf.h               |  6 +++
 include/linux/bpf_verifier.h      |  4 ++
 include/uapi/linux/bpf.h          |  6 ++-
 kernel/bpf/arraymap.c             | 33 ++++++++++++++
 kernel/bpf/core.c                 |  3 +-
 kernel/bpf/disasm.c               |  5 ++-
 kernel/bpf/syscall.c              | 29 +++++++++---
 kernel/bpf/verifier.c             | 73 +++++++++++++++++++++++--------
 tools/bpf/bpftool/xlated_dumper.c |  3 ++
 tools/include/uapi/linux/bpf.h    |  6 ++-
 10 files changed, 138 insertions(+), 30 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a2132e09dc1c..bdcc6e2a9977 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -57,6 +57,12 @@ struct bpf_map_ops {
 			     const struct btf *btf,
 			     const struct btf_type *key_type,
 			     const struct btf_type *value_type);
+
+	/* Direct value access helpers. */
+	int (*map_direct_value_access)(const struct bpf_map *map,
+				       u32 off, u64 *imm);
+	int (*map_direct_value_offset)(const struct bpf_map *map,
+				       u64 imm, u32 *off);
 };
 
 struct bpf_map {
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 69f7a3449eda..6e28f1c24710 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -183,6 +183,10 @@ struct bpf_insn_aux_data {
 		unsigned long map_state;	/* pointer/poison value for maps */
 		s32 call_imm;			/* saved imm field of call insn */
 		u32 alu_limit;			/* limit for add/sub register with pointer */
+		struct {
+			u32 map_index;		/* index into used_maps[] */
+			u32 map_off;		/* offset from value base address */
+		};
 	};
 	int ctx_field_size; /* the ctx field size for load insn, maybe 0 */
 	int sanitize_stack_off; /* stack slot to be cleared */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2e308e90ffea..8884072e1a46 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -255,8 +255,12 @@ enum bpf_attach_type {
  */
 #define BPF_F_ANY_ALIGNMENT	(1U << 1)
 
-/* when bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_FD, bpf_ldimm64->imm == fd */
+/* When bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_{FD,VALUE}, then
+ * bpf_ldimm64's insn[0]->imm == fd in both cases. Additionally,
+ * for BPF_PSEUDO_MAP_VALUE, insn[1]->imm == offset into value.
+ */
 #define BPF_PSEUDO_MAP_FD	1
+#define BPF_PSEUDO_MAP_VALUE	2
 
 /* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
  * offset to another bpf function
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index c72e0d8e1e65..3e5969c0c979 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -160,6 +160,37 @@ static void *array_map_lookup_elem(struct bpf_map *map, void *key)
 	return array->value + array->elem_size * (index & array->index_mask);
 }
 
+static int array_map_direct_value_access(const struct bpf_map *map, u32 off,
+					 u64 *imm)
+{
+	struct bpf_array *array = container_of(map, struct bpf_array, map);
+
+	if (map->max_entries != 1)
+		return -ENOTSUPP;
+	if (off >= map->value_size)
+		return -EINVAL;
+
+	*imm = (unsigned long)array->value;
+	return 0;
+}
+
+static int array_map_direct_value_offset(const struct bpf_map *map, u64 imm,
+					 u32 *off)
+{
+	struct bpf_array *array = container_of(map, struct bpf_array, map);
+	unsigned long range = map->value_size;
+	unsigned long base  = array->value;
+	unsigned long addr  = imm;
+
+	if (map->max_entries != 1)
+		return -ENOENT;
+	if (addr < base || addr >= base + range)
+		return -ENOENT;
+
+	*off = addr - base;
+	return 0;
+}
+
 /* emit BPF instructions equivalent to C code of array_map_lookup_elem() */
 static u32 array_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
 {
@@ -419,6 +450,8 @@ const struct bpf_map_ops array_map_ops = {
 	.map_update_elem = array_map_update_elem,
 	.map_delete_elem = array_map_delete_elem,
 	.map_gen_lookup = array_map_gen_lookup,
+	.map_direct_value_access = array_map_direct_value_access,
+	.map_direct_value_offset = array_map_direct_value_offset,
 	.map_seq_show_elem = array_map_seq_show_elem,
 	.map_check_btf = array_map_check_btf,
 };
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 1c14c347f3cf..49fc0ff14537 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -286,7 +286,8 @@ int bpf_prog_calc_tag(struct bpf_prog *fp)
 		dst[i] = fp->insnsi[i];
 		if (!was_ld_map &&
 		    dst[i].code == (BPF_LD | BPF_IMM | BPF_DW) &&
-		    dst[i].src_reg == BPF_PSEUDO_MAP_FD) {
+		    (dst[i].src_reg == BPF_PSEUDO_MAP_FD ||
+		     dst[i].src_reg == BPF_PSEUDO_MAP_VALUE)) {
 			was_ld_map = true;
 			dst[i].imm = 0;
 		} else if (was_ld_map &&
diff --git a/kernel/bpf/disasm.c b/kernel/bpf/disasm.c
index de73f55e42fd..d9ce383c0f9c 100644
--- a/kernel/bpf/disasm.c
+++ b/kernel/bpf/disasm.c
@@ -205,10 +205,11 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
 			 * part of the ldimm64 insn is accessible.
 			 */
 			u64 imm = ((u64)(insn + 1)->imm << 32) | (u32)insn->imm;
-			bool map_ptr = insn->src_reg == BPF_PSEUDO_MAP_FD;
+			bool is_ptr = insn->src_reg == BPF_PSEUDO_MAP_FD ||
+				      insn->src_reg == BPF_PSEUDO_MAP_VALUE;
 			char tmp[64];
 
-			if (map_ptr && !allow_ptr_leaks)
+			if (is_ptr && !allow_ptr_leaks)
 				imm = 0;
 
 			verbose(cbs->private_data, "(%02x) r%d = %s\n",
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 174581dfe225..d3ef45e01d7a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2061,13 +2061,27 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
 }
 
 static const struct bpf_map *bpf_map_from_imm(const struct bpf_prog *prog,
-					      unsigned long addr)
+					      unsigned long addr, u32 *off,
+					      u32 *type)
 {
+	const struct bpf_map *map;
 	int i;
 
-	for (i = 0; i < prog->aux->used_map_cnt; i++)
-		if (prog->aux->used_maps[i] == (void *)addr)
-			return prog->aux->used_maps[i];
+	*off = *type = 0;
+	for (i = 0; i < prog->aux->used_map_cnt; i++) {
+		map = prog->aux->used_maps[i];
+		if (map == (void *)addr) {
+			*type = BPF_PSEUDO_MAP_FD;
+			return map;
+		}
+		if (!map->ops->map_direct_value_offset)
+			continue;
+		if (!map->ops->map_direct_value_offset(map, addr, off)) {
+			*type = BPF_PSEUDO_MAP_VALUE;
+			return map;
+		}
+	}
+
 	return NULL;
 }
 
@@ -2075,6 +2089,7 @@ static struct bpf_insn *bpf_insn_prepare_dump(const struct bpf_prog *prog)
 {
 	const struct bpf_map *map;
 	struct bpf_insn *insns;
+	u32 off, type;
 	u64 imm;
 	int i;
 
@@ -2102,11 +2117,11 @@ static struct bpf_insn *bpf_insn_prepare_dump(const struct bpf_prog *prog)
 			continue;
 
 		imm = ((u64)insns[i + 1].imm << 32) | (u32)insns[i].imm;
-		map = bpf_map_from_imm(prog, imm);
+		map = bpf_map_from_imm(prog, imm, &off, &type);
 		if (map) {
-			insns[i].src_reg = BPF_PSEUDO_MAP_FD;
+			insns[i].src_reg = type;
 			insns[i].imm = map->id;
-			insns[i + 1].imm = 0;
+			insns[i + 1].imm = off;
 			continue;
 		}
 	}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0e4edd7e3c5f..3ad05dda6e9d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4944,18 +4944,12 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
 	return 0;
 }
 
-/* return the map pointer stored inside BPF_LD_IMM64 instruction */
-static struct bpf_map *ld_imm64_to_map_ptr(struct bpf_insn *insn)
-{
-	u64 imm64 = ((u64) (u32) insn[0].imm) | ((u64) (u32) insn[1].imm) << 32;
-
-	return (struct bpf_map *) (unsigned long) imm64;
-}
-
 /* verify BPF_LD_IMM64 instruction */
 static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
 {
+	struct bpf_insn_aux_data *aux = cur_aux(env);
 	struct bpf_reg_state *regs = cur_regs(env);
+	struct bpf_map *map;
 	int err;
 
 	if (BPF_SIZE(insn->code) != BPF_DW) {
@@ -4979,11 +4973,22 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
 		return 0;
 	}
 
-	/* replace_map_fd_with_map_ptr() should have caught bad ld_imm64 */
-	BUG_ON(insn->src_reg != BPF_PSEUDO_MAP_FD);
+	map = env->used_maps[aux->map_index];
+	mark_reg_known_zero(env, regs, insn->dst_reg);
+	regs[insn->dst_reg].map_ptr = map;
+
+	if (insn->src_reg == BPF_PSEUDO_MAP_VALUE) {
+		regs[insn->dst_reg].type = PTR_TO_MAP_VALUE;
+		regs[insn->dst_reg].off = aux->map_off;
+		if (map_value_has_spin_lock(map))
+			regs[insn->dst_reg].id = ++env->id_gen;
+	} else if (insn->src_reg == BPF_PSEUDO_MAP_FD) {
+		regs[insn->dst_reg].type = CONST_PTR_TO_MAP;
+	} else {
+		verbose(env, "bpf verifier is misconfigured\n");
+		return -EINVAL;
+	}
 
-	regs[insn->dst_reg].type = CONST_PTR_TO_MAP;
-	regs[insn->dst_reg].map_ptr = ld_imm64_to_map_ptr(insn);
 	return 0;
 }
 
@@ -6664,8 +6669,10 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
 		}
 
 		if (insn[0].code == (BPF_LD | BPF_IMM | BPF_DW)) {
+			struct bpf_insn_aux_data *aux;
 			struct bpf_map *map;
 			struct fd f;
+			u64 addr;
 
 			if (i == insn_cnt - 1 || insn[1].code != 0 ||
 			    insn[1].dst_reg != 0 || insn[1].src_reg != 0 ||
@@ -6677,8 +6684,8 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
 			if (insn->src_reg == 0)
 				/* valid generic load 64-bit imm */
 				goto next_insn;
-
-			if (insn->src_reg != BPF_PSEUDO_MAP_FD) {
+			if (insn->src_reg != BPF_PSEUDO_MAP_FD &&
+			    insn->src_reg != BPF_PSEUDO_MAP_VALUE) {
 				verbose(env,
 					"unrecognized bpf_ld_imm64 insn\n");
 				return -EINVAL;
@@ -6698,16 +6705,44 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
 				return err;
 			}
 
-			/* store map pointer inside BPF_LD_IMM64 instruction */
-			insn[0].imm = (u32) (unsigned long) map;
-			insn[1].imm = ((u64) (unsigned long) map) >> 32;
+			aux = &env->insn_aux_data[i];
+			if (insn->src_reg == BPF_PSEUDO_MAP_FD) {
+				addr = (unsigned long)map;
+			} else {
+				u32 off = insn[1].imm;
+
+				if (off >= BPF_MAX_VAR_OFF) {
+					verbose(env, "direct value offset of %u is not allowed\n",
+						off);
+					return -EINVAL;
+				}
+				if (!map->ops->map_direct_value_access) {
+					verbose(env, "no direct value access support for this map type\n");
+					return -EINVAL;
+				}
+
+				err = map->ops->map_direct_value_access(map, off, &addr);
+				if (err) {
+					verbose(env, "invalid access to map value pointer, value_size=%u off=%u\n",
+						map->value_size, off);
+					return err;
+				}
+
+				aux->map_off = off;
+				addr += off;
+			}
+
+			insn[0].imm = (u32)addr;
+			insn[1].imm = addr >> 32;
 
 			/* check whether we recorded this map already */
-			for (j = 0; j < env->used_map_cnt; j++)
+			for (j = 0; j < env->used_map_cnt; j++) {
 				if (env->used_maps[j] == map) {
+					aux->map_index = j;
 					fdput(f);
 					goto next_insn;
 				}
+			}
 
 			if (env->used_map_cnt >= MAX_USED_MAPS) {
 				fdput(f);
@@ -6724,6 +6759,8 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
 				fdput(f);
 				return PTR_ERR(map);
 			}
+
+			aux->map_index = env->used_map_cnt;
 			env->used_maps[env->used_map_cnt++] = map;
 
 			if (bpf_map_is_cgroup_storage(map) &&
diff --git a/tools/bpf/bpftool/xlated_dumper.c b/tools/bpf/bpftool/xlated_dumper.c
index 7073dbe1ff27..0bb17bf88b18 100644
--- a/tools/bpf/bpftool/xlated_dumper.c
+++ b/tools/bpf/bpftool/xlated_dumper.c
@@ -195,6 +195,9 @@ static const char *print_imm(void *private_data,
 	if (insn->src_reg == BPF_PSEUDO_MAP_FD)
 		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
 			 "map[id:%u]", insn->imm);
+	else if (insn->src_reg == BPF_PSEUDO_MAP_VALUE)
+		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
+			 "map[id:%u][0]+%u", insn->imm, (insn + 1)->imm);
 	else
 		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
 			 "0x%llx", (unsigned long long)full_imm);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 2e308e90ffea..8884072e1a46 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -255,8 +255,12 @@ enum bpf_attach_type {
  */
 #define BPF_F_ANY_ALIGNMENT	(1U << 1)
 
-/* when bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_FD, bpf_ldimm64->imm == fd */
+/* When bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_{FD,VALUE}, then
+ * bpf_ldimm64's insn[0]->imm == fd in both cases. Additionally,
+ * for BPF_PSEUDO_MAP_VALUE, insn[1]->imm == offset into value.
+ */
 #define BPF_PSEUDO_MAP_FD	1
+#define BPF_PSEUDO_MAP_VALUE	2
 
 /* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
  * offset to another bpf function
-- 
2.17.1


* [PATCH bpf-next v2 2/7] bpf: add program side {rd,wr}only support
From: Daniel Borkmann @ 2019-02-28 23:18 UTC
  To: ast
  Cc: bpf, netdev, joe, john.fastabend, tgraf, yhs, andriin,
	jakub.kicinski, lmb, Daniel Borkmann

This work adds two new map creation flags, BPF_F_RDONLY_PROG
and BPF_F_WRONLY_PROG, in order to allow for read-only or
write-only BPF maps from the BPF program side.

Today we have BPF_F_RDONLY and BPF_F_WRONLY, but these only
apply to the system call side, meaning the BPF program has full
read/write access to the map as usual while bpf(2) calls with
the map fd can either only read or only write into the map
depending on the flags. BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG
allow for the exact opposite, such that the verifier rejects
program loads if a write into a read-only map or a read from a
write-only map is detected.

We've enabled this generic map extension for various non-special
maps holding normal user data: array, hash, lru, lpm, local
storage, queue and stack. Further map types could follow in the
future depending on the use case. The main use case here is to
forbid writes into .rodata map values from the verifier side.
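
As an aside, a minimal userspace sketch of what these flags give
us (using libbpf's bpf_create_map_xattr(); the helper function
and the value_size of 24 are made up for illustration):

  #include <linux/bpf.h>
  #include <bpf/bpf.h>

  /* Create an array map the BPF program may only read; bpf(2)
   * syscall access from this process remains unrestricted.
   */
  static int create_prog_rdonly_map(void)
  {
      struct bpf_create_map_attr attr = {
          .name        = ".rodata",
          .map_type    = BPF_MAP_TYPE_ARRAY,
          .map_flags   = BPF_F_RDONLY_PROG,
          .key_size    = sizeof(int),
          .value_size  = 24,    /* e.g. ELF section size */
          .max_entries = 1,
      };

      /* Loading a program that stores into this map's value now
       * fails verification with e.g.:
       *   write into map forbidden, value_size=24 off=0 size=8
       */
      return bpf_create_map_xattr(&attr);
  }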

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/bpf.h           | 18 ++++++++++++++++++
 include/uapi/linux/bpf.h      | 10 +++++++++-
 kernel/bpf/arraymap.c         |  2 +-
 kernel/bpf/hashtab.c          |  2 +-
 kernel/bpf/local_storage.c    |  2 +-
 kernel/bpf/lpm_trie.c         |  2 +-
 kernel/bpf/queue_stack_maps.c |  3 +--
 kernel/bpf/verifier.c         | 30 +++++++++++++++++++++++++++++-
 8 files changed, 61 insertions(+), 8 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index bdcc6e2a9977..3f74194dd4f6 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -427,6 +427,24 @@ struct bpf_array {
 	};
 };
 
+#define BPF_MAP_CAN_READ	BIT(0)
+#define BPF_MAP_CAN_WRITE	BIT(1)
+
+static inline u32 bpf_map_flags_to_cap(struct bpf_map *map)
+{
+	u32 access_flags = map->map_flags & (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG);
+
+	/* Combination of BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG is
+	 * not possible.
+	 */
+	if (access_flags & BPF_F_RDONLY_PROG)
+		return BPF_MAP_CAN_READ;
+	else if (access_flags & BPF_F_WRONLY_PROG)
+		return BPF_MAP_CAN_WRITE;
+	else
+		return BPF_MAP_CAN_READ | BPF_MAP_CAN_WRITE;
+}
+
 #define MAX_TAIL_CALL_CNT 32
 
 struct bpf_event_entry {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 8884072e1a46..04b26f59b413 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -287,7 +287,7 @@ enum bpf_attach_type {
 
 #define BPF_OBJ_NAME_LEN 16U
 
-/* Flags for accessing BPF object */
+/* Flags for accessing BPF object from syscall side. */
 #define BPF_F_RDONLY		(1U << 3)
 #define BPF_F_WRONLY		(1U << 4)
 
@@ -297,6 +297,14 @@ enum bpf_attach_type {
 /* Zero-initialize hash function seed. This should only be used for testing. */
 #define BPF_F_ZERO_SEED		(1U << 6)
 
+/* Flags for accessing BPF object from program side. */
+#define BPF_F_RDONLY_PROG	(1U << 7)
+#define BPF_F_WRONLY_PROG	(1U << 8)
+#define BPF_F_ACCESS_MASK	(BPF_F_RDONLY |		\
+				 BPF_F_RDONLY_PROG |	\
+				 BPF_F_WRONLY |		\
+				 BPF_F_WRONLY_PROG)
+
 /* flags for BPF_PROG_QUERY */
 #define BPF_F_QUERY_EFFECTIVE	(1U << 0)
 
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 3e5969c0c979..076dc3d77faf 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -22,7 +22,7 @@
 #include "map_in_map.h"
 
 #define ARRAY_CREATE_FLAG_MASK \
-	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+	(BPF_F_NUMA_NODE | BPF_F_ACCESS_MASK)
 
 static void bpf_array_free_percpu(struct bpf_array *array)
 {
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index fed15cf94dca..ab9d51ac80e1 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -23,7 +23,7 @@
 
 #define HTAB_CREATE_FLAG_MASK						\
 	(BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE |	\
-	 BPF_F_RDONLY | BPF_F_WRONLY | BPF_F_ZERO_SEED)
+	 BPF_F_ACCESS_MASK | BPF_F_ZERO_SEED)
 
 struct bucket {
 	struct hlist_nulls_head head;
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index 6b572e2de7fb..3ffe3259da00 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -14,7 +14,7 @@ DEFINE_PER_CPU(struct bpf_cgroup_storage*, bpf_cgroup_storage[MAX_BPF_CGROUP_STO
 #ifdef CONFIG_CGROUP_BPF
 
 #define LOCAL_STORAGE_CREATE_FLAG_MASK					\
-	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+	(BPF_F_NUMA_NODE | BPF_F_ACCESS_MASK)
 
 struct bpf_cgroup_storage_map {
 	struct bpf_map map;
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index abf1002080df..79c75b1626b8 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -537,7 +537,7 @@ static int trie_delete_elem(struct bpf_map *map, void *_key)
 #define LPM_KEY_SIZE_MIN	LPM_KEY_SIZE(LPM_DATA_SIZE_MIN)
 
 #define LPM_CREATE_FLAG_MASK	(BPF_F_NO_PREALLOC | BPF_F_NUMA_NODE |	\
-				 BPF_F_RDONLY | BPF_F_WRONLY)
+				 BPF_F_ACCESS_MASK)
 
 static struct bpf_map *trie_alloc(union bpf_attr *attr)
 {
diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index b384ea9f3254..1eb9ceef075c 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -11,8 +11,7 @@
 #include "percpu_freelist.h"
 
 #define QUEUE_STACK_CREATE_FLAG_MASK \
-	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
-
+	(BPF_F_NUMA_NODE | BPF_F_ACCESS_MASK)
 
 struct bpf_queue_stack {
 	struct bpf_map map;
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3ad05dda6e9d..cdd2cb01f789 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1429,6 +1429,28 @@ static int check_stack_access(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static int check_map_access_type(struct bpf_verifier_env *env, u32 regno,
+				 int off, int size, enum bpf_access_type type)
+{
+	struct bpf_reg_state *regs = cur_regs(env);
+	struct bpf_map *map = regs[regno].map_ptr;
+	u32 cap = bpf_map_flags_to_cap(map);
+
+	if (type == BPF_WRITE && !(cap & BPF_MAP_CAN_WRITE)) {
+		verbose(env, "write into map forbidden, value_size=%d off=%d size=%d\n",
+			map->value_size, off, size);
+		return -EACCES;
+	}
+
+	if (type == BPF_READ && !(cap & BPF_MAP_CAN_READ)) {
+		verbose(env, "read into map forbidden, value_size=%d off=%d size=%d\n",
+			map->value_size, off, size);
+		return -EACCES;
+	}
+
+	return 0;
+}
+
 /* check read/write into map element returned by bpf_map_lookup_elem() */
 static int __check_map_access(struct bpf_verifier_env *env, u32 regno, int off,
 			      int size, bool zero_size_allowed)
@@ -2014,7 +2036,9 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 			verbose(env, "R%d leaks addr into map\n", value_regno);
 			return -EACCES;
 		}
-
+		err = check_map_access_type(env, regno, off, size, t);
+		if (err)
+			return err;
 		err = check_map_access(env, regno, off, size, false);
 		if (!err && t == BPF_READ && value_regno >= 0)
 			mark_reg_unknown(env, regs, value_regno);
@@ -2250,6 +2274,10 @@ static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
 		return check_packet_access(env, regno, reg->off, access_size,
 					   zero_size_allowed);
 	case PTR_TO_MAP_VALUE:
+		if (check_map_access_type(env, regno, reg->off, access_size,
+					  meta && meta->raw_mode ? BPF_WRITE :
+					  BPF_READ))
+			return -EACCES;
 		return check_map_access(env, regno, reg->off, access_size,
 					zero_size_allowed);
 	default: /* scalar_value|ptr_to_stack or invalid ptr */
-- 
2.17.1


* [PATCH bpf-next v2 3/7] bpf, obj: allow . char as part of the name
From: Daniel Borkmann @ 2019-02-28 23:18 UTC
  To: ast
  Cc: bpf, netdev, joe, john.fastabend, tgraf, yhs, andriin,
	jakub.kicinski, lmb, Daniel Borkmann

Trivial addition to allow the '.' character aside from '_' as a
"special" character in the object name. This is used on the
loader side to name the maps ".bss", ".data" and ".rodata".

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 kernel/bpf/syscall.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index d3ef45e01d7a..90044da3346e 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -440,10 +440,10 @@ static int bpf_obj_name_cpy(char *dst, const char *src)
 	const char *end = src + BPF_OBJ_NAME_LEN;
 
 	memset(dst, 0, BPF_OBJ_NAME_LEN);
-
-	/* Copy all isalnum() and '_' char */
+	/* Copy all isalnum(), '_' and '.' chars. */
 	while (src < end && *src) {
-		if (!isalnum(*src) && *src != '_')
+		if (!isalnum(*src) &&
+		    *src != '_' && *src != '.')
 			return -EINVAL;
 		*dst++ = *src++;
 	}
-- 
2.17.1


* [PATCH bpf-next v2 4/7] bpf, libbpf: refactor relocation handling
From: Daniel Borkmann @ 2019-02-28 23:18 UTC
  To: ast
  Cc: bpf, netdev, joe, john.fastabend, tgraf, yhs, andriin,
	jakub.kicinski, lmb, Daniel Borkmann

From: Joe Stringer <joe@wand.net.nz>

Adjust the code for relocations slightly, with no functional
changes, so that the upcoming patches which introduce support
for relocations into the .data, .rodata and .bss sections can be
added independently of these changes.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 tools/lib/bpf/libbpf.c | 62 ++++++++++++++++++++++--------------------
 1 file changed, 32 insertions(+), 30 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index b38dcbe7460a..8f8f688f3e9b 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -851,20 +851,20 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 				obj->efile.symbols = data;
 				obj->efile.strtabidx = sh.sh_link;
 			}
-		} else if ((sh.sh_type == SHT_PROGBITS) &&
-			   (sh.sh_flags & SHF_EXECINSTR) &&
-			   (data->d_size > 0)) {
-			if (strcmp(name, ".text") == 0)
-				obj->efile.text_shndx = idx;
-			err = bpf_object__add_program(obj, data->d_buf,
-						      data->d_size, name, idx);
-			if (err) {
-				char errmsg[STRERR_BUFSIZE];
-				char *cp = libbpf_strerror_r(-err, errmsg,
-							     sizeof(errmsg));
-
-				pr_warning("failed to alloc program %s (%s): %s",
-					   name, obj->path, cp);
+		} else if (sh.sh_type == SHT_PROGBITS && data->d_size > 0) {
+			if (sh.sh_flags & SHF_EXECINSTR) {
+				if (strcmp(name, ".text") == 0)
+					obj->efile.text_shndx = idx;
+				err = bpf_object__add_program(obj, data->d_buf,
+							      data->d_size, name, idx);
+				if (err) {
+					char errmsg[STRERR_BUFSIZE];
+					char *cp = libbpf_strerror_r(-err, errmsg,
+								     sizeof(errmsg));
+
+					pr_warning("failed to alloc program %s (%s): %s",
+						   name, obj->path, cp);
+				}
 			}
 		} else if (sh.sh_type == SHT_REL) {
 			void *reloc = obj->efile.reloc;
@@ -1026,24 +1026,26 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
 			return -LIBBPF_ERRNO__RELOC;
 		}
 
-		/* TODO: 'maps' is sorted. We can use bsearch to make it faster. */
-		for (map_idx = 0; map_idx < nr_maps; map_idx++) {
-			if (maps[map_idx].offset == sym.st_value) {
-				pr_debug("relocation: find map %zd (%s) for insn %u\n",
-					 map_idx, maps[map_idx].name, insn_idx);
-				break;
+		if (sym.st_shndx == maps_shndx) {
+			/* TODO: 'maps' is sorted. We can use bsearch to make it faster. */
+			for (map_idx = 0; map_idx < nr_maps; map_idx++) {
+				if (maps[map_idx].offset == sym.st_value) {
+					pr_debug("relocation: find map %zd (%s) for insn %u\n",
+						 map_idx, maps[map_idx].name, insn_idx);
+					break;
+				}
 			}
-		}
 
-		if (map_idx >= nr_maps) {
-			pr_warning("bpf relocation: map_idx %d large than %d\n",
-				   (int)map_idx, (int)nr_maps - 1);
-			return -LIBBPF_ERRNO__RELOC;
-		}
+			if (map_idx >= nr_maps) {
+				pr_warning("bpf relocation: map_idx %d large than %d\n",
+					   (int)map_idx, (int)nr_maps - 1);
+				return -LIBBPF_ERRNO__RELOC;
+			}
 
-		prog->reloc_desc[i].type = RELO_LD64;
-		prog->reloc_desc[i].insn_idx = insn_idx;
-		prog->reloc_desc[i].map_idx = map_idx;
+			prog->reloc_desc[i].type = RELO_LD64;
+			prog->reloc_desc[i].insn_idx = insn_idx;
+			prog->reloc_desc[i].map_idx = map_idx;
+		}
 	}
 	return 0;
 }
@@ -1405,7 +1407,7 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
 			}
 			insns[insn_idx].src_reg = BPF_PSEUDO_MAP_FD;
 			insns[insn_idx].imm = obj->maps[map_idx].fd;
-		} else {
+		} else if (prog->reloc_desc[i].type == RELO_CALL) {
 			err = bpf_program__reloc_text(prog, obj,
 						      &prog->reloc_desc[i]);
 			if (err)
-- 
2.17.1


* [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections
From: Daniel Borkmann @ 2019-02-28 23:18 UTC
  To: ast
  Cc: bpf, netdev, joe, john.fastabend, tgraf, yhs, andriin,
	jakub.kicinski, lmb, Daniel Borkmann

This work adds BPF loader support for global data sections to
libbpf. This allows writing BPF programs in a more natural,
C-like way by being able to define global variables and const
data.

Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending the
BPF syscall, where union bpf_attr would get an additional
memory/size pair for each section passed during prog load, in
order to later add this base address into the ldimm64
instruction along with the user-provided offset when accessing
a variable. The consensus from LPC was that for proper upstream
support it would be more desirable to use maps instead of a
bpf_attr extension, as this allows for introspection of these
sections as well as potential live updates of their content.
This work follows that path by taking the following steps on
the loader side:

 1) In the bpf_object__elf_collect() step we pick up ".data",
    ".rodata", and ".bss" section information.

 2) If present, in bpf_object__init_global_maps() we create
    a map corresponding to each of the present sections.
    Given that section size and access properties can differ,
    a single-entry array map is created whose value size
    corresponds to the ELF section size of .data, .bss or
    .rodata. In the .rodata case, the map is created as
    read-only from the program side such that the verifier
    rejects any write attempts into .rodata. In a subsequent
    step, for the .data and .rodata sections, the section
    content is copied into the map through
    bpf_map_update_elem(). For .bss this is not necessary
    since the array map is already zero-initialized by
    default. (A condensed sketch of this step and step 4 is
    shown after this list.)

 3) In the bpf_program__collect_reloc() step, we record the
    corresponding map, insn index, and relocation type for
    the global data.

 4) And last but not least, in the actual relocation step in
    bpf_program__relocate(), we mark the ldimm64 instruction
    with src_reg = BPF_PSEUDO_MAP_VALUE, storing the map's
    file descriptor in the first imm field, similarly to
    BPF_PSEUDO_MAP_FD, and the access offset into the section
    in the second imm field (as ldimm64 is 2-insn wide).

 5) On the kernel side, this specially marked
    BPF_PSEUDO_MAP_VALUE load will then store the actual
    target address, that is, the map value base address plus
    offset, in order to have 'map-lookup'-free access. The
    destination register in the verifier will be marked as
    PTR_TO_MAP_VALUE, containing the fixed offset as reg->off
    and the backing BPF map as reg->map_ptr. Meaning, it is
    treated like any other normal map value from the
    verification side, only with efficient, direct value
    access instead of an actual call to the map lookup helper
    as in the typical case.
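
A condensed, illustrative sketch of steps 2) and 4) above,
using libbpf's bpf_create_map()/bpf_map_update_elem() (error
handling trimmed; sec_data/sec_size, is_rodata/is_bss, insns
and insn_idx stand for what the loader tracks per section and
relocation):

  #include <stdbool.h>
  #include <linux/bpf.h>
  #include <bpf/bpf.h>

  static int reloc_global_data(struct bpf_insn *insns, int insn_idx,
                               const void *sec_data, __u32 sec_size,
                               bool is_rodata, bool is_bss)
  {
      int zero = 0, fd;

      /* Step 2): single-entry array map sized to the ELF section,
       * read-only from the program side in the .rodata case.
       */
      fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, sizeof(int), sec_size,
                          1, is_rodata ? BPF_F_RDONLY_PROG : 0);
      if (fd < 0)
          return fd;
      /* Copy section contents in; .bss is zero-filled anyway. */
      if (!is_bss && bpf_map_update_elem(fd, &zero, sec_data, 0))
          return -1;

      /* Step 4): patch the ldimm64. The compiler left the offset
       * into the section in insn[0].imm; move it into insn[1].imm
       * and store the map fd in insn[0].imm.
       */
      insns[insn_idx + 1].imm = insns[insn_idx].imm;
      insns[insn_idx].imm     = fd;
      insns[insn_idx].src_reg = BPF_PSEUDO_MAP_VALUE;
      return fd;
  }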

A simple example dump of a program using global vars in each
section:

  # readelf -a test_global_data.o
  [...]
  [ 6] .bss              NOBITS           0000000000000000  00000328
       0000000000000010  0000000000000000  WA       0     0     8
  [ 7] .data             PROGBITS         0000000000000000  00000328
       0000000000000010  0000000000000000  WA       0     0     8
  [ 8] .rodata           PROGBITS         0000000000000000  00000338
       0000000000000018  0000000000000000   A       0     0     8
  [...]
    95: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    6 static_bss
    96: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    6 static_bss2
    97: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    7 static_data
    98: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    7 static_data2
    99: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    8 static_rodata
   100: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    8 static_rodata2
   101: 0000000000000010     8 OBJECT  LOCAL  DEFAULT    8 static_rodata3
  [...]

  # bpftool prog
  103: sched_cls  name load_static_dat  tag 37a8b6822fc39a29  gpl
       loaded_at 2019-02-28T02:02:35+0000  uid 0
       xlated 712B  jited 426B  memlock 4096B  map_ids 63,64,65,66
  # bpftool map show id 63
  63: array  name .bss  flags 0x0                      <-- .bss area, rw
      key 4B  value 16B  max_entries 1  memlock 4096B
  # bpftool map show id 64
  64: array  name .data  flags 0x0                     <-- .data area, rw
      key 4B  value 16B  max_entries 1  memlock 4096B
  # bpftool map show id 65
  65: array  name .rodata  flags 0x80                  <-- .rodata area, ro
      key 4B  value 24B  max_entries 1  memlock 4096B

  # bpftool prog dump xlated id 103
  int load_static_data(struct __sk_buff * skb):
  ; int load_static_data(struct __sk_buff *skb)
     0: (b7) r1 = 0
  ; key = 0;
     1: (63) *(u32 *)(r10 -4) = r1
     2: (bf) r6 = r10
  ; int load_static_data(struct __sk_buff *skb)
     3: (07) r6 += -4
  ; bpf_map_update_elem(&result, &key, &static_bss, 0);
     4: (18) r1 = map[id:66]
     6: (bf) r2 = r6
     7: (18) r3 = map[id:63][0]+0         <-- direct static_bss addr in .bss area
     9: (b7) r4 = 0
    10: (85) call array_map_update_elem#99888
    11: (b7) r1 = 1
  ; key = 1;
    12: (63) *(u32 *)(r10 -4) = r1
  ; bpf_map_update_elem(&result, &key, &static_data, 0);
    13: (18) r1 = map[id:66]
    15: (bf) r2 = r6
    16: (18) r3 = map[id:64][0]+0         <-- direct static_data addr in .data area
    18: (b7) r4 = 0
    19: (85) call array_map_update_elem#99888
    20: (b7) r1 = 2
  ; key = 2;
    21: (63) *(u32 *)(r10 -4) = r1
  ; bpf_map_update_elem(&result, &key, &static_rodata, 0);
    22: (18) r1 = map[id:66]
    24: (bf) r2 = r6
    25: (18) r3 = map[id:65][0]+0         <-- direct static_rodata addr in .rodata area
    27: (b7) r4 = 0
    28: (85) call array_map_update_elem#99888
    29: (b7) r1 = 3
  ; key = 3;
    30: (63) *(u32 *)(r10 -4) = r1
  ; bpf_map_update_elem(&result, &key, &static_bss2, 0);
    31: (18) r7 = map[id:63][0]+8         <--.
    33: (18) r1 = map[id:66]                 |
    35: (bf) r2 = r6                         |
    36: (18) r3 = map[id:63][0]+8         <-- direct static_bss2 addr in .bss area
    38: (b7) r4 = 0
    39: (85) call array_map_update_elem#99888
  [...]

For now, the .data/.rodata/.bss maps are not exposed via the API
to the user, but this could be done in a subsequent step.

Based upon a recent fix in LLVM, commit c0db6b6bd444 ("[BPF]
Don't fail for static variables").

Joint work with Joe Stringer.

  [0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
      http://vger.kernel.org/lpc-bpf2018.html#session-3

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Joe Stringer <joe@wand.net.nz>
---
 tools/include/uapi/linux/bpf.h |  10 +-
 tools/lib/bpf/libbpf.c         | 259 +++++++++++++++++++++++++++------
 2 files changed, 226 insertions(+), 43 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 8884072e1a46..04b26f59b413 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -287,7 +287,7 @@ enum bpf_attach_type {
 
 #define BPF_OBJ_NAME_LEN 16U
 
-/* Flags for accessing BPF object */
+/* Flags for accessing BPF object from syscall side. */
 #define BPF_F_RDONLY		(1U << 3)
 #define BPF_F_WRONLY		(1U << 4)
 
@@ -297,6 +297,14 @@ enum bpf_attach_type {
 /* Zero-initialize hash function seed. This should only be used for testing. */
 #define BPF_F_ZERO_SEED		(1U << 6)
 
+/* Flags for accessing BPF object from program side. */
+#define BPF_F_RDONLY_PROG	(1U << 7)
+#define BPF_F_WRONLY_PROG	(1U << 8)
+#define BPF_F_ACCESS_MASK	(BPF_F_RDONLY |		\
+				 BPF_F_RDONLY_PROG |	\
+				 BPF_F_WRONLY |		\
+				 BPF_F_WRONLY_PROG)
+
 /* flags for BPF_PROG_QUERY */
 #define BPF_F_QUERY_EFFECTIVE	(1U << 0)
 
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 8f8f688f3e9b..969bc3d9f02c 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -139,6 +139,9 @@ struct bpf_program {
 		enum {
 			RELO_LD64,
 			RELO_CALL,
+			RELO_DATA,
+			RELO_RODATA,
+			RELO_BSS,
 		} type;
 		int insn_idx;
 		union {
@@ -174,7 +177,10 @@ struct bpf_program {
 struct bpf_map {
 	int fd;
 	char *name;
-	size_t offset;
+	union {
+		__u32 global_type;
+		size_t offset;
+	};
 	int map_ifindex;
 	int inner_map_fd;
 	struct bpf_map_def def;
@@ -194,6 +200,8 @@ struct bpf_object {
 	size_t nr_programs;
 	struct bpf_map *maps;
 	size_t nr_maps;
+	struct bpf_map *maps_global;
+	size_t nr_maps_global;
 
 	bool loaded;
 	bool has_pseudo_calls;
@@ -209,6 +217,9 @@ struct bpf_object {
 		Elf *elf;
 		GElf_Ehdr ehdr;
 		Elf_Data *symbols;
+		Elf_Data *global_data;
+		Elf_Data *global_rodata;
+		Elf_Data *global_bss;
 		size_t strtabidx;
 		struct {
 			GElf_Shdr shdr;
@@ -217,6 +228,9 @@ struct bpf_object {
 		int nr_reloc;
 		int maps_shndx;
 		int text_shndx;
+		int data_shndx;
+		int rodata_shndx;
+		int bss_shndx;
 	} efile;
 	/*
 	 * All loaded bpf_object is linked in a list, which is
@@ -457,6 +471,9 @@ static struct bpf_object *bpf_object__new(const char *path,
 	obj->efile.obj_buf = obj_buf;
 	obj->efile.obj_buf_sz = obj_buf_sz;
 	obj->efile.maps_shndx = -1;
+	obj->efile.data_shndx = -1;
+	obj->efile.rodata_shndx = -1;
+	obj->efile.bss_shndx = -1;
 
 	obj->loaded = false;
 
@@ -475,6 +492,9 @@ static void bpf_object__elf_finish(struct bpf_object *obj)
 		obj->efile.elf = NULL;
 	}
 	obj->efile.symbols = NULL;
+	obj->efile.global_data = NULL;
+	obj->efile.global_rodata = NULL;
+	obj->efile.global_bss = NULL;
 
 	zfree(&obj->efile.reloc);
 	obj->efile.nr_reloc = 0;
@@ -757,6 +777,85 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
 	return 0;
 }
 
+static int
+bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map);
+
+static int
+bpf_object__init_global(struct bpf_object *obj, int i, int type,
+			const char *name, Elf_Data *map_data)
+{
+	struct bpf_map *map = &obj->maps_global[i];
+	struct bpf_map_def *def = &map->def;
+	char *cp, errmsg[STRERR_BUFSIZE];
+	int err, slot0 = 0;
+
+	def->type = BPF_MAP_TYPE_ARRAY;
+	def->key_size = sizeof(int);
+	def->value_size = map_data->d_size;
+	def->max_entries = 1;
+	def->map_flags = type == RELO_RODATA ? BPF_F_RDONLY_PROG : 0;
+
+	map->name = strdup(name);
+	map->global_type = type;
+	map->fd = bpf_object__create_map(obj, map);
+	if (map->fd < 0) {
+		err = map->fd;
+		cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
+		pr_warning("failed to create map (name: '%s'): %s\n",
+			   map->name, cp);
+		goto destroy;
+	}
+
+	pr_debug("create map %s: fd=%d\n", map->name, map->fd);
+
+	if (type != RELO_BSS) {
+		err = bpf_map_update_elem(map->fd, &slot0, map_data->d_buf, 0);
+		if (err < 0) {
+			cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
+			pr_warning("failed to update map (name: '%s'): %s\n",
+				   map->name, cp);
+			goto destroy;
+		}
+
+		pr_debug("updated map %s with elf data: fd=%d\n", map->name,
+			 map->fd);
+	}
+	return 0;
+destroy:
+	for (i = 0; i < obj->nr_maps_global; i++)
+		zclose(obj->maps_global[i].fd);
+	return err;
+}
+
+static int
+bpf_object__init_global_maps(struct bpf_object *obj)
+{
+	int nr_maps_global = (obj->efile.data_shndx >= 0) +
+			     (obj->efile.rodata_shndx >= 0) +
+			     (obj->efile.bss_shndx >= 0), i, err = 0;
+
+	obj->maps_global = calloc(nr_maps_global, sizeof(obj->maps_global[0]));
+	if (!obj->maps_global) {
+		pr_warning("alloc maps for object failed\n");
+		return -ENOMEM;
+	}
+
+	obj->nr_maps_global = nr_maps_global;
+	for (i = 0; i < obj->nr_maps_global; i++)
+		obj->maps_global[i].fd = -1;
+	i = 0;
+	if (obj->efile.bss_shndx >= 0)
+		err = bpf_object__init_global(obj, i++, RELO_BSS, ".bss",
+					      obj->efile.global_bss);
+	if (obj->efile.data_shndx >= 0 && !err)
+		err = bpf_object__init_global(obj, i++, RELO_DATA, ".data",
+					      obj->efile.global_data);
+	if (obj->efile.rodata_shndx >= 0 && !err)
+		err = bpf_object__init_global(obj, i++, RELO_RODATA, ".rodata",
+					      obj->efile.global_rodata);
+	return err;
+}
+
 static bool section_have_execinstr(struct bpf_object *obj, int idx)
 {
 	Elf_Scn *scn;
@@ -865,6 +964,12 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 					pr_warning("failed to alloc program %s (%s): %s",
 						   name, obj->path, cp);
 				}
+			} else if (strcmp(name, ".data") == 0) {
+				obj->efile.global_data = data;
+				obj->efile.data_shndx = idx;
+			} else if (strcmp(name, ".rodata") == 0) {
+				obj->efile.global_rodata = data;
+				obj->efile.rodata_shndx = idx;
 			}
 		} else if (sh.sh_type == SHT_REL) {
 			void *reloc = obj->efile.reloc;
@@ -892,6 +997,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 				obj->efile.reloc[n].shdr = sh;
 				obj->efile.reloc[n].data = data;
 			}
+		} else if (sh.sh_type == SHT_NOBITS && strcmp(name, ".bss") == 0) {
+			obj->efile.global_bss = data;
+			obj->efile.bss_shndx = idx;
 		} else {
 			pr_debug("skip section(%d) %s\n", idx, name);
 		}
@@ -923,6 +1031,14 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 		if (err)
 			goto out;
 	}
+	if (obj->efile.data_shndx >= 0 ||
+	    obj->efile.rodata_shndx >= 0 ||
+	    obj->efile.bss_shndx >= 0) {
+		err = bpf_object__init_global_maps(obj);
+		if (err)
+			goto out;
+	}
+
 	err = bpf_object__init_prog_names(obj);
 out:
 	return err;
@@ -961,6 +1077,11 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
 	Elf_Data *symbols = obj->efile.symbols;
 	int text_shndx = obj->efile.text_shndx;
 	int maps_shndx = obj->efile.maps_shndx;
+	int data_shndx = obj->efile.data_shndx;
+	int rodata_shndx = obj->efile.rodata_shndx;
+	int bss_shndx = obj->efile.bss_shndx;
+	struct bpf_map *maps_global = obj->maps_global;
+	size_t nr_maps_global = obj->nr_maps_global;
 	struct bpf_map *maps = obj->maps;
 	size_t nr_maps = obj->nr_maps;
 	int i, nrels;
@@ -999,8 +1120,10 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
 			 (long long) (rel.r_info >> 32),
 			 (long long) sym.st_value, sym.st_name);
 
-		if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx) {
-			pr_warning("Program '%s' contains non-map related relo data pointing to section %u\n",
+		if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx &&
+		    sym.st_shndx != data_shndx && sym.st_shndx != rodata_shndx &&
+		    sym.st_shndx != bss_shndx) {
+			pr_warning("Program '%s' contains unrecognized relo data pointing to section %u\n",
 				   prog->section_name, sym.st_shndx);
 			return -LIBBPF_ERRNO__RELOC;
 		}
@@ -1045,6 +1168,30 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
 			prog->reloc_desc[i].type = RELO_LD64;
 			prog->reloc_desc[i].insn_idx = insn_idx;
 			prog->reloc_desc[i].map_idx = map_idx;
+		} else if (sym.st_shndx == data_shndx ||
+			   sym.st_shndx == rodata_shndx ||
+			   sym.st_shndx == bss_shndx) {
+			int type = (sym.st_shndx == data_shndx)   ? RELO_DATA :
+				   (sym.st_shndx == rodata_shndx) ? RELO_RODATA :
+								    RELO_BSS;
+
+			for (map_idx = 0; map_idx < nr_maps_global; map_idx++) {
+				if (maps_global[map_idx].global_type == type) {
+					pr_debug("relocation: find map %zd (%s) for insn %u\n",
+						 map_idx, maps_global[map_idx].name, insn_idx);
+					break;
+				}
+			}
+
+			if (map_idx >= nr_maps_global) {
+				pr_warning("bpf relocation: map_idx %d large than %d\n",
+					   (int)map_idx, (int)nr_maps_global - 1);
+				return -LIBBPF_ERRNO__RELOC;
+			}
+
+			prog->reloc_desc[i].type = type;
+			prog->reloc_desc[i].insn_idx = insn_idx;
+			prog->reloc_desc[i].map_idx = map_idx;
 		}
 	}
 	return 0;
@@ -1176,15 +1323,58 @@ bpf_object__probe_caps(struct bpf_object *obj)
 }
 
 static int
-bpf_object__create_maps(struct bpf_object *obj)
+bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map)
 {
 	struct bpf_create_map_attr create_attr = {};
+	struct bpf_map_def *def = &map->def;
+	char *cp, errmsg[STRERR_BUFSIZE];
+	int fd;
+
+	if (obj->caps.name)
+		create_attr.name = map->name;
+	create_attr.map_ifindex = map->map_ifindex;
+	create_attr.map_type = def->type;
+	create_attr.map_flags = def->map_flags;
+	create_attr.key_size = def->key_size;
+	create_attr.value_size = def->value_size;
+	create_attr.max_entries = def->max_entries;
+	create_attr.btf_fd = 0;
+	create_attr.btf_key_type_id = 0;
+	create_attr.btf_value_type_id = 0;
+	if (bpf_map_type__is_map_in_map(def->type) &&
+	    map->inner_map_fd >= 0)
+		create_attr.inner_map_fd = map->inner_map_fd;
+	if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
+		create_attr.btf_fd = btf__fd(obj->btf);
+		create_attr.btf_key_type_id = map->btf_key_type_id;
+		create_attr.btf_value_type_id = map->btf_value_type_id;
+	}
+
+	fd = bpf_create_map_xattr(&create_attr);
+	if (fd < 0 && create_attr.btf_key_type_id) {
+		cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
+		pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
+			   map->name, cp, errno);
+
+		create_attr.btf_fd = 0;
+		create_attr.btf_key_type_id = 0;
+		create_attr.btf_value_type_id = 0;
+		map->btf_key_type_id = 0;
+		map->btf_value_type_id = 0;
+		fd = bpf_create_map_xattr(&create_attr);
+	}
+
+	return fd;
+}
+
+static int
+bpf_object__create_maps(struct bpf_object *obj)
+{
 	unsigned int i;
 	int err;
 
 	for (i = 0; i < obj->nr_maps; i++) {
 		struct bpf_map *map = &obj->maps[i];
-		struct bpf_map_def *def = &map->def;
 		char *cp, errmsg[STRERR_BUFSIZE];
 		int *pfd = &map->fd;
 
@@ -1193,41 +1383,7 @@ bpf_object__create_maps(struct bpf_object *obj)
 				 map->name, map->fd);
 			continue;
 		}
-
-		if (obj->caps.name)
-			create_attr.name = map->name;
-		create_attr.map_ifindex = map->map_ifindex;
-		create_attr.map_type = def->type;
-		create_attr.map_flags = def->map_flags;
-		create_attr.key_size = def->key_size;
-		create_attr.value_size = def->value_size;
-		create_attr.max_entries = def->max_entries;
-		create_attr.btf_fd = 0;
-		create_attr.btf_key_type_id = 0;
-		create_attr.btf_value_type_id = 0;
-		if (bpf_map_type__is_map_in_map(def->type) &&
-		    map->inner_map_fd >= 0)
-			create_attr.inner_map_fd = map->inner_map_fd;
-
-		if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
-			create_attr.btf_fd = btf__fd(obj->btf);
-			create_attr.btf_key_type_id = map->btf_key_type_id;
-			create_attr.btf_value_type_id = map->btf_value_type_id;
-		}
-
-		*pfd = bpf_create_map_xattr(&create_attr);
-		if (*pfd < 0 && create_attr.btf_key_type_id) {
-			cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
-			pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
-				   map->name, cp, errno);
-			create_attr.btf_fd = 0;
-			create_attr.btf_key_type_id = 0;
-			create_attr.btf_value_type_id = 0;
-			map->btf_key_type_id = 0;
-			map->btf_value_type_id = 0;
-			*pfd = bpf_create_map_xattr(&create_attr);
-		}
-
+		*pfd = bpf_object__create_map(obj, map);
 		if (*pfd < 0) {
 			size_t j;
 
@@ -1412,6 +1568,24 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
 						      &prog->reloc_desc[i]);
 			if (err)
 				return err;
+		} else if (prog->reloc_desc[i].type == RELO_DATA ||
+			   prog->reloc_desc[i].type == RELO_RODATA ||
+			   prog->reloc_desc[i].type == RELO_BSS) {
+			struct bpf_insn *insns = prog->insns;
+			int insn_idx, map_idx, data_off;
+
+			insn_idx = prog->reloc_desc[i].insn_idx;
+			map_idx  = prog->reloc_desc[i].map_idx;
+			data_off = insns[insn_idx].imm;
+
+			if (insn_idx + 1 >= (int)prog->insns_cnt) {
+				pr_warning("relocation out of range: '%s'\n",
+					   prog->section_name);
+				return -LIBBPF_ERRNO__RELOC;
+			}
+			insns[insn_idx].src_reg = BPF_PSEUDO_MAP_VALUE;
+			insns[insn_idx].imm = obj->maps_global[map_idx].fd;
+			insns[insn_idx + 1].imm = data_off;
 		}
 	}
 
@@ -1717,6 +1891,7 @@ __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz,
 
 	CHECK_ERR(bpf_object__elf_init(obj), err, out);
 	CHECK_ERR(bpf_object__check_endianness(obj), err, out);
+	CHECK_ERR(bpf_object__probe_caps(obj), err, out);
 	CHECK_ERR(bpf_object__elf_collect(obj, flags), err, out);
 	CHECK_ERR(bpf_object__collect_reloc(obj), err, out);
 	CHECK_ERR(bpf_object__validate(obj, needs_kver), err, out);
@@ -1789,7 +1964,8 @@ int bpf_object__unload(struct bpf_object *obj)
 
 	for (i = 0; i < obj->nr_maps; i++)
 		zclose(obj->maps[i].fd);
-
+	for (i = 0; i < obj->nr_maps_global; i++)
+		zclose(obj->maps_global[i].fd);
 	for (i = 0; i < obj->nr_programs; i++)
 		bpf_program__unload(&obj->programs[i]);
 
@@ -1810,7 +1986,6 @@ int bpf_object__load(struct bpf_object *obj)
 
 	obj->loaded = true;
 
-	CHECK_ERR(bpf_object__probe_caps(obj), err, out);
 	CHECK_ERR(bpf_object__create_maps(obj), err, out);
 	CHECK_ERR(bpf_object__relocate(obj), err, out);
 	CHECK_ERR(bpf_object__load_progs(obj), err, out);
-- 
2.17.1


* [PATCH bpf-next v2 6/7] bpf, selftest: test global data/bss/rodata sections
From: Daniel Borkmann @ 2019-02-28 23:18 UTC
  To: ast
  Cc: bpf, netdev, joe, john.fastabend, tgraf, yhs, andriin,
	jakub.kicinski, lmb, Daniel Borkmann

From: Joe Stringer <joe@wand.net.nz>

Add tests for libbpf relocation of static variable references
into the .data, .rodata and .bss sections of the ELF. Tests with
different offsets are all passing:

  # ./test_progs
  [...]
  test_static_data_access:PASS:load program 0 nsec
  test_static_data_access:PASS:pass packet 278 nsec
  test_static_data_access:PASS:relocate .bss reference 1 278 nsec
  test_static_data_access:PASS:relocate .data reference 1 278 nsec
  test_static_data_access:PASS:relocate .rodata reference 1 278 nsec
  test_static_data_access:PASS:relocate .bss reference 2 278 nsec
  test_static_data_access:PASS:relocate .data reference 2 278 nsec
  test_static_data_access:PASS:relocate .rodata reference 2 278 nsec
  test_static_data_access:PASS:relocate .bss reference 3 278 nsec
  test_static_data_access:PASS:relocate .bss reference 4 278 nsec
  Summary: 223 PASSED, 0 FAILED

Joint work with Daniel Borkmann.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 tools/testing/selftests/bpf/bpf_helpers.h     |  2 +-
 .../selftests/bpf/progs/test_global_data.c    | 61 +++++++++++++++++++
 tools/testing/selftests/bpf/test_progs.c      | 50 +++++++++++++++
 3 files changed, 112 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_global_data.c

diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index d9999f1ed1d2..0463662935f9 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -11,7 +11,7 @@
 /* helper functions called from eBPF programs written in C */
 static void *(*bpf_map_lookup_elem)(void *map, void *key) =
 	(void *) BPF_FUNC_map_lookup_elem;
-static int (*bpf_map_update_elem)(void *map, void *key, void *value,
+static int (*bpf_map_update_elem)(void *map, const void *key, const void *value,
 				  unsigned long long flags) =
 	(void *) BPF_FUNC_map_update_elem;
 static int (*bpf_map_delete_elem)(void *map, void *key) =
diff --git a/tools/testing/selftests/bpf/progs/test_global_data.c b/tools/testing/selftests/bpf/progs/test_global_data.c
new file mode 100644
index 000000000000..2a7cf40b8efb
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_global_data.c
@@ -0,0 +1,61 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Isovalent, Inc.
+
+#include <linux/bpf.h>
+#include <linux/pkt_cls.h>
+#include <string.h>
+
+#include "bpf_helpers.h"
+
+struct bpf_map_def SEC("maps") result = {
+	.type		= BPF_MAP_TYPE_ARRAY,
+	.key_size	= sizeof(__u32),
+	.value_size	= sizeof(__u64),
+	.max_entries	= 9,
+};
+
+static       __u64 static_bss     = 0;		/* Reloc reference to .bss  section   */
+static       __u64 static_data    = 42;		/* Reloc reference to .data section   */
+static const __u64 static_rodata  = 24;		/* Reloc reference to .rodata section */
+static       __u64 static_bss2    = 0;		/* Reloc reference to .bss  section   */
+static       __u64 static_data2   = 0xffeeff;	/* Reloc reference to .data section   */
+static const __u64 static_rodata2 = 0xabab;	/* Reloc reference to .rodata section */
+static const __u64 static_rodata3 = 0xab;	/* Reloc reference to .rodata section */
+
+SEC("static_data_load")
+int load_static_data(struct __sk_buff *skb)
+{
+	__u32 key;
+
+	key = 0;
+	bpf_map_update_elem(&result, &key, &static_bss, 0);
+
+	key = 1;
+	bpf_map_update_elem(&result, &key, &static_data, 0);
+
+	key = 2;
+	bpf_map_update_elem(&result, &key, &static_rodata, 0);
+
+	key = 3;
+	bpf_map_update_elem(&result, &key, &static_bss2, 0);
+
+	key = 4;
+	bpf_map_update_elem(&result, &key, &static_data2, 0);
+
+	key = 5;
+	bpf_map_update_elem(&result, &key, &static_rodata2, 0);
+
+	key = 6;
+	static_bss2 = 1234;
+	bpf_map_update_elem(&result, &key, &static_bss2, 0);
+
+	key = 7;
+	bpf_map_update_elem(&result, &key, &static_bss, 0);
+
+	key = 8;
+	bpf_map_update_elem(&result, &key, &static_rodata3, 0);
+
+	return TC_ACT_OK;
+}
+
+char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
index c59d2e015d16..a3e64c054572 100644
--- a/tools/testing/selftests/bpf/test_progs.c
+++ b/tools/testing/selftests/bpf/test_progs.c
@@ -738,6 +738,55 @@ static void test_pkt_md_access(void)
 	bpf_object__close(obj);
 }
 
+static void test_static_data_access(void)
+{
+	const char *file = "./test_global_data.o";
+	struct bpf_object *obj;
+	__u32 duration = 0, retval;
+	int i, err, prog_fd, map_fd;
+	uint64_t value;
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, &prog_fd);
+	if (CHECK(err, "load program", "error %d loading %s\n", err, file))
+		return;
+
+	map_fd = bpf_find_map(__func__, obj, "result");
+	if (map_fd < 0) {
+		error_cnt++;
+		goto close_prog;
+	}
+
+	err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
+				NULL, NULL, &retval, &duration);
+	CHECK(err || retval, "pass packet",
+	      "err %d errno %d retval %d duration %d\n",
+	      err, errno, retval, duration);
+
+	struct {
+		char *name;
+		uint32_t key;
+		uint64_t value;
+	} tests[] = {
+		{ "relocate .bss reference 1",    0, 0 },
+		{ "relocate .data reference 1",   1, 42 },
+		{ "relocate .rodata reference 1", 2, 24 },
+		{ "relocate .bss reference 2",    3, 0 },
+		{ "relocate .data reference 2",   4, 0xffeeff },
+		{ "relocate .rodata reference 2", 5, 0xabab },
+		{ "relocate .bss reference 3",    6, 1234 },
+		{ "relocate .bss reference 4",    7, 0 },
+	};
+	for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
+		err = bpf_map_lookup_elem(map_fd, &tests[i].key, &value);
+		CHECK(err || value != tests[i].value, tests[i].name,
+		      "err %d result %lu expected %lu\n",
+		      err, value, tests[i].value);
+	}
+
+close_prog:
+	bpf_object__close(obj);
+}
+
 static void test_obj_name(void)
 {
 	struct {
@@ -2182,6 +2231,7 @@ int main(void)
 	test_map_lock();
 	test_signal_pending(BPF_PROG_TYPE_SOCKET_FILTER);
 	test_signal_pending(BPF_PROG_TYPE_FLOW_DISSECTOR);
+	test_static_data_access();
 
 	printf("Summary: %d PASSED, %d FAILED\n", pass_cnt, error_cnt);
 	return error_cnt ? EXIT_FAILURE : EXIT_SUCCESS;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH bpf-next v2 7/7] bpf, selftest: test {rd,wr}only flags and direct value access
  2019-02-28 23:18 [PATCH bpf-next v2 0/7] BPF support for global data Daniel Borkmann
                   ` (5 preceding siblings ...)
  2019-02-28 23:18 ` [PATCH bpf-next v2 6/7] bpf, selftest: test " Daniel Borkmann
@ 2019-02-28 23:18 ` Daniel Borkmann
  6 siblings, 0 replies; 46+ messages in thread
From: Daniel Borkmann @ 2019-02-28 23:18 UTC (permalink / raw)
  To: ast
  Cc: bpf, netdev, joe, john.fastabend, tgraf, yhs, andriin,
	jakub.kicinski, lmb, Daniel Borkmann

Extend test_verifier with various test cases around the two kernel
extensions, that is, {rd,wr}only map support as well as direct map
value access. All are passing; one is skipped because xskmap is not
present on the test machine:

  # ./test_verifier
  [...]
  #913/p XDP pkt read, pkt_data <= pkt_meta', bad access 1 OK
  #914/p XDP pkt read, pkt_data <= pkt_meta', bad access 2 OK
  Summary: 1352 PASSED, 1 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 tools/include/linux/filter.h                  |  19 +-
 tools/testing/selftests/bpf/test_verifier.c   |  40 ++++-
 .../selftests/bpf/verifier/array_access.c     | 159 ++++++++++++++++
 .../bpf/verifier/direct_value_access.c        | 170 ++++++++++++++++++
 4 files changed, 383 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/verifier/direct_value_access.c

diff --git a/tools/include/linux/filter.h b/tools/include/linux/filter.h
index cce0b02c0e28..fdf4383db556 100644
--- a/tools/include/linux/filter.h
+++ b/tools/include/linux/filter.h
@@ -280,8 +280,25 @@
 
 /* pseudo BPF_LD_IMM64 insn used to refer to process-local map_fd */
 
+#define BPF_LD_IMM64_RAW2(DST, SRC, IMM1, IMM2)			\
+	((struct bpf_insn) {					\
+		.code  = BPF_LD | BPF_DW | BPF_IMM,		\
+		.dst_reg = DST,					\
+		.src_reg = SRC,					\
+		.off   = 0,					\
+		.imm   = IMM1 }),				\
+	((struct bpf_insn) {					\
+		.code  = 0, /* zero is reserved opcode */	\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = 0,					\
+		.imm   = IMM2 })
+
 #define BPF_LD_MAP_FD(DST, MAP_FD)				\
-	BPF_LD_IMM64_RAW(DST, BPF_PSEUDO_MAP_FD, MAP_FD)
+	BPF_LD_IMM64_RAW2(DST, BPF_PSEUDO_MAP_FD, MAP_FD, 0)
+
+#define BPF_LD_MAP_VALUE(DST, MAP_FD, VALUE_OFF)		\
+	BPF_LD_IMM64_RAW2(DST, BPF_PSEUDO_MAP_VALUE, MAP_FD, VALUE_OFF)
 
 /* Relative call */
 
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 477a9dcf9fff..50359916b82a 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -51,7 +51,7 @@
 
 #define MAX_INSNS	BPF_MAXINSNS
 #define MAX_FIXUPS	8
-#define MAX_NR_MAPS	14
+#define MAX_NR_MAPS	16
 #define MAX_TEST_RUNS	8
 #define POINTER_VALUE	0xcafe4all
 #define TEST_DATA_LEN	64
@@ -80,6 +80,8 @@ struct bpf_test {
 	int fixup_cgroup_storage[MAX_FIXUPS];
 	int fixup_percpu_cgroup_storage[MAX_FIXUPS];
 	int fixup_map_spin_lock[MAX_FIXUPS];
+	int fixup_map_array_ro[MAX_FIXUPS];
+	int fixup_map_array_wo[MAX_FIXUPS];
 	const char *errstr;
 	const char *errstr_unpriv;
 	uint32_t retval, retval_unpriv, insn_processed;
@@ -277,13 +279,15 @@ static bool skip_unsupported_map(enum bpf_map_type map_type)
 	return false;
 }
 
-static int create_map(uint32_t type, uint32_t size_key,
-		      uint32_t size_value, uint32_t max_elem)
+static int __create_map(uint32_t type, uint32_t size_key,
+			uint32_t size_value, uint32_t max_elem,
+			uint32_t extra_flags)
 {
 	int fd;
 
 	fd = bpf_create_map(type, size_key, size_value, max_elem,
-			    type == BPF_MAP_TYPE_HASH ? BPF_F_NO_PREALLOC : 0);
+			    (type == BPF_MAP_TYPE_HASH ?
+			     BPF_F_NO_PREALLOC : 0) | extra_flags);
 	if (fd < 0) {
 		if (skip_unsupported_map(type))
 			return -1;
@@ -293,6 +297,12 @@ static int create_map(uint32_t type, uint32_t size_key,
 	return fd;
 }
 
+static int create_map(uint32_t type, uint32_t size_key,
+		      uint32_t size_value, uint32_t max_elem)
+{
+	return __create_map(type, size_key, size_value, max_elem, 0);
+}
+
 static void update_map(int fd, int index)
 {
 	struct test_val value = {
@@ -519,6 +529,8 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
 	int *fixup_cgroup_storage = test->fixup_cgroup_storage;
 	int *fixup_percpu_cgroup_storage = test->fixup_percpu_cgroup_storage;
 	int *fixup_map_spin_lock = test->fixup_map_spin_lock;
+	int *fixup_map_array_ro = test->fixup_map_array_ro;
+	int *fixup_map_array_wo = test->fixup_map_array_wo;
 
 	if (test->fill_helper)
 		test->fill_helper(test);
@@ -642,6 +654,26 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
 			fixup_map_spin_lock++;
 		} while (*fixup_map_spin_lock);
 	}
+	if (*fixup_map_array_ro) {
+		map_fds[14] = __create_map(BPF_MAP_TYPE_ARRAY, sizeof(int),
+					   sizeof(struct test_val), 1,
+					   BPF_F_RDONLY_PROG);
+		update_map(map_fds[14], 0);
+		do {
+			prog[*fixup_map_array_ro].imm = map_fds[14];
+			fixup_map_array_ro++;
+		} while (*fixup_map_array_ro);
+	}
+	if (*fixup_map_array_wo) {
+		map_fds[15] = __create_map(BPF_MAP_TYPE_ARRAY, sizeof(int),
+					   sizeof(struct test_val), 1,
+					   BPF_F_WRONLY_PROG);
+		update_map(map_fds[15], 0);
+		do {
+			prog[*fixup_map_array_wo].imm = map_fds[15];
+			fixup_map_array_wo++;
+		} while (*fixup_map_array_wo);
+	}
 }
 
 static int set_admin(bool admin)
diff --git a/tools/testing/selftests/bpf/verifier/array_access.c b/tools/testing/selftests/bpf/verifier/array_access.c
index 0dcecaf3ec6f..9a2b6f9b4414 100644
--- a/tools/testing/selftests/bpf/verifier/array_access.c
+++ b/tools/testing/selftests/bpf/verifier/array_access.c
@@ -217,3 +217,162 @@
 	.result = REJECT,
 	.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 },
+{
+	"valid read map access into a read-only array 1",
+	.insns = {
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_LD_MAP_FD(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_ro = { 3 },
+	.result = ACCEPT,
+	.retval = 28,
+},
+{
+	"valid read map access into a read-only array 2",
+	.insns = {
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_LD_MAP_FD(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
+
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+	BPF_MOV64_IMM(BPF_REG_2, 4),
+	BPF_MOV64_IMM(BPF_REG_3, 0),
+	BPF_MOV64_IMM(BPF_REG_4, 0),
+	BPF_MOV64_IMM(BPF_REG_5, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+		     BPF_FUNC_csum_diff),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_array_ro = { 3 },
+	.result = ACCEPT,
+	.retval = -29,
+},
+{
+	"invalid write map access into a read-only array 1",
+	.insns = {
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_LD_MAP_FD(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
+	BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 42),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_ro = { 3 },
+	.result = REJECT,
+	.errstr = "write into map forbidden",
+},
+{
+	"invalid write map access into a read-only array 2",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_LD_MAP_FD(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_MOV64_IMM(BPF_REG_2, 0),
+	BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
+	BPF_MOV64_IMM(BPF_REG_4, 8),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+		     BPF_FUNC_skb_load_bytes),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_array_ro = { 4 },
+	.result = REJECT,
+	.errstr = "write into map forbidden",
+},
+{
+	"valid write map access into a write-only array 1",
+	.insns = {
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_LD_MAP_FD(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
+	BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_wo = { 3 },
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"valid write map access into a write-only array 2",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_LD_MAP_FD(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_MOV64_IMM(BPF_REG_2, 0),
+	BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
+	BPF_MOV64_IMM(BPF_REG_4, 8),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+		     BPF_FUNC_skb_load_bytes),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_array_wo = { 4 },
+	.result = ACCEPT,
+	.retval = 0,
+},
+{
+	"invalid read map access into a write-only array 1",
+	.insns = {
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_LD_MAP_FD(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_wo = { 3 },
+	.result = REJECT,
+	.errstr = "read into map forbidden",
+},
+{
+	"invalid read map access into a write-only array 2",
+	.insns = {
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_LD_MAP_FD(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
+
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+	BPF_MOV64_IMM(BPF_REG_2, 4),
+	BPF_MOV64_IMM(BPF_REG_3, 0),
+	BPF_MOV64_IMM(BPF_REG_4, 0),
+	BPF_MOV64_IMM(BPF_REG_5, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+		     BPF_FUNC_csum_diff),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_array_wo = { 3 },
+	.result = REJECT,
+	.errstr = "read into map forbidden",
+},
diff --git a/tools/testing/selftests/bpf/verifier/direct_value_access.c b/tools/testing/selftests/bpf/verifier/direct_value_access.c
new file mode 100644
index 000000000000..0a92ae8cf646
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/direct_value_access.c
@@ -0,0 +1,170 @@
+{
+	"direct map access, write test 1",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 4242),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"direct map access, write test 2",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 8),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 4242),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"direct map access, write test 3",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 8),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 8, 4242),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"direct map access, write test 4",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 40),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 4242),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"direct map access, write test 5",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 32),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 8, 4242),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"direct map access, write test 6",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 40),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 4, 4242),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = REJECT,
+	.errstr = "R1 min value is outside of the array range",
+},
+{
+	"direct map access, write test 7",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, -1),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 4, 4242),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = REJECT,
+	.errstr = "direct value offset of 4294967295 is not allowed",
+},
+{
+	"direct map access, write test 8",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 1),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, -1, 4242),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"direct map access, write test 9",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 48),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 4242),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = REJECT,
+	.errstr = "invalid access to map value pointer",
+},
+{
+	"direct map access, write test 10",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 47),
+	BPF_ST_MEM(BPF_B, BPF_REG_1, 0, 4),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"direct map access, write test 11",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 48),
+	BPF_ST_MEM(BPF_B, BPF_REG_1, 0, 4),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = REJECT,
+	.errstr = "invalid access to map value pointer",
+},
+{
+	"direct map access, write test 12",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, (1<<29)),
+	BPF_ST_MEM(BPF_B, BPF_REG_1, 0, 4),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = REJECT,
+	.errstr = "direct value offset of 536870912 is not allowed",
+},
+{
+	"direct map access, write test 13",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, (1<<29)-1),
+	BPF_ST_MEM(BPF_B, BPF_REG_1, 0, 4),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = REJECT,
+	.errstr = "invalid access to map value pointer, value_size=48 off=536870911",
+},
+{
+	"direct map access, write test 14",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 47),
+	BPF_LD_MAP_VALUE(BPF_REG_2, 0, 46),
+	BPF_ST_MEM(BPF_H, BPF_REG_2, 0, 0xffff),
+	BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1, 3 },
+	.result = ACCEPT,
+	.retval = 0xff,
+},
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections
  2019-02-28 23:18 ` [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections Daniel Borkmann
@ 2019-02-28 23:41   ` Stanislav Fomichev
  2019-03-01  0:19     ` Daniel Borkmann
  2019-03-01  6:53   ` Andrii Nakryiko
  2019-03-01 18:11   ` Yonghong Song
  2 siblings, 1 reply; 46+ messages in thread
From: Stanislav Fomichev @ 2019-02-28 23:41 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: ast, bpf, netdev, joe, john.fastabend, tgraf, yhs, andriin,
	jakub.kicinski, lmb

On 03/01, Daniel Borkmann wrote:
> This work adds BPF loader support for global data sections
> to libbpf. This allows writing BPF programs in a more natural
> C-like way by being able to define global variables and const
> data.
> 
> Back at LPC 2018 [0] we presented a first prototype which
> implemented support for global data sections by extending BPF
> syscall where union bpf_attr would get additional memory/size
> pair for each section passed during prog load in order to later
> add this base address into the ldimm64 instruction along with
> the user provided offset when accessing a variable. Consensus
> from LPC was that for proper upstream support, it would be
> more desirable to use maps instead of bpf_attr extension as
> this would allow for introspection of these sections as well
> as potential live updates of their content. This work follows
> this path by taking the following steps from loader side:
> 
>  1) In bpf_object__elf_collect() step we pick up ".data",
>     ".rodata", and ".bss" section information.
> 
>  2) If present, in bpf_object__init_global_maps() we create
>     a map that corresponds to each of the present sections.
>     Given section size and access properties can differ, a
>     single entry array map is created with value size that
>     is corresponding to the ELF section size of .data, .bss
>     or .rodata. In the latter case, the map is created as
>     read-only from program side such that verifier rejects
>     any write attempts into .rodata. In a subsequent step,
>     for .data and .rodata sections, the section content is
>     copied into the map through bpf_map_update_elem(). For
>     .bss this is not necessary since array map is already
>     zero-initialized by default.
> 
>  3) In bpf_program__collect_reloc() step, we record the
>     corresponding map, insn index, and relocation type for
>     the global data.
> 
>  4) And last but not least in the actual relocation step in
>     bpf_program__relocate(), we mark the ldimm64 instruction
>     with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
>     imm field the map's file descriptor is stored as similarly
>     done as in BPF_PSEUDO_MAP_FD, and in the second imm field
>     (as ldimm64 is 2-insn wide) we store the access offset
>     into the section.
> 
>  5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
>     load will then store the actual target address in order
>     to have a 'map-lookup'-free access. That is, the actual
>     map value base address + offset. The destination register
>     in the verifier will then be marked as PTR_TO_MAP_VALUE,
>     containing the fixed offset as reg->off and backing BPF
>     map as reg->map_ptr. Meaning, it's treated as any other
>     normal map value from verification side, only with
>     efficient, direct value access instead of actual call to
>     map lookup helper as in the typical case.
> 
> Simple example dump of program using globals vars in each
> section:
> 
>   # readelf -a test_global_data.o
>   [...]
>   [ 6] .bss              NOBITS           0000000000000000  00000328
>        0000000000000010  0000000000000000  WA       0     0     8
>   [ 7] .data             PROGBITS         0000000000000000  00000328
>        0000000000000010  0000000000000000  WA       0     0     8
>   [ 8] .rodata           PROGBITS         0000000000000000  00000338
>        0000000000000018  0000000000000000   A       0     0     8
>   [...]
>     95: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    6 static_bss
>     96: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    6 static_bss2
>     97: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    7 static_data
>     98: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    7 static_data2
>     99: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    8 static_rodata
>    100: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    8 static_rodata2
>    101: 0000000000000010     8 OBJECT  LOCAL  DEFAULT    8 static_rodata3
>   [...]
> 
>   # bpftool prog
>   103: sched_cls  name load_static_dat  tag 37a8b6822fc39a29  gpl
>        loaded_at 2019-02-28T02:02:35+0000  uid 0
>        xlated 712B  jited 426B  memlock 4096B  map_ids 63,64,65,66
>   # bpftool map show id 63
>   63: array  name .bss  flags 0x0                      <-- .bss area, rw
Can we use <main prog>.bss/data/rodata names? If we load more than one
prog with global data that should make it easier to find which one is which.

>       key 4B  value 16B  max_entries 1  memlock 4096B
>   # bpftool map show id 64
>   64: array  name .data  flags 0x0                     <-- .data area, rw
>       key 4B  value 16B  max_entries 1  memlock 4096B
>   # bpftool map show id 65
>   65: array  name .rodata  flags 0x80                  <-- .rodata area, ro
>       key 4B  value 24B  max_entries 1  memlock 4096B
> 
>   # bpftool prog dump xlated id 103
>   int load_static_data(struct __sk_buff * skb):
>   ; int load_static_data(struct __sk_buff *skb)
>      0: (b7) r1 = 0
>   ; key = 0;
>      1: (63) *(u32 *)(r10 -4) = r1
>      2: (bf) r6 = r10
>   ; int load_static_data(struct __sk_buff *skb)
>      3: (07) r6 += -4
>   ; bpf_map_update_elem(&result, &key, &static_bss, 0);
>      4: (18) r1 = map[id:66]
>      6: (bf) r2 = r6
>      7: (18) r3 = map[id:63][0]+0         <-- direct static_bss addr in .bss area
>      9: (b7) r4 = 0
>     10: (85) call array_map_update_elem#99888
>     11: (b7) r1 = 1
>   ; key = 1;
>     12: (63) *(u32 *)(r10 -4) = r1
>   ; bpf_map_update_elem(&result, &key, &static_data, 0);
>     13: (18) r1 = map[id:66]
>     15: (bf) r2 = r6
>     16: (18) r3 = map[id:64][0]+0         <-- direct static_data addr in .data area
>     18: (b7) r4 = 0
>     19: (85) call array_map_update_elem#99888
>     20: (b7) r1 = 2
>   ; key = 2;
>     21: (63) *(u32 *)(r10 -4) = r1
>   ; bpf_map_update_elem(&result, &key, &static_rodata, 0);
>     22: (18) r1 = map[id:66]
>     24: (bf) r2 = r6
>     25: (18) r3 = map[id:65][0]+0         <-- direct static_rodata addr in .rodata area
>     27: (b7) r4 = 0
>     28: (85) call array_map_update_elem#99888
>     29: (b7) r1 = 3
>   ; key = 3;
>     30: (63) *(u32 *)(r10 -4) = r1
>   ; bpf_map_update_elem(&result, &key, &static_bss2, 0);
>     31: (18) r7 = map[id:63][0]+8         <--.
>     33: (18) r1 = map[id:66]                 |
>     35: (bf) r2 = r6                         |
>     36: (18) r3 = map[id:63][0]+8         <-- direct static_bss2 addr in .bss area
>     38: (b7) r4 = 0
>     39: (85) call array_map_update_elem#99888
>   [...]
> 
> For now .data/.rodata/.bss maps are not exposed via API to the
> user, but this could be done in a subsequent step.
> 
> Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't
> fail for static variables").
> 
> Joint work with Joe Stringer.
> 
>   [0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
>       http://vger.kernel.org/lpc-bpf2018.html#session-3
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> ---
>  tools/include/uapi/linux/bpf.h |  10 +-
>  tools/lib/bpf/libbpf.c         | 259 +++++++++++++++++++++++++++------
>  2 files changed, 226 insertions(+), 43 deletions(-)
> 
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 8884072e1a46..04b26f59b413 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -287,7 +287,7 @@ enum bpf_attach_type {
>  
>  #define BPF_OBJ_NAME_LEN 16U
>  
> -/* Flags for accessing BPF object */
> +/* Flags for accessing BPF object from syscall side. */
>  #define BPF_F_RDONLY		(1U << 3)
>  #define BPF_F_WRONLY		(1U << 4)
>  
> @@ -297,6 +297,14 @@ enum bpf_attach_type {
>  /* Zero-initialize hash function seed. This should only be used for testing. */
>  #define BPF_F_ZERO_SEED		(1U << 6)
>  
> +/* Flags for accessing BPF object from program side. */
> +#define BPF_F_RDONLY_PROG	(1U << 7)
> +#define BPF_F_WRONLY_PROG	(1U << 8)
> +#define BPF_F_ACCESS_MASK	(BPF_F_RDONLY |		\
> +				 BPF_F_RDONLY_PROG |	\
> +				 BPF_F_WRONLY |		\
> +				 BPF_F_WRONLY_PROG)
> +
>  /* flags for BPF_PROG_QUERY */
>  #define BPF_F_QUERY_EFFECTIVE	(1U << 0)
>  
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 8f8f688f3e9b..969bc3d9f02c 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -139,6 +139,9 @@ struct bpf_program {
>  		enum {
>  			RELO_LD64,
>  			RELO_CALL,
> +			RELO_DATA,
> +			RELO_RODATA,
> +			RELO_BSS,
>  		} type;
>  		int insn_idx;
>  		union {
> @@ -174,7 +177,10 @@ struct bpf_program {
>  struct bpf_map {
>  	int fd;
>  	char *name;
> -	size_t offset;
> +	union {
> +		__u32 global_type;
> +		size_t offset;
> +	};
>  	int map_ifindex;
>  	int inner_map_fd;
>  	struct bpf_map_def def;
> @@ -194,6 +200,8 @@ struct bpf_object {
>  	size_t nr_programs;
>  	struct bpf_map *maps;
>  	size_t nr_maps;
> +	struct bpf_map *maps_global;
> +	size_t nr_maps_global;
>  
>  	bool loaded;
>  	bool has_pseudo_calls;
> @@ -209,6 +217,9 @@ struct bpf_object {
>  		Elf *elf;
>  		GElf_Ehdr ehdr;
>  		Elf_Data *symbols;
> +		Elf_Data *global_data;
> +		Elf_Data *global_rodata;
> +		Elf_Data *global_bss;
>  		size_t strtabidx;
>  		struct {
>  			GElf_Shdr shdr;
> @@ -217,6 +228,9 @@ struct bpf_object {
>  		int nr_reloc;
>  		int maps_shndx;
>  		int text_shndx;
> +		int data_shndx;
> +		int rodata_shndx;
> +		int bss_shndx;
>  	} efile;
>  	/*
>  	 * All loaded bpf_object is linked in a list, which is
> @@ -457,6 +471,9 @@ static struct bpf_object *bpf_object__new(const char *path,
>  	obj->efile.obj_buf = obj_buf;
>  	obj->efile.obj_buf_sz = obj_buf_sz;
>  	obj->efile.maps_shndx = -1;
> +	obj->efile.data_shndx = -1;
> +	obj->efile.rodata_shndx = -1;
> +	obj->efile.bss_shndx = -1;
>  
>  	obj->loaded = false;
>  
> @@ -475,6 +492,9 @@ static void bpf_object__elf_finish(struct bpf_object *obj)
>  		obj->efile.elf = NULL;
>  	}
>  	obj->efile.symbols = NULL;
> +	obj->efile.global_data = NULL;
> +	obj->efile.global_rodata = NULL;
> +	obj->efile.global_bss = NULL;
>  
>  	zfree(&obj->efile.reloc);
>  	obj->efile.nr_reloc = 0;
> @@ -757,6 +777,85 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
>  	return 0;
>  }
>  
> +static int
> +bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map);
> +
> +static int
> +bpf_object__init_global(struct bpf_object *obj, int i, int type,
> +			const char *name, Elf_Data *map_data)
> +{
> +	struct bpf_map *map = &obj->maps_global[i];
> +	struct bpf_map_def *def = &map->def;
> +	char *cp, errmsg[STRERR_BUFSIZE];
> +	int err, slot0 = 0;
> +
> +	def->type = BPF_MAP_TYPE_ARRAY;
> +	def->key_size = sizeof(int);
> +	def->value_size = map_data->d_size;
> +	def->max_entries = 1;
> +	def->map_flags = type == RELO_RODATA ? BPF_F_RDONLY_PROG : 0;
> +
> +	map->name = strdup(name);
> +	map->global_type = type;
> +	map->fd = bpf_object__create_map(obj, map);
> +	if (map->fd < 0) {
> +		err = map->fd;
> +		cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
> +		pr_warning("failed to create map (name: '%s'): %s\n",
> +			   map->name, cp);
> +		goto destroy;
> +	}
> +
> +	pr_debug("create map %s: fd=%d\n", map->name, map->fd);
> +
> +	if (type != RELO_BSS) {
> +		err = bpf_map_update_elem(map->fd, &slot0, map_data->d_buf, 0);
> +		if (err < 0) {
> +			cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
> +			pr_warning("failed to update map (name: '%s'): %s\n",
> +				   map->name, cp);
> +			goto destroy;
> +		}
> +
> +		pr_debug("updated map %s with elf data: fd=%d\n", map->name,
> +			 map->fd);
> +	}
> +	return 0;
> +destroy:
> +	for (i = 0; i < obj->nr_maps_global; i++)
> +		zclose(obj->maps_global[i].fd);
> +	return err;
> +}
> +
> +static int
> +bpf_object__init_global_maps(struct bpf_object *obj)
> +{
> +	int nr_maps_global = (obj->efile.data_shndx >= 0) +
> +			     (obj->efile.rodata_shndx >= 0) +
> +			     (obj->efile.bss_shndx >= 0), i, err = 0;
> +
> +	obj->maps_global = calloc(nr_maps_global, sizeof(obj->maps_global[0]));
> +	if (!obj->maps_global) {
> +		pr_warning("alloc maps for object failed\n");
> +		return -ENOMEM;
> +	}
> +
> +	obj->nr_maps_global = nr_maps_global;
> +	for (i = 0; i < obj->nr_maps_global; i++)
> +		obj->maps_global[i].fd = -1;
> +	i = 0;
> +	if (obj->efile.bss_shndx >= 0)
> +		err = bpf_object__init_global(obj, i++, RELO_BSS, ".bss",
> +					      obj->efile.global_bss);
> +	if (obj->efile.data_shndx >= 0 && !err)
> +		err = bpf_object__init_global(obj, i++, RELO_DATA, ".data",
> +					      obj->efile.global_data);
> +	if (obj->efile.rodata_shndx >= 0 && !err)
> +		err = bpf_object__init_global(obj, i++, RELO_RODATA, ".rodata",
> +					      obj->efile.global_rodata);
> +	return err;
> +}
> +
>  static bool section_have_execinstr(struct bpf_object *obj, int idx)
>  {
>  	Elf_Scn *scn;
> @@ -865,6 +964,12 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
>  					pr_warning("failed to alloc program %s (%s): %s",
>  						   name, obj->path, cp);
>  				}
> +			} else if (strcmp(name, ".data") == 0) {
> +				obj->efile.global_data = data;
> +				obj->efile.data_shndx = idx;
> +			} else if (strcmp(name, ".rodata") == 0) {
> +				obj->efile.global_rodata = data;
> +				obj->efile.rodata_shndx = idx;
>  			}
>  		} else if (sh.sh_type == SHT_REL) {
>  			void *reloc = obj->efile.reloc;
> @@ -892,6 +997,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
>  				obj->efile.reloc[n].shdr = sh;
>  				obj->efile.reloc[n].data = data;
>  			}
> +		} else if (sh.sh_type == SHT_NOBITS && strcmp(name, ".bss") == 0) {
> +			obj->efile.global_bss = data;
> +			obj->efile.bss_shndx = idx;
>  		} else {
>  			pr_debug("skip section(%d) %s\n", idx, name);
>  		}
> @@ -923,6 +1031,14 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
>  		if (err)
>  			goto out;
>  	}
> +	if (obj->efile.data_shndx >= 0 ||
> +	    obj->efile.rodata_shndx >= 0 ||
> +	    obj->efile.bss_shndx >= 0) {
> +		err = bpf_object__init_global_maps(obj);
> +		if (err)
> +			goto out;
> +	}
> +
>  	err = bpf_object__init_prog_names(obj);
>  out:
>  	return err;
> @@ -961,6 +1077,11 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>  	Elf_Data *symbols = obj->efile.symbols;
>  	int text_shndx = obj->efile.text_shndx;
>  	int maps_shndx = obj->efile.maps_shndx;
> +	int data_shndx = obj->efile.data_shndx;
> +	int rodata_shndx = obj->efile.rodata_shndx;
> +	int bss_shndx = obj->efile.bss_shndx;
> +	struct bpf_map *maps_global = obj->maps_global;
> +	size_t nr_maps_global = obj->nr_maps_global;
>  	struct bpf_map *maps = obj->maps;
>  	size_t nr_maps = obj->nr_maps;
>  	int i, nrels;
> @@ -999,8 +1120,10 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>  			 (long long) (rel.r_info >> 32),
>  			 (long long) sym.st_value, sym.st_name);
>  
> -		if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx) {
> -			pr_warning("Program '%s' contains non-map related relo data pointing to section %u\n",
> +		if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx &&
> +		    sym.st_shndx != data_shndx && sym.st_shndx != rodata_shndx &&
> +		    sym.st_shndx != bss_shndx) {
> +			pr_warning("Program '%s' contains unrecognized relo data pointing to section %u\n",
>  				   prog->section_name, sym.st_shndx);
>  			return -LIBBPF_ERRNO__RELOC;
>  		}
> @@ -1045,6 +1168,30 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>  			prog->reloc_desc[i].type = RELO_LD64;
>  			prog->reloc_desc[i].insn_idx = insn_idx;
>  			prog->reloc_desc[i].map_idx = map_idx;
> +		} else if (sym.st_shndx == data_shndx ||
> +			   sym.st_shndx == rodata_shndx ||
> +			   sym.st_shndx == bss_shndx) {
> +			int type = (sym.st_shndx == data_shndx)   ? RELO_DATA :
> +				   (sym.st_shndx == rodata_shndx) ? RELO_RODATA :
> +								    RELO_BSS;
> +
> +			for (map_idx = 0; map_idx < nr_maps_global; map_idx++) {
> +				if (maps_global[map_idx].global_type == type) {
> +					pr_debug("relocation: find map %zd (%s) for insn %u\n",
> +						 map_idx, maps_global[map_idx].name, insn_idx);
> +					break;
> +				}
> +			}
> +
> +			if (map_idx >= nr_maps_global) {
> +				pr_warning("bpf relocation: map_idx %d large than %d\n",
> +					   (int)map_idx, (int)nr_maps_global - 1);
> +				return -LIBBPF_ERRNO__RELOC;
> +			}
> +
> +			prog->reloc_desc[i].type = type;
> +			prog->reloc_desc[i].insn_idx = insn_idx;
> +			prog->reloc_desc[i].map_idx = map_idx;
>  		}
>  	}
>  	return 0;
> @@ -1176,15 +1323,58 @@ bpf_object__probe_caps(struct bpf_object *obj)
>  }
>  
>  static int
> -bpf_object__create_maps(struct bpf_object *obj)
> +bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map)
>  {
>  	struct bpf_create_map_attr create_attr = {};
> +	struct bpf_map_def *def = &map->def;
> +	char *cp, errmsg[STRERR_BUFSIZE];
> +	int fd;
> +
> +	if (obj->caps.name)
> +		create_attr.name = map->name;
> +	create_attr.map_ifindex = map->map_ifindex;
> +	create_attr.map_type = def->type;
> +	create_attr.map_flags = def->map_flags;
> +	create_attr.key_size = def->key_size;
> +	create_attr.value_size = def->value_size;
> +	create_attr.max_entries = def->max_entries;
> +	create_attr.btf_fd = 0;
> +	create_attr.btf_key_type_id = 0;
> +	create_attr.btf_value_type_id = 0;
> +	if (bpf_map_type__is_map_in_map(def->type) &&
> +	    map->inner_map_fd >= 0)
> +		create_attr.inner_map_fd = map->inner_map_fd;
> +	if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
> +		create_attr.btf_fd = btf__fd(obj->btf);
> +		create_attr.btf_key_type_id = map->btf_key_type_id;
> +		create_attr.btf_value_type_id = map->btf_value_type_id;
> +	}
> +
> +	fd = bpf_create_map_xattr(&create_attr);
> +	if (fd < 0 && create_attr.btf_key_type_id) {
> +		cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
> +		pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
> +			   map->name, cp, errno);
> +
> +		create_attr.btf_fd = 0;
> +		create_attr.btf_key_type_id = 0;
> +		create_attr.btf_value_type_id = 0;
> +		map->btf_key_type_id = 0;
> +		map->btf_value_type_id = 0;
> +		fd = bpf_create_map_xattr(&create_attr);
> +	}
> +
> +	return fd;
> +}
> +
> +static int
> +bpf_object__create_maps(struct bpf_object *obj)
> +{
>  	unsigned int i;
>  	int err;
>  
>  	for (i = 0; i < obj->nr_maps; i++) {
>  		struct bpf_map *map = &obj->maps[i];
> -		struct bpf_map_def *def = &map->def;
>  		char *cp, errmsg[STRERR_BUFSIZE];
>  		int *pfd = &map->fd;
>  
> @@ -1193,41 +1383,7 @@ bpf_object__create_maps(struct bpf_object *obj)
>  				 map->name, map->fd);
>  			continue;
>  		}
> -
> -		if (obj->caps.name)
> -			create_attr.name = map->name;
> -		create_attr.map_ifindex = map->map_ifindex;
> -		create_attr.map_type = def->type;
> -		create_attr.map_flags = def->map_flags;
> -		create_attr.key_size = def->key_size;
> -		create_attr.value_size = def->value_size;
> -		create_attr.max_entries = def->max_entries;
> -		create_attr.btf_fd = 0;
> -		create_attr.btf_key_type_id = 0;
> -		create_attr.btf_value_type_id = 0;
> -		if (bpf_map_type__is_map_in_map(def->type) &&
> -		    map->inner_map_fd >= 0)
> -			create_attr.inner_map_fd = map->inner_map_fd;
> -
> -		if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
> -			create_attr.btf_fd = btf__fd(obj->btf);
> -			create_attr.btf_key_type_id = map->btf_key_type_id;
> -			create_attr.btf_value_type_id = map->btf_value_type_id;
> -		}
> -
> -		*pfd = bpf_create_map_xattr(&create_attr);
> -		if (*pfd < 0 && create_attr.btf_key_type_id) {
> -			cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
> -			pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
> -				   map->name, cp, errno);
> -			create_attr.btf_fd = 0;
> -			create_attr.btf_key_type_id = 0;
> -			create_attr.btf_value_type_id = 0;
> -			map->btf_key_type_id = 0;
> -			map->btf_value_type_id = 0;
> -			*pfd = bpf_create_map_xattr(&create_attr);
> -		}
> -
> +		*pfd = bpf_object__create_map(obj, map);
>  		if (*pfd < 0) {
>  			size_t j;
>  
> @@ -1412,6 +1568,24 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
>  						      &prog->reloc_desc[i]);
>  			if (err)
>  				return err;
> +		} else if (prog->reloc_desc[i].type == RELO_DATA ||
> +			   prog->reloc_desc[i].type == RELO_RODATA ||
> +			   prog->reloc_desc[i].type == RELO_BSS) {
> +			struct bpf_insn *insns = prog->insns;
> +			int insn_idx, map_idx, data_off;
> +
> +			insn_idx = prog->reloc_desc[i].insn_idx;
> +			map_idx  = prog->reloc_desc[i].map_idx;
> +			data_off = insns[insn_idx].imm;
> +
> +			if (insn_idx + 1 >= (int)prog->insns_cnt) {
> +				pr_warning("relocation out of range: '%s'\n",
> +					   prog->section_name);
> +				return -LIBBPF_ERRNO__RELOC;
> +			}
> +			insns[insn_idx].src_reg = BPF_PSEUDO_MAP_VALUE;
> +			insns[insn_idx].imm = obj->maps_global[map_idx].fd;
> +			insns[insn_idx + 1].imm = data_off;
>  		}
>  	}
>  
> @@ -1717,6 +1891,7 @@ __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz,
>  
>  	CHECK_ERR(bpf_object__elf_init(obj), err, out);
>  	CHECK_ERR(bpf_object__check_endianness(obj), err, out);
> +	CHECK_ERR(bpf_object__probe_caps(obj), err, out);
>  	CHECK_ERR(bpf_object__elf_collect(obj, flags), err, out);
>  	CHECK_ERR(bpf_object__collect_reloc(obj), err, out);
>  	CHECK_ERR(bpf_object__validate(obj, needs_kver), err, out);
> @@ -1789,7 +1964,8 @@ int bpf_object__unload(struct bpf_object *obj)
>  
>  	for (i = 0; i < obj->nr_maps; i++)
>  		zclose(obj->maps[i].fd);
> -
> +	for (i = 0; i < obj->nr_maps_global; i++)
> +		zclose(obj->maps_global[i].fd);
>  	for (i = 0; i < obj->nr_programs; i++)
>  		bpf_program__unload(&obj->programs[i]);
>  
> @@ -1810,7 +1986,6 @@ int bpf_object__load(struct bpf_object *obj)
>  
>  	obj->loaded = true;
>  
> -	CHECK_ERR(bpf_object__probe_caps(obj), err, out);
>  	CHECK_ERR(bpf_object__create_maps(obj), err, out);
>  	CHECK_ERR(bpf_object__relocate(obj), err, out);
>  	CHECK_ERR(bpf_object__load_progs(obj), err, out);
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections
  2019-02-28 23:41   ` Stanislav Fomichev
@ 2019-03-01  0:19     ` Daniel Borkmann
  2019-03-02  0:23       ` Yonghong Song
  0 siblings, 1 reply; 46+ messages in thread
From: Daniel Borkmann @ 2019-03-01  0:19 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: ast, bpf, netdev, joe, john.fastabend, tgraf, yhs, andriin,
	jakub.kicinski, lmb

On 03/01/2019 12:41 AM, Stanislav Fomichev wrote:
> On 03/01, Daniel Borkmann wrote:
>> This work adds BPF loader support for global data sections
>> to libbpf. This allows writing BPF programs in a more natural
>> C-like way by being able to define global variables and const
>> data.
>>
>> Back at LPC 2018 [0] we presented a first prototype which
>> implemented support for global data sections by extending BPF
>> syscall where union bpf_attr would get additional memory/size
>> pair for each section passed during prog load in order to later
>> add this base address into the ldimm64 instruction along with
>> the user provided offset when accessing a variable. Consensus
>> from LPC was that for proper upstream support, it would be
>> more desirable to use maps instead of bpf_attr extension as
>> this would allow for introspection of these sections as well
>> as potential live updates of their content. This work follows
>> this path by taking the following steps from loader side:
>>
>>  1) In bpf_object__elf_collect() step we pick up ".data",
>>     ".rodata", and ".bss" section information.
>>
>>  2) If present, in bpf_object__init_global_maps() we create
>>     a map that corresponds to each of the present sections.
>>     Given section size and access properties can differ, a
>>     single entry array map is created with value size that
>>     is corresponding to the ELF section size of .data, .bss
>>     or .rodata. In the latter case, the map is created as
>>     read-only from program side such that verifier rejects
>>     any write attempts into .rodata. In a subsequent step,
>>     for .data and .rodata sections, the section content is
>>     copied into the map through bpf_map_update_elem(). For
>>     .bss this is not necessary since array map is already
>>     zero-initialized by default.
>>
>>  3) In bpf_program__collect_reloc() step, we record the
>>     corresponding map, insn index, and relocation type for
>>     the global data.
>>
>>  4) And last but not least in the actual relocation step in
>>     bpf_program__relocate(), we mark the ldimm64 instruction
>>     with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
>>     imm field the map's file descriptor is stored as similarly
>>     done as in BPF_PSEUDO_MAP_FD, and in the second imm field
>>     (as ldimm64 is 2-insn wide) we store the access offset
>>     into the section.
>>
>>  5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
>>     load will then store the actual target address in order
>>     to have a 'map-lookup'-free access. That is, the actual
>>     map value base address + offset. The destination register
>>     in the verifier will then be marked as PTR_TO_MAP_VALUE,
>>     containing the fixed offset as reg->off and backing BPF
>>     map as reg->map_ptr. Meaning, it's treated as any other
>>     normal map value from verification side, only with
>>     efficient, direct value access instead of actual call to
>>     map lookup helper as in the typical case.
>>
>> Simple example dump of program using globals vars in each
>> section:
>>
>>   # readelf -a test_global_data.o
>>   [...]
>>   [ 6] .bss              NOBITS           0000000000000000  00000328
>>        0000000000000010  0000000000000000  WA       0     0     8
>>   [ 7] .data             PROGBITS         0000000000000000  00000328
>>        0000000000000010  0000000000000000  WA       0     0     8
>>   [ 8] .rodata           PROGBITS         0000000000000000  00000338
>>        0000000000000018  0000000000000000   A       0     0     8
>>   [...]
>>     95: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    6 static_bss
>>     96: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    6 static_bss2
>>     97: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    7 static_data
>>     98: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    7 static_data2
>>     99: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    8 static_rodata
>>    100: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    8 static_rodata2
>>    101: 0000000000000010     8 OBJECT  LOCAL  DEFAULT    8 static_rodata3
>>   [...]
>>
>>   # bpftool prog
>>   103: sched_cls  name load_static_dat  tag 37a8b6822fc39a29  gpl
>>        loaded_at 2019-02-28T02:02:35+0000  uid 0
>>        xlated 712B  jited 426B  memlock 4096B  map_ids 63,64,65,66
>>   # bpftool map show id 63
>>   63: array  name .bss  flags 0x0                      <-- .bss area, rw
> Can we use <main prog>.bss/data/rodata names? If we load more than one
> prog with global data that should make it easier to find which one is which.

Yeah that's fine, we can change it. They could potentially also be shared,
so <main prog>.bss/data/rodata might be misleading, but <obj>.bss/data/rodata
could work.
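
A minimal sketch of such <obj>-prefixed naming (the helper and the
8-char truncation policy are assumptions; map names are capped at
BPF_OBJ_NAME_LEN, i.e. 16 bytes including the terminating NUL):

  #include <stdio.h>
  #include <linux/bpf.h>  /* BPF_OBJ_NAME_LEN */

  /* Hypothetical helper: prefix the section name with a truncated
   * object name, e.g. "test_glo.rodata" for test_global_data.o,
   * which still fits within BPF_OBJ_NAME_LEN.
   */
  static void global_map_name(char *buf, size_t len,
                              const char *obj_name, const char *section)
  {
          snprintf(buf, len, "%.8s%s", obj_name, section);
  }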

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 1/7] bpf: implement lookup-free direct value access
  2019-02-28 23:18 ` [PATCH bpf-next v2 1/7] bpf: implement lookup-free direct value access Daniel Borkmann
@ 2019-03-01  3:33   ` Jann Horn
  2019-03-01  3:58   ` kbuild test robot
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 46+ messages in thread
From: Jann Horn @ 2019-03-01  3:33 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, bpf, Network Development, joe,
	john.fastabend, tgraf, yhs, andriin, jakub.kicinski, lmb

On Fri, Mar 1, 2019 at 12:19 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> This generic extension to BPF maps allows for directly loading an
> address residing inside a BPF map value as a single BPF ldimm64
> instruction.
[...]
> @@ -6698,16 +6705,44 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
>                                 return err;
>                         }
>
> -                       /* store map pointer inside BPF_LD_IMM64 instruction */
> -                       insn[0].imm = (u32) (unsigned long) map;
> -                       insn[1].imm = ((u64) (unsigned long) map) >> 32;
> +                       aux = &env->insn_aux_data[i];
> +                       if (insn->src_reg == BPF_PSEUDO_MAP_FD) {
> +                               addr = (unsigned long)map;
> +                       } else {
> +                               u32 off = insn[1].imm;
> +
> +                               if (off >= BPF_MAX_VAR_OFF) {
> +                                       verbose(env, "direct value offset of %u is not allowed\n",
> +                                               off);
> +                                       return -EINVAL;
> +                               }
> +                               if (!map->ops->map_direct_value_access) {
> +                                       verbose(env, "no direct value access support for this map type\n");
> +                                       return -EINVAL;
> +                               }
> +
> +                               err = map->ops->map_direct_value_access(map, off, &addr);
> +                               if (err) {
> +                                       verbose(env, "invalid access to map value pointer, value_size=%u off=%u\n",
> +                                               map->value_size, off);
> +                                       return err;
> +                               }

All these error returns need fdput(f), I think.
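
For reference, a minimal sketch of that fix on one of the quoted error
paths (the surrounding function context and the struct fd f taken via
fdget() are assumed from the quoted hunk):

          if (!map->ops->map_direct_value_access) {
                  verbose(env, "no direct value access support for this map type\n");
                  fdput(f);  /* drop the reference taken by fdget() */
                  return -EINVAL;
          }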

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 2/7] bpf: add program side {rd,wr}only support
  2019-02-28 23:18 ` [PATCH bpf-next v2 2/7] bpf: add program side {rd,wr}only support Daniel Borkmann
@ 2019-03-01  3:51   ` Jakub Kicinski
  2019-03-01  9:01     ` Daniel Borkmann
  0 siblings, 1 reply; 46+ messages in thread
From: Jakub Kicinski @ 2019-03-01  3:51 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: ast, bpf, netdev, joe, john.fastabend, tgraf, yhs, andriin, lmb

On Fri,  1 Mar 2019 00:18:24 +0100, Daniel Borkmann wrote:
> This work adds two new map creation flags BPF_F_RDONLY_PROG
> and BPF_F_WRONLY_PROG in order to allow for read-only or
> write-only BPF maps from a BPF program side.
> 
> Today we have BPF_F_RDONLY and BPF_F_WRONLY, but this only
> applies to system call side, meaning the BPF program has full
> read/write access to the map as usual while bpf(2) calls with
> map fd can either only read or write into the map depending
> on the flags. BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG allows
> for the exact opposite such that verifier is going to reject
> program loads if write into a read-only map or a read into a
> write-only map is detected.
> 
> We've enabled this generic map extension to various non-special
> maps holding normal user data: array, hash, lru, lpm, local
> storage, queue and stack. Further map types could be followed
> up in future depending on use-case. Main use case here is to
> forbid writes into .rodata map values from verifier side.

This will also enable optimizing the accesses on systems with a rich
memory architecture :)

> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  include/linux/bpf.h           | 18 ++++++++++++++++++
>  include/uapi/linux/bpf.h      | 10 +++++++++-
>  kernel/bpf/arraymap.c         |  2 +-
>  kernel/bpf/hashtab.c          |  2 +-
>  kernel/bpf/local_storage.c    |  2 +-
>  kernel/bpf/lpm_trie.c         |  2 +-
>  kernel/bpf/queue_stack_maps.c |  3 +--
>  kernel/bpf/verifier.c         | 30 +++++++++++++++++++++++++++++-
>  8 files changed, 61 insertions(+), 8 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index bdcc6e2a9977..3f74194dd4f6 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -427,6 +427,24 @@ struct bpf_array {
>  	};
>  };
>  
> +#define BPF_MAP_CAN_READ	BIT(0)
> +#define BPF_MAP_CAN_WRITE	BIT(1)
> +
> +static inline u32 bpf_map_flags_to_cap(struct bpf_map *map)
> +{
> +	u32 access_flags = map->map_flags & (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG);
> +
> +	/* Combination of BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG is
> +	 * not possible.
> +	 */

minor nit: we do check that old RDONLY and WRONLY are not set at the
           same time, but here it's not done?
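
A sketch of such a check (where to enforce it, e.g. in the generic
map_create() path, is an assumption):

          /* BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG together would make
           * the map unusable from the program side, so reject it up front:
           */
          if ((attr->map_flags & BPF_F_RDONLY_PROG) &&
              (attr->map_flags & BPF_F_WRONLY_PROG))
                  return -EINVAL;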

> +	if (access_flags & BPF_F_RDONLY_PROG)
> +		return BPF_MAP_CAN_READ;
> +	else if (access_flags & BPF_F_WRONLY_PROG)
> +		return BPF_MAP_CAN_WRITE;
> +	else
> +		return BPF_MAP_CAN_READ | BPF_MAP_CAN_WRITE;
> +}

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 1/7] bpf: implement lookup-free direct value access
  2019-02-28 23:18 ` [PATCH bpf-next v2 1/7] bpf: implement lookup-free direct value access Daniel Borkmann
  2019-03-01  3:33   ` Jann Horn
@ 2019-03-01  3:58   ` kbuild test robot
  2019-03-01  5:46   ` Andrii Nakryiko
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 46+ messages in thread
From: kbuild test robot @ 2019-03-01  3:58 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: kbuild-all, ast, bpf, netdev, joe, john.fastabend, tgraf, yhs,
	andriin, jakub.kicinski, lmb, Daniel Borkmann

[-- Attachment #1: Type: text/plain, Size: 1568 bytes --]

Hi Daniel,

I love your patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Daniel-Borkmann/BPF-support-for-global-data/20190301-112203
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: x86_64-randconfig-x017-201908 (attached as .config)
compiler: gcc-8 (Debian 8.2.0-21) 8.2.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   kernel//bpf/arraymap.c: In function 'array_map_direct_value_offset':
>> kernel//bpf/arraymap.c:182:24: warning: initialization of 'long unsigned int' from 'char *' makes integer from pointer without a cast [-Wint-conversion]
     unsigned long base  = array->value;
                           ^~~~~

vim +182 kernel//bpf/arraymap.c

   176	
   177	static int array_map_direct_value_offset(const struct bpf_map *map, u64 imm,
   178						 u32 *off)
   179	{
   180		struct bpf_array *array = container_of(map, struct bpf_array, map);
   181		unsigned long range = map->value_size;
 > 182		unsigned long base  = array->value;
   183		unsigned long addr  = imm;
   184	
   185		if (map->max_entries != 1)
   186			return -ENOENT;
   187		if (addr < base || addr >= base + range)
   188			return -ENOENT;
   189	
   190		*off = addr - base;
   191		return 0;
   192	}
   193	
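
The warning flags an implicit char * to unsigned long conversion; a
minimal fix (a sketch) would be an explicit cast on the initializer:

  unsigned long base  = (unsigned long)array->value;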

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 30095 bytes --]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 1/7] bpf: implement lookup-free direct value access
  2019-02-28 23:18 ` [PATCH bpf-next v2 1/7] bpf: implement lookup-free direct value access Daniel Borkmann
  2019-03-01  3:33   ` Jann Horn
  2019-03-01  3:58   ` kbuild test robot
@ 2019-03-01  5:46   ` Andrii Nakryiko
  2019-03-01  9:49     ` Daniel Borkmann
  2019-03-01 17:18   ` Yonghong Song
  2019-03-04  6:03   ` Andrii Nakryiko
  4 siblings, 1 reply; 46+ messages in thread
From: Andrii Nakryiko @ 2019-03-01  5:46 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, bpf, Networking, joe, john.fastabend, tgraf,
	Yonghong Song, Andrii Nakryiko, Jakub Kicinski, lmb

On Thu, Feb 28, 2019 at 3:31 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> This generic extension to BPF maps allows for directly loading an
> address residing inside a BPF map value as a single BPF ldimm64
> instruction.

This is great! I'm going to review the code more thoroughly tomorrow,
but I also have a few questions/suggestions I'd like to discuss, if
you don't mind.

>
> The idea is similar to what BPF_PSEUDO_MAP_FD does today, which
> is a special src_reg flag for the ldimm64 instruction indicating
> that the first part of the double insn's imm field holds a file
> descriptor which the verifier then replaces with the full 64-bit
> address of the map across both imm parts.
>
> For the newly added BPF_PSEUDO_MAP_VALUE src_reg flag, the idea
> is similar: the first part of the double insn's imm field is
> again a file descriptor corresponding to the map, and the second
> part of the imm field is an offset. The verifier will then replace
> both imm parts with an address that points into the BPF map value
> for maps that support this operation. BPF_PSEUDO_MAP_VALUE is a
> distinct flag as otherwise, with BPF_PSEUDO_MAP_FD alone, we could
> not distinguish a load of the map pointer from a load of the map's
> value at offset 0.

Is having both BPF_PSEUDO_MAP_FD and BPF_PSEUDO_MAP_VALUE a desirable
thing? I'm asking because it seems like it would be really simple to
stick to using just BPF_PSEUDO_MAP_FD and then interpret imm
differently depending on whether it's 0 or not. E.g., we can say that
imm=0 is the old BPF_PSEUDO_MAP_FD behavior (loading the map addr),
but any other imm value X is really just an (X-1) offset into the
map's value? Or, given that a valid offset is limited to 1<<29, we
can set the highest-order bit to 1 and let the lower bits be the
offset? In other words, if we just need to carve out zero as a
special case, then it's easy to do and we can avoid adding a new
BPF_PSEUDO_MAP_VALUE.
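
Just to illustrate (this is not what the patch does), the verifier
side of that single-flag scheme could look roughly like this, with the
fd staying in insn[0].imm and the encoded offset in insn[1].imm:

	u32 imm = insn[1].imm;

	if (!imm) {
		/* legacy BPF_PSEUDO_MAP_FD: load of the map pointer */
		regs[insn->dst_reg].type = CONST_PTR_TO_MAP;
	} else {
		/* biased encoding: imm - 1 is the offset into the value */
		regs[insn->dst_reg].type = PTR_TO_MAP_VALUE;
		regs[insn->dst_reg].off = imm - 1;
	}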

>
> This allows for efficiently retrieving an address into a map value
> memory area without having to issue a helper call, which would need
> to prepare registers according to the calling convention, without
> needing the extra NULL test, and without having to add the offset
> in an additional instruction to the value base pointer.

It seems like we allow this only for arrays of size 1 right now. We
could easily generalize this to support not just an offset into the
map's value, but also an integer key (i.e., array index) by utilizing
the off fields (16-bit + 16-bit). This would make it possible to
eliminate any bpf_map_update_elem calls to array maps altogether by
providing both the array index and the offset into the value in one
BPF instruction. Do you think it's a good addition?
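
A rough sketch of the decoding under that extension (hypothetical
layout, not part of this patch):

	/* ldimm64 spans two insns; each carries a 16-bit off field */
	u32 index = ((u32)(u16)insn[0].off << 16) | (u16)insn[1].off;
	u32 off   = insn[1].imm;  /* offset into value, as in this patch */

	/* subject to index < map->max_entries and off < map->value_size */
	addr = (unsigned long)array->value +
	       (u64)array->elem_size * index + off;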

>
> The verifier then treats the destination register as PTR_TO_MAP_VALUE
> with constant reg->off from the user passed offset from the second
> imm field, and guarantees that this is within bounds of the map
> value. Any subsequent operations are normally treated as typical
> map value handling without anything else needed for verification.
>
> The two map operations for direct value access have been added to
> the array map for now. In the future other types could be supported
> as well, depending on the use case. The main use case for this commit
> is to allow for BPF loader support for global variables that
> reside in .data/.rodata/.bss sections such that we can directly
> load the address of them with minimal additional infrastructure
> required. Loader support has been added in subsequent commits for
> the libbpf library.

I was considering adding a new kind of map representing a contiguous
block of memory (e.g., how about BPF_MAP_TYPE_HEAP or
BPF_MAP_TYPE_BLOB?). Its keys would be offsets into that memory
region. The value size would be the size of the whole memory region,
but it would allow reading smaller chunks of memory as values. This
would provide a convenient interface for poking at global variables
from userland, given an offset.

Libbpf itself would provide a higher-level API as well, if there is
corresponding BTF type information describing the layout of
.data/.bss/.rodata, so that applications can fetch variables by name
and/or offset, whichever is more convenient. Together with
bpf_spinlock this would allow an easy way to customize subsets of
global variables in an atomic fashion.

Do you think that would work? Using an array is a bit limiting,
because it doesn't allow partial reads/updates, while
BPF_MAP_TYPE_HEAP would be a single big value that allows partial
reading/updating.
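
From userland, that proposal would allow partial access along these
lines (BPF_MAP_TYPE_HEAP and its offset-keyed semantics are
hypothetical here):

	__u32 off = 16;  /* byte offset of one variable in the region */
	__u64 val;

	/* key is an offset, value is the chunk stored at that offset */
	bpf_map_lookup_elem(heap_fd, &off, &val);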


>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  include/linux/bpf.h               |  6 +++
>  include/linux/bpf_verifier.h      |  4 ++
>  include/uapi/linux/bpf.h          |  6 ++-
>  kernel/bpf/arraymap.c             | 33 ++++++++++++++
>  kernel/bpf/core.c                 |  3 +-
>  kernel/bpf/disasm.c               |  5 ++-
>  kernel/bpf/syscall.c              | 29 +++++++++---
>  kernel/bpf/verifier.c             | 73 +++++++++++++++++++++++--------
>  tools/bpf/bpftool/xlated_dumper.c |  3 ++
>  tools/include/uapi/linux/bpf.h    |  6 ++-
>  10 files changed, 138 insertions(+), 30 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index a2132e09dc1c..bdcc6e2a9977 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -57,6 +57,12 @@ struct bpf_map_ops {
>                              const struct btf *btf,
>                              const struct btf_type *key_type,
>                              const struct btf_type *value_type);
> +
> +       /* Direct value access helpers. */
> +       int (*map_direct_value_access)(const struct bpf_map *map,
> +                                      u32 off, u64 *imm);
> +       int (*map_direct_value_offset)(const struct bpf_map *map,
> +                                      u64 imm, u32 *off);
>  };
>
>  struct bpf_map {
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 69f7a3449eda..6e28f1c24710 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -183,6 +183,10 @@ struct bpf_insn_aux_data {
>                 unsigned long map_state;        /* pointer/poison value for maps */
>                 s32 call_imm;                   /* saved imm field of call insn */
>                 u32 alu_limit;                  /* limit for add/sub register with pointer */
> +               struct {
> +                       u32 map_index;          /* index into used_maps[] */
> +                       u32 map_off;            /* offset from value base address */
> +               };
>         };
>         int ctx_field_size; /* the ctx field size for load insn, maybe 0 */
>         int sanitize_stack_off; /* stack slot to be cleared */
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 2e308e90ffea..8884072e1a46 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -255,8 +255,12 @@ enum bpf_attach_type {
>   */
>  #define BPF_F_ANY_ALIGNMENT    (1U << 1)
>
> -/* when bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_FD, bpf_ldimm64->imm == fd */
> +/* When bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_{FD,VALUE}, then
> + * bpf_ldimm64's insn[0]->imm == fd in both cases. Additionally,
> + * for BPF_PSEUDO_MAP_VALUE, insn[1]->imm == offset into value.
> + */
>  #define BPF_PSEUDO_MAP_FD      1
> +#define BPF_PSEUDO_MAP_VALUE   2
>
>  /* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
>   * offset to another bpf function
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index c72e0d8e1e65..3e5969c0c979 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -160,6 +160,37 @@ static void *array_map_lookup_elem(struct bpf_map *map, void *key)
>         return array->value + array->elem_size * (index & array->index_mask);
>  }
>
> +static int array_map_direct_value_access(const struct bpf_map *map, u32 off,
> +                                        u64 *imm)
> +{
> +       struct bpf_array *array = container_of(map, struct bpf_array, map);
> +
> +       if (map->max_entries != 1)
> +               return -ENOTSUPP;
> +       if (off >= map->value_size)
> +               return -EINVAL;
> +
> +       *imm = (unsigned long)array->value;
> +       return 0;
> +}
> +
> +static int array_map_direct_value_offset(const struct bpf_map *map, u64 imm,
> +                                        u32 *off)
> +{
> +       struct bpf_array *array = container_of(map, struct bpf_array, map);
> +       unsigned long range = map->value_size;
> +       unsigned long base  = array->value;
> +       unsigned long addr  = imm;
> +
> +       if (map->max_entries != 1)
> +               return -ENOENT;
> +       if (addr < base || addr >= base + range)
> +               return -ENOENT;
> +
> +       *off = addr - base;
> +       return 0;
> +}
> +
>  /* emit BPF instructions equivalent to C code of array_map_lookup_elem() */
>  static u32 array_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
>  {
> @@ -419,6 +450,8 @@ const struct bpf_map_ops array_map_ops = {
>         .map_update_elem = array_map_update_elem,
>         .map_delete_elem = array_map_delete_elem,
>         .map_gen_lookup = array_map_gen_lookup,
> +       .map_direct_value_access = array_map_direct_value_access,
> +       .map_direct_value_offset = array_map_direct_value_offset,
>         .map_seq_show_elem = array_map_seq_show_elem,
>         .map_check_btf = array_map_check_btf,
>  };
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 1c14c347f3cf..49fc0ff14537 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -286,7 +286,8 @@ int bpf_prog_calc_tag(struct bpf_prog *fp)
>                 dst[i] = fp->insnsi[i];
>                 if (!was_ld_map &&
>                     dst[i].code == (BPF_LD | BPF_IMM | BPF_DW) &&
> -                   dst[i].src_reg == BPF_PSEUDO_MAP_FD) {
> +                   (dst[i].src_reg == BPF_PSEUDO_MAP_FD ||
> +                    dst[i].src_reg == BPF_PSEUDO_MAP_VALUE)) {
>                         was_ld_map = true;
>                         dst[i].imm = 0;
>                 } else if (was_ld_map &&
> diff --git a/kernel/bpf/disasm.c b/kernel/bpf/disasm.c
> index de73f55e42fd..d9ce383c0f9c 100644
> --- a/kernel/bpf/disasm.c
> +++ b/kernel/bpf/disasm.c
> @@ -205,10 +205,11 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
>                          * part of the ldimm64 insn is accessible.
>                          */
>                         u64 imm = ((u64)(insn + 1)->imm << 32) | (u32)insn->imm;
> -                       bool map_ptr = insn->src_reg == BPF_PSEUDO_MAP_FD;
> +                       bool is_ptr = insn->src_reg == BPF_PSEUDO_MAP_FD ||
> +                                     insn->src_reg == BPF_PSEUDO_MAP_VALUE;
>                         char tmp[64];
>
> -                       if (map_ptr && !allow_ptr_leaks)
> +                       if (is_ptr && !allow_ptr_leaks)
>                                 imm = 0;
>
>                         verbose(cbs->private_data, "(%02x) r%d = %s\n",
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 174581dfe225..d3ef45e01d7a 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2061,13 +2061,27 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
>  }
>
>  static const struct bpf_map *bpf_map_from_imm(const struct bpf_prog *prog,
> -                                             unsigned long addr)
> +                                             unsigned long addr, u32 *off,
> +                                             u32 *type)
>  {
> +       const struct bpf_map *map;
>         int i;
>
> -       for (i = 0; i < prog->aux->used_map_cnt; i++)
> -               if (prog->aux->used_maps[i] == (void *)addr)
> -                       return prog->aux->used_maps[i];
> +       *off = *type = 0;
> +       for (i = 0; i < prog->aux->used_map_cnt; i++) {
> +               map = prog->aux->used_maps[i];
> +               if (map == (void *)addr) {
> +                       *type = BPF_PSEUDO_MAP_FD;
> +                       return map;
> +               }
> +               if (!map->ops->map_direct_value_offset)
> +                       continue;
> +               if (!map->ops->map_direct_value_offset(map, addr, off)) {
> +                       *type = BPF_PSEUDO_MAP_VALUE;
> +                       return map;
> +               }
> +       }
> +
>         return NULL;
>  }
>
> @@ -2075,6 +2089,7 @@ static struct bpf_insn *bpf_insn_prepare_dump(const struct bpf_prog *prog)
>  {
>         const struct bpf_map *map;
>         struct bpf_insn *insns;
> +       u32 off, type;
>         u64 imm;
>         int i;
>
> @@ -2102,11 +2117,11 @@ static struct bpf_insn *bpf_insn_prepare_dump(const struct bpf_prog *prog)
>                         continue;
>
>                 imm = ((u64)insns[i + 1].imm << 32) | (u32)insns[i].imm;
> -               map = bpf_map_from_imm(prog, imm);
> +               map = bpf_map_from_imm(prog, imm, &off, &type);
>                 if (map) {
> -                       insns[i].src_reg = BPF_PSEUDO_MAP_FD;
> +                       insns[i].src_reg = type;
>                         insns[i].imm = map->id;
> -                       insns[i + 1].imm = 0;
> +                       insns[i + 1].imm = off;
>                         continue;
>                 }
>         }
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 0e4edd7e3c5f..3ad05dda6e9d 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -4944,18 +4944,12 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
>         return 0;
>  }
>
> -/* return the map pointer stored inside BPF_LD_IMM64 instruction */
> -static struct bpf_map *ld_imm64_to_map_ptr(struct bpf_insn *insn)
> -{
> -       u64 imm64 = ((u64) (u32) insn[0].imm) | ((u64) (u32) insn[1].imm) << 32;
> -
> -       return (struct bpf_map *) (unsigned long) imm64;
> -}
> -
>  /* verify BPF_LD_IMM64 instruction */
>  static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
>  {
> +       struct bpf_insn_aux_data *aux = cur_aux(env);
>         struct bpf_reg_state *regs = cur_regs(env);
> +       struct bpf_map *map;
>         int err;
>
>         if (BPF_SIZE(insn->code) != BPF_DW) {
> @@ -4979,11 +4973,22 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
>                 return 0;
>         }
>
> -       /* replace_map_fd_with_map_ptr() should have caught bad ld_imm64 */
> -       BUG_ON(insn->src_reg != BPF_PSEUDO_MAP_FD);
> +       map = env->used_maps[aux->map_index];
> +       mark_reg_known_zero(env, regs, insn->dst_reg);
> +       regs[insn->dst_reg].map_ptr = map;
> +
> +       if (insn->src_reg == BPF_PSEUDO_MAP_VALUE) {
> +               regs[insn->dst_reg].type = PTR_TO_MAP_VALUE;
> +               regs[insn->dst_reg].off = aux->map_off;
> +               if (map_value_has_spin_lock(map))
> +                       regs[insn->dst_reg].id = ++env->id_gen;
> +       } else if (insn->src_reg == BPF_PSEUDO_MAP_FD) {
> +               regs[insn->dst_reg].type = CONST_PTR_TO_MAP;
> +       } else {
> +               verbose(env, "bpf verifier is misconfigured\n");
> +               return -EINVAL;
> +       }
>
> -       regs[insn->dst_reg].type = CONST_PTR_TO_MAP;
> -       regs[insn->dst_reg].map_ptr = ld_imm64_to_map_ptr(insn);
>         return 0;
>  }
>
> @@ -6664,8 +6669,10 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
>                 }
>
>                 if (insn[0].code == (BPF_LD | BPF_IMM | BPF_DW)) {
> +                       struct bpf_insn_aux_data *aux;
>                         struct bpf_map *map;
>                         struct fd f;
> +                       u64 addr;
>
>                         if (i == insn_cnt - 1 || insn[1].code != 0 ||
>                             insn[1].dst_reg != 0 || insn[1].src_reg != 0 ||
> @@ -6677,8 +6684,8 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
>                         if (insn->src_reg == 0)
>                                 /* valid generic load 64-bit imm */
>                                 goto next_insn;
> -
> -                       if (insn->src_reg != BPF_PSEUDO_MAP_FD) {
> +                       if (insn->src_reg != BPF_PSEUDO_MAP_FD &&
> +                           insn->src_reg != BPF_PSEUDO_MAP_VALUE) {
>                                 verbose(env,
>                                         "unrecognized bpf_ld_imm64 insn\n");
>                                 return -EINVAL;
> @@ -6698,16 +6705,44 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
>                                 return err;
>                         }
>
> -                       /* store map pointer inside BPF_LD_IMM64 instruction */
> -                       insn[0].imm = (u32) (unsigned long) map;
> -                       insn[1].imm = ((u64) (unsigned long) map) >> 32;
> +                       aux = &env->insn_aux_data[i];
> +                       if (insn->src_reg == BPF_PSEUDO_MAP_FD) {
> +                               addr = (unsigned long)map;
> +                       } else {
> +                               u32 off = insn[1].imm;
> +
> +                               if (off >= BPF_MAX_VAR_OFF) {
> +                                       verbose(env, "direct value offset of %u is not allowed\n",
> +                                               off);
> +                                       return -EINVAL;
> +                               }
> +                               if (!map->ops->map_direct_value_access) {
> +                                       verbose(env, "no direct value access support for this map type\n");
> +                                       return -EINVAL;
> +                               }
> +
> +                               err = map->ops->map_direct_value_access(map, off, &addr);
> +                               if (err) {
> +                                       verbose(env, "invalid access to map value pointer, value_size=%u off=%u\n",
> +                                               map->value_size, off);
> +                                       return err;
> +                               }
> +
> +                               aux->map_off = off;
> +                               addr += off;
> +                       }
> +
> +                       insn[0].imm = (u32)addr;
> +                       insn[1].imm = addr >> 32;
>
>                         /* check whether we recorded this map already */
> -                       for (j = 0; j < env->used_map_cnt; j++)
> +                       for (j = 0; j < env->used_map_cnt; j++) {
>                                 if (env->used_maps[j] == map) {
> +                                       aux->map_index = j;
>                                         fdput(f);
>                                         goto next_insn;
>                                 }
> +                       }
>
>                         if (env->used_map_cnt >= MAX_USED_MAPS) {
>                                 fdput(f);
> @@ -6724,6 +6759,8 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
>                                 fdput(f);
>                                 return PTR_ERR(map);
>                         }
> +
> +                       aux->map_index = env->used_map_cnt;
>                         env->used_maps[env->used_map_cnt++] = map;
>
>                         if (bpf_map_is_cgroup_storage(map) &&
> diff --git a/tools/bpf/bpftool/xlated_dumper.c b/tools/bpf/bpftool/xlated_dumper.c
> index 7073dbe1ff27..0bb17bf88b18 100644
> --- a/tools/bpf/bpftool/xlated_dumper.c
> +++ b/tools/bpf/bpftool/xlated_dumper.c
> @@ -195,6 +195,9 @@ static const char *print_imm(void *private_data,
>         if (insn->src_reg == BPF_PSEUDO_MAP_FD)
>                 snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
>                          "map[id:%u]", insn->imm);
> +       else if (insn->src_reg == BPF_PSEUDO_MAP_VALUE)
> +               snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
> +                        "map[id:%u][0]+%u", insn->imm, (insn + 1)->imm);
>         else
>                 snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
>                          "0x%llx", (unsigned long long)full_imm);
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 2e308e90ffea..8884072e1a46 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -255,8 +255,12 @@ enum bpf_attach_type {
>   */
>  #define BPF_F_ANY_ALIGNMENT    (1U << 1)
>
> -/* when bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_FD, bpf_ldimm64->imm == fd */
> +/* When bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_{FD,VALUE}, then
> + * bpf_ldimm64's insn[0]->imm == fd in both cases. Additionally,
> + * for BPF_PSEUDO_MAP_VALUE, insn[1]->imm == offset into value.
> + */
>  #define BPF_PSEUDO_MAP_FD      1
> +#define BPF_PSEUDO_MAP_VALUE   2
>
>  /* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
>   * offset to another bpf function
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 3/7] bpf, obj: allow . char as part of the name
  2019-02-28 23:18 ` [PATCH bpf-next v2 3/7] bpf, obj: allow . char as part of the name Daniel Borkmann
@ 2019-03-01  5:52   ` Andrii Nakryiko
  2019-03-01  9:04     ` Daniel Borkmann
  0 siblings, 1 reply; 46+ messages in thread
From: Andrii Nakryiko @ 2019-03-01  5:52 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, bpf, Networking, joe, john.fastabend, tgraf,
	Yonghong Song, Andrii Nakryiko, Jakub Kicinski, lmb

On Thu, Feb 28, 2019 at 3:31 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> Trivial addition to allow '.' aside from '_' as "special" characters
> in the object name. Used to name maps from loader side as ".bss",
> ".data", ".rodata".
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Acked-by: Andrii Nakryiko <andriin@fb.com>

> ---
>  kernel/bpf/syscall.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index d3ef45e01d7a..90044da3346e 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -440,10 +440,10 @@ static int bpf_obj_name_cpy(char *dst, const char *src)
>         const char *end = src + BPF_OBJ_NAME_LEN;
>
>         memset(dst, 0, BPF_OBJ_NAME_LEN);
> -
> -       /* Copy all isalnum() and '_' char */
> +       /* Copy all isalnum(), '_' and '.' chars. */

Is there any reason names are so restrictive? Say, why not '-' as
well? It's perfectly safe even in filenames. Or even '/' and '\'? Is
this name used by anything else in the system, except for
introspection?

>         while (src < end && *src) {
> -               if (!isalnum(*src) && *src != '_')
> +               if (!isalnum(*src) &&
> +                   *src != '_' && *src != '.')
>                         return -EINVAL;
>                 *dst++ = *src++;
>         }
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections
  2019-02-28 23:18 ` [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections Daniel Borkmann
  2019-02-28 23:41   ` Stanislav Fomichev
@ 2019-03-01  6:53   ` Andrii Nakryiko
  2019-03-01 10:46     ` Daniel Borkmann
  2019-03-01 18:11   ` Yonghong Song
  2 siblings, 1 reply; 46+ messages in thread
From: Andrii Nakryiko @ 2019-03-01  6:53 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, bpf, Networking, joe, john.fastabend, tgraf,
	Yonghong Song, Andrii Nakryiko, Jakub Kicinski, lmb

On Thu, Feb 28, 2019 at 3:31 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> This work adds BPF loader support for global data sections
> to libbpf. This allows writing BPF programs in a more natural
> C-like way, by being able to define global variables and const
> data.
>
> Back at LPC 2018 [0] we presented a first prototype which
> implemented support for global data sections by extending BPF
> syscall where union bpf_attr would get additional memory/size
> pair for each section passed during prog load in order to later
> add this base address into the ldimm64 instruction along with
> the user provided offset when accessing a variable. Consensus
> from LPC was that for proper upstream support, it would be
> more desirable to use maps instead of a bpf_attr extension, as
> this would allow for introspection of these sections as well
> as potential live updates of their content. This work follows
> this path by taking the following steps from loader side:
>
>  1) In bpf_object__elf_collect() step we pick up ".data",
>     ".rodata", and ".bss" section information.
>
>  2) If present, in bpf_object__init_global_maps() we create
>     a map that corresponds to each of the present sections.

Is there any point in having .data and .bss in separate maps? I can
see it only for reasons of inspection from bpftool, but other than
that, isn't .bss just an optimization over .data to save space in the
ELF file, while in all other regards being just another part of the
r/w .data section?
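
E.g., the only difference from the compiler's point of view is
whether a variable carries an initializer:

	static __u64 cnt;        /* zero-initialized -> .bss  (NOBITS)   */
	static __u64 max = 100;  /* initialized      -> .data (PROGBITS) */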

>     Given section size and access properties can differ, a
>     single entry array map is created with value size that
>     is corresponding to the ELF section size of .data, .bss
>     or .rodata. In the latter case, the map is created as
>     read-only from program side such that verifier rejects
>     any write attempts into .rodata. In a subsequent step,
>     for .data and .rodata sections, the section content is
>     copied into the map through bpf_map_update_elem(). For
>     .bss this is not necessary since array map is already
>     zero-initialized by default.

For .rodata, ideally it would be nice to make it RDONLY from userland
as well, except for the first UPDATE. How hard would it be to support
that?
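
The flow would have to be roughly the sketch below; the final step is
hypothetical, since no such one-shot operation exists today:

	fd = bpf_create_map_xattr(&attr);          /* .rodata map */
	bpf_map_update_elem(fd, &zero, rodata, 0); /* initial content */
	/* hypothetical: flip map to read-only on the syscall side */
	bpf_map_freeze(fd);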

>
>  3) In bpf_program__collect_reloc() step, we record the
>     corresponding map, insn index, and relocation type for
>     the global data.
>
>  4) And last but not least in the actual relocation step in
>     bpf_program__relocate(), we mark the ldimm64 instruction
>     with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
>     imm field the map's file descriptor is stored as similarly
>     done as in BPF_PSEUDO_MAP_FD, and in the second imm field
>     (as ldimm64 is 2-insn wide) we store the access offset
>     into the section.
>
>  5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
>     load will then store the actual target address in order
>     to have a 'map-lookup'-free access. That is, the actual
>     map value base address + offset. The destination register
>     in the verifier will then be marked as PTR_TO_MAP_VALUE,
>     containing the fixed offset as reg->off and backing BPF
>     map as reg->map_ptr. Meaning, it's treated as any other
>     normal map value from verification side, only with
>     efficient, direct value access instead of actual call to
>     map lookup helper as in the typical case.
>
> Simple example dump of program using globals vars in each
> section:
>
>   # readelf -a test_global_data.o
>   [...]
>   [ 6] .bss              NOBITS           0000000000000000  00000328
>        0000000000000010  0000000000000000  WA       0     0     8
>   [ 7] .data             PROGBITS         0000000000000000  00000328
>        0000000000000010  0000000000000000  WA       0     0     8
>   [ 8] .rodata           PROGBITS         0000000000000000  00000338
>        0000000000000018  0000000000000000   A       0     0     8
>   [...]
>     95: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    6 static_bss
>     96: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    6 static_bss2
>     97: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    7 static_data
>     98: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    7 static_data2
>     99: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    8 static_rodata
>    100: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    8 static_rodata2
>    101: 0000000000000010     8 OBJECT  LOCAL  DEFAULT    8 static_rodata3
>   [...]
>
>   # bpftool prog
>   103: sched_cls  name load_static_dat  tag 37a8b6822fc39a29  gpl
>        loaded_at 2019-02-28T02:02:35+0000  uid 0
>        xlated 712B  jited 426B  memlock 4096B  map_ids 63,64,65,66
>   # bpftool map show id 63
>   63: array  name .bss  flags 0x0                      <-- .bss area, rw
>       key 4B  value 16B  max_entries 1  memlock 4096B
>   # bpftool map show id 64
>   64: array  name .data  flags 0x0                     <-- .data area, rw
>       key 4B  value 16B  max_entries 1  memlock 4096B
>   # bpftool map show id 65
>   65: array  name .rodata  flags 0x80                  <-- .rodata area, ro
>       key 4B  value 24B  max_entries 1  memlock 4096B
>
>   # bpftool prog dump xlated id 103
>   int load_static_data(struct __sk_buff * skb):
>   ; int load_static_data(struct __sk_buff *skb)
>      0: (b7) r1 = 0
>   ; key = 0;
>      1: (63) *(u32 *)(r10 -4) = r1
>      2: (bf) r6 = r10
>   ; int load_static_data(struct __sk_buff *skb)
>      3: (07) r6 += -4
>   ; bpf_map_update_elem(&result, &key, &static_bss, 0);
>      4: (18) r1 = map[id:66]
>      6: (bf) r2 = r6
>      7: (18) r3 = map[id:63][0]+0         <-- direct static_bss addr in .bss area
>      9: (b7) r4 = 0
>     10: (85) call array_map_update_elem#99888
>     11: (b7) r1 = 1
>   ; key = 1;
>     12: (63) *(u32 *)(r10 -4) = r1
>   ; bpf_map_update_elem(&result, &key, &static_data, 0);
>     13: (18) r1 = map[id:66]
>     15: (bf) r2 = r6
>     16: (18) r3 = map[id:64][0]+0         <-- direct static_data addr in .data area
>     18: (b7) r4 = 0
>     19: (85) call array_map_update_elem#99888
>     20: (b7) r1 = 2
>   ; key = 2;
>     21: (63) *(u32 *)(r10 -4) = r1
>   ; bpf_map_update_elem(&result, &key, &static_rodata, 0);
>     22: (18) r1 = map[id:66]
>     24: (bf) r2 = r6
>     25: (18) r3 = map[id:65][0]+0         <-- direct static_rodata addr in .rodata area
>     27: (b7) r4 = 0
>     28: (85) call array_map_update_elem#99888
>     29: (b7) r1 = 3
>   ; key = 3;
>     30: (63) *(u32 *)(r10 -4) = r1
>   ; bpf_map_update_elem(&result, &key, &static_bss2, 0);
>     31: (18) r7 = map[id:63][0]+8         <--.
>     33: (18) r1 = map[id:66]                 |
>     35: (bf) r2 = r6                         |
>     36: (18) r3 = map[id:63][0]+8         <-- direct static_bss2 addr in .bss area
>     38: (b7) r4 = 0
>     39: (85) call array_map_update_elem#99888
>   [...]
>
> For now .data/.rodata/.bss maps are not exposed via API to the
> user, but this could be done in a subsequent step.

See the comment about the BPF_MAP_TYPE_HEAP/BLOB map in my reply to
patch #1; it would probably make a more useful API for
.data/.rodata/.bss.

>
> Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't
> fail for static variables").
>
> Joint work with Joe Stringer.
>
>   [0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
>       http://vger.kernel.org/lpc-bpf2018.html#session-3
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> ---
>  tools/include/uapi/linux/bpf.h |  10 +-
>  tools/lib/bpf/libbpf.c         | 259 +++++++++++++++++++++++++++------
>  2 files changed, 226 insertions(+), 43 deletions(-)
>
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 8884072e1a46..04b26f59b413 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -287,7 +287,7 @@ enum bpf_attach_type {
>
>  #define BPF_OBJ_NAME_LEN 16U
>
> -/* Flags for accessing BPF object */
> +/* Flags for accessing BPF object from syscall side. */
>  #define BPF_F_RDONLY           (1U << 3)
>  #define BPF_F_WRONLY           (1U << 4)
>
> @@ -297,6 +297,14 @@ enum bpf_attach_type {
>  /* Zero-initialize hash function seed. This should only be used for testing. */
>  #define BPF_F_ZERO_SEED                (1U << 6)
>
> +/* Flags for accessing BPF object from program side. */
> +#define BPF_F_RDONLY_PROG      (1U << 7)
> +#define BPF_F_WRONLY_PROG      (1U << 8)
> +#define BPF_F_ACCESS_MASK      (BPF_F_RDONLY |         \
> +                                BPF_F_RDONLY_PROG |    \
> +                                BPF_F_WRONLY |         \
> +                                BPF_F_WRONLY_PROG)
> +
>  /* flags for BPF_PROG_QUERY */
>  #define BPF_F_QUERY_EFFECTIVE  (1U << 0)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 8f8f688f3e9b..969bc3d9f02c 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -139,6 +139,9 @@ struct bpf_program {
>                 enum {
>                         RELO_LD64,
>                         RELO_CALL,
> +                       RELO_DATA,
> +                       RELO_RODATA,
> +                       RELO_BSS,

All three of those are essentially the same relocation, just applied
against different ELF sections. I think by having just a single
RELO_GLOBAL_DATA you can actually simplify a bunch of code below;
please see the corresponding comments.

>                 } type;
>                 int insn_idx;
>                 union {
> @@ -174,7 +177,10 @@ struct bpf_program {
>  struct bpf_map {
>         int fd;
>         char *name;
> -       size_t offset;
> +       union {
> +               __u32 global_type;

This could be an index into the common maps array.

> +               size_t offset;
> +       };
>         int map_ifindex;
>         int inner_map_fd;
>         struct bpf_map_def def;
> @@ -194,6 +200,8 @@ struct bpf_object {
>         size_t nr_programs;
>         struct bpf_map *maps;
>         size_t nr_maps;
> +       struct bpf_map *maps_global;
> +       size_t nr_maps_global;

Global maps could be stored in maps, alongside the other ones, so
that we don't need to keep track of them separately.

Another inconvenience of having a separate array of global maps is
that bpf_map__iter won't iterate over them. I don't know if that's
desirable behavior or not, but it probably would be nice to iterate
over the global ones as well?

>
>         bool loaded;
>         bool has_pseudo_calls;
> @@ -209,6 +217,9 @@ struct bpf_object {
>                 Elf *elf;
>                 GElf_Ehdr ehdr;
>                 Elf_Data *symbols;
> +               Elf_Data *global_data;
> +               Elf_Data *global_rodata;
> +               Elf_Data *global_bss;
>                 size_t strtabidx;
>                 struct {
>                         GElf_Shdr shdr;
> @@ -217,6 +228,9 @@ struct bpf_object {
>                 int nr_reloc;
>                 int maps_shndx;
>                 int text_shndx;
> +               int data_shndx;
> +               int rodata_shndx;
> +               int bss_shndx;
>         } efile;
>         /*
>          * All loaded bpf_object is linked in a list, which is
> @@ -457,6 +471,9 @@ static struct bpf_object *bpf_object__new(const char *path,
>         obj->efile.obj_buf = obj_buf;
>         obj->efile.obj_buf_sz = obj_buf_sz;
>         obj->efile.maps_shndx = -1;
> +       obj->efile.data_shndx = -1;
> +       obj->efile.rodata_shndx = -1;
> +       obj->efile.bss_shndx = -1;
>
>         obj->loaded = false;
>
> @@ -475,6 +492,9 @@ static void bpf_object__elf_finish(struct bpf_object *obj)
>                 obj->efile.elf = NULL;
>         }
>         obj->efile.symbols = NULL;
> +       obj->efile.global_data = NULL;
> +       obj->efile.global_rodata = NULL;
> +       obj->efile.global_bss = NULL;
>
>         zfree(&obj->efile.reloc);
>         obj->efile.nr_reloc = 0;
> @@ -757,6 +777,85 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
>         return 0;
>  }
>
> +static int
> +bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map);
> +
> +static int
> +bpf_object__init_global(struct bpf_object *obj, int i, int type,
> +                       const char *name, Elf_Data *map_data)

Instead of deducing flags and looking up the map by index, you could
just pass a struct bpf_map * directly instead of int i, and provide
flags instead of type.

> +{
> +       struct bpf_map *map = &obj->maps_global[i];
> +       struct bpf_map_def *def = &map->def;
> +       char *cp, errmsg[STRERR_BUFSIZE];
> +       int err, slot0 = 0;
> +
> +       def->type = BPF_MAP_TYPE_ARRAY;
> +       def->key_size = sizeof(int);
> +       def->value_size = map_data->d_size;
> +       def->max_entries = 1;
> +       def->map_flags = type == RELO_RODATA ? BPF_F_RDONLY_PROG : 0;
> +
> +       map->name = strdup(name);
> +       map->global_type = type;
> +       map->fd = bpf_object__create_map(obj, map);
> +       if (map->fd < 0) {
> +               err = map->fd;
> +               cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
> +               pr_warning("failed to create map (name: '%s'): %s\n",
> +                          map->name, cp);
> +               goto destroy;
> +       }
> +
> +       pr_debug("create map %s: fd=%d\n", map->name, map->fd);
> +
> +       if (type != RELO_BSS) {
> +               err = bpf_map_update_elem(map->fd, &slot0, map_data->d_buf, 0);
> +               if (err < 0) {
> +                       cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
> +                       pr_warning("failed to update map (name: '%s'): %s\n",
> +                                  map->name, cp);
> +                       goto destroy;
> +               }
> +
> +               pr_debug("updated map %s with elf data: fd=%d\n", map->name,
> +                        map->fd);
> +       }
> +       return 0;
> +destroy:
> +       for (i = 0; i < obj->nr_maps_global; i++)
> +               zclose(obj->maps_global[i].fd);
> +       return err;
> +}
> +
> +static int
> +bpf_object__init_global_maps(struct bpf_object *obj)
> +{
> +       int nr_maps_global = (obj->efile.data_shndx >= 0) +
> +                            (obj->efile.rodata_shndx >= 0) +
> +                            (obj->efile.bss_shndx >= 0), i, err = 0;

This looks like a good candidate for a separate static function? It
could also be reused below to check whether any global map is present.

> +
> +       obj->maps_global = calloc(nr_maps_global, sizeof(obj->maps_global[0]));
> +       if (!obj->maps_global) {

If nr_maps_global is 0, calloc might or might not return NULL, so this
check might erroneously return an error.
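
E.g., an early return would sidestep that (sketch):

	if (!nr_maps_global)
		return 0;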

> +               pr_warning("alloc maps for object failed\n");
> +               return -ENOMEM;
> +       }
> +
> +       obj->nr_maps_global = nr_maps_global;
> +       for (i = 0; i < obj->nr_maps_global; i++)
> +               obj->maps[i].fd = -1;
> +       i = 0;
> +       if (obj->efile.bss_shndx >= 0)
> +               err = bpf_object__init_global(obj, i++, RELO_BSS, ".bss",
> +                                             obj->efile.global_bss);
> +       if (obj->efile.data_shndx >= 0 && !err)
> +               err = bpf_object__init_global(obj, i++, RELO_DATA, ".data",
> +                                             obj->efile.global_data);
> +       if (obj->efile.rodata_shndx >= 0 && !err)
> +               err = bpf_object__init_global(obj, i++, RELO_RODATA, ".rodata",
> +                                             obj->efile.global_rodata);

Here we know exactly what type of map we are creating, so we can just
directly pass all the required structs/flags/data.

Also, to speed up and simplify relocation processing below, I think
it's better to store the map index for each of the available .bss,
.data and .rodata maps, which eliminates another reason for having
three different types of data relocations.

> +       return err;
> +}
> +
>  static bool section_have_execinstr(struct bpf_object *obj, int idx)
>  {
>         Elf_Scn *scn;
> @@ -865,6 +964,12 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
>                                         pr_warning("failed to alloc program %s (%s): %s",
>                                                    name, obj->path, cp);
>                                 }
> +                       } else if (strcmp(name, ".data") == 0) {
> +                               obj->efile.global_data = data;
> +                               obj->efile.data_shndx = idx;
> +                       } else if (strcmp(name, ".rodata") == 0) {
> +                               obj->efile.global_rodata = data;
> +                               obj->efile.rodata_shndx = idx;
>                         }

Previously, if we encountered an unknown PROGBITS section, we'd emit
a debug message about skipping the section; should we add that message
here?

>                 } else if (sh.sh_type == SHT_REL) {
>                         void *reloc = obj->efile.reloc;
> @@ -892,6 +997,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
>                                 obj->efile.reloc[n].shdr = sh;
>                                 obj->efile.reloc[n].data = data;
>                         }
> +               } else if (sh.sh_type == SHT_NOBITS && strcmp(name, ".bss") == 0) {
> +                       obj->efile.global_bss = data;
> +                       obj->efile.bss_shndx = idx;
>                 } else {
>                         pr_debug("skip section(%d) %s\n", idx, name);
>                 }
> @@ -923,6 +1031,14 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
>                 if (err)
>                         goto out;
>         }
> +       if (obj->efile.data_shndx >= 0 ||
> +           obj->efile.rodata_shndx >= 0 ||
> +           obj->efile.bss_shndx >= 0) {
> +               err = bpf_object__init_global_maps(obj);
> +               if (err)
> +                       goto out;
> +       }
> +
>         err = bpf_object__init_prog_names(obj);
>  out:
>         return err;
> @@ -961,6 +1077,11 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>         Elf_Data *symbols = obj->efile.symbols;
>         int text_shndx = obj->efile.text_shndx;
>         int maps_shndx = obj->efile.maps_shndx;
> +       int data_shndx = obj->efile.data_shndx;
> +       int rodata_shndx = obj->efile.rodata_shndx;
> +       int bss_shndx = obj->efile.bss_shndx;
> +       struct bpf_map *maps_global = obj->maps_global;
> +       size_t nr_maps_global = obj->nr_maps_global;
>         struct bpf_map *maps = obj->maps;
>         size_t nr_maps = obj->nr_maps;
>         int i, nrels;
> @@ -999,8 +1120,10 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>                          (long long) (rel.r_info >> 32),
>                          (long long) sym.st_value, sym.st_name);
>
> -               if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx) {
> -                       pr_warning("Program '%s' contains non-map related relo data pointing to section %u\n",
> +               if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx &&
> +                   sym.st_shndx != data_shndx && sym.st_shndx != rodata_shndx &&
> +                   sym.st_shndx != bss_shndx) {
> +                       pr_warning("Program '%s' contains unrecognized relo data pointing to section %u\n",
>                                    prog->section_name, sym.st_shndx);
>                         return -LIBBPF_ERRNO__RELOC;
>                 }
> @@ -1045,6 +1168,30 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>                         prog->reloc_desc[i].type = RELO_LD64;
>                         prog->reloc_desc[i].insn_idx = insn_idx;
>                         prog->reloc_desc[i].map_idx = map_idx;
> +               } else if (sym.st_shndx == data_shndx ||
> +                          sym.st_shndx == rodata_shndx ||
> +                          sym.st_shndx == bss_shndx) {
> +                       int type = (sym.st_shndx == data_shndx)   ? RELO_DATA :
> +                                  (sym.st_shndx == rodata_shndx) ? RELO_RODATA :
> +                                                                   RELO_BSS;
> +
> +                       for (map_idx = 0; map_idx < nr_maps_global; map_idx++) {
> +                               if (maps_global[map_idx].global_type == type) {
> +                                       pr_debug("relocation: find map %zd (%s) for insn %u\n",
> +                                                map_idx, maps_global[map_idx].name, insn_idx);
> +                                       break;
> +                               }
> +                       }
> +
> +                       if (map_idx >= nr_maps_global) {
> +                               pr_warning("bpf relocation: map_idx %d large than %d\n",
> +                                          (int)map_idx, (int)nr_maps_global - 1);
> +                               return -LIBBPF_ERRNO__RELOC;
> +                       }

We don't need to handle all of this if we just remember the global
map indices during creation. Instead of calculating the type, we can
just pick the correct index (and check that it exists), and the type
can be just a generic RELO_DATA.
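
E.g., something along these lines, where shndx_to_global_map_idx() is
a hypothetical helper mapping a section index to the map created for
it:

	} else if (sym.st_shndx == data_shndx ||
		   sym.st_shndx == rodata_shndx ||
		   sym.st_shndx == bss_shndx) {
		prog->reloc_desc[i].type = RELO_DATA;
		prog->reloc_desc[i].insn_idx = insn_idx;
		prog->reloc_desc[i].map_idx =
			shndx_to_global_map_idx(obj, sym.st_shndx);
	}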

> +
> +                       prog->reloc_desc[i].type = type;
> +                       prog->reloc_desc[i].insn_idx = insn_idx;
> +                       prog->reloc_desc[i].map_idx = map_idx;
>                 }
>         }
>         return 0;
> @@ -1176,15 +1323,58 @@ bpf_object__probe_caps(struct bpf_object *obj)
>  }
>
>  static int
> -bpf_object__create_maps(struct bpf_object *obj)
> +bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map)
>  {
>         struct bpf_create_map_attr create_attr = {};
> +       struct bpf_map_def *def = &map->def;
> +       char *cp, errmsg[STRERR_BUFSIZE];
> +       int fd;
> +
> +       if (obj->caps.name)
> +               create_attr.name = map->name;
> +       create_attr.map_ifindex = map->map_ifindex;
> +       create_attr.map_type = def->type;
> +       create_attr.map_flags = def->map_flags;
> +       create_attr.key_size = def->key_size;
> +       create_attr.value_size = def->value_size;
> +       create_attr.max_entries = def->max_entries;
> +       create_attr.btf_fd = 0;
> +       create_attr.btf_key_type_id = 0;
> +       create_attr.btf_value_type_id = 0;
> +       if (bpf_map_type__is_map_in_map(def->type) &&
> +           map->inner_map_fd >= 0)
> +               create_attr.inner_map_fd = map->inner_map_fd;
> +       if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
> +               create_attr.btf_fd = btf__fd(obj->btf);
> +               create_attr.btf_key_type_id = map->btf_key_type_id;
> +               create_attr.btf_value_type_id = map->btf_value_type_id;
> +       }
> +
> +       fd = bpf_create_map_xattr(&create_attr);
> +       if (fd < 0 && create_attr.btf_key_type_id) {
> +               cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
> +               pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
> +                          map->name, cp, errno);
> +
> +               create_attr.btf_fd = 0;
> +               create_attr.btf_key_type_id = 0;
> +               create_attr.btf_value_type_id = 0;
> +               map->btf_key_type_id = 0;
> +               map->btf_value_type_id = 0;
> +               fd = bpf_create_map_xattr(&create_attr);
> +       }
> +
> +       return fd;
> +}
> +
> +static int
> +bpf_object__create_maps(struct bpf_object *obj)
> +{
>         unsigned int i;
>         int err;
>
>         for (i = 0; i < obj->nr_maps; i++) {
>                 struct bpf_map *map = &obj->maps[i];
> -               struct bpf_map_def *def = &map->def;
>                 char *cp, errmsg[STRERR_BUFSIZE];
>                 int *pfd = &map->fd;
>
> @@ -1193,41 +1383,7 @@ bpf_object__create_maps(struct bpf_object *obj)
>                                  map->name, map->fd);
>                         continue;
>                 }
> -
> -               if (obj->caps.name)
> -                       create_attr.name = map->name;
> -               create_attr.map_ifindex = map->map_ifindex;
> -               create_attr.map_type = def->type;
> -               create_attr.map_flags = def->map_flags;
> -               create_attr.key_size = def->key_size;
> -               create_attr.value_size = def->value_size;
> -               create_attr.max_entries = def->max_entries;
> -               create_attr.btf_fd = 0;
> -               create_attr.btf_key_type_id = 0;
> -               create_attr.btf_value_type_id = 0;
> -               if (bpf_map_type__is_map_in_map(def->type) &&
> -                   map->inner_map_fd >= 0)
> -                       create_attr.inner_map_fd = map->inner_map_fd;
> -
> -               if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
> -                       create_attr.btf_fd = btf__fd(obj->btf);
> -                       create_attr.btf_key_type_id = map->btf_key_type_id;
> -                       create_attr.btf_value_type_id = map->btf_value_type_id;
> -               }
> -
> -               *pfd = bpf_create_map_xattr(&create_attr);
> -               if (*pfd < 0 && create_attr.btf_key_type_id) {
> -                       cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
> -                       pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
> -                                  map->name, cp, errno);
> -                       create_attr.btf_fd = 0;
> -                       create_attr.btf_key_type_id = 0;
> -                       create_attr.btf_value_type_id = 0;
> -                       map->btf_key_type_id = 0;
> -                       map->btf_value_type_id = 0;
> -                       *pfd = bpf_create_map_xattr(&create_attr);
> -               }
> -
> +               *pfd = bpf_object__create_map(obj, map);
>                 if (*pfd < 0) {
>                         size_t j;
>
> @@ -1412,6 +1568,24 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
>                                                       &prog->reloc_desc[i]);
>                         if (err)
>                                 return err;
> +               } else if (prog->reloc_desc[i].type == RELO_DATA ||
> +                          prog->reloc_desc[i].type == RELO_RODATA ||
> +                          prog->reloc_desc[i].type == RELO_BSS) {
> +                       struct bpf_insn *insns = prog->insns;
> +                       int insn_idx, map_idx, data_off;
> +
> +                       insn_idx = prog->reloc_desc[i].insn_idx;
> +                       map_idx  = prog->reloc_desc[i].map_idx;
> +                       data_off = insns[insn_idx].imm;
> +
> +                       if (insn_idx + 1 >= (int)prog->insns_cnt) {
> +                               pr_warning("relocation out of range: '%s'\n",
> +                                          prog->section_name);
> +                               return -LIBBPF_ERRNO__RELOC;
> +                       }
> +                       insns[insn_idx].src_reg = BPF_PSEUDO_MAP_VALUE;
> +                       insns[insn_idx].imm = obj->maps_global[map_idx].fd;
> +                       insns[insn_idx + 1].imm = data_off;
>                 }
>         }
>
> @@ -1717,6 +1891,7 @@ __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz,
>
>         CHECK_ERR(bpf_object__elf_init(obj), err, out);
>         CHECK_ERR(bpf_object__check_endianness(obj), err, out);
> +       CHECK_ERR(bpf_object__probe_caps(obj), err, out);
>         CHECK_ERR(bpf_object__elf_collect(obj, flags), err, out);
>         CHECK_ERR(bpf_object__collect_reloc(obj), err, out);
>         CHECK_ERR(bpf_object__validate(obj, needs_kver), err, out);
> @@ -1789,7 +1964,8 @@ int bpf_object__unload(struct bpf_object *obj)
>
>         for (i = 0; i < obj->nr_maps; i++)
>                 zclose(obj->maps[i].fd);
> -
> +       for (i = 0; i < obj->nr_maps_global; i++)
> +               zclose(obj->maps_global[i].fd);
>         for (i = 0; i < obj->nr_programs; i++)
>                 bpf_program__unload(&obj->programs[i]);
>
> @@ -1810,7 +1986,6 @@ int bpf_object__load(struct bpf_object *obj)
>
>         obj->loaded = true;
>
> -       CHECK_ERR(bpf_object__probe_caps(obj), err, out);
>         CHECK_ERR(bpf_object__create_maps(obj), err, out);
>         CHECK_ERR(bpf_object__relocate(obj), err, out);
>         CHECK_ERR(bpf_object__load_progs(obj), err, out);
> --
> 2.17.1
>

I'm sorry if I seem a bit too obsessed with those three new relocation
types. I just believe that having one generic type and storing global
maps along with the other maps is cleaner and more uniform.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 2/7] bpf: add program side {rd,wr}only support
  2019-03-01  3:51   ` Jakub Kicinski
@ 2019-03-01  9:01     ` Daniel Borkmann
  0 siblings, 0 replies; 46+ messages in thread
From: Daniel Borkmann @ 2019-03-01  9:01 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: ast, bpf, netdev, joe, john.fastabend, tgraf, yhs, andriin, lmb

On 03/01/2019 04:51 AM, Jakub Kicinski wrote:
> On Fri,  1 Mar 2019 00:18:24 +0100, Daniel Borkmann wrote:
>> This work adds two new map creation flags BPF_F_RDONLY_PROG
>> and BPF_F_WRONLY_PROG in order to allow for read-only or
>> write-only BPF maps from a BPF program side.
>>
>> Today we have BPF_F_RDONLY and BPF_F_WRONLY, but this only
>> applies to system call side, meaning the BPF program has full
>> read/write access to the map as usual while bpf(2) calls with
>> map fd can either only read or write into the map depending
>> on the flags. BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG allows
>> for the exact opposite such that verifier is going to reject
>> program loads if write into a read-only map or a read into a
>> write-only map is detected.
>>
>> We've enabled this generic map extension to various non-special
>> maps holding normal user data: array, hash, lru, lpm, local
>> storage, queue and stack. Further map types could be followed
>> up in future depending on use-case. Main use case here is to
>> forbid writes into .rodata map values from verifier side.
> 
> This will also enable optimizing the accesses on systems with rich
> memory architectures :)

Nice! :)

>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> ---
>>  include/linux/bpf.h           | 18 ++++++++++++++++++
>>  include/uapi/linux/bpf.h      | 10 +++++++++-
>>  kernel/bpf/arraymap.c         |  2 +-
>>  kernel/bpf/hashtab.c          |  2 +-
>>  kernel/bpf/local_storage.c    |  2 +-
>>  kernel/bpf/lpm_trie.c         |  2 +-
>>  kernel/bpf/queue_stack_maps.c |  3 +--
>>  kernel/bpf/verifier.c         | 30 +++++++++++++++++++++++++++++-
>>  8 files changed, 61 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index bdcc6e2a9977..3f74194dd4f6 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -427,6 +427,24 @@ struct bpf_array {
>>  	};
>>  };
>>  
>> +#define BPF_MAP_CAN_READ	BIT(0)
>> +#define BPF_MAP_CAN_WRITE	BIT(1)
>> +
>> +static inline u32 bpf_map_flags_to_cap(struct bpf_map *map)
>> +{
>> +	u32 access_flags = map->map_flags & (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG);
>> +
>> +	/* Combination of BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG is
>> +	 * not possible.
>> +	 */
> 
> minor nit: we do check that old RDONLY and WRONLY are not set at the
>            same time, but here it's not done?

Good point indeed, I'll fix it in a v3 such that an invalid combination of
both, BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG, is rejected upon map creation.
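
Roughly, something like the following in the generic map creation
path (a sketch for v3, exact placement to be decided):

	if ((attr->map_flags & BPF_F_RDONLY_PROG) &&
	    (attr->map_flags & BPF_F_WRONLY_PROG))
		return -EINVAL;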

>> +	if (access_flags & BPF_F_RDONLY_PROG)
>> +		return BPF_MAP_CAN_READ;
>> +	else if (access_flags & BPF_F_WRONLY_PROG)
>> +		return BPF_MAP_CAN_WRITE;
>> +	else
>> +		return BPF_MAP_CAN_READ | BPF_MAP_CAN_WRITE;
>> +}

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 3/7] bpf, obj: allow . char as part of the name
  2019-03-01  5:52   ` Andrii Nakryiko
@ 2019-03-01  9:04     ` Daniel Borkmann
  0 siblings, 0 replies; 46+ messages in thread
From: Daniel Borkmann @ 2019-03-01  9:04 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, bpf, Networking, joe, john.fastabend, tgraf,
	Yonghong Song, Andrii Nakryiko, Jakub Kicinski, lmb

On 03/01/2019 06:52 AM, Andrii Nakryiko wrote:
> On Thu, Feb 28, 2019 at 3:31 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>>
>> Trivial addition to allow '.' aside from '_' as "special" characters
>> in the object name. Used to name maps from loader side as ".bss",
>> ".data", ".rodata".
>>
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> 
> Acked-by: Andrii Nakryiko <andriin@fb.com>
> 
>>  kernel/bpf/syscall.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
>> index d3ef45e01d7a..90044da3346e 100644
>> --- a/kernel/bpf/syscall.c
>> +++ b/kernel/bpf/syscall.c
>> @@ -440,10 +440,10 @@ static int bpf_obj_name_cpy(char *dst, const char *src)
>>         const char *end = src + BPF_OBJ_NAME_LEN;
>>
>>         memset(dst, 0, BPF_OBJ_NAME_LEN);
>> -
>> -       /* Copy all isalnum() and '_' char */
>> +       /* Copy all isalnum(), '_' and '.' chars. */
> 
> Is there any reason names are so restrictive? Say, why not '-' as
> well? It's perfectly safe even in filenames. Or even '/' and '\'? Is
> this name used by anything else in the system, except for
> introspection?

Could be done; presumably it was kept more restrictive in case some
reserved names might be needed in the unforeseeable future, but it
looks like so far no one has run into the need to extend it further
than this. :)
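
For illustration, with the relaxed check a loader could then do
something like the following (sketch, sizes made up):

	struct bpf_create_map_attr attr = {
		.name        = ".rodata",
		.map_type    = BPF_MAP_TYPE_ARRAY,
		.key_size    = sizeof(int),
		.value_size  = 24,
		.max_entries = 1,
	};
	int fd = bpf_create_map_xattr(&attr);

whereas today the '.' in the name would get rejected with -EINVAL.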

>>         while (src < end && *src) {
>> -               if (!isalnum(*src) && *src != '_')
>> +               if (!isalnum(*src) &&
>> +                   *src != '_' && *src != '.')
>>                         return -EINVAL;
>>                 *dst++ = *src++;
>>         }
>> --
>> 2.17.1
>>


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 1/7] bpf: implement lookup-free direct value access
  2019-03-01  5:46   ` Andrii Nakryiko
@ 2019-03-01  9:49     ` Daniel Borkmann
  2019-03-01 18:50       ` Jakub Kicinski
  2019-03-01 19:35       ` Andrii Nakryiko
  0 siblings, 2 replies; 46+ messages in thread
From: Daniel Borkmann @ 2019-03-01  9:49 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, bpf, Networking, joe, john.fastabend, tgraf,
	Yonghong Song, Andrii Nakryiko, Jakub Kicinski, lmb

On 03/01/2019 06:46 AM, Andrii Nakryiko wrote:
> On Thu, Feb 28, 2019 at 3:31 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>>
>> This generic extension to BPF maps allows for directly loading an
>> address residing inside a BPF map value as a single BPF ldimm64
>> instruction.
> 
> This is great! I'm going to review code more thoroughly tomorrow, but
> I also have few questions/suggestions I'd like to discuss, if you
> don't mind.

Awesome, thanks!

>> The idea is similar to what BPF_PSEUDO_MAP_FD does today, which
>> is a special src_reg flag for ldimm64 instruction that indicates
>> that inside the first part of the double insns's imm field is a
>> file descriptor which the verifier then replaces as a full 64bit
>> address of the map into both imm parts.
>>
>> For the newly added BPF_PSEUDO_MAP_VALUE src_reg flag, the idea
>> is similar: the first part of the double insns's imm field is
>> again a file descriptor corresponding to the map, and the second
>> part of the imm field is an offset. The verifier will then replace
>> both imm parts with an address that points into the BPF map value
>> for maps that support this operation. BPF_PSEUDO_MAP_VALUE is a
>> distinct flag as otherwise with BPF_PSEUDO_MAP_FD we could not
>> differ offset 0 between load of map pointer versus load of map's
>> value at offset 0.
> 
> Is having both BPF_PSEUDO_MAP_FD and BPF_PSEUDO_MAP_VALUE a desirable
> thing? I'm asking because it's seems like it would be really simple to
> stick to using just BPF_PSEUDO_MAP_FD and then interpret imm
> differently depending on whether it's 0 or not. E.g., we can say that
> imm=0 is old BPF_PSEUDO_MAP_FD behavior (loading map addr), but any
> other imm value X is really just (X-1) offset into map's value? Or,
> given that valid offset is limited to 1<<29, we can set highest-order
> bit to 1 and lower bits would be offset? In other words, if we just
> need too carve out zero as a special case, then it's easy to do and we
> can avoid adding new BPF_PSEUDO_MAP_VALUE.

Was thinking about reusing BPF_PSEUDO_MAP_FD initially as mentioned in
here, but went for BPF_PSEUDO_MAP_VALUE eventually to have a
straightforward mapping. Your suggestion could be done, but it feels
more complex than necessary, imho, meaning it might confuse users
trying to make sense of an insn dump or verifier output, wondering
whether the off-by-one is a bug or not, which won't happen if the
offset is exactly the same value as LLVM emits. There is also one more
unfortunate issue which I noticed while implementing: in
replace_map_fd_with_map_ptr() we never enforced that for
BPF_PSEUDO_MAP_FD insns the second imm part must be 0, meaning it
could also carry a garbage value which would then break loaders in the
wild; with the code today this is ignored and then overridden by the
map address. We could try to push a patch to stable to reject anything
non-zero in the second imm for BPF_PSEUDO_MAP_FD and see if anyone
actually notices, and then use some higher-order bit as a selector,
but that would still need some extra handling to make the offset clear
to users wrt dumps; I can give it a try, though, to check how much
more complex it gets. Worst case, if something should really break
somewhere, we might need to revert the imm==0 rejection. Overall,
BPF_PSEUDO_MAP_VALUE felt slightly more suitable to me.
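
To make the encoding concrete, the resulting insn pair would look
roughly like below (a sketch only, not necessarily the helper macro
the series adds to tools/include/linux/filter.h):

	struct bpf_insn insn[2] = {
		{ .code    = BPF_LD | BPF_DW | BPF_IMM,
		  .dst_reg = BPF_REG_1,
		  .src_reg = BPF_PSEUDO_MAP_VALUE,
		  .imm     = map_fd },	/* insn[0].imm: map fd    */
		{ .imm     = off },	/* insn[1].imm: value off */
	};

The verifier later rewrites both imm parts with the 64 bit address of
the map value plus offset.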

>> This allows for efficiently retrieving an address to a map value
>> memory area without having to issue a helper call which needs to
>> prepare registers according to calling convention, etc, without
>> needing the extra NULL test, and without having to add the offset
>> in an additional instruction to the value base pointer.
> 
> It seems like we allow this only for arrays of size 1 right now. We
> can easily generalize this to support not just offset into map's
> value, but also specifying integer key (i.e., array index) by
> utilizing off fields (16-bit + 16-bit). This would allow to eliminate
> any bpf_map_update_elem calls to array maps altogether by allowing to
> provide both array index and offset into value in one BPF instruction.
> Do you think it's a good addition?

Yeah, I've been thinking about this as well: for array-like maps it's
easy to support, and it lifts the single-entry restriction at the same
time. I think it would be useful and straightforward to implement; I
can include it in v3.
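
The encoding could then look roughly like below (sketch of the idea
only, the exact split of the off fields is not settled):

	insn[0].imm = map_fd;		/* map reference        */
	insn[1].imm = off_in_value;	/* offset into value    */
	insn[0].off = index & 0xffff;	/* array index, low 16  */
	insn[1].off = index >> 16;	/* array index, high 16 */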

>> The verifier then treats the destination register as PTR_TO_MAP_VALUE
>> with constant reg->off from the user passed offset from the second
>> imm field, and guarantees that this is within bounds of the map
>> value. Any subsequent operations are normally treated as typical
>> map value handling without anything else needed for verification.
>>
>> The two map operations for direct value access have been added to
>> array map for now. In future other types could be supported as
>> well depending on the use case. The main use case for this commit
>> is to allow for BPF loader support for global variables that
>> reside in .data/.rodata/.bss sections such that we can directly
>> load the address of them with minimal additional infrastructure
>> required. Loader support has been added in subsequent commits for
>> libbpf library.
> 
> I was considering adding a new kind of map representing contiguous
> block of memory (e.g., how about BPF_MAP_TYPE_HEAP or
> BPF_MAP_TYPE_BLOB?). Its keys would be offsets into that memory
> region. Value size is size of the memory region, but it would allow
> reading smaller chunks of memory as values. This would provide
> convenient interface for poking at global variables from userland,
> given offset.
> 
> Libbpf itself would provide higher-level API as well, if there is
> corresponding BTF type information describing layout of
> .data/.bss/.rodata, so that applications can fetch variables by name
> and/or offset, whichever is more convenient. Together with
> bpf_spinlock this would allow easy way to customize subsets of global
> variables in atomic fashion.
> 
> Do you think that would work? Using array is a bit limiting, because
> it doesn't allow to do partial reads/updates, while BPF_MAP_TYPE_HEAP
> would be single big value that allows partial reading/updating.

If I understand it correctly, the main difference this would bring is
being able to use spin_locks in a more fine-grained fashion, right?
Meaning, partial reads/updates of the memory area under spin_lock as
opposed to having to lock over the full area? Yeah, sounds like a
reasonable extension to me that could be done on top of this series;
presumably most of the array map logic could also be reused for it,
which is nice.
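
From user space, usage of such a hypothetical map type could then
look like below (illustration only, none of this exists today):

	__u32 off = 16;	/* key: byte offset into the memory region */
	__u64 val;

	/* partial read of one 8 byte chunk at offset 16 */
	bpf_map_lookup_elem(heap_fd, &off, &val);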

Thanks a lot,
Daniel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections
  2019-03-01  6:53   ` Andrii Nakryiko
@ 2019-03-01 10:46     ` Daniel Borkmann
  2019-03-01 18:10       ` Stanislav Fomichev
  2019-03-01 18:46       ` Andrii Nakryiko
  0 siblings, 2 replies; 46+ messages in thread
From: Daniel Borkmann @ 2019-03-01 10:46 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, bpf, Networking, joe, john.fastabend, tgraf,
	Yonghong Song, Andrii Nakryiko, Jakub Kicinski, lmb

On 03/01/2019 07:53 AM, Andrii Nakryiko wrote:
> On Thu, Feb 28, 2019 at 3:31 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>>
>> This work adds BPF loader support for global data sections
>> to libbpf. This allows to write BPF programs in more natural
>> C-like way by being able to define global variables and const
>> data.
>>
>> Back at LPC 2018 [0] we presented a first prototype which
>> implemented support for global data sections by extending BPF
>> syscall where union bpf_attr would get additional memory/size
>> pair for each section passed during prog load in order to later
>> add this base address into the ldimm64 instruction along with
>> the user provided offset when accessing a variable. Consensus
>> from LPC was that for proper upstream support, it would be
>> more desirable to use maps instead of bpf_attr extension as
>> this would allow for introspection of these sections as well
>> as potential life updates of their content. This work follows
>> this path by taking the following steps from loader side:
>>
>>  1) In bpf_object__elf_collect() step we pick up ".data",
>>     ".rodata", and ".bss" section information.
>>
>>  2) If present, in bpf_object__init_global_maps() we create
>>     a map that corresponds to each of the present sections.
> 
> Is there any point in having .data and .bss in separate maps? I can
> only see for reasons of inspection from bpftool, but other than that
> isn't .bss just an optimization over .data to save space in ELF file,
> but in other regards is just another part of r/w .data section?

Hmm, I actually don't mind too much combining both of them. Had the
same thought with regard to introspection from bpftool, which was why
I separated them. But combining the two into a single map is fine
actually; it saves a bit of resources in the kernel, and offsets can
easily be fixed up from the libbpf side. Will do for v3.
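
Roughly like this (sketch, variable names made up): lay out .data
first, then .bss, in one single rw map, and rebase relocations
pointing into .bss by the .data section size:

	def->value_size = data_sz + bss_sz;
	[...]
	/* relocation into .bss lands behind the .data part */
	insns[insn_idx + 1].imm = data_off + (is_bss ? data_sz : 0);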

>>     Given section size and access properties can differ, a
>>     single entry array map is created with value size that
>>     is corresponding to the ELF section size of .data, .bss
>>     or .rodata. In the latter case, the map is created as
>>     read-only from program side such that verifier rejects
>>     any write attempts into .rodata. In a subsequent step,
>>     for .data and .rodata sections, the section content is
>>     copied into the map through bpf_map_update_elem(). For
>>     .bss this is not necessary since array map is already
>>     zero-initialized by default.
> 
> For .rodata, ideally it would be nice to make it RDONLY from userland
> as well, except for first UPDATE. How hard is it to support that?

Right now the BPF_F_RDONLY / BPF_F_WRONLY semantics for making maps
read-only or write-only from the syscall side are that these
permissions are stored in the struct file front end (file->f_mode)
of the anon inode we use, meaning they are separate from the actual
BPF map. So you can create the map with BPF_F_RDONLY, but a root
user can do BPF_MAP_GET_FD_BY_ID without BPF_F_RDONLY and write
into it again. This design choice means we'd need to add some
additional infrastructure on top, which would then have to enforce
a read-only file->f_mode after the first setup. I think there's a
simple trick we can apply to make it read-only after setup from the
syscall side: we'd add a new flag to the map, and upon map creation
libbpf sets everything up, holds the id, closes its fd, and
refetches the fd by id. From that point onwards, any interface
through which you would get the map's fd in user space will enforce
BPF_F_RDONLY behavior via file->f_mode. Another, less hacky option
could be to extend the struct file ops we currently use for BPF
maps and set a map 'immutable' flag from there, which is then
enforced once all pending operations have completed. I can look a
bit into this.
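
A rough sketch of the refetch from the libbpf side (the exact flow
is an assumption here, not the final patch; error handling and
includes omitted):

	struct bpf_map_info info = {};
	__u32 len = sizeof(info);
	union bpf_attr attr = {};
	int ro_fd;

	bpf_obj_get_info_by_fd(map_fd, &info, &len);
	close(map_fd);			/* drop the last r/w fd */

	attr.map_id     = info.id;
	attr.open_flags = BPF_F_RDONLY;
	ro_fd = syscall(__NR_bpf, BPF_MAP_GET_FD_BY_ID,
			&attr, sizeof(attr));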

>>  3) In bpf_program__collect_reloc() step, we record the
>>     corresponding map, insn index, and relocation type for
>>     the global data.
>>
>>  4) And last but not least in the actual relocation step in
>>     bpf_program__relocate(), we mark the ldimm64 instruction
>>     with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
>>     imm field the map's file descriptor is stored as similarly
>>     done as in BPF_PSEUDO_MAP_FD, and in the second imm field
>>     (as ldimm64 is 2-insn wide) we store the access offset
>>     into the section.
>>
>>  5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
>>     load will then store the actual target address in order
>>     to have a 'map-lookup'-free access. That is, the actual
>>     map value base address + offset. The destination register
>>     in the verifier will then be marked as PTR_TO_MAP_VALUE,
>>     containing the fixed offset as reg->off and backing BPF
>>     map as reg->map_ptr. Meaning, it's treated as any other
>>     normal map value from verification side, only with
>>     efficient, direct value access instead of actual call to
>>     map lookup helper as in the typical case.
>>
>> Simple example dump of program using globals vars in each
>> section:
>>
>>   # readelf -a test_global_data.o
>>   [...]
>>   [ 6] .bss              NOBITS           0000000000000000  00000328
>>        0000000000000010  0000000000000000  WA       0     0     8
>>   [ 7] .data             PROGBITS         0000000000000000  00000328
>>        0000000000000010  0000000000000000  WA       0     0     8
>>   [ 8] .rodata           PROGBITS         0000000000000000  00000338
>>        0000000000000018  0000000000000000   A       0     0     8
>>   [...]
>>     95: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    6 static_bss
>>     96: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    6 static_bss2
>>     97: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    7 static_data
>>     98: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    7 static_data2
>>     99: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    8 static_rodata
>>    100: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    8 static_rodata2
>>    101: 0000000000000010     8 OBJECT  LOCAL  DEFAULT    8 static_rodata3
>>   [...]
>>
>>   # bpftool prog
>>   103: sched_cls  name load_static_dat  tag 37a8b6822fc39a29  gpl
>>        loaded_at 2019-02-28T02:02:35+0000  uid 0
>>        xlated 712B  jited 426B  memlock 4096B  map_ids 63,64,65,66
>>   # bpftool map show id 63
>>   63: array  name .bss  flags 0x0                      <-- .bss area, rw
>>       key 4B  value 16B  max_entries 1  memlock 4096B
>>   # bpftool map show id 64
>>   64: array  name .data  flags 0x0                     <-- .data area, rw
>>       key 4B  value 16B  max_entries 1  memlock 4096B
>>   # bpftool map show id 65
>>   65: array  name .rodata  flags 0x80                  <-- .rodata area, ro
>>       key 4B  value 24B  max_entries 1  memlock 4096B
>>
>>   # bpftool prog dump xlated id 103
>>   int load_static_data(struct __sk_buff * skb):
>>   ; int load_static_data(struct __sk_buff *skb)
>>      0: (b7) r1 = 0
>>   ; key = 0;
>>      1: (63) *(u32 *)(r10 -4) = r1
>>      2: (bf) r6 = r10
>>   ; int load_static_data(struct __sk_buff *skb)
>>      3: (07) r6 += -4
>>   ; bpf_map_update_elem(&result, &key, &static_bss, 0);
>>      4: (18) r1 = map[id:66]
>>      6: (bf) r2 = r6
>>      7: (18) r3 = map[id:63][0]+0         <-- direct static_bss addr in .bss area
>>      9: (b7) r4 = 0
>>     10: (85) call array_map_update_elem#99888
>>     11: (b7) r1 = 1
>>   ; key = 1;
>>     12: (63) *(u32 *)(r10 -4) = r1
>>   ; bpf_map_update_elem(&result, &key, &static_data, 0);
>>     13: (18) r1 = map[id:66]
>>     15: (bf) r2 = r6
>>     16: (18) r3 = map[id:64][0]+0         <-- direct static_data addr in .data area
>>     18: (b7) r4 = 0
>>     19: (85) call array_map_update_elem#99888
>>     20: (b7) r1 = 2
>>   ; key = 2;
>>     21: (63) *(u32 *)(r10 -4) = r1
>>   ; bpf_map_update_elem(&result, &key, &static_rodata, 0);
>>     22: (18) r1 = map[id:66]
>>     24: (bf) r2 = r6
>>     25: (18) r3 = map[id:65][0]+0         <-- direct static_rodata addr in .rodata area
>>     27: (b7) r4 = 0
>>     28: (85) call array_map_update_elem#99888
>>     29: (b7) r1 = 3
>>   ; key = 3;
>>     30: (63) *(u32 *)(r10 -4) = r1
>>   ; bpf_map_update_elem(&result, &key, &static_bss2, 0);
>>     31: (18) r7 = map[id:63][0]+8         <--.
>>     33: (18) r1 = map[id:66]                 |
>>     35: (bf) r2 = r6                         |
>>     36: (18) r3 = map[id:63][0]+8         <-- direct static_bss2 addr in .bss area
>>     38: (b7) r4 = 0
>>     39: (85) call array_map_update_elem#99888
>>   [...]
>>
>> For now .data/.rodata/.bss maps are not exposed via API to the
>> user, but this could be done in a subsequent step.
> 
> See comment about BPF_MAP_TYPE_HEAP/BLOB map in comments to patch #1,
> it would probably make more useful API for .data/.rodata/.bss.
> 
>>
>> Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't
>> fail for static variables").
>>
>> Joint work with Joe Stringer.
>>
>>   [0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
>>       http://vger.kernel.org/lpc-bpf2018.html#session-3
>>
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> Signed-off-by: Joe Stringer <joe@wand.net.nz>
>> ---
>>  tools/include/uapi/linux/bpf.h |  10 +-
>>  tools/lib/bpf/libbpf.c         | 259 +++++++++++++++++++++++++++------
>>  2 files changed, 226 insertions(+), 43 deletions(-)
>>
>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>> index 8884072e1a46..04b26f59b413 100644
>> --- a/tools/include/uapi/linux/bpf.h
>> +++ b/tools/include/uapi/linux/bpf.h
>> @@ -287,7 +287,7 @@ enum bpf_attach_type {
>>
>>  #define BPF_OBJ_NAME_LEN 16U
>>
>> -/* Flags for accessing BPF object */
>> +/* Flags for accessing BPF object from syscall side. */
>>  #define BPF_F_RDONLY           (1U << 3)
>>  #define BPF_F_WRONLY           (1U << 4)
>>
>> @@ -297,6 +297,14 @@ enum bpf_attach_type {
>>  /* Zero-initialize hash function seed. This should only be used for testing. */
>>  #define BPF_F_ZERO_SEED                (1U << 6)
>>
>> +/* Flags for accessing BPF object from program side. */
>> +#define BPF_F_RDONLY_PROG      (1U << 7)
>> +#define BPF_F_WRONLY_PROG      (1U << 8)
>> +#define BPF_F_ACCESS_MASK      (BPF_F_RDONLY |         \
>> +                                BPF_F_RDONLY_PROG |    \
>> +                                BPF_F_WRONLY |         \
>> +                                BPF_F_WRONLY_PROG)
>> +
>>  /* flags for BPF_PROG_QUERY */
>>  #define BPF_F_QUERY_EFFECTIVE  (1U << 0)
>>
>> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
>> index 8f8f688f3e9b..969bc3d9f02c 100644
>> --- a/tools/lib/bpf/libbpf.c
>> +++ b/tools/lib/bpf/libbpf.c
>> @@ -139,6 +139,9 @@ struct bpf_program {
>>                 enum {
>>                         RELO_LD64,
>>                         RELO_CALL,
>> +                       RELO_DATA,
>> +                       RELO_RODATA,
>> +                       RELO_BSS,
> 
> All three of those are essentially the same relocations, just applied
> against different ELF sections.
> I think by having just single RELO_GLOBAL_DATA you can actually
> simplify a bunch of code below, please see corresponding comments.

Ok, sounds like a reasonable simplification, will do all well for v3.

>>                 } type;
>>                 int insn_idx;
>>                 union {
>> @@ -174,7 +177,10 @@ struct bpf_program {
>>  struct bpf_map {
>>         int fd;
>>         char *name;
>> -       size_t offset;
>> +       union {
>> +               __u32 global_type;
> 
> This could be an index into common maps array.
> 
>> +               size_t offset;
>> +       };
>>         int map_ifindex;
>>         int inner_map_fd;
>>         struct bpf_map_def def;
>> @@ -194,6 +200,8 @@ struct bpf_object {
>>         size_t nr_programs;
>>         struct bpf_map *maps;
>>         size_t nr_maps;
>> +       struct bpf_map *maps_global;
>> +       size_t nr_maps_global;
> 
> Global maps could be stored in maps, along other ones, so that we
> don't need to keep track of them separately.
> 
> Another inconvenience of having a separate array of global maps is
> that bpf_map__iter won't iterate them. I don't know if that's
> desirable behavior or not, but it probably would be nice to iterate
> over global ones as well?

My thinking was that these maps are not explicitly user-specified, so
the libbpf API would expose them through a different interface than
the one we have today, in order not to confuse or break application
behavior which otherwise relies on iterating over / processing them.
A separate API would retain current behavior and make it unambiguous
to apps what to expect from each such API call.
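
E.g. something like a dedicated accessor (hypothetical signature,
purely to illustrate the separation, not something this series adds):

	LIBBPF_API struct bpf_map *
	bpf_object__global_map(const struct bpf_object *obj,
			       enum bpf_global_map_type type);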

>>         bool loaded;
>>         bool has_pseudo_calls;
>> @@ -209,6 +217,9 @@ struct bpf_object {
>>                 Elf *elf;
>>                 GElf_Ehdr ehdr;
>>                 Elf_Data *symbols;
>> +               Elf_Data *global_data;
>> +               Elf_Data *global_rodata;
>> +               Elf_Data *global_bss;
>>                 size_t strtabidx;
>>                 struct {
>>                         GElf_Shdr shdr;
>> @@ -217,6 +228,9 @@ struct bpf_object {
>>                 int nr_reloc;
>>                 int maps_shndx;
>>                 int text_shndx;
>> +               int data_shndx;
>> +               int rodata_shndx;
>> +               int bss_shndx;
>>         } efile;
>>         /*
>>          * All loaded bpf_object is linked in a list, which is
>> @@ -457,6 +471,9 @@ static struct bpf_object *bpf_object__new(const char *path,
>>         obj->efile.obj_buf = obj_buf;
>>         obj->efile.obj_buf_sz = obj_buf_sz;
>>         obj->efile.maps_shndx = -1;
>> +       obj->efile.data_shndx = -1;
>> +       obj->efile.rodata_shndx = -1;
>> +       obj->efile.bss_shndx = -1;
>>
>>         obj->loaded = false;
>>
>> @@ -475,6 +492,9 @@ static void bpf_object__elf_finish(struct bpf_object *obj)
>>                 obj->efile.elf = NULL;
>>         }
>>         obj->efile.symbols = NULL;
>> +       obj->efile.global_data = NULL;
>> +       obj->efile.global_rodata = NULL;
>> +       obj->efile.global_bss = NULL;
>>
>>         zfree(&obj->efile.reloc);
>>         obj->efile.nr_reloc = 0;
>> @@ -757,6 +777,85 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
>>         return 0;
>>  }
>>
>> +static int
>> +bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map);
>> +
>> +static int
>> +bpf_object__init_global(struct bpf_object *obj, int i, int type,
>> +                       const char *name, Elf_Data *map_data)
> 
> Instead of deducing flags and looking up for map by index, you can
> just pass struct bpf_map * directly instead of int i and provide
> flags, instead of type.

Yep, agree.

>> +{
>> +       struct bpf_map *map = &obj->maps_global[i];
>> +       struct bpf_map_def *def = &map->def;
>> +       char *cp, errmsg[STRERR_BUFSIZE];
>> +       int err, slot0 = 0;
>> +
>> +       def->type = BPF_MAP_TYPE_ARRAY;
>> +       def->key_size = sizeof(int);
>> +       def->value_size = map_data->d_size;
>> +       def->max_entries = 1;
>> +       def->map_flags = type == RELO_RODATA ? BPF_F_RDONLY_PROG : 0;
>> +
>> +       map->name = strdup(name);
>> +       map->global_type = type;
>> +       map->fd = bpf_object__create_map(obj, map);
>> +       if (map->fd < 0) {
>> +               err = map->fd;
>> +               cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
>> +               pr_warning("failed to create map (name: '%s'): %s\n",
>> +                          map->name, cp);
>> +               goto destroy;
>> +       }
>> +
>> +       pr_debug("create map %s: fd=%d\n", map->name, map->fd);
>> +
>> +       if (type != RELO_BSS) {
>> +               err = bpf_map_update_elem(map->fd, &slot0, map_data->d_buf, 0);
>> +               if (err < 0) {
>> +                       cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
>> +                       pr_warning("failed to update map (name: '%s'): %s\n",
>> +                                  map->name, cp);
>> +                       goto destroy;
>> +               }
>> +
>> +               pr_debug("updated map %s with elf data: fd=%d\n", map->name,
>> +                        map->fd);
>> +       }
>> +       return 0;
>> +destroy:
>> +       for (i = 0; i < obj->nr_maps_global; i++)
>> +               zclose(obj->maps_global[i].fd);
>> +       return err;
>> +}
>> +
>> +static int
>> +bpf_object__init_global_maps(struct bpf_object *obj)
>> +{
>> +       int nr_maps_global = (obj->efile.data_shndx >= 0) +
>> +                            (obj->efile.rodata_shndx >= 0) +
>> +                            (obj->efile.bss_shndx >= 0), i, err = 0;
> 
> This looks like a good candidate for separate static function? It can
> also be reused below to check if there is any global map present.

Sounds good.

>> +
>> +       obj->maps_global = calloc(nr_maps_global, sizeof(obj->maps_global[0]));
>> +       if (!obj->maps_global) {
> 
> If nr_maps_global is 0, calloc might or might not return NULL, so this
> check might erroneously return error.

Good point, just read up on it in the man page as well, will fix.
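
I.e. the fix will be along the lines of (sketch):

	if (!nr_maps_global)
		return 0;	/* nothing to do, avoid bogus -ENOMEM */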

>> +               pr_warning("alloc maps for object failed\n");
>> +               return -ENOMEM;
>> +       }
>> +
>> +       obj->nr_maps_global = nr_maps_global;
>> +       for (i = 0; i < obj->nr_maps_global; i++)
>> +               obj->maps[i].fd = -1;
>> +       i = 0;
>> +       if (obj->efile.bss_shndx >= 0)
>> +               err = bpf_object__init_global(obj, i++, RELO_BSS, ".bss",
>> +                                             obj->efile.global_bss);
>> +       if (obj->efile.data_shndx >= 0 && !err)
>> +               err = bpf_object__init_global(obj, i++, RELO_DATA, ".data",
>> +                                             obj->efile.global_data);
>> +       if (obj->efile.rodata_shndx >= 0 && !err)
>> +               err = bpf_object__init_global(obj, i++, RELO_RODATA, ".rodata",
>> +                                             obj->efile.global_rodata);
> 
> Here we know exactly what type of map we are creating, so we can just
> directly pass all the required structs/flags/data.
> 
> Also, to speed up and simplify relocation processing below, I think
> it's better to store map indexes for each of available .bss, .data and
> .rodata maps, eliminating another need for having three different
> types of data relocations.

Yep, I'll clean this up.

>> +       return err;
>> +}
>> +
>>  static bool section_have_execinstr(struct bpf_object *obj, int idx)
>>  {
>>         Elf_Scn *scn;
>> @@ -865,6 +964,12 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
>>                                         pr_warning("failed to alloc program %s (%s): %s",
>>                                                    name, obj->path, cp);
>>                                 }
>> +                       } else if (strcmp(name, ".data") == 0) {
>> +                               obj->efile.global_data = data;
>> +                               obj->efile.data_shndx = idx;
>> +                       } else if (strcmp(name, ".rodata") == 0) {
>> +                               obj->efile.global_rodata = data;
>> +                               obj->efile.rodata_shndx = idx;
>>                         }
> 
> Previously if we encountered unknown PROGBITS section, we'd emit debug
> message about skipping section, should we add that message here?

Sounds reasonable, I'll add a similar 'skip section' debug output there.

>>                 } else if (sh.sh_type == SHT_REL) {
>>                         void *reloc = obj->efile.reloc;
>> @@ -892,6 +997,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
>>                                 obj->efile.reloc[n].shdr = sh;
>>                                 obj->efile.reloc[n].data = data;
>>                         }
>> +               } else if (sh.sh_type == SHT_NOBITS && strcmp(name, ".bss") == 0) {
>> +                       obj->efile.global_bss = data;
>> +                       obj->efile.bss_shndx = idx;
>>                 } else {
>>                         pr_debug("skip section(%d) %s\n", idx, name);
>>                 }
>> @@ -923,6 +1031,14 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
>>                 if (err)
>>                         goto out;
>>         }
>> +       if (obj->efile.data_shndx >= 0 ||
>> +           obj->efile.rodata_shndx >= 0 ||
>> +           obj->efile.bss_shndx >= 0) {
>> +               err = bpf_object__init_global_maps(obj);
>> +               if (err)
>> +                       goto out;
>> +       }
>> +
>>         err = bpf_object__init_prog_names(obj);
>>  out:
>>         return err;
>> @@ -961,6 +1077,11 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>>         Elf_Data *symbols = obj->efile.symbols;
>>         int text_shndx = obj->efile.text_shndx;
>>         int maps_shndx = obj->efile.maps_shndx;
>> +       int data_shndx = obj->efile.data_shndx;
>> +       int rodata_shndx = obj->efile.rodata_shndx;
>> +       int bss_shndx = obj->efile.bss_shndx;
>> +       struct bpf_map *maps_global = obj->maps_global;
>> +       size_t nr_maps_global = obj->nr_maps_global;
>>         struct bpf_map *maps = obj->maps;
>>         size_t nr_maps = obj->nr_maps;
>>         int i, nrels;
>> @@ -999,8 +1120,10 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>>                          (long long) (rel.r_info >> 32),
>>                          (long long) sym.st_value, sym.st_name);
>>
>> -               if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx) {
>> -                       pr_warning("Program '%s' contains non-map related relo data pointing to section %u\n",
>> +               if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx &&
>> +                   sym.st_shndx != data_shndx && sym.st_shndx != rodata_shndx &&
>> +                   sym.st_shndx != bss_shndx) {
>> +                       pr_warning("Program '%s' contains unrecognized relo data pointing to section %u\n",
>>                                    prog->section_name, sym.st_shndx);
>>                         return -LIBBPF_ERRNO__RELOC;
>>                 }
>> @@ -1045,6 +1168,30 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>>                         prog->reloc_desc[i].type = RELO_LD64;
>>                         prog->reloc_desc[i].insn_idx = insn_idx;
>>                         prog->reloc_desc[i].map_idx = map_idx;
>> +               } else if (sym.st_shndx == data_shndx ||
>> +                          sym.st_shndx == rodata_shndx ||
>> +                          sym.st_shndx == bss_shndx) {
>> +                       int type = (sym.st_shndx == data_shndx)   ? RELO_DATA :
>> +                                  (sym.st_shndx == rodata_shndx) ? RELO_RODATA :
>> +                                                                   RELO_BSS;
>> +
>> +                       for (map_idx = 0; map_idx < nr_maps_global; map_idx++) {
>> +                               if (maps_global[map_idx].global_type == type) {
>> +                                       pr_debug("relocation: find map %zd (%s) for insn %u\n",
>> +                                                map_idx, maps_global[map_idx].name, insn_idx);
>> +                                       break;
>> +                               }
>> +                       }
>> +
>> +                       if (map_idx >= nr_maps_global) {
>> +                               pr_warning("bpf relocation: map_idx %d large than %d\n",
>> +                                          (int)map_idx, (int)nr_maps_global - 1);
>> +                               return -LIBBPF_ERRNO__RELOC;
>> +                       }
> 
> We don't need to handle all of this if we just remember global map
> indicies during creation, instead of calculating type, we can just
> pick correct index (and check it exists). And type can be just generic
> RELO_DATA.
> 
>> +
>> +                       prog->reloc_desc[i].type = type;
>> +                       prog->reloc_desc[i].insn_idx = insn_idx;
>> +                       prog->reloc_desc[i].map_idx = map_idx;
>>                 }
>>         }
>>         return 0;
>> @@ -1176,15 +1323,58 @@ bpf_object__probe_caps(struct bpf_object *obj)
>>  }
>>
>>  static int
>> -bpf_object__create_maps(struct bpf_object *obj)
>> +bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map)
>>  {
>>         struct bpf_create_map_attr create_attr = {};
>> +       struct bpf_map_def *def = &map->def;
>> +       char *cp, errmsg[STRERR_BUFSIZE];
>> +       int fd;
>> +
>> +       if (obj->caps.name)
>> +               create_attr.name = map->name;
>> +       create_attr.map_ifindex = map->map_ifindex;
>> +       create_attr.map_type = def->type;
>> +       create_attr.map_flags = def->map_flags;
>> +       create_attr.key_size = def->key_size;
>> +       create_attr.value_size = def->value_size;
>> +       create_attr.max_entries = def->max_entries;
>> +       create_attr.btf_fd = 0;
>> +       create_attr.btf_key_type_id = 0;
>> +       create_attr.btf_value_type_id = 0;
>> +       if (bpf_map_type__is_map_in_map(def->type) &&
>> +           map->inner_map_fd >= 0)
>> +               create_attr.inner_map_fd = map->inner_map_fd;
>> +       if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
>> +               create_attr.btf_fd = btf__fd(obj->btf);
>> +               create_attr.btf_key_type_id = map->btf_key_type_id;
>> +               create_attr.btf_value_type_id = map->btf_value_type_id;
>> +       }
>> +
>> +       fd = bpf_create_map_xattr(&create_attr);
>> +       if (fd < 0 && create_attr.btf_key_type_id) {
>> +               cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
>> +               pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
>> +                          map->name, cp, errno);
>> +
>> +               create_attr.btf_fd = 0;
>> +               create_attr.btf_key_type_id = 0;
>> +               create_attr.btf_value_type_id = 0;
>> +               map->btf_key_type_id = 0;
>> +               map->btf_value_type_id = 0;
>> +               fd = bpf_create_map_xattr(&create_attr);
>> +       }
>> +
>> +       return fd;
>> +}
>> +
>> +static int
>> +bpf_object__create_maps(struct bpf_object *obj)
>> +{
>>         unsigned int i;
>>         int err;
>>
>>         for (i = 0; i < obj->nr_maps; i++) {
>>                 struct bpf_map *map = &obj->maps[i];
>> -               struct bpf_map_def *def = &map->def;
>>                 char *cp, errmsg[STRERR_BUFSIZE];
>>                 int *pfd = &map->fd;
>>
>> @@ -1193,41 +1383,7 @@ bpf_object__create_maps(struct bpf_object *obj)
>>                                  map->name, map->fd);
>>                         continue;
>>                 }
>> -
>> -               if (obj->caps.name)
>> -                       create_attr.name = map->name;
>> -               create_attr.map_ifindex = map->map_ifindex;
>> -               create_attr.map_type = def->type;
>> -               create_attr.map_flags = def->map_flags;
>> -               create_attr.key_size = def->key_size;
>> -               create_attr.value_size = def->value_size;
>> -               create_attr.max_entries = def->max_entries;
>> -               create_attr.btf_fd = 0;
>> -               create_attr.btf_key_type_id = 0;
>> -               create_attr.btf_value_type_id = 0;
>> -               if (bpf_map_type__is_map_in_map(def->type) &&
>> -                   map->inner_map_fd >= 0)
>> -                       create_attr.inner_map_fd = map->inner_map_fd;
>> -
>> -               if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
>> -                       create_attr.btf_fd = btf__fd(obj->btf);
>> -                       create_attr.btf_key_type_id = map->btf_key_type_id;
>> -                       create_attr.btf_value_type_id = map->btf_value_type_id;
>> -               }
>> -
>> -               *pfd = bpf_create_map_xattr(&create_attr);
>> -               if (*pfd < 0 && create_attr.btf_key_type_id) {
>> -                       cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
>> -                       pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
>> -                                  map->name, cp, errno);
>> -                       create_attr.btf_fd = 0;
>> -                       create_attr.btf_key_type_id = 0;
>> -                       create_attr.btf_value_type_id = 0;
>> -                       map->btf_key_type_id = 0;
>> -                       map->btf_value_type_id = 0;
>> -                       *pfd = bpf_create_map_xattr(&create_attr);
>> -               }
>> -
>> +               *pfd = bpf_object__create_map(obj, map);
>>                 if (*pfd < 0) {
>>                         size_t j;
>>
>> @@ -1412,6 +1568,24 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
>>                                                       &prog->reloc_desc[i]);
>>                         if (err)
>>                                 return err;
>> +               } else if (prog->reloc_desc[i].type == RELO_DATA ||
>> +                          prog->reloc_desc[i].type == RELO_RODATA ||
>> +                          prog->reloc_desc[i].type == RELO_BSS) {
>> +                       struct bpf_insn *insns = prog->insns;
>> +                       int insn_idx, map_idx, data_off;
>> +
>> +                       insn_idx = prog->reloc_desc[i].insn_idx;
>> +                       map_idx  = prog->reloc_desc[i].map_idx;
>> +                       data_off = insns[insn_idx].imm;
>> +
>> +                       if (insn_idx + 1 >= (int)prog->insns_cnt) {
>> +                               pr_warning("relocation out of range: '%s'\n",
>> +                                          prog->section_name);
>> +                               return -LIBBPF_ERRNO__RELOC;
>> +                       }
>> +                       insns[insn_idx].src_reg = BPF_PSEUDO_MAP_VALUE;
>> +                       insns[insn_idx].imm = obj->maps_global[map_idx].fd;
>> +                       insns[insn_idx + 1].imm = data_off;
>>                 }
>>         }
>>
>> @@ -1717,6 +1891,7 @@ __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz,
>>
>>         CHECK_ERR(bpf_object__elf_init(obj), err, out);
>>         CHECK_ERR(bpf_object__check_endianness(obj), err, out);
>> +       CHECK_ERR(bpf_object__probe_caps(obj), err, out);
>>         CHECK_ERR(bpf_object__elf_collect(obj, flags), err, out);
>>         CHECK_ERR(bpf_object__collect_reloc(obj), err, out);
>>         CHECK_ERR(bpf_object__validate(obj, needs_kver), err, out);
>> @@ -1789,7 +1964,8 @@ int bpf_object__unload(struct bpf_object *obj)
>>
>>         for (i = 0; i < obj->nr_maps; i++)
>>                 zclose(obj->maps[i].fd);
>> -
>> +       for (i = 0; i < obj->nr_maps_global; i++)
>> +               zclose(obj->maps_global[i].fd);
>>         for (i = 0; i < obj->nr_programs; i++)
>>                 bpf_program__unload(&obj->programs[i]);
>>
>> @@ -1810,7 +1986,6 @@ int bpf_object__load(struct bpf_object *obj)
>>
>>         obj->loaded = true;
>>
>> -       CHECK_ERR(bpf_object__probe_caps(obj), err, out);
>>         CHECK_ERR(bpf_object__create_maps(obj), err, out);
>>         CHECK_ERR(bpf_object__relocate(obj), err, out);
>>         CHECK_ERR(bpf_object__load_progs(obj), err, out);
>> --
>> 2.17.1
>>
> 
> I'm sorry if I seem a bit too obsessed with those three new relocation
> types. I just believe that having one generic and storing global maps
> along with other maps is cleaner and more uniform.

No worries, thanks for all your feedback and review!

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 1/7] bpf: implement lookup-free direct value access
  2019-02-28 23:18 ` [PATCH bpf-next v2 1/7] bpf: implement lookup-free direct value access Daniel Borkmann
                     ` (2 preceding siblings ...)
  2019-03-01  5:46   ` Andrii Nakryiko
@ 2019-03-01 17:18   ` Yonghong Song
  2019-03-01 19:51     ` Daniel Borkmann
  2019-03-04  6:03   ` Andrii Nakryiko
  4 siblings, 1 reply; 46+ messages in thread
From: Yonghong Song @ 2019-03-01 17:18 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov
  Cc: bpf, netdev, joe, john.fastabend, tgraf, Andrii Nakryiko,
	jakub.kicinski, lmb



On 2/28/19 3:18 PM, Daniel Borkmann wrote:
> This generic extension to BPF maps allows for directly loading an
> address residing inside a BPF map value as a single BPF ldimm64
> instruction.
> 
> The idea is similar to what BPF_PSEUDO_MAP_FD does today, which
> is a special src_reg flag for ldimm64 instruction that indicates
> that inside the first part of the double insns's imm field is a
> file descriptor which the verifier then replaces as a full 64bit
> address of the map into both imm parts.
> 
> For the newly added BPF_PSEUDO_MAP_VALUE src_reg flag, the idea
> is similar: the first part of the double insns's imm field is
> again a file descriptor corresponding to the map, and the second
> part of the imm field is an offset. The verifier will then replace
> both imm parts with an address that points into the BPF map value
> for maps that support this operation. BPF_PSEUDO_MAP_VALUE is a
> distinct flag as otherwise with BPF_PSEUDO_MAP_FD we could not
> differ offset 0 between load of map pointer versus load of map's
> value at offset 0.
> 
> This allows for efficiently retrieving an address to a map value
> memory area without having to issue a helper call which needs to
> prepare registers according to calling convention, etc, without
> needing the extra NULL test, and without having to add the offset
> in an additional instruction to the value base pointer.
> 
> The verifier then treats the destination register as PTR_TO_MAP_VALUE
> with constant reg->off from the user passed offset from the second
> imm field, and guarantees that this is within bounds of the map
> value. Any subsequent operations are normally treated as typical
> map value handling without anything else needed for verification.
> 
> The two map operations for direct value access have been added to
> array map for now. In future other types could be supported as
> well depending on the use case. The main use case for this commit
> is to allow for BPF loader support for global variables that
> reside in .data/.rodata/.bss sections such that we can directly
> load the address of them with minimal additional infrastructure
> required. Loader support has been added in subsequent commits for
> libbpf library.

Version #1 of the patch provided a way to replace the load with an
immediate (presumably read-only data). This will be good for use
cases like the one below:

    if (static_variable_kernel_version == V1) {
        /* code here will work for kernel V1 */
        ... access helpers available for V1 ...
    } else if (static_variable_kernel_version == V2) {
        /* code here will work for kernel V2 */
        ... access helpers available for V2 ...
    }

The approach here does not replace the map value access with values
from, e.g., the read-only section, for which libbpf could provide an
interface to fill in data from user space.

This may require a little more analysis, e.g.,
    ptr = ld_imm64 from a readonly section
    ...
    *(u32 *)ptr;
    *(u64 *)(ptr + 8);
    ...

Do you think we could do this in the kernel verifier, or should we
push the whole read-only handling into user space?
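
In xlated dump terms, the transformation would then roughly look
like below (a sketch of the idea, constants made up):

     ; before: pointer into the ro map value, then load from it
     7: (18) r1 = map[id:65][0]+0
     9: (61) r2 = *(u32 *)(r1 +0)
     ; after hypothetical constant folding done by the verifier
     7: (b7) r2 = 42     /* constant taken from .rodata at off 0 */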

> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
>   include/linux/bpf.h               |  6 +++
>   include/linux/bpf_verifier.h      |  4 ++
>   include/uapi/linux/bpf.h          |  6 ++-
>   kernel/bpf/arraymap.c             | 33 ++++++++++++++
>   kernel/bpf/core.c                 |  3 +-
>   kernel/bpf/disasm.c               |  5 ++-
>   kernel/bpf/syscall.c              | 29 +++++++++---
>   kernel/bpf/verifier.c             | 73 +++++++++++++++++++++++--------
>   tools/bpf/bpftool/xlated_dumper.c |  3 ++
>   tools/include/uapi/linux/bpf.h    |  6 ++-
>   10 files changed, 138 insertions(+), 30 deletions(-)
> 
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index a2132e09dc1c..bdcc6e2a9977 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -57,6 +57,12 @@ struct bpf_map_ops {
>   			     const struct btf *btf,
>   			     const struct btf_type *key_type,
>   			     const struct btf_type *value_type);
> +
> +	/* Direct value access helpers. */
> +	int (*map_direct_value_access)(const struct bpf_map *map,
> +				       u32 off, u64 *imm);
> +	int (*map_direct_value_offset)(const struct bpf_map *map,
> +				       u64 imm, u32 *off);
>   };
>   
>   struct bpf_map {
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 69f7a3449eda..6e28f1c24710 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -183,6 +183,10 @@ struct bpf_insn_aux_data {
>   		unsigned long map_state;	/* pointer/poison value for maps */
>   		s32 call_imm;			/* saved imm field of call insn */
>   		u32 alu_limit;			/* limit for add/sub register with pointer */
> +		struct {
> +			u32 map_index;		/* index into used_maps[] */
> +			u32 map_off;		/* offset from value base address */
> +		};
>   	};
>   	int ctx_field_size; /* the ctx field size for load insn, maybe 0 */
>   	int sanitize_stack_off; /* stack slot to be cleared */
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 2e308e90ffea..8884072e1a46 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -255,8 +255,12 @@ enum bpf_attach_type {
>    */
>   #define BPF_F_ANY_ALIGNMENT	(1U << 1)
>   
> -/* when bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_FD, bpf_ldimm64->imm == fd */
> +/* When bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_{FD,VALUE}, then
> + * bpf_ldimm64's insn[0]->imm == fd in both cases. Additionally,
> + * for BPF_PSEUDO_MAP_VALUE, insn[1]->imm == offset into value.
> + */
>   #define BPF_PSEUDO_MAP_FD	1
> +#define BPF_PSEUDO_MAP_VALUE	2
>   
>   /* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
>    * offset to another bpf function
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index c72e0d8e1e65..3e5969c0c979 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -160,6 +160,37 @@ static void *array_map_lookup_elem(struct bpf_map *map, void *key)
>   	return array->value + array->elem_size * (index & array->index_mask);
>   }
>   
> +static int array_map_direct_value_access(const struct bpf_map *map, u32 off,
> +					 u64 *imm)
> +{
> +	struct bpf_array *array = container_of(map, struct bpf_array, map);
> +
> +	if (map->max_entries != 1)
> +		return -ENOTSUPP;
> +	if (off >= map->value_size)
> +		return -EINVAL;
> +
> +	*imm = (unsigned long)array->value;
> +	return 0;
> +}
> +
> +static int array_map_direct_value_offset(const struct bpf_map *map, u64 imm,
> +					 u32 *off)
> +{
> +	struct bpf_array *array = container_of(map, struct bpf_array, map);
> +	unsigned long range = map->value_size;
> +	unsigned long base  = array->value;
> +	unsigned long addr  = imm;
> +
> +	if (map->max_entries != 1)
> +		return -ENOENT;
> +	if (addr < base || addr >= base + range)
> +		return -ENOENT;
> +
> +	*off = addr - base;
> +	return 0;
> +}
> +
>   /* emit BPF instructions equivalent to C code of array_map_lookup_elem() */
>   static u32 array_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
>   {
> @@ -419,6 +450,8 @@ const struct bpf_map_ops array_map_ops = {
>   	.map_update_elem = array_map_update_elem,
>   	.map_delete_elem = array_map_delete_elem,
>   	.map_gen_lookup = array_map_gen_lookup,
> +	.map_direct_value_access = array_map_direct_value_access,
> +	.map_direct_value_offset = array_map_direct_value_offset,
>   	.map_seq_show_elem = array_map_seq_show_elem,
>   	.map_check_btf = array_map_check_btf,
>   };
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 1c14c347f3cf..49fc0ff14537 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -286,7 +286,8 @@ int bpf_prog_calc_tag(struct bpf_prog *fp)
>   		dst[i] = fp->insnsi[i];
>   		if (!was_ld_map &&
>   		    dst[i].code == (BPF_LD | BPF_IMM | BPF_DW) &&
> -		    dst[i].src_reg == BPF_PSEUDO_MAP_FD) {
> +		    (dst[i].src_reg == BPF_PSEUDO_MAP_FD ||
> +		     dst[i].src_reg == BPF_PSEUDO_MAP_VALUE)) {
>   			was_ld_map = true;
>   			dst[i].imm = 0;
>   		} else if (was_ld_map &&
> diff --git a/kernel/bpf/disasm.c b/kernel/bpf/disasm.c
> index de73f55e42fd..d9ce383c0f9c 100644
> --- a/kernel/bpf/disasm.c
> +++ b/kernel/bpf/disasm.c
> @@ -205,10 +205,11 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
>   			 * part of the ldimm64 insn is accessible.
>   			 */
>   			u64 imm = ((u64)(insn + 1)->imm << 32) | (u32)insn->imm;
> -			bool map_ptr = insn->src_reg == BPF_PSEUDO_MAP_FD;
> +			bool is_ptr = insn->src_reg == BPF_PSEUDO_MAP_FD ||
> +				      insn->src_reg == BPF_PSEUDO_MAP_VALUE;
>   			char tmp[64];
>   
> -			if (map_ptr && !allow_ptr_leaks)
> +			if (is_ptr && !allow_ptr_leaks)
>   				imm = 0;
>   
>   			verbose(cbs->private_data, "(%02x) r%d = %s\n",
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 174581dfe225..d3ef45e01d7a 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2061,13 +2061,27 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
>   }
>   
>   static const struct bpf_map *bpf_map_from_imm(const struct bpf_prog *prog,
> -					      unsigned long addr)
> +					      unsigned long addr, u32 *off,
> +					      u32 *type)
>   {
> +	const struct bpf_map *map;
>   	int i;
>   
> -	for (i = 0; i < prog->aux->used_map_cnt; i++)
> -		if (prog->aux->used_maps[i] == (void *)addr)
> -			return prog->aux->used_maps[i];
> +	*off = *type = 0;
> +	for (i = 0; i < prog->aux->used_map_cnt; i++) {
> +		map = prog->aux->used_maps[i];
> +		if (map == (void *)addr) {
> +			*type = BPF_PSEUDO_MAP_FD;
> +			return map;
> +		}
> +		if (!map->ops->map_direct_value_offset)
> +			continue;
> +		if (!map->ops->map_direct_value_offset(map, addr, off)) {
> +			*type = BPF_PSEUDO_MAP_VALUE;
> +			return map;
> +		}
> +	}
> +
>   	return NULL;
>   }
>   
> @@ -2075,6 +2089,7 @@ static struct bpf_insn *bpf_insn_prepare_dump(const struct bpf_prog *prog)
>   {
>   	const struct bpf_map *map;
>   	struct bpf_insn *insns;
> +	u32 off, type;
>   	u64 imm;
>   	int i;
>   
> @@ -2102,11 +2117,11 @@ static struct bpf_insn *bpf_insn_prepare_dump(const struct bpf_prog *prog)
>   			continue;
>   
>   		imm = ((u64)insns[i + 1].imm << 32) | (u32)insns[i].imm;
> -		map = bpf_map_from_imm(prog, imm);
> +		map = bpf_map_from_imm(prog, imm, &off, &type);
>   		if (map) {
> -			insns[i].src_reg = BPF_PSEUDO_MAP_FD;
> +			insns[i].src_reg = type;
>   			insns[i].imm = map->id;
> -			insns[i + 1].imm = 0;
> +			insns[i + 1].imm = off;
>   			continue;
>   		}
>   	}
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 0e4edd7e3c5f..3ad05dda6e9d 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -4944,18 +4944,12 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
>   	return 0;
>   }
>   
> -/* return the map pointer stored inside BPF_LD_IMM64 instruction */
> -static struct bpf_map *ld_imm64_to_map_ptr(struct bpf_insn *insn)
> -{
> -	u64 imm64 = ((u64) (u32) insn[0].imm) | ((u64) (u32) insn[1].imm) << 32;
> -
> -	return (struct bpf_map *) (unsigned long) imm64;
> -}
> -
>   /* verify BPF_LD_IMM64 instruction */
>   static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
>   {
> +	struct bpf_insn_aux_data *aux = cur_aux(env);
>   	struct bpf_reg_state *regs = cur_regs(env);
> +	struct bpf_map *map;
>   	int err;
>   
>   	if (BPF_SIZE(insn->code) != BPF_DW) {
> @@ -4979,11 +4973,22 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
>   		return 0;
>   	}
>   
> -	/* replace_map_fd_with_map_ptr() should have caught bad ld_imm64 */
> -	BUG_ON(insn->src_reg != BPF_PSEUDO_MAP_FD);
> +	map = env->used_maps[aux->map_index];
> +	mark_reg_known_zero(env, regs, insn->dst_reg);
> +	regs[insn->dst_reg].map_ptr = map;
> +
> +	if (insn->src_reg == BPF_PSEUDO_MAP_VALUE) {
> +		regs[insn->dst_reg].type = PTR_TO_MAP_VALUE;
> +		regs[insn->dst_reg].off = aux->map_off;
> +		if (map_value_has_spin_lock(map))
> +			regs[insn->dst_reg].id = ++env->id_gen;
> +	} else if (insn->src_reg == BPF_PSEUDO_MAP_FD) {
> +		regs[insn->dst_reg].type = CONST_PTR_TO_MAP;
> +	} else {
> +		verbose(env, "bpf verifier is misconfigured\n");
> +		return -EINVAL;
> +	}
>   
> -	regs[insn->dst_reg].type = CONST_PTR_TO_MAP;
> -	regs[insn->dst_reg].map_ptr = ld_imm64_to_map_ptr(insn);
>   	return 0;
>   }
>   
> @@ -6664,8 +6669,10 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
>   		}
>   
>   		if (insn[0].code == (BPF_LD | BPF_IMM | BPF_DW)) {
> +			struct bpf_insn_aux_data *aux;
>   			struct bpf_map *map;
>   			struct fd f;
> +			u64 addr;
>   
>   			if (i == insn_cnt - 1 || insn[1].code != 0 ||
>   			    insn[1].dst_reg != 0 || insn[1].src_reg != 0 ||
> @@ -6677,8 +6684,8 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
>   			if (insn->src_reg == 0)
>   				/* valid generic load 64-bit imm */
>   				goto next_insn;
> -
> -			if (insn->src_reg != BPF_PSEUDO_MAP_FD) {
> +			if (insn->src_reg != BPF_PSEUDO_MAP_FD &&
> +			    insn->src_reg != BPF_PSEUDO_MAP_VALUE) {
>   				verbose(env,
>   					"unrecognized bpf_ld_imm64 insn\n");
>   				return -EINVAL;
> @@ -6698,16 +6705,44 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
>   				return err;
>   			}
>   
> -			/* store map pointer inside BPF_LD_IMM64 instruction */
> -			insn[0].imm = (u32) (unsigned long) map;
> -			insn[1].imm = ((u64) (unsigned long) map) >> 32;
> +			aux = &env->insn_aux_data[i];
> +			if (insn->src_reg == BPF_PSEUDO_MAP_FD) {
> +				addr = (unsigned long)map;
> +			} else {
> +				u32 off = insn[1].imm;
> +
> +				if (off >= BPF_MAX_VAR_OFF) {
> +					verbose(env, "direct value offset of %u is not allowed\n",
> +						off);
> +					return -EINVAL;
> +				}
> +				if (!map->ops->map_direct_value_access) {
> +					verbose(env, "no direct value access support for this map type\n");
> +					return -EINVAL;
> +				}
> +
> +				err = map->ops->map_direct_value_access(map, off, &addr);
> +				if (err) {
> +					verbose(env, "invalid access to map value pointer, value_size=%u off=%u\n",
> +						map->value_size, off);
> +					return err;
> +				}
> +
> +				aux->map_off = off;
> +				addr += off;
> +			}
> +
> +			insn[0].imm = (u32)addr;
> +			insn[1].imm = addr >> 32;
>   
>   			/* check whether we recorded this map already */
> -			for (j = 0; j < env->used_map_cnt; j++)
> +			for (j = 0; j < env->used_map_cnt; j++) {
>   				if (env->used_maps[j] == map) {
> +					aux->map_index = j;
>   					fdput(f);
>   					goto next_insn;
>   				}
> +			}
>   
>   			if (env->used_map_cnt >= MAX_USED_MAPS) {
>   				fdput(f);
> @@ -6724,6 +6759,8 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
>   				fdput(f);
>   				return PTR_ERR(map);
>   			}
> +
> +			aux->map_index = env->used_map_cnt;
>   			env->used_maps[env->used_map_cnt++] = map;
>   
>   			if (bpf_map_is_cgroup_storage(map) &&
> diff --git a/tools/bpf/bpftool/xlated_dumper.c b/tools/bpf/bpftool/xlated_dumper.c
> index 7073dbe1ff27..0bb17bf88b18 100644
> --- a/tools/bpf/bpftool/xlated_dumper.c
> +++ b/tools/bpf/bpftool/xlated_dumper.c
> @@ -195,6 +195,9 @@ static const char *print_imm(void *private_data,
>   	if (insn->src_reg == BPF_PSEUDO_MAP_FD)
>   		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
>   			 "map[id:%u]", insn->imm);
> +	else if (insn->src_reg == BPF_PSEUDO_MAP_VALUE)
> +		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
> +			 "map[id:%u][0]+%u", insn->imm, (insn + 1)->imm);
>   	else
>   		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
>   			 "0x%llx", (unsigned long long)full_imm);
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 2e308e90ffea..8884072e1a46 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -255,8 +255,12 @@ enum bpf_attach_type {
>    */
>   #define BPF_F_ANY_ALIGNMENT	(1U << 1)
>   
> -/* when bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_FD, bpf_ldimm64->imm == fd */
> +/* When bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_{FD,VALUE}, then
> + * bpf_ldimm64's insn[0]->imm == fd in both cases. Additionally,
> + * for BPF_PSEUDO_MAP_VALUE, insn[1]->imm == offset into value.
> + */
>   #define BPF_PSEUDO_MAP_FD	1
> +#define BPF_PSEUDO_MAP_VALUE	2
>   
>   /* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
>    * offset to another bpf function
> 
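
[Editor's note: a minimal illustration of the new encoding, not part
of the patch. Per the uapi comment above, insn[0].imm carries the map
fd and insn[1].imm the offset into the value; map_fd is assumed to be
a valid fd of a map supporting direct value access:]

    /* r3 = address of value in map_fd, at offset 16 */
    const struct bpf_insn ld_map_value[2] = {
        { .code    = BPF_LD | BPF_DW | BPF_IMM,  /* ldimm64 opcode */
          .dst_reg = BPF_REG_3,
          .src_reg = BPF_PSEUDO_MAP_VALUE,
          .imm     = map_fd },                   /* insn[0].imm == fd     */
        { .imm     = 16 },                       /* insn[1].imm == offset */
    };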

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections
  2019-03-01 10:46     ` Daniel Borkmann
@ 2019-03-01 18:10       ` Stanislav Fomichev
  2019-03-01 18:46       ` Andrii Nakryiko
  1 sibling, 0 replies; 46+ messages in thread
From: Stanislav Fomichev @ 2019-03-01 18:10 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Andrii Nakryiko, Alexei Starovoitov, bpf, Networking, joe,
	john.fastabend, tgraf, Yonghong Song, Andrii Nakryiko,
	Jakub Kicinski, lmb

On 03/01, Daniel Borkmann wrote:
> On 03/01/2019 07:53 AM, Andrii Nakryiko wrote:
> > On Thu, Feb 28, 2019 at 3:31 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >>
> >> This work adds BPF loader support for global data sections
> >> to libbpf. This allows writing BPF programs in a more natural,
> >> C-like way by making it possible to define global variables and
> >> const data.
> >>
> >> Back at LPC 2018 [0] we presented a first prototype which
> >> implemented support for global data sections by extending BPF
> >> syscall where union bpf_attr would get additional memory/size
> >> pair for each section passed during prog load in order to later
> >> add this base address into the ldimm64 instruction along with
> >> the user provided offset when accessing a variable. Consensus
> >> from LPC was that for proper upstream support, it would be
> >> more desirable to use maps instead of bpf_attr extension as
> >> this would allow for introspection of these sections as well
> >> as potential live updates of their content. This work follows
> >> this path by taking the following steps from loader side:
> >>
> >>  1) In bpf_object__elf_collect() step we pick up ".data",
> >>     ".rodata", and ".bss" section information.
> >>
> >>  2) If present, in bpf_object__init_global_maps() we create
> >>     a map that corresponds to each of the present sections.
> > 
> > Is there any point in having .data and .bss in separate maps? I can
> > only see reasons of inspection from bpftool, but other than that
> > isn't .bss just an optimization over .data to save space in the ELF
> > file, and in other regards just another part of the r/w .data
> > section?
> 
> Hmm, I actually don't mind too much combining both of them. Had
> the same thought with regards to introspection from bpftool which
> was why I separated them. But combining the two into a single map
> is fine actually, saves a bit of resources in kernel, and offsets
> can easily be fixed up from libbpf side. Will do for v3.
Do we plan to pretty-print data/bss with BTF from bpftool at some
point? Does combining them make it harder?

> >>     Given section size and access properties can differ, a
> >>     single entry array map is created with a value size
> >>     corresponding to the ELF section size of .data, .bss
> >>     or .rodata. In the latter case, the map is created as
> >>     read-only from program side such that verifier rejects
> >>     any write attempts into .rodata. In a subsequent step,
> >>     for .data and .rodata sections, the section content is
> >>     copied into the map through bpf_map_update_elem(). For
> >>     .bss this is not necessary since array map is already
> >>     zero-initialized by default.
> > 
> > For .rodata, ideally it would be nice to make it RDONLY from userland
> > as well, except for first UPDATE. How hard is it to support that?
> 
> Right now the BPF_F_RDONLY, BPF_F_WRONLY semantics to make the
> maps read-only or write-only from syscall side are that these
> permissions are stored into the struct file front end (file->f_mode)
> for the anon inode we use, meaning it's separated from the actual
> BPF map, so you can create the map with BPF_F_RDONLY, but root
> user can do BPF_MAP_GET_FD_BY_ID without the BPF_F_RDONLY and
> again write into it. This design choice would require that we'd
> need to add some additional infrastructure on top of this, which
> would then need to enforce file->f_mode to read-only after the
> first setup. I think there's a simple trick we can apply to make
> it read-only after setup from syscall side: we'll add a new flag
> to the map, and then upon map creation libbpf sets everything
> up, holds the id, closes its fd, and refetches the fd by id.
> From that point onwards any interface where you would get the
> fd from the map in user space will enforce BPF_F_RDONLY behavior
> for file->f_mode. Another, less hacky option could be to extend
> the struct file ops we currently use for BPF maps and set a
> map 'immutable' flag from there which is then enforced once all
> pending operations have completed. I can look a bit into this.
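
[Editor's note: a minimal sketch of the fd-refetch trick described
above. It assumes only the existing BPF_MAP_GET_FD_BY_ID command,
whose attr.open_flags already honors BPF_F_RDONLY for the returned
fd's file->f_mode:]

    #include <linux/bpf.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Re-fetch a map fd by id so that the new fd is read-only. */
    static int map_refetch_rdonly(__u32 map_id)
    {
            union bpf_attr attr;

            memset(&attr, 0, sizeof(attr));
            attr.map_id     = map_id;       /* id held before closing rw fd */
            attr.open_flags = BPF_F_RDONLY; /* stored in file->f_mode       */
            return syscall(__NR_bpf, BPF_MAP_GET_FD_BY_ID, &attr,
                           sizeof(attr));
    }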
> 
> >>  3) In bpf_program__collect_reloc() step, we record the
> >>     corresponding map, insn index, and relocation type for
> >>     the global data.
> >>
> >>  4) And last but not least in the actual relocation step in
> >>     bpf_program__relocate(), we mark the ldimm64 instruction
> >>     with src_reg = BPF_PSEUDO_MAP_VALUE, where the map's file
> >>     descriptor is stored in the first imm field, similarly to
> >>     BPF_PSEUDO_MAP_FD, and the access offset into the section is
> >>     stored in the second imm field (as ldimm64 is 2-insn wide).
> >>
> >>  5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
> >>     load will then store the actual target address in order
> >>     to have a 'map-lookup'-free access. That is, the actual
> >>     map value base address + offset. The destination register
> >>     in the verifier will then be marked as PTR_TO_MAP_VALUE,
> >>     containing the fixed offset as reg->off and backing BPF
> >>     map as reg->map_ptr. Meaning, it's treated as any other
> >>     normal map value from verification side, only with
> >>     efficient, direct value access instead of actual call to
> >>     map lookup helper as in the typical case.
> >>
> >> Simple example dump of a program using global vars in each
> >> section:
> >>
> >>   # readelf -a test_global_data.o
> >>   [...]
> >>   [ 6] .bss              NOBITS           0000000000000000  00000328
> >>        0000000000000010  0000000000000000  WA       0     0     8
> >>   [ 7] .data             PROGBITS         0000000000000000  00000328
> >>        0000000000000010  0000000000000000  WA       0     0     8
> >>   [ 8] .rodata           PROGBITS         0000000000000000  00000338
> >>        0000000000000018  0000000000000000   A       0     0     8
> >>   [...]
> >>     95: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    6 static_bss
> >>     96: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    6 static_bss2
> >>     97: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    7 static_data
> >>     98: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    7 static_data2
> >>     99: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    8 static_rodata
> >>    100: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    8 static_rodata2
> >>    101: 0000000000000010     8 OBJECT  LOCAL  DEFAULT    8 static_rodata3
> >>   [...]
> >>
> >>   # bpftool prog
> >>   103: sched_cls  name load_static_dat  tag 37a8b6822fc39a29  gpl
> >>        loaded_at 2019-02-28T02:02:35+0000  uid 0
> >>        xlated 712B  jited 426B  memlock 4096B  map_ids 63,64,65,66
> >>   # bpftool map show id 63
> >>   63: array  name .bss  flags 0x0                      <-- .bss area, rw
> >>       key 4B  value 16B  max_entries 1  memlock 4096B
> >>   # bpftool map show id 64
> >>   64: array  name .data  flags 0x0                     <-- .data area, rw
> >>       key 4B  value 16B  max_entries 1  memlock 4096B
> >>   # bpftool map show id 65
> >>   65: array  name .rodata  flags 0x80                  <-- .rodata area, ro
> >>       key 4B  value 24B  max_entries 1  memlock 4096B
> >>
> >>   # bpftool prog dump xlated id 103
> >>   int load_static_data(struct __sk_buff * skb):
> >>   ; int load_static_data(struct __sk_buff *skb)
> >>      0: (b7) r1 = 0
> >>   ; key = 0;
> >>      1: (63) *(u32 *)(r10 -4) = r1
> >>      2: (bf) r6 = r10
> >>   ; int load_static_data(struct __sk_buff *skb)
> >>      3: (07) r6 += -4
> >>   ; bpf_map_update_elem(&result, &key, &static_bss, 0);
> >>      4: (18) r1 = map[id:66]
> >>      6: (bf) r2 = r6
> >>      7: (18) r3 = map[id:63][0]+0         <-- direct static_bss addr in .bss area
> >>      9: (b7) r4 = 0
> >>     10: (85) call array_map_update_elem#99888
> >>     11: (b7) r1 = 1
> >>   ; key = 1;
> >>     12: (63) *(u32 *)(r10 -4) = r1
> >>   ; bpf_map_update_elem(&result, &key, &static_data, 0);
> >>     13: (18) r1 = map[id:66]
> >>     15: (bf) r2 = r6
> >>     16: (18) r3 = map[id:64][0]+0         <-- direct static_data addr in .data area
> >>     18: (b7) r4 = 0
> >>     19: (85) call array_map_update_elem#99888
> >>     20: (b7) r1 = 2
> >>   ; key = 2;
> >>     21: (63) *(u32 *)(r10 -4) = r1
> >>   ; bpf_map_update_elem(&result, &key, &static_rodata, 0);
> >>     22: (18) r1 = map[id:66]
> >>     24: (bf) r2 = r6
> >>     25: (18) r3 = map[id:65][0]+0         <-- direct static_rodata addr in .rodata area
> >>     27: (b7) r4 = 0
> >>     28: (85) call array_map_update_elem#99888
> >>     29: (b7) r1 = 3
> >>   ; key = 3;
> >>     30: (63) *(u32 *)(r10 -4) = r1
> >>   ; bpf_map_update_elem(&result, &key, &static_bss2, 0);
> >>     31: (18) r7 = map[id:63][0]+8         <--.
> >>     33: (18) r1 = map[id:66]                 |
> >>     35: (bf) r2 = r6                         |
> >>     36: (18) r3 = map[id:63][0]+8         <-- direct static_bss2 addr in .bss area
> >>     38: (b7) r4 = 0
> >>     39: (85) call array_map_update_elem#99888
> >>   [...]
> >>
> >> For now .data/.rodata/.bss maps are not exposed via API to the
> >> user, but this could be done in a subsequent step.
> > 
> > See the comment about a BPF_MAP_TYPE_HEAP/BLOB map in the comments
> > to patch #1; it would probably make a more useful API for
> > .data/.rodata/.bss.
> > 
> >>
> >> Based upon a recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't
> >> fail for static variables").
> >>
> >> Joint work with Joe Stringer.
> >>
> >>   [0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
> >>       http://vger.kernel.org/lpc-bpf2018.html#session-3
> >>
> >> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> >> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> >> ---
> >>  tools/include/uapi/linux/bpf.h |  10 +-
> >>  tools/lib/bpf/libbpf.c         | 259 +++++++++++++++++++++++++++------
> >>  2 files changed, 226 insertions(+), 43 deletions(-)
> >>
> >> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> >> index 8884072e1a46..04b26f59b413 100644
> >> --- a/tools/include/uapi/linux/bpf.h
> >> +++ b/tools/include/uapi/linux/bpf.h
> >> @@ -287,7 +287,7 @@ enum bpf_attach_type {
> >>
> >>  #define BPF_OBJ_NAME_LEN 16U
> >>
> >> -/* Flags for accessing BPF object */
> >> +/* Flags for accessing BPF object from syscall side. */
> >>  #define BPF_F_RDONLY           (1U << 3)
> >>  #define BPF_F_WRONLY           (1U << 4)
> >>
> >> @@ -297,6 +297,14 @@ enum bpf_attach_type {
> >>  /* Zero-initialize hash function seed. This should only be used for testing. */
> >>  #define BPF_F_ZERO_SEED                (1U << 6)
> >>
> >> +/* Flags for accessing BPF object from program side. */
> >> +#define BPF_F_RDONLY_PROG      (1U << 7)
> >> +#define BPF_F_WRONLY_PROG      (1U << 8)
> >> +#define BPF_F_ACCESS_MASK      (BPF_F_RDONLY |         \
> >> +                                BPF_F_RDONLY_PROG |    \
> >> +                                BPF_F_WRONLY |         \
> >> +                                BPF_F_WRONLY_PROG)
> >> +
> >>  /* flags for BPF_PROG_QUERY */
> >>  #define BPF_F_QUERY_EFFECTIVE  (1U << 0)
> >>
> >> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> >> index 8f8f688f3e9b..969bc3d9f02c 100644
> >> --- a/tools/lib/bpf/libbpf.c
> >> +++ b/tools/lib/bpf/libbpf.c
> >> @@ -139,6 +139,9 @@ struct bpf_program {
> >>                 enum {
> >>                         RELO_LD64,
> >>                         RELO_CALL,
> >> +                       RELO_DATA,
> >> +                       RELO_RODATA,
> >> +                       RELO_BSS,
> > 
> > All three of those are essentially the same relocations, just applied
> > against different ELF sections.
> > I think by having just a single RELO_GLOBAL_DATA you can actually
> > simplify a bunch of code below; please see the corresponding comments.
> 
> Ok, sounds like a reasonable simplification, will do as well for v3.
> 
> >>                 } type;
> >>                 int insn_idx;
> >>                 union {
> >> @@ -174,7 +177,10 @@ struct bpf_program {
> >>  struct bpf_map {
> >>         int fd;
> >>         char *name;
> >> -       size_t offset;
> >> +       union {
> >> +               __u32 global_type;
> > 
> > This could be an index into common maps array.
> > 
> >> +               size_t offset;
> >> +       };
> >>         int map_ifindex;
> >>         int inner_map_fd;
> >>         struct bpf_map_def def;
> >> @@ -194,6 +200,8 @@ struct bpf_object {
> >>         size_t nr_programs;
> >>         struct bpf_map *maps;
> >>         size_t nr_maps;
> >> +       struct bpf_map *maps_global;
> >> +       size_t nr_maps_global;
> > 
> > Global maps could be stored in maps, alongside other ones, so that we
> > don't need to keep track of them separately.
> > 
> > Another inconvenience of having a separate array of global maps is
> > that bpf_map__iter won't iterate them. I don't know if that's
> > desirable behavior or not, but it probably would be nice to iterate
> > over global ones as well?
> 
> My thinking was that these maps are not explicitly user specified,
> so the libbpf API would expose them through a different interface
> than the one we have today, in order not to confuse or break
> application behavior that relies on iterating / processing over
> them. A separate API would retain current behavior and definitely
> make it unambiguous to apps what to expect from each such API call.
> 
> >>         bool loaded;
> >>         bool has_pseudo_calls;
> >> @@ -209,6 +217,9 @@ struct bpf_object {
> >>                 Elf *elf;
> >>                 GElf_Ehdr ehdr;
> >>                 Elf_Data *symbols;
> >> +               Elf_Data *global_data;
> >> +               Elf_Data *global_rodata;
> >> +               Elf_Data *global_bss;
> >>                 size_t strtabidx;
> >>                 struct {
> >>                         GElf_Shdr shdr;
> >> @@ -217,6 +228,9 @@ struct bpf_object {
> >>                 int nr_reloc;
> >>                 int maps_shndx;
> >>                 int text_shndx;
> >> +               int data_shndx;
> >> +               int rodata_shndx;
> >> +               int bss_shndx;
> >>         } efile;
> >>         /*
> >>          * All loaded bpf_object is linked in a list, which is
> >> @@ -457,6 +471,9 @@ static struct bpf_object *bpf_object__new(const char *path,
> >>         obj->efile.obj_buf = obj_buf;
> >>         obj->efile.obj_buf_sz = obj_buf_sz;
> >>         obj->efile.maps_shndx = -1;
> >> +       obj->efile.data_shndx = -1;
> >> +       obj->efile.rodata_shndx = -1;
> >> +       obj->efile.bss_shndx = -1;
> >>
> >>         obj->loaded = false;
> >>
> >> @@ -475,6 +492,9 @@ static void bpf_object__elf_finish(struct bpf_object *obj)
> >>                 obj->efile.elf = NULL;
> >>         }
> >>         obj->efile.symbols = NULL;
> >> +       obj->efile.global_data = NULL;
> >> +       obj->efile.global_rodata = NULL;
> >> +       obj->efile.global_bss = NULL;
> >>
> >>         zfree(&obj->efile.reloc);
> >>         obj->efile.nr_reloc = 0;
> >> @@ -757,6 +777,85 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
> >>         return 0;
> >>  }
> >>
> >> +static int
> >> +bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map);
> >> +
> >> +static int
> >> +bpf_object__init_global(struct bpf_object *obj, int i, int type,
> >> +                       const char *name, Elf_Data *map_data)
> > 
> > Instead of deducing flags and looking up the map by index, you can
> > just pass struct bpf_map * directly instead of int i, and provide
> > flags instead of type.
> 
> Yep, agree.
> 
> >> +{
> >> +       struct bpf_map *map = &obj->maps_global[i];
> >> +       struct bpf_map_def *def = &map->def;
> >> +       char *cp, errmsg[STRERR_BUFSIZE];
> >> +       int err, slot0 = 0;
> >> +
> >> +       def->type = BPF_MAP_TYPE_ARRAY;
> >> +       def->key_size = sizeof(int);
> >> +       def->value_size = map_data->d_size;
> >> +       def->max_entries = 1;
> >> +       def->map_flags = type == RELO_RODATA ? BPF_F_RDONLY_PROG : 0;
> >> +
> >> +       map->name = strdup(name);
> >> +       map->global_type = type;
> >> +       map->fd = bpf_object__create_map(obj, map);
> >> +       if (map->fd < 0) {
> >> +               err = map->fd;
> >> +               cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
> >> +               pr_warning("failed to create map (name: '%s'): %s\n",
> >> +                          map->name, cp);
> >> +               goto destroy;
> >> +       }
> >> +
> >> +       pr_debug("create map %s: fd=%d\n", map->name, map->fd);
> >> +
> >> +       if (type != RELO_BSS) {
> >> +               err = bpf_map_update_elem(map->fd, &slot0, map_data->d_buf, 0);
> >> +               if (err < 0) {
> >> +                       cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
> >> +                       pr_warning("failed to update map (name: '%s'): %s\n",
> >> +                                  map->name, cp);
> >> +                       goto destroy;
> >> +               }
> >> +
> >> +               pr_debug("updated map %s with elf data: fd=%d\n", map->name,
> >> +                        map->fd);
> >> +       }
> >> +       return 0;
> >> +destroy:
> >> +       for (i = 0; i < obj->nr_maps_global; i++)
> >> +               zclose(obj->maps_global[i].fd);
> >> +       return err;
> >> +}
> >> +
> >> +static int
> >> +bpf_object__init_global_maps(struct bpf_object *obj)
> >> +{
> >> +       int nr_maps_global = (obj->efile.data_shndx >= 0) +
> >> +                            (obj->efile.rodata_shndx >= 0) +
> >> +                            (obj->efile.bss_shndx >= 0), i, err = 0;
> > 
> > This looks like a good candidate for a separate static function? It can
> > also be reused below to check if there is any global map present.
> 
> Sounds good.
> 
> >> +
> >> +       obj->maps_global = calloc(nr_maps_global, sizeof(obj->maps_global[0]));
> >> +       if (!obj->maps_global) {
> > 
> > If nr_maps_global is 0, calloc might or might not return NULL, so this
> > check might erroneously return an error.
> 
> Good point, just read up on it in the man page as well, will fix.
> 
> >> +               pr_warning("alloc maps for object failed\n");
> >> +               return -ENOMEM;
> >> +       }
> >> +
> >> +       obj->nr_maps_global = nr_maps_global;
> >> +       for (i = 0; i < obj->nr_maps_global; i++)
> >> +               obj->maps[i].fd = -1;
> >> +       i = 0;
> >> +       if (obj->efile.bss_shndx >= 0)
> >> +               err = bpf_object__init_global(obj, i++, RELO_BSS, ".bss",
> >> +                                             obj->efile.global_bss);
> >> +       if (obj->efile.data_shndx >= 0 && !err)
> >> +               err = bpf_object__init_global(obj, i++, RELO_DATA, ".data",
> >> +                                             obj->efile.global_data);
> >> +       if (obj->efile.rodata_shndx >= 0 && !err)
> >> +               err = bpf_object__init_global(obj, i++, RELO_RODATA, ".rodata",
> >> +                                             obj->efile.global_rodata);
> > 
> > Here we know exactly what type of map we are creating, so we can just
> > directly pass all the required structs/flags/data.
> > 
> > Also, to speed up and simplify relocation processing below, I think
> > it's better to store map indexes for each of the available .bss, .data
> > and .rodata maps, eliminating another reason for having three different
> > types of data relocations.
> 
> Yep, I'll clean this up.
> 
> >> +       return err;
> >> +}
> >> +
> >>  static bool section_have_execinstr(struct bpf_object *obj, int idx)
> >>  {
> >>         Elf_Scn *scn;
> >> @@ -865,6 +964,12 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
> >>                                         pr_warning("failed to alloc program %s (%s): %s",
> >>                                                    name, obj->path, cp);
> >>                                 }
> >> +                       } else if (strcmp(name, ".data") == 0) {
> >> +                               obj->efile.global_data = data;
> >> +                               obj->efile.data_shndx = idx;
> >> +                       } else if (strcmp(name, ".rodata") == 0) {
> >> +                               obj->efile.global_rodata = data;
> >> +                               obj->efile.rodata_shndx = idx;
> >>                         }
> > 
> > Previously, if we encountered an unknown PROGBITS section, we'd emit a
> > debug message about skipping it; should we add that message here?
> 
> Sounds reasonable, I'll add a similar 'skip section' debug output there.
> 
> >>                 } else if (sh.sh_type == SHT_REL) {
> >>                         void *reloc = obj->efile.reloc;
> >> @@ -892,6 +997,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
> >>                                 obj->efile.reloc[n].shdr = sh;
> >>                                 obj->efile.reloc[n].data = data;
> >>                         }
> >> +               } else if (sh.sh_type == SHT_NOBITS && strcmp(name, ".bss") == 0) {
> >> +                       obj->efile.global_bss = data;
> >> +                       obj->efile.bss_shndx = idx;
> >>                 } else {
> >>                         pr_debug("skip section(%d) %s\n", idx, name);
> >>                 }
> >> @@ -923,6 +1031,14 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
> >>                 if (err)
> >>                         goto out;
> >>         }
> >> +       if (obj->efile.data_shndx >= 0 ||
> >> +           obj->efile.rodata_shndx >= 0 ||
> >> +           obj->efile.bss_shndx >= 0) {
> >> +               err = bpf_object__init_global_maps(obj);
> >> +               if (err)
> >> +                       goto out;
> >> +       }
> >> +
> >>         err = bpf_object__init_prog_names(obj);
> >>  out:
> >>         return err;
> >> @@ -961,6 +1077,11 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
> >>         Elf_Data *symbols = obj->efile.symbols;
> >>         int text_shndx = obj->efile.text_shndx;
> >>         int maps_shndx = obj->efile.maps_shndx;
> >> +       int data_shndx = obj->efile.data_shndx;
> >> +       int rodata_shndx = obj->efile.rodata_shndx;
> >> +       int bss_shndx = obj->efile.bss_shndx;
> >> +       struct bpf_map *maps_global = obj->maps_global;
> >> +       size_t nr_maps_global = obj->nr_maps_global;
> >>         struct bpf_map *maps = obj->maps;
> >>         size_t nr_maps = obj->nr_maps;
> >>         int i, nrels;
> >> @@ -999,8 +1120,10 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
> >>                          (long long) (rel.r_info >> 32),
> >>                          (long long) sym.st_value, sym.st_name);
> >>
> >> -               if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx) {
> >> -                       pr_warning("Program '%s' contains non-map related relo data pointing to section %u\n",
> >> +               if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx &&
> >> +                   sym.st_shndx != data_shndx && sym.st_shndx != rodata_shndx &&
> >> +                   sym.st_shndx != bss_shndx) {
> >> +                       pr_warning("Program '%s' contains unrecognized relo data pointing to section %u\n",
> >>                                    prog->section_name, sym.st_shndx);
> >>                         return -LIBBPF_ERRNO__RELOC;
> >>                 }
> >> @@ -1045,6 +1168,30 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
> >>                         prog->reloc_desc[i].type = RELO_LD64;
> >>                         prog->reloc_desc[i].insn_idx = insn_idx;
> >>                         prog->reloc_desc[i].map_idx = map_idx;
> >> +               } else if (sym.st_shndx == data_shndx ||
> >> +                          sym.st_shndx == rodata_shndx ||
> >> +                          sym.st_shndx == bss_shndx) {
> >> +                       int type = (sym.st_shndx == data_shndx)   ? RELO_DATA :
> >> +                                  (sym.st_shndx == rodata_shndx) ? RELO_RODATA :
> >> +                                                                   RELO_BSS;
> >> +
> >> +                       for (map_idx = 0; map_idx < nr_maps_global; map_idx++) {
> >> +                               if (maps_global[map_idx].global_type == type) {
> >> +                                       pr_debug("relocation: find map %zd (%s) for insn %u\n",
> >> +                                                map_idx, maps_global[map_idx].name, insn_idx);
> >> +                                       break;
> >> +                               }
> >> +                       }
> >> +
> >> +                       if (map_idx >= nr_maps_global) {
> >> +                               pr_warning("bpf relocation: map_idx %d large than %d\n",
> >> +                                          (int)map_idx, (int)nr_maps_global - 1);
> >> +                               return -LIBBPF_ERRNO__RELOC;
> >> +                       }
> > 
> > We don't need to handle all of this if we just remember global map
> > indices during creation; instead of calculating the type, we can just
> > pick correct index (and check it exists). And type can be just generic
> > RELO_DATA.
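
[Editor's note: a sketch of this suggestion; the *_map_idx fields are
hypothetical names, everything else is from the quoted code:]

    /* Remember each global map's index at creation time, so the
     * relocation lookup is direct and one generic type suffices.
     */
    map_idx = sym.st_shndx == data_shndx   ? obj->data_map_idx :
              sym.st_shndx == rodata_shndx ? obj->rodata_map_idx :
                                             obj->bss_map_idx;
    prog->reloc_desc[i].type     = RELO_DATA;  /* single generic type */
    prog->reloc_desc[i].insn_idx = insn_idx;
    prog->reloc_desc[i].map_idx  = map_idx;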
> > 
> >> +
> >> +                       prog->reloc_desc[i].type = type;
> >> +                       prog->reloc_desc[i].insn_idx = insn_idx;
> >> +                       prog->reloc_desc[i].map_idx = map_idx;
> >>                 }
> >>         }
> >>         return 0;
> >> @@ -1176,15 +1323,58 @@ bpf_object__probe_caps(struct bpf_object *obj)
> >>  }
> >>
> >>  static int
> >> -bpf_object__create_maps(struct bpf_object *obj)
> >> +bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map)
> >>  {
> >>         struct bpf_create_map_attr create_attr = {};
> >> +       struct bpf_map_def *def = &map->def;
> >> +       char *cp, errmsg[STRERR_BUFSIZE];
> >> +       int fd;
> >> +
> >> +       if (obj->caps.name)
> >> +               create_attr.name = map->name;
> >> +       create_attr.map_ifindex = map->map_ifindex;
> >> +       create_attr.map_type = def->type;
> >> +       create_attr.map_flags = def->map_flags;
> >> +       create_attr.key_size = def->key_size;
> >> +       create_attr.value_size = def->value_size;
> >> +       create_attr.max_entries = def->max_entries;
> >> +       create_attr.btf_fd = 0;
> >> +       create_attr.btf_key_type_id = 0;
> >> +       create_attr.btf_value_type_id = 0;
> >> +       if (bpf_map_type__is_map_in_map(def->type) &&
> >> +           map->inner_map_fd >= 0)
> >> +               create_attr.inner_map_fd = map->inner_map_fd;
> >> +       if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
> >> +               create_attr.btf_fd = btf__fd(obj->btf);
> >> +               create_attr.btf_key_type_id = map->btf_key_type_id;
> >> +               create_attr.btf_value_type_id = map->btf_value_type_id;
> >> +       }
> >> +
> >> +       fd = bpf_create_map_xattr(&create_attr);
> >> +       if (fd < 0 && create_attr.btf_key_type_id) {
> >> +               cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
> >> +               pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
> >> +                          map->name, cp, errno);
> >> +
> >> +               create_attr.btf_fd = 0;
> >> +               create_attr.btf_key_type_id = 0;
> >> +               create_attr.btf_value_type_id = 0;
> >> +               map->btf_key_type_id = 0;
> >> +               map->btf_value_type_id = 0;
> >> +               fd = bpf_create_map_xattr(&create_attr);
> >> +       }
> >> +
> >> +       return fd;
> >> +}
> >> +
> >> +static int
> >> +bpf_object__create_maps(struct bpf_object *obj)
> >> +{
> >>         unsigned int i;
> >>         int err;
> >>
> >>         for (i = 0; i < obj->nr_maps; i++) {
> >>                 struct bpf_map *map = &obj->maps[i];
> >> -               struct bpf_map_def *def = &map->def;
> >>                 char *cp, errmsg[STRERR_BUFSIZE];
> >>                 int *pfd = &map->fd;
> >>
> >> @@ -1193,41 +1383,7 @@ bpf_object__create_maps(struct bpf_object *obj)
> >>                                  map->name, map->fd);
> >>                         continue;
> >>                 }
> >> -
> >> -               if (obj->caps.name)
> >> -                       create_attr.name = map->name;
> >> -               create_attr.map_ifindex = map->map_ifindex;
> >> -               create_attr.map_type = def->type;
> >> -               create_attr.map_flags = def->map_flags;
> >> -               create_attr.key_size = def->key_size;
> >> -               create_attr.value_size = def->value_size;
> >> -               create_attr.max_entries = def->max_entries;
> >> -               create_attr.btf_fd = 0;
> >> -               create_attr.btf_key_type_id = 0;
> >> -               create_attr.btf_value_type_id = 0;
> >> -               if (bpf_map_type__is_map_in_map(def->type) &&
> >> -                   map->inner_map_fd >= 0)
> >> -                       create_attr.inner_map_fd = map->inner_map_fd;
> >> -
> >> -               if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
> >> -                       create_attr.btf_fd = btf__fd(obj->btf);
> >> -                       create_attr.btf_key_type_id = map->btf_key_type_id;
> >> -                       create_attr.btf_value_type_id = map->btf_value_type_id;
> >> -               }
> >> -
> >> -               *pfd = bpf_create_map_xattr(&create_attr);
> >> -               if (*pfd < 0 && create_attr.btf_key_type_id) {
> >> -                       cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
> >> -                       pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
> >> -                                  map->name, cp, errno);
> >> -                       create_attr.btf_fd = 0;
> >> -                       create_attr.btf_key_type_id = 0;
> >> -                       create_attr.btf_value_type_id = 0;
> >> -                       map->btf_key_type_id = 0;
> >> -                       map->btf_value_type_id = 0;
> >> -                       *pfd = bpf_create_map_xattr(&create_attr);
> >> -               }
> >> -
> >> +               *pfd = bpf_object__create_map(obj, map);
> >>                 if (*pfd < 0) {
> >>                         size_t j;
> >>
> >> @@ -1412,6 +1568,24 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
> >>                                                       &prog->reloc_desc[i]);
> >>                         if (err)
> >>                                 return err;
> >> +               } else if (prog->reloc_desc[i].type == RELO_DATA ||
> >> +                          prog->reloc_desc[i].type == RELO_RODATA ||
> >> +                          prog->reloc_desc[i].type == RELO_BSS) {
> >> +                       struct bpf_insn *insns = prog->insns;
> >> +                       int insn_idx, map_idx, data_off;
> >> +
> >> +                       insn_idx = prog->reloc_desc[i].insn_idx;
> >> +                       map_idx  = prog->reloc_desc[i].map_idx;
> >> +                       data_off = insns[insn_idx].imm;
> >> +
> >> +                       if (insn_idx + 1 >= (int)prog->insns_cnt) {
> >> +                               pr_warning("relocation out of range: '%s'\n",
> >> +                                          prog->section_name);
> >> +                               return -LIBBPF_ERRNO__RELOC;
> >> +                       }
> >> +                       insns[insn_idx].src_reg = BPF_PSEUDO_MAP_VALUE;
> >> +                       insns[insn_idx].imm = obj->maps_global[map_idx].fd;
> >> +                       insns[insn_idx + 1].imm = data_off;
> >>                 }
> >>         }
> >>
> >> @@ -1717,6 +1891,7 @@ __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz,
> >>
> >>         CHECK_ERR(bpf_object__elf_init(obj), err, out);
> >>         CHECK_ERR(bpf_object__check_endianness(obj), err, out);
> >> +       CHECK_ERR(bpf_object__probe_caps(obj), err, out);
> >>         CHECK_ERR(bpf_object__elf_collect(obj, flags), err, out);
> >>         CHECK_ERR(bpf_object__collect_reloc(obj), err, out);
> >>         CHECK_ERR(bpf_object__validate(obj, needs_kver), err, out);
> >> @@ -1789,7 +1964,8 @@ int bpf_object__unload(struct bpf_object *obj)
> >>
> >>         for (i = 0; i < obj->nr_maps; i++)
> >>                 zclose(obj->maps[i].fd);
> >> -
> >> +       for (i = 0; i < obj->nr_maps_global; i++)
> >> +               zclose(obj->maps_global[i].fd);
> >>         for (i = 0; i < obj->nr_programs; i++)
> >>                 bpf_program__unload(&obj->programs[i]);
> >>
> >> @@ -1810,7 +1986,6 @@ int bpf_object__load(struct bpf_object *obj)
> >>
> >>         obj->loaded = true;
> >>
> >> -       CHECK_ERR(bpf_object__probe_caps(obj), err, out);
> >>         CHECK_ERR(bpf_object__create_maps(obj), err, out);
> >>         CHECK_ERR(bpf_object__relocate(obj), err, out);
> >>         CHECK_ERR(bpf_object__load_progs(obj), err, out);
> >> --
> >> 2.17.1
> >>
> > 
> > I'm sorry if I seem a bit too obsessed with those three new relocation
> > types. I just believe that having one generic type and storing global maps
> > along with other maps is cleaner and more uniform.
> 
> No worries, thanks for all your feedback and review!
> 
> Thanks,
> Daniel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections
  2019-02-28 23:18 ` [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections Daniel Borkmann
  2019-02-28 23:41   ` Stanislav Fomichev
  2019-03-01  6:53   ` Andrii Nakryiko
@ 2019-03-01 18:11   ` Yonghong Song
  2019-03-01 18:48     ` Andrii Nakryiko
  2019-03-01 19:56     ` Daniel Borkmann
  2 siblings, 2 replies; 46+ messages in thread
From: Yonghong Song @ 2019-03-01 18:11 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov
  Cc: bpf, netdev, joe, john.fastabend, tgraf, Andrii Nakryiko,
	jakub.kicinski, lmb



On 2/28/19 3:18 PM, Daniel Borkmann wrote:
> This work adds BPF loader support for global data sections
> to libbpf. This allows writing BPF programs in a more natural,
> C-like way by making it possible to define global variables and
> const data.
> 
> Back at LPC 2018 [0] we presented a first prototype which
> implemented support for global data sections by extending BPF
> syscall where union bpf_attr would get additional memory/size
> pair for each section passed during prog load in order to later
> add this base address into the ldimm64 instruction along with
> the user provided offset when accessing a variable. Consensus
> from LPC was that for proper upstream support, it would be
> more desirable to use maps instead of bpf_attr extension as
> this would allow for introspection of these sections as well
> as potential live updates of their content. This work follows
> this path by taking the following steps from loader side:
> 
>   1) In bpf_object__elf_collect() step we pick up ".data",
>      ".rodata", and ".bss" section information.
> 
>   2) If present, in bpf_object__init_global_maps() we create
>      a map that corresponds to each of the present sections.
>      Given section size and access properties can differ, a
>      single entry array map is created with a value size
>      corresponding to the ELF section size of .data, .bss
>      or .rodata. In the latter case, the map is created as
>      read-only from program side such that verifier rejects
>      any write attempts into .rodata. In a subsequent step,
>      for .data and .rodata sections, the section content is
>      copied into the map through bpf_map_update_elem(). For
>      .bss this is not necessary since array map is already
>      zero-initialized by default.
> 
>   3) In bpf_program__collect_reloc() step, we record the
>      corresponding map, insn index, and relocation type for
>      the global data.
> 
>   4) And last but not least in the actual relocation step in
>      bpf_program__relocate(), we mark the ldimm64 instruction
>      with src_reg = BPF_PSEUDO_MAP_VALUE, where the map's file
>      descriptor is stored in the first imm field, similarly to
>      BPF_PSEUDO_MAP_FD, and the access offset into the section is
>      stored in the second imm field (as ldimm64 is 2-insn wide).
> 
>   5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
>      load will then store the actual target address in order
>      to have a 'map-lookup'-free access. That is, the actual
>      map value base address + offset. The destination register
>      in the verifier will then be marked as PTR_TO_MAP_VALUE,
>      containing the fixed offset as reg->off and backing BPF
>      map as reg->map_ptr. Meaning, it's treated as any other
>      normal map value from verification side, only with
>      efficient, direct value access instead of actual call to
>      map lookup helper as in the typical case.
> 
> Simple example dump of a program using global vars in each
> section:
> 
>    # readelf -a test_global_data.o
>    [...]
>    [ 6] .bss              NOBITS           0000000000000000  00000328
>         0000000000000010  0000000000000000  WA       0     0     8
>    [ 7] .data             PROGBITS         0000000000000000  00000328
>         0000000000000010  0000000000000000  WA       0     0     8
>    [ 8] .rodata           PROGBITS         0000000000000000  00000338
>         0000000000000018  0000000000000000   A       0     0     8
>    [...]
>      95: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    6 static_bss
>      96: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    6 static_bss2
>      97: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    7 static_data
>      98: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    7 static_data2
>      99: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    8 static_rodata
>     100: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    8 static_rodata2
>     101: 0000000000000010     8 OBJECT  LOCAL  DEFAULT    8 static_rodata3
>    [...]
> 
>    # bpftool prog
>    103: sched_cls  name load_static_dat  tag 37a8b6822fc39a29  gpl
>         loaded_at 2019-02-28T02:02:35+0000  uid 0
>         xlated 712B  jited 426B  memlock 4096B  map_ids 63,64,65,66
>    # bpftool map show id 63
>    63: array  name .bss  flags 0x0                      <-- .bss area, rw
>        key 4B  value 16B  max_entries 1  memlock 4096B
>    # bpftool map show id 64
>    64: array  name .data  flags 0x0                     <-- .data area, rw
>        key 4B  value 16B  max_entries 1  memlock 4096B
>    # bpftool map show id 65
>    65: array  name .rodata  flags 0x80                  <-- .rodata area, ro
>        key 4B  value 24B  max_entries 1  memlock 4096B
> 
>    # bpftool prog dump xlated id 103
>    int load_static_data(struct __sk_buff * skb):
>    ; int load_static_data(struct __sk_buff *skb)
>       0: (b7) r1 = 0
>    ; key = 0;
>       1: (63) *(u32 *)(r10 -4) = r1
>       2: (bf) r6 = r10
>    ; int load_static_data(struct __sk_buff *skb)
>       3: (07) r6 += -4
>    ; bpf_map_update_elem(&result, &key, &static_bss, 0);
>       4: (18) r1 = map[id:66]
>       6: (bf) r2 = r6
>       7: (18) r3 = map[id:63][0]+0         <-- direct static_bss addr in .bss area
>       9: (b7) r4 = 0
>      10: (85) call array_map_update_elem#99888
>      11: (b7) r1 = 1
>    ; key = 1;
>      12: (63) *(u32 *)(r10 -4) = r1
>    ; bpf_map_update_elem(&result, &key, &static_data, 0);
>      13: (18) r1 = map[id:66]
>      15: (bf) r2 = r6
>      16: (18) r3 = map[id:64][0]+0         <-- direct static_data addr in .data area
>      18: (b7) r4 = 0
>      19: (85) call array_map_update_elem#99888
>      20: (b7) r1 = 2
>    ; key = 2;
>      21: (63) *(u32 *)(r10 -4) = r1
>    ; bpf_map_update_elem(&result, &key, &static_rodata, 0);
>      22: (18) r1 = map[id:66]
>      24: (bf) r2 = r6
>      25: (18) r3 = map[id:65][0]+0         <-- direct static_rodata addr in .rodata area
>      27: (b7) r4 = 0
>      28: (85) call array_map_update_elem#99888
>      29: (b7) r1 = 3
>    ; key = 3;
>      30: (63) *(u32 *)(r10 -4) = r1
>    ; bpf_map_update_elem(&result, &key, &static_bss2, 0);
>      31: (18) r7 = map[id:63][0]+8         <--.
>      33: (18) r1 = map[id:66]                 |
>      35: (bf) r2 = r6                         |
>      36: (18) r3 = map[id:63][0]+8         <-- direct static_bss2 addr in .bss area
>      38: (b7) r4 = 0
>      39: (85) call array_map_update_elem#99888
>    [...]
> 
> For now .data/.rodata/.bss maps are not exposed via API to the
> user, but this could be done in a subsequent step.
> 
> Based upon a recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't
> fail for static variables").
> 
> Joint work with Joe Stringer.
> 
>    [0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
>        http://vger.kernel.org/lpc-bpf2018.html#session-3
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> ---
>   tools/include/uapi/linux/bpf.h |  10 +-
>   tools/lib/bpf/libbpf.c         | 259 +++++++++++++++++++++++++++------
>   2 files changed, 226 insertions(+), 43 deletions(-)
> 
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 8884072e1a46..04b26f59b413 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -287,7 +287,7 @@ enum bpf_attach_type {
> [...]
> @@ -999,8 +1120,10 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>   			 (long long) (rel.r_info >> 32),
>   			 (long long) sym.st_value, sym.st_name);
>   
> -		if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx) {
> -			pr_warning("Program '%s' contains non-map related relo data pointing to section %u\n",
> +		if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx &&
> +		    sym.st_shndx != data_shndx && sym.st_shndx != rodata_shndx &&
> +		    sym.st_shndx != bss_shndx) {
> +			pr_warning("Program '%s' contains unrecognized relo data pointing to section %u\n",
>   				   prog->section_name, sym.st_shndx);
>   			return -LIBBPF_ERRNO__RELOC;
>   		}
> @@ -1045,6 +1168,30 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>   			prog->reloc_desc[i].type = RELO_LD64;
>   			prog->reloc_desc[i].insn_idx = insn_idx;
>   			prog->reloc_desc[i].map_idx = map_idx;
> +		} else if (sym.st_shndx == data_shndx ||
> +			   sym.st_shndx == rodata_shndx ||
> +			   sym.st_shndx == bss_shndx) {
> +			int type = (sym.st_shndx == data_shndx)   ? RELO_DATA :
> +				   (sym.st_shndx == rodata_shndx) ? RELO_RODATA :
> +								    RELO_BSS;
> +
> +			for (map_idx = 0; map_idx < nr_maps_global; map_idx++) {
> +				if (maps_global[map_idx].global_type == type) {
> +					pr_debug("relocation: find map %zd (%s) for insn %u\n",
> +						 map_idx, maps_global[map_idx].name, insn_idx);
> +					break;
> +				}
> +			}
> +
> +			if (map_idx >= nr_maps_global) {
> +				pr_warning("bpf relocation: map_idx %d large than %d\n",
> +					   (int)map_idx, (int)nr_maps_global - 1);
> +				return -LIBBPF_ERRNO__RELOC;
> +			}
> +
> +			prog->reloc_desc[i].type = type;
> +			prog->reloc_desc[i].insn_idx = insn_idx;
> +			prog->reloc_desc[i].map_idx = map_idx;
>   		}
>   	}
>   	return 0;
> @@ -1176,15 +1323,58 @@ bpf_object__probe_caps(struct bpf_object *obj)
>   }
>   
>   static int
[...]
> +
> +static int
> +bpf_object__create_maps(struct bpf_object *obj)
> +{
>   	unsigned int i;
>   	int err;
>   
>   	for (i = 0; i < obj->nr_maps; i++) {
>   		struct bpf_map *map = &obj->maps[i];
> -		struct bpf_map_def *def = &map->def;
>   		char *cp, errmsg[STRERR_BUFSIZE];
>   		int *pfd = &map->fd;
>   
> @@ -1193,41 +1383,7 @@ bpf_object__create_maps(struct bpf_object *obj)
>   				 map->name, map->fd);
>   			continue;
>   		}
> -
> -		if (obj->caps.name)
> -			create_attr.name = map->name;
> -		create_attr.map_ifindex = map->map_ifindex;
> -		create_attr.map_type = def->type;
> -		create_attr.map_flags = def->map_flags;
> -		create_attr.key_size = def->key_size;
> -		create_attr.value_size = def->value_size;
> -		create_attr.max_entries = def->max_entries;
> -		create_attr.btf_fd = 0;
> -		create_attr.btf_key_type_id = 0;
> -		create_attr.btf_value_type_id = 0;
> -		if (bpf_map_type__is_map_in_map(def->type) &&
> -		    map->inner_map_fd >= 0)
> -			create_attr.inner_map_fd = map->inner_map_fd;
> -
> -		if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
> -			create_attr.btf_fd = btf__fd(obj->btf);
> -			create_attr.btf_key_type_id = map->btf_key_type_id;
> -			create_attr.btf_value_type_id = map->btf_value_type_id;
> -		}
> -
> -		*pfd = bpf_create_map_xattr(&create_attr);
> -		if (*pfd < 0 && create_attr.btf_key_type_id) {
> -			cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
> -			pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
> -				   map->name, cp, errno);
> -			create_attr.btf_fd = 0;
> -			create_attr.btf_key_type_id = 0;
> -			create_attr.btf_value_type_id = 0;
> -			map->btf_key_type_id = 0;
> -			map->btf_value_type_id = 0;
> -			*pfd = bpf_create_map_xattr(&create_attr);
> -		}
> -
> +		*pfd = bpf_object__create_map(obj, map);
>   		if (*pfd < 0) {
>   			size_t j;
>   
> @@ -1412,6 +1568,24 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
>   						      &prog->reloc_desc[i]);
>   			if (err)
>   				return err;
> +		} else if (prog->reloc_desc[i].type == RELO_DATA ||
> +			   prog->reloc_desc[i].type == RELO_RODATA ||
> +			   prog->reloc_desc[i].type == RELO_BSS) {
> +			struct bpf_insn *insns = prog->insns;
> +			int insn_idx, map_idx, data_off;
> +
> +			insn_idx = prog->reloc_desc[i].insn_idx;
> +			map_idx  = prog->reloc_desc[i].map_idx;
> +			data_off = insns[insn_idx].imm;

I want to point out a subtle difference here between handling pure global
variables and static global variables. The "imm" value is only available
for static variables. For example,

-bash-4.4$ cat g.c
static volatile long sg = 2;
static volatile int si = 3;
long g = 4;
int i = 5;
int test() { return sg + si + g + i; }
-bash-4.4$
-bash-4.4$ clang -target bpf -O2 -c g.c 

-bash-4.4$ readelf -s g.o 


Symbol table '.symtab' contains 8 entries:
    Num:    Value          Size Type    Bind   Vis      Ndx Name
      0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
      1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS g.c
      2: 0000000000000010     8 OBJECT  LOCAL  DEFAULT    4 sg
      3: 0000000000000018     4 OBJECT  LOCAL  DEFAULT    4 si
      4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
      5: 0000000000000000     8 OBJECT  GLOBAL DEFAULT    4 g
      6: 0000000000000008     4 OBJECT  GLOBAL DEFAULT    4 i
      7: 0000000000000000   128 FUNC    GLOBAL DEFAULT    2 test
-bash-4.4$
-bash-4.4$ llvm-readelf -r g.o

Relocation section '.rel.text' at offset 0x1d8 contains 4 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name
0000000000000000  0000000400000001 R_BPF_64_64            0000000000000000 .data
0000000000000018  0000000400000001 R_BPF_64_64            0000000000000000 .data
0000000000000038  0000000500000001 R_BPF_64_64            0000000000000000 g
0000000000000058  0000000600000001 R_BPF_64_64            0000000000000008 i
-bash-4.4$ llvm-objdump -d g.o

g.o:    file format ELF64-BPF

Disassembly of section .text:
0000000000000000 test:
        0:       18 01 00 00 10 00 00 00 00 00 00 00 00 00 00 00         r1 = 16 ll
        2:       79 11 00 00 00 00 00 00         r1 = *(u64 *)(r1 + 0)
        3:       18 02 00 00 18 00 00 00 00 00 00 00 00 00 00 00         r2 = 24 ll
        5:       61 22 00 00 00 00 00 00         r2 = *(u32 *)(r2 + 0)
        6:       0f 21 00 00 00 00 00 00         r1 += r2
        7:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00         r2 = 0 ll
        9:       79 22 00 00 00 00 00 00         r2 = *(u64 *)(r2 + 0)
       10:       0f 21 00 00 00 00 00 00         r1 += r2
       11:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00         r2 = 0 ll
       13:       61 20 00 00 00 00 00 00         r0 = *(u32 *)(r2 + 0)
       14:       0f 10 00 00 00 00 00 00         r0 += r1
       15:       95 00 00 00 00 00 00 00         exit
-bash-4.4$

As you can see above, a non-static global access does not have its
in-section offset encoded in the insn itself. The difference is due to
llvm treating static and non-static globals differently.

To support both cases, during the relocation recording stage, you can
also record:
    . symbol binding (GELF_ST_BIND(sym.st_info)):
      a non-static global has binding STB_GLOBAL and a static
      global has binding STB_LOCAL
    . symbol value (sym.st_value)

During the above relocation resolution, if the symbol binding is
local, do what you already do here. If the symbol binding is global,
assign data_off from the symbol value.

This applies to both the .data and .rodata sections.
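
[Editor's note: a sketch of the suggested resolution; the bind and
sym_value fields are assumed to be recorded per relocation as
described above, the other names are from the quoted code:]

    /* Static globals carry the in-section offset in the insn itself;
     * non-static globals carry it in the symbol value.
     */
    if (prog->reloc_desc[i].bind == STB_GLOBAL)
            data_off = prog->reloc_desc[i].sym_value;
    else
            data_off = insns[insn_idx].imm;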

A non-initialized
global variable will not be in any allocated section in the ELF file;
it is in a COMMON block which is to be allocated by the loader.
So if a user defines something like
    int g;
and later on uses it, right now it will not work. The workaround
is "int g = 4" or "static int g". I guess that should be
okay; we should encourage users to use "static" variables instead.
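
[Editor's note: the cases above, side by side (illustrative):]

    int        g = 4;  /* ok: allocated in .data                  */
    static int h;      /* ok: allocated in .bss                   */
    int        u;      /* not yet handled: COMMON symbol, not in  */
                       /* any allocated section of the ELF file   */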

> +
> +			if (insn_idx + 1 >= (int)prog->insns_cnt) {
> +				pr_warning("relocation out of range: '%s'\n",
> +					   prog->section_name);
> +				return -LIBBPF_ERRNO__RELOC;
> +			}
> +			insns[insn_idx].src_reg = BPF_PSEUDO_MAP_VALUE;
> +			insns[insn_idx].imm = obj->maps_global[map_idx].fd;
> +			insns[insn_idx + 1].imm = data_off;
>   		}
>   	}
>   
> @@ -1717,6 +1891,7 @@ __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz,
>   
>   	CHECK_ERR(bpf_object__elf_init(obj), err, out);
>   	CHECK_ERR(bpf_object__check_endianness(obj), err, out);
> +	CHECK_ERR(bpf_object__probe_caps(obj), err, out);
>   	CHECK_ERR(bpf_object__elf_collect(obj, flags), err, out);
>   	CHECK_ERR(bpf_object__collect_reloc(obj), err, out);
>   	CHECK_ERR(bpf_object__validate(obj, needs_kver), err, out);
> @@ -1789,7 +1964,8 @@ int bpf_object__unload(struct bpf_object *obj)
>   
>   	for (i = 0; i < obj->nr_maps; i++)
>   		zclose(obj->maps[i].fd);
> -
> +	for (i = 0; i < obj->nr_maps_global; i++)
> +		zclose(obj->maps_global[i].fd);
>   	for (i = 0; i < obj->nr_programs; i++)
>   		bpf_program__unload(&obj->programs[i]);
>   
> @@ -1810,7 +1986,6 @@ int bpf_object__load(struct bpf_object *obj)
>   
>   	obj->loaded = true;
>   
> -	CHECK_ERR(bpf_object__probe_caps(obj), err, out);
>   	CHECK_ERR(bpf_object__create_maps(obj), err, out);
>   	CHECK_ERR(bpf_object__relocate(obj), err, out);
>   	CHECK_ERR(bpf_object__load_progs(obj), err, out);
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections
  2019-03-01 10:46     ` Daniel Borkmann
  2019-03-01 18:10       ` Stanislav Fomichev
@ 2019-03-01 18:46       ` Andrii Nakryiko
  1 sibling, 0 replies; 46+ messages in thread
From: Andrii Nakryiko @ 2019-03-01 18:46 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, bpf, Networking, joe, john.fastabend, tgraf,
	Yonghong Song, Andrii Nakryiko, Jakub Kicinski, lmb

On Fri, Mar 1, 2019 at 2:46 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 03/01/2019 07:53 AM, Andrii Nakryiko wrote:
> > On Thu, Feb 28, 2019 at 3:31 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >>
> >> This work adds BPF loader support for global data sections
> >> to libbpf. This allows to write BPF programs in more natural
> >> C-like way by being able to define global variables and const
> >> data.
> >>
> >> Back at LPC 2018 [0] we presented a first prototype which
> >> implemented support for global data sections by extending BPF
> >> syscall where union bpf_attr would get additional memory/size
> >> pair for each section passed during prog load in order to later
> >> add this base address into the ldimm64 instruction along with
> >> the user provided offset when accessing a variable. Consensus
> >> from LPC was that for proper upstream support, it would be
> >> more desirable to use maps instead of bpf_attr extension as
> >> this would allow for introspection of these sections as well
> >> as potential live updates of their content. This work follows
> >> this path by taking the following steps from loader side:
> >>
> >>  1) In bpf_object__elf_collect() step we pick up ".data",
> >>     ".rodata", and ".bss" section information.
> >>
> >>  2) If present, in bpf_object__init_global_maps() we create
> >>     a map that corresponds to each of the present sections.
> >
> > Is there any point in having .data and .bss in separate maps? I can
> > only see reasons of inspection from bpftool, but other than that
> > isn't .bss just an optimization over .data to save space in the ELF
> > file, and in other regards just another part of the r/w .data
> > section?
>
> Hmm, I actually don't mind too much combining both of them. Had
> the same thought with regards to introspection from bpftool which
> was why I separated them. But combining the two into a single map
> is fine actually, saves a bit of resources in kernel, and offsets
> can easily be fixed up from libbpf side. Will do for v3.

I thought a bit more about this and I think there is one good reason
(in addition to introspection) to keep them separate.
If the total size of .data is not a multiple of 8 bytes, then when
appending .bss to it we'll need to pad a few bytes. It would probably
be a good thing for the verifier to enforce that those padding bytes
are never accessed, which will be impossible if we combine .data and
.bss.
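
[Editor's note, illustrative sizes: a 12-byte .data followed by an
8-byte-aligned .bss would leave 4 padding bytes in the combined
value; the verifier could not tell an access to that padding apart
from a valid .data access.]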

I'm not sure how important that is, just wanted to bring up one more
counter-argument.

>
> >>     Given section size and access properties can differ, a
> >>     single entry array map is created with a value size
> >>     corresponding to the ELF section size of .data, .bss
> >>     or .rodata. In the latter case, the map is created as
> >>     read-only from program side such that verifier rejects
> >>     any write attempts into .rodata. In a subsequent step,
> >>     for .data and .rodata sections, the section content is
> >>     copied into the map through bpf_map_update_elem(). For
> >>     .bss this is not necessary since array map is already
> >>     zero-initialized by default.
> >
> > For .rodata, ideally it would be nice to make it RDONLY from userland
> > as well, except for first UPDATE. How hard is it to support that?
>
> Right now the BPF_F_RDONLY, BPF_F_WRONLY semantics to make the
> maps read-only or write-only from syscall side are that these
> permissions are stored into the struct file front end (file->f_mode)
> for the anon inode we use, meaning it's separated from the actual
> BPF map, so you can create the map with BPF_F_RDONLY, but root
> user can do BPF_MAP_GET_FD_BY_ID without the BPF_F_RDONLY and
> again write into it. This design choice would require that we'd
> need to add some additional infrastructure on top of this, which
> would then need to enforce file->f_mode to read-only after the
> first setup. I think there's a simple trick we can apply to make
> it read-only after setup from syscall side: we'll add a new flag
> to the map, and then upon map creation libbpf sets everything
> up, holds the id, closes its fd, and refetches the fd by id.
> From that point onwards any interface where you would get the
> fd from the map in user space will enforce BPF_F_RDONLY behavior
> for file->f_mode. Another, less hacky option could be to extend
> the struct file ops we currently use for BPF maps and set a
> map 'immutable' flag from there which is then enforced once all
> pending operations have completed. I can look a bit into this.

Cool, thanks for explaining!

I think we can start off without this enforcement, but if we had it,
it would open up new possibilities for the verifier to optimize/prune
code based on values in .rodata, because we'd have a guarantee that
the values can't change.
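
As a made-up example of the kind of pruning this would enable (the
optimization doesn't exist yet, and do_expensive_stats() is a
hypothetical function; sketch only):

    static const int feature_on = 0;   /* placed in .rodata */

    if (feature_on)                    /* known to be 0 at verification */
        do_expensive_stats(skb);       /* provably dead, could be pruned */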

>
> >>  3) In bpf_program__collect_reloc() step, we record the
> >>     corresponding map, insn index, and relocation type for
> >>     the global data.
> >>
> >>  4) And last but not least in the actual relocation step in
> >>     bpf_program__relocate(), we mark the ldimm64 instruction
> >>     with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
> >>     imm field the map's file descriptor is stored as similarly
> >>     done as in BPF_PSEUDO_MAP_FD, and in the second imm field
> >>     (as ldimm64 is 2-insn wide) we store the access offset
> >>     into the section.
> >>
> >>  5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
> >>     load will then store the actual target address in order
> >>     to have a 'map-lookup'-free access. That is, the actual
> >>     map value base address + offset. The destination register
> >>     in the verifier will then be marked as PTR_TO_MAP_VALUE,
> >>     containing the fixed offset as reg->off and backing BPF
> >>     map as reg->map_ptr. Meaning, it's treated as any other
> >>     normal map value from verification side, only with
> >>     efficient, direct value access instead of actual call to
> >>     map lookup helper as in the typical case.
> >>
> >> Simple example dump of program using globals vars in each
> >> section:
> >>
> >>   # readelf -a test_global_data.o
> >>   [...]
> >>   [ 6] .bss              NOBITS           0000000000000000  00000328
> >>        0000000000000010  0000000000000000  WA       0     0     8
> >>   [ 7] .data             PROGBITS         0000000000000000  00000328
> >>        0000000000000010  0000000000000000  WA       0     0     8
> >>   [ 8] .rodata           PROGBITS         0000000000000000  00000338
> >>        0000000000000018  0000000000000000   A       0     0     8
> >>   [...]
> >>     95: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    6 static_bss
> >>     96: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    6 static_bss2
> >>     97: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    7 static_data
> >>     98: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    7 static_data2
> >>     99: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    8 static_rodata
> >>    100: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    8 static_rodata2
> >>    101: 0000000000000010     8 OBJECT  LOCAL  DEFAULT    8 static_rodata3
> >>   [...]
> >>
> >>   # bpftool prog
> >>   103: sched_cls  name load_static_dat  tag 37a8b6822fc39a29  gpl
> >>        loaded_at 2019-02-28T02:02:35+0000  uid 0
> >>        xlated 712B  jited 426B  memlock 4096B  map_ids 63,64,65,66
> >>   # bpftool map show id 63
> >>   63: array  name .bss  flags 0x0                      <-- .bss area, rw
> >>       key 4B  value 16B  max_entries 1  memlock 4096B
> >>   # bpftool map show id 64
> >>   64: array  name .data  flags 0x0                     <-- .data area, rw
> >>       key 4B  value 16B  max_entries 1  memlock 4096B
> >>   # bpftool map show id 65
> >>   65: array  name .rodata  flags 0x80                  <-- .rodata area, ro
> >>       key 4B  value 24B  max_entries 1  memlock 4096B
> >>
> >>   # bpftool prog dump xlated id 103
> >>   int load_static_data(struct __sk_buff * skb):
> >>   ; int load_static_data(struct __sk_buff *skb)
> >>      0: (b7) r1 = 0
> >>   ; key = 0;
> >>      1: (63) *(u32 *)(r10 -4) = r1
> >>      2: (bf) r6 = r10
> >>   ; int load_static_data(struct __sk_buff *skb)
> >>      3: (07) r6 += -4
> >>   ; bpf_map_update_elem(&result, &key, &static_bss, 0);
> >>      4: (18) r1 = map[id:66]
> >>      6: (bf) r2 = r6
> >>      7: (18) r3 = map[id:63][0]+0         <-- direct static_bss addr in .bss area
> >>      9: (b7) r4 = 0
> >>     10: (85) call array_map_update_elem#99888
> >>     11: (b7) r1 = 1
> >>   ; key = 1;
> >>     12: (63) *(u32 *)(r10 -4) = r1
> >>   ; bpf_map_update_elem(&result, &key, &static_data, 0);
> >>     13: (18) r1 = map[id:66]
> >>     15: (bf) r2 = r6
> >>     16: (18) r3 = map[id:64][0]+0         <-- direct static_data addr in .data area
> >>     18: (b7) r4 = 0
> >>     19: (85) call array_map_update_elem#99888
> >>     20: (b7) r1 = 2
> >>   ; key = 2;
> >>     21: (63) *(u32 *)(r10 -4) = r1
> >>   ; bpf_map_update_elem(&result, &key, &static_rodata, 0);
> >>     22: (18) r1 = map[id:66]
> >>     24: (bf) r2 = r6
> >>     25: (18) r3 = map[id:65][0]+0         <-- direct static_rodata addr in .rodata area
> >>     27: (b7) r4 = 0
> >>     28: (85) call array_map_update_elem#99888
> >>     29: (b7) r1 = 3
> >>   ; key = 3;
> >>     30: (63) *(u32 *)(r10 -4) = r1
> >>   ; bpf_map_update_elem(&result, &key, &static_bss2, 0);
> >>     31: (18) r7 = map[id:63][0]+8         <--.
> >>     33: (18) r1 = map[id:66]                 |
> >>     35: (bf) r2 = r6                         |
> >>     36: (18) r3 = map[id:63][0]+8         <-- direct static_bss2 addr in .bss area
> >>     38: (b7) r4 = 0
> >>     39: (85) call array_map_update_elem#99888
> >>   [...]
> >>
> >> For now .data/.rodata/.bss maps are not exposed via API to the
> >> user, but this could be done in a subsequent step.
> >
> > See comment about BPF_MAP_TYPE_HEAP/BLOB map in comments to patch #1,
> > it would probably make more useful API for .data/.rodata/.bss.
> >
> >>
> >> Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't
> >> fail for static variables").
> >>
> >> Joint work with Joe Stringer.
> >>
> >>   [0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
> >>       http://vger.kernel.org/lpc-bpf2018.html#session-3
> >>
> >> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> >> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> >> ---
> >>  tools/include/uapi/linux/bpf.h |  10 +-
> >>  tools/lib/bpf/libbpf.c         | 259 +++++++++++++++++++++++++++------
> >>  2 files changed, 226 insertions(+), 43 deletions(-)
> >>
> >> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> >> index 8884072e1a46..04b26f59b413 100644
> >> --- a/tools/include/uapi/linux/bpf.h
> >> +++ b/tools/include/uapi/linux/bpf.h
> >> @@ -287,7 +287,7 @@ enum bpf_attach_type {
> >>
> >>  #define BPF_OBJ_NAME_LEN 16U
> >>
> >> -/* Flags for accessing BPF object */
> >> +/* Flags for accessing BPF object from syscall side. */
> >>  #define BPF_F_RDONLY           (1U << 3)
> >>  #define BPF_F_WRONLY           (1U << 4)
> >>
> >> @@ -297,6 +297,14 @@ enum bpf_attach_type {
> >>  /* Zero-initialize hash function seed. This should only be used for testing. */
> >>  #define BPF_F_ZERO_SEED                (1U << 6)
> >>
> >> +/* Flags for accessing BPF object from program side. */
> >> +#define BPF_F_RDONLY_PROG      (1U << 7)
> >> +#define BPF_F_WRONLY_PROG      (1U << 8)
> >> +#define BPF_F_ACCESS_MASK      (BPF_F_RDONLY |         \
> >> +                                BPF_F_RDONLY_PROG |    \
> >> +                                BPF_F_WRONLY |         \
> >> +                                BPF_F_WRONLY_PROG)
> >> +
> >>  /* flags for BPF_PROG_QUERY */
> >>  #define BPF_F_QUERY_EFFECTIVE  (1U << 0)
> >>
> >> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> >> index 8f8f688f3e9b..969bc3d9f02c 100644
> >> --- a/tools/lib/bpf/libbpf.c
> >> +++ b/tools/lib/bpf/libbpf.c
> >> @@ -139,6 +139,9 @@ struct bpf_program {
> >>                 enum {
> >>                         RELO_LD64,
> >>                         RELO_CALL,
> >> +                       RELO_DATA,
> >> +                       RELO_RODATA,
> >> +                       RELO_BSS,
> >
> > All three of those are essentially the same relocations, just applied
> > against different ELF sections.
> > I think by having just single RELO_GLOBAL_DATA you can actually
> > simplify a bunch of code below, please see corresponding comments.
>
> Ok, sounds like a reasonable simplification, will do all well for v3.
>
> >>                 } type;
> >>                 int insn_idx;
> >>                 union {
> >> @@ -174,7 +177,10 @@ struct bpf_program {
> >>  struct bpf_map {
> >>         int fd;
> >>         char *name;
> >> -       size_t offset;
> >> +       union {
> >> +               __u32 global_type;
> >
> > This could be an index into common maps array.
> >
> >> +               size_t offset;
> >> +       };
> >>         int map_ifindex;
> >>         int inner_map_fd;
> >>         struct bpf_map_def def;
> >> @@ -194,6 +200,8 @@ struct bpf_object {
> >>         size_t nr_programs;
> >>         struct bpf_map *maps;
> >>         size_t nr_maps;
> >> +       struct bpf_map *maps_global;
> >> +       size_t nr_maps_global;
> >
> > Global maps could be stored in maps, along other ones, so that we
> > don't need to keep track of them separately.
> >
> > Another inconvenience of having a separate array of global maps is
> > that bpf_map__iter won't iterate them. I don't know if that's
> > desirable behavior or not, but it probably would be nice to iterate
> > over global ones as well?
>
> My thinking was that these maps are not explicitly user specified,
> so libbpf API would expose them through a different interface than
> the one we have today in order to not confuse or break application
> behavior which would otherwise rely on iterating / processing over
> them. Separate API would retain current behavior and definitely
> make this unambiguous to apps with regards to what to expect from
> each such API call.

Hm... yeah, I guess it's a decision that needs to be made consciously.
To me it feels like global maps are sort of compiler sugar, which
provides a good developer experience but resolves into lower-level,
well-known, pre-existing abstractions (just more maps). In any case,
more generic tools like bpftool will dump all of them, so I'm not
certain we should treat them as separate concepts in libbpf. We
currently shouldn't be breaking anyone, because applications can't use
static/global data yet, so these new maps will only come up for new
use cases. If you think it's important to differentiate between
auto-generated maps and user-defined maps, then we could add some
bpf_map__{global/special/autogen} accessor to differentiate?

My thinking here is that if you want to iterate over all maps, it's
inconvenient to remember to iterate "normal maps" and then separately
iterate "automatic maps".

Also, if libbpf ever needs to auto-create some other type of map to
support another feature, that would require yet another set of APIs
to access/iterate them, so sticking to a single common list seems
preferable, imho.
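
Something like this is what I have in mind (the accessor is
hypothetical and its name is just a strawman; the iteration macro is
the existing bpf_map__for_each()):

    bool bpf_map__is_autogenerated(const struct bpf_map *map);

    struct bpf_map *map;

    bpf_map__for_each(map, obj) {
        if (bpf_map__is_autogenerated(map))
            continue;  /* skip .data/.rodata/.bss maps, if undesired */
        /* handle user-defined maps as before */
    }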

>
> >>         bool loaded;
> >>         bool has_pseudo_calls;
> >> @@ -209,6 +217,9 @@ struct bpf_object {
> >>                 Elf *elf;
> >>                 GElf_Ehdr ehdr;
> >>                 Elf_Data *symbols;
> >> +               Elf_Data *global_data;
> >> +               Elf_Data *global_rodata;
> >> +               Elf_Data *global_bss;
> >>                 size_t strtabidx;
> >>                 struct {
> >>                         GElf_Shdr shdr;
> >> @@ -217,6 +228,9 @@ struct bpf_object {
> >>                 int nr_reloc;
> >>                 int maps_shndx;
> >>                 int text_shndx;
> >> +               int data_shndx;
> >> +               int rodata_shndx;
> >> +               int bss_shndx;
> >>         } efile;
> >>         /*
> >>          * All loaded bpf_object is linked in a list, which is
> >> @@ -457,6 +471,9 @@ static struct bpf_object *bpf_object__new(const char *path,
> >>         obj->efile.obj_buf = obj_buf;
> >>         obj->efile.obj_buf_sz = obj_buf_sz;
> >>         obj->efile.maps_shndx = -1;
> >> +       obj->efile.data_shndx = -1;
> >> +       obj->efile.rodata_shndx = -1;
> >> +       obj->efile.bss_shndx = -1;
> >>
> >>         obj->loaded = false;
> >>
> >> @@ -475,6 +492,9 @@ static void bpf_object__elf_finish(struct bpf_object *obj)
> >>                 obj->efile.elf = NULL;
> >>         }
> >>         obj->efile.symbols = NULL;
> >> +       obj->efile.global_data = NULL;
> >> +       obj->efile.global_rodata = NULL;
> >> +       obj->efile.global_bss = NULL;
> >>
> >>         zfree(&obj->efile.reloc);
> >>         obj->efile.nr_reloc = 0;
> >> @@ -757,6 +777,85 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
> >>         return 0;
> >>  }
> >>
> >> +static int
> >> +bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map);
> >> +
> >> +static int
> >> +bpf_object__init_global(struct bpf_object *obj, int i, int type,
> >> +                       const char *name, Elf_Data *map_data)
> >
> > Instead of deducing flags and looking up for map by index, you can
> > just pass struct bpf_map * directly instead of int i and provide
> > flags, instead of type.
>
> Yep, agree.
>
> >> +{
> >> +       struct bpf_map *map = &obj->maps_global[i];
> >> +       struct bpf_map_def *def = &map->def;
> >> +       char *cp, errmsg[STRERR_BUFSIZE];
> >> +       int err, slot0 = 0;
> >> +
> >> +       def->type = BPF_MAP_TYPE_ARRAY;
> >> +       def->key_size = sizeof(int);
> >> +       def->value_size = map_data->d_size;
> >> +       def->max_entries = 1;
> >> +       def->map_flags = type == RELO_RODATA ? BPF_F_RDONLY_PROG : 0;
> >> +
> >> +       map->name = strdup(name);
> >> +       map->global_type = type;
> >> +       map->fd = bpf_object__create_map(obj, map);
> >> +       if (map->fd < 0) {
> >> +               err = map->fd;
> >> +               cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
> >> +               pr_warning("failed to create map (name: '%s'): %s\n",
> >> +                          map->name, cp);
> >> +               goto destroy;
> >> +       }
> >> +
> >> +       pr_debug("create map %s: fd=%d\n", map->name, map->fd);
> >> +
> >> +       if (type != RELO_BSS) {
> >> +               err = bpf_map_update_elem(map->fd, &slot0, map_data->d_buf, 0);
> >> +               if (err < 0) {
> >> +                       cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
> >> +                       pr_warning("failed to update map (name: '%s'): %s\n",
> >> +                                  map->name, cp);
> >> +                       goto destroy;
> >> +               }
> >> +
> >> +               pr_debug("updated map %s with elf data: fd=%d\n", map->name,
> >> +                        map->fd);
> >> +       }
> >> +       return 0;
> >> +destroy:
> >> +       for (i = 0; i < obj->nr_maps_global; i++)
> >> +               zclose(obj->maps_global[i].fd);
> >> +       return err;
> >> +}
> >> +
> >> +static int
> >> +bpf_object__init_global_maps(struct bpf_object *obj)
> >> +{
> >> +       int nr_maps_global = (obj->efile.data_shndx >= 0) +
> >> +                            (obj->efile.rodata_shndx >= 0) +
> >> +                            (obj->efile.bss_shndx >= 0), i, err = 0;
> >
> > This looks like a good candidate for separate static function? It can
> > also be reused below to check if there is any global map present.
>
> Sounds good.
>
> >> +
> >> +       obj->maps_global = calloc(nr_maps_global, sizeof(obj->maps_global[0]));
> >> +       if (!obj->maps_global) {
> >
> > If nr_maps_global is 0, calloc might or might not return NULL, so this
> > check might erroneously return error.
>
> Good point, just read it up as well from man page, will fix.
>
> >> +               pr_warning("alloc maps for object failed\n");
> >> +               return -ENOMEM;
> >> +       }
> >> +
> >> +       obj->nr_maps_global = nr_maps_global;
> >> +       for (i = 0; i < obj->nr_maps_global; i++)
> >> +               obj->maps[i].fd = -1;
> >> +       i = 0;
> >> +       if (obj->efile.bss_shndx >= 0)
> >> +               err = bpf_object__init_global(obj, i++, RELO_BSS, ".bss",
> >> +                                             obj->efile.global_bss);
> >> +       if (obj->efile.data_shndx >= 0 && !err)
> >> +               err = bpf_object__init_global(obj, i++, RELO_DATA, ".data",
> >> +                                             obj->efile.global_data);
> >> +       if (obj->efile.rodata_shndx >= 0 && !err)
> >> +               err = bpf_object__init_global(obj, i++, RELO_RODATA, ".rodata",
> >> +                                             obj->efile.global_rodata);
> >
> > Here we know exactly what type of map we are creating, so we can just
> > directly pass all the required structs/flags/data.
> >
> > Also, to speed up and simplify relocation processing below, I think
> > it's better to store map indexes for each of available .bss, .data and
> > .rodata maps, eliminating another need for having three different
> > types of data relocations.
>
> Yep, I'll clean this up.
>
> >> +       return err;
> >> +}
> >> +
> >>  static bool section_have_execinstr(struct bpf_object *obj, int idx)
> >>  {
> >>         Elf_Scn *scn;
> >> @@ -865,6 +964,12 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
> >>                                         pr_warning("failed to alloc program %s (%s): %s",
> >>                                                    name, obj->path, cp);
> >>                                 }
> >> +                       } else if (strcmp(name, ".data") == 0) {
> >> +                               obj->efile.global_data = data;
> >> +                               obj->efile.data_shndx = idx;
> >> +                       } else if (strcmp(name, ".rodata") == 0) {
> >> +                               obj->efile.global_rodata = data;
> >> +                               obj->efile.rodata_shndx = idx;
> >>                         }
> >
> > Previously if we encountered unknown PROGBITS section, we'd emit debug
> > message about skipping section, should we add that message here?
>
> Sounds reasonable, I'll add a similar 'skip section' debug output there.
>
> >>                 } else if (sh.sh_type == SHT_REL) {
> >>                         void *reloc = obj->efile.reloc;
> >> @@ -892,6 +997,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
> >>                                 obj->efile.reloc[n].shdr = sh;
> >>                                 obj->efile.reloc[n].data = data;
> >>                         }
> >> +               } else if (sh.sh_type == SHT_NOBITS && strcmp(name, ".bss") == 0) {
> >> +                       obj->efile.global_bss = data;
> >> +                       obj->efile.bss_shndx = idx;
> >>                 } else {
> >>                         pr_debug("skip section(%d) %s\n", idx, name);
> >>                 }
> >> @@ -923,6 +1031,14 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
> >>                 if (err)
> >>                         goto out;
> >>         }
> >> +       if (obj->efile.data_shndx >= 0 ||
> >> +           obj->efile.rodata_shndx >= 0 ||
> >> +           obj->efile.bss_shndx >= 0) {
> >> +               err = bpf_object__init_global_maps(obj);
> >> +               if (err)
> >> +                       goto out;
> >> +       }
> >> +
> >>         err = bpf_object__init_prog_names(obj);
> >>  out:
> >>         return err;
> >> @@ -961,6 +1077,11 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
> >>         Elf_Data *symbols = obj->efile.symbols;
> >>         int text_shndx = obj->efile.text_shndx;
> >>         int maps_shndx = obj->efile.maps_shndx;
> >> +       int data_shndx = obj->efile.data_shndx;
> >> +       int rodata_shndx = obj->efile.rodata_shndx;
> >> +       int bss_shndx = obj->efile.bss_shndx;
> >> +       struct bpf_map *maps_global = obj->maps_global;
> >> +       size_t nr_maps_global = obj->nr_maps_global;
> >>         struct bpf_map *maps = obj->maps;
> >>         size_t nr_maps = obj->nr_maps;
> >>         int i, nrels;
> >> @@ -999,8 +1120,10 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
> >>                          (long long) (rel.r_info >> 32),
> >>                          (long long) sym.st_value, sym.st_name);
> >>
> >> -               if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx) {
> >> -                       pr_warning("Program '%s' contains non-map related relo data pointing to section %u\n",
> >> +               if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx &&
> >> +                   sym.st_shndx != data_shndx && sym.st_shndx != rodata_shndx &&
> >> +                   sym.st_shndx != bss_shndx) {
> >> +                       pr_warning("Program '%s' contains unrecognized relo data pointing to section %u\n",
> >>                                    prog->section_name, sym.st_shndx);
> >>                         return -LIBBPF_ERRNO__RELOC;
> >>                 }
> >> @@ -1045,6 +1168,30 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
> >>                         prog->reloc_desc[i].type = RELO_LD64;
> >>                         prog->reloc_desc[i].insn_idx = insn_idx;
> >>                         prog->reloc_desc[i].map_idx = map_idx;
> >> +               } else if (sym.st_shndx == data_shndx ||
> >> +                          sym.st_shndx == rodata_shndx ||
> >> +                          sym.st_shndx == bss_shndx) {
> >> +                       int type = (sym.st_shndx == data_shndx)   ? RELO_DATA :
> >> +                                  (sym.st_shndx == rodata_shndx) ? RELO_RODATA :
> >> +                                                                   RELO_BSS;
> >> +
> >> +                       for (map_idx = 0; map_idx < nr_maps_global; map_idx++) {
> >> +                               if (maps_global[map_idx].global_type == type) {
> >> +                                       pr_debug("relocation: find map %zd (%s) for insn %u\n",
> >> +                                                map_idx, maps_global[map_idx].name, insn_idx);
> >> +                                       break;
> >> +                               }
> >> +                       }
> >> +
> >> +                       if (map_idx >= nr_maps_global) {
> >> +                               pr_warning("bpf relocation: map_idx %d large than %d\n",
> >> +                                          (int)map_idx, (int)nr_maps_global - 1);
> >> +                               return -LIBBPF_ERRNO__RELOC;
> >> +                       }
> >
> > We don't need to handle all of this if we just remember global map
> > indicies during creation, instead of calculating type, we can just
> > pick correct index (and check it exists). And type can be just generic
> > RELO_DATA.
> >
> >> +
> >> +                       prog->reloc_desc[i].type = type;
> >> +                       prog->reloc_desc[i].insn_idx = insn_idx;
> >> +                       prog->reloc_desc[i].map_idx = map_idx;
> >>                 }
> >>         }
> >>         return 0;
> >> @@ -1176,15 +1323,58 @@ bpf_object__probe_caps(struct bpf_object *obj)
> >>  }
> >>
> >>  static int
> >> -bpf_object__create_maps(struct bpf_object *obj)
> >> +bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map)
> >>  {
> >>         struct bpf_create_map_attr create_attr = {};
> >> +       struct bpf_map_def *def = &map->def;
> >> +       char *cp, errmsg[STRERR_BUFSIZE];
> >> +       int fd;
> >> +
> >> +       if (obj->caps.name)
> >> +               create_attr.name = map->name;
> >> +       create_attr.map_ifindex = map->map_ifindex;
> >> +       create_attr.map_type = def->type;
> >> +       create_attr.map_flags = def->map_flags;
> >> +       create_attr.key_size = def->key_size;
> >> +       create_attr.value_size = def->value_size;
> >> +       create_attr.max_entries = def->max_entries;
> >> +       create_attr.btf_fd = 0;
> >> +       create_attr.btf_key_type_id = 0;
> >> +       create_attr.btf_value_type_id = 0;
> >> +       if (bpf_map_type__is_map_in_map(def->type) &&
> >> +           map->inner_map_fd >= 0)
> >> +               create_attr.inner_map_fd = map->inner_map_fd;
> >> +       if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
> >> +               create_attr.btf_fd = btf__fd(obj->btf);
> >> +               create_attr.btf_key_type_id = map->btf_key_type_id;
> >> +               create_attr.btf_value_type_id = map->btf_value_type_id;
> >> +       }
> >> +
> >> +       fd = bpf_create_map_xattr(&create_attr);
> >> +       if (fd < 0 && create_attr.btf_key_type_id) {
> >> +               cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
> >> +               pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
> >> +                          map->name, cp, errno);
> >> +
> >> +               create_attr.btf_fd = 0;
> >> +               create_attr.btf_key_type_id = 0;
> >> +               create_attr.btf_value_type_id = 0;
> >> +               map->btf_key_type_id = 0;
> >> +               map->btf_value_type_id = 0;
> >> +               fd = bpf_create_map_xattr(&create_attr);
> >> +       }
> >> +
> >> +       return fd;
> >> +}
> >> +
> >> +static int
> >> +bpf_object__create_maps(struct bpf_object *obj)
> >> +{
> >>         unsigned int i;
> >>         int err;
> >>
> >>         for (i = 0; i < obj->nr_maps; i++) {
> >>                 struct bpf_map *map = &obj->maps[i];
> >> -               struct bpf_map_def *def = &map->def;
> >>                 char *cp, errmsg[STRERR_BUFSIZE];
> >>                 int *pfd = &map->fd;
> >>
> >> @@ -1193,41 +1383,7 @@ bpf_object__create_maps(struct bpf_object *obj)
> >>                                  map->name, map->fd);
> >>                         continue;
> >>                 }
> >> -
> >> -               if (obj->caps.name)
> >> -                       create_attr.name = map->name;
> >> -               create_attr.map_ifindex = map->map_ifindex;
> >> -               create_attr.map_type = def->type;
> >> -               create_attr.map_flags = def->map_flags;
> >> -               create_attr.key_size = def->key_size;
> >> -               create_attr.value_size = def->value_size;
> >> -               create_attr.max_entries = def->max_entries;
> >> -               create_attr.btf_fd = 0;
> >> -               create_attr.btf_key_type_id = 0;
> >> -               create_attr.btf_value_type_id = 0;
> >> -               if (bpf_map_type__is_map_in_map(def->type) &&
> >> -                   map->inner_map_fd >= 0)
> >> -                       create_attr.inner_map_fd = map->inner_map_fd;
> >> -
> >> -               if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
> >> -                       create_attr.btf_fd = btf__fd(obj->btf);
> >> -                       create_attr.btf_key_type_id = map->btf_key_type_id;
> >> -                       create_attr.btf_value_type_id = map->btf_value_type_id;
> >> -               }
> >> -
> >> -               *pfd = bpf_create_map_xattr(&create_attr);
> >> -               if (*pfd < 0 && create_attr.btf_key_type_id) {
> >> -                       cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
> >> -                       pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
> >> -                                  map->name, cp, errno);
> >> -                       create_attr.btf_fd = 0;
> >> -                       create_attr.btf_key_type_id = 0;
> >> -                       create_attr.btf_value_type_id = 0;
> >> -                       map->btf_key_type_id = 0;
> >> -                       map->btf_value_type_id = 0;
> >> -                       *pfd = bpf_create_map_xattr(&create_attr);
> >> -               }
> >> -
> >> +               *pfd = bpf_object__create_map(obj, map);
> >>                 if (*pfd < 0) {
> >>                         size_t j;
> >>
> >> @@ -1412,6 +1568,24 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
> >>                                                       &prog->reloc_desc[i]);
> >>                         if (err)
> >>                                 return err;
> >> +               } else if (prog->reloc_desc[i].type == RELO_DATA ||
> >> +                          prog->reloc_desc[i].type == RELO_RODATA ||
> >> +                          prog->reloc_desc[i].type == RELO_BSS) {
> >> +                       struct bpf_insn *insns = prog->insns;
> >> +                       int insn_idx, map_idx, data_off;
> >> +
> >> +                       insn_idx = prog->reloc_desc[i].insn_idx;
> >> +                       map_idx  = prog->reloc_desc[i].map_idx;
> >> +                       data_off = insns[insn_idx].imm;
> >> +
> >> +                       if (insn_idx + 1 >= (int)prog->insns_cnt) {
> >> +                               pr_warning("relocation out of range: '%s'\n",
> >> +                                          prog->section_name);
> >> +                               return -LIBBPF_ERRNO__RELOC;
> >> +                       }
> >> +                       insns[insn_idx].src_reg = BPF_PSEUDO_MAP_VALUE;
> >> +                       insns[insn_idx].imm = obj->maps_global[map_idx].fd;
> >> +                       insns[insn_idx + 1].imm = data_off;
> >>                 }
> >>         }
> >>
> >> @@ -1717,6 +1891,7 @@ __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz,
> >>
> >>         CHECK_ERR(bpf_object__elf_init(obj), err, out);
> >>         CHECK_ERR(bpf_object__check_endianness(obj), err, out);
> >> +       CHECK_ERR(bpf_object__probe_caps(obj), err, out);
> >>         CHECK_ERR(bpf_object__elf_collect(obj, flags), err, out);
> >>         CHECK_ERR(bpf_object__collect_reloc(obj), err, out);
> >>         CHECK_ERR(bpf_object__validate(obj, needs_kver), err, out);
> >> @@ -1789,7 +1964,8 @@ int bpf_object__unload(struct bpf_object *obj)
> >>
> >>         for (i = 0; i < obj->nr_maps; i++)
> >>                 zclose(obj->maps[i].fd);
> >> -
> >> +       for (i = 0; i < obj->nr_maps_global; i++)
> >> +               zclose(obj->maps_global[i].fd);
> >>         for (i = 0; i < obj->nr_programs; i++)
> >>                 bpf_program__unload(&obj->programs[i]);
> >>
> >> @@ -1810,7 +1986,6 @@ int bpf_object__load(struct bpf_object *obj)
> >>
> >>         obj->loaded = true;
> >>
> >> -       CHECK_ERR(bpf_object__probe_caps(obj), err, out);
> >>         CHECK_ERR(bpf_object__create_maps(obj), err, out);
> >>         CHECK_ERR(bpf_object__relocate(obj), err, out);
> >>         CHECK_ERR(bpf_object__load_progs(obj), err, out);
> >> --
> >> 2.17.1
> >>
> >
> > I'm sorry if I seem a bit too obsessed with those three new relocation
> > types. I just believe that having one generic type and storing
> > global maps along with the other maps is cleaner and more uniform.
>
> No worries, thanks for all your feedback and review!
>
> Thanks,
> Daniel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections
  2019-03-01 18:11   ` Yonghong Song
@ 2019-03-01 18:48     ` Andrii Nakryiko
  2019-03-01 18:58       ` Yonghong Song
  2019-03-01 19:56     ` Daniel Borkmann
  1 sibling, 1 reply; 46+ messages in thread
From: Andrii Nakryiko @ 2019-03-01 18:48 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Daniel Borkmann, Alexei Starovoitov, bpf, netdev, joe,
	john.fastabend, tgraf, Andrii Nakryiko, jakub.kicinski, lmb

On Fri, Mar 1, 2019 at 10:31 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 2/28/19 3:18 PM, Daniel Borkmann wrote:
> > This work adds BPF loader support for global data sections
> > to libbpf. This allows to write BPF programs in more natural
> > C-like way by being able to define global variables and const
> > data.
> >
> > [...]
> >
> > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> > index 8884072e1a46..04b26f59b413 100644
> > --- a/tools/include/uapi/linux/bpf.h
> > +++ b/tools/include/uapi/linux/bpf.h
> > @@ -287,7 +287,7 @@ enum bpf_attach_type {
> > [...]
> > @@ -999,8 +1120,10 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
> >                        (long long) (rel.r_info >> 32),
> >                        (long long) sym.st_value, sym.st_name);
> >
> > -             if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx) {
> > -                     pr_warning("Program '%s' contains non-map related relo data pointing to section %u\n",
> > +             if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx &&
> > +                 sym.st_shndx != data_shndx && sym.st_shndx != rodata_shndx &&
> > +                 sym.st_shndx != bss_shndx) {
> > +                     pr_warning("Program '%s' contains unrecognized relo data pointing to section %u\n",
> >                                  prog->section_name, sym.st_shndx);
> >                       return -LIBBPF_ERRNO__RELOC;
> >               }
> > @@ -1045,6 +1168,30 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
> >                       prog->reloc_desc[i].type = RELO_LD64;
> >                       prog->reloc_desc[i].insn_idx = insn_idx;
> >                       prog->reloc_desc[i].map_idx = map_idx;
> > +             } else if (sym.st_shndx == data_shndx ||
> > +                        sym.st_shndx == rodata_shndx ||
> > +                        sym.st_shndx == bss_shndx) {
> > +                     int type = (sym.st_shndx == data_shndx)   ? RELO_DATA :
> > +                                (sym.st_shndx == rodata_shndx) ? RELO_RODATA :
> > +                                                                 RELO_BSS;
> > +
> > +                     for (map_idx = 0; map_idx < nr_maps_global; map_idx++) {
> > +                             if (maps_global[map_idx].global_type == type) {
> > +                                     pr_debug("relocation: find map %zd (%s) for insn %u\n",
> > +                                              map_idx, maps_global[map_idx].name, insn_idx);
> > +                                     break;
> > +                             }
> > +                     }
> > +
> > +                     if (map_idx >= nr_maps_global) {
> > +                             pr_warning("bpf relocation: map_idx %d large than %d\n",
> > +                                        (int)map_idx, (int)nr_maps_global - 1);
> > +                             return -LIBBPF_ERRNO__RELOC;
> > +                     }
> > +
> > +                     prog->reloc_desc[i].type = type;
> > +                     prog->reloc_desc[i].insn_idx = insn_idx;
> > +                     prog->reloc_desc[i].map_idx = map_idx;
> >               }
> >       }
> >       return 0;
> > @@ -1176,15 +1323,58 @@ bpf_object__probe_caps(struct bpf_object *obj)
> >   }
> >
> >   static int
> [...]
> > +
> > +static int
> > +bpf_object__create_maps(struct bpf_object *obj)
> > +{
> >       unsigned int i;
> >       int err;
> >
> >       for (i = 0; i < obj->nr_maps; i++) {
> >               struct bpf_map *map = &obj->maps[i];
> > -             struct bpf_map_def *def = &map->def;
> >               char *cp, errmsg[STRERR_BUFSIZE];
> >               int *pfd = &map->fd;
> >
> > @@ -1193,41 +1383,7 @@ bpf_object__create_maps(struct bpf_object *obj)
> >                                map->name, map->fd);
> >                       continue;
> >               }
> > -
> > -             if (obj->caps.name)
> > -                     create_attr.name = map->name;
> > -             create_attr.map_ifindex = map->map_ifindex;
> > -             create_attr.map_type = def->type;
> > -             create_attr.map_flags = def->map_flags;
> > -             create_attr.key_size = def->key_size;
> > -             create_attr.value_size = def->value_size;
> > -             create_attr.max_entries = def->max_entries;
> > -             create_attr.btf_fd = 0;
> > -             create_attr.btf_key_type_id = 0;
> > -             create_attr.btf_value_type_id = 0;
> > -             if (bpf_map_type__is_map_in_map(def->type) &&
> > -                 map->inner_map_fd >= 0)
> > -                     create_attr.inner_map_fd = map->inner_map_fd;
> > -
> > -             if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
> > -                     create_attr.btf_fd = btf__fd(obj->btf);
> > -                     create_attr.btf_key_type_id = map->btf_key_type_id;
> > -                     create_attr.btf_value_type_id = map->btf_value_type_id;
> > -             }
> > -
> > -             *pfd = bpf_create_map_xattr(&create_attr);
> > -             if (*pfd < 0 && create_attr.btf_key_type_id) {
> > -                     cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
> > -                     pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
> > -                                map->name, cp, errno);
> > -                     create_attr.btf_fd = 0;
> > -                     create_attr.btf_key_type_id = 0;
> > -                     create_attr.btf_value_type_id = 0;
> > -                     map->btf_key_type_id = 0;
> > -                     map->btf_value_type_id = 0;
> > -                     *pfd = bpf_create_map_xattr(&create_attr);
> > -             }
> > -
> > +             *pfd = bpf_object__create_map(obj, map);
> >               if (*pfd < 0) {
> >                       size_t j;
> >
> > @@ -1412,6 +1568,24 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
> >                                                     &prog->reloc_desc[i]);
> >                       if (err)
> >                               return err;
> > +             } else if (prog->reloc_desc[i].type == RELO_DATA ||
> > +                        prog->reloc_desc[i].type == RELO_RODATA ||
> > +                        prog->reloc_desc[i].type == RELO_BSS) {
> > +                     struct bpf_insn *insns = prog->insns;
> > +                     int insn_idx, map_idx, data_off;
> > +
> > +                     insn_idx = prog->reloc_desc[i].insn_idx;
> > +                     map_idx  = prog->reloc_desc[i].map_idx;
> > +                     data_off = insns[insn_idx].imm;
>
> I want to point out a subtle difference here between handling pure
> global variables and static global variables. The "imm" value is only
> available for static variables. For example,
>
> -bash-4.4$ cat g.c
> static volatile long sg = 2;
> static volatile int si = 3;
> long g = 4;
> int i = 5;
> int test() { return sg + si + g + i; }
> -bash-4.4$
> -bash-4.4$ clang -target bpf -O2 -c g.c
>
> -bash-4.4$ readelf -s g.o
>
>
> Symbol table '.symtab' contains 8 entries:
>     Num:    Value          Size Type    Bind   Vis      Ndx Name
>       0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
>       1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS g.c
>       2: 0000000000000010     8 OBJECT  LOCAL  DEFAULT    4 sg
>       3: 0000000000000018     4 OBJECT  LOCAL  DEFAULT    4 si
>       4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
>       5: 0000000000000000     8 OBJECT  GLOBAL DEFAULT    4 g
>       6: 0000000000000008     4 OBJECT  GLOBAL DEFAULT    4 i
>       7: 0000000000000000   128 FUNC    GLOBAL DEFAULT    2 test
> -bash-4.4$
> -bash-4.4$ llvm-readelf -r g.o
>
> Relocation section '.rel.text' at offset 0x1d8 contains 4 entries:
>      Offset             Info             Type               Symbol's Value   Symbol's Name
> 0000000000000000  0000000400000001 R_BPF_64_64            0000000000000000 .data
> 0000000000000018  0000000400000001 R_BPF_64_64            0000000000000000 .data
> 0000000000000038  0000000500000001 R_BPF_64_64            0000000000000000 g
> 0000000000000058  0000000600000001 R_BPF_64_64            0000000000000008 i
> -bash-4.4$ llvm-objdump -d g.o
>
> g.o:    file format ELF64-BPF
>
> Disassembly of section .text:
> 0000000000000000 test:
>         0:       18 01 00 00 10 00 00 00 00 00 00 00 00 00 00 00        r1 = 16 ll
>         2:       79 11 00 00 00 00 00 00         r1 = *(u64 *)(r1 + 0)
>         3:       18 02 00 00 18 00 00 00 00 00 00 00 00 00 00 00        r2 = 24 ll
>         5:       61 22 00 00 00 00 00 00         r2 = *(u32 *)(r2 + 0)
>         6:       0f 21 00 00 00 00 00 00         r1 += r2
>         7:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00        r2 = 0 ll
>         9:       79 22 00 00 00 00 00 00         r2 = *(u64 *)(r2 + 0)
>        10:       0f 21 00 00 00 00 00 00         r1 += r2
>        11:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00        r2 = 0 ll
>        13:       61 20 00 00 00 00 00 00         r0 = *(u32 *)(r2 + 0)
>        14:       0f 10 00 00 00 00 00 00         r0 += r1
>        15:       95 00 00 00 00 00 00 00         exit
>        13:       61 20 00 00 00 00 00 00         r0 = *(u32 *)(r2 + 0)
>        14:       0f 10 00 00 00 00 00 00         r0 += r1
>        15:       95 00 00 00 00 00 00 00         exit
> -bash-4.4$
>
> As you can see above, a non-static global access does not have its
> in-section offset encoded in the insn itself. The difference is due to
> LLVM treating static globals and non-static globals differently.
>
> To support both cases, during the relocation recording stage you can
> also record:
>     . the symbol binding (GELF_ST_BIND(sym.st_info)); a non-static
>       global has binding STB_GLOBAL and a static global has
>       binding STB_LOCAL
>     . the symbol value (sym.st_value)
>
> During the relocation resolution above, if the symbol binding is
> local, do what you already did here. If the symbol binding is global,
> assign data_off from the symbol value.
>
> This applies to both .data and .rodata sections.
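>
> Roughly, with a couple of hypothetical new reloc_desc fields
> (untested sketch):
>
>     /* at relocation recording time */
>     prog->reloc_desc[i].sym_bind  = GELF_ST_BIND(sym.st_info);
>     prog->reloc_desc[i].sym_value = sym.st_value;
>
>     /* at relocation resolution time */
>     if (prog->reloc_desc[i].sym_bind == STB_GLOBAL)
>             data_off = prog->reloc_desc[i].sym_value;
>     else    /* STB_LOCAL: LLVM encoded the offset in the insn imm */
>             data_off = insns[insn_idx].imm;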
>
> A non-initialized global variable will not be in any allocated
> section in the ELF file; it is in a COMMON section, which is to be
> allocated by the loader. So a user defines something like
>     int g;
> and later on uses it. Right now, that will not work. The workaround
> is "int g = 4" or "static int g". I guess that should be
> okay; we should encourage users to use "static" variables instead.

Would it be reasonable to just disable the use of uninitialized global
variables altogether, as it kind of goes against BPF's philosophy that
everything should be written to before it can be read? While we could
just implicitly zero out everything beforehand, it might be a good
idea to remind users of that and enforce it explicitly.
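
E.g., something along these lines in libbpf's relocation collection
(untested sketch; the warning text and error code are arbitrary):

    /* reject uninitialized globals (COMMON symbols) up front */
    if (sym.st_shndx == SHN_COMMON) {
        pr_warning("global var '%s' is uninitialized; initialize it or make it static\n",
                   name);
        return -LIBBPF_ERRNO__RELOC;
    }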

>
> > +
> > +                     if (insn_idx + 1 >= (int)prog->insns_cnt) {
> > +                             pr_warning("relocation out of range: '%s'\n",
> > +                                        prog->section_name);
> > +                             return -LIBBPF_ERRNO__RELOC;
> > +                     }
> > +                     insns[insn_idx].src_reg = BPF_PSEUDO_MAP_VALUE;
> > +                     insns[insn_idx].imm = obj->maps_global[map_idx].fd;
> > +                     insns[insn_idx + 1].imm = data_off;
> >               }
> >       }
> >
> > @@ -1717,6 +1891,7 @@ __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz,
> >
> >       CHECK_ERR(bpf_object__elf_init(obj), err, out);
> >       CHECK_ERR(bpf_object__check_endianness(obj), err, out);
> > +     CHECK_ERR(bpf_object__probe_caps(obj), err, out);
> >       CHECK_ERR(bpf_object__elf_collect(obj, flags), err, out);
> >       CHECK_ERR(bpf_object__collect_reloc(obj), err, out);
> >       CHECK_ERR(bpf_object__validate(obj, needs_kver), err, out);
> > @@ -1789,7 +1964,8 @@ int bpf_object__unload(struct bpf_object *obj)
> >
> >       for (i = 0; i < obj->nr_maps; i++)
> >               zclose(obj->maps[i].fd);
> > -
> > +     for (i = 0; i < obj->nr_maps_global; i++)
> > +             zclose(obj->maps_global[i].fd);
> >       for (i = 0; i < obj->nr_programs; i++)
> >               bpf_program__unload(&obj->programs[i]);
> >
> > @@ -1810,7 +1986,6 @@ int bpf_object__load(struct bpf_object *obj)
> >
> >       obj->loaded = true;
> >
> > -     CHECK_ERR(bpf_object__probe_caps(obj), err, out);
> >       CHECK_ERR(bpf_object__create_maps(obj), err, out);
> >       CHECK_ERR(bpf_object__relocate(obj), err, out);
> >       CHECK_ERR(bpf_object__load_progs(obj), err, out);
> >

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 1/7] bpf: implement lookup-free direct value access
  2019-03-01  9:49     ` Daniel Borkmann
@ 2019-03-01 18:50       ` Jakub Kicinski
  2019-03-01 19:35       ` Andrii Nakryiko
  1 sibling, 0 replies; 46+ messages in thread
From: Jakub Kicinski @ 2019-03-01 18:50 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Andrii Nakryiko, Alexei Starovoitov, bpf, Networking, joe,
	john.fastabend, tgraf, Yonghong Song, Andrii Nakryiko, lmb

On Fri, 1 Mar 2019 10:49:46 +0100, Daniel Borkmann wrote:
> Overall, BPF_PSEUDO_MAP_VALUE felt slightly more suitable to me.

FWIW +1, I think a different type is way cleaner here.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections
  2019-03-01 18:48     ` Andrii Nakryiko
@ 2019-03-01 18:58       ` Yonghong Song
  2019-03-01 19:10         ` Andrii Nakryiko
  0 siblings, 1 reply; 46+ messages in thread
From: Yonghong Song @ 2019-03-01 18:58 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Daniel Borkmann, Alexei Starovoitov, bpf, netdev, joe,
	john.fastabend, tgraf, Andrii Nakryiko, jakub.kicinski, lmb



On 3/1/19 10:48 AM, Andrii Nakryiko wrote:
> On Fri, Mar 1, 2019 at 10:31 AM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 2/28/19 3:18 PM, Daniel Borkmann wrote:
>>> This work adds BPF loader support for global data sections
>>> to libbpf. This allows to write BPF programs in more natural
>>> C-like way by being able to define global variables and const
>>> data.
>>>
>>> Back at LPC 2018 [0] we presented a first prototype which
>>> implemented support for global data sections by extending BPF
>>> syscall where union bpf_attr would get additional memory/size
>>> pair for each section passed during prog load in order to later
>>> add this base address into the ldimm64 instruction along with
>>> the user provided offset when accessing a variable. Consensus
>>> from LPC was that for proper upstream support, it would be
>>> more desirable to use maps instead of bpf_attr extension as
>>> this would allow for introspection of these sections as well
>>> as potential life updates of their content. This work follows
>>> this path by taking the following steps from loader side:
>>>
>>>    1) In bpf_object__elf_collect() step we pick up ".data",
>>>       ".rodata", and ".bss" section information.
>>>
>>>    2) If present, in bpf_object__init_global_maps() we create
>>>       a map that corresponds to each of the present sections.
>>>       Given section size and access properties can differ, a
>>>       single entry array map is created with value size that
>>>       is corresponding to the ELF section size of .data, .bss
>>>       or .rodata. In the latter case, the map is created as
>>>       read-only from program side such that verifier rejects
>>>       any write attempts into .rodata. In a subsequent step,
>>>       for .data and .rodata sections, the section content is
>>>       copied into the map through bpf_map_update_elem(). For
>>>       .bss this is not necessary since array map is already
>>>       zero-initialized by default.
>>>
>>>    3) In bpf_program__collect_reloc() step, we record the
>>>       corresponding map, insn index, and relocation type for
>>>       the global data.
>>>
>>>    4) And last but not least in the actual relocation step in
>>>       bpf_program__relocate(), we mark the ldimm64 instruction
>>>       with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
>>>       imm field the map's file descriptor is stored as similarly
>>>       done as in BPF_PSEUDO_MAP_FD, and in the second imm field
>>>       (as ldimm64 is 2-insn wide) we store the access offset
>>>       into the section.
>>>
>>>    5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
>>>       load will then store the actual target address in order
>>>       to have a 'map-lookup'-free access. That is, the actual
>>>       map value base address + offset. The destination register
>>>       in the verifier will then be marked as PTR_TO_MAP_VALUE,
>>>       containing the fixed offset as reg->off and backing BPF
>>>       map as reg->map_ptr. Meaning, it's treated as any other
>>>       normal map value from verification side, only with
>>>       efficient, direct value access instead of actual call to
>>>       map lookup helper as in the typical case.
>>>
>>> Simple example dump of program using globals vars in each
>>> section:
>>>
>>>     # readelf -a test_global_data.o
>>>     [...]
>>>     [ 6] .bss              NOBITS           0000000000000000  00000328
>>>          0000000000000010  0000000000000000  WA       0     0     8
>>>     [ 7] .data             PROGBITS         0000000000000000  00000328
>>>          0000000000000010  0000000000000000  WA       0     0     8
>>>     [ 8] .rodata           PROGBITS         0000000000000000  00000338
>>>          0000000000000018  0000000000000000   A       0     0     8
>>>     [...]
>>>       95: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    6 static_bss
>>>       96: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    6 static_bss2
>>>       97: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    7 static_data
>>>       98: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    7 static_data2
>>>       99: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    8 static_rodata
>>>      100: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    8 static_rodata2
>>>      101: 0000000000000010     8 OBJECT  LOCAL  DEFAULT    8 static_rodata3
>>>     [...]
>>>
>>>     # bpftool prog
>>>     103: sched_cls  name load_static_dat  tag 37a8b6822fc39a29  gpl
>>>          loaded_at 2019-02-28T02:02:35+0000  uid 0
>>>          xlated 712B  jited 426B  memlock 4096B  map_ids 63,64,65,66
>>>     # bpftool map show id 63
>>>     63: array  name .bss  flags 0x0                      <-- .bss area, rw
>>>         key 4B  value 16B  max_entries 1  memlock 4096B
>>>     # bpftool map show id 64
>>>     64: array  name .data  flags 0x0                     <-- .data area, rw
>>>         key 4B  value 16B  max_entries 1  memlock 4096B
>>>     # bpftool map show id 65
>>>     65: array  name .rodata  flags 0x80                  <-- .rodata area, ro
>>>         key 4B  value 24B  max_entries 1  memlock 4096B
>>>
>>>     # bpftool prog dump xlated id 103
>>>     int load_static_data(struct __sk_buff * skb):
>>>     ; int load_static_data(struct __sk_buff *skb)
>>>        0: (b7) r1 = 0
>>>     ; key = 0;
>>>        1: (63) *(u32 *)(r10 -4) = r1
>>>        2: (bf) r6 = r10
>>>     ; int load_static_data(struct __sk_buff *skb)
>>>        3: (07) r6 += -4
>>>     ; bpf_map_update_elem(&result, &key, &static_bss, 0);
>>>        4: (18) r1 = map[id:66]
>>>        6: (bf) r2 = r6
>>>        7: (18) r3 = map[id:63][0]+0         <-- direct static_bss addr in .bss area
>>>        9: (b7) r4 = 0
>>>       10: (85) call array_map_update_elem#99888
>>>       11: (b7) r1 = 1
>>>     ; key = 1;
>>>       12: (63) *(u32 *)(r10 -4) = r1
>>>     ; bpf_map_update_elem(&result, &key, &static_data, 0);
>>>       13: (18) r1 = map[id:66]
>>>       15: (bf) r2 = r6
>>>       16: (18) r3 = map[id:64][0]+0         <-- direct static_data addr in .data area
>>>       18: (b7) r4 = 0
>>>       19: (85) call array_map_update_elem#99888
>>>       20: (b7) r1 = 2
>>>     ; key = 2;
>>>       21: (63) *(u32 *)(r10 -4) = r1
>>>     ; bpf_map_update_elem(&result, &key, &static_rodata, 0);
>>>       22: (18) r1 = map[id:66]
>>>       24: (bf) r2 = r6
>>>       25: (18) r3 = map[id:65][0]+0         <-- direct static_rodata addr in .rodata area
>>>       27: (b7) r4 = 0
>>>       28: (85) call array_map_update_elem#99888
>>>       29: (b7) r1 = 3
>>>     ; key = 3;
>>>       30: (63) *(u32 *)(r10 -4) = r1
>>>     ; bpf_map_update_elem(&result, &key, &static_bss2, 0);
>>>       31: (18) r7 = map[id:63][0]+8         <--.
>>>       33: (18) r1 = map[id:66]                 |
>>>       35: (bf) r2 = r6                         |
>>>       36: (18) r3 = map[id:63][0]+8         <-- direct static_bss2 addr in .bss area
>>>       38: (b7) r4 = 0
>>>       39: (85) call array_map_update_elem#99888
>>>     [...]
>>>
>>> For now .data/.rodata/.bss maps are not exposed via API to the
>>> user, but this could be done in a subsequent step.
>>>
>>> Based upon a recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't
>>> fail for static variables").
>>>
>>> Joint work with Joe Stringer.
>>>
>>>     [0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
>>>         http://vger.kernel.org/lpc-bpf2018.html#session-3
>>>
>>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>>> Signed-off-by: Joe Stringer <joe@wand.net.nz>
>>> ---
>>>    tools/include/uapi/linux/bpf.h |  10 +-
>>>    tools/lib/bpf/libbpf.c         | 259 +++++++++++++++++++++++++++------
>>>    2 files changed, 226 insertions(+), 43 deletions(-)
>>>
>>> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
>>> index 8884072e1a46..04b26f59b413 100644
>>> --- a/tools/include/uapi/linux/bpf.h
>>> +++ b/tools/include/uapi/linux/bpf.h
>>> @@ -287,7 +287,7 @@ enum bpf_attach_type {
>>> [...]
>>> @@ -999,8 +1120,10 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>>>                         (long long) (rel.r_info >> 32),
>>>                         (long long) sym.st_value, sym.st_name);
>>>
>>> -             if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx) {
>>> -                     pr_warning("Program '%s' contains non-map related relo data pointing to section %u\n",
>>> +             if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx &&
>>> +                 sym.st_shndx != data_shndx && sym.st_shndx != rodata_shndx &&
>>> +                 sym.st_shndx != bss_shndx) {
>>> +                     pr_warning("Program '%s' contains unrecognized relo data pointing to section %u\n",
>>>                                   prog->section_name, sym.st_shndx);
>>>                        return -LIBBPF_ERRNO__RELOC;
>>>                }
>>> @@ -1045,6 +1168,30 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>>>                        prog->reloc_desc[i].type = RELO_LD64;
>>>                        prog->reloc_desc[i].insn_idx = insn_idx;
>>>                        prog->reloc_desc[i].map_idx = map_idx;
>>> +             } else if (sym.st_shndx == data_shndx ||
>>> +                        sym.st_shndx == rodata_shndx ||
>>> +                        sym.st_shndx == bss_shndx) {
>>> +                     int type = (sym.st_shndx == data_shndx)   ? RELO_DATA :
>>> +                                (sym.st_shndx == rodata_shndx) ? RELO_RODATA :
>>> +                                                                 RELO_BSS;
>>> +
>>> +                     for (map_idx = 0; map_idx < nr_maps_global; map_idx++) {
>>> +                             if (maps_global[map_idx].global_type == type) {
>>> +                                     pr_debug("relocation: found map %zd (%s) for insn %u\n",
>>> +                                              map_idx, maps_global[map_idx].name, insn_idx);
>>> +                                     break;
>>> +                             }
>>> +                     }
>>> +
>>> +                     if (map_idx >= nr_maps_global) {
>>> +                             pr_warning("bpf relocation: map_idx %d larger than %d\n",
>>> +                                        (int)map_idx, (int)nr_maps_global - 1);
>>> +                             return -LIBBPF_ERRNO__RELOC;
>>> +                     }
>>> +
>>> +                     prog->reloc_desc[i].type = type;
>>> +                     prog->reloc_desc[i].insn_idx = insn_idx;
>>> +                     prog->reloc_desc[i].map_idx = map_idx;
>>>                }
>>>        }
>>>        return 0;
>>> @@ -1176,15 +1323,58 @@ bpf_object__probe_caps(struct bpf_object *obj)
>>>    }
>>>
>>>    static int
>> [...]
>>> +
>>> +static int
>>> +bpf_object__create_maps(struct bpf_object *obj)
>>> +{
>>>        unsigned int i;
>>>        int err;
>>>
>>>        for (i = 0; i < obj->nr_maps; i++) {
>>>                struct bpf_map *map = &obj->maps[i];
>>> -             struct bpf_map_def *def = &map->def;
>>>                char *cp, errmsg[STRERR_BUFSIZE];
>>>                int *pfd = &map->fd;
>>>
>>> @@ -1193,41 +1383,7 @@ bpf_object__create_maps(struct bpf_object *obj)
>>>                                 map->name, map->fd);
>>>                        continue;
>>>                }
>>> -
>>> -             if (obj->caps.name)
>>> -                     create_attr.name = map->name;
>>> -             create_attr.map_ifindex = map->map_ifindex;
>>> -             create_attr.map_type = def->type;
>>> -             create_attr.map_flags = def->map_flags;
>>> -             create_attr.key_size = def->key_size;
>>> -             create_attr.value_size = def->value_size;
>>> -             create_attr.max_entries = def->max_entries;
>>> -             create_attr.btf_fd = 0;
>>> -             create_attr.btf_key_type_id = 0;
>>> -             create_attr.btf_value_type_id = 0;
>>> -             if (bpf_map_type__is_map_in_map(def->type) &&
>>> -                 map->inner_map_fd >= 0)
>>> -                     create_attr.inner_map_fd = map->inner_map_fd;
>>> -
>>> -             if (obj->btf && !bpf_map_find_btf_info(map, obj->btf)) {
>>> -                     create_attr.btf_fd = btf__fd(obj->btf);
>>> -                     create_attr.btf_key_type_id = map->btf_key_type_id;
>>> -                     create_attr.btf_value_type_id = map->btf_value_type_id;
>>> -             }
>>> -
>>> -             *pfd = bpf_create_map_xattr(&create_attr);
>>> -             if (*pfd < 0 && create_attr.btf_key_type_id) {
>>> -                     cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
>>> -                     pr_warning("Error in bpf_create_map_xattr(%s):%s(%d). Retrying without BTF.\n",
>>> -                                map->name, cp, errno);
>>> -                     create_attr.btf_fd = 0;
>>> -                     create_attr.btf_key_type_id = 0;
>>> -                     create_attr.btf_value_type_id = 0;
>>> -                     map->btf_key_type_id = 0;
>>> -                     map->btf_value_type_id = 0;
>>> -                     *pfd = bpf_create_map_xattr(&create_attr);
>>> -             }
>>> -
>>> +             *pfd = bpf_object__create_map(obj, map);
>>>                if (*pfd < 0) {
>>>                        size_t j;
>>>
>>> @@ -1412,6 +1568,24 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
>>>                                                      &prog->reloc_desc[i]);
>>>                        if (err)
>>>                                return err;
>>> +             } else if (prog->reloc_desc[i].type == RELO_DATA ||
>>> +                        prog->reloc_desc[i].type == RELO_RODATA ||
>>> +                        prog->reloc_desc[i].type == RELO_BSS) {
>>> +                     struct bpf_insn *insns = prog->insns;
>>> +                     int insn_idx, map_idx, data_off;
>>> +
>>> +                     insn_idx = prog->reloc_desc[i].insn_idx;
>>> +                     map_idx  = prog->reloc_desc[i].map_idx;
>>> +                     data_off = insns[insn_idx].imm;
>>
>> I want to point out a subtle difference here between handling pure
>> global variables and static global variables: the "imm" value is only
>> available for static variables. For example,
>>
>> -bash-4.4$ cat g.c
>> static volatile long sg = 2;
>> static volatile int si = 3;
>> long g = 4;
>> int i = 5;
>> int test() { return sg + si + g + i; }
>> -bash-4.4$
>> -bash-4.4$ clang -target bpf -O2 -c g.c
>>
>> -bash-4.4$ readelf -s g.o
>>
>>
>> Symbol table '.symtab' contains 8 entries:
>>      Num:    Value          Size Type    Bind   Vis      Ndx Name
>>        0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
>>        1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS g.c
>>        2: 0000000000000010     8 OBJECT  LOCAL  DEFAULT    4 sg
>>        3: 0000000000000018     4 OBJECT  LOCAL  DEFAULT    4 si
>>        4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
>>        5: 0000000000000000     8 OBJECT  GLOBAL DEFAULT    4 g
>>        6: 0000000000000008     4 OBJECT  GLOBAL DEFAULT    4 i
>>        7: 0000000000000000   128 FUNC    GLOBAL DEFAULT    2 test
>> -bash-4.4$
>> -bash-4.4$ llvm-readelf -r g.o
>>
>> Relocation section '.rel.text' at offset 0x1d8 contains 4 entries:
>>       Offset             Info             Type               Symbol's Value  Symbol's Name
>> 0000000000000000  0000000400000001 R_BPF_64_64            0000000000000000 .data
>> 0000000000000018  0000000400000001 R_BPF_64_64            0000000000000000 .data
>> 0000000000000038  0000000500000001 R_BPF_64_64            0000000000000000 g
>> 0000000000000058  0000000600000001 R_BPF_64_64            0000000000000008 i
>> -bash-4.4$ llvm-objdump -d g.o
>>
>> g.o:    file format ELF64-BPF
>>
>> Disassembly of section .text:
>> 0000000000000000 test:
>>          0:       18 01 00 00 10 00 00 00 00 00 00 00 00 00 00 00        r1 = 16 ll
>>          2:       79 11 00 00 00 00 00 00         r1 = *(u64 *)(r1 + 0)
>>          3:       18 02 00 00 18 00 00 00 00 00 00 00 00 00 00 00        r2 = 24 ll
>>          5:       61 22 00 00 00 00 00 00         r2 = *(u32 *)(r2 + 0)
>>          6:       0f 21 00 00 00 00 00 00         r1 += r2
>>          7:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00        r2 = 0 ll
>>          9:       79 22 00 00 00 00 00 00         r2 = *(u64 *)(r2 + 0)
>>         10:       0f 21 00 00 00 00 00 00         r1 += r2
>>         11:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00        r2 = 0 ll
>>         13:       61 20 00 00 00 00 00 00         r0 = *(u32 *)(r2 + 0)
>>         14:       0f 10 00 00 00 00 00 00         r0 += r1
>>         15:       95 00 00 00 00 00 00 00         exit
>> -bash-4.4$
>>
>> You can see from the above that the non-static global accesses do not
>> have their in-section offset encoded in the insn itself. The
>> difference is due to llvm treating static and non-static globals
>> differently.
>>
>> To support both cases, during the relocation recording stage, you can
>> also record:
>>      . the symbol binding (GELF_ST_BIND(sym.st_info)):
>>        a non-static global has binding STB_GLOBAL and a static
>>        global has binding STB_LOCAL
>>      . the symbol value (sym.st_value)
>>
>> During the above relocation resolution, if the symbol binding is
>> local, do what you already do here. If the symbol binding is global,
>> assign data_off the symbol value instead. This applies to both the
>> .data and .rodata sections.
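>>
>> As a sketch (sym is the GElf_Sym of the relocation; data_off as a
>> hypothetical extra field recorded in reloc_desc):
>>
>>      /* at relocation recording time */
>>      if (GELF_ST_BIND(sym.st_info) == STB_GLOBAL)
>>              data_off = sym.st_value;        /* non-static global */
>>      else
>>              data_off = insns[insn_idx].imm; /* static, STB_LOCAL */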
>>
>> A non-initialized global variable will not be placed in any allocated
>> section in the ELF file; it goes into a COMMON section, which is to
>> be allocated by the loader. So if a user defines something like
>>      int g;
>> and later uses it, that will not work right now. The workaround is
>> "int g = 4" or "static int g". I guess that should be okay; we should
>> encourage users to use "static" variables instead.
> 
> Would it be reasonable to just plain disable usage of uninitialized
> global variables, as it kind of goes against BPF's philosophy that
> everything should be written to before it can be read? So while we
> could just implicitly zero out everything beforehand, it might be a
> good idea to remind users of that and enforce it explicitly?

There will be a verifier error, so a program with "int g" will not
run, the same as today.

We could improve this by flagging the error at compile time or at
libbpf time. But it is not required; I am mentioning it just for
completeness.
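
For libbpf, one way to flag this early would be to reject relocations
against COMMON symbols in bpf_program__collect_reloc(), e.g. (just a
sketch, not part of this series):

    if (sym.st_shndx == SHN_COMMON) {
            pr_warning("Program '%s' references uninitialized global variable; initialize it or make it static\n",
                       prog->section_name);
            return -LIBBPF_ERRNO__RELOC;
    }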

> 
>>
>>> +
>>> +                     if (insn_idx + 1 >= (int)prog->insns_cnt) {
>>> +                             pr_warning("relocation out of range: '%s'\n",
>>> +                                        prog->section_name);
>>> +                             return -LIBBPF_ERRNO__RELOC;
>>> +                     }
>>> +                     insns[insn_idx].src_reg = BPF_PSEUDO_MAP_VALUE;
>>> +                     insns[insn_idx].imm = obj->maps_global[map_idx].fd;
>>> +                     insns[insn_idx + 1].imm = data_off;
>>>                }
>>>        }
>>>
>>> @@ -1717,6 +1891,7 @@ __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz,
>>>
>>>        CHECK_ERR(bpf_object__elf_init(obj), err, out);
>>>        CHECK_ERR(bpf_object__check_endianness(obj), err, out);
>>> +     CHECK_ERR(bpf_object__probe_caps(obj), err, out);
>>>        CHECK_ERR(bpf_object__elf_collect(obj, flags), err, out);
>>>        CHECK_ERR(bpf_object__collect_reloc(obj), err, out);
>>>        CHECK_ERR(bpf_object__validate(obj, needs_kver), err, out);
>>> @@ -1789,7 +1964,8 @@ int bpf_object__unload(struct bpf_object *obj)
>>>
>>>        for (i = 0; i < obj->nr_maps; i++)
>>>                zclose(obj->maps[i].fd);
>>> -
>>> +     for (i = 0; i < obj->nr_maps_global; i++)
>>> +             zclose(obj->maps_global[i].fd);
>>>        for (i = 0; i < obj->nr_programs; i++)
>>>                bpf_program__unload(&obj->programs[i]);
>>>
>>> @@ -1810,7 +1986,6 @@ int bpf_object__load(struct bpf_object *obj)
>>>
>>>        obj->loaded = true;
>>>
>>> -     CHECK_ERR(bpf_object__probe_caps(obj), err, out);
>>>        CHECK_ERR(bpf_object__create_maps(obj), err, out);
>>>        CHECK_ERR(bpf_object__relocate(obj), err, out);
>>>        CHECK_ERR(bpf_object__load_progs(obj), err, out);
>>>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections
  2019-03-01 18:58       ` Yonghong Song
@ 2019-03-01 19:10         ` Andrii Nakryiko
  2019-03-01 19:19           ` Yonghong Song
  0 siblings, 1 reply; 46+ messages in thread
From: Andrii Nakryiko @ 2019-03-01 19:10 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Daniel Borkmann, Alexei Starovoitov, bpf, netdev, joe,
	john.fastabend, tgraf, Andrii Nakryiko, jakub.kicinski, lmb

On Fri, Mar 1, 2019 at 10:58 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 3/1/19 10:48 AM, Andrii Nakryiko wrote:
> > On Fri, Mar 1, 2019 at 10:31 AM Yonghong Song <yhs@fb.com> wrote:
> >>
> >>
> >>
> >> On 2/28/19 3:18 PM, Daniel Borkmann wrote:
> >>> [...]
> >>> @@ -1412,6 +1568,24 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
> >>>                                                      &prog->reloc_desc[i]);
> >>>                        if (err)
> >>>                                return err;
> >>> +             } else if (prog->reloc_desc[i].type == RELO_DATA ||
> >>> +                        prog->reloc_desc[i].type == RELO_RODATA ||
> >>> +                        prog->reloc_desc[i].type == RELO_BSS) {
> >>> +                     struct bpf_insn *insns = prog->insns;
> >>> +                     int insn_idx, map_idx, data_off;
> >>> +
> >>> +                     insn_idx = prog->reloc_desc[i].insn_idx;
> >>> +                     map_idx  = prog->reloc_desc[i].map_idx;
> >>> +                     data_off = insns[insn_idx].imm;
> >>
> >> [...]
> >>
> >> A non-initialized global variable will not be placed in any allocated
> >> section in the ELF file; it goes into a COMMON section, which is to
> >> be allocated by the loader. So if a user defines something like
> >>      int g;
> >> and later uses it, that will not work right now. The workaround is
> >> "int g = 4" or "static int g". I guess that should be okay; we should
> >> encourage users to use "static" variables instead.
> >
> > Would it be reasonable to just plain disable usage of uninitialized
> > global variables, as it kind of goes against BPF's philosophy that
> > everything should be written to before it can be read? So while we
> > could just implicitly zero out everything beforehand, it might be a
> > good idea to remind users of that and enforce it explicitly?
>
> There will be a verifier error, so a program with "int g" will not
> run, the same as today.

Yeah, I understand, but with a pretty obscure error about not
supporting relocations and such, right?

>
> We could improve this by flagging the error at compile time or at libbpf time.

So that's my point: having the compiler emit a nicer error for
target=bpf would be a nice touch for the user experience :)

> But it is not required; I am mentioning it just for completeness.
>
> >>> [...]

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 6/7] bpf, selftest: test global data/bss/rodata sections
  2019-02-28 23:18 ` [PATCH bpf-next v2 6/7] bpf, selftest: test " Daniel Borkmann
@ 2019-03-01 19:13   ` Andrii Nakryiko
  2019-03-01 20:02     ` Daniel Borkmann
  0 siblings, 1 reply; 46+ messages in thread
From: Andrii Nakryiko @ 2019-03-01 19:13 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, bpf, Networking, joe, john.fastabend, tgraf,
	Yonghong Song, Andrii Nakryiko, Jakub Kicinski, lmb

On Thu, Feb 28, 2019 at 3:32 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> From: Joe Stringer <joe@wand.net.nz>
>
> Add tests for libbpf relocation of static variable references
> into the .data, .rodata and .bss sections of the ELF. Tests with
> different offsets are all passing:
>
>   # ./test_progs
>   [...]
>   test_static_data_access:PASS:load program 0 nsec
>   test_static_data_access:PASS:pass packet 278 nsec
>   test_static_data_access:PASS:relocate .bss reference 1 278 nsec
>   test_static_data_access:PASS:relocate .data reference 1 278 nsec
>   test_static_data_access:PASS:relocate .rodata reference 1 278 nsec
>   test_static_data_access:PASS:relocate .bss reference 2 278 nsec
>   test_static_data_access:PASS:relocate .data reference 2 278 nsec
>   test_static_data_access:PASS:relocate .rodata reference 2 278 nsec
>   test_static_data_access:PASS:relocate .bss reference 3 278 nsec
>   test_static_data_access:PASS:relocate .bss reference 4 278 nsec
>   Summary: 223 PASSED, 0 FAILED
>
> Joint work with Daniel Borkmann.
>
> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  tools/testing/selftests/bpf/bpf_helpers.h     |  2 +-
>  .../selftests/bpf/progs/test_global_data.c    | 61 +++++++++++++++++++
>  tools/testing/selftests/bpf/test_progs.c      | 50 +++++++++++++++
>  3 files changed, 112 insertions(+), 1 deletion(-)
>  create mode 100644 tools/testing/selftests/bpf/progs/test_global_data.c
>
> diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
> index d9999f1ed1d2..0463662935f9 100644
> --- a/tools/testing/selftests/bpf/bpf_helpers.h
> +++ b/tools/testing/selftests/bpf/bpf_helpers.h
> @@ -11,7 +11,7 @@
>  /* helper functions called from eBPF programs written in C */
>  static void *(*bpf_map_lookup_elem)(void *map, void *key) =
>         (void *) BPF_FUNC_map_lookup_elem;
> -static int (*bpf_map_update_elem)(void *map, void *key, void *value,
> +static int (*bpf_map_update_elem)(void *map, const void *key, const void *value,
>                                   unsigned long long flags) =
>         (void *) BPF_FUNC_map_update_elem;
>  static int (*bpf_map_delete_elem)(void *map, void *key) =
> diff --git a/tools/testing/selftests/bpf/progs/test_global_data.c b/tools/testing/selftests/bpf/progs/test_global_data.c
> new file mode 100644
> index 000000000000..2a7cf40b8efb
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_global_data.c
> @@ -0,0 +1,61 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (c) 2019 Isovalent, Inc.
> +
> +#include <linux/bpf.h>
> +#include <linux/pkt_cls.h>
> +#include <string.h>
> +
> +#include "bpf_helpers.h"
> +
> +struct bpf_map_def SEC("maps") result = {
> +       .type           = BPF_MAP_TYPE_ARRAY,
> +       .key_size       = sizeof(__u32),
> +       .value_size     = sizeof(__u64),
> +       .max_entries    = 9,
> +};
> +
> +static       __u64 static_bss     = 0;         /* Reloc reference to .bss  section   */
> +static       __u64 static_data    = 42;                /* Reloc reference to .data section   */
> +static const __u64 static_rodata  = 24;                /* Reloc reference to .rodata section */
> +static       __u64 static_bss2    = 0;         /* Reloc reference to .bss  section   */
> +static       __u64 static_data2   = 0xffeeff;  /* Reloc reference to .data section   */
> +static const __u64 static_rodata2 = 0xabab;    /* Reloc reference to .rodata section */
> +static const __u64 static_rodata3 = 0xab;      /* Reloc reference to .rodata section */

In light of Yonghong's explanation about static vs non-static globals,
it would be nice to add a test for non-static initialized globals here
as well?
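
Something like the following (hypothetical names, just to illustrate):

  __u64 global_data = 0x1234;          /* non-static, lands in .data   */
  const __u64 global_rodata = 0x5678;  /* non-static, lands in .rodata */

plus matching bpf_map_update_elem() calls in the program and expected
values in the result table in test_progs.c.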

> +
> +SEC("static_data_load")
> +int load_static_data(struct __sk_buff *skb)
> +{
> +       __u32 key;
> +
> +       key = 0;
> +       bpf_map_update_elem(&result, &key, &static_bss, 0);
> +
> +       key = 1;
> +       bpf_map_update_elem(&result, &key, &static_data, 0);
> +
> +       key = 2;
> +       bpf_map_update_elem(&result, &key, &static_rodata, 0);
> +
> +       key = 3;
> +       bpf_map_update_elem(&result, &key, &static_bss2, 0);
> +
> +       key = 4;
> +       bpf_map_update_elem(&result, &key, &static_data2, 0);
> +
> +       key = 5;
> +       bpf_map_update_elem(&result, &key, &static_rodata2, 0);
> +
> +       key = 6;
> +       static_bss2 = 1234;
> +       bpf_map_update_elem(&result, &key, &static_bss2, 0);
> +
> +       key = 7;
> +       bpf_map_update_elem(&result, &key, &static_bss, 0);
> +
> +       key = 8;
> +       bpf_map_update_elem(&result, &key, &static_rodata3, 0);
> +
> +       return TC_ACT_OK;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> diff --git a/tools/testing/selftests/bpf/test_progs.c b/tools/testing/selftests/bpf/test_progs.c
> index c59d2e015d16..a3e64c054572 100644
> --- a/tools/testing/selftests/bpf/test_progs.c
> +++ b/tools/testing/selftests/bpf/test_progs.c
> @@ -738,6 +738,55 @@ static void test_pkt_md_access(void)
>         bpf_object__close(obj);
>  }
>
> +static void test_static_data_access(void)
> +{
> +       const char *file = "./test_global_data.o";
> +       struct bpf_object *obj;
> +       __u32 duration = 0, retval;
> +       int i, err, prog_fd, map_fd;
> +       uint64_t value;
> +
> +       err = bpf_prog_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, &prog_fd);
> +       if (CHECK(err, "load program", "error %d loading %s\n", err, file))
> +               return;
> +
> +       map_fd = bpf_find_map(__func__, obj, "result");
> +       if (map_fd < 0) {
> +               error_cnt++;
> +               goto close_prog;
> +       }
> +
> +       err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
> +                               NULL, NULL, &retval, &duration);
> +       CHECK(err || retval, "pass packet",
> +             "err %d errno %d retval %d duration %d\n",
> +             err, errno, retval, duration);
> +
> +       struct {
> +               char *name;
> +               uint32_t key;
> +               uint64_t value;
> +       } tests[] = {
> +               { "relocate .bss reference 1",    0, 0 },
> +               { "relocate .data reference 1",   1, 42 },
> +               { "relocate .rodata reference 1", 2, 24 },
> +               { "relocate .bss reference 2",    3, 0 },
> +               { "relocate .data reference 2",   4, 0xffeeff },
> +               { "relocate .rodata reference 2", 5, 0xabab },
> +               { "relocate .bss reference 3",    6, 1234 },
> +               { "relocate .bss reference 4",    7, 0 },
> +       };
> +       for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
> +               err = bpf_map_lookup_elem(map_fd, &tests[i].key, &value);
> +               CHECK (err || value != tests[i].value, tests[i].name,
> +                      "err %d result %lu expected %lu\n",
> +                      err, value, tests[i].value);
> +       }
> +
> +close_prog:
> +       bpf_object__close(obj);
> +}
> +
>  static void test_obj_name(void)
>  {
>         struct {
> @@ -2182,6 +2231,7 @@ int main(void)
>         test_map_lock();
>         test_signal_pending(BPF_PROG_TYPE_SOCKET_FILTER);
>         test_signal_pending(BPF_PROG_TYPE_FLOW_DISSECTOR);
> +       test_static_data_access();
>
>         printf("Summary: %d PASSED, %d FAILED\n", pass_cnt, error_cnt);
>         return error_cnt ? EXIT_FAILURE : EXIT_SUCCESS;
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections
  2019-03-01 19:10         ` Andrii Nakryiko
@ 2019-03-01 19:19           ` Yonghong Song
  2019-03-01 20:06             ` Daniel Borkmann
  0 siblings, 1 reply; 46+ messages in thread
From: Yonghong Song @ 2019-03-01 19:19 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Daniel Borkmann, Alexei Starovoitov, bpf, netdev, joe,
	john.fastabend, tgraf, Andrii Nakryiko, jakub.kicinski, lmb



On 3/1/19 11:10 AM, Andrii Nakryiko wrote:
> On Fri, Mar 1, 2019 at 10:58 AM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 3/1/19 10:48 AM, Andrii Nakryiko wrote:
>>> On Fri, Mar 1, 2019 at 10:31 AM Yonghong Song <yhs@fb.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2/28/19 3:18 PM, Daniel Borkmann wrote:
>>>>> [...]
>>>>>                                                       &prog->reloc_desc[i]);
>>>>>                         if (err)
>>>>>                                 return err;
>>>>> +             } else if (prog->reloc_desc[i].type == RELO_DATA ||
>>>>> +                        prog->reloc_desc[i].type == RELO_RODATA ||
>>>>> +                        prog->reloc_desc[i].type == RELO_BSS) {
>>>>> +                     struct bpf_insn *insns = prog->insns;
>>>>> +                     int insn_idx, map_idx, data_off;
>>>>> +
>>>>> +                     insn_idx = prog->reloc_desc[i].insn_idx;
>>>>> +                     map_idx  = prog->reloc_desc[i].map_idx;
>>>>> +                     data_off = insns[insn_idx].imm;
>>>>
>>>> I want to point to a subtle difference here between handling pure global
>>>> variables and static global variables. The "imm" value is only available
>>>> for static variables. For example,
>>>>
>>>> -bash-4.4$ cat g.c
>>>> static volatile long sg = 2;
>>>> static volatile int si = 3;
>>>> long g = 4;
>>>> int i = 5;
>>>> int test() { return sg + si + g + i; }
>>>> -bash-4.4$
>>>> -bash-4.4$ clang -target bpf -O2 -c g.c
>>>>
>>>> -bash-4.4$ readelf -s g.o
>>>>
>>>>
>>>> Symbol table '.symtab' contains 8 entries:
>>>>       Num:    Value          Size Type    Bind   Vis      Ndx Name
>>>>         0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
>>>>         1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS g.c
>>>>         2: 0000000000000010     8 OBJECT  LOCAL  DEFAULT    4 sg
>>>>         3: 0000000000000018     4 OBJECT  LOCAL  DEFAULT    4 si
>>>>         4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
>>>>         5: 0000000000000000     8 OBJECT  GLOBAL DEFAULT    4 g
>>>>         6: 0000000000000008     4 OBJECT  GLOBAL DEFAULT    4 i
>>>>         7: 0000000000000000   128 FUNC    GLOBAL DEFAULT    2 test
>>>> -bash-4.4$
>>>> -bash-4.4$ llvm-readelf -r g.o
>>>>
>>>> Relocation section '.rel.text' at offset 0x1d8 contains 4 entries:
>>>>        Offset             Info             Type               Symbol's Value  Symbol's Name
>>>> 0000000000000000  0000000400000001 R_BPF_64_64            0000000000000000 .data
>>>> 0000000000000018  0000000400000001 R_BPF_64_64            0000000000000000 .data
>>>> 0000000000000038  0000000500000001 R_BPF_64_64            0000000000000000 g
>>>> 0000000000000058  0000000600000001 R_BPF_64_64            0000000000000008 i
>>>> -bash-4.4$ llvm-objdump -d g.o
>>>>
>>>> g.o:    file format ELF64-BPF
>>>>
>>>> Disassembly of section .text:
>>>> 0000000000000000 test:
>>>>           0:       18 01 00 00 10 00 00 00 00 00 00 00 00 00 00 00        r1 = 16 ll
>>>>           2:       79 11 00 00 00 00 00 00         r1 = *(u64 *)(r1 + 0)
>>>>           3:       18 02 00 00 18 00 00 00 00 00 00 00 00 00 00 00        r2 = 24 ll
>>>>           5:       61 22 00 00 00 00 00 00         r2 = *(u32 *)(r2 + 0)
>>>>           6:       0f 21 00 00 00 00 00 00         r1 += r2
>>>>           7:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00        r2 = 0 ll
>>>>           9:       79 22 00 00 00 00 00 00         r2 = *(u64 *)(r2 + 0)
>>>>          10:       0f 21 00 00 00 00 00 00         r1 += r2
>>>>          11:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00        r2 = 0 ll
>>>>          13:       61 20 00 00 00 00 00 00         r0 = *(u32 *)(r2 + 0)
>>>>          14:       0f 10 00 00 00 00 00 00         r0 += r1
>>>>          15:       95 00 00 00 00 00 00 00         exit
>>>> -bash-4.4$
>>>>
>>>> As you can see above, a non-static global access does not have its
>>>> in-section offset encoded in the insn itself. The difference is due to
>>>> llvm treating static globals and non-static globals differently.
>>>>
>>>> To support both cases, during the relocation recording stage, you can
>>>> also record:
>>>>       . the symbol binding (GELF_ST_BIND(sym.st_info)),
>>>>         a non-static global has binding STB_GLOBAL and a static
>>>>         global has binding STB_LOCAL
>>>>       . the symbol value (sym.st_value)
>>>>
>>>> During the above relocation resolution, if the symbol binding is local,
>>>> do what you already did here. If the symbol binding is global, assign
>>>> data_off from the symbol value.
>>>>
>>>> This applies to both .data and .rodata sections.
>>>>
>>>> A non-initialized
>>>> global variable will not be in any allocated section in the ELF file;
>>>> it is in a COMMON section which is to be allocated by the loader.
>>>> So a user may define something like
>>>>       int g;
>>>> and later use it. Right now, this will not work. The workaround
>>>> is "int g = 4", or "static int g". I guess that should be
>>>> okay; we should encourage users to use "static" variables instead.
>>>
>>> Would it be reasonable to just plain disable usage of uninitialized
>>> global variables, as it kind of goes against BPF's philosophy that
>>> everything should be written to before it can be read? So while we can
>>> just implicitly zero-out everything beforehand, it might be a good
>>> idea to remind and enforce that explicitly?
>>
>> There will be a verifier error, so the program with "int g" will not
>> run, the same as today.
> 
> Yeah, I understand, but with a pretty obscure error about not supporting
> relocations and stuff, right?
> 
>>
>> We could improve this by flagging the error at compile time or at libbpf time.
> 
> So that's my point: having the compiler emit a nicer error for
> target=bpf would be a nice touch for the user experience :)

I just removed a compiler error for static variables...

I will wait for this patch to land, hear people's complaints (either a
need to support "int g;" or for better error messages, etc.) and then
decide what to do next ...

> 
>> But it is not required. I am mentioning it just for completeness.
>>
>>>
>>>>
>>>>> +
>>>>> +                     if (insn_idx + 1 >= (int)prog->insns_cnt) {
>>>>> +                             pr_warning("relocation out of range: '%s'\n",
>>>>> +                                        prog->section_name);
>>>>> +                             return -LIBBPF_ERRNO__RELOC;
>>>>> +                     }
>>>>> +                     insns[insn_idx].src_reg = BPF_PSEUDO_MAP_VALUE;
>>>>> +                     insns[insn_idx].imm = obj->maps_global[map_idx].fd;
>>>>> +                     insns[insn_idx + 1].imm = data_off;
>>>>>                 }
>>>>>         }
>>>>>
>>>>> @@ -1717,6 +1891,7 @@ __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz,
>>>>>
>>>>>         CHECK_ERR(bpf_object__elf_init(obj), err, out);
>>>>>         CHECK_ERR(bpf_object__check_endianness(obj), err, out);
>>>>> +     CHECK_ERR(bpf_object__probe_caps(obj), err, out);
>>>>>         CHECK_ERR(bpf_object__elf_collect(obj, flags), err, out);
>>>>>         CHECK_ERR(bpf_object__collect_reloc(obj), err, out);
>>>>>         CHECK_ERR(bpf_object__validate(obj, needs_kver), err, out);
>>>>> @@ -1789,7 +1964,8 @@ int bpf_object__unload(struct bpf_object *obj)
>>>>>
>>>>>         for (i = 0; i < obj->nr_maps; i++)
>>>>>                 zclose(obj->maps[i].fd);
>>>>> -
>>>>> +     for (i = 0; i < obj->nr_maps_global; i++)
>>>>> +             zclose(obj->maps_global[i].fd);
>>>>>         for (i = 0; i < obj->nr_programs; i++)
>>>>>                 bpf_program__unload(&obj->programs[i]);
>>>>>
>>>>> @@ -1810,7 +1986,6 @@ int bpf_object__load(struct bpf_object *obj)
>>>>>
>>>>>         obj->loaded = true;
>>>>>
>>>>> -     CHECK_ERR(bpf_object__probe_caps(obj), err, out);
>>>>>         CHECK_ERR(bpf_object__create_maps(obj), err, out);
>>>>>         CHECK_ERR(bpf_object__relocate(obj), err, out);
>>>>>         CHECK_ERR(bpf_object__load_progs(obj), err, out);
>>>>>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 1/7] bpf: implement lookup-free direct value access
  2019-03-01  9:49     ` Daniel Borkmann
  2019-03-01 18:50       ` Jakub Kicinski
@ 2019-03-01 19:35       ` Andrii Nakryiko
  2019-03-01 20:08         ` Jakub Kicinski
  1 sibling, 1 reply; 46+ messages in thread
From: Andrii Nakryiko @ 2019-03-01 19:35 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, bpf, Networking, Joe Stringer,
	john fastabend, tgraf, Yonghong Song, Andrii Nakryiko,
	Jakub Kicinski, lmb

On Fri, Mar 1, 2019 at 1:49 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 03/01/2019 06:46 AM, Andrii Nakryiko wrote:
> > On Thu, Feb 28, 2019 at 3:31 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >>
> >> This generic extension to BPF maps allows for directly loading an
> >> address residing inside a BPF map value as a single BPF ldimm64
> >> instruction.
> >
> > This is great! I'm going to review the code more thoroughly tomorrow, but
> > I also have a few questions/suggestions I'd like to discuss, if you
> > don't mind.
>
> Awesome, thanks!
>
> >> The idea is similar to what BPF_PSEUDO_MAP_FD does today, which
> >> is a special src_reg flag for ldimm64 instruction that indicates
> >> that inside the first part of the double insns's imm field is a
> >> file descriptor which the verifier then replaces as a full 64bit
> >> address of the map into both imm parts.
> >>
> >> For the newly added BPF_PSEUDO_MAP_VALUE src_reg flag, the idea
> >> is similar: the first part of the double insns's imm field is
> >> again a file descriptor corresponding to the map, and the second
> >> part of the imm field is an offset. The verifier will then replace
> >> both imm parts with an address that points into the BPF map value
> >> for maps that support this operation. BPF_PSEUDO_MAP_VALUE is a
> >> distinct flag as otherwise with BPF_PSEUDO_MAP_FD we could not
> >> differ offset 0 between load of map pointer versus load of map's
> >> value at offset 0.
> >
> > Is having both BPF_PSEUDO_MAP_FD and BPF_PSEUDO_MAP_VALUE a desirable
> > thing? I'm asking because it seems like it would be really simple to
> > stick to using just BPF_PSEUDO_MAP_FD and then interpret imm
> > differently depending on whether it's 0 or not. E.g., we can say that
> > imm=0 is old BPF_PSEUDO_MAP_FD behavior (loading map addr), but any
> > other imm value X is really just (X-1) offset into map's value? Or,
> > given that valid offset is limited to 1<<29, we can set highest-order
> > bit to 1 and lower bits would be offset? In other words, if we just
> > need to carve out zero as a special case, then it's easy to do and we
> > can avoid adding a new BPF_PSEUDO_MAP_VALUE.
>
> Was thinking about reusing BPF_PSEUDO_MAP_FD initially as mentioned in
> here, but went for BPF_PSEUDO_MAP_VALUE eventually to have a
> straightforward mapping. Your suggestion could be done, but it feels more
> complex than necessary, imho, meaning it might confuse users trying to
> make sense of an insn dump or verifier output wondering whether the
> off-by-one is a bug or not, which won't happen if the offset is exactly
> the same value as LLVM emits. There is also one more unfortunate reason
> which I noticed while implementing: in replace_map_fd_with_map_ptr()
> we never enforced that for BPF_PSEUDO_MAP_FD insns the second imm part
> must be 0, meaning it could also have a garbage value which would then
> break loaders in the wild; with the code today this is ignored and then
> overridden by the map address. We could try to push a patch to stable
> to reject anything non-zero in the second imm for BPF_PSEUDO_MAP_FD
> and see if anyone actually notices, and then use some higher-order bit
> as a selector, but that still would need some extra handling to make
> the offset clear for users wrt dumps; I can give it a try though to
> check how much more complex it gets. Worst case if something should
> really break somewhere, we might need to revert the imm==0 rejection
> though. Overall, BPF_PSEUDO_MAP_VALUE felt slightly more suitable to me.

Yeah, makes sense, BPF_PSEUDO_MAP_VALUE is more explicit.
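
(For my own notes, the resulting encoding as I understand it -- a sketch
of the two-insn ldimm64 pair as described above, not the exact kernel
macros; map_fd and offset are placeholders:)

    /* insn[0] carries the map fd in imm plus the new src_reg flag,
     * insn[1] carries the offset into the map value in its imm.
     */
    struct bpf_insn insns[2] = {
        { .code    = BPF_LD | BPF_DW | BPF_IMM,
          .dst_reg = BPF_REG_1,
          .src_reg = BPF_PSEUDO_MAP_VALUE,
          .imm     = map_fd },
        { .imm     = offset },
    };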

>
> >> This allows for efficiently retrieving an address to a map value
> >> memory area without having to issue a helper call which needs to
> >> prepare registers according to calling convention, etc, without
> >> needing the extra NULL test, and without having to add the offset
> >> in an additional instruction to the value base pointer.
> >
> > It seems like we allow this only for arrays of size 1 right now. We
> > can easily generalize this to support not just an offset into the map's
> > value, but also specifying an integer key (i.e., array index) by
> > utilizing the off fields (16-bit + 16-bit). This would allow eliminating
> > any bpf_map_update_elem calls to array maps altogether by allowing both
> > the array index and the offset into the value in one BPF instruction.
> > Do you think it's a good addition?
>
> Yeah, I've been thinking about this as well; for array-like maps
> it's easy to support and at the same time it lifts the restriction, too.
> I think it would be useful and straightforward to implement; I can
> include it in v3.
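
(For concreteness, the field split I had in mind -- just a sketch, not
something the series implements:)

    insn[0].src_reg = BPF_PSEUDO_MAP_VALUE
    insn[0].imm     = map fd
    insn[0].off     = lower 16 bits of the array index
    insn[1].off     = upper 16 bits of the array index
    insn[1].imm     = offset into the map value

so the verifier would compute addr = value_base(map, index) + offset.
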
>
> >> The verifier then treats the destination register as PTR_TO_MAP_VALUE
> >> with constant reg->off from the user passed offset from the second
> >> imm field, and guarantees that this is within bounds of the map
> >> value. Any subsequent operations are normally treated as typical
> >> map value handling without anything else needed for verification.
> >>
> >> The two map operations for direct value access have been added to
> >> array map for now. In future other types could be supported as
> >> well depending on the use case. The main use case for this commit
> >> is to allow for BPF loader support for global variables that
> >> reside in .data/.rodata/.bss sections such that we can directly
> >> load the address of them with minimal additional infrastructure
> >> required. Loader support has been added in subsequent commits for
> >> libbpf library.
> >
> > I was considering adding a new kind of map representing a contiguous
> > block of memory (e.g., how about BPF_MAP_TYPE_HEAP or
> > BPF_MAP_TYPE_BLOB?). Its keys would be offsets into that memory
> > region. The value size is the size of the memory region, but it would
> > allow reading smaller chunks of memory as values. This would provide a
> > convenient interface for poking at global variables from userland,
> > given an offset.
> >
> > Libbpf itself would provide a higher-level API as well, if there is
> > corresponding BTF type information describing the layout of
> > .data/.bss/.rodata, so that applications can fetch variables by name
> > and/or offset, whichever is more convenient. Together with
> > bpf_spinlock this would allow an easy way to customize subsets of
> > global variables in an atomic fashion.
> >
> > Do you think that would work? Using an array is a bit limiting, because
> > it doesn't allow partial reads/updates, while BPF_MAP_TYPE_HEAP
> > would be a single big value that allows partial reading/updating.
>
> If I understand it correctly, the main difference this would have is
> to be able to use spin_locks in a more fine-grained fashion, right?

spin_lock is just a nice bonus for when some manipulation longer than
8 bytes needs to be done atomically.

The reason for this new type of map is actually the ability to update
global variables from outside the BPF program in a granular fashion. E.g.,
turning on some extra debug output temporarily, tuning parameters,
changing the PID to trace, etc, without stopping and reloading the BPF
program. With an array, it's all-or-nothing: to update anything you have
to overwrite the entire .data section. As I mentioned, if we can get BTF
type information for the entirety of .data, it would allow manipulating
those variables by name even with generic tools like bpftool.
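
To illustrate: with the array approach, updating a single variable from
userspace means rewriting the whole section (a sketch using the bpf
syscall wrappers; data_map_fd, DATA_SEC_SIZE and debug_flag_off are
made-up placeholders, and error handling is omitted):

    /* read-modify-write of the entire .data value to change one var */
    char value[DATA_SEC_SIZE];
    __u32 key = 0, new_val = 1;

    bpf_map_lookup_elem(data_map_fd, &key, value);
    memcpy(value + debug_flag_off, &new_val, sizeof(new_val));
    bpf_map_update_elem(data_map_fd, &key, value, BPF_ANY);

With a BPF_MAP_TYPE_HEAP-style map, the lookup/update pair could target
just the bytes of the one variable instead.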

> Meaning, partial reads/updates of the memory area under spin_lock as
> opposed to having to lock over the full area? Yeah, sounds like a
> reasonable extension to me that could be done on top of the series;
> presumably most of the array map logic could also be reused for this,
> which is nice.
>
> Thanks a lot,
> Daniel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 1/7] bpf: implement lookup-free direct value access
  2019-03-01 17:18   ` Yonghong Song
@ 2019-03-01 19:51     ` Daniel Borkmann
  2019-03-01 23:02       ` Yonghong Song
  0 siblings, 1 reply; 46+ messages in thread
From: Daniel Borkmann @ 2019-03-01 19:51 UTC (permalink / raw)
  To: Yonghong Song, Alexei Starovoitov
  Cc: bpf, netdev, joe, john.fastabend, tgraf, Andrii Nakryiko,
	jakub.kicinski, lmb

On 03/01/2019 06:18 PM, Yonghong Song wrote:
> On 2/28/19 3:18 PM, Daniel Borkmann wrote:
>> This generic extension to BPF maps allows for directly loading an
>> address residing inside a BPF map value as a single BPF ldimm64
>> instruction.
>>
>> The idea is similar to what BPF_PSEUDO_MAP_FD does today, which
>> is a special src_reg flag for ldimm64 instruction that indicates
>> that inside the first part of the double insns's imm field is a
>> file descriptor which the verifier then replaces as a full 64bit
>> address of the map into both imm parts.
>>
>> For the newly added BPF_PSEUDO_MAP_VALUE src_reg flag, the idea
>> is similar: the first part of the double insns's imm field is
>> again a file descriptor corresponding to the map, and the second
>> part of the imm field is an offset. The verifier will then replace
>> both imm parts with an address that points into the BPF map value
>> for maps that support this operation. BPF_PSEUDO_MAP_VALUE is a
>> distinct flag as otherwise with BPF_PSEUDO_MAP_FD we could not
>> differ offset 0 between load of map pointer versus load of map's
>> value at offset 0.
>>
>> This allows for efficiently retrieving an address to a map value
>> memory area without having to issue a helper call which needs to
>> prepare registers according to calling convention, etc, without
>> needing the extra NULL test, and without having to add the offset
>> in an additional instruction to the value base pointer.
>>
>> The verifier then treats the destination register as PTR_TO_MAP_VALUE
>> with constant reg->off from the user passed offset from the second
>> imm field, and guarantees that this is within bounds of the map
>> value. Any subsequent operations are normally treated as typical
>> map value handling without anything else needed for verification.
>>
>> The two map operations for direct value access have been added to
>> array map for now. In future other types could be supported as
>> well depending on the use case. The main use case for this commit
>> is to allow for BPF loader support for global variables that
>> reside in .data/.rodata/.bss sections such that we can directly
>> load the address of them with minimal additional infrastructure
>> required. Loader support has been added in subsequent commits for
>> libbpf library.
> 
> The patch version #1 provides a way to replace the load with
> immediate (presumably read-only data). This will be good for
> a use case like the one below:
> 
>     if (static_variable_kernel_version == V1) {
>         /* code here will work for kernel V1 */
>         ... access helpers available for V1 ...
>     } else if (static_variable_kernel_version == V2) {
>         /* code here will work for kernel V2 */
>         ... access helpers available for V2 ...
>     }
> 
> The approach here does not replace the map value access with values
> from, e.g., the read-only section, for which libbpf could provide an
> interface to fill in data from the user.
> 
> This may require a little more analysis, e.g.,
>     ptr = ld_imm64 from a readonly section
>     ...
>     *(u32 *)ptr;
>     *(u64 *)(ptr + 8);
>     ...
> 
> Do you think we could do this in the kernel verifier, or should we
> push the whole read-only stuff into user space?

And in your case the static_variable_kernel_version would be determined
at runtime, for example, where you then would want to eliminate all the
other branches, right? Meaning, you'd need a way to turn this into an imm
load such that the verifier will detect these dead branches and patch them
out, which it should already be able to do. How would you mark these
special vars like static_variable_kernel_version such that they have
special treatment from the rest, some sort of builtin? Potentially one
could get away with doing this from the loader side if it's simple enough,
though one thing that would be good to avoid is to duplicate all the
complex branch fixup logic etc that we have in the kernel already. Are you
thinking of marking these via BTF in some way such that the loader does
inline replacement?

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections
  2019-03-01 18:11   ` Yonghong Song
  2019-03-01 18:48     ` Andrii Nakryiko
@ 2019-03-01 19:56     ` Daniel Borkmann
  1 sibling, 0 replies; 46+ messages in thread
From: Daniel Borkmann @ 2019-03-01 19:56 UTC (permalink / raw)
  To: Yonghong Song, Alexei Starovoitov
  Cc: bpf, netdev, joe, john.fastabend, tgraf, Andrii Nakryiko,
	jakub.kicinski, lmb

On 03/01/2019 07:11 PM, Yonghong Song wrote:
> On 2/28/19 3:18 PM, Daniel Borkmann wrote:
[...]
>> @@ -1412,6 +1568,24 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
>>   						      &prog->reloc_desc[i]);
>>   			if (err)
>>   				return err;
>> +		} else if (prog->reloc_desc[i].type == RELO_DATA ||
>> +			   prog->reloc_desc[i].type == RELO_RODATA ||
>> +			   prog->reloc_desc[i].type == RELO_BSS) {
>> +			struct bpf_insn *insns = prog->insns;
>> +			int insn_idx, map_idx, data_off;
>> +
>> +			insn_idx = prog->reloc_desc[i].insn_idx;
>> +			map_idx  = prog->reloc_desc[i].map_idx;
>> +			data_off = insns[insn_idx].imm;
> 
> I want to point to a subtle difference here between handling pure global 
> variables and static global variables. The "imm" value is only available
> for static variables. For example,
> 
> -bash-4.4$ cat g.c
> static volatile long sg = 2;
> static volatile int si = 3;
> long g = 4;
> int i = 5;
> int test() { return sg + si + g + i; }
> -bash-4.4$
> -bash-4.4$ clang -target bpf -O2 -c g.c 
> 
> -bash-4.4$ readelf -s g.o 
> 
> 
> Symbol table '.symtab' contains 8 entries:
>     Num:    Value          Size Type    Bind   Vis      Ndx Name
>       0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
>       1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS g.c
>       2: 0000000000000010     8 OBJECT  LOCAL  DEFAULT    4 sg
>       3: 0000000000000018     4 OBJECT  LOCAL  DEFAULT    4 si
>       4: 0000000000000000     0 SECTION LOCAL  DEFAULT    4
>       5: 0000000000000000     8 OBJECT  GLOBAL DEFAULT    4 g
>       6: 0000000000000008     4 OBJECT  GLOBAL DEFAULT    4 i
>       7: 0000000000000000   128 FUNC    GLOBAL DEFAULT    2 test
> -bash-4.4$
> -bash-4.4$ llvm-readelf -r g.o
> 
> Relocation section '.rel.text' at offset 0x1d8 contains 4 entries:
>      Offset             Info             Type               Symbol's Value  Symbol's Name
> 0000000000000000  0000000400000001 R_BPF_64_64            0000000000000000 .data
> 0000000000000018  0000000400000001 R_BPF_64_64            0000000000000000 .data
> 0000000000000038  0000000500000001 R_BPF_64_64            0000000000000000 g
> 0000000000000058  0000000600000001 R_BPF_64_64            0000000000000008 i
> -bash-4.4$ llvm-objdump -d g.o
> 
> g.o:    file format ELF64-BPF
> 
> Disassembly of section .text:
> 0000000000000000 test:
>         0:       18 01 00 00 10 00 00 00 00 00 00 00 00 00 00 00        r1 = 16 ll
>         2:       79 11 00 00 00 00 00 00         r1 = *(u64 *)(r1 + 0)
>         3:       18 02 00 00 18 00 00 00 00 00 00 00 00 00 00 00        r2 = 24 ll
>         5:       61 22 00 00 00 00 00 00         r2 = *(u32 *)(r2 + 0)
>         6:       0f 21 00 00 00 00 00 00         r1 += r2
>         7:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00        r2 = 0 ll
>         9:       79 22 00 00 00 00 00 00         r2 = *(u64 *)(r2 + 0)
>        10:       0f 21 00 00 00 00 00 00         r1 += r2
>        11:       18 02 00 00 00 00 00 00 00 00 00 00 00 00 00 00        r2 = 0 ll
>        13:       61 20 00 00 00 00 00 00         r0 = *(u32 *)(r2 + 0)
>        14:       0f 10 00 00 00 00 00 00         r0 += r1
>        15:       95 00 00 00 00 00 00 00         exit
> -bash-4.4$
> 
> As you can see above, a non-static global access does not have its
> in-section offset encoded in the insn itself. The difference is due to
> llvm treating static globals and non-static globals differently.
> 
> To support both cases, during the relocation recording stage, you can
> also record:
>     . the symbol binding (GELF_ST_BIND(sym.st_info)),
>       a non-static global has binding STB_GLOBAL and a static
>       global has binding STB_LOCAL
>     . the symbol value (sym.st_value)
> 
> During the above relocation resolution, if the symbol binding is local,
> do what you already did here. If the symbol binding is global, assign
> data_off from the symbol value.
> 
> This applies to both .data and .rodata sections.
> 
> A non-initialized
> global variable will not be in any allocated section in the ELF file;
> it is in a COMMON section which is to be allocated by the loader.
> So a user may define something like
>     int g;
> and later use it. Right now, this will not work. The workaround
> is "int g = 4", or "static int g". I guess that should be
> okay; we should encourage users to use "static" variables instead.

Agree and noted, and thanks for pointing this out, Yonghong! I'll
fix this up accordingly in the next round.
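
Roughly along these lines, I think (just a sketch; the reloc_desc
fields are made up for illustration):

    /* at relocation collection time, also record binding and value */
    prog->reloc_desc[i].global  = GELF_ST_BIND(sym.st_info) == STB_GLOBAL;
    prog->reloc_desc[i].sym_off = sym.st_value;

    /* at relocation resolution time, pick the in-section offset */
    data_off = prog->reloc_desc[i].global ? prog->reloc_desc[i].sym_off :
               insns[insn_idx].imm;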

Thanks a lot,
Daniel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 6/7] bpf, selftest: test global data/bss/rodata sections
  2019-03-01 19:13   ` Andrii Nakryiko
@ 2019-03-01 20:02     ` Daniel Borkmann
  0 siblings, 0 replies; 46+ messages in thread
From: Daniel Borkmann @ 2019-03-01 20:02 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, bpf, Networking, joe, john.fastabend, tgraf,
	Yonghong Song, Andrii Nakryiko, Jakub Kicinski, lmb

On 03/01/2019 08:13 PM, Andrii Nakryiko wrote:
> On Thu, Feb 28, 2019 at 3:32 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>> From: Joe Stringer <joe@wand.net.nz>
>>
>> Add tests for libbpf relocation of static variable references
>> into the .data, .rodata and .bss sections of the ELF. Tests with
>> different offsets are all passing:
>>
>>   # ./test_progs
>>   [...]
>>   test_static_data_access:PASS:load program 0 nsec
>>   test_static_data_access:PASS:pass packet 278 nsec
>>   test_static_data_access:PASS:relocate .bss reference 1 278 nsec
>>   test_static_data_access:PASS:relocate .data reference 1 278 nsec
>>   test_static_data_access:PASS:relocate .rodata reference 1 278 nsec
>>   test_static_data_access:PASS:relocate .bss reference 2 278 nsec
>>   test_static_data_access:PASS:relocate .data reference 2 278 nsec
>>   test_static_data_access:PASS:relocate .rodata reference 2 278 nsec
>>   test_static_data_access:PASS:relocate .bss reference 3 278 nsec
>>   test_static_data_access:PASS:relocate .bss reference 4 278 nsec
>>   Summary: 223 PASSED, 0 FAILED
>>
>> Joint work with Daniel Borkmann.
>>
>> Signed-off-by: Joe Stringer <joe@wand.net.nz>
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> ---
>>  tools/testing/selftests/bpf/bpf_helpers.h     |  2 +-
>>  .../selftests/bpf/progs/test_global_data.c    | 61 +++++++++++++++++++
>>  tools/testing/selftests/bpf/test_progs.c      | 50 +++++++++++++++
>>  3 files changed, 112 insertions(+), 1 deletion(-)
>>  create mode 100644 tools/testing/selftests/bpf/progs/test_global_data.c
>>
>> diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
>> index d9999f1ed1d2..0463662935f9 100644
>> --- a/tools/testing/selftests/bpf/bpf_helpers.h
>> +++ b/tools/testing/selftests/bpf/bpf_helpers.h
>> @@ -11,7 +11,7 @@
>>  /* helper functions called from eBPF programs written in C */
>>  static void *(*bpf_map_lookup_elem)(void *map, void *key) =
>>         (void *) BPF_FUNC_map_lookup_elem;
>> -static int (*bpf_map_update_elem)(void *map, void *key, void *value,
>> +static int (*bpf_map_update_elem)(void *map, const void *key, const void *value,
>>                                   unsigned long long flags) =
>>         (void *) BPF_FUNC_map_update_elem;
>>  static int (*bpf_map_delete_elem)(void *map, void *key) =
>> diff --git a/tools/testing/selftests/bpf/progs/test_global_data.c b/tools/testing/selftests/bpf/progs/test_global_data.c
>> new file mode 100644
>> index 000000000000..2a7cf40b8efb
>> --- /dev/null
>> +++ b/tools/testing/selftests/bpf/progs/test_global_data.c
>> @@ -0,0 +1,61 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +// Copyright (c) 2019 Isovalent, Inc.
>> +
>> +#include <linux/bpf.h>
>> +#include <linux/pkt_cls.h>
>> +#include <string.h>
>> +
>> +#include "bpf_helpers.h"
>> +
>> +struct bpf_map_def SEC("maps") result = {
>> +       .type           = BPF_MAP_TYPE_ARRAY,
>> +       .key_size       = sizeof(__u32),
>> +       .value_size     = sizeof(__u64),
>> +       .max_entries    = 9,
>> +};
>> +
>> +static       __u64 static_bss     = 0;         /* Reloc reference to .bss  section   */
>> +static       __u64 static_data    = 42;                /* Reloc reference to .data section   */
>> +static const __u64 static_rodata  = 24;                /* Reloc reference to .rodata section */
>> +static       __u64 static_bss2    = 0;         /* Reloc reference to .bss  section   */
>> +static       __u64 static_data2   = 0xffeeff;  /* Reloc reference to .data section   */
>> +static const __u64 static_rodata2 = 0xabab;    /* Reloc reference to .rodata section */
>> +static const __u64 static_rodata3 = 0xab;      /* Reloc reference to .rodata section */
> 
> In light of Yonghong's explanation about static vs non-static
> globals, it would be nice to add a test for non-static initialized
> globals here as well?

Yes, agree, I'll add them as well and integrate the correct offset
for libbpf along with it.
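
E.g., something along these lines (just a sketch, names not final):

    __u64       global_data   = 42;    /* non-static reloc into .data   */
    const __u64 global_rodata = 24;    /* non-static reloc into .rodata */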

Cheers,
Daniel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections
  2019-03-01 19:19           ` Yonghong Song
@ 2019-03-01 20:06             ` Daniel Borkmann
  2019-03-01 20:25               ` Yonghong Song
  2019-03-05  2:28               ` static bpf vars. Was: " Alexei Starovoitov
  0 siblings, 2 replies; 46+ messages in thread
From: Daniel Borkmann @ 2019-03-01 20:06 UTC (permalink / raw)
  To: Yonghong Song, Andrii Nakryiko
  Cc: Alexei Starovoitov, bpf, netdev, joe, john.fastabend, tgraf,
	Andrii Nakryiko, jakub.kicinski, lmb

On 03/01/2019 08:19 PM, Yonghong Song wrote:
> On 3/1/19 11:10 AM, Andrii Nakryiko wrote:
>> On Fri, Mar 1, 2019 at 10:58 AM Yonghong Song <yhs@fb.com> wrote:
>>> On 3/1/19 10:48 AM, Andrii Nakryiko wrote:
>>>> On Fri, Mar 1, 2019 at 10:31 AM Yonghong Song <yhs@fb.com> wrote:
>>>>> On 2/28/19 3:18 PM, Daniel Borkmann wrote:
[...]
>>>> Would it be reasonable to just plain disable usage of uninitialized
>>>> global variables, as it kind of goes against BPF's philosophy that
>>>> everything should be written to before it can be read? So while we can
>>>> just implicitly zero-out everything beforehand, it might be a good
>>>> idea to remind and enforce that explicitly?
>>>
>>> There will be a verifier error, so the program with "int g" will not
>>> run, the same as today.
>>
>> Yeah, I understand, but with a pretty obscure error about not supporting
>> relocations and stuff, right?
>>
>>>
>>> We could improve this by flagging the error at compile time or at libbpf time.
>>
>> So that's my point: having the compiler emit a nicer error for
>> target=bpf would be a nice touch for the user experience :)
> 
> I just removed a compiler error for static variables...
> 
> I will wait for this patch to land, hear people's complaints (either a
> need to support "int g;" or for better error messages, etc.) and then
> decide what to do next ...

By the way, from the LLVM side, do you think it makes sense for local vars
where you encode the offset into insn->imm to already encode it into
(insn+1)->imm of the ldimm64, so that loaders can just pass this offset
through instead of fixing it up like I did? I'm fine either way though,
just thought it might be worth pointing out while we're at it. :)
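
To make it concrete on the static_bss2 example from the cover letter
(sketch): for an access at .bss offset 8, the ldimm64 pair would then
already come out of LLVM as ...

    insn[0].imm = 0    <- relocated by the loader to the map fd
    insn[1].imm = 8    <- in-section offset, passed through as-is

... instead of the offset sitting in insn[0].imm as today.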

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 1/7] bpf: implement lookup-free direct value access
  2019-03-01 19:35       ` Andrii Nakryiko
@ 2019-03-01 20:08         ` Jakub Kicinski
  0 siblings, 0 replies; 46+ messages in thread
From: Jakub Kicinski @ 2019-03-01 20:08 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Daniel Borkmann, Alexei Starovoitov, bpf, Networking,
	Joe Stringer, john fastabend, tgraf, Yonghong Song,
	Andrii Nakryiko, lmb

On Fri, 1 Mar 2019 11:35:17 -0800, Andrii Nakryiko wrote:
> > > Do you think that would work? Using an array is a bit limiting, because
> > > it doesn't allow partial reads/updates, while BPF_MAP_TYPE_HEAP
> > > would be a single big value that allows partial reading/updating.
> >
> > If I understand it correctly, the main difference this would have is
> > to be able to use spin_locks in a more fine-grained fashion, right?  
> 
> spin_lock is just a nice bonus for when some manipulation longer than
> 8 bytes needs to be done atomically.
> 
> The reason for this new type of map is actually the ability to update
> global variables from outside the BPF program in a granular fashion. E.g.,
> turning on some extra debug output temporarily, tuning parameters,
> changing the PID to trace, etc, without stopping and reloading the BPF
> program. With an array, it's all-or-nothing: to update anything you have
> to overwrite the entire .data section. As I mentioned, if we can get BTF
> type information for the entirety of .data, it would allow manipulating
> those variables by name even with generic tools like bpftool.

Sounds like you'd almost want to mmap the value ;)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections
  2019-03-01 20:06             ` Daniel Borkmann
@ 2019-03-01 20:25               ` Yonghong Song
  2019-03-01 20:33                 ` Daniel Borkmann
  2019-03-05  2:28               ` static bpf vars. Was: " Alexei Starovoitov
  1 sibling, 1 reply; 46+ messages in thread
From: Yonghong Song @ 2019-03-01 20:25 UTC (permalink / raw)
  To: Daniel Borkmann, Andrii Nakryiko
  Cc: Alexei Starovoitov, bpf, netdev, joe, john.fastabend, tgraf,
	Andrii Nakryiko, jakub.kicinski, lmb



On 3/1/19 12:06 PM, Daniel Borkmann wrote:
> On 03/01/2019 08:19 PM, Yonghong Song wrote:
>> On 3/1/19 11:10 AM, Andrii Nakryiko wrote:
>>> On Fri, Mar 1, 2019 at 10:58 AM Yonghong Song <yhs@fb.com> wrote:
>>>> On 3/1/19 10:48 AM, Andrii Nakryiko wrote:
>>>>> On Fri, Mar 1, 2019 at 10:31 AM Yonghong Song <yhs@fb.com> wrote:
>>>>>> On 2/28/19 3:18 PM, Daniel Borkmann wrote:
> [...]
>>>>> Would it be reasonable to just plain disable usage of uninitialized
>>>>> global variables, as it kind of goes against BPF's philosophy that
>>>>> everything should be written to before it can be read? So while we can
>>>>> just implicitly zero-out everything beforehand, it might be a good
>>>>> idea to remind and enforce that explicitly?
>>>>
>>>> There will be a verifier error, so the program with "int g" will not
>>>> run, the same as today.
>>>
>>> Yeah, I understand, but with a pretty obscure error about not supporting
>>> relocations and stuff, right?
>>>
>>>>
>>>> We could improve this by flagging the error at compile time or at libbpf time.
>>>
>>> So that's my point: having the compiler emit a nicer error for
>>> target=bpf would be a nice touch for the user experience :)
>>
>> I just removed a compiler error for static variables...
>>
>> I will wait for this patch to land, hear people's complaints (either a
>> need to support "int g;" or for better error messages, etc.) and then
>> decide what to do next ...
> 
> By the way, from the LLVM side, do you think it makes sense for local vars
> where you encode the offset into insn->imm to already encode it into
> (insn+1)->imm of the ldimm64, so that loaders can just pass this offset
> through instead of fixing it up like I did? I'm fine either way though,
> just thought it might be worth pointing out while we're at it. :)

Yes, llvm can do that. Let me prototype it, and I will let you know
when it lands in llvm trunk.

> 
> Thanks,
> Daniel
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections
  2019-03-01 20:25               ` Yonghong Song
@ 2019-03-01 20:33                 ` Daniel Borkmann
  0 siblings, 0 replies; 46+ messages in thread
From: Daniel Borkmann @ 2019-03-01 20:33 UTC (permalink / raw)
  To: Yonghong Song, Andrii Nakryiko
  Cc: Alexei Starovoitov, bpf, netdev, joe, john.fastabend, tgraf,
	Andrii Nakryiko, jakub.kicinski, lmb

On 03/01/2019 09:25 PM, Yonghong Song wrote:
> On 3/1/19 12:06 PM, Daniel Borkmann wrote:
>> On 03/01/2019 08:19 PM, Yonghong Song wrote:
>>> On 3/1/19 11:10 AM, Andrii Nakryiko wrote:
>>>> On Fri, Mar 1, 2019 at 10:58 AM Yonghong Song <yhs@fb.com> wrote:
>>>>> On 3/1/19 10:48 AM, Andrii Nakryiko wrote:
>>>>>> On Fri, Mar 1, 2019 at 10:31 AM Yonghong Song <yhs@fb.com> wrote:
>>>>>>> On 2/28/19 3:18 PM, Daniel Borkmann wrote:
>> [...]
>>>>>> Would it be reasonable to just plain disable usage of uninitialized
>>>>>> global variables, as it kind of goes against BPF's philosophy that
>>>>>> everything should be written to before it can be read? So while we can
>>>>>> just implicitly zero-out everything beforehand, it might be a good
>>>>>> idea to remind and enforce that explicitly?
>>>>>
>>>>> There will be a verifier error, so the program with "int g" will not
>>>>> run, the same as today.
>>>>
>>>> Yeah, I understand, but with a pretty obscure error about not supporting
>>>> relocations and stuff, right?
>>>>
>>>>>
>>>>> We could improve this by flagging the error at compile time or at libbpf time.
>>>>
>>>> So that's my point: having the compiler emit a nicer error for
>>>> target=bpf would be a nice touch for the user experience :)
>>>
>>> I just removed a compiler error for static variables...
>>>
>>> I will wait for this patch to land, hear people's complaints (either a
>>> need to support "int g;" or for better error messages, etc.) and then
>>> decide what to do next ...
>>
>> By the way, from the LLVM side, do you think it makes sense for local vars
>> where you encode the offset into insn->imm to already encode it into
>> (insn+1)->imm of the ldimm64, so that loaders can just pass this offset
>> through instead of fixing it up like I did? I'm fine either way though,
>> just thought it might be worth pointing out while we're at it. :)
> 
> Yes, llvm can do that. Let me prototype it, and I will let you know
> when it lands in llvm trunk.

Awesome, thanks!

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 1/7] bpf: implement lookup-free direct value access
  2019-03-01 19:51     ` Daniel Borkmann
@ 2019-03-01 23:02       ` Yonghong Song
  0 siblings, 0 replies; 46+ messages in thread
From: Yonghong Song @ 2019-03-01 23:02 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov
  Cc: bpf, netdev, joe, john.fastabend, tgraf, Andrii Nakryiko,
	jakub.kicinski, lmb



On 3/1/19 11:51 AM, Daniel Borkmann wrote:
> On 03/01/2019 06:18 PM, Yonghong Song wrote:
>> On 2/28/19 3:18 PM, Daniel Borkmann wrote:
>>> This generic extension to BPF maps allows for directly loading an
>>> address residing inside a BPF map value as a single BPF ldimm64
>>> instruction.
>>>
>>> The idea is similar to what BPF_PSEUDO_MAP_FD does today, which
>>> is a special src_reg flag for ldimm64 instruction that indicates
>>> that inside the first part of the double insns's imm field is a
>>> file descriptor which the verifier then replaces as a full 64bit
>>> address of the map into both imm parts.
>>>
>>> For the newly added BPF_PSEUDO_MAP_VALUE src_reg flag, the idea
>>> is similar: the first part of the double insns's imm field is
>>> again a file descriptor corresponding to the map, and the second
>>> part of the imm field is an offset. The verifier will then replace
>>> both imm parts with an address that points into the BPF map value
>>> for maps that support this operation. BPF_PSEUDO_MAP_VALUE is a
>>> distinct flag as otherwise with BPF_PSEUDO_MAP_FD we could not
>>> differ offset 0 between load of map pointer versus load of map's
>>> value at offset 0.
>>>
>>> This allows for efficiently retrieving an address to a map value
>>> memory area without having to issue a helper call which needs to
>>> prepare registers according to calling convention, etc, without
>>> needing the extra NULL test, and without having to add the offset
>>> in an additional instruction to the value base pointer.
>>>
>>> The verifier then treats the destination register as PTR_TO_MAP_VALUE
>>> with constant reg->off from the user passed offset from the second
>>> imm field, and guarantees that this is within bounds of the map
>>> value. Any subsequent operations are normally treated as typical
>>> map value handling without anything else needed for verification.
>>>
>>> The two map operations for direct value access have been added to
>>> array map for now. In future other types could be supported as
>>> well depending on the use case. The main use case for this commit
>>> is to allow for BPF loader support for global variables that
>>> reside in .data/.rodata/.bss sections such that we can directly
>>> load the address of them with minimal additional infrastructure
>>> required. Loader support has been added in subsequent commits for
>>> libbpf library.
>>
>> The patch version #1 provides a way to replace the load with
>> immediate (presumably read-only data). This will be good for
>> a use case like the one below:
>>
>>      if (static_variable_kernel_version == V1) {
>>          /* code here will work for kernel V1 */
>>          ... access helpers available for V1 ...
>>      } else if (static_variable_kernel_version == V2) {
>>          /* code here will work for kernel V2 */
>>          ... access helpers available for V2 ...
>>      }
>>
>> The approach here does not replace the map value access with values
>> from, e.g., the read-only section, for which libbpf could provide an
>> interface to fill in data from the user.
>>
>> This may require a little more analysis, e.g.,
>>      ptr = ld_imm64 from a readonly section
>>      ...
>>      *(u32 *)ptr;
>>      *(u64 *)(ptr + 8);
>>      ...
>>
>> Do you think we could do this in the kernel verifier, or should we
>> push the whole read-only stuff into user space?
> 
> And in your case the static_variable_kernel_version would be determined
> at runtime, for example, where you then would want to eliminate all the
> other branches, right? Meaning, you'd need a way to turn this into an imm
> load such that the verifier will detect these dead branches and patch them

Yes, the program will be compiled once and deployed to many hosts, and
different hosts may have different kernel versions. The
static_variable_kernel_version is determined at load time on each host.

> out, which it should already be able to do. How would you mark these
> special vars like static_variable_kernel_version such that they have
> special treatment from the rest, some sort of builtin? Potentially one

A libbpf API is needed to assign a particular value to a variable in
the read-only section. For example, a bpf program may look like:

-bash-4.4$ cat g1.c
static volatile const unsigned __kernel_version;
int prog() {
   unsigned kernel_ver = __kernel_version;

   if (kernel_ver == 411)
     return 0;
   else if (kernel_ver == 416)
     return 1;
   return 2;
}
-bash-4.4$ clang -target bpf -O2 -c g1.c 

-bash-4.4$ llvm-readelf -r g1.o

Relocation section '.rel.text' at offset 0x178 contains 1 entries:
     Offset             Info             Type               Symbol's Value  Symbol's Name
0000000000000000  0000000500000001 R_BPF_64_64            0000000000000000 .rodata
-bash-4.4$ llvm-objdump -d g1.o 


g1.o:   file format ELF64-BPF

Disassembly of section .text:
0000000000000000 prog:
        0:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00        r1 = 0 ll
        2:       61 11 00 00 00 00 00 00         r1 = *(u32 *)(r1 + 0)
        3:       b7 02 00 00 01 00 00 00         r2 = 1
        4:       15 01 01 00 a0 01 00 00         if r1 == 416 goto +1 <LBB0_2>
        5:       b7 02 00 00 02 00 00 00         r2 = 2

0000000000000030 LBB0_2:
        6:       b7 00 00 00 00 00 00 00         r0 = 0
        7:       15 01 01 00 9b 01 00 00         if r1 == 411 goto +1 <LBB0_4>
        8:       bf 20 00 00 00 00 00 00         r0 = r2

0000000000000048 LBB0_4:
        9:       95 00 00 00 00 00 00 00         exit
-bash-4.4$ llvm-readelf -S g1.o
There are 9 section headers, starting at offset 0x1f8:

Section Headers:
   [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
   [ 0]                   NULL            0000000000000000 000000 000000 00      0   0  0
   [ 1] .strtab           STRTAB          0000000000000000 000189 000068 00      0   0  1
   [ 2] .text             PROGBITS        0000000000000000 000040 000050 00  AX  0   0  8
   [ 3] .rel.text         REL             0000000000000000 000178 000010 10      8   2  8
   [ 4] .rodata           PROGBITS        0000000000000000 000090 000004 00   A  0   0  4
   [ 5] .BTF              PROGBITS        0000000000000000 000094 000019 00      0   0  1
   [ 6] .BTF.ext          PROGBITS        0000000000000000 0000ad 000020 00      0   0  1
   [ 7] .llvm_addrsig     LLVM_ADDRSIG    0000000000000000 000188 000001 00   E  8   0  1
   [ 8] .symtab           SYMTAB          0000000000000000 0000d0 0000a8 18      1   6  8
Key to Flags:
   W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
   I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
   O (extra OS processing required) o (OS specific), p (processor specific)
-bash-4.4$ llvm-readelf -s g1.o 


Symbol table '.symtab' contains 7 entries:
    Num:    Value          Size Type    Bind   Vis      Ndx Name
      0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
      1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS g1.c
      2: 0000000000000030     0 NOTYPE  LOCAL  DEFAULT    2 LBB0_2
      3: 0000000000000048     0 NOTYPE  LOCAL  DEFAULT    2 LBB0_4
      4: 0000000000000000     4 OBJECT  LOCAL  DEFAULT    4 __kernel_version
      5: 0000000000000000     0 SECTION LOCAL  DEFAULT    4 .rodata
      6: 0000000000000000    80 FUNC    GLOBAL DEFAULT    2 prog
-bash-4.4$

The relocation is for the first insn.
The address is the start of the .rodata section, which happens to
match the variable __kernel_version (size 4).

The libbpf API can provide a way for the user to assign a value to the
read-only section. In this particular case, e.g., on HostA,
__kernel_version is assigned 416, which means the first 4 bytes of
.rodata are modified to hold the value 416. Considering this is a
generic interface, the API may look like
   bpf_object__change_readonly_value(const char *var_name, void *val_buf,
     unsigned var_buf_size);
libbpf will change the value if there is a "var_name" in the rodata
section and var_buf_size matches the size in the symbol table.
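
Usage on the loader side would then be something like (just a sketch of
the proposed API above -- it does not exist in libbpf yet, and it would
presumably also need a struct bpf_object * argument):

    unsigned int kver = 416;
    int err;

    err = bpf_object__change_readonly_value(obj, "__kernel_version",
                                            &kver, sizeof(kver));

after which the first 4 bytes of .rodata hold 416 before the object is
loaded.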

> could get away with doing this from the loader side if it's simple enough,
> though one thing that would be good to avoid is to duplicate all the
> complex branch fixup logic etc that we have in the kernel already. Are you

I totally agree that the kernel is already able to prune dead code while
maintaining correct func/line info. We should do that part in the kernel.

Let us look at the bytecode:

0000000000000000 prog:
        0:       r1 = 0 ll
        2:       r1 = *(u32 *)(r1 + 0)
        3:       r2 = 1
        4:       if r1 == 416 goto +1 <LBB0_2>
        5:       r2 = 2

0000000000000030 LBB0_2:
        6:       r0 = 0
        7:       if r1 == 411 goto +1 <LBB0_4>
        8:       r0 = r2

0000000000000048 LBB0_4:
        9:       exit

Here, the goal is to let r1 at insn #2 get the constant.
Do you think we can get it from the kernel? In this particular case,
insn #0 gets a romap_ptr with the address of the rodata section at
offset 0, and insn #2 loads a u32 from romap offset 0, where the value
is already populated, e.g., 416.

The verifier is path sensitive and will need extra care to
perform such a transformation in case it is invalid on different paths.
Maybe a slight extension of the verifier could do this?
Initially we do not need to handle complicated cases. Most global/static
variable accesses look like
    r1 = #num ll
    r1 = *(type *)(r1 + offset)
Even if there is a branch into the middle of the above pair of insns,
as long as r1 is a romap_ptr, it is totally safe to replace the second
insn with r1 = constant, which can enable later dead code elimination.
If all read-only region accesses are converted to constants, the
"r1 = #num ll" ld_imm64 insns can be removed as well.

> thinking of marking these via BTF in some way such that the loader does
> inline replacement?

I have not thought about BTF. BTF could provide information about insn
#2 referring to a particular read-only section location. But it looks
like the verifier is able to track it as well in the above?

Let us first study whether doing this without BTF is okay. If needed,
we can go through the BTF path with compiler assistance.

> 
> Thanks,
> Daniel
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections
  2019-03-01  0:19     ` Daniel Borkmann
@ 2019-03-02  0:23       ` Yonghong Song
  2019-03-02  0:27         ` Daniel Borkmann
  0 siblings, 1 reply; 46+ messages in thread
From: Yonghong Song @ 2019-03-02  0:23 UTC (permalink / raw)
  To: Daniel Borkmann, Stanislav Fomichev
  Cc: Alexei Starovoitov, bpf, netdev, joe, john.fastabend, tgraf,
	Andrii Nakryiko, jakub.kicinski, lmb



On 2/28/19 4:19 PM, Daniel Borkmann wrote:
> On 03/01/2019 12:41 AM, Stanislav Fomichev wrote:
>> On 03/01, Daniel Borkmann wrote:
>>> This work adds BPF loader support for global data sections
>>> to libbpf. This allows to write BPF programs in more natural
>>> C-like way by being able to define global variables and const
>>> data.
>>>
>>> Back at LPC 2018 [0] we presented a first prototype which
>>> implemented support for global data sections by extending BPF
>>> syscall where union bpf_attr would get additional memory/size
>>> pair for each section passed during prog load in order to later
>>> add this base address into the ldimm64 instruction along with
>>> the user provided offset when accessing a variable. Consensus
>>> from LPC was that for proper upstream support, it would be
>>> more desirable to use maps instead of bpf_attr extension as
>>> this would allow for introspection of these sections as well
>>> as potential life updates of their content. This work follows
>>> this path by taking the following steps from loader side:
>>>
>>>   1) In bpf_object__elf_collect() step we pick up ".data",
>>>      ".rodata", and ".bss" section information.
>>>
>>>   2) If present, in bpf_object__init_global_maps() we create
>>>      a map that corresponds to each of the present sections.
>>>      Given section size and access properties can differ, a
>>>      single entry array map is created with value size that
>>>      is corresponding to the ELF section size of .data, .bss
>>>      or .rodata. In the latter case, the map is created as
>>>      read-only from program side such that verifier rejects
>>>      any write attempts into .rodata. In a subsequent step,
>>>      for .data and .rodata sections, the section content is
>>>      copied into the map through bpf_map_update_elem(). For
>>>      .bss this is not necessary since array map is already
>>>      zero-initialized by default.
>>>
>>>   3) In bpf_program__collect_reloc() step, we record the
>>>      corresponding map, insn index, and relocation type for
>>>      the global data.
>>>
>>>   4) And last but not least in the actual relocation step in
>>>      bpf_program__relocate(), we mark the ldimm64 instruction
>>>      with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
>>>      imm field the map's file descriptor is stored as similarly
>>>      done as in BPF_PSEUDO_MAP_FD, and in the second imm field
>>>      (as ldimm64 is 2-insn wide) we store the access offset
>>>      into the section.
>>>
>>>   5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
>>>      load will then store the actual target address in order
>>>      to have a 'map-lookup'-free access. That is, the actual
>>>      map value base address + offset. The destination register
>>>      in the verifier will then be marked as PTR_TO_MAP_VALUE,
>>>      containing the fixed offset as reg->off and backing BPF
>>>      map as reg->map_ptr. Meaning, it's treated as any other
>>>      normal map value from verification side, only with
>>>      efficient, direct value access instead of actual call to
>>>      map lookup helper as in the typical case.
>>>
>>> Simple example dump of program using globals vars in each
>>> section:
>>>
>>>    # readelf -a test_global_data.o
>>>    [...]
>>>    [ 6] .bss              NOBITS           0000000000000000  00000328
>>>         0000000000000010  0000000000000000  WA       0     0     8
>>>    [ 7] .data             PROGBITS         0000000000000000  00000328
>>>         0000000000000010  0000000000000000  WA       0     0     8
>>>    [ 8] .rodata           PROGBITS         0000000000000000  00000338
>>>         0000000000000018  0000000000000000   A       0     0     8
>>>    [...]
>>>      95: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    6 static_bss
>>>      96: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    6 static_bss2
>>>      97: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    7 static_data
>>>      98: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    7 static_data2
>>>      99: 0000000000000000     8 OBJECT  LOCAL  DEFAULT    8 static_rodata
>>>     100: 0000000000000008     8 OBJECT  LOCAL  DEFAULT    8 static_rodata2
>>>     101: 0000000000000010     8 OBJECT  LOCAL  DEFAULT    8 static_rodata3
>>>    [...]
>>>
>>>    # bpftool prog
>>>    103: sched_cls  name load_static_dat  tag 37a8b6822fc39a29  gpl
>>>         loaded_at 2019-02-28T02:02:35+0000  uid 0
>>>         xlated 712B  jited 426B  memlock 4096B  map_ids 63,64,65,66
>>>    # bpftool map show id 63
>>>    63: array  name .bss  flags 0x0                      <-- .bss area, rw
>> Can we use <main prog>.bss/data/rodata names? If we load more than one
>> prog with global data that should make it easier to find which one is which.
> 
> Yeah that's fine, we can change it. They could potentially also be shared,
> so <main prog>.bss/data/rodata might be misleading, but <obj>.bss/data/rodata
> would work.

Note the map_name field is only 16 bytes (15 usable bytes, excluding
the terminating '\0'). If the <obj> file has a long name like
test_verifier.o, you may have to shorten the <obj> part of the name.

> 
> Thanks,
> Daniel
> 

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections
  2019-03-02  0:23       ` Yonghong Song
@ 2019-03-02  0:27         ` Daniel Borkmann
  0 siblings, 0 replies; 46+ messages in thread
From: Daniel Borkmann @ 2019-03-02  0:27 UTC (permalink / raw)
  To: Yonghong Song, Stanislav Fomichev
  Cc: Alexei Starovoitov, bpf, netdev, joe, john.fastabend, tgraf,
	Andrii Nakryiko, jakub.kicinski, lmb

On 03/02/2019 01:23 AM, Yonghong Song wrote:
> On 2/28/19 4:19 PM, Daniel Borkmann wrote:
>> On 03/01/2019 12:41 AM, Stanislav Fomichev wrote:
>>> On 03/01, Daniel Borkmann wrote:
>>>> [...]
>>>>    # bpftool map show id 63
>>>>    63: array  name .bss  flags 0x0                      <-- .bss area, rw
>>> Can we use <main prog>.bss/data/rodata names? If we load more than one
>>> prog with global data that should make it easier to find which one is which.
>>
>> Yeah that's fine, we can change it. They could potentially also be shared,
>> so <main prog>.bss/data/rodata might be misleading, but <obj>.bss/data/rodata
>> would work.
> 
> Note the map_name field is only 16 bytes (15 usable bytes, excluding
> the terminating '\0'). If the <obj> file has a long name like
> test_verifier.o, you may have to shorten the <obj> part of the name.

Yes, it needs to be ensured that the (bss/)data/rodata part is still
visible to the user, so the <obj> part would need to be truncated
accordingly.
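
For illustration, the truncation could look roughly like the sketch
below (pseudo_map_name() is a hypothetical helper, not part of this
series):

  #include <stdio.h>
  #include <string.h>

  /* Sketch: build "<obj><sec>", e.g. "test_ver.rodata", such that it
   * fits into the 16 byte map_name incl. '\0', truncating <obj> so the
   * section suffix always survives. Assumes strlen(sec) < dst_sz - 1.
   */
  static void pseudo_map_name(char *dst, size_t dst_sz, /* dst_sz = 16 */
                              const char *obj, const char *sec)
  {
          size_t obj_max = dst_sz - 1 - strlen(sec);

          snprintf(dst, obj_max + 1, "%s", obj); /* at most obj_max chars */
          strcat(dst, sec);                      /* suffix always fits */
  }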

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 1/7] bpf: implement lookup-free direct value access
  2019-02-28 23:18 ` [PATCH bpf-next v2 1/7] bpf: implement lookup-free direct value access Daniel Borkmann
                     ` (3 preceding siblings ...)
  2019-03-01 17:18   ` Yonghong Song
@ 2019-03-04  6:03   ` Andrii Nakryiko
  2019-03-04 15:59     ` Daniel Borkmann
  4 siblings, 1 reply; 46+ messages in thread
From: Andrii Nakryiko @ 2019-03-04  6:03 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, bpf, Networking, Joe Stringer,
	john fastabend, tgraf, Yonghong Song, Andrii Nakryiko,
	Jakub Kicinski, lmb

On Thu, Feb 28, 2019 at 3:31 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> This generic extension to BPF maps allows for directly loading an
> address residing inside a BPF map value as a single BPF ldimm64
> instruction.
>
> The idea is similar to what BPF_PSEUDO_MAP_FD does today, which
> is a special src_reg flag for the ldimm64 instruction that indicates
> that the imm field of the first of the two insns holds a file
> descriptor which the verifier then replaces with the full 64bit
> address of the map spread across both imm parts.
>
> For the newly added BPF_PSEUDO_MAP_VALUE src_reg flag, the idea
> is similar: the imm field of the first of the two insns is again
> a file descriptor corresponding to the map, and the imm field of
> the second insn is an offset. The verifier will then replace
> both imm parts with an address that points into the BPF map value
> for maps that support this operation. BPF_PSEUDO_MAP_VALUE is a
> distinct flag as otherwise with BPF_PSEUDO_MAP_FD we could not
> distinguish a load of the map pointer from a load of the map's
> value at offset 0.
>
> This allows for efficiently retrieving an address to a map value
> memory area without having to issue a helper call which needs to
> prepare registers according to the calling convention, etc, without
> needing the extra NULL test, and without having to add the offset
> in an additional instruction to the value base pointer.
>
> The verifier then treats the destination register as PTR_TO_MAP_VALUE
> with constant reg->off taken from the user-passed offset in the second
> imm field, and guarantees that this is within bounds of the map
> value. Any subsequent operations are normally treated as typical
> map value handling without anything else needed for verification.
>
> The two map operations for direct value access have been added to
> the array map for now. In the future, other types could be supported as
> well depending on the use case. The main use case for this commit
> is to allow for BPF loader support for global variables that
> reside in .data/.rodata/.bss sections such that we can directly
> load the address of them with minimal additional infrastructure
> required. Loader support has been added in subsequent commits for
> the libbpf library.
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  include/linux/bpf.h               |  6 +++
>  include/linux/bpf_verifier.h      |  4 ++
>  include/uapi/linux/bpf.h          |  6 ++-
>  kernel/bpf/arraymap.c             | 33 ++++++++++++++
>  kernel/bpf/core.c                 |  3 +-
>  kernel/bpf/disasm.c               |  5 ++-
>  kernel/bpf/syscall.c              | 29 +++++++++---
>  kernel/bpf/verifier.c             | 73 +++++++++++++++++++++++--------
>  tools/bpf/bpftool/xlated_dumper.c |  3 ++
>  tools/include/uapi/linux/bpf.h    |  6 ++-
>  10 files changed, 138 insertions(+), 30 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index a2132e09dc1c..bdcc6e2a9977 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -57,6 +57,12 @@ struct bpf_map_ops {
>                              const struct btf *btf,
>                              const struct btf_type *key_type,
>                              const struct btf_type *value_type);
> +
> +       /* Direct value access helpers. */
> +       int (*map_direct_value_access)(const struct bpf_map *map,
> +                                      u32 off, u64 *imm);
> +       int (*map_direct_value_offset)(const struct bpf_map *map,
> +                                      u64 imm, u32 *off);
>  };
>
>  struct bpf_map {
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 69f7a3449eda..6e28f1c24710 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -183,6 +183,10 @@ struct bpf_insn_aux_data {
>                 unsigned long map_state;        /* pointer/poison value for maps */
>                 s32 call_imm;                   /* saved imm field of call insn */
>                 u32 alu_limit;                  /* limit for add/sub register with pointer */
> +               struct {
> +                       u32 map_index;          /* index into used_maps[] */
> +                       u32 map_off;            /* offset from value base address */
> +               };
>         };
>         int ctx_field_size; /* the ctx field size for load insn, maybe 0 */
>         int sanitize_stack_off; /* stack slot to be cleared */
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 2e308e90ffea..8884072e1a46 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -255,8 +255,12 @@ enum bpf_attach_type {
>   */
>  #define BPF_F_ANY_ALIGNMENT    (1U << 1)
>
> -/* when bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_FD, bpf_ldimm64->imm == fd */
> +/* When bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_{FD,VALUE}, then
> + * bpf_ldimm64's insn[0]->imm == fd in both cases. Additionally,
> + * for BPF_PSEUDO_MAP_VALUE, insn[1]->imm == offset into value.
> + */
>  #define BPF_PSEUDO_MAP_FD      1
> +#define BPF_PSEUDO_MAP_VALUE   2
>
>  /* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
>   * offset to another bpf function
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index c72e0d8e1e65..3e5969c0c979 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -160,6 +160,37 @@ static void *array_map_lookup_elem(struct bpf_map *map, void *key)
>         return array->value + array->elem_size * (index & array->index_mask);
>  }
>
> +static int array_map_direct_value_access(const struct bpf_map *map, u32 off,
> +                                        u64 *imm)
> +{
> +       struct bpf_array *array = container_of(map, struct bpf_array, map);
> +
> +       if (map->max_entries != 1)
> +               return -ENOTSUPP;
> +       if (off >= map->value_size)
> +               return -EINVAL;
> +
> +       *imm = (unsigned long)array->value;
> +       return 0;
> +}
> +
> +static int array_map_direct_value_offset(const struct bpf_map *map, u64 imm,
> +                                        u32 *off)
> +{
> +       struct bpf_array *array = container_of(map, struct bpf_array, map);
> +       unsigned long range = map->value_size;
> +       unsigned long base  = array->value;
> +       unsigned long addr  = imm;
> +
> +       if (map->max_entries != 1)
> +               return -ENOENT;
> +       if (addr < base || addr >= base + range)
> +               return -ENOENT;
> +
> +       *off = addr - base;
> +       return 0;
> +}
> +
>  /* emit BPF instructions equivalent to C code of array_map_lookup_elem() */
>  static u32 array_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
>  {
> @@ -419,6 +450,8 @@ const struct bpf_map_ops array_map_ops = {
>         .map_update_elem = array_map_update_elem,
>         .map_delete_elem = array_map_delete_elem,
>         .map_gen_lookup = array_map_gen_lookup,
> +       .map_direct_value_access = array_map_direct_value_access,
> +       .map_direct_value_offset = array_map_direct_value_offset,
>         .map_seq_show_elem = array_map_seq_show_elem,
>         .map_check_btf = array_map_check_btf,
>  };
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 1c14c347f3cf..49fc0ff14537 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -286,7 +286,8 @@ int bpf_prog_calc_tag(struct bpf_prog *fp)
>                 dst[i] = fp->insnsi[i];
>                 if (!was_ld_map &&
>                     dst[i].code == (BPF_LD | BPF_IMM | BPF_DW) &&
> -                   dst[i].src_reg == BPF_PSEUDO_MAP_FD) {
> +                   (dst[i].src_reg == BPF_PSEUDO_MAP_FD ||
> +                    dst[i].src_reg == BPF_PSEUDO_MAP_VALUE)) {
>                         was_ld_map = true;
>                         dst[i].imm = 0;
>                 } else if (was_ld_map &&
> diff --git a/kernel/bpf/disasm.c b/kernel/bpf/disasm.c
> index de73f55e42fd..d9ce383c0f9c 100644
> --- a/kernel/bpf/disasm.c
> +++ b/kernel/bpf/disasm.c
> @@ -205,10 +205,11 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
>                          * part of the ldimm64 insn is accessible.
>                          */
>                         u64 imm = ((u64)(insn + 1)->imm << 32) | (u32)insn->imm;
> -                       bool map_ptr = insn->src_reg == BPF_PSEUDO_MAP_FD;
> +                       bool is_ptr = insn->src_reg == BPF_PSEUDO_MAP_FD ||
> +                                     insn->src_reg == BPF_PSEUDO_MAP_VALUE;
>                         char tmp[64];
>
> -                       if (map_ptr && !allow_ptr_leaks)
> +                       if (is_ptr && !allow_ptr_leaks)
>                                 imm = 0;
>
>                         verbose(cbs->private_data, "(%02x) r%d = %s\n",
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index 174581dfe225..d3ef45e01d7a 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2061,13 +2061,27 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
>  }
>
>  static const struct bpf_map *bpf_map_from_imm(const struct bpf_prog *prog,
> -                                             unsigned long addr)
> +                                             unsigned long addr, u32 *off,
> +                                             u32 *type)
>  {
> +       const struct bpf_map *map;
>         int i;
>
> -       for (i = 0; i < prog->aux->used_map_cnt; i++)
> -               if (prog->aux->used_maps[i] == (void *)addr)
> -                       return prog->aux->used_maps[i];
> +       *off = *type = 0;
> +       for (i = 0; i < prog->aux->used_map_cnt; i++) {
> +               map = prog->aux->used_maps[i];
> +               if (map == (void *)addr) {
> +                       *type = BPF_PSEUDO_MAP_FD;
> +                       return map;
> +               }
> +               if (!map->ops->map_direct_value_offset)
> +                       continue;
> +               if (!map->ops->map_direct_value_offset(map, addr, off)) {
> +                       *type = BPF_PSEUDO_MAP_VALUE;
> +                       return map;
> +               }
> +       }
> +
>         return NULL;
>  }
>
> @@ -2075,6 +2089,7 @@ static struct bpf_insn *bpf_insn_prepare_dump(const struct bpf_prog *prog)
>  {
>         const struct bpf_map *map;
>         struct bpf_insn *insns;
> +       u32 off, type;
>         u64 imm;
>         int i;
>
> @@ -2102,11 +2117,11 @@ static struct bpf_insn *bpf_insn_prepare_dump(const struct bpf_prog *prog)
>                         continue;
>
>                 imm = ((u64)insns[i + 1].imm << 32) | (u32)insns[i].imm;
> -               map = bpf_map_from_imm(prog, imm);
> +               map = bpf_map_from_imm(prog, imm, &off, &type);
>                 if (map) {
> -                       insns[i].src_reg = BPF_PSEUDO_MAP_FD;
> +                       insns[i].src_reg = type;
>                         insns[i].imm = map->id;
> -                       insns[i + 1].imm = 0;
> +                       insns[i + 1].imm = off;
>                         continue;
>                 }
>         }
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 0e4edd7e3c5f..3ad05dda6e9d 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -4944,18 +4944,12 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
>         return 0;
>  }
>
> -/* return the map pointer stored inside BPF_LD_IMM64 instruction */
> -static struct bpf_map *ld_imm64_to_map_ptr(struct bpf_insn *insn)
> -{
> -       u64 imm64 = ((u64) (u32) insn[0].imm) | ((u64) (u32) insn[1].imm) << 32;
> -
> -       return (struct bpf_map *) (unsigned long) imm64;
> -}
> -
>  /* verify BPF_LD_IMM64 instruction */
>  static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
>  {
> +       struct bpf_insn_aux_data *aux = cur_aux(env);
>         struct bpf_reg_state *regs = cur_regs(env);
> +       struct bpf_map *map;
>         int err;
>
>         if (BPF_SIZE(insn->code) != BPF_DW) {
> @@ -4979,11 +4973,22 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
>                 return 0;
>         }
>
> -       /* replace_map_fd_with_map_ptr() should have caught bad ld_imm64 */
> -       BUG_ON(insn->src_reg != BPF_PSEUDO_MAP_FD);
> +       map = env->used_maps[aux->map_index];
> +       mark_reg_known_zero(env, regs, insn->dst_reg);
> +       regs[insn->dst_reg].map_ptr = map;
> +
> +       if (insn->src_reg == BPF_PSEUDO_MAP_VALUE) {
> +               regs[insn->dst_reg].type = PTR_TO_MAP_VALUE;
> +               regs[insn->dst_reg].off = aux->map_off;
> +               if (map_value_has_spin_lock(map))
> +                       regs[insn->dst_reg].id = ++env->id_gen;
> +       } else if (insn->src_reg == BPF_PSEUDO_MAP_FD) {
> +               regs[insn->dst_reg].type = CONST_PTR_TO_MAP;
> +       } else {
> +               verbose(env, "bpf verifier is misconfigured\n");
> +               return -EINVAL;
> +       }
>
> -       regs[insn->dst_reg].type = CONST_PTR_TO_MAP;
> -       regs[insn->dst_reg].map_ptr = ld_imm64_to_map_ptr(insn);
>         return 0;
>  }
>
> @@ -6664,8 +6669,10 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
>                 }
>
>                 if (insn[0].code == (BPF_LD | BPF_IMM | BPF_DW)) {
> +                       struct bpf_insn_aux_data *aux;
>                         struct bpf_map *map;
>                         struct fd f;
> +                       u64 addr;
>
>                         if (i == insn_cnt - 1 || insn[1].code != 0 ||
>                             insn[1].dst_reg != 0 || insn[1].src_reg != 0 ||

The next line after this one rejects ldimm64 instructions with off != 0.
This check needs to be changed, depending on whether src_reg ==
BPF_PSEUDO_MAP_VALUE, right?

This also relates to the previously discussed question of not enforcing
the offset (imm=0 in the 2nd part of the insn) for BPF_PSEUDO_MAP_FD.
Seems like the verifier *does* enforce that (not that I'm advocating
for re-using BPF_PSEUDO_MAP_FD, just stumbled on this bit when going
through the verifier code).

> @@ -6677,8 +6684,8 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
>                         if (insn->src_reg == 0)
>                                 /* valid generic load 64-bit imm */
>                                 goto next_insn;
> -
> -                       if (insn->src_reg != BPF_PSEUDO_MAP_FD) {
> +                       if (insn->src_reg != BPF_PSEUDO_MAP_FD &&
> +                           insn->src_reg != BPF_PSEUDO_MAP_VALUE) {
>                                 verbose(env,
>                                         "unrecognized bpf_ld_imm64 insn\n");
>                                 return -EINVAL;
> @@ -6698,16 +6705,44 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
>                                 return err;
>                         }
>
> -                       /* store map pointer inside BPF_LD_IMM64 instruction */
> -                       insn[0].imm = (u32) (unsigned long) map;
> -                       insn[1].imm = ((u64) (unsigned long) map) >> 32;
> +                       aux = &env->insn_aux_data[i];
> +                       if (insn->src_reg == BPF_PSEUDO_MAP_FD) {
> +                               addr = (unsigned long)map;
> +                       } else {
> +                               u32 off = insn[1].imm;
> +
> +                               if (off >= BPF_MAX_VAR_OFF) {
> +                                       verbose(env, "direct value offset of %u is not allowed\n",
> +                                               off);
> +                                       return -EINVAL;
> +                               }
> +                               if (!map->ops->map_direct_value_access) {
> +                                       verbose(env, "no direct value access support for this map type\n");
> +                                       return -EINVAL;
> +                               }
> +
> +                               err = map->ops->map_direct_value_access(map, off, &addr);
> +                               if (err) {
> +                                       verbose(env, "invalid access to map value pointer, value_size=%u off=%u\n",
> +                                               map->value_size, off);
> +                                       return err;
> +                               }
> +
> +                               aux->map_off = off;
> +                               addr += off;
> +                       }
> +
> +                       insn[0].imm = (u32)addr;
> +                       insn[1].imm = addr >> 32;
>
>                         /* check whether we recorded this map already */
> -                       for (j = 0; j < env->used_map_cnt; j++)
> +                       for (j = 0; j < env->used_map_cnt; j++) {
>                                 if (env->used_maps[j] == map) {
> +                                       aux->map_index = j;
>                                         fdput(f);
>                                         goto next_insn;
>                                 }
> +                       }
>
>                         if (env->used_map_cnt >= MAX_USED_MAPS) {
>                                 fdput(f);
> @@ -6724,6 +6759,8 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
>                                 fdput(f);
>                                 return PTR_ERR(map);
>                         }
> +
> +                       aux->map_index = env->used_map_cnt;
>                         env->used_maps[env->used_map_cnt++] = map;
>
>                         if (bpf_map_is_cgroup_storage(map) &&
> diff --git a/tools/bpf/bpftool/xlated_dumper.c b/tools/bpf/bpftool/xlated_dumper.c
> index 7073dbe1ff27..0bb17bf88b18 100644
> --- a/tools/bpf/bpftool/xlated_dumper.c
> +++ b/tools/bpf/bpftool/xlated_dumper.c
> @@ -195,6 +195,9 @@ static const char *print_imm(void *private_data,
>         if (insn->src_reg == BPF_PSEUDO_MAP_FD)
>                 snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
>                          "map[id:%u]", insn->imm);
> +       else if (insn->src_reg == BPF_PSEUDO_MAP_VALUE)
> +               snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
> +                        "map[id:%u][0]+%u", insn->imm, (insn + 1)->imm);
>         else
>                 snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
>                          "0x%llx", (unsigned long long)full_imm);
> diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
> index 2e308e90ffea..8884072e1a46 100644
> --- a/tools/include/uapi/linux/bpf.h
> +++ b/tools/include/uapi/linux/bpf.h
> @@ -255,8 +255,12 @@ enum bpf_attach_type {
>   */
>  #define BPF_F_ANY_ALIGNMENT    (1U << 1)
>
> -/* when bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_FD, bpf_ldimm64->imm == fd */
> +/* When bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_{FD,VALUE}, then
> + * bpf_ldimm64's insn[0]->imm == fd in both cases. Additionally,
> + * for BPF_PSEUDO_MAP_VALUE, insn[1]->imm == offset into value.
> + */
>  #define BPF_PSEUDO_MAP_FD      1
> +#define BPF_PSEUDO_MAP_VALUE   2
>
>  /* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
>   * offset to another bpf function
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 1/7] bpf: implement lookup-free direct value access
  2019-03-04  6:03   ` Andrii Nakryiko
@ 2019-03-04 15:59     ` Daniel Borkmann
  2019-03-04 17:32       ` Andrii Nakryiko
  0 siblings, 1 reply; 46+ messages in thread
From: Daniel Borkmann @ 2019-03-04 15:59 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, bpf, Networking, Joe Stringer,
	john fastabend, tgraf, Yonghong Song, Andrii Nakryiko,
	Jakub Kicinski, lmb

On 03/04/2019 07:03 AM, Andrii Nakryiko wrote:
> On Thu, Feb 28, 2019 at 3:31 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
[...]
>> @@ -6664,8 +6669,10 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
>>                 }
>>
>>                 if (insn[0].code == (BPF_LD | BPF_IMM | BPF_DW)) {
>> +                       struct bpf_insn_aux_data *aux;
>>                         struct bpf_map *map;
>>                         struct fd f;
>> +                       u64 addr;
>>
>>                         if (i == insn_cnt - 1 || insn[1].code != 0 ||
>>                             insn[1].dst_reg != 0 || insn[1].src_reg != 0 ||
> 
> Next line after this one rejects ldimm64 instructions with off != 0.
> This check needs to be changed, depending on whether src_reg ==
> BPF_PSEUDO_MAP_VALUE, right?

Yes, that's correct, I already have that changed in my local branch for
supporting non-zero off.
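
Roughly along these lines, as a sketch of the direction only (not the
final version), where non-zero off fields would be tolerated solely for
BPF_PSEUDO_MAP_VALUE loads:

  /* sketch only: relax the strict zero check on the off fields */
  if (i == insn_cnt - 1 || insn[1].code != 0 ||
      insn[1].dst_reg != 0 || insn[1].src_reg != 0 ||
      (insn[0].src_reg != BPF_PSEUDO_MAP_VALUE &&
       (insn[0].off || insn[1].off))) {
          verbose(env, "invalid bpf_ld_imm64 insn\n");
          return -EINVAL;
  }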

> This is also to the previously discussed question of not enforcing
> offset (imm=0 in 2nd part of insn) for BPF_PSEUDO_MAP_FD. Seems like
> verifier *does* enforce that (not that I'm advocating for re-using
> BPF_PSEUDO_MAP_FD, just stumbled on this bit when going through
> verifier code).

Not really, let's test:

[...]
        .insns = {
        BPF_MOV64_IMM(BPF_REG_0, 0),
        BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, BPF_REG_1,
                     BPF_PSEUDO_MAP_FD, 0, 0),
        BPF_RAW_INSN(0, 0, 0, 0, 0xfefefe),
        BPF_EXIT_INSN(),
        },
[...]

#545/p test14 ld_imm64: reject 2nd imm != 0 FAIL
Unexpected success to load!
0: (b7) r0 = 0
1: (18) r1 = 0xffff97e612486400
3: (95) exit
processed 3 insns (limit 131072), stack depth 0
Summary: 0 PASSED, 0 SKIPPED, 2 FAILED

So I still think it would be worth doing something like the below
for bpf:

From 290f739ae6bab7b0709d327855a1812f9022beed Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Mon, 4 Mar 2019 14:22:41 +0000
Subject: [PATCH bpf] bpf: fix replace_map_fd_with_map_ptr wrt ldimm64 wrt second imm field

Non-zero imm value in the second part of the ldimm64 instruction for
BPF_PSEUDO_MAP_FD is invalid, and thus must be rejected. The map fd
only ever sits in the first instruction's imm field. None of the BPF
loaders known to us are using it, so risk of regression is minimal.
For clarity and consistency, the few insn->{src_reg,imm} occurrences
are rewritten into insn[0].{src_reg,imm}. Add a test case to the BPF
selftest suite as well.

Fixes: 0246e64d9a5f ("bpf: handle pseudo BPF_LD_IMM64 insn")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 kernel/bpf/verifier.c                           | 10 +++++-----
 tools/testing/selftests/bpf/verifier/ld_imm64.c | 17 +++++++++++++++--
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 0e4edd7e3c5f..c8d2a948db37 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -6678,17 +6678,17 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
 				/* valid generic load 64-bit imm */
 				goto next_insn;

-			if (insn->src_reg != BPF_PSEUDO_MAP_FD) {
-				verbose(env,
-					"unrecognized bpf_ld_imm64 insn\n");
+			if (insn[0].src_reg != BPF_PSEUDO_MAP_FD ||
+			    insn[1].imm != 0) {
+				verbose(env, "unrecognized bpf_ld_imm64 insn\n");
 				return -EINVAL;
 			}

-			f = fdget(insn->imm);
+			f = fdget(insn[0].imm);
 			map = __bpf_map_get(f);
 			if (IS_ERR(map)) {
 				verbose(env, "fd %d is not pointing to valid bpf_map\n",
-					insn->imm);
+					insn[0].imm);
 				return PTR_ERR(map);
 			}

diff --git a/tools/testing/selftests/bpf/verifier/ld_imm64.c b/tools/testing/selftests/bpf/verifier/ld_imm64.c
index 28b8c805a293..4a1ff4560a8a 100644
--- a/tools/testing/selftests/bpf/verifier/ld_imm64.c
+++ b/tools/testing/selftests/bpf/verifier/ld_imm64.c
@@ -121,8 +121,8 @@
 	"test12 ld_imm64",
 	.insns = {
 	BPF_MOV64_IMM(BPF_REG_1, 0),
-	BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, 0, BPF_REG_1, 0, 1),
-	BPF_RAW_INSN(0, 0, 0, 0, 1),
+	BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, 0, BPF_REG_1, 0, 999),
+	BPF_RAW_INSN(0, 0, 0, 0, 0),
 	BPF_EXIT_INSN(),
 	},
 	.errstr = "not pointing to valid bpf_map",
@@ -139,3 +139,16 @@
 	.errstr = "invalid bpf_ld_imm64 insn",
 	.result = REJECT,
 },
+{
+	"test14 ld_imm64: reject 2nd imm != 0",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 0),
+	BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, BPF_REG_1,
+		     BPF_PSEUDO_MAP_FD, 0, 0),
+	BPF_RAW_INSN(0, 0, 0, 0, 0xfefefe),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_hash_48b = { 1 },
+	.errstr = "unrecognized bpf_ld_imm64 insn",
+	.result = REJECT,
+},
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH bpf-next v2 1/7] bpf: implement lookup-free direct value access
  2019-03-04 15:59     ` Daniel Borkmann
@ 2019-03-04 17:32       ` Andrii Nakryiko
  0 siblings, 0 replies; 46+ messages in thread
From: Andrii Nakryiko @ 2019-03-04 17:32 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, bpf, Networking, Joe Stringer,
	john fastabend, tgraf, Yonghong Song, Andrii Nakryiko,
	Jakub Kicinski, lmb

On Mon, Mar 4, 2019 at 7:59 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 03/04/2019 07:03 AM, Andrii Nakryiko wrote:
> > On Thu, Feb 28, 2019 at 3:31 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> [...]
> >> @@ -6664,8 +6669,10 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
> >>                 }
> >>
> >>                 if (insn[0].code == (BPF_LD | BPF_IMM | BPF_DW)) {
> >> +                       struct bpf_insn_aux_data *aux;
> >>                         struct bpf_map *map;
> >>                         struct fd f;
> >> +                       u64 addr;
> >>
> >>                         if (i == insn_cnt - 1 || insn[1].code != 0 ||
> >>                             insn[1].dst_reg != 0 || insn[1].src_reg != 0 ||
> >
> > Next line after this one rejects ldimm64 instructions with off != 0.
> > This check needs to be changed, depending on whether src_reg ==
> > BPF_PSEUDO_MAP_VALUE, right?
>
> Yes, that's correct, I already have that changed in my local branch for
> supporting non-zero off.
>
> > This is also to the previously discussed question of not enforcing
> > offset (imm=0 in 2nd part of insn) for BPF_PSEUDO_MAP_FD. Seems like
> > verifier *does* enforce that (not that I'm advocating for re-using
> > BPF_PSEUDO_MAP_FD, just stumbled on this bit when going through
> > verifier code).
>
> Not really, let's test:

Ah, sorry, my bad. That code tests .off, not .imm, so yeah, any imm
would be accepted.

>
> [...]
>         .insns = {
>         BPF_MOV64_IMM(BPF_REG_0, 0),
>         BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, BPF_REG_1,
>                      BPF_PSEUDO_MAP_FD, 0, 0),
>         BPF_RAW_INSN(0, 0, 0, 0, 0xfefefe),
>         BPF_EXIT_INSN(),
>         },
> [...]
>
> #545/p test14 ld_imm64: reject 2nd imm != 0 FAIL
> Unexpected success to load!
> 0: (b7) r0 = 0
> 1: (18) r1 = 0xffff97e612486400
> 3: (95) exit
> processed 3 insns (limit 131072), stack depth 0
> Summary: 0 PASSED, 0 SKIPPED, 2 FAILED
>
> So I still think it would be worth doing something like the below
> for bpf:

Yep, lgtm.

>
> From 290f739ae6bab7b0709d327855a1812f9022beed Mon Sep 17 00:00:00 2001
> From: Daniel Borkmann <daniel@iogearbox.net>
> Date: Mon, 4 Mar 2019 14:22:41 +0000
> Subject: [PATCH bpf] bpf: fix replace_map_fd_with_map_ptr wrt ldimm64 wrt second imm field
>
> Non-zero imm value in the second part of the ldimm64 instruction for
> BPF_PSEUDO_MAP_FD is invalid, and thus must be rejected. The map fd
> only ever sits in the first instructions' imm field. None of the BPF
> loaders known to us are using it, so risk of regression is minimal.
> For clarity and consistency, the few insn->{src_reg,imm} occurences
> are rewritten into insn[0].{src_reg,imm}. Add a test case to the BPF
> selftest suite as well.
>
> Fixes: 0246e64d9a5f ("bpf: handle pseudo BPF_LD_IMM64 insn")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  kernel/bpf/verifier.c                           | 10 +++++-----
>  tools/testing/selftests/bpf/verifier/ld_imm64.c | 17 +++++++++++++++--
>  2 files changed, 20 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 0e4edd7e3c5f..c8d2a948db37 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -6678,17 +6678,17 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
>                                 /* valid generic load 64-bit imm */
>                                 goto next_insn;
>
> -                       if (insn->src_reg != BPF_PSEUDO_MAP_FD) {
> -                               verbose(env,
> -                                       "unrecognized bpf_ld_imm64 insn\n");
> +                       if (insn[0].src_reg != BPF_PSEUDO_MAP_FD ||
> +                           insn[1].imm != 0) {
> +                               verbose(env, "unrecognized bpf_ld_imm64 insn\n");
>                                 return -EINVAL;
>                         }
>
> -                       f = fdget(insn->imm);
> +                       f = fdget(insn[0].imm);
>                         map = __bpf_map_get(f);
>                         if (IS_ERR(map)) {
>                                 verbose(env, "fd %d is not pointing to valid bpf_map\n",
> -                                       insn->imm);
> +                                       insn[0].imm);
>                                 return PTR_ERR(map);
>                         }
>
> diff --git a/tools/testing/selftests/bpf/verifier/ld_imm64.c b/tools/testing/selftests/bpf/verifier/ld_imm64.c
> index 28b8c805a293..4a1ff4560a8a 100644
> --- a/tools/testing/selftests/bpf/verifier/ld_imm64.c
> +++ b/tools/testing/selftests/bpf/verifier/ld_imm64.c
> @@ -121,8 +121,8 @@
>         "test12 ld_imm64",
>         .insns = {
>         BPF_MOV64_IMM(BPF_REG_1, 0),
> -       BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, 0, BPF_REG_1, 0, 1),
> -       BPF_RAW_INSN(0, 0, 0, 0, 1),
> +       BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, 0, BPF_REG_1, 0, 999),
> +       BPF_RAW_INSN(0, 0, 0, 0, 0),
>         BPF_EXIT_INSN(),
>         },
>         .errstr = "not pointing to valid bpf_map",
> @@ -139,3 +139,16 @@
>         .errstr = "invalid bpf_ld_imm64 insn",
>         .result = REJECT,
>  },
> +{
> +       "test14 ld_imm64: reject 2nd imm != 0",
> +       .insns = {
> +       BPF_MOV64_IMM(BPF_REG_0, 0),
> +       BPF_RAW_INSN(BPF_LD | BPF_IMM | BPF_DW, BPF_REG_1,
> +                    BPF_PSEUDO_MAP_FD, 0, 0),
> +       BPF_RAW_INSN(0, 0, 0, 0, 0xfefefe),
> +       BPF_EXIT_INSN(),
> +       },
> +       .fixup_map_hash_48b = { 1 },
> +       .errstr = "unrecognized bpf_ld_imm64 insn",
> +       .result = REJECT,
> +},
> --
> 2.17.1
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* static bpf vars. Was: [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections
  2019-03-01 20:06             ` Daniel Borkmann
  2019-03-01 20:25               ` Yonghong Song
@ 2019-03-05  2:28               ` Alexei Starovoitov
  2019-03-05  9:31                 ` Daniel Borkmann
  1 sibling, 1 reply; 46+ messages in thread
From: Alexei Starovoitov @ 2019-03-05  2:28 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Yonghong Song, Andrii Nakryiko, Alexei Starovoitov, bpf, netdev,
	joe, john.fastabend, tgraf, Andrii Nakryiko, jakub.kicinski, lmb

On Fri, Mar 01, 2019 at 09:06:35PM +0100, Daniel Borkmann wrote:
> 

Overall I think the patches and direction are great.
Thanks a lot for working on it.
More thoughts below:

> By the way, from LLVM side, do you think it makes sense for local vars
> where you encode the offset into insn->imm to already encode it into
> (insn+1)->imm of the ldimm64, so that loaders can just pass this offset
> through instead of fixing it up like I did? I'm fine either way though,
> just thought it might be worth pointing out while we're at it. :)

1.
a typical ldimm64 carries a 64-bit value. upper 32-bits are in insn[1]->imm.
I think it's better to stick to this meaning of imm for encoding
the section offset and keep it in insn[0]->imm.
Especially since from the disassembler pov it looks like a normal ldimm64 with a relocation.
Later libbpf does the PSEUDO trick and can move [0]->imm to [1]->imm.
Or it can use [1]->imm to store the FD to pass to the kernel?
Another alternative is to use the 16 bits of 'off' to encode the offset,
but a limit of 32kbytes of static variables is probably too small.
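
Roughly, the encoding under discussion would then be (sketch):

  /* ldimm64 (BPF_LD | BPF_DW | BPF_IMM) spans two insn slots:
   *
   *   insn[0].src_reg = BPF_PSEUDO_MAP_VALUE
   *   insn[0].imm     = map fd (from loader) / low 32 bits of addr (in kernel)
   *   insn[1].imm     = offset into the section / map value
   *   insn[0].off, insn[1].off = 16 bits each, unused here
   */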

2.
I think it's better to only allow 'static int foo' in libbpf for now.
In C, 'global' variables are supposed to be shareable across multiple elf files.
libbpf doesn't have support for it yet. We need to start discussing this multi-elf
support, which will lead into a discussion about bpf libraries.
I think we need a bit of freedom to decide the meaning of the 'global' modifier.
So for now I would support only 'static int foo' in libbpf.
llvm supports both and that's ok, since those are standard elf markings.

Also were these patches tested with 'static struct {int a; int b;} foo;' and
'static char foo[]="my string";' ?
I suspect it should 'just work' with the proposed design, but would be good to have a test.
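
e.g. something along these lines as an untested sketch (names are just
for illustration):

  #include <linux/bpf.h>
  #include "bpf_helpers.h"

  /* untested sketch: both vars should land in .data and get accessed
   * via direct value loads after relocation.
   */
  static struct { int a; int b; } agg = { .a = 1, .b = 2 };
  static char str[] = "my string";

  SEC("classifier")
  int load_static_agg(struct __sk_buff *skb)
  {
          return agg.a + agg.b + str[0];
  }

  char _license[] SEC("license") = "GPL";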

3.
let's come up with a concrete plan for BTF for these static variables.
With normal maps the BTF was retrofitted via the BPF_ANNOTATE_KV_PAIR macro
and no one is proud of this interface choice. We had 'struct bpf_map_def'
and the macro was the least evil.
For static variables let's figure out how to integrate BTF into it cleanly.
LLVM already knows about the types. It needs another kind to describe
the variable names and offsets. I think what we did with BTF_KIND_FUNC
should be applicable here. Last time we discussed that we would add BTF_KIND_VAR
for this purpose. Shouldn't be too much work to hack it into llvm now?
I would even go a step further and make BTF mandatory for static vars.
Introspection of BPF programs will go a long way.

4.
if we make BTF mandatory then the issue of map names will be solved too.
The <obj>.bss/data/rodata might not fit into map_name[16],
but with BTF we can give it a full name.
"<obj>.data" looks great to me, especially from the introspection angle.

5.
I think it makes sense to keep .bss vs .data vs .rodata separate.
I can imagine a program using a large zero-inited static array.
It's important to save space in the elf file and to save time on map
creation and population. With combined .data and .bss it would be much harder.
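
e.g. the sketch below should stay cheap: the array sits in .bss (NOBITS),
so it occupies no space in the ELF file, and the map only needs to be
zeroed rather than populated:

  /* sketch: 8MB of zeroes at runtime, ~0 bytes in the object file */
  static __u64 counters[1 << 20];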

6.
as a next step it would be good to add two new syscall commands to read/write
individual static variables. Something like:
bpf_map_read/write_bytes(map, key, offset_in_value, buf, size_of_buf)
would allow user space to tweak certain variables.
With mandatory BTF we can have an api to read/write a variable by name
(and not only by offset).
Such bpf_map_read_bytes() would be useful for regular hash maps too.
It's convenient to be able to update a few bytes in the hash value instead
of overwriting the whole thing.
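
a possible shape for such an api, purely illustrative (neither the
commands nor the wrappers exist yet, and data_map_fd is assumed to be
the fd of the .data map):

  #include <linux/types.h>

  /* hypothetical wrappers; signatures are not settled */
  int bpf_map_read_bytes(int map_fd, const void *key,
                         __u32 value_off, void *buf, __u32 size);
  int bpf_map_write_bytes(int map_fd, const void *key,
                          __u32 value_off, const void *buf, __u32 size);

  /* e.g. tweak a single var at offset 16 inside the .data value */
  __u32 zero = 0, val = 42;
  bpf_map_write_bytes(data_map_fd, &zero, 16, &val, sizeof(val));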

7.
since these 3 maps are created as normal arrays from the kernel pov,
'bpftool map show' should not exclude them from the output.
Similarly the libbpf apis bpf_object__for_each_map() and bpf_map__next()
should probably include them too. There could be a bpf_map__is_pseudo() flag
to tell them apart, but since it's a normal map in the kernel
other libbpf accessors bpf_map__name(), bpf_map__pin() should work too.
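
usage could then look like the sketch below, with bpf_map__is_pseudo()
being the hypothetical new accessor and obj an already loaded
struct bpf_object:

  struct bpf_map *map;

  bpf_object__for_each_map(map, obj) {
          if (bpf_map__is_pseudo(map))
                  printf("global data map: %s\n", bpf_map__name(map));
  }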


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: static bpf vars. Was: [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections
  2019-03-05  2:28               ` static bpf vars. Was: " Alexei Starovoitov
@ 2019-03-05  9:31                 ` Daniel Borkmann
  0 siblings, 0 replies; 46+ messages in thread
From: Daniel Borkmann @ 2019-03-05  9:31 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Yonghong Song, Andrii Nakryiko, Alexei Starovoitov, bpf, netdev,
	joe, john.fastabend, tgraf, Andrii Nakryiko, jakub.kicinski, lmb

On 03/05/2019 03:28 AM, Alexei Starovoitov wrote:
> On Fri, Mar 01, 2019 at 09:06:35PM +0100, Daniel Borkmann wrote:
> 
> Overall I think the patches and direction are great.
> Thanks a lot for working on it.
> More thoughts below:

Okay, thanks!

>> By the way, from LLVM side, do you think it makes sense for local vars
>> where you encode the offset into insn->imm to already encode it into
>> (insn+1)->imm of the ldimm64, so that loaders can just pass this offset
>> through instead of fixing it up like I did? I'm fine either way though,
>> just thought it might be worth pointing out while we're at it. :)
> 
> 1.
> a typical ldimm64 carries a 64-bit value. upper 32-bits are in insn[1]->imm.
> I think it's better to stick to this meaning of imm for encoding
> the section offset and keep it in insn[0]->imm.
> Especially since from the disassembler pov it looks like a normal ldimm64 with a relocation.

Hadn't thought about it from this angle, but yes, from a debugging/disasm
point of view it makes perfect sense to keep it as-is.

> Later libbpf does the PSEUDO trick and can move [0]->imm to [1]->imm.
> Or it can use [1]->imm to store the FD to pass to the kernel?

My preference would rather be the former as it's consistent with how
we do it for normal map fds as well, so I'd keep the loader side moving
it as it currently does; I think it's fine.

> Another alternative is to use the 16 bits of 'off' to encode the offset,
> but a limit of 32kbytes of static variables is probably too small.

Yes, agreed, and based on Andrii's feedback I've also reused both off
fields now to optionally allow selecting an index into the array.

> 2.
> I think it's better to only allow 'static int foo' in libbpf for now.
> In C, 'global' variables are supposed to be shareable across multiple elf files.
> libbpf doesn't have support for it yet. We need to start discussing this multi-elf
> support, which will lead into a discussion about bpf libraries.
> I think we need a bit of freedom to decide the meaning of the 'global' modifier.
> So for now I would support only 'static int foo' in libbpf.
> llvm supports both and that's ok, since those are standard elf markings.

Makes sense, we first need to have a better understanding of how the
semantics should look there from a BPF pov. I'll restrict libbpf to
static vars for the time being, and throw an error message otherwise.

There is one more semantic question: a BPF ELF file can have multiple
program entry points in it, and all of them could refer to static
global variables inside that object file. My thinking, which this patch
set also implements, is to have all of them refer to the same
.bss/.data/.rodata maps (and _not_ separate ones), as I think this is
naturally what the user expects when referring to the same global vars
in the given object file.
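
E.g. in a sketch like the following (prog names are just for
illustration), both entry points would relocate against the same
single .bss map:

  #include <linux/bpf.h>
  #include "bpf_helpers.h"

  /* sketch: one static var shared by two entry points */
  static int pkt_count; /* zero-initialized, lives in .bss */

  SEC("cgroup_skb/ingress")
  int count_ingress(struct __sk_buff *skb)
  {
          pkt_count++;
          return 1;
  }

  SEC("cgroup_skb/egress")
  int count_egress(struct __sk_buff *skb)
  {
          pkt_count++;
          return 1;
  }

  char _license[] SEC("license") = "GPL";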

> Also were these patches tested with 'static struct {int a; int b;} foo;' and
> 'static char foo[]="my string";' ?
> I suspect it should 'just work' with the proposed design, but would be good to have a test.

+1, I'll add more tests. Iirc, I tested this back with the original
prototype from LPC which worked the same way from a high-level point
of view, but agree it's good to have it explicitly in BPF selftests;
will add.

> 3.
> let's come up with a concrete plan for BTF for these static variables.
> With normal maps the BTF was retrofitted via the BPF_ANNOTATE_KV_PAIR macro
> and no one is proud of this interface choice. We had 'struct bpf_map_def'
> and the macro was the least evil.
> For static variables let's figure out how to integrate BTF into it cleanly.
> LLVM already knows about the types. It needs another kind to describe
> the variable names and offsets. I think what we did with BTF_KIND_FUNC
> should be applicable here. Last time we discussed that we would add BTF_KIND_VAR
> for this purpose. Shouldn't be too much work to hack it into llvm now?
> I would even go a step further and make BTF mandatory for static vars.
> Introspection of BPF programs will go a long way.

We can make that mandatory; the nice thing would be that the
BPF_ANNOTATE_KV_PAIR macro hack wouldn't be needed at all. From a BTF
pov, that would be some sort of BTF_KIND_SEC acting as a section
'container' for all BTF_KIND_VAR variables belonging to it, so the map
could use its id to describe its map value as such?
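
Rough sketch of how that could be laid out; none of this is in the
uapi yet, names included:

  struct btf_var {                /* trails a BTF_KIND_VAR btf_type */
          __u32 linkage;          /* e.g. static vs global */
  };

  struct btf_var_secinfo {        /* one per var, trailing the
                                   * BTF_KIND_SEC 'container' type */
          __u32 type;             /* type id of the BTF_KIND_VAR */
          __u32 offset;           /* offset of the var in the section */
          __u32 size;             /* size of the var in bytes */
  };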

> 4.
> if we make BTF mandatory then the issue of map names will be solved too.
> The <obj>.bss/data/rodata might not fit into map_name[16],
> but with BTF we can give it a full name.
> "<obj>.data" looks great to me, especially from the introspection angle.

+1

> 5.
> I think it makes sense to keep .bss vs .data vs .rodata separate.
> I can imagine a program using a large zero-inited static array.
> It's important to save space in the elf file and to save time on map
> creation and population. With combined .data and .bss it would be much harder.

I'm fine either way, but yeah, this approach, as also taken here, means
less fixing up from the loader pov, and it would probably also make BTF
handling easier. So, sticking with separate .bss/.data/.rodata then.

> 6.
> as a next step it would be good to add two new syscall commands to read/write
> individual static variables. Something like:
> bpf_map_read/write_bytes(map, key, offset_in_value, buf, size_of_buf)
> would allow user space to tweak certain variables.
> With mandatory BTF we can have an api to read/write a variable by name
> (and not only by offset).
> Such bpf_map_read_bytes() would be useful for regular hash maps too.
> It's convenient to be able to update a few bytes in the hash value instead
> of overwriting the whole thing.

Definitely useful; there was also the suggestion to have an mmap
interface which would work at least for array maps, though I haven't
looked yet into what would be needed on the buffer mgmt side to remap
for processes that just got the fd by id or via bpf fs for retrieving
existing content.
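
If that ever materializes, usage could be as simple as the sketch
below, assuming a future mmap-able single-entry array map (map_fd and
value_size as givens):

  #include <sys/mman.h>

  /* map the .data map's value region straight into the process */
  void *data = mmap(NULL, value_size, PROT_READ | PROT_WRITE,
                    MAP_SHARED, map_fd, 0);
  if (data != MAP_FAILED)
          *(int *)data = 42; /* direct write to a global var */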

> 7.
> since these 3 maps are created as normal arrays from the kernel pov,
> 'bpftool map show' should not exclude them from the output.
> Similarly the libbpf apis bpf_object__for_each_map() and bpf_map__next()
> should probably include them too. There could be a bpf_map__is_pseudo() flag
> to tell them apart, but since it's a normal map in the kernel
> other libbpf accessors bpf_map__name(), bpf_map__pin() should work too.

Yeah, I'll integrate it there in libbpf; as long as we have something
to tell them apart from a user pov it should be good.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread

Thread overview: 46+ messages
2019-02-28 23:18 [PATCH bpf-next v2 0/7] BPF support for global data Daniel Borkmann
2019-02-28 23:18 ` [PATCH bpf-next v2 1/7] bpf: implement lookup-free direct value access Daniel Borkmann
2019-03-01  3:33   ` Jann Horn
2019-03-01  3:58   ` kbuild test robot
2019-03-01  5:46   ` Andrii Nakryiko
2019-03-01  9:49     ` Daniel Borkmann
2019-03-01 18:50       ` Jakub Kicinski
2019-03-01 19:35       ` Andrii Nakryiko
2019-03-01 20:08         ` Jakub Kicinski
2019-03-01 17:18   ` Yonghong Song
2019-03-01 19:51     ` Daniel Borkmann
2019-03-01 23:02       ` Yonghong Song
2019-03-04  6:03   ` Andrii Nakryiko
2019-03-04 15:59     ` Daniel Borkmann
2019-03-04 17:32       ` Andrii Nakryiko
2019-02-28 23:18 ` [PATCH bpf-next v2 2/7] bpf: add program side {rd,wr}only support Daniel Borkmann
2019-03-01  3:51   ` Jakub Kicinski
2019-03-01  9:01     ` Daniel Borkmann
2019-02-28 23:18 ` [PATCH bpf-next v2 3/7] bpf, obj: allow . char as part of the name Daniel Borkmann
2019-03-01  5:52   ` Andrii Nakryiko
2019-03-01  9:04     ` Daniel Borkmann
2019-02-28 23:18 ` [PATCH bpf-next v2 4/7] bpf, libbpf: refactor relocation handling Daniel Borkmann
2019-02-28 23:18 ` [PATCH bpf-next v2 5/7] bpf, libbpf: support global data/bss/rodata sections Daniel Borkmann
2019-02-28 23:41   ` Stanislav Fomichev
2019-03-01  0:19     ` Daniel Borkmann
2019-03-02  0:23       ` Yonghong Song
2019-03-02  0:27         ` Daniel Borkmann
2019-03-01  6:53   ` Andrii Nakryiko
2019-03-01 10:46     ` Daniel Borkmann
2019-03-01 18:10       ` Stanislav Fomichev
2019-03-01 18:46       ` Andrii Nakryiko
2019-03-01 18:11   ` Yonghong Song
2019-03-01 18:48     ` Andrii Nakryiko
2019-03-01 18:58       ` Yonghong Song
2019-03-01 19:10         ` Andrii Nakryiko
2019-03-01 19:19           ` Yonghong Song
2019-03-01 20:06             ` Daniel Borkmann
2019-03-01 20:25               ` Yonghong Song
2019-03-01 20:33                 ` Daniel Borkmann
2019-03-05  2:28               ` static bpf vars. Was: " Alexei Starovoitov
2019-03-05  9:31                 ` Daniel Borkmann
2019-03-01 19:56     ` Daniel Borkmann
2019-02-28 23:18 ` [PATCH bpf-next v2 6/7] bpf, selftest: test " Daniel Borkmann
2019-03-01 19:13   ` Andrii Nakryiko
2019-03-01 20:02     ` Daniel Borkmann
2019-02-28 23:18 ` [PATCH bpf-next v2 7/7] bpf, selftest: test {rd,wr}only flags and direct value access Daniel Borkmann
