* [PATCH rfc v3 bpf-next 0/9] BPF support for global data
From: Daniel Borkmann @ 2019-03-11 21:51 UTC
  To: ast
  Cc: bpf, netdev, joe, john.fastabend, yhs, andrii.nakryiko,
	jakub.kicinski, tgraf, lmb, Daniel Borkmann

This series is a major rework of the previously submitted libbpf
patches [0] in order to add global data support for BPF. Based
upon feedback from the LPC discussions [1], the kernel has been
extended with proper infrastructure that allows for full
.bss/.data/.rodata sections on the BPF loader side. The latter
support is then also added into libbpf in this series, which
allows for more natural, C-like programming of BPF programs. For
more information on the loader, please refer to the 'bpf, libbpf:
support global data/bss/rodata sections' patch in this series.
Joint work with Joe Stringer.

Thanks a lot!

Note, since the merge window is still open, I'm sending this as
an rfc to dump the current progress I have since the last v2
series; the final v3 will go out once bpf-next is open again.

  v2 -> rfc v3:
   - Add index selection into ldimm64 (Andrii)
   - Fix missing fdput() (Jann)
   - Reject invalid flags in BPF_F_*_PROG (Jakub)
   - Complete rework of libbpf support, includes:
    - Add objname to map name (Stanislav)
    - Make .rodata map full read-only after setup (Andrii)
    - Merge relocation handling into a single one (Andrii)
    - Store global maps into obj->maps array (Andrii, Alexei)
    - Debug message when skipping section (Andrii)
    - Reject non-static global data until we have
      semantics for sharing them (Yonghong, Andrii, Alexei)
    - More test cases and completely reworked prog test (Alexei)
   - Fixes, cleanups, etc all over the set
   - Not yet addressed:
    - Make BTF mandatory for these maps (Alexei)
    -> Waiting until BTF support for these lands first
  v1 -> v2:
    - Instead of 32-bit static data, implement full global
      data support (Alexei)

  [0] https://patchwork.ozlabs.org/cover/1040290/
  [1] http://vger.kernel.org/lpc-bpf2018.html#session-3

Daniel Borkmann (7):
  bpf: implement lookup-free direct value access for maps
  bpf: add program side {rd,wr}only support for maps
  bpf: add syscall side map lock support
  bpf, obj: allow . char as part of the name
  bpf: sync bpf.h uapi header from tools infrastructure
  bpf, libbpf: support global data/bss/rodata sections
  bpf, selftest: test {rd,wr}only flags and direct value access

Joe Stringer (2):
  bpf, libbpf: refactor relocation handling
  bpf, selftest: test global data/bss/rodata sections

 include/linux/bpf.h                           |  35 +-
 include/linux/bpf_verifier.h                  |   4 +
 include/uapi/linux/bpf.h                      |  24 +-
 kernel/bpf/arraymap.c                         |  32 +-
 kernel/bpf/core.c                             |   3 +-
 kernel/bpf/disasm.c                           |   5 +-
 kernel/bpf/hashtab.c                          |   6 +-
 kernel/bpf/local_storage.c                    |   6 +-
 kernel/bpf/lpm_trie.c                         |   3 +-
 kernel/bpf/queue_stack_maps.c                 |   6 +-
 kernel/bpf/syscall.c                          | 111 ++++-
 kernel/bpf/verifier.c                         | 155 +++++--
 tools/bpf/bpftool/xlated_dumper.c             |   6 +
 tools/include/linux/filter.h                  |  14 +
 tools/include/uapi/linux/bpf.h                |  24 +-
 tools/lib/bpf/bpf.c                           |  10 +
 tools/lib/bpf/bpf.h                           |   1 +
 tools/lib/bpf/libbpf.c                        | 378 ++++++++++++++----
 tools/lib/bpf/libbpf.h                        |   1 +
 tools/lib/bpf/libbpf.map                      |   6 +
 tools/testing/selftests/bpf/bpf_helpers.h     |   8 +-
 .../selftests/bpf/prog_tests/global_data.c    | 157 ++++++++
 .../selftests/bpf/progs/test_global_data.c    | 106 +++++
 tools/testing/selftests/bpf/test_verifier.c   |  42 +-
 .../selftests/bpf/verifier/array_access.c     | 159 ++++++++
 .../bpf/verifier/direct_value_access.c        | 226 +++++++++++
 26 files changed, 1382 insertions(+), 146 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/global_data.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_global_data.c
 create mode 100644 tools/testing/selftests/bpf/verifier/direct_value_access.c

-- 
2.17.1


* [PATCH rfc v3 bpf-next 1/9] bpf: implement lookup-free direct value access for maps
From: Daniel Borkmann @ 2019-03-11 21:51 UTC
  To: ast
  Cc: bpf, netdev, joe, john.fastabend, yhs, andrii.nakryiko,
	jakub.kicinski, tgraf, lmb, Daniel Borkmann

This generic extension to BPF maps allows for directly loading
an address residing inside a BPF map value with a single BPF
ldimm64 instruction.

The idea is similar to what BPF_PSEUDO_MAP_FD does today, which
is a special src_reg flag for the ldimm64 instruction indicating
that the first part of the double insn's imm field contains a
file descriptor, which the verifier then replaces with the full
64-bit address of the map spread over both imm parts. For the
newly added BPF_PSEUDO_MAP_VALUE src_reg flag, the idea is the
following: the first part of the double insn's imm field is
again a file descriptor corresponding to the map, and the second
part of the imm field is an offset into the value. Both insns'
off fields build the optional key, that is, the index into the
map in case it contains more than just one element. The verifier
will then replace both imm parts with an address that points
into the BPF map value, for maps that support this operation.
BPF_PSEUDO_MAP_VALUE is a distinct flag since with
BPF_PSEUDO_MAP_FD alone we could not distinguish a load of the
map pointer from a load of the map's value at offset 0, and
shifting BPF_PSEUDO_MAP_FD's encoding by one to differ between a
regular map pointer and a map value pointer would add
unnecessary complexity and raise the barrier for debuggability,
thus making it less suitable.
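
To make the encoding concrete, below is a minimal sketch of how
a loader could emit such a ldimm64 pair; the helper name is
purely illustrative, but the field layout follows the
description above:

  /* Illustrative only: emit a 2-insn ldimm64 that loads the
   * address of map[idx] + off via BPF_PSEUDO_MAP_VALUE.
   * insn[0].imm carries the map fd, insn[1].imm the offset into
   * the value, and the two off fields form the 32 bit index.
   */
  static void emit_ld_map_value(struct bpf_insn *insn, __u8 dst_reg,
  				int map_fd, __u32 idx, __u32 off)
  {
  	insn[0] = (struct bpf_insn) {
  		.code    = BPF_LD | BPF_DW | BPF_IMM,
  		.dst_reg = dst_reg,
  		.src_reg = BPF_PSEUDO_MAP_VALUE,
  		.off     = (__s16)idx,		/* low 16 bits of index */
  		.imm     = map_fd,
  	};
  	insn[1] = (struct bpf_insn) {
  		.off     = (__s16)(idx >> 16),	/* high 16 bits of index */
  		.imm     = off,			/* offset into map value */
  	};
  }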

This extension allows for efficiently retrieving the address of
a map value memory area without having to issue a helper call,
which would need to prepare registers according to the calling
convention, etc., without needing the extra NULL test, and
without having to add the offset to the value base pointer in an
additional instruction. The verifier then treats the destination
register as PTR_TO_MAP_VALUE with a constant reg->off taken from
the user-passed offset in the second imm field, and guarantees
that this is within the bounds of the map value. Any subsequent
operations are then treated as typical map value handling
without anything else needed for verification.

The two map operations for direct value access have been added
to the array map for now. In the future, other types could be
supported as well, depending on the use case. The main use case
for this commit is to allow BPF loader support for global
variables residing in the .data/.rodata/.bss sections, such that
we can directly load their address with minimal additional
infrastructure required. Loader support is added in subsequent
commits for the libbpf library.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/bpf.h               |   6 ++
 include/linux/bpf_verifier.h      |   4 ++
 include/uapi/linux/bpf.h          |  13 +++-
 kernel/bpf/arraymap.c             |  29 ++++++++
 kernel/bpf/core.c                 |   3 +-
 kernel/bpf/disasm.c               |   5 +-
 kernel/bpf/syscall.c              |  31 +++++++--
 kernel/bpf/verifier.c             | 109 ++++++++++++++++++++++--------
 tools/bpf/bpftool/xlated_dumper.c |   6 ++
 9 files changed, 168 insertions(+), 38 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index a2132e09dc1c..85b6b5dc883f 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -57,6 +57,12 @@ struct bpf_map_ops {
 			     const struct btf *btf,
 			     const struct btf_type *key_type,
 			     const struct btf_type *value_type);
+
+	/* Direct value access helpers. */
+	int (*map_direct_value_addr)(const struct bpf_map *map,
+				     u64 *imm, u32 idx, u32 off);
+	int (*map_direct_value_meta)(const struct bpf_map *map,
+				     u64 imm, u32 *idx, u32 *off);
 };
 
 struct bpf_map {
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 69f7a3449eda..6e28f1c24710 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -183,6 +183,10 @@ struct bpf_insn_aux_data {
 		unsigned long map_state;	/* pointer/poison value for maps */
 		s32 call_imm;			/* saved imm field of call insn */
 		u32 alu_limit;			/* limit for add/sub register with pointer */
+		struct {
+			u32 map_index;		/* index into used_maps[] */
+			u32 map_off;		/* offset from value base address */
+		};
 	};
 	int ctx_field_size; /* the ctx field size for load insn, maybe 0 */
 	int sanitize_stack_off; /* stack slot to be cleared */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 3c38ac9a92a7..d0b80fce0fc9 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -255,8 +255,19 @@ enum bpf_attach_type {
  */
 #define BPF_F_ANY_ALIGNMENT	(1U << 1)
 
-/* when bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_FD, bpf_ldimm64->imm == fd */
+/* When BPF ldimm64's insn[0].src_reg != 0 then this can have
+ * two extensions:
+ *
+ * insn[0].src_reg:  BPF_PSEUDO_MAP_FD   BPF_PSEUDO_MAP_VALUE
+ * insn[0].imm:      map fd              map fd
+ * insn[1].imm:      0                   offset into value
+ * insn[0].off:      0                   32 bit index to the
+ * insn[1].off:      0                   map value
+ * ldimm64 rewrite:  address of map      address of map[index]+offset
+ * verifier type:    CONST_PTR_TO_MAP    PTR_TO_MAP_VALUE
+ */
 #define BPF_PSEUDO_MAP_FD	1
+#define BPF_PSEUDO_MAP_VALUE	2
 
 /* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
  * offset to another bpf function
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index c72e0d8e1e65..862d20422ad1 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -160,6 +160,33 @@ static void *array_map_lookup_elem(struct bpf_map *map, void *key)
 	return array->value + array->elem_size * (index & array->index_mask);
 }
 
+static int array_map_direct_value_addr(const struct bpf_map *map, u64 *imm,
+				       u32 idx, u32 off)
+{
+	struct bpf_array *array = container_of(map, struct bpf_array, map);
+
+	if (idx >= map->max_entries || off >= map->value_size)
+		return -EINVAL;
+	*imm = (unsigned long)(array->value +
+			       array->elem_size * (idx & array->index_mask));
+	return 0;
+}
+
+static int array_map_direct_value_meta(const struct bpf_map *map, u64 imm,
+				       u32 *idx, u32 *off)
+{
+	struct bpf_array *array = container_of(map, struct bpf_array, map);
+	u64 rem, base = (unsigned long)array->value, slot = map->value_size;
+	u64 range = slot * map->max_entries;
+
+	if (imm < base || imm >= base + range)
+		return -ENOENT;
+	base = imm - base;
+	*idx = div64_u64_rem(base, slot, &rem);
+	*off = rem;
+	return 0;
+}
+
 /* emit BPF instructions equivalent to C code of array_map_lookup_elem() */
 static u32 array_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
 {
@@ -419,6 +446,8 @@ const struct bpf_map_ops array_map_ops = {
 	.map_update_elem = array_map_update_elem,
 	.map_delete_elem = array_map_delete_elem,
 	.map_gen_lookup = array_map_gen_lookup,
+	.map_direct_value_addr = array_map_direct_value_addr,
+	.map_direct_value_meta = array_map_direct_value_meta,
 	.map_seq_show_elem = array_map_seq_show_elem,
 	.map_check_btf = array_map_check_btf,
 };
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 3f08c257858e..af3dcd8b852b 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -292,7 +292,8 @@ int bpf_prog_calc_tag(struct bpf_prog *fp)
 		dst[i] = fp->insnsi[i];
 		if (!was_ld_map &&
 		    dst[i].code == (BPF_LD | BPF_IMM | BPF_DW) &&
-		    dst[i].src_reg == BPF_PSEUDO_MAP_FD) {
+		    (dst[i].src_reg == BPF_PSEUDO_MAP_FD ||
+		     dst[i].src_reg == BPF_PSEUDO_MAP_VALUE)) {
 			was_ld_map = true;
 			dst[i].imm = 0;
 		} else if (was_ld_map &&
diff --git a/kernel/bpf/disasm.c b/kernel/bpf/disasm.c
index de73f55e42fd..d9ce383c0f9c 100644
--- a/kernel/bpf/disasm.c
+++ b/kernel/bpf/disasm.c
@@ -205,10 +205,11 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
 			 * part of the ldimm64 insn is accessible.
 			 */
 			u64 imm = ((u64)(insn + 1)->imm << 32) | (u32)insn->imm;
-			bool map_ptr = insn->src_reg == BPF_PSEUDO_MAP_FD;
+			bool is_ptr = insn->src_reg == BPF_PSEUDO_MAP_FD ||
+				      insn->src_reg == BPF_PSEUDO_MAP_VALUE;
 			char tmp[64];
 
-			if (map_ptr && !allow_ptr_leaks)
+			if (is_ptr && !allow_ptr_leaks)
 				imm = 0;
 
 			verbose(cbs->private_data, "(%02x) r%d = %s\n",
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index bc34cf9fe9ee..b0c7a6485c49 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2061,13 +2061,27 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
 }
 
 static const struct bpf_map *bpf_map_from_imm(const struct bpf_prog *prog,
-					      unsigned long addr)
+					      unsigned long addr, u32 *idx,
+					      u32 *off, u32 *type)
 {
+	const struct bpf_map *map;
 	int i;
 
-	for (i = 0; i < prog->aux->used_map_cnt; i++)
-		if (prog->aux->used_maps[i] == (void *)addr)
-			return prog->aux->used_maps[i];
+	*off = *idx = 0;
+	for (i = 0; i < prog->aux->used_map_cnt; i++) {
+		map = prog->aux->used_maps[i];
+		if (map == (void *)addr) {
+			*type = BPF_PSEUDO_MAP_FD;
+			return map;
+		}
+		if (!map->ops->map_direct_value_meta)
+			continue;
+		if (!map->ops->map_direct_value_meta(map, addr, idx, off)) {
+			*type = BPF_PSEUDO_MAP_VALUE;
+			return map;
+		}
+	}
+
 	return NULL;
 }
 
@@ -2075,6 +2089,7 @@ static struct bpf_insn *bpf_insn_prepare_dump(const struct bpf_prog *prog)
 {
 	const struct bpf_map *map;
 	struct bpf_insn *insns;
+	u32 idx, off, type;
 	u64 imm;
 	int i;
 
@@ -2102,11 +2117,13 @@ static struct bpf_insn *bpf_insn_prepare_dump(const struct bpf_prog *prog)
 			continue;
 
 		imm = ((u64)insns[i + 1].imm << 32) | (u32)insns[i].imm;
-		map = bpf_map_from_imm(prog, imm);
+		map = bpf_map_from_imm(prog, imm, &idx, &off, &type);
 		if (map) {
-			insns[i].src_reg = BPF_PSEUDO_MAP_FD;
+			insns[i].src_reg = type;
 			insns[i].imm = map->id;
-			insns[i + 1].imm = 0;
+			insns[i].off = (u16)idx;
+			insns[i + 1].imm = off;
+			insns[i + 1].off = (u16)(idx >> 16);
 			continue;
 		}
 	}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ce166a002d16..57678cef9a2c 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4944,25 +4944,20 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
 	return 0;
 }
 
-/* return the map pointer stored inside BPF_LD_IMM64 instruction */
-static struct bpf_map *ld_imm64_to_map_ptr(struct bpf_insn *insn)
-{
-	u64 imm64 = ((u64) (u32) insn[0].imm) | ((u64) (u32) insn[1].imm) << 32;
-
-	return (struct bpf_map *) (unsigned long) imm64;
-}
-
 /* verify BPF_LD_IMM64 instruction */
 static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
 {
+	struct bpf_insn_aux_data *aux = cur_aux(env);
 	struct bpf_reg_state *regs = cur_regs(env);
+	struct bpf_map *map;
 	int err;
 
 	if (BPF_SIZE(insn->code) != BPF_DW) {
 		verbose(env, "invalid BPF_LD_IMM insn\n");
 		return -EINVAL;
 	}
-	if (insn->off != 0) {
+
+	if (insn->src_reg != BPF_PSEUDO_MAP_VALUE && insn->off != 0) {
 		verbose(env, "BPF_LD_IMM64 uses reserved fields\n");
 		return -EINVAL;
 	}
@@ -4979,11 +4974,22 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
 		return 0;
 	}
 
-	/* replace_map_fd_with_map_ptr() should have caught bad ld_imm64 */
-	BUG_ON(insn->src_reg != BPF_PSEUDO_MAP_FD);
+	map = env->used_maps[aux->map_index];
+	mark_reg_known_zero(env, regs, insn->dst_reg);
+	regs[insn->dst_reg].map_ptr = map;
+
+	if (insn->src_reg == BPF_PSEUDO_MAP_VALUE) {
+		regs[insn->dst_reg].type = PTR_TO_MAP_VALUE;
+		regs[insn->dst_reg].off = aux->map_off;
+		if (map_value_has_spin_lock(map))
+			regs[insn->dst_reg].id = ++env->id_gen;
+	} else if (insn->src_reg == BPF_PSEUDO_MAP_FD) {
+		regs[insn->dst_reg].type = CONST_PTR_TO_MAP;
+	} else {
+		verbose(env, "bpf verifier is misconfigured\n");
+		return -EINVAL;
+	}
 
-	regs[insn->dst_reg].type = CONST_PTR_TO_MAP;
-	regs[insn->dst_reg].map_ptr = ld_imm64_to_map_ptr(insn);
 	return 0;
 }
 
@@ -6664,23 +6670,34 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
 		}
 
 		if (insn[0].code == (BPF_LD | BPF_IMM | BPF_DW)) {
+			struct bpf_insn_aux_data *aux;
 			struct bpf_map *map;
 			struct fd f;
-
-			if (i == insn_cnt - 1 || insn[1].code != 0 ||
-			    insn[1].dst_reg != 0 || insn[1].src_reg != 0 ||
-			    insn[1].off != 0) {
+			u64 addr;
+
+			if (i == insn_cnt - 1 ||
+			    insn[1].code != 0 ||
+			    insn[1].dst_reg != 0 ||
+			    insn[1].src_reg != 0 ||
+			    (insn[1].off != 0 &&
+			     insn[0].src_reg != BPF_PSEUDO_MAP_VALUE)) {
 				verbose(env, "invalid bpf_ld_imm64 insn\n");
 				return -EINVAL;
 			}
 
-			if (insn->src_reg == 0)
+			if (insn[0].src_reg == 0)
 				/* valid generic load 64-bit imm */
 				goto next_insn;
 
-			if (insn[0].src_reg != BPF_PSEUDO_MAP_FD ||
-			    insn[1].imm != 0) {
-				verbose(env, "unrecognized bpf_ld_imm64 insn\n");
+			/* In final convert_pseudo_ld_imm64() step, this is
+			 * converted into regular 64-bit imm load insn.
+			 */
+			if ((insn[0].src_reg != BPF_PSEUDO_MAP_FD &&
+			     insn[0].src_reg != BPF_PSEUDO_MAP_VALUE) ||
+			    (insn[0].src_reg == BPF_PSEUDO_MAP_FD &&
+			     insn[1].imm != 0)) {
+				verbose(env,
+					"unrecognized bpf_ld_imm64 insn\n");
 				return -EINVAL;
 			}
 
@@ -6698,16 +6715,49 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
 				return err;
 			}
 
-			/* store map pointer inside BPF_LD_IMM64 instruction */
-			insn[0].imm = (u32) (unsigned long) map;
-			insn[1].imm = ((u64) (unsigned long) map) >> 32;
+			aux = &env->insn_aux_data[i];
+			if (insn->src_reg == BPF_PSEUDO_MAP_FD) {
+				addr = (unsigned long)map;
+			} else {
+				u32 idx = ((u32)(u16)insn[0].off) |
+					  ((u32)(u16)insn[1].off) << 16;
+				u32 off = insn[1].imm;
+
+				if (off >= BPF_MAX_VAR_OFF) {
+					verbose(env, "direct value offset of %u is not allowed\n", off);
+					fdput(f);
+					return -EINVAL;
+				}
+
+				if (!map->ops->map_direct_value_addr) {
+					verbose(env, "no direct value access support for this map type\n");
+					fdput(f);
+					return -EINVAL;
+				}
+
+				err = map->ops->map_direct_value_addr(map, &addr, idx, off);
+				if (err) {
+					verbose(env, "invalid access to map value pointer, value_size=%u index=%u off=%u\n",
+						map->value_size, idx, off);
+					fdput(f);
+					return err;
+				}
+
+				aux->map_off = off;
+				addr += off;
+			}
+
+			insn[0].imm = (u32)addr;
+			insn[1].imm = addr >> 32;
 
 			/* check whether we recorded this map already */
-			for (j = 0; j < env->used_map_cnt; j++)
+			for (j = 0; j < env->used_map_cnt; j++) {
 				if (env->used_maps[j] == map) {
+					aux->map_index = j;
 					fdput(f);
 					goto next_insn;
 				}
+			}
 
 			if (env->used_map_cnt >= MAX_USED_MAPS) {
 				fdput(f);
@@ -6724,6 +6774,8 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
 				fdput(f);
 				return PTR_ERR(map);
 			}
+
+			aux->map_index = env->used_map_cnt;
 			env->used_maps[env->used_map_cnt++] = map;
 
 			if (bpf_map_is_cgroup_storage(map) &&
@@ -6778,9 +6830,12 @@ static void convert_pseudo_ld_imm64(struct bpf_verifier_env *env)
 	int insn_cnt = env->prog->len;
 	int i;
 
-	for (i = 0; i < insn_cnt; i++, insn++)
-		if (insn->code == (BPF_LD | BPF_IMM | BPF_DW))
+	for (i = 0; i < insn_cnt; i++, insn++) {
+		if (insn->code == (BPF_LD | BPF_IMM | BPF_DW)) {
 			insn->src_reg = 0;
+			insn->off = (insn + 1)->off = 0;
+		}
+	}
 }
 
 /* single env->prog->insni[off] instruction was replaced with the range
diff --git a/tools/bpf/bpftool/xlated_dumper.c b/tools/bpf/bpftool/xlated_dumper.c
index 7073dbe1ff27..5391a9a70112 100644
--- a/tools/bpf/bpftool/xlated_dumper.c
+++ b/tools/bpf/bpftool/xlated_dumper.c
@@ -195,6 +195,12 @@ static const char *print_imm(void *private_data,
 	if (insn->src_reg == BPF_PSEUDO_MAP_FD)
 		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
 			 "map[id:%u]", insn->imm);
+	else if (insn->src_reg == BPF_PSEUDO_MAP_VALUE)
+		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
+			 "map[id:%u][%u]+%u", insn->imm,
+			 ((__u32)(__u16)insn[0].off) |
+			 ((__u32)(__u16)insn[1].off) << 16,
+			 (insn + 1)->imm);
 	else
 		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
 			 "0x%llx", (unsigned long long)full_imm);
-- 
2.17.1


* [PATCH rfc v3 bpf-next 2/9] bpf: add program side {rd,wr}only support for maps
From: Daniel Borkmann @ 2019-03-11 21:51 UTC
  To: ast
  Cc: bpf, netdev, joe, john.fastabend, yhs, andrii.nakryiko,
	jakub.kicinski, tgraf, lmb, Daniel Borkmann

This work adds two new map creation flags, BPF_F_RDONLY_PROG
and BPF_F_WRONLY_PROG, in order to allow for read-only or
write-only BPF maps from the BPF program side.

Today we have BPF_F_RDONLY and BPF_F_WRONLY, but these only
apply to the system call side, meaning the BPF program has full
read/write access to the map as usual while bpf(2) calls with
the map fd can either only read or only write into the map,
depending on the flags. BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG
allow for the exact opposite, such that the verifier is going
to reject program loads if a write into a read-only map or a
read from a write-only map is detected. For the read-only map
case, helpers that would alter the map state, such as map
deletion and update, are also forbidden for programs.

We've enabled this generic map extension for various non-special
maps holding normal user data: array, hash, lru, lpm, local
storage, queue and stack. Further map types could follow in the
future depending on the use case. The main use case here is to
forbid writes into .rodata map values from the verifier side.
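
As a quick illustration from the syscall side, a map which the
program may only read can be created as follows; this is just a
sketch using the raw bpf(2) syscall and the flags added here:

  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>

  /* Array map writable via bpf(2) as usual, but read-only from
   * program side due to BPF_F_RDONLY_PROG.
   */
  static int create_prog_rdonly_array(__u32 value_size, __u32 max_entries)
  {
  	union bpf_attr attr;

  	memset(&attr, 0, sizeof(attr));
  	attr.map_type    = BPF_MAP_TYPE_ARRAY;
  	attr.key_size    = 4;
  	attr.value_size  = value_size;
  	attr.max_entries = max_entries;
  	attr.map_flags   = BPF_F_RDONLY_PROG;

  	return syscall(__NR_bpf, BPF_MAP_CREATE, &attr, sizeof(attr));
  }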

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/bpf.h           | 24 ++++++++++++++++++
 include/uapi/linux/bpf.h      | 10 +++++++-
 kernel/bpf/arraymap.c         |  3 ++-
 kernel/bpf/hashtab.c          |  6 ++---
 kernel/bpf/local_storage.c    |  6 ++---
 kernel/bpf/lpm_trie.c         |  3 ++-
 kernel/bpf/queue_stack_maps.c |  6 ++---
 kernel/bpf/syscall.c          |  2 ++
 kernel/bpf/verifier.c         | 46 +++++++++++++++++++++++++++++++++--
 9 files changed, 92 insertions(+), 14 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 85b6b5dc883f..bb80c78924b0 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -427,6 +427,30 @@ struct bpf_array {
 	};
 };
 
+#define BPF_MAP_CAN_READ	BIT(0)
+#define BPF_MAP_CAN_WRITE	BIT(1)
+
+static inline u32 bpf_map_flags_to_cap(struct bpf_map *map)
+{
+	u32 access_flags = map->map_flags & (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG);
+
+	/* Combination of BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG is
+	 * not possible.
+	 */
+	if (access_flags & BPF_F_RDONLY_PROG)
+		return BPF_MAP_CAN_READ;
+	else if (access_flags & BPF_F_WRONLY_PROG)
+		return BPF_MAP_CAN_WRITE;
+	else
+		return BPF_MAP_CAN_READ | BPF_MAP_CAN_WRITE;
+}
+
+static inline bool bpf_map_flags_access_ok(u32 access_flags)
+{
+	return (access_flags & (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG)) !=
+	       (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG);
+}
+
 #define MAX_TAIL_CALL_CNT 32
 
 struct bpf_event_entry {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index d0b80fce0fc9..e64fd9862e68 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -294,7 +294,7 @@ enum bpf_attach_type {
 
 #define BPF_OBJ_NAME_LEN 16U
 
-/* Flags for accessing BPF object */
+/* Flags for accessing BPF object from syscall side. */
 #define BPF_F_RDONLY		(1U << 3)
 #define BPF_F_WRONLY		(1U << 4)
 
@@ -304,6 +304,14 @@ enum bpf_attach_type {
 /* Zero-initialize hash function seed. This should only be used for testing. */
 #define BPF_F_ZERO_SEED		(1U << 6)
 
+/* Flags for accessing BPF object from program side. */
+#define BPF_F_RDONLY_PROG	(1U << 7)
+#define BPF_F_WRONLY_PROG	(1U << 8)
+#define BPF_F_ACCESS_MASK	(BPF_F_RDONLY |		\
+				 BPF_F_RDONLY_PROG |	\
+				 BPF_F_WRONLY |		\
+				 BPF_F_WRONLY_PROG)
+
 /* flags for BPF_PROG_QUERY */
 #define BPF_F_QUERY_EFFECTIVE	(1U << 0)
 
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 862d20422ad1..6d2ce06485ae 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -22,7 +22,7 @@
 #include "map_in_map.h"
 
 #define ARRAY_CREATE_FLAG_MASK \
-	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+	(BPF_F_NUMA_NODE | BPF_F_ACCESS_MASK)
 
 static void bpf_array_free_percpu(struct bpf_array *array)
 {
@@ -63,6 +63,7 @@ int array_map_alloc_check(union bpf_attr *attr)
 	if (attr->max_entries == 0 || attr->key_size != 4 ||
 	    attr->value_size == 0 ||
 	    attr->map_flags & ~ARRAY_CREATE_FLAG_MASK ||
+	    !bpf_map_flags_access_ok(attr->map_flags) ||
 	    (percpu && numa_node != NUMA_NO_NODE))
 		return -EINVAL;
 
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index fed15cf94dca..192d32e77db3 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -23,7 +23,7 @@
 
 #define HTAB_CREATE_FLAG_MASK						\
 	(BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE |	\
-	 BPF_F_RDONLY | BPF_F_WRONLY | BPF_F_ZERO_SEED)
+	 BPF_F_ACCESS_MASK | BPF_F_ZERO_SEED)
 
 struct bucket {
 	struct hlist_nulls_head head;
@@ -262,8 +262,8 @@ static int htab_map_alloc_check(union bpf_attr *attr)
 		/* Guard against local DoS, and discourage production use. */
 		return -EPERM;
 
-	if (attr->map_flags & ~HTAB_CREATE_FLAG_MASK)
-		/* reserved bits should not be used */
+	if (attr->map_flags & ~HTAB_CREATE_FLAG_MASK ||
+	    !bpf_map_flags_access_ok(attr->map_flags))
 		return -EINVAL;
 
 	if (!lru && percpu_lru)
diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
index 6b572e2de7fb..980e8f1f6cb5 100644
--- a/kernel/bpf/local_storage.c
+++ b/kernel/bpf/local_storage.c
@@ -14,7 +14,7 @@ DEFINE_PER_CPU(struct bpf_cgroup_storage*, bpf_cgroup_storage[MAX_BPF_CGROUP_STO
 #ifdef CONFIG_CGROUP_BPF
 
 #define LOCAL_STORAGE_CREATE_FLAG_MASK					\
-	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+	(BPF_F_NUMA_NODE | BPF_F_ACCESS_MASK)
 
 struct bpf_cgroup_storage_map {
 	struct bpf_map map;
@@ -282,8 +282,8 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
 	if (attr->value_size > PAGE_SIZE)
 		return ERR_PTR(-E2BIG);
 
-	if (attr->map_flags & ~LOCAL_STORAGE_CREATE_FLAG_MASK)
-		/* reserved bits should not be used */
+	if (attr->map_flags & ~LOCAL_STORAGE_CREATE_FLAG_MASK ||
+	    !bpf_map_flags_access_ok(attr->map_flags))
 		return ERR_PTR(-EINVAL);
 
 	if (attr->max_entries)
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 93a5cbbde421..e61630c2e50b 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -538,7 +538,7 @@ static int trie_delete_elem(struct bpf_map *map, void *_key)
 #define LPM_KEY_SIZE_MIN	LPM_KEY_SIZE(LPM_DATA_SIZE_MIN)
 
 #define LPM_CREATE_FLAG_MASK	(BPF_F_NO_PREALLOC | BPF_F_NUMA_NODE |	\
-				 BPF_F_RDONLY | BPF_F_WRONLY)
+				 BPF_F_ACCESS_MASK)
 
 static struct bpf_map *trie_alloc(union bpf_attr *attr)
 {
@@ -553,6 +553,7 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
 	if (attr->max_entries == 0 ||
 	    !(attr->map_flags & BPF_F_NO_PREALLOC) ||
 	    attr->map_flags & ~LPM_CREATE_FLAG_MASK ||
+	    !bpf_map_flags_access_ok(attr->map_flags) ||
 	    attr->key_size < LPM_KEY_SIZE_MIN ||
 	    attr->key_size > LPM_KEY_SIZE_MAX ||
 	    attr->value_size < LPM_VAL_SIZE_MIN ||
diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
index b384ea9f3254..0b140d236889 100644
--- a/kernel/bpf/queue_stack_maps.c
+++ b/kernel/bpf/queue_stack_maps.c
@@ -11,8 +11,7 @@
 #include "percpu_freelist.h"
 
 #define QUEUE_STACK_CREATE_FLAG_MASK \
-	(BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
-
+	(BPF_F_NUMA_NODE | BPF_F_ACCESS_MASK)
 
 struct bpf_queue_stack {
 	struct bpf_map map;
@@ -52,7 +51,8 @@ static int queue_stack_map_alloc_check(union bpf_attr *attr)
 	/* check sanity of attributes */
 	if (attr->max_entries == 0 || attr->key_size != 0 ||
 	    attr->value_size == 0 ||
-	    attr->map_flags & ~QUEUE_STACK_CREATE_FLAG_MASK)
+	    attr->map_flags & ~QUEUE_STACK_CREATE_FLAG_MASK ||
+	    !bpf_map_flags_access_ok(attr->map_flags))
 		return -EINVAL;
 
 	if (attr->value_size > KMALLOC_MAX_SIZE)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b0c7a6485c49..ba2fe4cfad09 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -481,6 +481,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
 	map->spin_lock_off = btf_find_spin_lock(btf, value_type);
 
 	if (map_value_has_spin_lock(map)) {
+		if (map->map_flags & BPF_F_RDONLY_PROG)
+			return -EACCES;
 		if (map->map_type != BPF_MAP_TYPE_HASH &&
 		    map->map_type != BPF_MAP_TYPE_ARRAY &&
 		    map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 57678cef9a2c..af3cddb18efb 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1429,6 +1429,28 @@ static int check_stack_access(struct bpf_verifier_env *env,
 	return 0;
 }
 
+static int check_map_access_type(struct bpf_verifier_env *env, u32 regno,
+				 int off, int size, enum bpf_access_type type)
+{
+	struct bpf_reg_state *regs = cur_regs(env);
+	struct bpf_map *map = regs[regno].map_ptr;
+	u32 cap = bpf_map_flags_to_cap(map);
+
+	if (type == BPF_WRITE && !(cap & BPF_MAP_CAN_WRITE)) {
+		verbose(env, "write into map forbidden, value_size=%d off=%d size=%d\n",
+			map->value_size, off, size);
+		return -EACCES;
+	}
+
+	if (type == BPF_READ && !(cap & BPF_MAP_CAN_READ)) {
+		verbose(env, "read into map forbidden, value_size=%d off=%d size=%d\n",
+			map->value_size, off, size);
+		return -EACCES;
+	}
+
+	return 0;
+}
+
 /* check read/write into map element returned by bpf_map_lookup_elem() */
 static int __check_map_access(struct bpf_verifier_env *env, u32 regno, int off,
 			      int size, bool zero_size_allowed)
@@ -2014,7 +2036,9 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
 			verbose(env, "R%d leaks addr into map\n", value_regno);
 			return -EACCES;
 		}
-
+		err = check_map_access_type(env, regno, off, size, t);
+		if (err)
+			return err;
 		err = check_map_access(env, regno, off, size, false);
 		if (!err && t == BPF_READ && value_regno >= 0)
 			mark_reg_unknown(env, regs, value_regno);
@@ -2250,6 +2274,10 @@ static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
 		return check_packet_access(env, regno, reg->off, access_size,
 					   zero_size_allowed);
 	case PTR_TO_MAP_VALUE:
+		if (check_map_access_type(env, regno, reg->off, access_size,
+					  meta && meta->raw_mode ? BPF_WRITE :
+					  BPF_READ))
+			return -EACCES;
 		return check_map_access(env, regno, reg->off, access_size,
 					zero_size_allowed);
 	default: /* scalar_value|ptr_to_stack or invalid ptr */
@@ -2971,6 +2999,7 @@ record_func_map(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
 		int func_id, int insn_idx)
 {
 	struct bpf_insn_aux_data *aux = &env->insn_aux_data[insn_idx];
+	struct bpf_map *map = meta->map_ptr;
 
 	if (func_id != BPF_FUNC_tail_call &&
 	    func_id != BPF_FUNC_map_lookup_elem &&
@@ -2981,11 +3010,24 @@ record_func_map(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
 	    func_id != BPF_FUNC_map_peek_elem)
 		return 0;
 
-	if (meta->map_ptr == NULL) {
+	if (map == NULL) {
 		verbose(env, "kernel subsystem misconfigured verifier\n");
 		return -EINVAL;
 	}
 
+	/* In case of read-only, some additional restrictions
+	 * need to be applied in order to prevent altering the
+	 * state of the map from program side.
+	 */
+	if ((map->map_flags & BPF_F_RDONLY_PROG) &&
+	    (func_id == BPF_FUNC_map_delete_elem ||
+	     func_id == BPF_FUNC_map_update_elem ||
+	     func_id == BPF_FUNC_map_push_elem ||
+	     func_id == BPF_FUNC_map_pop_elem)) {
+		verbose(env, "write into map forbidden\n");
+		return -EACCES;
+	}
+
 	if (!BPF_MAP_PTR(aux->map_state))
 		bpf_map_ptr_store(aux, meta->map_ptr,
 				  meta->map_ptr->unpriv_array);
-- 
2.17.1


* [PATCH rfc v3 bpf-next 3/9] bpf: add syscall side map lock support
From: Daniel Borkmann @ 2019-03-11 21:51 UTC
  To: ast
  Cc: bpf, netdev, joe, john.fastabend, yhs, andrii.nakryiko,
	jakub.kicinski, tgraf, lmb, Daniel Borkmann

This patch adds a new BPF_MAP_LOCK command which allows locking
a map globally as read-only/immutable from the syscall side. Map
permission handling has been refactored into map_get_sys_perms(),
which drops FMODE_CAN_WRITE in case of a locked map. The main
use case is to allow for setting up .rodata sections from the
BPF ELF which are loaded into the kernel, meaning the BPF loader
first allocates the map, sets up the map value by copying the
.rodata section into it, and once complete calls BPF_MAP_LOCK on
the map fd to prevent further modifications. Given maps can be
shared, we only grant the original creator of the map, or
otherwise privileged users, the ability to lock it as
syscall-side read-only.

Right now BPF_MAP_LOCK only takes the map fd as argument, while
the remaining bpf_attr members are required to be zero. I didn't
add write-only locking as a counterpart here since I don't have
a concrete use case for it on my side, and I think it probably
makes more sense to wait until there actually is one. In that
case, bpf_attr can be extended as usual with a flags field
and/or others, where flag 0 means that we lock the map
read-only; hence this doesn't prevent adding further extensions
to BPF_MAP_LOCK upon need.
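
The intended flow, sketched with the raw syscall (patch 7 in
this series adds a bpf_map_lock() wrapper to libbpf for this);
map_fd and rodata_buf are assumed to be set up beforehand, and
bpf_map_update_elem() is the existing libbpf wrapper:

  /* Sketch: populate the map first, then lock it as syscall-side
   * read-only. Once BPF_MAP_LOCK succeeds, map updates via any
   * fd of this map fail with -EPERM.
   */
  union bpf_attr attr;
  int key = 0;

  bpf_map_update_elem(map_fd, &key, rodata_buf, 0);

  memset(&attr, 0, sizeof(attr));
  attr.map_fd = map_fd;
  if (syscall(__NR_bpf, BPF_MAP_LOCK, &attr, sizeof(attr)))
  	return -1; /* e.g. -EPERM: neither creator nor CAP_SYS_ADMIN */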

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/bpf.h      |  5 ++-
 include/uapi/linux/bpf.h |  1 +
 kernel/bpf/syscall.c     | 72 +++++++++++++++++++++++++++++++++-------
 3 files changed, 65 insertions(+), 13 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index bb80c78924b0..6b9717b430ff 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -87,7 +87,10 @@ struct bpf_map {
 	struct btf *btf;
 	u32 pages;
 	bool unpriv_array;
-	/* 51 bytes hole */
+	/* Next two members are write-once. */
+	bool sys_immutable;
+	struct task_struct *creator;
+	/* 40 bytes hole */
 
 	/* The 3rd and 4th cacheline with misc members to avoid false sharing
 	 * particularly with refcounting.
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index e64fd9862e68..5eb59f05a147 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -105,6 +105,7 @@ enum bpf_cmd {
 	BPF_BTF_GET_FD_BY_ID,
 	BPF_TASK_FD_QUERY,
 	BPF_MAP_LOOKUP_AND_DELETE_ELEM,
+	BPF_MAP_LOCK,
 };
 
 enum bpf_map_type {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ba2fe4cfad09..b5ba138351e1 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -328,6 +328,8 @@ static int bpf_map_release(struct inode *inode, struct file *filp)
 {
 	struct bpf_map *map = filp->private_data;
 
+	if (READ_ONCE(map->creator))
+		cmpxchg(&map->creator, current, NULL);
 	if (map->ops->map_release)
 		map->ops->map_release(map, filp);
 
@@ -335,6 +337,18 @@ static int bpf_map_release(struct inode *inode, struct file *filp)
 	return 0;
 }
 
+static fmode_t map_get_sys_perms(struct bpf_map *map, struct fd f)
+{
+	fmode_t mode = f.file->f_mode;
+
+	/* Our file permissions may have been overridden by global
+	 * map permissions facing syscall side.
+	 */
+	if (READ_ONCE(map->sys_immutable))
+		mode &= ~FMODE_CAN_WRITE;
+	return mode;
+}
+
 #ifdef CONFIG_PROC_FS
 static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
 {
@@ -356,14 +370,16 @@ static void bpf_map_show_fdinfo(struct seq_file *m, struct file *filp)
 		   "max_entries:\t%u\n"
 		   "map_flags:\t%#x\n"
 		   "memlock:\t%llu\n"
-		   "map_id:\t%u\n",
+		   "map_id:\t%u\n"
+		   "sys_immutable:\t%u\n",
 		   map->map_type,
 		   map->key_size,
 		   map->value_size,
 		   map->max_entries,
 		   map->map_flags,
 		   map->pages * 1ULL << PAGE_SHIFT,
-		   map->id);
+		   map->id,
+		   READ_ONCE(map->sys_immutable));
 
 	if (owner_prog_type) {
 		seq_printf(m, "owner_prog_type:\t%u\n",
@@ -533,6 +549,7 @@ static int map_create(union bpf_attr *attr)
 	if (err)
 		goto free_map_nouncharge;
 
+	WRITE_ONCE(map->creator, current);
 	atomic_set(&map->refcnt, 1);
 	atomic_set(&map->usercnt, 1);
 
@@ -707,8 +724,7 @@ static int map_lookup_elem(union bpf_attr *attr)
 	map = __bpf_map_get(f);
 	if (IS_ERR(map))
 		return PTR_ERR(map);
-
-	if (!(f.file->f_mode & FMODE_CAN_READ)) {
+	if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ)) {
 		err = -EPERM;
 		goto err_put;
 	}
@@ -837,8 +853,7 @@ static int map_update_elem(union bpf_attr *attr)
 	map = __bpf_map_get(f);
 	if (IS_ERR(map))
 		return PTR_ERR(map);
-
-	if (!(f.file->f_mode & FMODE_CAN_WRITE)) {
+	if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
 		err = -EPERM;
 		goto err_put;
 	}
@@ -949,8 +964,7 @@ static int map_delete_elem(union bpf_attr *attr)
 	map = __bpf_map_get(f);
 	if (IS_ERR(map))
 		return PTR_ERR(map);
-
-	if (!(f.file->f_mode & FMODE_CAN_WRITE)) {
+	if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
 		err = -EPERM;
 		goto err_put;
 	}
@@ -1001,8 +1015,7 @@ static int map_get_next_key(union bpf_attr *attr)
 	map = __bpf_map_get(f);
 	if (IS_ERR(map))
 		return PTR_ERR(map);
-
-	if (!(f.file->f_mode & FMODE_CAN_READ)) {
+	if (!(map_get_sys_perms(map, f) & FMODE_CAN_READ)) {
 		err = -EPERM;
 		goto err_put;
 	}
@@ -1069,8 +1082,7 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
 	map = __bpf_map_get(f);
 	if (IS_ERR(map))
 		return PTR_ERR(map);
-
-	if (!(f.file->f_mode & FMODE_CAN_WRITE)) {
+	if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE)) {
 		err = -EPERM;
 		goto err_put;
 	}
@@ -1112,6 +1124,39 @@ static int map_lookup_and_delete_elem(union bpf_attr *attr)
 	return err;
 }
 
+#define BPF_MAP_LOCK_LAST_FIELD map_fd
+
+static int map_lock(const union bpf_attr *attr)
+{
+	int err = 0, ufd = attr->map_fd;
+	struct bpf_map *map;
+	struct fd f;
+
+	if (CHECK_ATTR(BPF_MAP_LOCK))
+		return -EINVAL;
+
+	f = fdget(ufd);
+	map = __bpf_map_get(f);
+	if (IS_ERR(map))
+		return PTR_ERR(map);
+	if (READ_ONCE(map->sys_immutable)) {
+		err = -EBUSY;
+		goto err_put;
+	}
+	if (!(map_get_sys_perms(map, f) & FMODE_CAN_WRITE) ||
+	    (!capable(CAP_SYS_ADMIN) &&
+	     READ_ONCE(map->creator) != current)) {
+		err = -EPERM;
+		goto err_put;
+	}
+
+	WRITE_ONCE(map->sys_immutable, true);
+	synchronize_rcu();
+err_put:
+	fdput(f);
+	return err;
+}
+
 static const struct bpf_prog_ops * const bpf_prog_types[] = {
 #define BPF_PROG_TYPE(_id, _name) \
 	[_id] = & _name ## _prog_ops,
@@ -2715,6 +2760,9 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
 	case BPF_MAP_GET_NEXT_KEY:
 		err = map_get_next_key(&attr);
 		break;
+	case BPF_MAP_LOCK:
+		err = map_lock(&attr);
+		break;
 	case BPF_PROG_LOAD:
 		err = bpf_prog_load(&attr, uattr);
 		break;
-- 
2.17.1


* [PATCH rfc v3 bpf-next 4/9] bpf, obj: allow . char as part of the name
From: Daniel Borkmann @ 2019-03-11 21:51 UTC
  To: ast
  Cc: bpf, netdev, joe, john.fastabend, yhs, andrii.nakryiko,
	jakub.kicinski, tgraf, lmb, Daniel Borkmann

Trivial addition to allow the '.' character aside from '_' as a
"special" character in the object name. This is used to allow
for substrings such as ".bss", ".data" and ".rodata" in map
names from the loader side, but could also be useful for other
purposes.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 kernel/bpf/syscall.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index b5ba138351e1..04279747c092 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -456,10 +456,10 @@ static int bpf_obj_name_cpy(char *dst, const char *src)
 	const char *end = src + BPF_OBJ_NAME_LEN;
 
 	memset(dst, 0, BPF_OBJ_NAME_LEN);
-
-	/* Copy all isalnum() and '_' char */
+	/* Copy all isalnum(), '_' and '.' chars. */
 	while (src < end && *src) {
-		if (!isalnum(*src) && *src != '_')
+		if (!isalnum(*src) &&
+		    *src != '_' && *src != '.')
 			return -EINVAL;
 		*dst++ = *src++;
 	}
-- 
2.17.1


* [PATCH rfc v3 bpf-next 5/9] bpf: sync bpf.h uapi header from tools infrastructure
From: Daniel Borkmann @ 2019-03-11 21:51 UTC
  To: ast
  Cc: bpf, netdev, joe, john.fastabend, yhs, andrii.nakryiko,
	jakub.kicinski, tgraf, lmb, Daniel Borkmann

Pull in latest changes, so we can make use of them in libbpf.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 tools/include/uapi/linux/bpf.h | 24 ++++++++++++++++++++++--
 1 file changed, 22 insertions(+), 2 deletions(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 3c38ac9a92a7..5eb59f05a147 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -105,6 +105,7 @@ enum bpf_cmd {
 	BPF_BTF_GET_FD_BY_ID,
 	BPF_TASK_FD_QUERY,
 	BPF_MAP_LOOKUP_AND_DELETE_ELEM,
+	BPF_MAP_LOCK,
 };
 
 enum bpf_map_type {
@@ -255,8 +256,19 @@ enum bpf_attach_type {
  */
 #define BPF_F_ANY_ALIGNMENT	(1U << 1)
 
-/* when bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_FD, bpf_ldimm64->imm == fd */
+/* When BPF ldimm64's insn[0].src_reg != 0 then this can have
+ * two extensions:
+ *
+ * insn[0].src_reg:  BPF_PSEUDO_MAP_FD   BPF_PSEUDO_MAP_VALUE
+ * insn[0].imm:      map fd              map fd
+ * insn[1].imm:      0                   offset into value
+ * insn[0].off:      0                   32 bit index to the
+ * insn[1].off:      0                   map value
+ * ldimm64 rewrite:  address of map      address of map[index]+offset
+ * verifier type:    CONST_PTR_TO_MAP    PTR_TO_MAP_VALUE
+ */
 #define BPF_PSEUDO_MAP_FD	1
+#define BPF_PSEUDO_MAP_VALUE	2
 
 /* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
  * offset to another bpf function
@@ -283,7 +295,7 @@ enum bpf_attach_type {
 
 #define BPF_OBJ_NAME_LEN 16U
 
-/* Flags for accessing BPF object */
+/* Flags for accessing BPF object from syscall side. */
 #define BPF_F_RDONLY		(1U << 3)
 #define BPF_F_WRONLY		(1U << 4)
 
@@ -293,6 +305,14 @@ enum bpf_attach_type {
 /* Zero-initialize hash function seed. This should only be used for testing. */
 #define BPF_F_ZERO_SEED		(1U << 6)
 
+/* Flags for accessing BPF object from program side. */
+#define BPF_F_RDONLY_PROG	(1U << 7)
+#define BPF_F_WRONLY_PROG	(1U << 8)
+#define BPF_F_ACCESS_MASK	(BPF_F_RDONLY |		\
+				 BPF_F_RDONLY_PROG |	\
+				 BPF_F_WRONLY |		\
+				 BPF_F_WRONLY_PROG)
+
 /* flags for BPF_PROG_QUERY */
 #define BPF_F_QUERY_EFFECTIVE	(1U << 0)
 
-- 
2.17.1


* [PATCH rfc v3 bpf-next 6/9] bpf, libbpf: refactor relocation handling
From: Daniel Borkmann @ 2019-03-11 21:51 UTC
  To: ast
  Cc: bpf, netdev, joe, john.fastabend, yhs, andrii.nakryiko,
	jakub.kicinski, tgraf, lmb, Daniel Borkmann

From: Joe Stringer <joe@wand.net.nz>

Adjust the code for relocations slightly, with no functional
changes, so that upcoming patches that will introduce support
for relocations into the .data, .rodata and .bss sections can
be added independently of these changes.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 tools/lib/bpf/libbpf.c | 62 ++++++++++++++++++++++--------------------
 1 file changed, 32 insertions(+), 30 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index f5eb60379c8d..0afdb8914386 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -851,20 +851,20 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 				obj->efile.symbols = data;
 				obj->efile.strtabidx = sh.sh_link;
 			}
-		} else if ((sh.sh_type == SHT_PROGBITS) &&
-			   (sh.sh_flags & SHF_EXECINSTR) &&
-			   (data->d_size > 0)) {
-			if (strcmp(name, ".text") == 0)
-				obj->efile.text_shndx = idx;
-			err = bpf_object__add_program(obj, data->d_buf,
-						      data->d_size, name, idx);
-			if (err) {
-				char errmsg[STRERR_BUFSIZE];
-				char *cp = libbpf_strerror_r(-err, errmsg,
-							     sizeof(errmsg));
-
-				pr_warning("failed to alloc program %s (%s): %s",
-					   name, obj->path, cp);
+		} else if (sh.sh_type == SHT_PROGBITS && data->d_size > 0) {
+			if (sh.sh_flags & SHF_EXECINSTR) {
+				if (strcmp(name, ".text") == 0)
+					obj->efile.text_shndx = idx;
+				err = bpf_object__add_program(obj, data->d_buf,
+							      data->d_size, name, idx);
+				if (err) {
+					char errmsg[STRERR_BUFSIZE];
+					char *cp = libbpf_strerror_r(-err, errmsg,
+								     sizeof(errmsg));
+
+					pr_warning("failed to alloc program %s (%s): %s",
+						   name, obj->path, cp);
+				}
 			}
 		} else if (sh.sh_type == SHT_REL) {
 			void *reloc = obj->efile.reloc;
@@ -1026,24 +1026,26 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
 			return -LIBBPF_ERRNO__RELOC;
 		}
 
-		/* TODO: 'maps' is sorted. We can use bsearch to make it faster. */
-		for (map_idx = 0; map_idx < nr_maps; map_idx++) {
-			if (maps[map_idx].offset == sym.st_value) {
-				pr_debug("relocation: find map %zd (%s) for insn %u\n",
-					 map_idx, maps[map_idx].name, insn_idx);
-				break;
+		if (sym.st_shndx == maps_shndx) {
+			/* TODO: 'maps' is sorted. We can use bsearch to make it faster. */
+			for (map_idx = 0; map_idx < nr_maps; map_idx++) {
+				if (maps[map_idx].offset == sym.st_value) {
+					pr_debug("relocation: find map %zd (%s) for insn %u\n",
+						 map_idx, maps[map_idx].name, insn_idx);
+					break;
+				}
 			}
-		}
 
-		if (map_idx >= nr_maps) {
-			pr_warning("bpf relocation: map_idx %d large than %d\n",
-				   (int)map_idx, (int)nr_maps - 1);
-			return -LIBBPF_ERRNO__RELOC;
-		}
+			if (map_idx >= nr_maps) {
+				pr_warning("bpf relocation: map_idx %d large than %d\n",
+					   (int)map_idx, (int)nr_maps - 1);
+				return -LIBBPF_ERRNO__RELOC;
+			}
 
-		prog->reloc_desc[i].type = RELO_LD64;
-		prog->reloc_desc[i].insn_idx = insn_idx;
-		prog->reloc_desc[i].map_idx = map_idx;
+			prog->reloc_desc[i].type = RELO_LD64;
+			prog->reloc_desc[i].insn_idx = insn_idx;
+			prog->reloc_desc[i].map_idx = map_idx;
+		}
 	}
 	return 0;
 }
@@ -1405,7 +1407,7 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
 			}
 			insns[insn_idx].src_reg = BPF_PSEUDO_MAP_FD;
 			insns[insn_idx].imm = obj->maps[map_idx].fd;
-		} else {
+		} else if (prog->reloc_desc[i].type == RELO_CALL) {
 			err = bpf_program__reloc_text(prog, obj,
 						      &prog->reloc_desc[i]);
 			if (err)
-- 
2.17.1


* [PATCH rfc v3 bpf-next 7/9] bpf, libbpf: support global data/bss/rodata sections
From: Daniel Borkmann @ 2019-03-11 21:51 UTC
  To: ast
  Cc: bpf, netdev, joe, john.fastabend, yhs, andrii.nakryiko,
	jakub.kicinski, tgraf, lmb, Daniel Borkmann

This work adds BPF loader support for global data sections
to libbpf. This allows writing BPF programs in a more natural,
C-like way by being able to define global variables and const
data.

Back at LPC 2018 [0] we presented a first prototype which
implemented support for global data sections by extending the
BPF syscall, where union bpf_attr would get an additional
memory/size pair for each section passed during prog load, in
order to later add this base address into the ldimm64
instruction along with the user-provided offset when accessing
a variable. The consensus from LPC was that for proper upstream
support it would be more desirable to use maps instead of a
bpf_attr extension, as this would allow for introspection of
these sections as well as potential live updates of their
content. This work follows that path by taking the following
steps from the loader side:

 1) In the bpf_object__elf_collect() step we pick up ".data",
    ".rodata", and ".bss" section information.

 2) If present, in bpf_object__init_internal_map() we add
    maps to the obj's map array corresponding to each of
    the present sections. Given section size and access
    properties can differ, a single-entry array map is
    created with a value size corresponding to the ELF
    section size of .data, .bss or .rodata. These internal
    maps are integrated into the normal map handling of
    libbpf such that when the user traverses all obj maps,
    they can be differentiated from user-created ones via
    bpf_map__is_internal(). In the later step where we
    actually create these maps in the kernel via
    bpf_object__create_maps(), the content of the .data
    and .rodata sections is copied into the corresponding
    map through bpf_map_update_elem(). For .bss this is
    not necessary since the array map is zero-initialized
    by default. Additionally, the .rodata map is locked as
    read-only after setup, such that writes are possible
    neither from the program nor from the syscall side.

 3) In the bpf_program__collect_reloc() step, we record the
    corresponding map, insn index, and relocation type for
    the global data.

 4) And last but not least, in the actual relocation step
    in bpf_program__relocate(), we mark the ldimm64
    instruction with src_reg = BPF_PSEUDO_MAP_VALUE, where
    the map's file descriptor is stored in the first imm
    field, similarly to BPF_PSEUDO_MAP_FD, and the access
    offset into the section is stored in the second imm
    field (as ldimm64 is 2-insn wide). Given these maps
    have only a single element, ldimm64's off fields
    remain zero in both parts.

 5) On the kernel side, this specially marked
    BPF_PSEUDO_MAP_VALUE load will then store the actual
    target address, that is, the map value base address
    plus offset, in order to have 'map-lookup'-free access.
    The destination register in the verifier will then be
    marked as PTR_TO_MAP_VALUE, containing the fixed offset
    as reg->off and the backing BPF map as reg->map_ptr.
    Meaning, it's treated like any other normal map value
    from the verification side, only with efficient, direct
    value access instead of an actual call to the map
    lookup helper as in the typical case.
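
Taken together, from an application's point of view the new
section maps behave like ordinary obj maps; a short sketch of
skipping them during traversal, using the bpf_map__is_internal()
accessor added in this patch:

  struct bpf_map *map;

  bpf_map__for_each(map, obj) {
  	if (bpf_map__is_internal(map))
  		continue; /* .data/.bss/.rodata backing map */
  	/* handle user-defined maps here */
  }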

Currently, only support for static global variables has been
added, and libbpf rejects non-static global variables from
loading. This restriction can be lifted once we have proper
semantics for how BPF will treat these.

From the BTF side, libbpf associates these three maps with the
BTF map names ".bss", ".data" and ".rodata", which LLVM will
emit (w/o the object name prefix).
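
For illustration, a program along these lines is all that is
needed on the C side (a hand-written sketch, not verbatim from
the selftests added later in this series; SEC() comes from the
selftests' bpf_helpers.h):

  #include <linux/bpf.h>
  #include "bpf_helpers.h"

  static __u32 zeroed;                    /* -> .bss, zeroed by default */
  static __u32 counter = 42;              /* -> .data                   */
  static const char greeting[] = "hello"; /* -> .rodata, read-only      */

  SEC("classifier")
  int load_static_data(struct __sk_buff *skb)
  {
  	/* Plain C accesses; libbpf rewrites them into lookup-free
  	 * BPF_PSEUDO_MAP_VALUE ldimm64 loads on the section maps.
  	 */
  	counter += greeting[0];
  	zeroed = counter;
  	return zeroed;
  }

  char _license[] SEC("license") = "GPL";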

Simple example dump of a program using global vars in each
section:

  # bpftool prog
  [...]
  6784: sched_cls  name load_static_dat  tag a7e1291567277844  gpl
        loaded_at 2019-03-11T15:39:34+0000  uid 0
        xlated 1776B  jited 993B  memlock 4096B  map_ids 2238,2237,2235,2236,2239,2240

  # bpftool map show id 2237
  2237: array  name test_glo.bss  flags 0x0
        key 4B  value 64B  max_entries 1  memlock 4096B
  # bpftool map show id 2235
  2235: array  name test_glo.data  flags 0x0
        key 4B  value 64B  max_entries 1  memlock 4096B
  # bpftool map show id 2236
  2236: array  name test_glo.rodata  flags 0x80
        key 4B  value 96B  max_entries 1  memlock 4096B

  # bpftool prog dump xlated id 6784
  int load_static_data(struct __sk_buff * skb):
  ; int load_static_data(struct __sk_buff *skb)
     0: (b7) r6 = 0
  ; test_reloc(number, 0, &num0);
     1: (63) *(u32 *)(r10 -4) = r6
     2: (bf) r2 = r10
  ; int load_static_data(struct __sk_buff *skb)
     3: (07) r2 += -4
  ; test_reloc(number, 0, &num0);
     4: (18) r1 = map[id:2238]
     6: (18) r3 = map[id:2237][0]+0    <-- direct addr in .bss area
     8: (b7) r4 = 0
     9: (85) call array_map_update_elem#100464
    10: (b7) r1 = 1
  ; test_reloc(number, 1, &num1);
  [...]
  ; test_reloc(string, 2, str2);
   120: (18) r8 = map[id:2237][0]+16   <-- same here at offset +16
   122: (18) r1 = map[id:2239]
   124: (18) r3 = map[id:2237][0]+16
   126: (b7) r4 = 0
   127: (85) call array_map_update_elem#100464
   128: (b7) r1 = 120
  ; str1[5] = 'x';
   129: (73) *(u8 *)(r9 +5) = r1
  ; test_reloc(string, 3, str1);
   130: (b7) r1 = 3
   131: (63) *(u32 *)(r10 -4) = r1
   132: (b7) r9 = 3
   133: (bf) r2 = r10
  ; int load_static_data(struct __sk_buff *skb)
   134: (07) r2 += -4
  ; test_reloc(string, 3, str1);
   135: (18) r1 = map[id:2239]
   137: (18) r3 = map[id:2235][0]+16   <-- direct addr in .data area
   139: (b7) r4 = 0
   140: (85) call array_map_update_elem#100464
   141: (b7) r1 = 111
  ; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
   142: (73) *(u8 *)(r8 +6) = r1       <-- further access based on .bss data
   143: (b7) r1 = 108
   144: (73) *(u8 *)(r8 +5) = r1
  [...]

Based upon a recent fix in LLVM, commit c0db6b6bd444 ("[BPF]
Don't fail for static variables").

  [0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
      http://vger.kernel.org/lpc-bpf2018.html#session-3

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 tools/lib/bpf/bpf.c      |  10 ++
 tools/lib/bpf/bpf.h      |   1 +
 tools/lib/bpf/libbpf.c   | 324 ++++++++++++++++++++++++++++++++++-----
 tools/lib/bpf/libbpf.h   |   1 +
 tools/lib/bpf/libbpf.map |   6 +
 5 files changed, 301 insertions(+), 41 deletions(-)

diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 9cd015574e83..cba2a615e135 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -429,6 +429,16 @@ int bpf_map_get_next_key(int fd, const void *key, void *next_key)
 	return sys_bpf(BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr));
 }
 
+int bpf_map_lock(int fd)
+{
+	union bpf_attr attr;
+
+	memset(&attr, 0, sizeof(attr));
+	attr.map_fd = fd;
+
+	return sys_bpf(BPF_MAP_LOCK, &attr, sizeof(attr));
+}
+
 int bpf_obj_pin(int fd, const char *pathname)
 {
 	union bpf_attr attr;
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 6ffdd79bea89..fa2bdbba6f00 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -117,6 +117,7 @@ LIBBPF_API int bpf_map_lookup_and_delete_elem(int fd, const void *key,
 					      void *value);
 LIBBPF_API int bpf_map_delete_elem(int fd, const void *key);
 LIBBPF_API int bpf_map_get_next_key(int fd, const void *key, void *next_key);
+LIBBPF_API int bpf_map_lock(int fd);
 LIBBPF_API int bpf_obj_pin(int fd, const char *pathname);
 LIBBPF_API int bpf_obj_get(const char *pathname);
 LIBBPF_API int bpf_prog_attach(int prog_fd, int attachable_fd,
diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 0afdb8914386..7821c9b1e838 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -7,6 +7,7 @@
  * Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
  * Copyright (C) 2015 Huawei Inc.
  * Copyright (C) 2017 Nicira, Inc.
+ * Copyright (C) 2019 Isovalent, Inc.
  */
 
 #ifndef _GNU_SOURCE
@@ -139,6 +140,7 @@ struct bpf_program {
 		enum {
 			RELO_LD64,
 			RELO_CALL,
+			RELO_DATA,
 		} type;
 		int insn_idx;
 		union {
@@ -171,6 +173,19 @@ struct bpf_program {
 	__u32 line_info_cnt;
 };
 
+enum libbpf_map_type {
+	LIBBPF_MAP_UNSPEC,
+	LIBBPF_MAP_DATA,
+	LIBBPF_MAP_BSS,
+	LIBBPF_MAP_RODATA,
+};
+
+static const char *libbpf_type_to_btf_name[] = {
+	[LIBBPF_MAP_DATA]	= ".data",
+	[LIBBPF_MAP_BSS]	= ".bss",
+	[LIBBPF_MAP_RODATA]	= ".rodata",
+};
+
 struct bpf_map {
 	int fd;
 	char *name;
@@ -182,11 +197,18 @@ struct bpf_map {
 	__u32 btf_value_type_id;
 	void *priv;
 	bpf_map_clear_priv_t clear_priv;
+	enum libbpf_map_type libbpf_type;
+};
+
+struct bpf_secdata {
+	void *rodata;
+	void *data;
 };
 
 static LIST_HEAD(bpf_objects_list);
 
 struct bpf_object {
+	char name[BPF_OBJ_NAME_LEN];
 	char license[64];
 	__u32 kern_version;
 
@@ -194,6 +216,7 @@ struct bpf_object {
 	size_t nr_programs;
 	struct bpf_map *maps;
 	size_t nr_maps;
+	struct bpf_secdata sections;
 
 	bool loaded;
 	bool has_pseudo_calls;
@@ -209,6 +232,9 @@ struct bpf_object {
 		Elf *elf;
 		GElf_Ehdr ehdr;
 		Elf_Data *symbols;
+		Elf_Data *data;
+		Elf_Data *rodata;
+		Elf_Data *bss;
 		size_t strtabidx;
 		struct {
 			GElf_Shdr shdr;
@@ -217,6 +243,9 @@ struct bpf_object {
 		int nr_reloc;
 		int maps_shndx;
 		int text_shndx;
+		int data_shndx;
+		int rodata_shndx;
+		int bss_shndx;
 	} efile;
 	/*
 	 * All loaded bpf_object is linked in a list, which is
@@ -438,6 +467,7 @@ static struct bpf_object *bpf_object__new(const char *path,
 					  size_t obj_buf_sz)
 {
 	struct bpf_object *obj;
+	char *end;
 
 	obj = calloc(1, sizeof(struct bpf_object) + strlen(path) + 1);
 	if (!obj) {
@@ -446,8 +476,14 @@ static struct bpf_object *bpf_object__new(const char *path,
 	}
 
 	strcpy(obj->path, path);
-	obj->efile.fd = -1;
+	/* Using basename() GNU version which doesn't modify arg. */
+	strncpy(obj->name, basename((void *)path),
+		sizeof(obj->name) - 1);
+	end = strchr(obj->name, '.');
+	if (end)
+		*end = 0;
 
+	obj->efile.fd = -1;
 	/*
 	 * Caller of this function should also calls
 	 * bpf_object__elf_finish() after data collection to return
@@ -457,6 +493,9 @@ static struct bpf_object *bpf_object__new(const char *path,
 	obj->efile.obj_buf = obj_buf;
 	obj->efile.obj_buf_sz = obj_buf_sz;
 	obj->efile.maps_shndx = -1;
+	obj->efile.data_shndx = -1;
+	obj->efile.rodata_shndx = -1;
+	obj->efile.bss_shndx = -1;
 
 	obj->loaded = false;
 
@@ -475,6 +514,9 @@ static void bpf_object__elf_finish(struct bpf_object *obj)
 		obj->efile.elf = NULL;
 	}
 	obj->efile.symbols = NULL;
+	obj->efile.data = NULL;
+	obj->efile.rodata = NULL;
+	obj->efile.bss = NULL;
 
 	zfree(&obj->efile.reloc);
 	obj->efile.nr_reloc = 0;
@@ -616,27 +658,76 @@ static bool bpf_map_type__is_map_in_map(enum bpf_map_type type)
 	return false;
 }
 
+static bool bpf_object__has_maps(const struct bpf_object *obj)
+{
+	return obj->efile.maps_shndx >= 0 ||
+	       obj->efile.data_shndx >= 0 ||
+	       obj->efile.rodata_shndx >= 0 ||
+	       obj->efile.bss_shndx >= 0;
+}
+
+static int
+bpf_object__init_internal_map(struct bpf_object *obj, struct bpf_map *map,
+			      enum libbpf_map_type type, Elf_Data *data,
+			      void **data_buff)
+{
+	struct bpf_map_def *def = &map->def;
+	char map_name[BPF_OBJ_NAME_LEN];
+
+	map->libbpf_type = type;
+	map->offset = ~(typeof(map->offset))0;
+	snprintf(map_name, sizeof(map_name), "%.8s%.7s", obj->name,
+		 libbpf_type_to_btf_name[type]);
+	map->name = strdup(map_name);
+	if (!map->name) {
+		pr_warning("failed to alloc map name\n");
+		return -ENOMEM;
+	}
+
+	def->type = BPF_MAP_TYPE_ARRAY;
+	def->key_size = sizeof(int);
+	def->value_size = data->d_size;
+	def->max_entries = 1;
+	def->map_flags = type == LIBBPF_MAP_RODATA ?
+			 BPF_F_RDONLY_PROG : 0;
+	if (data_buff) {
+		*data_buff = malloc(data->d_size);
+		if (!*data_buff) {
+			zfree(&map->name);
+			pr_warning("failed to alloc map content buffer\n");
+			return -ENOMEM;
+		}
+		memcpy(*data_buff, data->d_buf, data->d_size);
+	}
+
+	pr_debug("map %ld is \"%s\"\n", map - obj->maps, map->name);
+	return 0;
+}
+
 static int
 bpf_object__init_maps(struct bpf_object *obj, int flags)
 {
+	int i, map_idx, map_def_sz, nr_syms, nr_maps = 0, nr_maps_glob = 0;
 	bool strict = !(flags & MAPS_RELAX_COMPAT);
-	int i, map_idx, map_def_sz, nr_maps = 0;
-	Elf_Scn *scn;
-	Elf_Data *data;
 	Elf_Data *symbols = obj->efile.symbols;
+	Elf_Data *data = NULL;
+	int ret = 0;
 
-	if (obj->efile.maps_shndx < 0)
-		return -EINVAL;
 	if (!symbols)
 		return -EINVAL;
+	nr_syms = symbols->d_size / sizeof(GElf_Sym);
 
-	scn = elf_getscn(obj->efile.elf, obj->efile.maps_shndx);
-	if (scn)
-		data = elf_getdata(scn, NULL);
-	if (!scn || !data) {
-		pr_warning("failed to get Elf_Data from map section %d\n",
-			   obj->efile.maps_shndx);
-		return -EINVAL;
+	if (obj->efile.maps_shndx >= 0) {
+		Elf_Scn *scn = elf_getscn(obj->efile.elf,
+					  obj->efile.maps_shndx);
+
+		if (scn)
+			data = elf_getdata(scn, NULL);
+		if (!scn || !data) {
+			pr_warning("failed to get Elf_Data from map section %d\n",
+				   obj->efile.maps_shndx);
+			return -EINVAL;
+		}
 	}
 
 	/*
@@ -646,7 +737,13 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
 	 *
 	 * TODO: Detect array of map and report error.
 	 */
-	for (i = 0; i < symbols->d_size / sizeof(GElf_Sym); i++) {
+	if (obj->efile.data_shndx >= 0)
+		nr_maps_glob++;
+	if (obj->efile.rodata_shndx >= 0)
+		nr_maps_glob++;
+	if (obj->efile.bss_shndx >= 0)
+		nr_maps_glob++;
+	for (i = 0; data && i < nr_syms; i++) {
 		GElf_Sym sym;
 
 		if (!gelf_getsym(symbols, i, &sym))
@@ -659,19 +756,21 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
 	/* Alloc obj->maps and fill nr_maps. */
 	pr_debug("maps in %s: %d maps in %zd bytes\n", obj->path,
 		 nr_maps, data->d_size);
-
-	if (!nr_maps)
+	if (!nr_maps && !nr_maps_glob)
 		return 0;
 
 	/* Assume equally sized map definitions */
-	map_def_sz = data->d_size / nr_maps;
-	if (!data->d_size || (data->d_size % nr_maps) != 0) {
-		pr_warning("unable to determine map definition size "
-			   "section %s, %d maps in %zd bytes\n",
-			   obj->path, nr_maps, data->d_size);
-		return -EINVAL;
+	if (data) {
+		map_def_sz = data->d_size / nr_maps;
+		if (!data->d_size || (data->d_size % nr_maps) != 0) {
+			pr_warning("unable to determine map definition size "
+				   "section %s, %d maps in %zd bytes\n",
+				   obj->path, nr_maps, data->d_size);
+			return -EINVAL;
+		}
 	}
 
+	nr_maps += nr_maps_glob;
 	obj->maps = calloc(nr_maps, sizeof(obj->maps[0]));
 	if (!obj->maps) {
 		pr_warning("alloc maps for object failed\n");
@@ -692,7 +791,7 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
 	/*
 	 * Fill obj->maps using data in "maps" section.
 	 */
-	for (i = 0, map_idx = 0; i < symbols->d_size / sizeof(GElf_Sym); i++) {
+	for (i = 0, map_idx = 0; data && i < nr_syms; i++) {
 		GElf_Sym sym;
 		const char *map_name;
 		struct bpf_map_def *def;
@@ -705,6 +804,8 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
 		map_name = elf_strptr(obj->efile.elf,
 				      obj->efile.strtabidx,
 				      sym.st_name);
+
+		obj->maps[map_idx].libbpf_type = LIBBPF_MAP_UNSPEC;
 		obj->maps[map_idx].offset = sym.st_value;
 		if (sym.st_value + map_def_sz > data->d_size) {
 			pr_warning("corrupted maps section in %s: last map \"%s\" too small\n",
@@ -753,8 +854,27 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
 		map_idx++;
 	}
 
-	qsort(obj->maps, obj->nr_maps, sizeof(obj->maps[0]), compare_bpf_map);
-	return 0;
+	/*
+	 * Populate rest of obj->maps with libbpf internal maps.
+	 */
+	if (obj->efile.data_shndx >= 0)
+		ret = bpf_object__init_internal_map(obj, &obj->maps[map_idx++],
+						    LIBBPF_MAP_DATA,
+						    obj->efile.data,
+						    &obj->sections.data);
+	if (!ret && obj->efile.rodata_shndx >= 0)
+		ret = bpf_object__init_internal_map(obj, &obj->maps[map_idx++],
+						    LIBBPF_MAP_RODATA,
+						    obj->efile.rodata,
+						    &obj->sections.rodata);
+	if (!ret && obj->efile.bss_shndx >= 0)
+		ret = bpf_object__init_internal_map(obj, &obj->maps[map_idx++],
+						    LIBBPF_MAP_BSS,
+						    obj->efile.bss, NULL);
+	if (!ret)
+		qsort(obj->maps, obj->nr_maps, sizeof(obj->maps[0]),
+		      compare_bpf_map);
+	return ret;
 }
 
 static bool section_have_execinstr(struct bpf_object *obj, int idx)
@@ -865,6 +985,14 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 					pr_warning("failed to alloc program %s (%s): %s",
 						   name, obj->path, cp);
 				}
+			} else if (strcmp(name, ".data") == 0) {
+				obj->efile.data = data;
+				obj->efile.data_shndx = idx;
+			} else if (strcmp(name, ".rodata") == 0) {
+				obj->efile.rodata = data;
+				obj->efile.rodata_shndx = idx;
+			} else {
+				pr_debug("skip section(%d) %s\n", idx, name);
 			}
 		} else if (sh.sh_type == SHT_REL) {
 			void *reloc = obj->efile.reloc;
@@ -892,6 +1020,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 				obj->efile.reloc[n].shdr = sh;
 				obj->efile.reloc[n].data = data;
 			}
+		} else if (sh.sh_type == SHT_NOBITS && strcmp(name, ".bss") == 0) {
+			obj->efile.bss = data;
+			obj->efile.bss_shndx = idx;
 		} else {
 			pr_debug("skip section(%d) %s\n", idx, name);
 		}
@@ -918,7 +1049,7 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
 			}
 		}
 	}
-	if (obj->efile.maps_shndx >= 0) {
+	if (bpf_object__has_maps(obj)) {
 		err = bpf_object__init_maps(obj, flags);
 		if (err)
 			goto out;
@@ -954,13 +1085,46 @@ bpf_object__find_program_by_title(struct bpf_object *obj, const char *title)
 	return NULL;
 }
 
+static bool bpf_object__shndx_is_data(const struct bpf_object *obj,
+				      int shndx)
+{
+	return shndx == obj->efile.data_shndx ||
+	       shndx == obj->efile.bss_shndx ||
+	       shndx == obj->efile.rodata_shndx;
+}
+
+static bool bpf_object__shndx_is_maps(const struct bpf_object *obj,
+				      int shndx)
+{
+	return shndx == obj->efile.maps_shndx;
+}
+
+static bool bpf_object__relo_in_known_section(const struct bpf_object *obj,
+					      int shndx)
+{
+	return shndx == obj->efile.text_shndx ||
+	       bpf_object__shndx_is_maps(obj, shndx) ||
+	       bpf_object__shndx_is_data(obj, shndx);
+}
+
+static enum libbpf_map_type
+bpf_object__section_to_libbpf_map_type(const struct bpf_object *obj, int shndx)
+{
+	if (shndx == obj->efile.data_shndx)
+		return LIBBPF_MAP_DATA;
+	else if (shndx == obj->efile.bss_shndx)
+		return LIBBPF_MAP_BSS;
+	else if (shndx == obj->efile.rodata_shndx)
+		return LIBBPF_MAP_RODATA;
+	else
+		return LIBBPF_MAP_UNSPEC;
+}
+
 static int
 bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
 			   Elf_Data *data, struct bpf_object *obj)
 {
 	Elf_Data *symbols = obj->efile.symbols;
-	int text_shndx = obj->efile.text_shndx;
-	int maps_shndx = obj->efile.maps_shndx;
 	struct bpf_map *maps = obj->maps;
 	size_t nr_maps = obj->nr_maps;
 	int i, nrels;
@@ -980,7 +1144,10 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
 		GElf_Sym sym;
 		GElf_Rel rel;
 		unsigned int insn_idx;
+		unsigned int shdr_idx;
 		struct bpf_insn *insns = prog->insns;
+		enum libbpf_map_type type;
+		const char *name;
 		size_t map_idx;
 
 		if (!gelf_getrel(data, i, &rel)) {
@@ -995,13 +1162,18 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
 				   GELF_R_SYM(rel.r_info));
 			return -LIBBPF_ERRNO__FORMAT;
 		}
-		pr_debug("relo for %lld value %lld name %d\n",
+
+		name = elf_strptr(obj->efile.elf, obj->efile.strtabidx,
+				  sym.st_name) ? : "<?>";
+
+		pr_debug("relo for %lld value %lld name %d (\'%s\')\n",
 			 (long long) (rel.r_info >> 32),
-			 (long long) sym.st_value, sym.st_name);
+			 (long long) sym.st_value, sym.st_name, name);
 
-		if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx) {
-			pr_warning("Program '%s' contains non-map related relo data pointing to section %u\n",
-				   prog->section_name, sym.st_shndx);
+		shdr_idx = sym.st_shndx;
+		if (!bpf_object__relo_in_known_section(obj, shdr_idx)) {
+			pr_warning("Program '%s' contains unrecognized relo data pointing to section %u\n",
+				   prog->section_name, shdr_idx);
 			return -LIBBPF_ERRNO__RELOC;
 		}
 
@@ -1026,10 +1198,22 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
 			return -LIBBPF_ERRNO__RELOC;
 		}
 
-		if (sym.st_shndx == maps_shndx) {
-			/* TODO: 'maps' is sorted. We can use bsearch to make it faster. */
+		if (bpf_object__shndx_is_maps(obj, shdr_idx) ||
+		    bpf_object__shndx_is_data(obj, shdr_idx)) {
+			type = bpf_object__section_to_libbpf_map_type(obj, shdr_idx);
+			if (type != LIBBPF_MAP_UNSPEC &&
+			    GELF_ST_BIND(sym.st_info) == STB_GLOBAL) {
+				pr_warning("bpf: relocation: not yet supported relo for non-static global \'%s\' variable found in insns[%d].code 0x%x\n",
+					   name, insn_idx, insns[insn_idx].code);
+				return -LIBBPF_ERRNO__RELOC;
+			}
+
 			for (map_idx = 0; map_idx < nr_maps; map_idx++) {
-				if (maps[map_idx].offset == sym.st_value) {
+				if (maps[map_idx].libbpf_type != type)
+					continue;
+				if (type != LIBBPF_MAP_UNSPEC ||
+				    (type == LIBBPF_MAP_UNSPEC &&
+				     maps[map_idx].offset == sym.st_value)) {
 					pr_debug("relocation: find map %zd (%s) for insn %u\n",
 						 map_idx, maps[map_idx].name, insn_idx);
 					break;
@@ -1042,7 +1226,8 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
 				return -LIBBPF_ERRNO__RELOC;
 			}
 
-			prog->reloc_desc[i].type = RELO_LD64;
+			prog->reloc_desc[i].type = type != LIBBPF_MAP_UNSPEC ?
+						   RELO_DATA : RELO_LD64;
 			prog->reloc_desc[i].insn_idx = insn_idx;
 			prog->reloc_desc[i].map_idx = map_idx;
 		}
@@ -1050,13 +1235,25 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
 	return 0;
 }
 
+static const char *bpf_map___btf_name(struct bpf_map *map)
+{
+	if (!bpf_map__is_internal(map))
+		return map->name;
+	/*
+	 * LLVM annotates global data differently in BTF, that is,
+	 * only as '.data', '.bss' or '.rodata'.
+	 */
+	return libbpf_type_to_btf_name[map->libbpf_type];
+}
+
 static int bpf_map_find_btf_info(struct bpf_map *map, const struct btf *btf)
 {
+	const char *name = bpf_map___btf_name(map);
 	struct bpf_map_def *def = &map->def;
 	__u32 key_type_id, value_type_id;
 	int ret;
 
-	ret = btf__get_map_kv_tids(btf, map->name, def->key_size,
+	ret = btf__get_map_kv_tids(btf, name, def->key_size,
 				   def->value_size, &key_type_id,
 				   &value_type_id);
 	if (ret)
@@ -1175,6 +1372,25 @@ bpf_object__probe_caps(struct bpf_object *obj)
 	return bpf_object__probe_name(obj);
 }
 
+static int
+bpf_object__populate_internal_map(struct bpf_object *obj, struct bpf_map *map)
+{
+	int err, zero = 0;
+	__u8 *data;
+
+	/* Nothing to do here since kernel already zero-initializes .bss map. */
+	if (map->libbpf_type == LIBBPF_MAP_BSS)
+		return 0;
+
+	data = map->libbpf_type == LIBBPF_MAP_DATA ?
+	       obj->sections.data : obj->sections.rodata;
+	err = bpf_map_update_elem(map->fd, &zero, data, 0);
+	/* Lock .rodata map as read-only from syscall side. */
+	if (!err && map->libbpf_type == LIBBPF_MAP_RODATA)
+		err = bpf_map_lock(map->fd);
+	return err;
+}
+
 static int
 bpf_object__create_maps(struct bpf_object *obj)
 {
@@ -1232,6 +1448,7 @@ bpf_object__create_maps(struct bpf_object *obj)
 			size_t j;
 
 			err = *pfd;
+err_out:
 			cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
 			pr_warning("failed to create map (name: '%s'): %s\n",
 				   map->name, cp);
@@ -1239,6 +1456,15 @@ bpf_object__create_maps(struct bpf_object *obj)
 				zclose(obj->maps[j].fd);
 			return err;
 		}
+
+		if (bpf_map__is_internal(map)) {
+			err = bpf_object__populate_internal_map(obj, map);
+			if (err < 0) {
+				zclose(*pfd);
+				goto err_out;
+			}
+		}
+
 		pr_debug("create map %s: fd=%d\n", map->name, *pfd);
 	}
 
@@ -1393,19 +1619,27 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
 		return 0;
 
 	for (i = 0; i < prog->nr_reloc; i++) {
-		if (prog->reloc_desc[i].type == RELO_LD64) {
+		if (prog->reloc_desc[i].type == RELO_LD64 ||
+		    prog->reloc_desc[i].type == RELO_DATA) {
+			bool relo_data = prog->reloc_desc[i].type == RELO_DATA;
 			struct bpf_insn *insns = prog->insns;
 			int insn_idx, map_idx;
 
 			insn_idx = prog->reloc_desc[i].insn_idx;
 			map_idx = prog->reloc_desc[i].map_idx;
 
-			if (insn_idx >= (int)prog->insns_cnt) {
+			if (insn_idx + 1 >= (int)prog->insns_cnt) {
 				pr_warning("relocation out of range: '%s'\n",
 					   prog->section_name);
 				return -LIBBPF_ERRNO__RELOC;
 			}
-			insns[insn_idx].src_reg = BPF_PSEUDO_MAP_FD;
+
+			if (!relo_data) {
+				insns[insn_idx].src_reg = BPF_PSEUDO_MAP_FD;
+			} else {
+				insns[insn_idx].src_reg = BPF_PSEUDO_MAP_VALUE;
+				insns[insn_idx + 1].imm = insns[insn_idx].imm;
+			}
 			insns[insn_idx].imm = obj->maps[map_idx].fd;
 		} else if (prog->reloc_desc[i].type == RELO_CALL) {
 			err = bpf_program__reloc_text(prog, obj,
@@ -2291,6 +2525,9 @@ void bpf_object__close(struct bpf_object *obj)
 		obj->maps[i].priv = NULL;
 		obj->maps[i].clear_priv = NULL;
 	}
+
+	zfree(&obj->sections.rodata);
+	zfree(&obj->sections.data);
 	zfree(&obj->maps);
 	obj->nr_maps = 0;
 
@@ -2768,6 +3005,11 @@ bool bpf_map__is_offload_neutral(struct bpf_map *map)
 	return map->def.type == BPF_MAP_TYPE_PERF_EVENT_ARRAY;
 }
 
+bool bpf_map__is_internal(struct bpf_map *map)
+{
+	return map->libbpf_type != LIBBPF_MAP_UNSPEC;
+}
+
 void bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex)
 {
 	map->map_ifindex = ifindex;
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index b4652aa1a58a..6e162da8e9f9 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -300,6 +300,7 @@ LIBBPF_API void *bpf_map__priv(struct bpf_map *map);
 LIBBPF_API int bpf_map__reuse_fd(struct bpf_map *map, int fd);
 LIBBPF_API int bpf_map__resize(struct bpf_map *map, __u32 max_entries);
 LIBBPF_API bool bpf_map__is_offload_neutral(struct bpf_map *map);
+LIBBPF_API bool bpf_map__is_internal(struct bpf_map *map);
 LIBBPF_API void bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex);
 LIBBPF_API int bpf_map__pin(struct bpf_map *map, const char *path);
 LIBBPF_API int bpf_map__unpin(struct bpf_map *map, const char *path);
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 778a26702a70..3f493f3520c9 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -154,3 +154,9 @@ LIBBPF_0.0.2 {
 		xsk_umem__fd;
 		xsk_socket__fd;
 } LIBBPF_0.0.1;
+
+LIBBPF_0.0.3 {
+	global:
+		bpf_map__is_internal;
+		bpf_map_lock;
+} LIBBPF_0.0.2;
-- 
2.17.1



* [PATCH rfc v3 bpf-next 8/9] bpf, selftest: test {rd,wr}only flags and direct value access
  2019-03-11 21:51 [PATCH rfc v3 bpf-next 0/9] BPF support for global data Daniel Borkmann
                   ` (6 preceding siblings ...)
  2019-03-11 21:51 ` [PATCH rfc v3 bpf-next 7/9] bpf, libbpf: support global data/bss/rodata sections Daniel Borkmann
@ 2019-03-11 21:51 ` Daniel Borkmann
  2019-03-19 18:18   ` Andrii Nakryiko
  2019-03-11 21:51 ` [PATCH rfc v3 bpf-next 9/9] bpf, selftest: test global data/bss/rodata sections Daniel Borkmann
  8 siblings, 1 reply; 20+ messages in thread
From: Daniel Borkmann @ 2019-03-11 21:51 UTC (permalink / raw)
  To: ast
  Cc: bpf, netdev, joe, john.fastabend, yhs, andrii.nakryiko,
	jakub.kicinski, tgraf, lmb, Daniel Borkmann

Extend test_verifier with various test cases around the two kernel
extensions, that is, {rd,wr}only map support as well as direct map
value access. All of them pass, with one skipped because xskmap is
not present on the test machine:

  # ./test_verifier
  [...]
  #920/p XDP pkt read, pkt_data <= pkt_meta', bad access 1 OK
  #921/p XDP pkt read, pkt_data <= pkt_meta', bad access 2 OK
  Summary: 1366 PASSED, 1 SKIPPED, 0 FAILED

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 tools/include/linux/filter.h                  |  14 ++
 tools/testing/selftests/bpf/test_verifier.c   |  42 +++-
 .../selftests/bpf/verifier/array_access.c     | 159 ++++++++++++
 .../bpf/verifier/direct_value_access.c        | 226 ++++++++++++++++++
 4 files changed, 436 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/verifier/direct_value_access.c

diff --git a/tools/include/linux/filter.h b/tools/include/linux/filter.h
index cce0b02c0e28..d288576e0bcd 100644
--- a/tools/include/linux/filter.h
+++ b/tools/include/linux/filter.h
@@ -283,6 +283,20 @@
 #define BPF_LD_MAP_FD(DST, MAP_FD)				\
 	BPF_LD_IMM64_RAW(DST, BPF_PSEUDO_MAP_FD, MAP_FD)
 
+#define BPF_LD_MAP_VALUE(DST, MAP_FD, VALUE_IDX, VALUE_OFF)	\
+	((struct bpf_insn) {					\
+		.code  = BPF_LD | BPF_DW | BPF_IMM,		\
+		.dst_reg = DST,					\
+		.src_reg = BPF_PSEUDO_MAP_VALUE,		\
+		.off   = (__u16)(VALUE_IDX),			\
+		.imm   = MAP_FD }),				\
+	((struct bpf_insn) {					\
+		.code  = 0, /* zero is reserved opcode */	\
+		.dst_reg = 0,					\
+		.src_reg = 0,					\
+		.off   = ((__u32)(VALUE_IDX)) >> 16,		\
+		.imm   = VALUE_OFF })
+
 /* Relative call */
 
 #define BPF_CALL_REL(TGT)					\
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 477a9dcf9fff..7ef1991a5295 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -51,7 +51,7 @@
 
 #define MAX_INSNS	BPF_MAXINSNS
 #define MAX_FIXUPS	8
-#define MAX_NR_MAPS	14
+#define MAX_NR_MAPS	16
 #define MAX_TEST_RUNS	8
 #define POINTER_VALUE	0xcafe4all
 #define TEST_DATA_LEN	64
@@ -80,6 +80,8 @@ struct bpf_test {
 	int fixup_cgroup_storage[MAX_FIXUPS];
 	int fixup_percpu_cgroup_storage[MAX_FIXUPS];
 	int fixup_map_spin_lock[MAX_FIXUPS];
+	int fixup_map_array_ro[MAX_FIXUPS];
+	int fixup_map_array_wo[MAX_FIXUPS];
 	const char *errstr;
 	const char *errstr_unpriv;
 	uint32_t retval, retval_unpriv, insn_processed;
@@ -277,13 +279,15 @@ static bool skip_unsupported_map(enum bpf_map_type map_type)
 	return false;
 }
 
-static int create_map(uint32_t type, uint32_t size_key,
-		      uint32_t size_value, uint32_t max_elem)
+static int __create_map(uint32_t type, uint32_t size_key,
+			uint32_t size_value, uint32_t max_elem,
+			uint32_t extra_flags)
 {
 	int fd;
 
 	fd = bpf_create_map(type, size_key, size_value, max_elem,
-			    type == BPF_MAP_TYPE_HASH ? BPF_F_NO_PREALLOC : 0);
+			    (type == BPF_MAP_TYPE_HASH ?
+			     BPF_F_NO_PREALLOC : 0) | extra_flags);
 	if (fd < 0) {
 		if (skip_unsupported_map(type))
 			return -1;
@@ -293,6 +297,12 @@ static int create_map(uint32_t type, uint32_t size_key,
 	return fd;
 }
 
+static int create_map(uint32_t type, uint32_t size_key,
+		      uint32_t size_value, uint32_t max_elem)
+{
+	return __create_map(type, size_key, size_value, max_elem, 0);
+}
+
 static void update_map(int fd, int index)
 {
 	struct test_val value = {
@@ -519,6 +529,8 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
 	int *fixup_cgroup_storage = test->fixup_cgroup_storage;
 	int *fixup_percpu_cgroup_storage = test->fixup_percpu_cgroup_storage;
 	int *fixup_map_spin_lock = test->fixup_map_spin_lock;
+	int *fixup_map_array_ro = test->fixup_map_array_ro;
+	int *fixup_map_array_wo = test->fixup_map_array_wo;
 
 	if (test->fill_helper)
 		test->fill_helper(test);
@@ -556,7 +568,7 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
 
 	if (*fixup_map_array_48b) {
 		map_fds[3] = create_map(BPF_MAP_TYPE_ARRAY, sizeof(int),
-					sizeof(struct test_val), 1);
+					sizeof(struct test_val), 2);
 		update_map(map_fds[3], 0);
 		do {
 			prog[*fixup_map_array_48b].imm = map_fds[3];
@@ -642,6 +654,26 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
 			fixup_map_spin_lock++;
 		} while (*fixup_map_spin_lock);
 	}
+	if (*fixup_map_array_ro) {
+		map_fds[14] = __create_map(BPF_MAP_TYPE_ARRAY, sizeof(int),
+					   sizeof(struct test_val), 1,
+					   BPF_F_RDONLY_PROG);
+		update_map(map_fds[14], 0);
+		do {
+			prog[*fixup_map_array_ro].imm = map_fds[14];
+			fixup_map_array_ro++;
+		} while (*fixup_map_array_ro);
+	}
+	if (*fixup_map_array_wo) {
+		map_fds[15] = __create_map(BPF_MAP_TYPE_ARRAY, sizeof(int),
+					   sizeof(struct test_val), 1,
+					   BPF_F_WRONLY_PROG);
+		update_map(map_fds[15], 0);
+		do {
+			prog[*fixup_map_array_wo].imm = map_fds[15];
+			fixup_map_array_wo++;
+		} while (*fixup_map_array_wo);
+	}
 }
 
 static int set_admin(bool admin)
diff --git a/tools/testing/selftests/bpf/verifier/array_access.c b/tools/testing/selftests/bpf/verifier/array_access.c
index 0dcecaf3ec6f..9a2b6f9b4414 100644
--- a/tools/testing/selftests/bpf/verifier/array_access.c
+++ b/tools/testing/selftests/bpf/verifier/array_access.c
@@ -217,3 +217,162 @@
 	.result = REJECT,
 	.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
 },
+{
+	"valid read map access into a read-only array 1",
+	.insns = {
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_LD_MAP_FD(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_ro = { 3 },
+	.result = ACCEPT,
+	.retval = 28,
+},
+{
+	"valid read map access into a read-only array 2",
+	.insns = {
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_LD_MAP_FD(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
+
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+	BPF_MOV64_IMM(BPF_REG_2, 4),
+	BPF_MOV64_IMM(BPF_REG_3, 0),
+	BPF_MOV64_IMM(BPF_REG_4, 0),
+	BPF_MOV64_IMM(BPF_REG_5, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+		     BPF_FUNC_csum_diff),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_array_ro = { 3 },
+	.result = ACCEPT,
+	.retval = -29,
+},
+{
+	"invalid write map access into a read-only array 1",
+	.insns = {
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_LD_MAP_FD(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
+	BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 42),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_ro = { 3 },
+	.result = REJECT,
+	.errstr = "write into map forbidden",
+},
+{
+	"invalid write map access into a read-only array 2",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_LD_MAP_FD(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_MOV64_IMM(BPF_REG_2, 0),
+	BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
+	BPF_MOV64_IMM(BPF_REG_4, 8),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+		     BPF_FUNC_skb_load_bytes),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_array_ro = { 4 },
+	.result = REJECT,
+	.errstr = "write into map forbidden",
+},
+{
+	"valid write map access into a write-only array 1",
+	.insns = {
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_LD_MAP_FD(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
+	BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 42),
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_wo = { 3 },
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"valid write map access into a write-only array 2",
+	.insns = {
+	BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_LD_MAP_FD(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5),
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+	BPF_MOV64_IMM(BPF_REG_2, 0),
+	BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
+	BPF_MOV64_IMM(BPF_REG_4, 8),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+		     BPF_FUNC_skb_load_bytes),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_array_wo = { 4 },
+	.result = ACCEPT,
+	.retval = 0,
+},
+{
+	"invalid read map access into a write-only array 1",
+	.insns = {
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_LD_MAP_FD(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
+	BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_wo = { 3 },
+	.result = REJECT,
+	.errstr = "read into map forbidden",
+},
+{
+	"invalid read map access into a write-only array 2",
+	.insns = {
+	BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+	BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+	BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+	BPF_LD_MAP_FD(BPF_REG_1, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
+	BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
+
+	BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+	BPF_MOV64_IMM(BPF_REG_2, 4),
+	BPF_MOV64_IMM(BPF_REG_3, 0),
+	BPF_MOV64_IMM(BPF_REG_4, 0),
+	BPF_MOV64_IMM(BPF_REG_5, 0),
+	BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+		     BPF_FUNC_csum_diff),
+	BPF_EXIT_INSN(),
+	},
+	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+	.fixup_map_array_wo = { 3 },
+	.result = REJECT,
+	.errstr = "read into map forbidden",
+},
diff --git a/tools/testing/selftests/bpf/verifier/direct_value_access.c b/tools/testing/selftests/bpf/verifier/direct_value_access.c
new file mode 100644
index 000000000000..656c3675b735
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/direct_value_access.c
@@ -0,0 +1,226 @@
+{
+	"direct map access, write test 1",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, 0),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 4242),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"direct map access, write test 2",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, 8),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 4242),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"direct map access, write test 3",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, 8),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 8, 4242),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"direct map access, write test 4",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, 40),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 4242),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"direct map access, write test 5",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, 32),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 8, 4242),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"direct map access, write test 6",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, 40),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 4, 4242),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = REJECT,
+	.errstr = "R1 min value is outside of the array range",
+},
+{
+	"direct map access, write test 7",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, -1),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 4, 4242),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = REJECT,
+	.errstr = "direct value offset of 4294967295 is not allowed",
+},
+{
+	"direct map access, write test 8",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, 1),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, -1, 4242),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"direct map access, write test 9",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, 48),
+	BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 4242),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = REJECT,
+	.errstr = "invalid access to map value pointer",
+},
+{
+	"direct map access, write test 10",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, 47),
+	BPF_ST_MEM(BPF_B, BPF_REG_1, 0, 4),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = ACCEPT,
+	.retval = 1,
+},
+{
+	"direct map access, write test 11",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, 48),
+	BPF_ST_MEM(BPF_B, BPF_REG_1, 0, 4),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = REJECT,
+	.errstr = "invalid access to map value pointer",
+},
+{
+	"direct map access, write test 12",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, (1<<29)),
+	BPF_ST_MEM(BPF_B, BPF_REG_1, 0, 4),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = REJECT,
+	.errstr = "direct value offset of 536870912 is not allowed",
+},
+{
+	"direct map access, write test 13",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, (1<<29)-1),
+	BPF_ST_MEM(BPF_B, BPF_REG_1, 0, 4),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1 },
+	.result = REJECT,
+	.errstr = "invalid access to map value pointer, value_size=48 index=0 off=536870911",
+},
+{
+	"direct map access, write test 14",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, 47),
+	BPF_LD_MAP_VALUE(BPF_REG_2, 0, 0, 46),
+	BPF_ST_MEM(BPF_H, BPF_REG_2, 0, 0xffff),
+	BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1, 3 },
+	.result = ACCEPT,
+	.retval = 0xff,
+},
+{
+	"direct map access, write test 15",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 1, 47),
+	BPF_LD_MAP_VALUE(BPF_REG_2, 0, 1, 46),
+	BPF_ST_MEM(BPF_H, BPF_REG_2, 0, 0xffff),
+	BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1, 3 },
+	.result = ACCEPT,
+	.retval = 0xff,
+},
+{
+	"direct map access, write test 16",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 1, 46),
+	BPF_LD_MAP_VALUE(BPF_REG_2, 0, 0, 46),
+	BPF_ST_MEM(BPF_H, BPF_REG_2, 0, 0xffff),
+	BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1, 3 },
+	.result = ACCEPT,
+	.retval = 0,
+},
+{
+	"direct map access, write test 17",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, 1, 46),
+	BPF_LD_MAP_VALUE(BPF_REG_2, 0, 2, 46),
+	BPF_ST_MEM(BPF_H, BPF_REG_2, 0, 0xffff),
+	BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1, 3 },
+	.result = REJECT,
+	.errstr = "invalid access to map value pointer, value_size=48 index=2 off=46",
+},
+{
+	"direct map access, write test 18",
+	.insns = {
+	BPF_MOV64_IMM(BPF_REG_0, 1),
+	BPF_LD_MAP_VALUE(BPF_REG_1, 0, ~0, 46),
+	BPF_LD_MAP_VALUE(BPF_REG_2, 0, ~0, 46),
+	BPF_ST_MEM(BPF_H, BPF_REG_2, 0, 0xffff),
+	BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1, 0),
+	BPF_EXIT_INSN(),
+	},
+	.fixup_map_array_48b = { 1, 3 },
+	.result = REJECT,
+	.errstr = "invalid access to map value pointer, value_size=48 index=4294967295 off=46",
+},
-- 
2.17.1



* [PATCH rfc v3 bpf-next 9/9] bpf, selftest: test global data/bss/rodata sections
  2019-03-11 21:51 [PATCH rfc v3 bpf-next 0/9] BPF support for global data Daniel Borkmann
                   ` (7 preceding siblings ...)
  2019-03-11 21:51 ` [PATCH rfc v3 bpf-next 8/9] bpf, selftest: test {rd,wr}only flags and direct value access Daniel Borkmann
@ 2019-03-11 21:51 ` Daniel Borkmann
  2019-03-19 18:28   ` Andrii Nakryiko
  8 siblings, 1 reply; 20+ messages in thread
From: Daniel Borkmann @ 2019-03-11 21:51 UTC (permalink / raw)
  To: ast
  Cc: bpf, netdev, joe, john.fastabend, yhs, andrii.nakryiko,
	jakub.kicinski, tgraf, lmb, Daniel Borkmann

From: Joe Stringer <joe@wand.net.nz>

Add tests for libbpf relocation of static variable references
into the .data, .rodata and .bss sections of the ELF; also add
a read-only test for .rodata. All passing:

  # ./test_progs
  [...]
  test_global_data:PASS:load program 0 nsec
  test_global_data:PASS:pass global data run 925 nsec
  test_global_data_number:PASS:relocate .bss reference 925 nsec
  test_global_data_number:PASS:relocate .data reference 925 nsec
  test_global_data_number:PASS:relocate .rodata reference 925 nsec
  test_global_data_number:PASS:relocate .bss reference 925 nsec
  test_global_data_number:PASS:relocate .data reference 925 nsec
  test_global_data_number:PASS:relocate .rodata reference 925 nsec
  test_global_data_number:PASS:relocate .bss reference 925 nsec
  test_global_data_number:PASS:relocate .bss reference 925 nsec
  test_global_data_number:PASS:relocate .rodata reference 925 nsec
  test_global_data_number:PASS:relocate .rodata reference 925 nsec
  test_global_data_number:PASS:relocate .rodata reference 925 nsec
  test_global_data_string:PASS:relocate .rodata reference 925 nsec
  test_global_data_string:PASS:relocate .data reference 925 nsec
  test_global_data_string:PASS:relocate .bss reference 925 nsec
  test_global_data_string:PASS:relocate .data reference 925 nsec
  test_global_data_string:PASS:relocate .bss reference 925 nsec
  test_global_data_struct:PASS:relocate .rodata reference 925 nsec
  test_global_data_struct:PASS:relocate .bss reference 925 nsec
  test_global_data_struct:PASS:relocate .rodata reference 925 nsec
  test_global_data_struct:PASS:relocate .data reference 925 nsec
  test_global_data_rdonly:PASS:test .rodata read-only map 925 nsec
  [...]
  Summary: 229 PASSED, 0 FAILED

Note that the map helper signatures have been changed to avoid
warnings when passing in const data.

Joint work with Daniel Borkmann.

Signed-off-by: Joe Stringer <joe@wand.net.nz>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 tools/testing/selftests/bpf/bpf_helpers.h     |   8 +-
 .../selftests/bpf/prog_tests/global_data.c    | 157 ++++++++++++++++++
 .../selftests/bpf/progs/test_global_data.c    | 106 ++++++++++++
 3 files changed, 267 insertions(+), 4 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/global_data.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_global_data.c

diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index c9433a496d54..91c53dac95c8 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -9,14 +9,14 @@
 #define SEC(NAME) __attribute__((section(NAME), used))
 
 /* helper functions called from eBPF programs written in C */
-static void *(*bpf_map_lookup_elem)(void *map, void *key) =
+static void *(*bpf_map_lookup_elem)(void *map, const void *key) =
 	(void *) BPF_FUNC_map_lookup_elem;
-static int (*bpf_map_update_elem)(void *map, void *key, void *value,
+static int (*bpf_map_update_elem)(void *map, const void *key, const void *value,
 				  unsigned long long flags) =
 	(void *) BPF_FUNC_map_update_elem;
-static int (*bpf_map_delete_elem)(void *map, void *key) =
+static int (*bpf_map_delete_elem)(void *map, const void *key) =
 	(void *) BPF_FUNC_map_delete_elem;
-static int (*bpf_map_push_elem)(void *map, void *value,
+static int (*bpf_map_push_elem)(void *map, const void *value,
 				unsigned long long flags) =
 	(void *) BPF_FUNC_map_push_elem;
 static int (*bpf_map_pop_elem)(void *map, void *value) =
diff --git a/tools/testing/selftests/bpf/prog_tests/global_data.c b/tools/testing/selftests/bpf/prog_tests/global_data.c
new file mode 100644
index 000000000000..d011079fb0bf
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/global_data.c
@@ -0,0 +1,157 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <test_progs.h>
+
+static void test_global_data_number(struct bpf_object *obj, __u32 duration)
+{
+	int i, err, map_fd;
+	uint64_t num;
+
+	map_fd = bpf_find_map(__func__, obj, "result_number");
+	if (map_fd < 0) {
+		error_cnt++;
+		return;
+	}
+
+	struct {
+		char *name;
+		uint32_t key;
+		uint64_t num;
+	} tests[] = {
+		{ "relocate .bss reference",     0, 0 },
+		{ "relocate .data reference",    1, 42 },
+		{ "relocate .rodata reference",  2, 24 },
+		{ "relocate .bss reference",     3, 0 },
+		{ "relocate .data reference",    4, 0xffeeff },
+		{ "relocate .rodata reference",  5, 0xabab },
+		{ "relocate .bss reference",     6, 1234 },
+		{ "relocate .bss reference",     7, 0 },
+		{ "relocate .rodata reference",  8, 0xab },
+		{ "relocate .rodata reference",  9, 0x1111111111111111 },
+		{ "relocate .rodata reference", 10, ~0 },
+	};
+
+	for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
+		err = bpf_map_lookup_elem(map_fd, &tests[i].key, &num);
+		CHECK(err || num != tests[i].num, tests[i].name,
+		      "err %d result %lx expected %lx\n",
+		      err, num, tests[i].num);
+	}
+}
+
+static void test_global_data_string(struct bpf_object *obj, __u32 duration)
+{
+	int i, err, map_fd;
+	char str[32];
+
+	map_fd = bpf_find_map(__func__, obj, "result_string");
+	if (map_fd < 0) {
+		error_cnt++;
+		return;
+	}
+
+	struct {
+		char *name;
+		uint32_t key;
+		char str[32];
+	} tests[] = {
+		{ "relocate .rodata reference", 0, "abcdefghijklmnopqrstuvwxyz" },
+		{ "relocate .data reference",   1, "abcdefghijklmnopqrstuvwxyz" },
+		{ "relocate .bss reference",    2, "" },
+		{ "relocate .data reference",   3, "abcdexghijklmnopqrstuvwxyz" },
+		{ "relocate .bss reference",    4, "\0\0hello" },
+	};
+
+	for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
+		err = bpf_map_lookup_elem(map_fd, &tests[i].key, str);
+		CHECK(err || memcmp(str, tests[i].str, sizeof(str)),
+		      tests[i].name, "err %d result \'%s\' expected \'%s\'\n",
+		      err, str, tests[i].str);
+	}
+}
+
+struct foo {
+	__u8  a;
+	__u32 b;
+	__u64 c;
+};
+
+static void test_global_data_struct(struct bpf_object *obj, __u32 duration)
+{
+	int i, err, map_fd;
+	struct foo val;
+
+	map_fd = bpf_find_map(__func__, obj, "result_struct");
+	if (map_fd < 0) {
+		error_cnt++;
+		return;
+	}
+
+	struct {
+		char *name;
+		uint32_t key;
+		struct foo val;
+	} tests[] = {
+		{ "relocate .rodata reference", 0, { 42, 0xfefeefef, 0x1111111111111111ULL, } },
+		{ "relocate .bss reference",    1, { } },
+		{ "relocate .rodata reference", 2, { } },
+		{ "relocate .data reference",   3, { 41, 0xeeeeefef, 0x2111111111111111ULL, } },
+	};
+
+	for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
+		err = bpf_map_lookup_elem(map_fd, &tests[i].key, &val);
+		CHECK(err || memcmp(&val, &tests[i].val, sizeof(val)),
+		      tests[i].name, "err %d result { %u, %u, %llu } expected { %u, %u, %llu }\n",
+		      err, val.a, val.b, val.c, tests[i].val.a, tests[i].val.b, tests[i].val.c);
+	}
+}
+
+static void test_global_data_rdonly(struct bpf_object *obj, __u32 duration)
+{
+	int err = -ENOMEM, map_fd, zero = 0;
+	struct bpf_map *map;
+	__u8 *buff;
+
+	map = bpf_object__find_map_by_name(obj, "test_glo.rodata");
+	if (!map || !bpf_map__is_internal(map)) {
+		error_cnt++;
+		return;
+	}
+
+	map_fd = bpf_map__fd(map);
+	if (map_fd < 0) {
+		error_cnt++;
+		return;
+	}
+
+	buff = malloc(bpf_map__def(map)->value_size);
+	if (buff)
+		err = bpf_map_update_elem(map_fd, &zero, buff, 0);
+	free(buff);
+	CHECK(!err || errno != EPERM, "test .rodata read-only map",
+	      "err %d errno %d\n", err, errno);
+}
+
+void test_global_data(void)
+{
+	const char *file = "./test_global_data.o";
+	__u32 duration = 0, retval;
+	struct bpf_object *obj;
+	int err, prog_fd;
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, &prog_fd);
+	if (CHECK(err, "load program", "error %d loading %s\n", err, file))
+		return;
+
+	err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
+				NULL, NULL, &retval, &duration);
+	CHECK(err || retval, "pass global data run",
+	      "err %d errno %d retval %d duration %d\n",
+	      err, errno, retval, duration);
+
+	test_global_data_number(obj, duration);
+	test_global_data_string(obj, duration);
+	test_global_data_struct(obj, duration);
+	test_global_data_rdonly(obj, duration);
+
+	bpf_object__close(obj);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_global_data.c b/tools/testing/selftests/bpf/progs/test_global_data.c
new file mode 100644
index 000000000000..5ab14e941980
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_global_data.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2019 Isovalent, Inc.
+
+#include <linux/bpf.h>
+#include <linux/pkt_cls.h>
+#include <string.h>
+
+#include "bpf_helpers.h"
+
+struct bpf_map_def SEC("maps") result_number = {
+	.type		= BPF_MAP_TYPE_ARRAY,
+	.key_size	= sizeof(__u32),
+	.value_size	= sizeof(__u64),
+	.max_entries	= 11,
+};
+
+struct bpf_map_def SEC("maps") result_string = {
+	.type		= BPF_MAP_TYPE_ARRAY,
+	.key_size	= sizeof(__u32),
+	.value_size	= 32,
+	.max_entries	= 5,
+};
+
+struct foo {
+	__u8  a;
+	__u32 b;
+	__u64 c;
+};
+
+struct bpf_map_def SEC("maps") result_struct = {
+	.type		= BPF_MAP_TYPE_ARRAY,
+	.key_size	= sizeof(__u32),
+	.value_size	= sizeof(struct foo),
+	.max_entries	= 5,
+};
+
+/* Relocation tests for __u64s. */
+static       __u64 num0;
+static       __u64 num1 = 42;
+static const __u64 num2 = 24;
+static       __u64 num3 = 0;
+static       __u64 num4 = 0xffeeff;
+static const __u64 num5 = 0xabab;
+static const __u64 num6 = 0xab;
+
+/* Relocation tests for strings. */
+static const char str0[32] = "abcdefghijklmnopqrstuvwxyz";
+static       char str1[32] = "abcdefghijklmnopqrstuvwxyz";
+static       char str2[32];
+
+/* Relocation tests for structs. */
+static const struct foo struct0 = {
+	.a = 42,
+	.b = 0xfefeefef,
+	.c = 0x1111111111111111ULL,
+};
+static struct foo struct1;
+static const struct foo struct2;
+static struct foo struct3 = {
+	.a = 41,
+	.b = 0xeeeeefef,
+	.c = 0x2111111111111111ULL,
+};
+
+#define test_reloc(map, num, var)					\
+	do {								\
+		__u32 key = num;					\
+		bpf_map_update_elem(&result_##map, &key, var, 0);	\
+	} while (0)
+
+SEC("static_data_load")
+int load_static_data(struct __sk_buff *skb)
+{
+	static const __u64 bar = ~0;
+
+	test_reloc(number, 0, &num0);
+	test_reloc(number, 1, &num1);
+	test_reloc(number, 2, &num2);
+	test_reloc(number, 3, &num3);
+	test_reloc(number, 4, &num4);
+	test_reloc(number, 5, &num5);
+	num4 = 1234;
+	test_reloc(number, 6, &num4);
+	test_reloc(number, 7, &num0);
+	test_reloc(number, 8, &num6);
+
+	test_reloc(string, 0, str0);
+	test_reloc(string, 1, str1);
+	test_reloc(string, 2, str2);
+	str1[5] = 'x';
+	test_reloc(string, 3, str1);
+	__builtin_memcpy(&str2[2], "hello", sizeof("hello"));
+	test_reloc(string, 4, str2);
+
+	test_reloc(struct, 0, &struct0);
+	test_reloc(struct, 1, &struct1);
+	test_reloc(struct, 2, &struct2);
+	test_reloc(struct, 3, &struct3);
+
+	test_reloc(number,  9, &struct0.c);
+	test_reloc(number, 10, &bar);
+
+	return TC_ACT_OK;
+}
+
+char _license[] SEC("license") = "GPL";
-- 
2.17.1



* Re: [PATCH rfc v3 bpf-next 2/9] bpf: add program side {rd,wr}only support for maps
  2019-03-11 21:51 ` [PATCH rfc v3 bpf-next 2/9] bpf: add program side {rd,wr}only support " Daniel Borkmann
@ 2019-03-11 23:06   ` Alexei Starovoitov
  2019-03-11 23:34     ` Daniel Borkmann
  2019-03-14 19:27     ` Andrii Nakryiko
  2019-03-14 19:26   ` Andrii Nakryiko
  1 sibling, 2 replies; 20+ messages in thread
From: Alexei Starovoitov @ 2019-03-11 23:06 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: ast, bpf, netdev, joe, john.fastabend, yhs, andrii.nakryiko,
	jakub.kicinski, tgraf, lmb

On Mon, Mar 11, 2019 at 10:51:18PM +0100, Daniel Borkmann wrote:
> This work adds two new map creation flags BPF_F_RDONLY_PROG
> and BPF_F_WRONLY_PROG in order to allow for read-only or
> write-only BPF maps from a BPF program side.
> 
> Today we have BPF_F_RDONLY and BPF_F_WRONLY, but this only
> applies to system call side, meaning the BPF program has full
> read/write access to the map as usual while bpf(2) calls with
> map fd can either only read or write into the map depending
> on the flags. BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG allows
> for the exact opposite such that verifier is going to reject
> program loads if write into a read-only map or a read into a
> write-only map is detected. For read-only map case also some
> helpers are forbidden for programs that would alter the map
> state such as map deletion, update, etc.
> 
> We've enabled this generic map extension to various non-special
> maps holding normal user data: array, hash, lru, lpm, local
> storage, queue and stack. Further map types could be followed
> up in future depending on use-case. Main use case here is to
> forbid writes into .rodata map values from verifier side.

I think WRONLY | WRONLY_PROG should be an invalid combination?
Since these attributes are set at creation time and cannot be changed,
nothing could ever read from such a map, so why write into it?
Similarly, RDONLY | RDONLY_PROG is invalid too?

Also looking at the next patch and the 'lock' command...
Maybe it would be cleaner to add a WRONCE (from syscall) flag?
Then for .rodata the attrs would be RDONLY_PROG | WRONCE
and no 'lock' would be necessary.
WRONCE_PROG probably doesn't make sense.
Storing a dangling task_struct pointer in the next patch doesn't look great.
The whole 'lock' concept feels useful, but in the context of implementing
.rodata it feels that WRONCE would be a better fit,
since libbpf wouldn't be able to make a mistake and forget to 'lock'.
The 'lock' syscall cmd can be confused with BPF_F_LOCK too.
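
As a purely hypothetical sketch (WRONCE doesn't exist anywhere yet,
name and semantics made up here), the loader-side difference would
be roughly:

  /* this series: two steps, loader must remember to 'lock' */
  fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, 4, size, 1, BPF_F_RDONLY_PROG);
  bpf_map_update_elem(fd, &zero, data, 0);
  bpf_map_lock(fd);

  /* WRONCE alternative: first update allowed, then frozen */
  fd = bpf_create_map(BPF_MAP_TYPE_ARRAY, 4, size, 1,
                      BPF_F_RDONLY_PROG | BPF_F_WRONCE);
  bpf_map_update_elem(fd, &zero, data, 0); /* 2nd update -> -EPERM */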



* Re: [PATCH rfc v3 bpf-next 2/9] bpf: add program side {rd,wr}only support for maps
  2019-03-11 23:06   ` Alexei Starovoitov
@ 2019-03-11 23:34     ` Daniel Borkmann
  2019-03-14 19:27     ` Andrii Nakryiko
  1 sibling, 0 replies; 20+ messages in thread
From: Daniel Borkmann @ 2019-03-11 23:34 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: ast, bpf, netdev, joe, john.fastabend, yhs, andrii.nakryiko,
	jakub.kicinski, tgraf, lmb

On 03/12/2019 12:06 AM, Alexei Starovoitov wrote:
> On Mon, Mar 11, 2019 at 10:51:18PM +0100, Daniel Borkmann wrote:
>> This work adds two new map creation flags BPF_F_RDONLY_PROG
>> and BPF_F_WRONLY_PROG in order to allow for read-only or
>> write-only BPF maps from a BPF program side.
>>
>> Today we have BPF_F_RDONLY and BPF_F_WRONLY, but this only
>> applies to system call side, meaning the BPF program has full
>> read/write access to the map as usual while bpf(2) calls with
>> map fd can either only read or write into the map depending
>> on the flags. BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG allows
>> for the exact opposite such that verifier is going to reject
>> program loads if write into a read-only map or a read into a
>> write-only map is detected. For read-only map case also some
>> helpers are forbidden for programs that would alter the map
>> state such as map deletion, update, etc.
>>
>> We've enabled this generic map extension to various non-special
>> maps holding normal user data: array, hash, lru, lpm, local
>> storage, queue and stack. Further map types could be followed
>> up in future depending on use-case. Main use case here is to
>> forbid writes into .rodata map values from verifier side.
> 
> I think WRONLY | WRONLY_PROG should be invalid combination?
> Since these attributes are set at creation time and cannot be changed,
> nothing can ever read from it, so why write into it?
> Similarly RDONLY | RDONLY_PROG is invalid too?

Yeah, I can add this. Note that 'these attributes are set at creation
time and cannot be changed' does not fully hold for WRONLY/RDONLY:
e.g. you can create a map as RDONLY, but later on retrieve an fd to
it by id or from bpf fs without RDONLY, so this fd will be able to
write into the map from the syscall side (we're checking struct file
flags, not the attributes' map flags, at runtime). Given that, I
thought it made more sense to keep these two logically separated,
since one is only relevant for the lifetime of the fd and the other
one for the map. (I also think that keeping RDONLY/WRONLY stored in
the map_flags attr and exposing it is pretty confusing since it
doesn't say anything; this should probably be masked out as a fix,
thoughts?)
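
To sketch what I mean (the map id here is just a placeholder, and
getting a map fd by id requires CAP_SYS_ADMIN):

  fd1 = bpf_create_map(BPF_MAP_TYPE_ARRAY, 4, 4, 1, BPF_F_RDONLY);
  bpf_map_update_elem(fd1, &key, &val, 0); /* -EPERM, fd1 read-only */

  fd2 = bpf_map_get_fd_by_id(map_id);      /* fresh fd w/o BPF_F_RDONLY */
  bpf_map_update_elem(fd2, &key, &val, 0); /* succeeds */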

> Also looking at the next patch and 'lock' command...
> May be it would be cleaner to do add WRONCE (from syscall) flag?
> Then for .rodata the attrs will be RDONLY_PROG | WRONCE
> and no 'lock' necessary.
> WRONCE_PROG probably doesn't make sense.

Agree, from the prog side it does not, and it would probably also
just make the fast path slower. Hm, for the WRONCE creation flag,
we'd need some form of locking on the syscall side to avoid racing,
I'd think. The lock cmd simply does the write_once() and an rcu sync
to wait for everything to complete, so this is pretty trivial w/o
slowing down anything on the syscall side. I can take a look, but it
feels worse to me right now. The other thing is, given RDONLY/WRONLY
semantics are _only_ for the fd and _not_ the map, mixing these
might be confusing as well, since WRONCE would then be a map
property whereas RDONLY/WRONLY are not.

> Storing dangling task_struct pointer in the next patch doesn't look great.

Hm, it's actually not dangling; on close we exchange it with NULL
(which current can never be) such that only root or the original
map creator may lock it as read-only.

> The whole 'lock' concept feels useful, but in the context of implementing
> .rodata it feels that WRONCE would be a better fit,
> since libbpf won't be able to make a mistake and forget to 'lock'.
> 'lock' syscall cmd can be confused with BPF_F_LOCK too.

That's a naming detail, but sure it can be changed. :)

Thanks,
Daniel


* Re: [PATCH rfc v3 bpf-next 1/9] bpf: implement lookup-free direct value access for maps
  2019-03-11 21:51 ` [PATCH rfc v3 bpf-next 1/9] bpf: implement lookup-free direct value access for maps Daniel Borkmann
@ 2019-03-14 18:11   ` Andrii Nakryiko
  0 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2019-03-14 18:11 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, bpf, Networking, Joe Stringer,
	john fastabend, Yonghong Song, Jakub Kicinski, tgraf, lmb

On Mon, Mar 11, 2019 at 2:51 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> This generic extension to BPF maps allows for directly loading an
> address residing inside a BPF map value as a single BPF ldimm64
> instruction.
>
> The idea is similar to what BPF_PSEUDO_MAP_FD does today, which
> is a special src_reg flag for the ldimm64 instruction indicating
> that the first part of the double insn's imm field holds a file
> descriptor, which the verifier then replaces with the full 64bit
> address of the map spread over both imm parts. For the newly
> added BPF_PSEUDO_MAP_VALUE src_reg flag, the idea is the
> following: the first part of the double insn's imm field is
> again a file descriptor corresponding to the map, and the second
> part of the imm field is an offset into the value. Both insns'
> off fields build the optional key resp. index into the map in
> case it contains more than just one element. The verifier will
> then replace both imm parts with an address that points into the
> BPF map value for maps that support this operation.
> BPF_PSEUDO_MAP_VALUE is a distinct flag as otherwise with
> BPF_PSEUDO_MAP_FD we could not distinguish, at offset 0, a load
> of the map pointer from a load of the map's value at offset 0,
> and changing BPF_PSEUDO_MAP_FD's encoding into an off-by-one
> scheme to differ between a regular map pointer and a map value
> pointer would add unnecessary complexity and increase the
> barrier for debuggability, and is thus less suitable.
>
> This extension allows for efficiently retrieving an address to
> a map value memory area without having to issue a helper call
> which needs to prepare registers according to calling convention,
> etc, without needing the extra NULL test, and without having to
> add the offset in an additional instruction to the value base
> pointer. The verifier then treats the destination register as
> PTR_TO_MAP_VALUE with constant reg->off from the user passed
> offset from the second imm field, and guarantees that this is
> within bounds of the map value. Any subsequent operations are
> normally treated as typical map value handling without anything
> else needed for verification.
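
To put the saving into xlated terms (a rough sketch, not dump
output from this patch):

	; with a helper call:
	r1 = map[id:X]               ; ldimm64, BPF_PSEUDO_MAP_FD
	*(u32 *)(r10 -4) = 0         ; key = 0 on stack
	r2 = r10
	r2 += -4
	call bpf_map_lookup_elem
	if r0 == 0 goto <out>        ; mandatory NULL test
	r0 += off                    ; add offset to value base

	; with direct value access, a single insn:
	r3 = map[id:X][0]+off        ; ldimm64, BPF_PSEUDO_MAP_VALUE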
>
> The two map operations for direct value access have been added to
> the array map for now. In the future, other map types could be
> supported as well depending on the use case. The main use case for
> this commit is to allow for BPF loader support for global variables
> that reside in .data/.rodata/.bss sections such that we can directly
> load their address with minimal additional infrastructure required.
> Loader support has been added in subsequent commits for the libbpf
> library.
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Thanks! It looks really good; I'm just worried about the imm ->
idx + off conversion. Please double-check.

> ---
>  include/linux/bpf.h               |   6 ++
>  include/linux/bpf_verifier.h      |   4 ++
>  include/uapi/linux/bpf.h          |  13 +++-
>  kernel/bpf/arraymap.c             |  29 ++++++++
>  kernel/bpf/core.c                 |   3 +-
>  kernel/bpf/disasm.c               |   5 +-
>  kernel/bpf/syscall.c              |  31 +++++++--
>  kernel/bpf/verifier.c             | 109 ++++++++++++++++++++++--------
>  tools/bpf/bpftool/xlated_dumper.c |   6 ++
>  9 files changed, 168 insertions(+), 38 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index a2132e09dc1c..85b6b5dc883f 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -57,6 +57,12 @@ struct bpf_map_ops {
>                              const struct btf *btf,
>                              const struct btf_type *key_type,
>                              const struct btf_type *value_type);
> +
> +       /* Direct value access helpers. */
> +       int (*map_direct_value_addr)(const struct bpf_map *map,
> +                                    u64 *imm, u32 idx, u32 off);
> +       int (*map_direct_value_meta)(const struct bpf_map *map,
> +                                    u64 imm, u32 *idx, u32 *off);
>  };
>
>  struct bpf_map {
> diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> index 69f7a3449eda..6e28f1c24710 100644
> --- a/include/linux/bpf_verifier.h
> +++ b/include/linux/bpf_verifier.h
> @@ -183,6 +183,10 @@ struct bpf_insn_aux_data {
>                 unsigned long map_state;        /* pointer/poison value for maps */
>                 s32 call_imm;                   /* saved imm field of call insn */
>                 u32 alu_limit;                  /* limit for add/sub register with pointer */
> +               struct {
> +                       u32 map_index;          /* index into used_maps[] */
> +                       u32 map_off;            /* offset from value base address */
> +               };
>         };
>         int ctx_field_size; /* the ctx field size for load insn, maybe 0 */
>         int sanitize_stack_off; /* stack slot to be cleared */
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 3c38ac9a92a7..d0b80fce0fc9 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -255,8 +255,19 @@ enum bpf_attach_type {
>   */
>  #define BPF_F_ANY_ALIGNMENT    (1U << 1)
>
> -/* when bpf_ldimm64->src_reg == BPF_PSEUDO_MAP_FD, bpf_ldimm64->imm == fd */
> +/* When BPF ldimm64's insn[0].src_reg != 0 then this can have
> + * two extensions:
> + *
> + * insn[0].src_reg:  BPF_PSEUDO_MAP_FD   BPF_PSEUDO_MAP_VALUE
> + * insn[0].imm:      map fd              map fd
> + * insn[1].imm:      0                   offset into value
> + * insn[0].off:      0                   32 bit index to the
> + * insn[1].off:      0                   map value

It would be good to also document here which off part is lower 16
bits, and which one is higher 16 bits.
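
From the verifier hunk further below, insn[0].off carries the lower
16 bits of the index and insn[1].off the upper 16 bits. E.g., a
hand-constructed ldimm64 loading map[3] at value offset 16 into r1
would roughly look like (sketch only):

	struct bpf_insn insns[] = {
		{ .code    = BPF_LD | BPF_IMM | BPF_DW,
		  .dst_reg = BPF_REG_1,
		  .src_reg = BPF_PSEUDO_MAP_VALUE,
		  .off     = 3,      /* index, lower 16 bits */
		  .imm     = map_fd },
		{ .off     = 0,      /* index, upper 16 bits */
		  .imm     = 16 },   /* offset into value */
	};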

> + * ldimm64 rewrite:  address of map      address of map[index]+offset
> + * verifier type:    CONST_PTR_TO_MAP    PTR_TO_MAP_VALUE
> + */
>  #define BPF_PSEUDO_MAP_FD      1
> +#define BPF_PSEUDO_MAP_VALUE   2
>
>  /* when bpf_call->src_reg == BPF_PSEUDO_CALL, bpf_call->imm == pc-relative
>   * offset to another bpf function
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index c72e0d8e1e65..862d20422ad1 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -160,6 +160,33 @@ static void *array_map_lookup_elem(struct bpf_map *map, void *key)
>         return array->value + array->elem_size * (index & array->index_mask);
>  }
>
> +static int array_map_direct_value_addr(const struct bpf_map *map, u64 *imm,
> +                                      u32 idx, u32 off)
> +{
> +       struct bpf_array *array = container_of(map, struct bpf_array, map);
> +
> +       if (idx >= map->max_entries || off >= map->value_size)
> +               return -EINVAL;
> +       *imm = (unsigned long)(array->value +

Is there anything wrong with using u64 here (and a few other places below)?

> +                              array->elem_size * (idx & array->index_mask));
> +       return 0;
> +}
> +
> +static int array_map_direct_value_meta(const struct bpf_map *map, u64 imm,
> +                                      u32 *idx, u32 *off)
> +{
> +       struct bpf_array *array = container_of(map, struct bpf_array, map);
> +       u64 rem, base = (unsigned long)array->value, slot = map->value_size;

Should slot use map->elem_size instead of map->value_size?

> +       u64 range = slot * map->max_entries;
> +
> +       if (imm < base || imm >= base + range)
> +               return -ENOENT;
> +       base = imm - base;
> +       *idx = div64_u64_rem(base, slot, &rem);
> +       *off = rem;
> +       return 0;
> +}
> +
>  /* emit BPF instructions equivalent to C code of array_map_lookup_elem() */
>  static u32 array_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf)
>  {
> @@ -419,6 +446,8 @@ const struct bpf_map_ops array_map_ops = {
>         .map_update_elem = array_map_update_elem,
>         .map_delete_elem = array_map_delete_elem,
>         .map_gen_lookup = array_map_gen_lookup,
> +       .map_direct_value_addr = array_map_direct_value_addr,
> +       .map_direct_value_meta = array_map_direct_value_meta,
>         .map_seq_show_elem = array_map_seq_show_elem,
>         .map_check_btf = array_map_check_btf,
>  };
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 3f08c257858e..af3dcd8b852b 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -292,7 +292,8 @@ int bpf_prog_calc_tag(struct bpf_prog *fp)
>                 dst[i] = fp->insnsi[i];
>                 if (!was_ld_map &&
>                     dst[i].code == (BPF_LD | BPF_IMM | BPF_DW) &&
> -                   dst[i].src_reg == BPF_PSEUDO_MAP_FD) {
> +                   (dst[i].src_reg == BPF_PSEUDO_MAP_FD ||
> +                    dst[i].src_reg == BPF_PSEUDO_MAP_VALUE)) {
>                         was_ld_map = true;
>                         dst[i].imm = 0;
>                 } else if (was_ld_map &&
> diff --git a/kernel/bpf/disasm.c b/kernel/bpf/disasm.c
> index de73f55e42fd..d9ce383c0f9c 100644
> --- a/kernel/bpf/disasm.c
> +++ b/kernel/bpf/disasm.c
> @@ -205,10 +205,11 @@ void print_bpf_insn(const struct bpf_insn_cbs *cbs,
>                          * part of the ldimm64 insn is accessible.
>                          */
>                         u64 imm = ((u64)(insn + 1)->imm << 32) | (u32)insn->imm;
> -                       bool map_ptr = insn->src_reg == BPF_PSEUDO_MAP_FD;
> +                       bool is_ptr = insn->src_reg == BPF_PSEUDO_MAP_FD ||
> +                                     insn->src_reg == BPF_PSEUDO_MAP_VALUE;
>                         char tmp[64];
>
> -                       if (map_ptr && !allow_ptr_leaks)
> +                       if (is_ptr && !allow_ptr_leaks)
>                                 imm = 0;
>
>                         verbose(cbs->private_data, "(%02x) r%d = %s\n",
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index bc34cf9fe9ee..b0c7a6485c49 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2061,13 +2061,27 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
>  }
>
>  static const struct bpf_map *bpf_map_from_imm(const struct bpf_prog *prog,
> -                                             unsigned long addr)
> +                                             unsigned long addr, u32 *idx,
> +                                             u32 *off, u32 *type)
>  {
> +       const struct bpf_map *map;
>         int i;
>
> -       for (i = 0; i < prog->aux->used_map_cnt; i++)
> -               if (prog->aux->used_maps[i] == (void *)addr)
> -                       return prog->aux->used_maps[i];
> +       *off = *idx = 0;
> +       for (i = 0; i < prog->aux->used_map_cnt; i++) {
> +               map = prog->aux->used_maps[i];
> +               if (map == (void *)addr) {
> +                       *type = BPF_PSEUDO_MAP_FD;
> +                       return map;
> +               }
> +               if (!map->ops->map_direct_value_meta)
> +                       continue;
> +               if (!map->ops->map_direct_value_meta(map, addr, idx, off)) {
> +                       *type = BPF_PSEUDO_MAP_VALUE;
> +                       return map;
> +               }
> +       }
> +
>         return NULL;
>  }
>
> @@ -2075,6 +2089,7 @@ static struct bpf_insn *bpf_insn_prepare_dump(const struct bpf_prog *prog)
>  {
>         const struct bpf_map *map;
>         struct bpf_insn *insns;
> +       u32 idx, off, type;
>         u64 imm;
>         int i;
>
> @@ -2102,11 +2117,13 @@ static struct bpf_insn *bpf_insn_prepare_dump(const struct bpf_prog *prog)
>                         continue;
>
>                 imm = ((u64)insns[i + 1].imm << 32) | (u32)insns[i].imm;
> -               map = bpf_map_from_imm(prog, imm);
> +               map = bpf_map_from_imm(prog, imm, &idx, &off, &type);
>                 if (map) {
> -                       insns[i].src_reg = BPF_PSEUDO_MAP_FD;
> +                       insns[i].src_reg = type;
>                         insns[i].imm = map->id;
> -                       insns[i + 1].imm = 0;
> +                       insns[i].off = (u16)idx;
> +                       insns[i + 1].imm = off;
> +                       insns[i + 1].off = (u16)(idx >> 16);
>                         continue;
>                 }
>         }
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index ce166a002d16..57678cef9a2c 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -4944,25 +4944,20 @@ static int check_cond_jmp_op(struct bpf_verifier_env *env,
>         return 0;
>  }
>
> -/* return the map pointer stored inside BPF_LD_IMM64 instruction */
> -static struct bpf_map *ld_imm64_to_map_ptr(struct bpf_insn *insn)
> -{
> -       u64 imm64 = ((u64) (u32) insn[0].imm) | ((u64) (u32) insn[1].imm) << 32;
> -
> -       return (struct bpf_map *) (unsigned long) imm64;
> -}
> -
>  /* verify BPF_LD_IMM64 instruction */
>  static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
>  {
> +       struct bpf_insn_aux_data *aux = cur_aux(env);
>         struct bpf_reg_state *regs = cur_regs(env);
> +       struct bpf_map *map;
>         int err;
>
>         if (BPF_SIZE(insn->code) != BPF_DW) {
>                 verbose(env, "invalid BPF_LD_IMM insn\n");
>                 return -EINVAL;
>         }
> -       if (insn->off != 0) {
> +
> +       if (insn->src_reg != BPF_PSEUDO_MAP_VALUE && insn->off != 0) {
>                 verbose(env, "BPF_LD_IMM64 uses reserved fields\n");
>                 return -EINVAL;
>         }
> @@ -4979,11 +4974,22 @@ static int check_ld_imm(struct bpf_verifier_env *env, struct bpf_insn *insn)
>                 return 0;
>         }
>
> -       /* replace_map_fd_with_map_ptr() should have caught bad ld_imm64 */
> -       BUG_ON(insn->src_reg != BPF_PSEUDO_MAP_FD);
> +       map = env->used_maps[aux->map_index];
> +       mark_reg_known_zero(env, regs, insn->dst_reg);
> +       regs[insn->dst_reg].map_ptr = map;
> +
> +       if (insn->src_reg == BPF_PSEUDO_MAP_VALUE) {
> +               regs[insn->dst_reg].type = PTR_TO_MAP_VALUE;
> +               regs[insn->dst_reg].off = aux->map_off;
> +               if (map_value_has_spin_lock(map))
> +                       regs[insn->dst_reg].id = ++env->id_gen;
> +       } else if (insn->src_reg == BPF_PSEUDO_MAP_FD) {
> +               regs[insn->dst_reg].type = CONST_PTR_TO_MAP;
> +       } else {
> +               verbose(env, "bpf verifier is misconfigured\n");
> +               return -EINVAL;
> +       }
>
> -       regs[insn->dst_reg].type = CONST_PTR_TO_MAP;
> -       regs[insn->dst_reg].map_ptr = ld_imm64_to_map_ptr(insn);
>         return 0;
>  }
>
> @@ -6664,23 +6670,34 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
>                 }
>
>                 if (insn[0].code == (BPF_LD | BPF_IMM | BPF_DW)) {
> +                       struct bpf_insn_aux_data *aux;
>                         struct bpf_map *map;
>                         struct fd f;
> -
> -                       if (i == insn_cnt - 1 || insn[1].code != 0 ||
> -                           insn[1].dst_reg != 0 || insn[1].src_reg != 0 ||
> -                           insn[1].off != 0) {
> +                       u64 addr;
> +
> +                       if (i == insn_cnt - 1 ||
> +                           insn[1].code != 0 ||
> +                           insn[1].dst_reg != 0 ||
> +                           insn[1].src_reg != 0 ||
> +                           (insn[1].off != 0 &&
> +                            insn[0].src_reg != BPF_PSEUDO_MAP_VALUE)) {
>                                 verbose(env, "invalid bpf_ld_imm64 insn\n");
>                                 return -EINVAL;
>                         }
>
> -                       if (insn->src_reg == 0)
> +                       if (insn[0].src_reg == 0)
>                                 /* valid generic load 64-bit imm */
>                                 goto next_insn;
>
> -                       if (insn[0].src_reg != BPF_PSEUDO_MAP_FD ||
> -                           insn[1].imm != 0) {
> -                               verbose(env, "unrecognized bpf_ld_imm64 insn\n");
> +                       /* In final convert_pseudo_ld_imm64() step, this is
> +                        * converted into regular 64-bit imm load insn.
> +                        */
> +                       if ((insn[0].src_reg != BPF_PSEUDO_MAP_FD &&
> +                            insn[0].src_reg != BPF_PSEUDO_MAP_VALUE) ||
> +                           (insn[0].src_reg == BPF_PSEUDO_MAP_FD &&
> +                            insn[1].imm != 0)) {
> +                               verbose(env,
> +                                       "unrecognized bpf_ld_imm64 insn\n");
>                                 return -EINVAL;
>                         }
>
> @@ -6698,16 +6715,49 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
>                                 return err;
>                         }
>
> -                       /* store map pointer inside BPF_LD_IMM64 instruction */
> -                       insn[0].imm = (u32) (unsigned long) map;
> -                       insn[1].imm = ((u64) (unsigned long) map) >> 32;
> +                       aux = &env->insn_aux_data[i];
> +                       if (insn->src_reg == BPF_PSEUDO_MAP_FD) {
> +                               addr = (unsigned long)map;
> +                       } else {
> +                               u32 idx = ((u32)(u16)insn[0].off) |
> +                                         ((u32)(u16)insn[1].off) << 16;
> +                               u32 off = insn[1].imm;
> +
> +                               if (off >= BPF_MAX_VAR_OFF) {
> +                                       verbose(env, "direct value offset of %u is not allowed\n", off);
> +                                       fdput(f);
> +                                       return -EINVAL;
> +                               }
> +
> +                               if (!map->ops->map_direct_value_addr) {
> +                                       verbose(env, "no direct value access support for this map type\n");
> +                                       fdput(f);
> +                                       return -EINVAL;
> +                               }
> +
> +                               err = map->ops->map_direct_value_addr(map, &addr, idx, off);
> +                               if (err) {
> +                                       verbose(env, "invalid access to map value pointer, value_size=%u index=%u off=%u\n",
> +                                               map->value_size, idx, off);
> +                                       fdput(f);
> +                                       return err;
> +                               }
> +
> +                               aux->map_off = off;
> +                               addr += off;
> +                       }
> +
> +                       insn[0].imm = (u32)addr;
> +                       insn[1].imm = addr >> 32;
>
>                         /* check whether we recorded this map already */
> -                       for (j = 0; j < env->used_map_cnt; j++)
> +                       for (j = 0; j < env->used_map_cnt; j++) {
>                                 if (env->used_maps[j] == map) {
> +                                       aux->map_index = j;
>                                         fdput(f);
>                                         goto next_insn;
>                                 }
> +                       }
>
>                         if (env->used_map_cnt >= MAX_USED_MAPS) {
>                                 fdput(f);
> @@ -6724,6 +6774,8 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
>                                 fdput(f);
>                                 return PTR_ERR(map);
>                         }
> +
> +                       aux->map_index = env->used_map_cnt;
>                         env->used_maps[env->used_map_cnt++] = map;
>
>                         if (bpf_map_is_cgroup_storage(map) &&
> @@ -6778,9 +6830,12 @@ static void convert_pseudo_ld_imm64(struct bpf_verifier_env *env)
>         int insn_cnt = env->prog->len;
>         int i;
>
> -       for (i = 0; i < insn_cnt; i++, insn++)
> -               if (insn->code == (BPF_LD | BPF_IMM | BPF_DW))
> +       for (i = 0; i < insn_cnt; i++, insn++) {
> +               if (insn->code == (BPF_LD | BPF_IMM | BPF_DW)) {
>                         insn->src_reg = 0;
> +                       insn->off = (insn + 1)->off = 0;
> +               }
> +       }
>  }
>
>  /* single env->prog->insni[off] instruction was replaced with the range
> diff --git a/tools/bpf/bpftool/xlated_dumper.c b/tools/bpf/bpftool/xlated_dumper.c
> index 7073dbe1ff27..5391a9a70112 100644
> --- a/tools/bpf/bpftool/xlated_dumper.c
> +++ b/tools/bpf/bpftool/xlated_dumper.c
> @@ -195,6 +195,12 @@ static const char *print_imm(void *private_data,
>         if (insn->src_reg == BPF_PSEUDO_MAP_FD)
>                 snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
>                          "map[id:%u]", insn->imm);
> +       else if (insn->src_reg == BPF_PSEUDO_MAP_VALUE)
> +               snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
> +                        "map[id:%u][%u]+%u", insn->imm,
> +                        ((__u32)(__u16)insn[0].off) |
> +                        ((__u32)(__u16)insn[1].off) << 16,
> +                        (insn + 1)->imm);
>         else
>                 snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
>                          "0x%llx", (unsigned long long)full_imm);
> --
> 2.17.1
>


* Re: [PATCH rfc v3 bpf-next 2/9] bpf: add program side {rd,wr}only support for maps
  2019-03-11 21:51 ` [PATCH rfc v3 bpf-next 2/9] bpf: add program side {rd,wr}only support " Daniel Borkmann
  2019-03-11 23:06   ` Alexei Starovoitov
@ 2019-03-14 19:26   ` Andrii Nakryiko
  1 sibling, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2019-03-14 19:26 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, bpf, Networking, Joe Stringer,
	john fastabend, Yonghong Song, Jakub Kicinski, tgraf, lmb

On Mon, Mar 11, 2019 at 2:51 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> This work adds two new map creation flags BPF_F_RDONLY_PROG
> and BPF_F_WRONLY_PROG in order to allow for read-only or
> write-only BPF maps from the BPF program side.
>
> Today we have BPF_F_RDONLY and BPF_F_WRONLY, but these only
> apply to the system call side, meaning the BPF program has full
> read/write access to the map as usual while bpf(2) calls with
> map fd can either only read or write into the map depending
> on the flags. BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG allow
> for the exact opposite such that the verifier is going to
> reject program loads if a write into a read-only map or a
> read from a write-only map is detected. For the read-only
> map case, some helpers that would alter the map state, such
> as map deletion, update, etc, are forbidden as well.
>
> We've enabled this generic map extension for various non-special
> maps holding normal user data: array, hash, lru, lpm, local
> storage, queue and stack. Further map types can follow in the
> future depending on the use case. The main use case here is to
> forbid writes into .rodata map values from the verifier side.
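
As a quick usage sketch (libbpf helper, error handling omitted), a
map that only the syscall side may write:

	int fd = bpf_create_map_name(BPF_MAP_TYPE_ARRAY, "my_rodata",
				     sizeof(__u32), 64, 1,
				     BPF_F_RDONLY_PROG);

	/* bpf_map_update_elem(fd, ...) from user space still works,
	 * while a store through a value pointer inside the BPF
	 * program is rejected at load time with "write into map
	 * forbidden".
	 */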
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  include/linux/bpf.h           | 24 ++++++++++++++++++
>  include/uapi/linux/bpf.h      | 10 +++++++-
>  kernel/bpf/arraymap.c         |  3 ++-
>  kernel/bpf/hashtab.c          |  6 ++---
>  kernel/bpf/local_storage.c    |  6 ++---
>  kernel/bpf/lpm_trie.c         |  3 ++-
>  kernel/bpf/queue_stack_maps.c |  6 ++---
>  kernel/bpf/syscall.c          |  2 ++
>  kernel/bpf/verifier.c         | 46 +++++++++++++++++++++++++++++++++--
>  9 files changed, 92 insertions(+), 14 deletions(-)
>
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 85b6b5dc883f..bb80c78924b0 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -427,6 +427,30 @@ struct bpf_array {
>         };
>  };
>
> +#define BPF_MAP_CAN_READ       BIT(0)
> +#define BPF_MAP_CAN_WRITE      BIT(1)
> +
> +static inline u32 bpf_map_flags_to_cap(struct bpf_map *map)
> +{
> +       u32 access_flags = map->map_flags & (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG);
> +
> +       /* Combination of BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG is
> +        * not possible.
> +        */
> +       if (access_flags & BPF_F_RDONLY_PROG)
> +               return BPF_MAP_CAN_READ;
> +       else if (access_flags & BPF_F_WRONLY_PROG)
> +               return BPF_MAP_CAN_WRITE;
> +       else
> +               return BPF_MAP_CAN_READ | BPF_MAP_CAN_WRITE;
> +}
> +
> +static inline bool bpf_map_flags_access_ok(u32 access_flags)
> +{
> +       return (access_flags & (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG)) !=
> +              (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG);
> +}
> +
>  #define MAX_TAIL_CALL_CNT 32
>
>  struct bpf_event_entry {
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index d0b80fce0fc9..e64fd9862e68 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -294,7 +294,7 @@ enum bpf_attach_type {
>
>  #define BPF_OBJ_NAME_LEN 16U
>
> -/* Flags for accessing BPF object */
> +/* Flags for accessing BPF object from syscall side. */
>  #define BPF_F_RDONLY           (1U << 3)
>  #define BPF_F_WRONLY           (1U << 4)
>
> @@ -304,6 +304,14 @@ enum bpf_attach_type {
>  /* Zero-initialize hash function seed. This should only be used for testing. */
>  #define BPF_F_ZERO_SEED                (1U << 6)
>
> +/* Flags for accessing BPF object from program side. */
> +#define BPF_F_RDONLY_PROG      (1U << 7)
> +#define BPF_F_WRONLY_PROG      (1U << 8)
> +#define BPF_F_ACCESS_MASK      (BPF_F_RDONLY |         \
> +                                BPF_F_RDONLY_PROG |    \
> +                                BPF_F_WRONLY |         \
> +                                BPF_F_WRONLY_PROG)
> +
>  /* flags for BPF_PROG_QUERY */
>  #define BPF_F_QUERY_EFFECTIVE  (1U << 0)
>
> diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> index 862d20422ad1..6d2ce06485ae 100644
> --- a/kernel/bpf/arraymap.c
> +++ b/kernel/bpf/arraymap.c
> @@ -22,7 +22,7 @@
>  #include "map_in_map.h"
>
>  #define ARRAY_CREATE_FLAG_MASK \
> -       (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
> +       (BPF_F_NUMA_NODE | BPF_F_ACCESS_MASK)
>
>  static void bpf_array_free_percpu(struct bpf_array *array)
>  {
> @@ -63,6 +63,7 @@ int array_map_alloc_check(union bpf_attr *attr)
>         if (attr->max_entries == 0 || attr->key_size != 4 ||
>             attr->value_size == 0 ||
>             attr->map_flags & ~ARRAY_CREATE_FLAG_MASK ||
> +           !bpf_map_flags_access_ok(attr->map_flags) ||
>             (percpu && numa_node != NUMA_NO_NODE))
>                 return -EINVAL;
>
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index fed15cf94dca..192d32e77db3 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -23,7 +23,7 @@
>
>  #define HTAB_CREATE_FLAG_MASK                                          \
>         (BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE |    \
> -        BPF_F_RDONLY | BPF_F_WRONLY | BPF_F_ZERO_SEED)
> +        BPF_F_ACCESS_MASK | BPF_F_ZERO_SEED)
>
>  struct bucket {
>         struct hlist_nulls_head head;
> @@ -262,8 +262,8 @@ static int htab_map_alloc_check(union bpf_attr *attr)
>                 /* Guard against local DoS, and discourage production use. */
>                 return -EPERM;
>
> -       if (attr->map_flags & ~HTAB_CREATE_FLAG_MASK)
> -               /* reserved bits should not be used */
> +       if (attr->map_flags & ~HTAB_CREATE_FLAG_MASK ||
> +           !bpf_map_flags_access_ok(attr->map_flags))
>                 return -EINVAL;
>
>         if (!lru && percpu_lru)
> diff --git a/kernel/bpf/local_storage.c b/kernel/bpf/local_storage.c
> index 6b572e2de7fb..980e8f1f6cb5 100644
> --- a/kernel/bpf/local_storage.c
> +++ b/kernel/bpf/local_storage.c
> @@ -14,7 +14,7 @@ DEFINE_PER_CPU(struct bpf_cgroup_storage*, bpf_cgroup_storage[MAX_BPF_CGROUP_STO
>  #ifdef CONFIG_CGROUP_BPF
>
>  #define LOCAL_STORAGE_CREATE_FLAG_MASK                                 \
> -       (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
> +       (BPF_F_NUMA_NODE | BPF_F_ACCESS_MASK)
>
>  struct bpf_cgroup_storage_map {
>         struct bpf_map map;
> @@ -282,8 +282,8 @@ static struct bpf_map *cgroup_storage_map_alloc(union bpf_attr *attr)
>         if (attr->value_size > PAGE_SIZE)
>                 return ERR_PTR(-E2BIG);
>
> -       if (attr->map_flags & ~LOCAL_STORAGE_CREATE_FLAG_MASK)
> -               /* reserved bits should not be used */
> +       if (attr->map_flags & ~LOCAL_STORAGE_CREATE_FLAG_MASK ||
> +           !bpf_map_flags_access_ok(attr->map_flags))
>                 return ERR_PTR(-EINVAL);
>
>         if (attr->max_entries)
> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
> index 93a5cbbde421..e61630c2e50b 100644
> --- a/kernel/bpf/lpm_trie.c
> +++ b/kernel/bpf/lpm_trie.c
> @@ -538,7 +538,7 @@ static int trie_delete_elem(struct bpf_map *map, void *_key)
>  #define LPM_KEY_SIZE_MIN       LPM_KEY_SIZE(LPM_DATA_SIZE_MIN)
>
>  #define LPM_CREATE_FLAG_MASK   (BPF_F_NO_PREALLOC | BPF_F_NUMA_NODE |  \
> -                                BPF_F_RDONLY | BPF_F_WRONLY)
> +                                BPF_F_ACCESS_MASK)
>
>  static struct bpf_map *trie_alloc(union bpf_attr *attr)
>  {
> @@ -553,6 +553,7 @@ static struct bpf_map *trie_alloc(union bpf_attr *attr)
>         if (attr->max_entries == 0 ||
>             !(attr->map_flags & BPF_F_NO_PREALLOC) ||
>             attr->map_flags & ~LPM_CREATE_FLAG_MASK ||
> +           !bpf_map_flags_access_ok(attr->map_flags) ||
>             attr->key_size < LPM_KEY_SIZE_MIN ||
>             attr->key_size > LPM_KEY_SIZE_MAX ||
>             attr->value_size < LPM_VAL_SIZE_MIN ||
> diff --git a/kernel/bpf/queue_stack_maps.c b/kernel/bpf/queue_stack_maps.c
> index b384ea9f3254..0b140d236889 100644
> --- a/kernel/bpf/queue_stack_maps.c
> +++ b/kernel/bpf/queue_stack_maps.c
> @@ -11,8 +11,7 @@
>  #include "percpu_freelist.h"
>
>  #define QUEUE_STACK_CREATE_FLAG_MASK \
> -       (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
> -
> +       (BPF_F_NUMA_NODE | BPF_F_ACCESS_MASK)
>
>  struct bpf_queue_stack {
>         struct bpf_map map;
> @@ -52,7 +51,8 @@ static int queue_stack_map_alloc_check(union bpf_attr *attr)
>         /* check sanity of attributes */
>         if (attr->max_entries == 0 || attr->key_size != 0 ||
>             attr->value_size == 0 ||
> -           attr->map_flags & ~QUEUE_STACK_CREATE_FLAG_MASK)
> +           attr->map_flags & ~QUEUE_STACK_CREATE_FLAG_MASK ||
> +           !bpf_map_flags_access_ok(attr->map_flags))
>                 return -EINVAL;
>
>         if (attr->value_size > KMALLOC_MAX_SIZE)
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index b0c7a6485c49..ba2fe4cfad09 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -481,6 +481,8 @@ static int map_check_btf(struct bpf_map *map, const struct btf *btf,
>         map->spin_lock_off = btf_find_spin_lock(btf, value_type);
>
>         if (map_value_has_spin_lock(map)) {
> +               if (map->map_flags & BPF_F_RDONLY_PROG)
> +                       return -EACCES;

Do we need to enforce this restriction? This would make sense if we
were enforcing that any element that has a spinlock inside has to
have the lock taken. Do we do that in the verifier?

>                 if (map->map_type != BPF_MAP_TYPE_HASH &&
>                     map->map_type != BPF_MAP_TYPE_ARRAY &&
>                     map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE)
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 57678cef9a2c..af3cddb18efb 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1429,6 +1429,28 @@ static int check_stack_access(struct bpf_verifier_env *env,
>         return 0;
>  }
>
> +static int check_map_access_type(struct bpf_verifier_env *env, u32 regno,
> +                                int off, int size, enum bpf_access_type type)
> +{
> +       struct bpf_reg_state *regs = cur_regs(env);
> +       struct bpf_map *map = regs[regno].map_ptr;
> +       u32 cap = bpf_map_flags_to_cap(map);
> +
> +       if (type == BPF_WRITE && !(cap & BPF_MAP_CAN_WRITE)) {
> +               verbose(env, "write into map forbidden, value_size=%d off=%d size=%d\n",
> +                       map->value_size, off, size);
> +               return -EACCES;
> +       }
> +
> +       if (type == BPF_READ && !(cap & BPF_MAP_CAN_READ)) {
> +               verbose(env, "read into map forbidden, value_size=%d off=%d size=%d\n",

typo: "read from"?

> +                       map->value_size, off, size);
> +               return -EACCES;
> +       }
> +
> +       return 0;
> +}
> +
>  /* check read/write into map element returned by bpf_map_lookup_elem() */
>  static int __check_map_access(struct bpf_verifier_env *env, u32 regno, int off,
>                               int size, bool zero_size_allowed)
> @@ -2014,7 +2036,9 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
>                         verbose(env, "R%d leaks addr into map\n", value_regno);
>                         return -EACCES;
>                 }
> -
> +               err = check_map_access_type(env, regno, off, size, t);
> +               if (err)
> +                       return err;
>                 err = check_map_access(env, regno, off, size, false);
>                 if (!err && t == BPF_READ && value_regno >= 0)
>                         mark_reg_unknown(env, regs, value_regno);
> @@ -2250,6 +2274,10 @@ static int check_helper_mem_access(struct bpf_verifier_env *env, int regno,
>                 return check_packet_access(env, regno, reg->off, access_size,
>                                            zero_size_allowed);
>         case PTR_TO_MAP_VALUE:
> +               if (check_map_access_type(env, regno, reg->off, access_size,
> +                                         meta && meta->raw_mode ? BPF_WRITE :
> +                                         BPF_READ))
> +                       return -EACCES;
>                 return check_map_access(env, regno, reg->off, access_size,
>                                         zero_size_allowed);
>         default: /* scalar_value|ptr_to_stack or invalid ptr */
> @@ -2971,6 +2999,7 @@ record_func_map(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
>                 int func_id, int insn_idx)
>  {
>         struct bpf_insn_aux_data *aux = &env->insn_aux_data[insn_idx];
> +       struct bpf_map *map = meta->map_ptr;
>
>         if (func_id != BPF_FUNC_tail_call &&
>             func_id != BPF_FUNC_map_lookup_elem &&
> @@ -2981,11 +3010,24 @@ record_func_map(struct bpf_verifier_env *env, struct bpf_call_arg_meta *meta,
>             func_id != BPF_FUNC_map_peek_elem)
>                 return 0;
>
> -       if (meta->map_ptr == NULL) {
> +       if (map == NULL) {
>                 verbose(env, "kernel subsystem misconfigured verifier\n");
>                 return -EINVAL;
>         }
>
> +       /* In case of read-only, some additional restrictions
> +        * need to be applied in order to prevent altering the
> +        * state of the map from program side.
> +        */
> +       if ((map->map_flags & BPF_F_RDONLY_PROG) &&
> +           (func_id == BPF_FUNC_map_delete_elem ||
> +            func_id == BPF_FUNC_map_update_elem ||
> +            func_id == BPF_FUNC_map_push_elem ||
> +            func_id == BPF_FUNC_map_pop_elem)) {

Curious, what about tail_calls? Is it considered a read? Is this
checked as well?

> +               verbose(env, "write into map forbidden\n");
> +               return -EACCES;
> +       }
> +
>         if (!BPF_MAP_PTR(aux->map_state))
>                 bpf_map_ptr_store(aux, meta->map_ptr,
>                                   meta->map_ptr->unpriv_array);
> --
> 2.17.1
>


* Re: [PATCH rfc v3 bpf-next 2/9] bpf: add program side {rd,wr}only support for maps
  2019-03-11 23:06   ` Alexei Starovoitov
  2019-03-11 23:34     ` Daniel Borkmann
@ 2019-03-14 19:27     ` Andrii Nakryiko
  1 sibling, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2019-03-14 19:27 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Alexei Starovoitov, bpf, Networking,
	Joe Stringer, john fastabend, Yonghong Song, Jakub Kicinski,
	tgraf, lmb

On Mon, Mar 11, 2019 at 4:06 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Mar 11, 2019 at 10:51:18PM +0100, Daniel Borkmann wrote:
> > This work adds two new map creation flags BPF_F_RDONLY_PROG
> > and BPF_F_WRONLY_PROG in order to allow for read-only or
> > write-only BPF maps from the BPF program side.
> >
> > Today we have BPF_F_RDONLY and BPF_F_WRONLY, but these only
> > apply to the system call side, meaning the BPF program has full
> > read/write access to the map as usual while bpf(2) calls with
> > map fd can either only read or write into the map depending
> > on the flags. BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG allow
> > for the exact opposite such that the verifier is going to
> > reject program loads if a write into a read-only map or a
> > read from a write-only map is detected. For the read-only
> > map case, some helpers that would alter the map state, such
> > as map deletion, update, etc, are forbidden as well.
> >
> > We've enabled this generic map extension for various non-special
> > maps holding normal user data: array, hash, lru, lpm, local
> > storage, queue and stack. Further map types can follow in the
> > future depending on the use case. The main use case here is to
> > forbid writes into .rodata map values from the verifier side.
>
> I think WRONLY | WRONLY_PROG should be an invalid combination?

Just curious, what about the cases of special arrays (e.g., at least
PROG_ARRAY)? Is doing a tail call a read from that map?

> Since these attributes are set at creation time and cannot be changed,
> nothing can ever read from it, so why write into it?
> Similarly RDONLY | RDONLY_PROG is invalid too?
>
> Also looking at the next patch and 'lock' command...
> Maybe it would be cleaner to add a WRONCE (from syscall) flag?
> Then for .rodata the attrs will be RDONLY_PROG | WRONCE
> and no 'lock' necessary.
> WRONCE_PROG probably doesn't make sense.
> Storing dangling task_struct pointer in the next patch doesn't look great.
> The whole 'lock' concept feels useful, but in the context of implementing
> .rodata it feels that WRONCE would be a better fit,
> since libbpf won't be able to make a mistake and forget to 'lock'.
> 'lock' syscall cmd can be confused with BPF_F_LOCK too.
>


* Re: [PATCH rfc v3 bpf-next 4/9] bpf, obj: allow . char as part of the name
  2019-03-11 21:51 ` [PATCH rfc v3 bpf-next 4/9] bpf, obj: allow . char as part of the name Daniel Borkmann
@ 2019-03-14 19:40   ` Andrii Nakryiko
  0 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2019-03-14 19:40 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, bpf, Networking, Joe Stringer,
	john fastabend, Yonghong Song, Jakub Kicinski, tgraf, lmb

On Mon, Mar 11, 2019 at 2:51 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> Trivial addition to allow the '.' char aside from '_' as a "special"
> character in the object name. This is used to allow map names from
> the loader side to contain substrings such as ".bss", ".data",
> ".rodata", but could also be useful for other purposes.
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Acked-by: Andrii Nakryiko <andriin@fb.com>

> ---
>  kernel/bpf/syscall.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index b5ba138351e1..04279747c092 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -456,10 +456,10 @@ static int bpf_obj_name_cpy(char *dst, const char *src)
>         const char *end = src + BPF_OBJ_NAME_LEN;
>
>         memset(dst, 0, BPF_OBJ_NAME_LEN);
> -
> -       /* Copy all isalnum() and '_' char */
> +       /* Copy all isalnum(), '_' and '.' chars. */
>         while (src < end && *src) {
> -               if (!isalnum(*src) && *src != '_')
> +               if (!isalnum(*src) &&
> +                   *src != '_' && *src != '.')
>                         return -EINVAL;
>                 *dst++ = *src++;
>         }
> --
> 2.17.1
>


* Re: [PATCH rfc v3 bpf-next 6/9] bpf, libbpf: refactor relocation handling
  2019-03-11 21:51 ` [PATCH rfc v3 bpf-next 6/9] bpf, libbpf: refactor relocation handling Daniel Borkmann
@ 2019-03-14 21:05   ` Andrii Nakryiko
  0 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2019-03-14 21:05 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, bpf, Networking, Joe Stringer,
	john fastabend, Yonghong Song, Jakub Kicinski, tgraf, lmb

On Mon, Mar 11, 2019 at 2:51 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> From: Joe Stringer <joe@wand.net.nz>
>
> Adjust the code for relocations slightly, with no functional changes,
> so that upcoming patches introducing support for relocations into
> the .data, .rodata and .bss sections can be added independently of
> these changes.
>
> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

lgtm

Acked-by: Andrii Nakryiko <andriin@fb.com>

> ---
>  tools/lib/bpf/libbpf.c | 62 ++++++++++++++++++++++--------------------
>  1 file changed, 32 insertions(+), 30 deletions(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index f5eb60379c8d..0afdb8914386 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -851,20 +851,20 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
>                                 obj->efile.symbols = data;
>                                 obj->efile.strtabidx = sh.sh_link;
>                         }
> -               } else if ((sh.sh_type == SHT_PROGBITS) &&
> -                          (sh.sh_flags & SHF_EXECINSTR) &&
> -                          (data->d_size > 0)) {
> -                       if (strcmp(name, ".text") == 0)
> -                               obj->efile.text_shndx = idx;
> -                       err = bpf_object__add_program(obj, data->d_buf,
> -                                                     data->d_size, name, idx);
> -                       if (err) {
> -                               char errmsg[STRERR_BUFSIZE];
> -                               char *cp = libbpf_strerror_r(-err, errmsg,
> -                                                            sizeof(errmsg));
> -
> -                               pr_warning("failed to alloc program %s (%s): %s",
> -                                          name, obj->path, cp);
> +               } else if (sh.sh_type == SHT_PROGBITS && data->d_size > 0) {
> +                       if (sh.sh_flags & SHF_EXECINSTR) {
> +                               if (strcmp(name, ".text") == 0)
> +                                       obj->efile.text_shndx = idx;
> +                               err = bpf_object__add_program(obj, data->d_buf,
> +                                                             data->d_size, name, idx);
> +                               if (err) {
> +                                       char errmsg[STRERR_BUFSIZE];
> +                                       char *cp = libbpf_strerror_r(-err, errmsg,
> +                                                                    sizeof(errmsg));
> +
> +                                       pr_warning("failed to alloc program %s (%s): %s",
> +                                                  name, obj->path, cp);
> +                               }
>                         }
>                 } else if (sh.sh_type == SHT_REL) {
>                         void *reloc = obj->efile.reloc;
> @@ -1026,24 +1026,26 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>                         return -LIBBPF_ERRNO__RELOC;
>                 }
>
> -               /* TODO: 'maps' is sorted. We can use bsearch to make it faster. */
> -               for (map_idx = 0; map_idx < nr_maps; map_idx++) {
> -                       if (maps[map_idx].offset == sym.st_value) {
> -                               pr_debug("relocation: find map %zd (%s) for insn %u\n",
> -                                        map_idx, maps[map_idx].name, insn_idx);
> -                               break;
> +               if (sym.st_shndx == maps_shndx) {
> +                       /* TODO: 'maps' is sorted. We can use bsearch to make it faster. */
> +                       for (map_idx = 0; map_idx < nr_maps; map_idx++) {
> +                               if (maps[map_idx].offset == sym.st_value) {
> +                                       pr_debug("relocation: find map %zd (%s) for insn %u\n",
> +                                                map_idx, maps[map_idx].name, insn_idx);
> +                                       break;
> +                               }
>                         }
> -               }
>
> -               if (map_idx >= nr_maps) {
> -                       pr_warning("bpf relocation: map_idx %d large than %d\n",
> -                                  (int)map_idx, (int)nr_maps - 1);
> -                       return -LIBBPF_ERRNO__RELOC;
> -               }
> +                       if (map_idx >= nr_maps) {
> +                               pr_warning("bpf relocation: map_idx %d large than %d\n",
> +                                          (int)map_idx, (int)nr_maps - 1);
> +                               return -LIBBPF_ERRNO__RELOC;
> +                       }
>
> -               prog->reloc_desc[i].type = RELO_LD64;
> -               prog->reloc_desc[i].insn_idx = insn_idx;
> -               prog->reloc_desc[i].map_idx = map_idx;
> +                       prog->reloc_desc[i].type = RELO_LD64;
> +                       prog->reloc_desc[i].insn_idx = insn_idx;
> +                       prog->reloc_desc[i].map_idx = map_idx;
> +               }
>         }
>         return 0;
>  }
> @@ -1405,7 +1407,7 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
>                         }
>                         insns[insn_idx].src_reg = BPF_PSEUDO_MAP_FD;
>                         insns[insn_idx].imm = obj->maps[map_idx].fd;
> -               } else {
> +               } else if (prog->reloc_desc[i].type == RELO_CALL) {
>                         err = bpf_program__reloc_text(prog, obj,
>                                                       &prog->reloc_desc[i]);
>                         if (err)
> --
> 2.17.1
>


* Re: [PATCH rfc v3 bpf-next 7/9] bpf, libbpf: support global data/bss/rodata sections
  2019-03-11 21:51 ` [PATCH rfc v3 bpf-next 7/9] bpf, libbpf: support global data/bss/rodata sections Daniel Borkmann
@ 2019-03-14 22:14   ` Andrii Nakryiko
  0 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2019-03-14 22:14 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, bpf, Networking, Joe Stringer,
	john fastabend, Yonghong Song, Jakub Kicinski, tgraf, lmb

On Mon, Mar 11, 2019 at 2:51 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> This work adds BPF loader support for global data sections
> to libbpf. This allows writing BPF programs in a more natural,
> C-like way by being able to define global variables and const
> data.
>
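Nice, so e.g. something like this then just works (a sketch with
made-up names, assuming the usual libbpf loader flow and the SEC()
helper macro):

	static __u32 sample_rate = 2;           /* ends up in .data   */
	static const char banner[6] = "hello";  /* ends up in .rodata */
	static __u64 pkt_count;                 /* ends up in .bss    */

	SEC("classifier")
	int count_pkts(struct __sk_buff *skb)
	{
		pkt_count++;    /* plain C access, no map lookup helper */
		return pkt_count % sample_rate == 0 && banner[0] == 'h';
	}
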
> Back at LPC 2018 [0] we presented a first prototype which
> implemented support for global data sections by extending BPF
> syscall where union bpf_attr would get additional memory/size
> pair for each section passed during prog load in order to later
> add this base address into the ldimm64 instruction along with
> the user provided offset when accessing a variable. Consensus
> from LPC was that for proper upstream support, it would be
> more desirable to use maps instead of bpf_attr extension as
> this would allow for introspection of these sections as well
> as potential life updates of their content. This work follows

typo: live

> this path by taking the following steps from loader side:
>
>  1) In bpf_object__elf_collect() step we pick up ".data",
>     ".rodata", and ".bss" section information.
>
>  2) If present, in bpf_object__init_internal_map() we add
>     maps to the obj's map array that corresponds to each
>     of the present sections. Given section size and access
>     properties can differ, a single entry array map is
>     created with value size that is corresponding to the
>     ELF section size of .data, .bss or .rodata. These
>     internal maps are integrated into the normal map
>     handling of libbpf such that when user traverses all
>     obj maps, they can be differentiated from user-created
>     ones via bpf_map__is_internal(). In later steps when
>     we actually create these maps in the kernel via
>     bpf_object__create_maps(), then for .data and .rodata
>     sections their content is copied into the map through
>     bpf_map_update_elem(). For .bss this is not necessary
>     since array map is already zero-initialized by default.
>     Additionally, for .rodata the map is locked as read-only
>     after setup, such that neither from program nor syscall
>     side writes would be possible.
>
>  3) In bpf_program__collect_reloc() step, we record the
>     corresponding map, insn index, and relocation type for
>     the global data.
>
>  4) And last but not least, in the actual relocation step in
>     bpf_program__relocate(), we mark the ldimm64 instruction
>     with src_reg = BPF_PSEUDO_MAP_VALUE where the first imm
>     field stores the map's file descriptor, similarly to
>     BPF_PSEUDO_MAP_FD, and the second imm field (as ldimm64
>     is 2-insn wide) stores the access offset into the section.
>     Given these maps have only a single element, ldimm64's off
>     remains zero in both parts (see the sketch after this list).
>
>  5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
>     load will then store the actual target address in order
>     to have a 'map-lookup'-free access. That is, the actual
>     map value base address + offset. The destination register
>     in the verifier will then be marked as PTR_TO_MAP_VALUE,
>     containing the fixed offset as reg->off and backing BPF
>     map as reg->map_ptr. Meaning, it's treated as any other
>     normal map value from verification side, only with
>     efficient, direct value access instead of actual call to
>     map lookup helper as in the typical case.
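
Putting steps 4 and 5 into instruction terms (a rough sketch of the
end result; variable names approximate):

	/* ldimm64 as emitted by libbpf before program load: */
	insn[0].src_reg = BPF_PSEUDO_MAP_VALUE;
	insn[0].imm     = map_fd;      /* fd of .data/.rodata/.bss map */
	insn[1].imm     = sym_offset;  /* offset of the var in section */
	/* off stays 0 in both halves as max_entries == 1 */

	/* after the verifier rewrite, both imm parts hold the 64-bit
	 * address of map value base + sym_offset, and the destination
	 * register is tracked as PTR_TO_MAP_VALUE.
	 */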
>
> Currently, only support for static global variables has been
> added, and libbpf rejects non-static global variables from
> loading. This restriction can be lifted later once we have
> proper semantics for how BPF will treat these.
>
> From the BTF side, libbpf associates these three maps with the
> BTF map names ".bss", ".data" and ".rodata" which LLVM will
> emit (w/o the object name prefix).
>
> Simple example dump of a program using global vars in each
> section:
>
>   # bpftool prog
>   [...]
>   6784: sched_cls  name load_static_dat  tag a7e1291567277844  gpl
>         loaded_at 2019-03-11T15:39:34+0000  uid 0
>         xlated 1776B  jited 993B  memlock 4096B  map_ids 2238,2237,2235,2236,2239,2240
>
>   # bpftool map show id 2237
>   2237: array  name test_glo.bss  flags 0x0
>         key 4B  value 64B  max_entries 1  memlock 4096B
>   # bpftool map show id 2235
>   2235: array  name test_glo.data  flags 0x0
>         key 4B  value 64B  max_entries 1  memlock 4096B
>   # bpftool map show id 2236
>   2236: array  name test_glo.rodata  flags 0x80
>         key 4B  value 96B  max_entries 1  memlock 4096B
>
>   # bpftool prog dump xlated id 6784
>   int load_static_data(struct __sk_buff * skb):
>   ; int load_static_data(struct __sk_buff *skb)
>      0: (b7) r6 = 0
>   ; test_reloc(number, 0, &num0);
>      1: (63) *(u32 *)(r10 -4) = r6
>      2: (bf) r2 = r10
>   ; int load_static_data(struct __sk_buff *skb)
>      3: (07) r2 += -4
>   ; test_reloc(number, 0, &num0);
>      4: (18) r1 = map[id:2238]
>      6: (18) r3 = map[id:2237][0]+0    <-- direct addr in .bss area
>      8: (b7) r4 = 0
>      9: (85) call array_map_update_elem#100464
>     10: (b7) r1 = 1
>   ; test_reloc(number, 1, &num1);
>   [...]
>   ; test_reloc(string, 2, str2);
>    120: (18) r8 = map[id:2237][0]+16   <-- same here at offset +16
>    122: (18) r1 = map[id:2239]
>    124: (18) r3 = map[id:2237][0]+16
>    126: (b7) r4 = 0
>    127: (85) call array_map_update_elem#100464
>    128: (b7) r1 = 120
>   ; str1[5] = 'x';
>    129: (73) *(u8 *)(r9 +5) = r1
>   ; test_reloc(string, 3, str1);
>    130: (b7) r1 = 3
>    131: (63) *(u32 *)(r10 -4) = r1
>    132: (b7) r9 = 3
>    133: (bf) r2 = r10
>   ; int load_static_data(struct __sk_buff *skb)
>    134: (07) r2 += -4
>   ; test_reloc(string, 3, str1);
>    135: (18) r1 = map[id:2239]
>    137: (18) r3 = map[id:2235][0]+16   <-- direct addr in .data area
>    139: (b7) r4 = 0
>    140: (85) call array_map_update_elem#100464
>    141: (b7) r1 = 111
>   ; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
>    142: (73) *(u8 *)(r8 +6) = r1       <-- further access based on .bss data
>    143: (b7) r1 = 108
>    144: (73) *(u8 *)(r8 +5) = r1
>   [...]
>
> Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't
> fail for static variables").
>
>   [0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
>       http://vger.kernel.org/lpc-bpf2018.html#session-3
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>


This looks great, thanks! Pending whatever you and Alexei decide
for bpf_map_lock:

Acked-by: Andrii Nakryiko <andriin@fb.com>

> ---
>  tools/lib/bpf/bpf.c      |  10 ++
>  tools/lib/bpf/bpf.h      |   1 +
>  tools/lib/bpf/libbpf.c   | 324 ++++++++++++++++++++++++++++++++++-----
>  tools/lib/bpf/libbpf.h   |   1 +
>  tools/lib/bpf/libbpf.map |   6 +
>  5 files changed, 301 insertions(+), 41 deletions(-)
>
> diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> index 9cd015574e83..cba2a615e135 100644
> --- a/tools/lib/bpf/bpf.c
> +++ b/tools/lib/bpf/bpf.c
> @@ -429,6 +429,16 @@ int bpf_map_get_next_key(int fd, const void *key, void *next_key)
>         return sys_bpf(BPF_MAP_GET_NEXT_KEY, &attr, sizeof(attr));
>  }
>
> +int bpf_map_lock(int fd)
> +{
> +       union bpf_attr attr;
> +
> +       memset(&attr, 0, sizeof(attr));
> +       attr.map_fd = fd;
> +
> +       return sys_bpf(BPF_MAP_LOCK, &attr, sizeof(attr));
> +}
> +
>  int bpf_obj_pin(int fd, const char *pathname)
>  {
>         union bpf_attr attr;
> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index 6ffdd79bea89..fa2bdbba6f00 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -117,6 +117,7 @@ LIBBPF_API int bpf_map_lookup_and_delete_elem(int fd, const void *key,
>                                               void *value);
>  LIBBPF_API int bpf_map_delete_elem(int fd, const void *key);
>  LIBBPF_API int bpf_map_get_next_key(int fd, const void *key, void *next_key);
> +LIBBPF_API int bpf_map_lock(int fd);
>  LIBBPF_API int bpf_obj_pin(int fd, const char *pathname);
>  LIBBPF_API int bpf_obj_get(const char *pathname);
>  LIBBPF_API int bpf_prog_attach(int prog_fd, int attachable_fd,
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 0afdb8914386..7821c9b1e838 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -7,6 +7,7 @@
>   * Copyright (C) 2015 Wang Nan <wangnan0@huawei.com>
>   * Copyright (C) 2015 Huawei Inc.
>   * Copyright (C) 2017 Nicira, Inc.
> + * Copyright (C) 2019 Isovalent, Inc.
>   */
>
>  #ifndef _GNU_SOURCE
> @@ -139,6 +140,7 @@ struct bpf_program {
>                 enum {
>                         RELO_LD64,
>                         RELO_CALL,
> +                       RELO_DATA,
>                 } type;
>                 int insn_idx;
>                 union {
> @@ -171,6 +173,19 @@ struct bpf_program {
>         __u32 line_info_cnt;
>  };
>
> +enum libbpf_map_type {
> +       LIBBPF_MAP_UNSPEC,
> +       LIBBPF_MAP_DATA,
> +       LIBBPF_MAP_BSS,
> +       LIBBPF_MAP_RODATA,
> +};
> +
> +static const char *libbpf_type_to_btf_name[] = {
> +       [LIBBPF_MAP_DATA]       = ".data",
> +       [LIBBPF_MAP_BSS]        = ".bss",
> +       [LIBBPF_MAP_RODATA]     = ".rodata",
> +};
> +
>  struct bpf_map {
>         int fd;
>         char *name;
> @@ -182,11 +197,18 @@ struct bpf_map {
>         __u32 btf_value_type_id;
>         void *priv;
>         bpf_map_clear_priv_t clear_priv;
> +       enum libbpf_map_type libbpf_type;
> +};
> +
> +struct bpf_secdata {
> +       void *rodata;
> +       void *data;
>  };
>
>  static LIST_HEAD(bpf_objects_list);
>
>  struct bpf_object {
> +       char name[BPF_OBJ_NAME_LEN];
>         char license[64];
>         __u32 kern_version;
>
> @@ -194,6 +216,7 @@ struct bpf_object {
>         size_t nr_programs;
>         struct bpf_map *maps;
>         size_t nr_maps;
> +       struct bpf_secdata sections;
>
>         bool loaded;
>         bool has_pseudo_calls;
> @@ -209,6 +232,9 @@ struct bpf_object {
>                 Elf *elf;
>                 GElf_Ehdr ehdr;
>                 Elf_Data *symbols;
> +               Elf_Data *data;
> +               Elf_Data *rodata;
> +               Elf_Data *bss;
>                 size_t strtabidx;
>                 struct {
>                         GElf_Shdr shdr;
> @@ -217,6 +243,9 @@ struct bpf_object {
>                 int nr_reloc;
>                 int maps_shndx;
>                 int text_shndx;
> +               int data_shndx;
> +               int rodata_shndx;
> +               int bss_shndx;
>         } efile;
>         /*
>          * All loaded bpf_object is linked in a list, which is
> @@ -438,6 +467,7 @@ static struct bpf_object *bpf_object__new(const char *path,
>                                           size_t obj_buf_sz)
>  {
>         struct bpf_object *obj;
> +       char *end;
>
>         obj = calloc(1, sizeof(struct bpf_object) + strlen(path) + 1);
>         if (!obj) {
> @@ -446,8 +476,14 @@ static struct bpf_object *bpf_object__new(const char *path,
>         }
>
>         strcpy(obj->path, path);
> -       obj->efile.fd = -1;
> +       /* Using basename() GNU version which doesn't modify arg. */
> +       strncpy(obj->name, basename((void *)path),
> +               sizeof(obj->name) - 1);
> +       end = strchr(obj->name, '.');
> +       if (end)
> +               *end = 0;
>
> +       obj->efile.fd = -1;
>         /*
>          * Caller of this function should also calls
>          * bpf_object__elf_finish() after data collection to return
> @@ -457,6 +493,9 @@ static struct bpf_object *bpf_object__new(const char *path,
>         obj->efile.obj_buf = obj_buf;
>         obj->efile.obj_buf_sz = obj_buf_sz;
>         obj->efile.maps_shndx = -1;
> +       obj->efile.data_shndx = -1;
> +       obj->efile.rodata_shndx = -1;
> +       obj->efile.bss_shndx = -1;
>
>         obj->loaded = false;
>
> @@ -475,6 +514,9 @@ static void bpf_object__elf_finish(struct bpf_object *obj)
>                 obj->efile.elf = NULL;
>         }
>         obj->efile.symbols = NULL;
> +       obj->efile.data = NULL;
> +       obj->efile.rodata = NULL;
> +       obj->efile.bss = NULL;
>
>         zfree(&obj->efile.reloc);
>         obj->efile.nr_reloc = 0;
> @@ -616,27 +658,76 @@ static bool bpf_map_type__is_map_in_map(enum bpf_map_type type)
>         return false;
>  }
>
> +static bool bpf_object__has_maps(const struct bpf_object *obj)
> +{
> +       return obj->efile.maps_shndx >= 0 ||
> +              obj->efile.data_shndx >= 0 ||
> +              obj->efile.rodata_shndx >= 0 ||
> +              obj->efile.bss_shndx >= 0;
> +}
> +
> +static int
> +bpf_object__init_internal_map(struct bpf_object *obj, struct bpf_map *map,
> +                             enum libbpf_map_type type, Elf_Data *data,
> +                             void **data_buff)
> +{
> +       struct bpf_map_def *def = &map->def;
> +       char map_name[BPF_OBJ_NAME_LEN];
> +
> +       map->libbpf_type = type;
> +       map->offset = ~(typeof(map->offset))0;
> +       snprintf(map_name, sizeof(map_name), "%.8s%.7s", obj->name,
> +                libbpf_type_to_btf_name[type]);
> +       map->name = strdup(map_name);
> +       if (!map->name) {
> +               pr_warning("failed to alloc map name\n");
> +               return -ENOMEM;
> +       }
> +
> +       def->type = BPF_MAP_TYPE_ARRAY;
> +       def->key_size = sizeof(int);
> +       def->value_size = data->d_size;
> +       def->max_entries = 1;
> +       def->map_flags = type == LIBBPF_MAP_RODATA ?
> +                        BPF_F_RDONLY_PROG : 0;
> +       if (data_buff) {
> +               *data_buff = malloc(data->d_size);
> +               if (!*data_buff) {
> +                       zfree(&map->name);
> +                       pr_warning("failed to alloc map content buffer\n");
> +                       return -ENOMEM;
> +               }
> +               memcpy(*data_buff, data->d_buf, data->d_size);
> +       }
> +
> +       pr_debug("map %ld is \"%s\"\n", map - obj->maps, map->name);
> +       return 0;
> +}
> +
>  static int
>  bpf_object__init_maps(struct bpf_object *obj, int flags)
>  {
> +       int i, map_idx, map_def_sz, nr_syms, nr_maps = 0, nr_maps_glob = 0;
>         bool strict = !(flags & MAPS_RELAX_COMPAT);
> -       int i, map_idx, map_def_sz, nr_maps = 0;
> -       Elf_Scn *scn;
> -       Elf_Data *data;
>         Elf_Data *symbols = obj->efile.symbols;
> +       Elf_Data *data = NULL;
> +       int ret = 0;
>
> -       if (obj->efile.maps_shndx < 0)
> -               return -EINVAL;
>         if (!symbols)
>                 return -EINVAL;
> +       nr_syms = symbols->d_size / sizeof(GElf_Sym);
>
> -       scn = elf_getscn(obj->efile.elf, obj->efile.maps_shndx);
> -       if (scn)
> -               data = elf_getdata(scn, NULL);
> -       if (!scn || !data) {
> -               pr_warning("failed to get Elf_Data from map section %d\n",
> -                          obj->efile.maps_shndx);
> -               return -EINVAL;
> +       if (obj->efile.maps_shndx >= 0) {
> +               Elf_Scn *scn = elf_getscn(obj->efile.elf,
> +                                         obj->efile.maps_shndx);
> +
> +               if (scn)
> +                       data = elf_getdata(scn, NULL);
> +               if (!scn || !data) {
> +                       pr_warning("failed to get Elf_Data from map section %d\n",
> +                                  obj->efile.maps_shndx);
> +                       return -EINVAL;
> +               }
>         }
>
>         /*
> @@ -646,7 +737,13 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
>          *
>          * TODO: Detect array of map and report error.
>          */
> -       for (i = 0; i < symbols->d_size / sizeof(GElf_Sym); i++) {
> +       if (obj->efile.data_shndx >= 0)
> +               nr_maps_glob++;
> +       if (obj->efile.rodata_shndx >= 0)
> +               nr_maps_glob++;
> +       if (obj->efile.bss_shndx >= 0)
> +               nr_maps_glob++;
> +       for (i = 0; data && i < nr_syms; i++) {
>                 GElf_Sym sym;
>
>                 if (!gelf_getsym(symbols, i, &sym))
> @@ -659,19 +756,21 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
>         /* Alloc obj->maps and fill nr_maps. */
>         pr_debug("maps in %s: %d maps in %zd bytes\n", obj->path,
>                  nr_maps, data->d_size);
> -
> -       if (!nr_maps)
> +       if (!nr_maps && !nr_maps_glob)
>                 return 0;
>
>         /* Assume equally sized map definitions */
> -       map_def_sz = data->d_size / nr_maps;
> -       if (!data->d_size || (data->d_size % nr_maps) != 0) {
> -               pr_warning("unable to determine map definition size "
> -                          "section %s, %d maps in %zd bytes\n",
> -                          obj->path, nr_maps, data->d_size);
> -               return -EINVAL;
> +       if (data) {
> +               map_def_sz = data->d_size / nr_maps;
> +               if (!data->d_size || (data->d_size % nr_maps) != 0) {
> +                       pr_warning("unable to determine map definition size "
> +                                  "section %s, %d maps in %zd bytes\n",
> +                                  obj->path, nr_maps, data->d_size);
> +                       return -EINVAL;
> +               }
>         }
>
> +       nr_maps += nr_maps_glob;
>         obj->maps = calloc(nr_maps, sizeof(obj->maps[0]));
>         if (!obj->maps) {
>                 pr_warning("alloc maps for object failed\n");
> @@ -692,7 +791,7 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
>         /*
>          * Fill obj->maps using data in "maps" section.
>          */
> -       for (i = 0, map_idx = 0; i < symbols->d_size / sizeof(GElf_Sym); i++) {
> +       for (i = 0, map_idx = 0; data && i < nr_syms; i++) {
>                 GElf_Sym sym;
>                 const char *map_name;
>                 struct bpf_map_def *def;
> @@ -705,6 +804,8 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
>                 map_name = elf_strptr(obj->efile.elf,
>                                       obj->efile.strtabidx,
>                                       sym.st_name);
> +
> +               obj->maps[map_idx].libbpf_type = LIBBPF_MAP_UNSPEC;
>                 obj->maps[map_idx].offset = sym.st_value;
>                 if (sym.st_value + map_def_sz > data->d_size) {
>                         pr_warning("corrupted maps section in %s: last map \"%s\" too small\n",
> @@ -753,8 +854,27 @@ bpf_object__init_maps(struct bpf_object *obj, int flags)
>                 map_idx++;
>         }
>
> -       qsort(obj->maps, obj->nr_maps, sizeof(obj->maps[0]), compare_bpf_map);
> -       return 0;
> +       /*
> +        * Populate rest of obj->maps with libbpf internal maps.
> +        */
> +       if (obj->efile.data_shndx >= 0)
> +               ret = bpf_object__init_internal_map(obj, &obj->maps[map_idx++],
> +                                                   LIBBPF_MAP_DATA,
> +                                                   obj->efile.data,
> +                                                   &obj->sections.data);
> +       if (!ret && obj->efile.rodata_shndx >= 0)
> +               ret = bpf_object__init_internal_map(obj, &obj->maps[map_idx++],
> +                                                   LIBBPF_MAP_RODATA,
> +                                                   obj->efile.rodata,
> +                                                   &obj->sections.rodata);
> +       if (!ret && obj->efile.bss_shndx >= 0)
> +               ret = bpf_object__init_internal_map(obj, &obj->maps[map_idx++],
> +                                                   LIBBPF_MAP_BSS,
> +                                                   obj->efile.bss, NULL);
> +       if (!ret)
> +               qsort(obj->maps, obj->nr_maps, sizeof(obj->maps[0]),
> +                     compare_bpf_map);
> +       return ret;
>  }
>
>  static bool section_have_execinstr(struct bpf_object *obj, int idx)
> @@ -865,6 +985,14 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
>                                         pr_warning("failed to alloc program %s (%s): %s",
>                                                    name, obj->path, cp);
>                                 }
> +                       } else if (strcmp(name, ".data") == 0) {
> +                               obj->efile.data = data;
> +                               obj->efile.data_shndx = idx;
> +                       } else if (strcmp(name, ".rodata") == 0) {
> +                               obj->efile.rodata = data;
> +                               obj->efile.rodata_shndx = idx;
> +                       } else {
> +                               pr_debug("skip section(%d) %s\n", idx, name);
>                         }
>                 } else if (sh.sh_type == SHT_REL) {
>                         void *reloc = obj->efile.reloc;
> @@ -892,6 +1020,9 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
>                                 obj->efile.reloc[n].shdr = sh;
>                                 obj->efile.reloc[n].data = data;
>                         }
> +               } else if (sh.sh_type == SHT_NOBITS && strcmp(name, ".bss") == 0) {
> +                       obj->efile.bss = data;
> +                       obj->efile.bss_shndx = idx;
>                 } else {
>                         pr_debug("skip section(%d) %s\n", idx, name);
>                 }
> @@ -918,7 +1049,7 @@ static int bpf_object__elf_collect(struct bpf_object *obj, int flags)
>                         }
>                 }
>         }
> -       if (obj->efile.maps_shndx >= 0) {
> +       if (bpf_object__has_maps(obj)) {
>                 err = bpf_object__init_maps(obj, flags);
>                 if (err)
>                         goto out;
> @@ -954,13 +1085,46 @@ bpf_object__find_program_by_title(struct bpf_object *obj, const char *title)
>         return NULL;
>  }
>
> +static bool bpf_object__shndx_is_data(const struct bpf_object *obj,
> +                                     int shndx)
> +{
> +       return shndx == obj->efile.data_shndx ||
> +              shndx == obj->efile.bss_shndx ||
> +              shndx == obj->efile.rodata_shndx;
> +}
> +
> +static bool bpf_object__shndx_is_maps(const struct bpf_object *obj,
> +                                     int shndx)
> +{
> +       return shndx == obj->efile.maps_shndx;
> +}
> +
> +static bool bpf_object__relo_in_known_section(const struct bpf_object *obj,
> +                                             int shndx)
> +{
> +       return shndx == obj->efile.text_shndx ||
> +              bpf_object__shndx_is_maps(obj, shndx) ||
> +              bpf_object__shndx_is_data(obj, shndx);
> +}
> +
> +static enum libbpf_map_type
> +bpf_object__section_to_libbpf_map_type(const struct bpf_object *obj, int shndx)
> +{
> +       if (shndx == obj->efile.data_shndx)
> +               return LIBBPF_MAP_DATA;
> +       else if (shndx == obj->efile.bss_shndx)
> +               return LIBBPF_MAP_BSS;
> +       else if (shndx == obj->efile.rodata_shndx)
> +               return LIBBPF_MAP_RODATA;
> +       else
> +               return LIBBPF_MAP_UNSPEC;
> +}
> +
>  static int
>  bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>                            Elf_Data *data, struct bpf_object *obj)
>  {
>         Elf_Data *symbols = obj->efile.symbols;
> -       int text_shndx = obj->efile.text_shndx;
> -       int maps_shndx = obj->efile.maps_shndx;
>         struct bpf_map *maps = obj->maps;
>         size_t nr_maps = obj->nr_maps;
>         int i, nrels;
> @@ -980,7 +1144,10 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>                 GElf_Sym sym;
>                 GElf_Rel rel;
>                 unsigned int insn_idx;
> +               unsigned int shdr_idx;
>                 struct bpf_insn *insns = prog->insns;
> +               enum libbpf_map_type type;
> +               const char *name;
>                 size_t map_idx;
>
>                 if (!gelf_getrel(data, i, &rel)) {
> @@ -995,13 +1162,18 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>                                    GELF_R_SYM(rel.r_info));
>                         return -LIBBPF_ERRNO__FORMAT;
>                 }
> -               pr_debug("relo for %lld value %lld name %d\n",
> +
> +               name = elf_strptr(obj->efile.elf, obj->efile.strtabidx,
> +                                 sym.st_name) ? : "<?>";
> +
> +               pr_debug("relo for %lld value %lld name %d (\'%s\')\n",
>                          (long long) (rel.r_info >> 32),
> -                        (long long) sym.st_value, sym.st_name);
> +                        (long long) sym.st_value, sym.st_name, name);
>
> -               if (sym.st_shndx != maps_shndx && sym.st_shndx != text_shndx) {
> -                       pr_warning("Program '%s' contains non-map related relo data pointing to section %u\n",
> -                                  prog->section_name, sym.st_shndx);
> +               shdr_idx = sym.st_shndx;
> +               if (!bpf_object__relo_in_known_section(obj, shdr_idx)) {
> +                       pr_warning("Program '%s' contains unrecognized relo data pointing to section %u\n",
> +                                  prog->section_name, shdr_idx);
>                         return -LIBBPF_ERRNO__RELOC;
>                 }
>
> @@ -1026,10 +1198,22 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>                         return -LIBBPF_ERRNO__RELOC;
>                 }
>
> -               if (sym.st_shndx == maps_shndx) {
> -                       /* TODO: 'maps' is sorted. We can use bsearch to make it faster. */
> +               if (bpf_object__shndx_is_maps(obj, shdr_idx) ||
> +                   bpf_object__shndx_is_data(obj, shdr_idx)) {
> +                       type = bpf_object__section_to_libbpf_map_type(obj, shdr_idx);
> +                       if (type != LIBBPF_MAP_UNSPEC &&
> +                           GELF_ST_BIND(sym.st_info) == STB_GLOBAL) {
> +                               pr_warning("bpf: relocation: not yet supported relo for non-static global \'%s\' variable found in insns[%d].code 0x%x\n",
> +                                          name, insn_idx, insns[insn_idx].code);
> +                               return -LIBBPF_ERRNO__RELOC;
> +                       }
> +
>                         for (map_idx = 0; map_idx < nr_maps; map_idx++) {
> -                               if (maps[map_idx].offset == sym.st_value) {
> +                               if (maps[map_idx].libbpf_type != type)
> +                                       continue;
> +                               if (type != LIBBPF_MAP_UNSPEC ||
> +                                   (type == LIBBPF_MAP_UNSPEC &&
> +                                    maps[map_idx].offset == sym.st_value)) {
>                                         pr_debug("relocation: find map %zd (%s) for insn %u\n",
>                                                  map_idx, maps[map_idx].name, insn_idx);
>                                         break;
> @@ -1042,7 +1226,8 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>                                 return -LIBBPF_ERRNO__RELOC;
>                         }
>
> -                       prog->reloc_desc[i].type = RELO_LD64;
> +                       prog->reloc_desc[i].type = type != LIBBPF_MAP_UNSPEC ?
> +                                                  RELO_DATA : RELO_LD64;
>                         prog->reloc_desc[i].insn_idx = insn_idx;
>                         prog->reloc_desc[i].map_idx = map_idx;
>                 }
> @@ -1050,13 +1235,25 @@ bpf_program__collect_reloc(struct bpf_program *prog, GElf_Shdr *shdr,
>         return 0;
>  }
>
> +static const char *bpf_map___btf_name(struct bpf_map *map)
> +{
> +       if (!bpf_map__is_internal(map))
> +               return map->name;
> +       /*
> +        * LLVM annotates global data differently in BTF, that is,
> +        * only as '.data', '.bss' or '.rodata'.
> +        */
> +       return libbpf_type_to_btf_name[map->libbpf_type];
> +}
> +
>  static int bpf_map_find_btf_info(struct bpf_map *map, const struct btf *btf)
>  {
> +       const char *name = bpf_map___btf_name(map);
>         struct bpf_map_def *def = &map->def;
>         __u32 key_type_id, value_type_id;
>         int ret;
>
> -       ret = btf__get_map_kv_tids(btf, map->name, def->key_size,
> +       ret = btf__get_map_kv_tids(btf, name, def->key_size,
>                                    def->value_size, &key_type_id,
>                                    &value_type_id);
>         if (ret)
> @@ -1175,6 +1372,25 @@ bpf_object__probe_caps(struct bpf_object *obj)
>         return bpf_object__probe_name(obj);
>  }
>
> +static int
> +bpf_object__populate_internal_map(struct bpf_object *obj, struct bpf_map *map)
> +{
> +       int err, zero = 0;
> +       __u8 *data;
> +
> +       /* Nothing to do here since kernel already zero-initializes .bss map. */
> +       if (map->libbpf_type == LIBBPF_MAP_BSS)
> +               return 0;
> +
> +       data = map->libbpf_type == LIBBPF_MAP_DATA ?
> +              obj->sections.data : obj->sections.rodata;
> +       err = bpf_map_update_elem(map->fd, &zero, data, 0);
> +       /* Lock .rodata map as read-only from syscall side. */
> +       if (!err && map->libbpf_type == LIBBPF_MAP_RODATA)
> +               err = bpf_map_lock(map->fd);
> +       return err;
> +}
> +
>  static int
>  bpf_object__create_maps(struct bpf_object *obj)
>  {
> @@ -1232,6 +1448,7 @@ bpf_object__create_maps(struct bpf_object *obj)
>                         size_t j;
>
>                         err = *pfd;
> +err_out:
>                         cp = libbpf_strerror_r(errno, errmsg, sizeof(errmsg));
>                         pr_warning("failed to create map (name: '%s'): %s\n",
>                                    map->name, cp);
> @@ -1239,6 +1456,15 @@ bpf_object__create_maps(struct bpf_object *obj)
>                                 zclose(obj->maps[j].fd);
>                         return err;
>                 }
> +
> +               if (bpf_map__is_internal(map)) {
> +                       err = bpf_object__populate_internal_map(obj, map);
> +                       if (err < 0) {
> +                               zclose(*pfd);
> +                               goto err_out;
> +                       }
> +               }
> +
>                 pr_debug("create map %s: fd=%d\n", map->name, *pfd);
>         }
>
> @@ -1393,19 +1619,27 @@ bpf_program__relocate(struct bpf_program *prog, struct bpf_object *obj)
>                 return 0;
>
>         for (i = 0; i < prog->nr_reloc; i++) {
> -               if (prog->reloc_desc[i].type == RELO_LD64) {
> +               if (prog->reloc_desc[i].type == RELO_LD64 ||
> +                   prog->reloc_desc[i].type == RELO_DATA) {
> +                       bool relo_data = prog->reloc_desc[i].type == RELO_DATA;
>                         struct bpf_insn *insns = prog->insns;
>                         int insn_idx, map_idx;
>
>                         insn_idx = prog->reloc_desc[i].insn_idx;
>                         map_idx = prog->reloc_desc[i].map_idx;
>
> -                       if (insn_idx >= (int)prog->insns_cnt) {
> +                       if (insn_idx + 1 >= (int)prog->insns_cnt) {
>                                 pr_warning("relocation out of range: '%s'\n",
>                                            prog->section_name);
>                                 return -LIBBPF_ERRNO__RELOC;
>                         }
> -                       insns[insn_idx].src_reg = BPF_PSEUDO_MAP_FD;
> +
> +                       if (!relo_data) {
> +                               insns[insn_idx].src_reg = BPF_PSEUDO_MAP_FD;
> +                       } else {
> +                               insns[insn_idx].src_reg = BPF_PSEUDO_MAP_VALUE;
> +                               insns[insn_idx + 1].imm = insns[insn_idx].imm;
> +                       }
>                         insns[insn_idx].imm = obj->maps[map_idx].fd;
>                 } else if (prog->reloc_desc[i].type == RELO_CALL) {
>                         err = bpf_program__reloc_text(prog, obj,
> @@ -2291,6 +2525,9 @@ void bpf_object__close(struct bpf_object *obj)
>                 obj->maps[i].priv = NULL;
>                 obj->maps[i].clear_priv = NULL;
>         }
> +
> +       zfree(&obj->sections.rodata);
> +       zfree(&obj->sections.data);
>         zfree(&obj->maps);
>         obj->nr_maps = 0;
>
> @@ -2768,6 +3005,11 @@ bool bpf_map__is_offload_neutral(struct bpf_map *map)
>         return map->def.type == BPF_MAP_TYPE_PERF_EVENT_ARRAY;
>  }
>
> +bool bpf_map__is_internal(struct bpf_map *map)
> +{
> +       return map->libbpf_type != LIBBPF_MAP_UNSPEC;
> +}
> +
>  void bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex)
>  {
>         map->map_ifindex = ifindex;
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index b4652aa1a58a..6e162da8e9f9 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -300,6 +300,7 @@ LIBBPF_API void *bpf_map__priv(struct bpf_map *map);
>  LIBBPF_API int bpf_map__reuse_fd(struct bpf_map *map, int fd);
>  LIBBPF_API int bpf_map__resize(struct bpf_map *map, __u32 max_entries);
>  LIBBPF_API bool bpf_map__is_offload_neutral(struct bpf_map *map);
> +LIBBPF_API bool bpf_map__is_internal(struct bpf_map *map);
>  LIBBPF_API void bpf_map__set_ifindex(struct bpf_map *map, __u32 ifindex);
>  LIBBPF_API int bpf_map__pin(struct bpf_map *map, const char *path);
>  LIBBPF_API int bpf_map__unpin(struct bpf_map *map, const char *path);
> diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
> index 778a26702a70..3f493f3520c9 100644
> --- a/tools/lib/bpf/libbpf.map
> +++ b/tools/lib/bpf/libbpf.map
> @@ -154,3 +154,9 @@ LIBBPF_0.0.2 {
>                 xsk_umem__fd;
>                 xsk_socket__fd;
>  } LIBBPF_0.0.1;
> +
> +LIBBPF_0.0.3 {
> +       global:
> +               bpf_map__is_internal;
> +               bpf_map_lock;
> +} LIBBPF_0.0.2;
> --
> 2.17.1
>


* Re: [PATCH rfc v3 bpf-next 8/9] bpf, selftest: test {rd,wr}only flags and direct value access
  2019-03-11 21:51 ` [PATCH rfc v3 bpf-next 8/9] bpf, selftest: test {rd,wr}only flags and direct value access Daniel Borkmann
@ 2019-03-19 18:18   ` Andrii Nakryiko
  0 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2019-03-19 18:18 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, bpf, Networking, Joe Stringer,
	john fastabend, Yonghong Song, Jakub Kicinski, tgraf, lmb

On Mon, Mar 11, 2019 at 2:51 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> Extend test_verifier with various test cases around the two kernel
> extensions, that is, {rd,wr}only map support as well as direct map
> value access. All passing, with one test skipped because xskmap is
> not present on the test machine:

It would be good to have a few tests with value size < sizeof(long) to
catch bugs where the value size is not rounded up internally, e.g.
something along the lines of the sketch below.
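
Rough sketch (fixup_map_array_small would be a new fixup creating an
array map with value_size 3, not part of this patch, and the errstr is
only my guess at the verifier's message prefix): an 8 byte store into
a 3 byte value must be rejected, but could slip through if the
verifier compared against an internally rounded-up value size.

	{
		"direct map access, write test, value_size 3",
		.insns = {
		BPF_MOV64_IMM(BPF_REG_0, 1),
		/* ldimm64 pointing at value index 0, offset 0 */
		BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, 0),
		/* 8 byte store into a 3 byte value: must be rejected */
		BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 42),
		BPF_EXIT_INSN(),
		},
		.fixup_map_array_small = { 1 },
		.result = REJECT,
		.errstr = "invalid access to map value",
	},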

>
>   # ./test_verifier
>   [...]
>   #920/p XDP pkt read, pkt_data <= pkt_meta', bad access 1 OK
>   #921/p XDP pkt read, pkt_data <= pkt_meta', bad access 2 OK
>   Summary: 1366 PASSED, 1 SKIPPED, 0 FAILED
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  tools/include/linux/filter.h                  |  14 ++
>  tools/testing/selftests/bpf/test_verifier.c   |  42 +++-
>  .../selftests/bpf/verifier/array_access.c     | 159 ++++++++++++
>  .../bpf/verifier/direct_value_access.c        | 226 ++++++++++++++++++
>  4 files changed, 436 insertions(+), 5 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/verifier/direct_value_access.c
>
> diff --git a/tools/include/linux/filter.h b/tools/include/linux/filter.h
> index cce0b02c0e28..d288576e0bcd 100644
> --- a/tools/include/linux/filter.h
> +++ b/tools/include/linux/filter.h
> @@ -283,6 +283,20 @@
>  #define BPF_LD_MAP_FD(DST, MAP_FD)                             \
>         BPF_LD_IMM64_RAW(DST, BPF_PSEUDO_MAP_FD, MAP_FD)
>
> +#define BPF_LD_MAP_VALUE(DST, MAP_FD, VALUE_IDX, VALUE_OFF)    \
> +       ((struct bpf_insn) {                                    \
> +               .code  = BPF_LD | BPF_DW | BPF_IMM,             \
> +               .dst_reg = DST,                                 \
> +               .src_reg = BPF_PSEUDO_MAP_VALUE,                \
> +               .off   = (__u16)(VALUE_IDX),                    \
> +               .imm   = MAP_FD }),                             \
> +       ((struct bpf_insn) {                                    \
> +               .code  = 0, /* zero is reserved opcode */       \
> +               .dst_reg = 0,                                   \
> +               .src_reg = 0,                                   \
> +               .off   = ((__u32)(VALUE_IDX)) >> 16,            \
> +               .imm   = VALUE_OFF })
> +
>  /* Relative call */
>
>  #define BPF_CALL_REL(TGT)                                      \
> diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
> index 477a9dcf9fff..7ef1991a5295 100644
> --- a/tools/testing/selftests/bpf/test_verifier.c
> +++ b/tools/testing/selftests/bpf/test_verifier.c
> @@ -51,7 +51,7 @@
>
>  #define MAX_INSNS      BPF_MAXINSNS
>  #define MAX_FIXUPS     8
> -#define MAX_NR_MAPS    14
> +#define MAX_NR_MAPS    16
>  #define MAX_TEST_RUNS  8
>  #define POINTER_VALUE  0xcafe4all
>  #define TEST_DATA_LEN  64
> @@ -80,6 +80,8 @@ struct bpf_test {
>         int fixup_cgroup_storage[MAX_FIXUPS];
>         int fixup_percpu_cgroup_storage[MAX_FIXUPS];
>         int fixup_map_spin_lock[MAX_FIXUPS];
> +       int fixup_map_array_ro[MAX_FIXUPS];
> +       int fixup_map_array_wo[MAX_FIXUPS];
>         const char *errstr;
>         const char *errstr_unpriv;
>         uint32_t retval, retval_unpriv, insn_processed;
> @@ -277,13 +279,15 @@ static bool skip_unsupported_map(enum bpf_map_type map_type)
>         return false;
>  }
>
> -static int create_map(uint32_t type, uint32_t size_key,
> -                     uint32_t size_value, uint32_t max_elem)
> +static int __create_map(uint32_t type, uint32_t size_key,
> +                       uint32_t size_value, uint32_t max_elem,
> +                       uint32_t extra_flags)
>  {
>         int fd;
>
>         fd = bpf_create_map(type, size_key, size_value, max_elem,
> -                           type == BPF_MAP_TYPE_HASH ? BPF_F_NO_PREALLOC : 0);
> +                           (type == BPF_MAP_TYPE_HASH ?
> +                            BPF_F_NO_PREALLOC : 0) | extra_flags);
>         if (fd < 0) {
>                 if (skip_unsupported_map(type))
>                         return -1;
> @@ -293,6 +297,12 @@ static int create_map(uint32_t type, uint32_t size_key,
>         return fd;
>  }
>
> +static int create_map(uint32_t type, uint32_t size_key,
> +                     uint32_t size_value, uint32_t max_elem)
> +{
> +       return __create_map(type, size_key, size_value, max_elem, 0);
> +}
> +
>  static void update_map(int fd, int index)
>  {
>         struct test_val value = {
> @@ -519,6 +529,8 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
>         int *fixup_cgroup_storage = test->fixup_cgroup_storage;
>         int *fixup_percpu_cgroup_storage = test->fixup_percpu_cgroup_storage;
>         int *fixup_map_spin_lock = test->fixup_map_spin_lock;
> +       int *fixup_map_array_ro = test->fixup_map_array_ro;
> +       int *fixup_map_array_wo = test->fixup_map_array_wo;
>
>         if (test->fill_helper)
>                 test->fill_helper(test);
> @@ -556,7 +568,7 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
>
>         if (*fixup_map_array_48b) {
>                 map_fds[3] = create_map(BPF_MAP_TYPE_ARRAY, sizeof(int),
> -                                       sizeof(struct test_val), 1);
> +                                       sizeof(struct test_val), 2);
>                 update_map(map_fds[3], 0);
>                 do {
>                         prog[*fixup_map_array_48b].imm = map_fds[3];
> @@ -642,6 +654,26 @@ static void do_test_fixup(struct bpf_test *test, enum bpf_prog_type prog_type,
>                         fixup_map_spin_lock++;
>                 } while (*fixup_map_spin_lock);
>         }
> +       if (*fixup_map_array_ro) {
> +               map_fds[14] = __create_map(BPF_MAP_TYPE_ARRAY, sizeof(int),
> +                                          sizeof(struct test_val), 1,
> +                                          BPF_F_RDONLY_PROG);
> +               update_map(map_fds[14], 0);
> +               do {
> +                       prog[*fixup_map_array_ro].imm = map_fds[14];
> +                       fixup_map_array_ro++;
> +               } while (*fixup_map_array_ro);
> +       }
> +       if (*fixup_map_array_wo) {
> +               map_fds[15] = __create_map(BPF_MAP_TYPE_ARRAY, sizeof(int),
> +                                          sizeof(struct test_val), 1,
> +                                          BPF_F_WRONLY_PROG);
> +               update_map(map_fds[15], 0);
> +               do {
> +                       prog[*fixup_map_array_wo].imm = map_fds[15];
> +                       fixup_map_array_wo++;
> +               } while (*fixup_map_array_wo);
> +       }
>  }
>
>  static int set_admin(bool admin)
> diff --git a/tools/testing/selftests/bpf/verifier/array_access.c b/tools/testing/selftests/bpf/verifier/array_access.c
> index 0dcecaf3ec6f..9a2b6f9b4414 100644
> --- a/tools/testing/selftests/bpf/verifier/array_access.c
> +++ b/tools/testing/selftests/bpf/verifier/array_access.c
> @@ -217,3 +217,162 @@
>         .result = REJECT,
>         .flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
>  },
> +{
> +       "valid read map access into a read-only array 1",
> +       .insns = {
> +       BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> +       BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> +       BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> +       BPF_LD_MAP_FD(BPF_REG_1, 0),
> +       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
> +       BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
> +       BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
> +       BPF_EXIT_INSN(),
> +       },
> +       .fixup_map_array_ro = { 3 },
> +       .result = ACCEPT,
> +       .retval = 28,
> +},
> +{
> +       "valid read map access into a read-only array 2",
> +       .insns = {
> +       BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> +       BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> +       BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> +       BPF_LD_MAP_FD(BPF_REG_1, 0),
> +       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
> +       BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
> +
> +       BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
> +       BPF_MOV64_IMM(BPF_REG_2, 4),
> +       BPF_MOV64_IMM(BPF_REG_3, 0),
> +       BPF_MOV64_IMM(BPF_REG_4, 0),
> +       BPF_MOV64_IMM(BPF_REG_5, 0),
> +       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> +                    BPF_FUNC_csum_diff),
> +       BPF_EXIT_INSN(),
> +       },
> +       .prog_type = BPF_PROG_TYPE_SCHED_CLS,
> +       .fixup_map_array_ro = { 3 },
> +       .result = ACCEPT,
> +       .retval = -29,
> +},
> +{
> +       "invalid write map access into a read-only array 1",
> +       .insns = {
> +       BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> +       BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> +       BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> +       BPF_LD_MAP_FD(BPF_REG_1, 0),
> +       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
> +       BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
> +       BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 42),
> +       BPF_EXIT_INSN(),
> +       },
> +       .fixup_map_array_ro = { 3 },
> +       .result = REJECT,
> +       .errstr = "write into map forbidden",
> +},
> +{
> +       "invalid write map access into a read-only array 2",
> +       .insns = {
> +       BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
> +       BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> +       BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> +       BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> +       BPF_LD_MAP_FD(BPF_REG_1, 0),
> +       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
> +       BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5),
> +       BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
> +       BPF_MOV64_IMM(BPF_REG_2, 0),
> +       BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
> +       BPF_MOV64_IMM(BPF_REG_4, 8),
> +       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> +                    BPF_FUNC_skb_load_bytes),
> +       BPF_EXIT_INSN(),
> +       },
> +       .prog_type = BPF_PROG_TYPE_SCHED_CLS,
> +       .fixup_map_array_ro = { 4 },
> +       .result = REJECT,
> +       .errstr = "write into map forbidden",
> +},
> +{
> +       "valid write map access into a write-only array 1",
> +       .insns = {
> +       BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> +       BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> +       BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> +       BPF_LD_MAP_FD(BPF_REG_1, 0),
> +       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
> +       BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
> +       BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 42),
> +       BPF_MOV64_IMM(BPF_REG_0, 1),
> +       BPF_EXIT_INSN(),
> +       },
> +       .fixup_map_array_wo = { 3 },
> +       .result = ACCEPT,
> +       .retval = 1,
> +},
> +{
> +       "valid write map access into a write-only array 2",
> +       .insns = {
> +       BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
> +       BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> +       BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> +       BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> +       BPF_LD_MAP_FD(BPF_REG_1, 0),
> +       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
> +       BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5),
> +       BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
> +       BPF_MOV64_IMM(BPF_REG_2, 0),
> +       BPF_MOV64_REG(BPF_REG_3, BPF_REG_0),
> +       BPF_MOV64_IMM(BPF_REG_4, 8),
> +       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> +                    BPF_FUNC_skb_load_bytes),
> +       BPF_EXIT_INSN(),
> +       },
> +       .prog_type = BPF_PROG_TYPE_SCHED_CLS,
> +       .fixup_map_array_wo = { 4 },
> +       .result = ACCEPT,
> +       .retval = 0,
> +},
> +{
> +       "invalid read map access into a write-only array 1",
> +       .insns = {
> +       BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> +       BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> +       BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> +       BPF_LD_MAP_FD(BPF_REG_1, 0),
> +       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
> +       BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
> +       BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_0, 0),
> +       BPF_EXIT_INSN(),
> +       },
> +       .fixup_map_array_wo = { 3 },
> +       .result = REJECT,
> +       .errstr = "read into map forbidden",
> +},
> +{
> +       "invalid read map access into a write-only array 2",
> +       .insns = {
> +       BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> +       BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
> +       BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
> +       BPF_LD_MAP_FD(BPF_REG_1, 0),
> +       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
> +       BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 6),
> +
> +       BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
> +       BPF_MOV64_IMM(BPF_REG_2, 4),
> +       BPF_MOV64_IMM(BPF_REG_3, 0),
> +       BPF_MOV64_IMM(BPF_REG_4, 0),
> +       BPF_MOV64_IMM(BPF_REG_5, 0),
> +       BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
> +                    BPF_FUNC_csum_diff),
> +       BPF_EXIT_INSN(),
> +       },
> +       .prog_type = BPF_PROG_TYPE_SCHED_CLS,
> +       .fixup_map_array_wo = { 3 },
> +       .result = REJECT,
> +       .errstr = "read into map forbidden",
> +},
> diff --git a/tools/testing/selftests/bpf/verifier/direct_value_access.c b/tools/testing/selftests/bpf/verifier/direct_value_access.c
> new file mode 100644
> index 000000000000..656c3675b735
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/verifier/direct_value_access.c
> @@ -0,0 +1,226 @@
> +{
> +       "direct map access, write test 1",
> +       .insns = {
> +       BPF_MOV64_IMM(BPF_REG_0, 1),
> +       BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, 0),
> +       BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 4242),
> +       BPF_EXIT_INSN(),
> +       },
> +       .fixup_map_array_48b = { 1 },
> +       .result = ACCEPT,
> +       .retval = 1,
> +},
> +{
> +       "direct map access, write test 2",
> +       .insns = {
> +       BPF_MOV64_IMM(BPF_REG_0, 1),
> +       BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, 8),
> +       BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 4242),
> +       BPF_EXIT_INSN(),
> +       },
> +       .fixup_map_array_48b = { 1 },
> +       .result = ACCEPT,
> +       .retval = 1,
> +},
> +{
> +       "direct map access, write test 3",
> +       .insns = {
> +       BPF_MOV64_IMM(BPF_REG_0, 1),
> +       BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, 8),
> +       BPF_ST_MEM(BPF_DW, BPF_REG_1, 8, 4242),
> +       BPF_EXIT_INSN(),
> +       },
> +       .fixup_map_array_48b = { 1 },
> +       .result = ACCEPT,
> +       .retval = 1,
> +},
> +{
> +       "direct map access, write test 4",
> +       .insns = {
> +       BPF_MOV64_IMM(BPF_REG_0, 1),
> +       BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, 40),
> +       BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 4242),
> +       BPF_EXIT_INSN(),
> +       },
> +       .fixup_map_array_48b = { 1 },
> +       .result = ACCEPT,
> +       .retval = 1,
> +},
> +{
> +       "direct map access, write test 5",
> +       .insns = {
> +       BPF_MOV64_IMM(BPF_REG_0, 1),
> +       BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, 32),
> +       BPF_ST_MEM(BPF_DW, BPF_REG_1, 8, 4242),
> +       BPF_EXIT_INSN(),
> +       },
> +       .fixup_map_array_48b = { 1 },
> +       .result = ACCEPT,
> +       .retval = 1,
> +},
> +{
> +       "direct map access, write test 6",
> +       .insns = {
> +       BPF_MOV64_IMM(BPF_REG_0, 1),
> +       BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, 40),
> +       BPF_ST_MEM(BPF_DW, BPF_REG_1, 4, 4242),
> +       BPF_EXIT_INSN(),
> +       },
> +       .fixup_map_array_48b = { 1 },
> +       .result = REJECT,
> +       .errstr = "R1 min value is outside of the array range",
> +},
> +{
> +       "direct map access, write test 7",
> +       .insns = {
> +       BPF_MOV64_IMM(BPF_REG_0, 1),
> +       BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, -1),
> +       BPF_ST_MEM(BPF_DW, BPF_REG_1, 4, 4242),
> +       BPF_EXIT_INSN(),
> +       },
> +       .fixup_map_array_48b = { 1 },
> +       .result = REJECT,
> +       .errstr = "direct value offset of 4294967295 is not allowed",
> +},
> +{
> +       "direct map access, write test 8",
> +       .insns = {
> +       BPF_MOV64_IMM(BPF_REG_0, 1),
> +       BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, 1),
> +       BPF_ST_MEM(BPF_DW, BPF_REG_1, -1, 4242),
> +       BPF_EXIT_INSN(),
> +       },
> +       .fixup_map_array_48b = { 1 },
> +       .result = ACCEPT,
> +       .retval = 1,
> +},
> +{
> +       "direct map access, write test 9",
> +       .insns = {
> +       BPF_MOV64_IMM(BPF_REG_0, 1),
> +       BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, 48),
> +       BPF_ST_MEM(BPF_DW, BPF_REG_1, 0, 4242),
> +       BPF_EXIT_INSN(),
> +       },
> +       .fixup_map_array_48b = { 1 },
> +       .result = REJECT,
> +       .errstr = "invalid access to map value pointer",
> +},
> +{
> +       "direct map access, write test 10",
> +       .insns = {
> +       BPF_MOV64_IMM(BPF_REG_0, 1),
> +       BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, 47),
> +       BPF_ST_MEM(BPF_B, BPF_REG_1, 0, 4),
> +       BPF_EXIT_INSN(),
> +       },
> +       .fixup_map_array_48b = { 1 },
> +       .result = ACCEPT,
> +       .retval = 1,
> +},
> +{
> +       "direct map access, write test 11",
> +       .insns = {
> +       BPF_MOV64_IMM(BPF_REG_0, 1),
> +       BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, 48),
> +       BPF_ST_MEM(BPF_B, BPF_REG_1, 0, 4),
> +       BPF_EXIT_INSN(),
> +       },
> +       .fixup_map_array_48b = { 1 },
> +       .result = REJECT,
> +       .errstr = "invalid access to map value pointer",
> +},
> +{
> +       "direct map access, write test 12",
> +       .insns = {
> +       BPF_MOV64_IMM(BPF_REG_0, 1),
> +       BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, (1<<29)),
> +       BPF_ST_MEM(BPF_B, BPF_REG_1, 0, 4),
> +       BPF_EXIT_INSN(),
> +       },
> +       .fixup_map_array_48b = { 1 },
> +       .result = REJECT,
> +       .errstr = "direct value offset of 536870912 is not allowed",
> +},
> +{
> +       "direct map access, write test 13",
> +       .insns = {
> +       BPF_MOV64_IMM(BPF_REG_0, 1),
> +       BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, (1<<29)-1),
> +       BPF_ST_MEM(BPF_B, BPF_REG_1, 0, 4),
> +       BPF_EXIT_INSN(),
> +       },
> +       .fixup_map_array_48b = { 1 },
> +       .result = REJECT,
> +       .errstr = "invalid access to map value pointer, value_size=48 index=0 off=536870911",
> +},
> +{
> +       "direct map access, write test 14",
> +       .insns = {
> +       BPF_MOV64_IMM(BPF_REG_0, 1),
> +       BPF_LD_MAP_VALUE(BPF_REG_1, 0, 0, 47),
> +       BPF_LD_MAP_VALUE(BPF_REG_2, 0, 0, 46),
> +       BPF_ST_MEM(BPF_H, BPF_REG_2, 0, 0xffff),
> +       BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1, 0),
> +       BPF_EXIT_INSN(),
> +       },
> +       .fixup_map_array_48b = { 1, 3 },
> +       .result = ACCEPT,
> +       .retval = 0xff,
> +},
> +{
> +       "direct map access, write test 15",
> +       .insns = {
> +       BPF_MOV64_IMM(BPF_REG_0, 1),
> +       BPF_LD_MAP_VALUE(BPF_REG_1, 0, 1, 47),
> +       BPF_LD_MAP_VALUE(BPF_REG_2, 0, 1, 46),
> +       BPF_ST_MEM(BPF_H, BPF_REG_2, 0, 0xffff),
> +       BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1, 0),
> +       BPF_EXIT_INSN(),
> +       },
> +       .fixup_map_array_48b = { 1, 3 },
> +       .result = ACCEPT,
> +       .retval = 0xff,
> +},
> +{
> +       "direct map access, write test 16",
> +       .insns = {
> +       BPF_MOV64_IMM(BPF_REG_0, 1),
> +       BPF_LD_MAP_VALUE(BPF_REG_1, 0, 1, 46),
> +       BPF_LD_MAP_VALUE(BPF_REG_2, 0, 0, 46),
> +       BPF_ST_MEM(BPF_H, BPF_REG_2, 0, 0xffff),
> +       BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1, 0),
> +       BPF_EXIT_INSN(),
> +       },
> +       .fixup_map_array_48b = { 1, 3 },
> +       .result = ACCEPT,
> +       .retval = 0,
> +},
> +{
> +       "direct map access, write test 17",
> +       .insns = {
> +       BPF_MOV64_IMM(BPF_REG_0, 1),
> +       BPF_LD_MAP_VALUE(BPF_REG_1, 0, 1, 46),
> +       BPF_LD_MAP_VALUE(BPF_REG_2, 0, 2, 46),
> +       BPF_ST_MEM(BPF_H, BPF_REG_2, 0, 0xffff),
> +       BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1, 0),
> +       BPF_EXIT_INSN(),
> +       },
> +       .fixup_map_array_48b = { 1, 3 },
> +       .result = REJECT,
> +       .errstr = "invalid access to map value pointer, value_size=48 index=2 off=46",
> +},
> +{
> +       "direct map access, write test 18",
> +       .insns = {
> +       BPF_MOV64_IMM(BPF_REG_0, 1),
> +       BPF_LD_MAP_VALUE(BPF_REG_1, 0, ~0, 46),
> +       BPF_LD_MAP_VALUE(BPF_REG_2, 0, ~0, 46),
> +       BPF_ST_MEM(BPF_H, BPF_REG_2, 0, 0xffff),
> +       BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1, 0),
> +       BPF_EXIT_INSN(),
> +       },
> +       .fixup_map_array_48b = { 1, 3 },
> +       .result = REJECT,
> +       .errstr = "invalid access to map value pointer, value_size=48 index=4294967295 off=46",
> +},
> --
> 2.17.1
>


* Re: [PATCH rfc v3 bpf-next 9/9] bpf, selftest: test global data/bss/rodata sections
  2019-03-11 21:51 ` [PATCH rfc v3 bpf-next 9/9] bpf, selftest: test global data/bss/rodata sections Daniel Borkmann
@ 2019-03-19 18:28   ` Andrii Nakryiko
  0 siblings, 0 replies; 20+ messages in thread
From: Andrii Nakryiko @ 2019-03-19 18:28 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, bpf, Networking, Joe Stringer,
	john fastabend, Yonghong Song, Jakub Kicinski, tgraf, lmb

On Mon, Mar 11, 2019 at 2:51 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> From: Joe Stringer <joe@wand.net.nz>
>
> Add tests for libbpf relocation of static variable references
> into the .data, .rodata and .bss sections of the ELF, and also
> add a read-only test for .rodata. All passing:
>
>   # ./test_progs
>   [...]
>   test_global_data:PASS:load program 0 nsec
>   test_global_data:PASS:pass global data run 925 nsec
>   test_global_data_number:PASS:relocate .bss reference 925 nsec
>   test_global_data_number:PASS:relocate .data reference 925 nsec
>   test_global_data_number:PASS:relocate .rodata reference 925 nsec
>   test_global_data_number:PASS:relocate .bss reference 925 nsec
>   test_global_data_number:PASS:relocate .data reference 925 nsec
>   test_global_data_number:PASS:relocate .rodata reference 925 nsec
>   test_global_data_number:PASS:relocate .bss reference 925 nsec
>   test_global_data_number:PASS:relocate .bss reference 925 nsec
>   test_global_data_number:PASS:relocate .rodata reference 925 nsec
>   test_global_data_number:PASS:relocate .rodata reference 925 nsec
>   test_global_data_number:PASS:relocate .rodata reference 925 nsec
>   test_global_data_string:PASS:relocate .rodata reference 925 nsec
>   test_global_data_string:PASS:relocate .data reference 925 nsec
>   test_global_data_string:PASS:relocate .bss reference 925 nsec
>   test_global_data_string:PASS:relocate .data reference 925 nsec
>   test_global_data_string:PASS:relocate .bss reference 925 nsec
>   test_global_data_struct:PASS:relocate .rodata reference 925 nsec
>   test_global_data_struct:PASS:relocate .bss reference 925 nsec
>   test_global_data_struct:PASS:relocate .rodata reference 925 nsec
>   test_global_data_struct:PASS:relocate .data reference 925 nsec
>   test_global_data_rdonly:PASS:test .rodata read-only map 925 nsec
>   [...]
>   Summary: 229 PASSED, 0 FAILED
>
> Note map helper signatures have been changed to avoid warnings
> when passing in const data.
>
> Joint work with Daniel Borkmann.
>
> Signed-off-by: Joe Stringer <joe@wand.net.nz>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---


Looks good!

Just wanted to mention that there is one interesting and useful case
which isn't yet supported:

static const char *varstr =
	"This string will go into one of the .rodata.str1.* sections";

It could be supported today with libbpf doing the corresponding
relocations, but beyond that there is no good support for
zero-terminated strings yet, so I think it's ok to postpone this. And
from the program's perspective, .data variables will hold pointers
into the .rodata map, which will need to be handled explicitly today,
which is also not very nice. See the sketch below.
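
To illustrate (hypothetical program, nothing this series needs to
handle yet; assumes the usual SEC() macro from bpf_helpers.h):

	/* The pointer itself lands in .data while the bytes of the
	 * literal land in .rodata.str1.1, so libbpf would have to
	 * rewrite the pointer value stored in the .data map to point
	 * into the .rodata map.
	 */
	static const char *varstr = "in .rodata.str1.1";

	SEC("classifier")
	int use_varstr(struct __sk_buff *skb)
	{
		/* This dereference only works once that fixup exists. */
		return varstr[0];
	}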

Acked-by: Andrii Nakryiko <andriin@fb.com>


>  tools/testing/selftests/bpf/bpf_helpers.h     |   8 +-
>  .../selftests/bpf/prog_tests/global_data.c    | 157 ++++++++++++++++++
>  .../selftests/bpf/progs/test_global_data.c    | 106 ++++++++++++
>  3 files changed, 267 insertions(+), 4 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/global_data.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_global_data.c
>
> diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
> index c9433a496d54..91c53dac95c8 100644
> --- a/tools/testing/selftests/bpf/bpf_helpers.h
> +++ b/tools/testing/selftests/bpf/bpf_helpers.h
> @@ -9,14 +9,14 @@
>  #define SEC(NAME) __attribute__((section(NAME), used))
>
>  /* helper functions called from eBPF programs written in C */
> -static void *(*bpf_map_lookup_elem)(void *map, void *key) =
> +static void *(*bpf_map_lookup_elem)(void *map, const void *key) =
>         (void *) BPF_FUNC_map_lookup_elem;
> -static int (*bpf_map_update_elem)(void *map, void *key, void *value,
> +static int (*bpf_map_update_elem)(void *map, const void *key, const void *value,
>                                   unsigned long long flags) =
>         (void *) BPF_FUNC_map_update_elem;
> -static int (*bpf_map_delete_elem)(void *map, void *key) =
> +static int (*bpf_map_delete_elem)(void *map, const void *key) =
>         (void *) BPF_FUNC_map_delete_elem;
> -static int (*bpf_map_push_elem)(void *map, void *value,
> +static int (*bpf_map_push_elem)(void *map, const void *value,
>                                 unsigned long long flags) =
>         (void *) BPF_FUNC_map_push_elem;
>  static int (*bpf_map_pop_elem)(void *map, void *value) =
> diff --git a/tools/testing/selftests/bpf/prog_tests/global_data.c b/tools/testing/selftests/bpf/prog_tests/global_data.c
> new file mode 100644
> index 000000000000..d011079fb0bf
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/global_data.c
> @@ -0,0 +1,157 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <test_progs.h>
> +
> +static void test_global_data_number(struct bpf_object *obj, __u32 duration)
> +{
> +       int i, err, map_fd;
> +       uint64_t num;
> +
> +       map_fd = bpf_find_map(__func__, obj, "result_number");
> +       if (map_fd < 0) {
> +               error_cnt++;
> +               return;
> +       }
> +
> +       struct {
> +               char *name;
> +               uint32_t key;
> +               uint64_t num;
> +       } tests[] = {
> +               { "relocate .bss reference",     0, 0 },
> +               { "relocate .data reference",    1, 42 },
> +               { "relocate .rodata reference",  2, 24 },
> +               { "relocate .bss reference",     3, 0 },
> +               { "relocate .data reference",    4, 0xffeeff },
> +               { "relocate .rodata reference",  5, 0xabab },
> +               { "relocate .bss reference",     6, 1234 },
> +               { "relocate .bss reference",     7, 0 },
> +               { "relocate .rodata reference",  8, 0xab },
> +               { "relocate .rodata reference",  9, 0x1111111111111111 },
> +               { "relocate .rodata reference", 10, ~0 },
> +       };
> +
> +       for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
> +               err = bpf_map_lookup_elem(map_fd, &tests[i].key, &num);
> +               CHECK(err || num != tests[i].num, tests[i].name,
> +                     "err %d result %lx expected %lx\n",
> +                     err, num, tests[i].num);
> +       }
> +}
> +
> +static void test_global_data_string(struct bpf_object *obj, __u32 duration)
> +{
> +       int i, err, map_fd;
> +       char str[32];
> +
> +       map_fd = bpf_find_map(__func__, obj, "result_string");
> +       if (map_fd < 0) {
> +               error_cnt++;
> +               return;
> +       }
> +
> +       struct {
> +               char *name;
> +               uint32_t key;
> +               char str[32];
> +       } tests[] = {
> +               { "relocate .rodata reference", 0, "abcdefghijklmnopqrstuvwxyz" },
> +               { "relocate .data reference",   1, "abcdefghijklmnopqrstuvwxyz" },
> +               { "relocate .bss reference",    2, "" },
> +               { "relocate .data reference",   3, "abcdexghijklmnopqrstuvwxyz" },
> +               { "relocate .bss reference",    4, "\0\0hello" },
> +       };
> +
> +       for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
> +               err = bpf_map_lookup_elem(map_fd, &tests[i].key, str);
> +               CHECK(err || memcmp(str, tests[i].str, sizeof(str)),
> +                     tests[i].name, "err %d result \'%s\' expected \'%s\'\n",
> +                     err, str, tests[i].str);
> +       }
> +}
> +
> +struct foo {
> +       __u8  a;
> +       __u32 b;
> +       __u64 c;
> +};
> +
> +static void test_global_data_struct(struct bpf_object *obj, __u32 duration)
> +{
> +       int i, err, map_fd;
> +       struct foo val;
> +
> +       map_fd = bpf_find_map(__func__, obj, "result_struct");
> +       if (map_fd < 0) {
> +               error_cnt++;
> +               return;
> +       }
> +
> +       struct {
> +               char *name;
> +               uint32_t key;
> +               struct foo val;
> +       } tests[] = {
> +               { "relocate .rodata reference", 0, { 42, 0xfefeefef, 0x1111111111111111ULL, } },
> +               { "relocate .bss reference",    1, { } },
> +               { "relocate .rodata reference", 2, { } },
> +               { "relocate .data reference",   3, { 41, 0xeeeeefef, 0x2111111111111111ULL, } },
> +       };
> +
> +       for (i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
> +               err = bpf_map_lookup_elem(map_fd, &tests[i].key, &val);
> +               CHECK(err || memcmp(&val, &tests[i].val, sizeof(val)),
> +                     tests[i].name, "err %d result { %u, %u, %llu } expected { %u, %u, %llu }\n",
> +                     err, val.a, val.b, val.c, tests[i].val.a, tests[i].val.b, tests[i].val.c);
> +       }
> +}
> +
> +static void test_global_data_rdonly(struct bpf_object *obj, __u32 duration)
> +{
> +       int err = -ENOMEM, map_fd, zero = 0;
> +       struct bpf_map *map;
> +       __u8 *buff;
> +
> +       map = bpf_object__find_map_by_name(obj, "test_glo.rodata");
> +       if (!map || !bpf_map__is_internal(map)) {
> +               error_cnt++;
> +               return;
> +       }
> +
> +       map_fd = bpf_map__fd(map);
> +       if (map_fd < 0) {
> +               error_cnt++;
> +               return;
> +       }
> +
> +       buff = malloc(bpf_map__def(map)->value_size);
> +       if (buff)
> +               err = bpf_map_update_elem(map_fd, &zero, buff, 0);
> +       free(buff);
> +       CHECK(!err || errno != EPERM, "test .rodata read-only map",
> +             "err %d errno %d\n", err, errno);
> +}
> +
> +void test_global_data(void)
> +{
> +       const char *file = "./test_global_data.o";
> +       __u32 duration = 0, retval;
> +       struct bpf_object *obj;
> +       int err, prog_fd;
> +
> +       err = bpf_prog_load(file, BPF_PROG_TYPE_SCHED_CLS, &obj, &prog_fd);
> +       if (CHECK(err, "load program", "error %d loading %s\n", err, file))
> +               return;
> +
> +       err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
> +                               NULL, NULL, &retval, &duration);
> +       CHECK(err || retval, "pass global data run",
> +             "err %d errno %d retval %d duration %d\n",
> +             err, errno, retval, duration);
> +
> +       test_global_data_number(obj, duration);
> +       test_global_data_string(obj, duration);
> +       test_global_data_struct(obj, duration);
> +       test_global_data_rdonly(obj, duration);
> +
> +       bpf_object__close(obj);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/test_global_data.c b/tools/testing/selftests/bpf/progs/test_global_data.c
> new file mode 100644
> index 000000000000..5ab14e941980
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_global_data.c
> @@ -0,0 +1,106 @@
> +// SPDX-License-Identifier: GPL-2.0
> +// Copyright (c) 2019 Isovalent, Inc.
> +
> +#include <linux/bpf.h>
> +#include <linux/pkt_cls.h>
> +#include <string.h>
> +
> +#include "bpf_helpers.h"
> +
> +struct bpf_map_def SEC("maps") result_number = {
> +       .type           = BPF_MAP_TYPE_ARRAY,
> +       .key_size       = sizeof(__u32),
> +       .value_size     = sizeof(__u64),
> +       .max_entries    = 11,
> +};
> +
> +struct bpf_map_def SEC("maps") result_string = {
> +       .type           = BPF_MAP_TYPE_ARRAY,
> +       .key_size       = sizeof(__u32),
> +       .value_size     = 32,
> +       .max_entries    = 5,
> +};
> +
> +struct foo {
> +       __u8  a;
> +       __u32 b;
> +       __u64 c;
> +};
> +
> +struct bpf_map_def SEC("maps") result_struct = {
> +       .type           = BPF_MAP_TYPE_ARRAY,
> +       .key_size       = sizeof(__u32),
> +       .value_size     = sizeof(struct foo),
> +       .max_entries    = 5,
> +};
> +
> +/* Relocation tests for __u64s. */
> +static       __u64 num0;
> +static       __u64 num1 = 42;
> +static const __u64 num2 = 24;
> +static       __u64 num3 = 0;
> +static       __u64 num4 = 0xffeeff;
> +static const __u64 num5 = 0xabab;
> +static const __u64 num6 = 0xab;
> +
> +/* Relocation tests for strings. */
> +static const char str0[32] = "abcdefghijklmnopqrstuvwxyz";
> +static       char str1[32] = "abcdefghijklmnopqrstuvwxyz";
> +static       char str2[32];
> +
> +/* Relocation tests for structs. */
> +static const struct foo struct0 = {
> +       .a = 42,
> +       .b = 0xfefeefef,
> +       .c = 0x1111111111111111ULL,
> +};
> +static struct foo struct1;
> +static const struct foo struct2;
> +static struct foo struct3 = {
> +       .a = 41,
> +       .b = 0xeeeeefef,
> +       .c = 0x2111111111111111ULL,
> +};
> +
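> +/* Store the current value of @var into result_<map> at key @num; taking
> + * the variable's address forces a relocation against its section.
> + */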
> +#define test_reloc(map, num, var)                                      \
> +       do {                                                            \
> +               __u32 key = num;                                        \
> +               bpf_map_update_elem(&result_##map, &key, var, 0);       \
> +       } while (0)
> +
> +SEC("static_data_load")
> +int load_static_data(struct __sk_buff *skb)
> +{
> +       static const __u64 bar = ~0;
> +
> +       test_reloc(number, 0, &num0);
> +       test_reloc(number, 1, &num1);
> +       test_reloc(number, 2, &num2);
> +       test_reloc(number, 3, &num3);
> +       test_reloc(number, 4, &num4);
> +       test_reloc(number, 5, &num5);
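> +       /* num4 lives in .data; rewrite it at runtime to check that the
> +        * relocated address refers to the live, writable storage.
> +        */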
> +       num4 = 1234;
> +       test_reloc(number, 6, &num4);
> +       test_reloc(number, 7, &num0);
> +       test_reloc(number, 8, &num6);
> +
> +       test_reloc(string, 0, str0);
> +       test_reloc(string, 1, str1);
> +       test_reloc(string, 2, str2);
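> +       /* Mutate the .data and .bss backed strings and record them again;
> +        * userspace verifies the modified contents at keys 3 and 4.
> +        */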
> +       str1[5] = 'x';
> +       test_reloc(string, 3, str1);
> +       __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
> +       test_reloc(string, 4, str2);
> +
> +       test_reloc(struct, 0, &struct0);
> +       test_reloc(struct, 1, &struct1);
> +       test_reloc(struct, 2, &struct2);
> +       test_reloc(struct, 3, &struct3);
> +
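> +       /* Also cover a relocation into the middle of an object (struct0.c)
> +        * and one against a function-local static.
> +        */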
> +       test_reloc(number,  9, &struct0.c);
> +       test_reloc(number, 10, &bar);
> +
> +       return TC_ACT_OK;
> +}
> +
> +char _license[] SEC("license") = "GPL";
> --
> 2.17.1
>
