* [RFC][PATCH bpf v2 1/2] bpf: allow 64-bit offsets for bpf function calls
@ 2018-02-13  4:05 Sandipan Das
  2018-02-13  4:06 ` [RFC][PATCH bpf v2 2/2] bpf: powerpc64: add JIT support for multi-function programs Sandipan Das
  2018-02-15 16:25 ` [RFC][PATCH bpf v2 1/2] bpf: allow 64-bit offsets for bpf function calls Daniel Borkmann
  0 siblings, 2 replies; 24+ messages in thread
From: Sandipan Das @ 2018-02-13  4:05 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev, linuxppc-dev, mpe, naveen.n.rao

The imm field of a bpf_insn is a signed 32-bit integer. For
JIT-ed bpf-to-bpf function calls, it stores the offset from
__bpf_call_base to the start of the callee function.

For some architectures, such as powerpc64, this offset may
be as large as 64 bits and hence cannot be accommodated in
the imm field without truncation.

To resolve this, we additionally make aux->func within each
bpf_prog associated with these functions point to the list
of all function addresses determined by the verifier.

We keep the value assigned to the off field of the bpf_insn
as a way to index into aux->func and also set aux->func_cnt
so that this can be used for performing basic upper bound
checks for the off field.
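
As an illustration, a JIT back end could then resolve the
callee address along these lines (a minimal sketch;
resolve_callee() is a hypothetical helper, not code from this
patch):

    /* Use the off field as an index into aux->func, with
     * func_cnt as the upper bound, to find the callee address.
     */
    static u64 resolve_callee(const struct bpf_prog *fp,
                              const struct bpf_insn *insn)
    {
            if (!fp->aux->func || insn->off >= fp->aux->func_cnt)
                    return 0;       /* reject out-of-bounds index */
            return (u64)fp->aux->func[insn->off]->bpf_func;
    }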

Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
---
v2: Make aux->func point to the list of functions determined
    by the verifier rather than allocating a separate callee
    list for each function.
---
 kernel/bpf/verifier.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5fb69a85d967..1c4d9cd485ed 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5288,11 +5288,25 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 			    insn->src_reg != BPF_PSEUDO_CALL)
 				continue;
 			subprog = insn->off;
-			insn->off = 0;
 			insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
 				func[subprog]->bpf_func -
 				__bpf_call_base;
 		}
+
+		/* the offset to a callee function from __bpf_call_base
+		 * may be larger than what the 32 bit integer imm can
+		 * accommodate which will truncate the higher order bits
+		 *
+		 * to avoid this, we additionally utilize the aux data
+		 * of each function to point to a list of all function
+		 * addresses determined by the verifier
+		 *
+		 * the off field of the instruction provides the index
+		 * in this list where the start address of a function
+		 * is available
+		 */
+		func[i]->aux->func = func;
+		func[i]->aux->func_cnt = env->subprog_cnt + 1;
 	}
 	for (i = 0; i <= env->subprog_cnt; i++) {
 		old_bpf_func = func[i]->bpf_func;
-- 
2.14.3

* [RFC][PATCH bpf v2 2/2] bpf: powerpc64: add JIT support for multi-function programs
  2018-02-13  4:05 [RFC][PATCH bpf v2 1/2] bpf: allow 64-bit offsets for bpf function calls Sandipan Das
@ 2018-02-13  4:06 ` Sandipan Das
  2018-02-15 16:25 ` [RFC][PATCH bpf v2 1/2] bpf: allow 64-bit offsets for bpf function calls Daniel Borkmann
  1 sibling, 0 replies; 24+ messages in thread
From: Sandipan Das @ 2018-02-13  4:06 UTC (permalink / raw)
  To: ast, daniel; +Cc: netdev, linuxppc-dev, mpe, naveen.n.rao

This adds support for bpf-to-bpf function calls to the powerpc64
JIT compiler. After a round of the usual JIT passes, the offsets
to callee functions from __bpf_call_base are known. To update the
target addresses for the branch instructions associated with each
BPF_CALL, an extra pass is performed.

Since these offsets may be as large as 64 bits on powerpc64, we
use the aux data associated with each caller to get the correct
branch target address rather than using the imm field of the
BPF_CALL instruction.
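
For context, the extra pass is driven from the core's
jit_subprogs(), which calls bpf_int_jit_compile() a second time
once every function's image address is known. A simplified
sketch of that flow (not the literal verifier code):

    /* Pass 1: compile each function so its image address, and
     * hence its offset from __bpf_call_base, becomes known.
     */
    for (i = 0; i <= env->subprog_cnt; i++)
            func[i] = bpf_int_jit_compile(func[i]);

    /* Extra pass: recompile so the JIT can patch the final
     * callee addresses into the emitted call sequences.
     */
    for (i = 0; i <= env->subprog_cnt; i++)
            func[i] = bpf_int_jit_compile(func[i]);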

Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
---
v2: Use the off field of the instruction as an index for
    aux->func to determine the start address of a callee
    function.
---
 arch/powerpc/net/bpf_jit_comp64.c | 73 +++++++++++++++++++++++++++++++++------
 1 file changed, 63 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 0a34b0cec7b7..cf0d4e32aa52 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -290,7 +290,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32
 /* Assemble the body code between the prologue & epilogue */
 static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
 			      struct codegen_context *ctx,
-			      u32 *addrs)
+			      u32 *addrs, bool extra_pass)
 {
 	const struct bpf_insn *insn = fp->insnsi;
 	int flen = fp->len;
@@ -746,11 +746,17 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
 			break;
 
 		/*
-		 * Call kernel helper
+		 * Call kernel helper or bpf function
 		 */
 		case BPF_JMP | BPF_CALL:
 			ctx->seen |= SEEN_FUNC;
-			func = (u8 *) __bpf_call_base + imm;
+			if (insn[i].src_reg == BPF_PSEUDO_CALL && extra_pass)
+				if (fp->aux->func && off < fp->aux->func_cnt)
+					func = (u8 *) fp->aux->func[off]->bpf_func;
+				else
+					return -EINVAL;
+			else
+				func = (u8 *) __bpf_call_base + imm;
 
 			/* Save skb pointer if we need to re-cache skb data */
 			if ((ctx->seen & SEEN_SKB) &&
@@ -970,6 +976,14 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
 	return 0;
 }
 
+struct powerpc64_jit_data {
+	struct bpf_binary_header *header;
+	u32 *addrs;
+	u8 *image;
+	u32 proglen;
+	struct codegen_context ctx;
+};
+
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 {
 	u32 proglen;
@@ -977,6 +991,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	u8 *image = NULL;
 	u32 *code_base;
 	u32 *addrs;
+	struct powerpc64_jit_data *jit_data;
 	struct codegen_context cgctx;
 	int pass;
 	int flen;
@@ -984,6 +999,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	struct bpf_prog *org_fp = fp;
 	struct bpf_prog *tmp_fp;
 	bool bpf_blinded = false;
+	bool extra_pass = false;
 
 	if (!fp->jit_requested)
 		return org_fp;
@@ -997,7 +1013,28 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		fp = tmp_fp;
 	}
 
+	jit_data = fp->aux->jit_data;
+	if (!jit_data) {
+		jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
+		if (!jit_data) {
+			fp = org_fp;
+			goto out;
+		}
+		fp->aux->jit_data = jit_data;
+	}
+
 	flen = fp->len;
+	addrs = jit_data->addrs;
+	if (addrs) {
+		cgctx = jit_data->ctx;
+		image = jit_data->image;
+		bpf_hdr = jit_data->header;
+		proglen = jit_data->proglen;
+		alloclen = proglen + FUNCTION_DESCR_SIZE;
+		extra_pass = true;
+		goto skip_init_ctx;
+	}
+
 	addrs = kzalloc((flen+1) * sizeof(*addrs), GFP_KERNEL);
 	if (addrs == NULL) {
 		fp = org_fp;
@@ -1010,10 +1047,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
 
 	/* Scouting faux-generate pass 0 */
-	if (bpf_jit_build_body(fp, 0, &cgctx, addrs)) {
+	if (bpf_jit_build_body(fp, 0, &cgctx, addrs, false)) {
 		/* We hit something illegal or unsupported. */
 		fp = org_fp;
-		goto out;
+		goto out_addrs;
 	}
 
 	/*
@@ -1031,9 +1068,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 			bpf_jit_fill_ill_insns);
 	if (!bpf_hdr) {
 		fp = org_fp;
-		goto out;
+		goto out_addrs;
 	}
 
+skip_init_ctx:
 	code_base = (u32 *)(image + FUNCTION_DESCR_SIZE);
 
 	/* Code generation passes 1-2 */
@@ -1041,7 +1079,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		/* Now build the prologue, body code & epilogue for real. */
 		cgctx.idx = 0;
 		bpf_jit_build_prologue(code_base, &cgctx);
-		bpf_jit_build_body(fp, code_base, &cgctx, addrs);
+		bpf_jit_build_body(fp, code_base, &cgctx, addrs, extra_pass);
 		bpf_jit_build_epilogue(code_base, &cgctx);
 
 		if (bpf_jit_enable > 1)
@@ -1062,15 +1100,30 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	((u64 *)image)[1] = local_paca->kernel_toc;
 #endif
 
+	bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));
+
+	if (!fp->is_func || extra_pass) {
+		bpf_jit_binary_lock_ro(bpf_hdr);
+	} else {
+		jit_data->addrs = addrs;
+		jit_data->ctx = cgctx;
+		jit_data->proglen = proglen;
+		jit_data->image = image;
+		jit_data->header = bpf_hdr;
+	}
+
 	fp->bpf_func = (void *)image;
 	fp->jited = 1;
 	fp->jited_len = alloclen;
 
-	bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));
+	if (!fp->is_func || extra_pass) {
+out_addrs:
+		kfree(addrs);
+		kfree(jit_data);
+		fp->aux->jit_data = NULL;
+	}
 
 out:
-	kfree(addrs);
-
 	if (bpf_blinded)
 		bpf_jit_prog_release_other(fp, fp == org_fp ? tmp_fp : org_fp);
 
-- 
2.14.3

* Re: [RFC][PATCH bpf v2 1/2] bpf: allow 64-bit offsets for bpf function calls
  2018-02-13  4:05 [RFC][PATCH bpf v2 1/2] bpf: allow 64-bit offsets for bpf function calls Sandipan Das
  2018-02-13  4:06 ` [RFC][PATCH bpf v2 2/2] bpf: powerpc64: add JIT support for multi-function programs Sandipan Das
@ 2018-02-15 16:25 ` Daniel Borkmann
  2018-02-15 20:18   ` Daniel Borkmann
  1 sibling, 1 reply; 24+ messages in thread
From: Daniel Borkmann @ 2018-02-15 16:25 UTC (permalink / raw)
  To: Sandipan Das, ast; +Cc: netdev, linuxppc-dev, mpe, naveen.n.rao

On 02/13/2018 05:05 AM, Sandipan Das wrote:
> The imm field of a bpf_insn is a signed 32-bit integer. For
> JIT-ed bpf-to-bpf function calls, it stores the offset from
> __bpf_call_base to the start of the callee function.
> 
> For some architectures, such as powerpc64, this offset may
> be as large as 64 bits and hence cannot be accommodated in
> the imm field without truncation.
> 
> To resolve this, we additionally make aux->func within each
> bpf_prog associated with these functions point to the list
> of all function addresses determined by the verifier.
> 
> We keep the value assigned to the off field of the bpf_insn
> as a way to index into aux->func and also set aux->func_cnt
> so that this can be used for performing basic upper bound
> checks for the off field.
> 
> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> ---
> v2: Make aux->func point to the list of functions determined
>     by the verifier rather than allocating a separate callee
>     list for each function.

Approach looks good to me; do you know whether s390x JIT would
have a similar requirement? I think one limitation that would still
need to be addressed later with such an approach would be regarding the
xlated prog dump in bpftool, see 'BPF calls via JIT' in 7105e828c087
("bpf: allow for correlation of maps and helpers in dump"). Any
ideas for this (potentially if we could use off + imm for calls,
we'd get to 48 bits, but that still seems not to be enough, as you say)?
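
(For reference, packing a 48-bit delta into the two fields
mentioned above could look like the sketch below; this is purely
illustrative, not code from the thread.)

    /* Encode a callee offset into imm (low 32 bits) plus off
     * (high 16 bits), giving 48 usable bits. Fails if the
     * delta does not fit in 48 signed bits.
     */
    static bool encode_call48(s64 delta, s32 *imm, s16 *off)
    {
            if (sign_extend64(delta, 47) != delta)
                    return false;
            *imm = (s32)delta;              /* bits 31..0  */
            *off = (s16)(delta >> 32);      /* bits 47..32 */
            return true;
    }

    /* Decode side: delta = ((s64)*off << 32) | (u32)*imm */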

Thanks,
Daniel

* Re: [RFC][PATCH bpf v2 1/2] bpf: allow 64-bit offsets for bpf function calls
  2018-02-15 16:25 ` [RFC][PATCH bpf v2 1/2] bpf: allow 64-bit offsets for bpf function calls Daniel Borkmann
@ 2018-02-15 20:18   ` Daniel Borkmann
  2018-02-16 15:50       ` Naveen N. Rao
  0 siblings, 1 reply; 24+ messages in thread
From: Daniel Borkmann @ 2018-02-15 20:18 UTC (permalink / raw)
  To: Sandipan Das, ast; +Cc: netdev, linuxppc-dev, mpe, naveen.n.rao

On 02/15/2018 05:25 PM, Daniel Borkmann wrote:
> On 02/13/2018 05:05 AM, Sandipan Das wrote:
>> The imm field of a bpf_insn is a signed 32-bit integer. For
>> JIT-ed bpf-to-bpf function calls, it stores the offset from
>> __bpf_call_base to the start of the callee function.
>>
>> For some architectures, such as powerpc64, this offset may
>> be as large as 64 bits and hence cannot be accommodated in
>> the imm field without truncation.
>>
>> To resolve this, we additionally make aux->func within each
>> bpf_prog associated with these functions point to the list
>> of all function addresses determined by the verifier.
>>
>> We keep the value assigned to the off field of the bpf_insn
>> as a way to index into aux->func and also set aux->func_cnt
>> so that this can be used for performing basic upper bound
>> checks for the off field.
>>
>> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
>> ---
>> v2: Make aux->func point to the list of functions determined
>>     by the verifier rather than allocating a separate callee
>>     list for each function.
> 
> Approach looks good to me; do you know whether s390x JIT would
> have a similar requirement? I think one limitation that would still
> need to be addressed later with such an approach would be regarding the
> xlated prog dump in bpftool, see 'BPF calls via JIT' in 7105e828c087
> ("bpf: allow for correlation of maps and helpers in dump"). Any
> ideas for this (potentially if we could use off + imm for calls,
> we'd get to 48 bits, but that still seems not to be enough, as you say)?

One other random thought, although I'm not sure how feasible this
is for ppc64 JIT to realize ... but idea would be to have something
like the below:

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 29ca920..daa7258 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -512,6 +512,11 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
 	return ret;
 }

+void * __weak bpf_jit_image_alloc(unsigned long size)
+{
+	return module_alloc(size);
+}
+
 struct bpf_binary_header *
 bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
 		     unsigned int alignment,
@@ -525,7 +530,7 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
 	 * random section of illegal instructions.
 	 */
 	size = round_up(proglen + sizeof(*hdr) + 128, PAGE_SIZE);
-	hdr = module_alloc(size);
+	hdr = bpf_jit_image_alloc(size);
 	if (hdr == NULL)
 		return NULL;

And ppc64 JIT could override bpf_jit_image_alloc() in a similar way
like some archs would override the module_alloc() helper through a
custom implementation, usually via __vmalloc_node_range(), so we
could perhaps fit the range for BPF JITed images in a way that they
could use the 32bit imm in the end? There are not that many progs
loaded typically, so the range could be a bit narrower in such case
anyway. (Not sure if this would work out though, but I thought to
bring it up.)
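
(A ppc64-side override along those lines might look roughly like
the sketch below; BPF_JIT_REGION_START/END are made-up bounds,
chosen so that images stay within 32-bit reach of
__bpf_call_base.)

    /* Hypothetical arch override of the weak helper proposed
     * above, allocating JIT images from a dedicated range via
     * __vmalloc_node_range(), much like module_alloc() does on
     * several architectures.
     */
    void *bpf_jit_image_alloc(unsigned long size)
    {
            return __vmalloc_node_range(size, PAGE_SIZE,
                            BPF_JIT_REGION_START, BPF_JIT_REGION_END,
                            GFP_KERNEL, PAGE_KERNEL_EXEC, 0,
                            NUMA_NO_NODE, __builtin_return_address(0));
    }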

Thanks,
Daniel

* Re: [RFC][PATCH bpf v2 1/2] bpf: allow 64-bit offsets for bpf function calls
  2018-02-15 20:18   ` Daniel Borkmann
@ 2018-02-16 15:50       ` Naveen N. Rao
  0 siblings, 0 replies; 24+ messages in thread
From: Naveen N. Rao @ 2018-02-16 15:50 UTC (permalink / raw)
  To: ast, Daniel Borkmann, Sandipan Das, Michael Ellerman
  Cc: linuxppc-dev, netdev, Michael Holzheu

Daniel Borkmann wrote:
> On 02/15/2018 05:25 PM, Daniel Borkmann wrote:
>> On 02/13/2018 05:05 AM, Sandipan Das wrote:
>>> The imm field of a bpf_insn is a signed 32-bit integer. For
>>> JIT-ed bpf-to-bpf function calls, it stores the offset from
>>> __bpf_call_base to the start of the callee function.
>>>
>>> For some architectures, such as powerpc64, this offset may
>>> be as large as 64 bits and hence cannot be accommodated in
>>> the imm field without truncation.
>>>
>>> To resolve this, we additionally make aux->func within each
>>> bpf_prog associated with these functions point to the list
>>> of all function addresses determined by the verifier.
>>>
>>> We keep the value assigned to the off field of the bpf_insn
>>> as a way to index into aux->func and also set aux->func_cnt
>>> so that this can be used for performing basic upper bound
>>> checks for the off field.
>>>
>>> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
>>> ---
>>> v2: Make aux->func point to the list of functions determined
>>>     by the verifier rather than allocating a separate callee
>>>     list for each function.
>> 
>> Approach looks good to me; do you know whether s390x JIT would
>> have a similar requirement? I think one limitation that would still
>> need to be addressed later with such an approach would be regarding the
>> xlated prog dump in bpftool, see 'BPF calls via JIT' in 7105e828c087
>> ("bpf: allow for correlation of maps and helpers in dump"). Any
>> ideas for this (potentially if we could use off + imm for calls,
>> we'd get to 48 bits, but that still seems not to be enough, as you say)?

All good points. I'm not really sure how s390x works, so I can't comment 
on that, but I'm copying Michael Holzheu for his consideration.

With the existing scheme, 48 bits won't be enough, so we rejected that 
approach. I can also see how this will be a problem with bpftool, but I 
haven't looked into it in detail. I wonder if we can annotate the output 
to indicate the function being referred to?

> 
> One other random thought, although I'm not sure how feasible this
> is for ppc64 JIT to realize ... but idea would be to have something
> like the below:
> 
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index 29ca920..daa7258 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -512,6 +512,11 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
>  	return ret;
>  }
> 
> +void * __weak bpf_jit_image_alloc(unsigned long size)
> +{
> +	return module_alloc(size);
> +}
> +
>  struct bpf_binary_header *
>  bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
>  		     unsigned int alignment,
> @@ -525,7 +530,7 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
>  	 * random section of illegal instructions.
>  	 */
>  	size = round_up(proglen + sizeof(*hdr) + 128, PAGE_SIZE);
> -	hdr = module_alloc(size);
> +	hdr = bpf_jit_image_alloc(size);
>  	if (hdr == NULL)
>  		return NULL;
> 
> And ppc64 JIT could override bpf_jit_image_alloc() in a similar way
> like some archs would override the module_alloc() helper through a
> custom implementation, usually via __vmalloc_node_range(), so we
> could perhaps fit the range for BPF JITed images in a way that they
> could use the 32bit imm in the end? There are not that many progs
> loaded typically, so the range could be a bit narrower in such case
> anyway. (Not sure if this would work out though, but I thought to
> bring it up.)

That'd be a good option to consider. I don't think we want to allocate 
anything from the linear memory range since users could load 
unprivileged BPF programs and consume a lot of memory that way. I doubt 
if we can map vmalloc'ed memory into the 0xc0 address range, but I'm not 
entirely sure.

Michael,
Is the above possible? The question is if we can have BPF programs be 
allocated within 4GB of __bpf_call_base (which is a kernel symbol), so 
that calls to those programs can be encoded in a 32-bit immediate field 
in a BPF instruction. As an extension, we may be able to extend it to 
48-bits by combining with another BPF instruction field (offset). In 
either case, the vmalloc'ed address range won't work.

The alternative is to pass the full 64-bit address of the BPF program in 
an auxiliary field (as proposed in this patch set) but we need to fix it 
up for 'bpftool' as well.

Thanks,
Naveen

* Re: [RFC][PATCH bpf v2 1/2] bpf: allow 64-bit offsets for bpf function calls
  2018-02-16 15:50       ` Naveen N. Rao
@ 2018-02-20  9:29         ` Michael Ellerman
  -1 siblings, 0 replies; 24+ messages in thread
From: Michael Ellerman @ 2018-02-20  9:29 UTC (permalink / raw)
  To: Naveen N. Rao, ast, Daniel Borkmann, Sandipan Das
  Cc: netdev, Michael Holzheu, linuxppc-dev

"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:
> Daniel Borkmann wrote:
>> On 02/15/2018 05:25 PM, Daniel Borkmann wrote:
>>> On 02/13/2018 05:05 AM, Sandipan Das wrote:
>>>> The imm field of a bpf_insn is a signed 32-bit integer. For
>>>> JIT-ed bpf-to-bpf function calls, it stores the offset from
>>>> __bpf_call_base to the start of the callee function.
>>>>
>>>> For some architectures, such as powerpc64, this offset may
>>>> be as large as 64 bits and hence cannot be accommodated in
>>>> the imm field without truncation.
>>>>
>>>> To resolve this, we additionally make aux->func within each
>>>> bpf_prog associated with these functions point to the list
>>>> of all function addresses determined by the verifier.
>>>>
>>>> We keep the value assigned to the off field of the bpf_insn
>>>> as a way to index into aux->func and also set aux->func_cnt
>>>> so that this can be used for performing basic upper bound
>>>> checks for the off field.
>>>>
>>>> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
>>>> ---
>>>> v2: Make aux->func point to the list of functions determined
>>>>     by the verifier rather than allocating a separate callee
>>>>     list for each function.
>>> 
>>> Approach looks good to me; do you know whether s390x JIT would
>>> have a similar requirement? I think one limitation that would still
>>> need to be addressed later with such an approach would be regarding the
>>> xlated prog dump in bpftool, see 'BPF calls via JIT' in 7105e828c087
>>> ("bpf: allow for correlation of maps and helpers in dump"). Any
>>> ideas for this (potentially if we could use off + imm for calls,
>>> we'd get to 48 bits, but that still seems not to be enough, as you say)?
>
> All good points. I'm not really sure how s390x works, so I can't comment 
> on that, but I'm copying Michael Holzheu for his consideration.
>
> With the existing scheme, 48 bits won't be enough, so we rejected that 
> approach. I can also see how this will be a problem with bpftool, but I 
> haven't looked into it in detail. I wonder if we can annotate the output 
> to indicate the function being referred to?
>
>> 
>> One other random thought, although I'm not sure how feasible this
>> is for ppc64 JIT to realize ... but idea would be to have something
>> like the below:
>> 
>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>> index 29ca920..daa7258 100644
>> --- a/kernel/bpf/core.c
>> +++ b/kernel/bpf/core.c
>> @@ -512,6 +512,11 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
>>  	return ret;
>>  }
>> 
>> +void * __weak bpf_jit_image_alloc(unsigned long size)
>> +{
>> +	return module_alloc(size);
>> +}
>> +
>>  struct bpf_binary_header *
>>  bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
>>  		     unsigned int alignment,
>> @@ -525,7 +530,7 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
>>  	 * random section of illegal instructions.
>>  	 */
>>  	size = round_up(proglen + sizeof(*hdr) + 128, PAGE_SIZE);
>> -	hdr = module_alloc(size);
>> +	hdr = bpf_jit_image_alloc(size);
>>  	if (hdr == NULL)
>>  		return NULL;
>> 
>> And ppc64 JIT could override bpf_jit_image_alloc() in a similar way
>> like some archs would override the module_alloc() helper through a
>> custom implementation, usually via __vmalloc_node_range(), so we
>> could perhaps fit the range for BPF JITed images in a way that they
>> could use the 32bit imm in the end? There are not that many progs
>> loaded typically, so the range could be a bit narrower in such case
>> anyway. (Not sure if this would work out though, but I thought to
>> bring it up.)
>
> That'd be a good option to consider. I don't think we want to allocate 
> anything from the linear memory range since users could load 
> unprivileged BPF programs and consume a lot of memory that way. I doubt 
> if we can map vmalloc'ed memory into the 0xc0 address range, but I'm not 
> entirely sure.
>
> Michael,
> Is the above possible? The question is if we can have BPF programs be 
> allocated within 4GB of __bpf_call_base (which is a kernel symbol), so 
> that calls to those programs can be encoded in a 32-bit immediate field 
> in a BPF instruction.

Hmmm.

It's not technically impossible, but I don't think it's really a good
option.

The 0xc range is a linear mapping of RAM, and the kernel tends to be
near the start of RAM for reasons. That means there generally isn't a
hole in the 0xc range within 4GB for you to map BPF programs.

You could create a hole by making the 0xc mapping non linear, ie.
mapping some RAM near the kernel elsewhere in the 0xc range, to make a
hole that you can then remap BPF programs into. But I think that would
cause a lot of bugs, it's a pretty fundamental assumption that the
linear mapping is 1:1.

> As an extension, we may be able to extend it to 
> 48-bits by combining with another BPF instruction field (offset). In 
> either case, the vmalloc'ed address range won't work.

48-bits could possibly work, we don't have systems with that much RAM
*yet*. So you could remap the BPF programs at the end of the 0xc range,
or somewhere we have a gap in RAM.

That would probably still confuse some things, ie. the 0xc mapping would
be a 1:1 mapping of RAM plus some other stuff, but it might be tractable
to fix.

We don't have page tables for the 0xc range when using the hash MMU, so
we'd need to either fix that or use some bespoke data structure for
keeping track of the mappings.

It doesn't really appeal :)

We load modules in the 0xd range, and they call functions in the kernel,
we handle that with trampolines. Can you do something similar?

You obviously still need a 64-bit branch somewhere, but perhaps you only
need one, or one per BPF program, rather than one per function call.
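
(Conceptually, that would mean one long-range stub per program,
with all in-range call sites branching to it; a sketch, not an
existing mechanism:)

    caller:
            bl      bpf_call_stub   # 26-bit relative branch
            ...
    bpf_call_stub:                  # one stub per program/image
            <load 64-bit target address into r12>
            mtctr   r12
            bctr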

> The alternative is to pass the full 64-bit address of the BPF program in 
> an auxiliary field (as proposed in this patch set) but we need to fix it 
> up for 'bpftool' as well.

OK. You'll have to explain to me how bpftool is related, I thought it
was just for loading/examining BPF programs.

cheers

* Re: [RFC][PATCH bpf v2 1/2] bpf: allow 64-bit offsets for bpf function calls
  2018-02-20  9:29         ` Michael Ellerman
@ 2018-02-20 19:22           ` Naveen N. Rao
  -1 siblings, 0 replies; 24+ messages in thread
From: Naveen N. Rao @ 2018-02-20 19:22 UTC (permalink / raw)
  To: ast, Daniel Borkmann, Michael Ellerman, Sandipan Das
  Cc: netdev, Michael Holzheu, linuxppc-dev

Michael Ellerman wrote:
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> writes:
>> Daniel Borkmann wrote:
>>> On 02/15/2018 05:25 PM, Daniel Borkmann wrote:
>>>> On 02/13/2018 05:05 AM, Sandipan Das wrote:
>>>>> The imm field of a bpf_insn is a signed 32-bit integer. For
>>>>> JIT-ed bpf-to-bpf function calls, it stores the offset from
>>>>> __bpf_call_base to the start of the callee function.
>>>>>
>>>>> For some architectures, such as powerpc64, this offset may
>>>>> be as large as 64 bits and hence cannot be accommodated in
>>>>> the imm field without truncation.
>>>>>
>>>>> To resolve this, we additionally make aux->func within each
>>>>> bpf_prog associated with these functions point to the list
>>>>> of all function addresses determined by the verifier.
>>>>>
>>>>> We keep the value assigned to the off field of the bpf_insn
>>>>> as a way to index into aux->func and also set aux->func_cnt
>>>>> so that this can be used for performing basic upper bound
>>>>> checks for the off field.
>>>>>
>>>>> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
>>>>> ---
>>>>> v2: Make aux->func point to the list of functions determined
>>>>>     by the verifier rather than allocating a separate callee
>>>>>     list for each function.
>>>> 
>>>> Approach looks good to me; do you know whether s390x JIT would
>>>> have a similar requirement? I think one limitation that would still
>>>> need to be addressed later with such an approach would be regarding the
>>>> xlated prog dump in bpftool, see 'BPF calls via JIT' in 7105e828c087
>>>> ("bpf: allow for correlation of maps and helpers in dump"). Any
>>>> ideas for this (potentially if we could use off + imm for calls,
>>>> we'd get to 48 bits, but that still seems not to be enough, as you say)?
>>
>> All good points. I'm not really sure how s390x works, so I can't comment 
>> on that, but I'm copying Michael Holzheu for his consideration.
>>
>> With the existing scheme, 48 bits won't be enough, so we rejected that 
>> approach. I can also see how this will be a problem with bpftool, but I 
>> haven't looked into it in detail. I wonder if we can annotate the output 
>> to indicate the function being referred to?
>>
>>> 
>>> One other random thought, although I'm not sure how feasible this
>>> is for ppc64 JIT to realize ... but idea would be to have something
>>> like the below:
>>> 
>>> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
>>> index 29ca920..daa7258 100644
>>> --- a/kernel/bpf/core.c
>>> +++ b/kernel/bpf/core.c
>>> @@ -512,6 +512,11 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
>>>  	return ret;
>>>  }
>>> 
>>> +void * __weak bpf_jit_image_alloc(unsigned long size)
>>> +{
>>> +	return module_alloc(size);
>>> +}
>>> +
>>>  struct bpf_binary_header *
>>>  bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
>>>  		     unsigned int alignment,
>>> @@ -525,7 +530,7 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
>>>  	 * random section of illegal instructions.
>>>  	 */
>>>  	size = round_up(proglen + sizeof(*hdr) + 128, PAGE_SIZE);
>>> -	hdr = module_alloc(size);
>>> +	hdr = bpf_jit_image_alloc(size);
>>>  	if (hdr == NULL)
>>>  		return NULL;
>>> 
>>> And ppc64 JIT could override bpf_jit_image_alloc() in a similar way
>>> like some archs would override the module_alloc() helper through a
>>> custom implementation, usually via __vmalloc_node_range(), so we
>>> could perhaps fit the range for BPF JITed images in a way that they
>>> could use the 32bit imm in the end? There are not that many progs
>>> loaded typically, so the range could be a bit narrower in such case
>>> anyway. (Not sure if this would work out though, but I thought to
>>> bring it up.)
>>
>> That'd be a good option to consider. I don't think we want to allocate 
>> anything from the linear memory range since users could load 
>> unprivileged BPF programs and consume a lot of memory that way. I doubt 
>> if we can map vmalloc'ed memory into the 0xc0 address range, but I'm not 
>> entirely sure.
>>
>> Michael,
>> Is the above possible? The question is if we can have BPF programs be 
>> allocated within 4GB of __bpf_call_base (which is a kernel symbol), so 
>> that calls to those programs can be encoded in a 32-bit immediate field 
>> in a BPF instruction.
> 
> Hmmm.
> 
> It's not technically impossible, but I don't think it's really a good
> option.
> 
> The 0xc range is a linear mapping of RAM, and the kernel tends to be
> near the start of RAM for reasons. That means there generally isn't a
> hole in the 0xc range within 4GB for you to map BPF programs.
> 
> You could create a hole by making the 0xc mapping non linear, ie.
> mapping some RAM near the kernel elsewhere in the 0xc range, to make a
> hole that you can then remap BPF programs into. But I think that would
> cause a lot of bugs, it's a pretty fundamental assumption that the
> linear mapping is 1:1.
> 
>> As an extension, we may be able to extend it to 
>> 48-bits by combining with another BPF instruction field (offset). In 
>> either case, the vmalloc'ed address range won't work.
> 
> 48-bits could possibly work, we don't have systems with that much RAM
> *yet*. So you could remap the BPF programs at the end of the 0xc range,
> or somewhere we have a gap in RAM.
> 
> That would probably still confuse some things, ie. the 0xc mapping would
> be a 1:1 mapping of RAM plus some other stuff, but it might be tractable
> to fix.
> 
> We don't have page tables for the 0xc range when using the hash MMU, so
> we'd need to either fix that or use some bespoke data structure for
> keeping track of the mappings.
> 
> It doesn't really appeal :)

Thanks for the detailed explanation. That does look gross :)

> 
> We load modules in the 0xd range, and they call functions in the kernel,
> we handle that with trampolines. Can you do something similar?

Yes, we already handle this in our JIT. The scenario being considered 
here is how the BPF core informs the BPF JIT engine when it has to 
call out to a different BPF JIT'ed function. The existing approach 
encodes an offset from __bpf_call_base() in the imm32 field of the BPF 
instruction itself. For powerpc64 though, I think we'll have to have the 
address of the JIT'ed BPF functions passed in through other means.

> 
> You obviously still need a 64-bit branch somewhere, but perhaps you only
> need one, or one per BPF program, rather than one per function call.

We may have more than a single branch depending on how the program is 
laid out. We are looking to see if it is worth detecting that a call is 
close-by to do a relative branch, rather than loading a full 64-bit 
address and branching to that. For BPF-to-BPF calls, we should also be 
able to skip all the r12/r2 GEP/TOC handling. We may need to emit an 
additional branch to skip over some instructions (or emit nops) just so 
we consistently emit a fixed number of instructions each pass.
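
(For reference, the full 64-bit call sequence on ppc64 looks
roughly like the following; the register choice is arbitrary and
the mnemonics are plain ppc64 assembly, not JIT emitter macros.)

    lis   r12, addr@highest       # bits 63..48
    ori   r12, r12, addr@higher   # bits 47..32
    sldi  r12, r12, 32            # shift into the upper word
    oris  r12, r12, addr@h        # bits 31..16
    ori   r12, r12, addr@l        # bits 15..0
    mtctr r12
    bctrl                         # branch via CTR and link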

> 
>> The alternative is to pass the full 64-bit address of the BPF program in 
>> an auxiliary field (as proposed in this patch set) but we need to fix it 
>> up for 'bpftool' as well.
> 
> OK. You'll have to explain to me how bpftool is related, I thought it
> was just for loading/examining BPF programs.

It is, and it can also dump loaded BPF programs. When such programs call 
out to other BPF programs, bpftool currently figures out the address of 
the target BPF function by essentially doing (__bpf_call_base + imm32) 
and looking up kallsyms. Since we've established that imm32 is not 
enough for us, this won't work on powerpc64.
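
(Schematically, that resolution amounts to something like the
following; lookup_ksym() is a hypothetical stand-in for the
kallsyms search bpftool performs.)

    /* Resolve a pseudo-call target the way the dumper does
     * today: base + imm, then a kallsyms lookup. On powerpc64
     * imm cannot hold the full 64-bit delta, so this breaks.
     */
    unsigned long addr = (unsigned long)__bpf_call_base + insn->imm;
    const char *name = lookup_ksym(addr);   /* e.g. /proc/kallsyms */
    printf("call %s\n", name ? name : "<unknown>");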

However, if bpf_jit_kallsyms is not enabled, it looks like bpftool will 
still fail to resolve the call, and only print out an address today. I'm 
wondering if we can instead encode the bpf prog id in imm32. That way, 
we should be able to indicate the BPF function being called into.  
Daniel, is that something we can consider?


Thanks!
- Naveen


* Re: [RFC][PATCH bpf v2 1/2] bpf: allow 64-bit offsets for bpf function calls
  2018-02-16 15:50       ` Naveen N. Rao
@ 2018-02-22 12:06         ` Michael Holzheu
  -1 siblings, 0 replies; 24+ messages in thread
From: Michael Holzheu @ 2018-02-22 12:06 UTC (permalink / raw)
  To: Naveen N. Rao; +Cc: Sandipan Das, Daniel Borkmann, ast, netdev, linuxppc-dev

On Fri, 16 Feb 2018 21:20:09 +0530,
"Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:

> Daniel Borkmann wrote:
> > On 02/15/2018 05:25 PM, Daniel Borkmann wrote:
> >> On 02/13/2018 05:05 AM, Sandipan Das wrote:
> >>> The imm field of a bpf_insn is a signed 32-bit integer. For
> >>> JIT-ed bpf-to-bpf function calls, it stores the offset from
> >>> __bpf_call_base to the start of the callee function.
> >>>
> >>> For some architectures, such as powerpc64, this offset may
> >>> be as large as 64 bits and hence cannot be accommodated in
> >>> the imm field without truncation.
> >>>
> >>> To resolve this, we additionally make aux->func within each
> >>> bpf_prog associated with these functions point to the list
> >>> of all function addresses determined by the verifier.
> >>>
> >>> We keep the value assigned to the off field of the bpf_insn
> >>> as a way to index into aux->func and also set aux->func_cnt
> >>> so that this can be used for performing basic upper bound
> >>> checks for the off field.
> >>>
> >>> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> >>> ---
> >>> v2: Make aux->func point to the list of functions determined
> >>>     by the verifier rather than allocating a separate callee
> >>>     list for each function.
> >> 
> >> Approach looks good to me; do you know whether s390x JIT would
> >> have a similar requirement? I think one limitation that would still
> >> need to be addressed later with such an approach would be regarding the
> >> xlated prog dump in bpftool, see 'BPF calls via JIT' in 7105e828c087
> >> ("bpf: allow for correlation of maps and helpers in dump"). Any
> >> ideas for this (potentially if we could use off + imm for calls,
> >> we'd get to 48 bits, but that still seems not to be enough, as you say)?
> 
> All good points. I'm not really sure how s390x works, so I can't comment 
> on that, but I'm copying Michael Holzheu for his consideration.
> 
> With the existing scheme, 48 bits won't be enough, so we rejected that 
> approach. I can also see how this will be a problem with bpftool, but I 
> haven't looked into it in detail. I wonder if we can annotate the output 
> to indicate the function being referred to?
> 
> > 
> > One other random thought, although I'm not sure how feasible this
> > is for ppc64 JIT to realize ... but idea would be to have something
> > like the below:
> > 
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 29ca920..daa7258 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -512,6 +512,11 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
> >  	return ret;
> >  }
> > 
> > +void * __weak bpf_jit_image_alloc(unsigned long size)
> > +{
> > +	return module_alloc(size);
> > +}
> > +
> >  struct bpf_binary_header *
> >  bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
> >  		     unsigned int alignment,
> > @@ -525,7 +530,7 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
> >  	 * random section of illegal instructions.
> >  	 */
> >  	size = round_up(proglen + sizeof(*hdr) + 128, PAGE_SIZE);
> > -	hdr = module_alloc(size);
> > +	hdr = bpf_jit_image_alloc(size);
> >  	if (hdr == NULL)
> >  		return NULL;
> > 
> > And ppc64 JIT could override bpf_jit_image_alloc() in a similar way
> > like some archs would override the module_alloc() helper through a
> > custom implementation, usually via __vmalloc_node_range(), so we
> > could perhaps fit the range for BPF JITed images in a way that they
> > could use the 32bit imm in the end? There are not that many progs
> > loaded typically, so the range could be a bit narrower in such case
> > anyway. (Not sure if this would work out though, but I thought to
> > bring it up.)
> 
> That'd be a good option to consider. I don't think we want to allocate 
> anything from the linear memory range since users could load 
> unprivileged BPF programs and consume a lot of memory that way. I doubt 
> if we can map vmalloc'ed memory into the 0xc0 address range, but I'm not 
> entirely sure.
> 
> Michael,
> Is the above possible? The question is if we can have BPF programs be 
> allocated within 4GB of __bpf_call_base (which is a kernel symbol), so 
> that calls to those programs can be encoded in a 32-bit immediate field 
> in a BPF instruction. As an extension, we may be able to extend it to 
> 48-bits by combining with another BPF instruction field (offset). In 
> either case, the vmalloc'ed address range won't work.
> 
> The alternative is to pass the full 64-bit address of the BPF program in 
> an auxiliary field (as proposed in this patch set) but we need to fix it 
> up for 'bpftool' as well.

Hi Naveen,

Our s390 kernel maintainer Martin Schwidefsky took over
eBPF responsibility for s390 from me.

@Martin: Can you answer Naveen's question?

Michael

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][PATCH bpf v2 1/2] bpf: allow 64-bit offsets for bpf function calls
  2018-02-22 12:06         ` Michael Holzheu
@ 2018-02-22 12:10           ` Michael Holzheu
  0 siblings, 0 replies; 24+ messages in thread
From: Michael Holzheu @ 2018-02-22 12:10 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Sandipan Das, Daniel Borkmann, ast, netdev, Naveen N. Rao,
	Michael Holzheu, linuxppc-dev

On Thu, 22 Feb 2018 13:06:40 +0100,
Michael Holzheu <holzheu@linux.vnet.ibm.com> wrote:

> On Fri, 16 Feb 2018 21:20:09 +0530,
> "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com> wrote:
> 
> > Daniel Borkmann wrote:
> > > On 02/15/2018 05:25 PM, Daniel Borkmann wrote:
> > >> On 02/13/2018 05:05 AM, Sandipan Das wrote:
> > >>> The imm field of a bpf_insn is a signed 32-bit integer. For
> > >>> JIT-ed bpf-to-bpf function calls, it stores the offset from
> > >>> __bpf_call_base to the start of the callee function.
> > >>>
> > >>> For some architectures, such as powerpc64, it was found that
> > >>> this offset may be as large as 64 bits because of which this
> > >>> cannot be accommodated in the imm field without truncation.
> > >>>
> > >>> To resolve this, we additionally make aux->func within each
> > >>> bpf_prog associated with the functions to point to the list
> > >>> of all function addresses determined by the verifier.
> > >>>
> > >>> We keep the value assigned to the off field of the bpf_insn
> > >>> as a way to index into aux->func and also set aux->func_cnt
> > >>> so that this can be used for performing basic upper bound
> > >>> checks for the off field.
> > >>>
> > >>> Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
> > >>> ---
> > >>> v2: Make aux->func point to the list of functions determined
> > >>>     by the verifier rather than allocating a separate callee
> > >>>     list for each function.
> > >> 
> > >> Approach looks good to me; do you know whether s390x JIT would
> > >> have similar requirement? I think one limitation that would still
> > >> need to be addressed later with such approach would be regarding the
> > >> xlated prog dump in bpftool, see 'BPF calls via JIT' in 7105e828c087
> > >> ("bpf: allow for correlation of maps and helpers in dump"). Any
> > >> ideas for this (potentially if we could use off + imm for calls,
> > >> we'd get to 48 bits, but that seems still not be enough as you say)?
> > 
> > All good points. I'm not really sure how s390x works, so I can't comment 
> > on that, but I'm copying Michael Holzheu for his consideration.
> > 
> > With the existing scheme, 48 bits won't be enough, so we rejected that 
> > approach. I can also see how this will be a problem with bpftool, but I 
> > haven't looked into it in detail. I wonder if we can annotate the output 
> > to indicate the function being referred to?
> > 
> > > 
> > > One other random thought, although I'm not sure how feasible this
> > > is for ppc64 JIT to realize ... but idea would be to have something
> > > like the below:
> > > 
> > > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > > index 29ca920..daa7258 100644
> > > --- a/kernel/bpf/core.c
> > > +++ b/kernel/bpf/core.c
> > > @@ -512,6 +512,11 @@ int bpf_get_kallsym(unsigned int symnum, unsigned long *value, char *type,
> > >  	return ret;
> > >  }
> > > 
> > > +void * __weak bpf_jit_image_alloc(unsigned long size)
> > > +{
> > > +	return module_alloc(size);
> > > +}
> > > +
> > >  struct bpf_binary_header *
> > >  bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
> > >  		     unsigned int alignment,
> > > @@ -525,7 +530,7 @@ bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr,
> > >  	 * random section of illegal instructions.
> > >  	 */
> > >  	size = round_up(proglen + sizeof(*hdr) + 128, PAGE_SIZE);
> > > -	hdr = module_alloc(size);
> > > +	hdr = bpf_jit_image_alloc(size);
> > >  	if (hdr == NULL)
> > >  		return NULL;
> > > 
> > > And ppc64 JIT could override bpf_jit_image_alloc() in a similar way
> > > like some archs would override the module_alloc() helper through a
> > > custom implementation, usually via __vmalloc_node_range(), so we
> > > could perhaps fit the range for BPF JITed images in a way that they
> > > could use the 32bit imm in the end? There are not that many progs
> > > loaded typically, so the range could be a bit narrower in such case
> > > anyway. (Not sure if this would work out though, but I thought to
> > > bring it up.)
> > 
> > That'd be a good option to consider. I don't think we want to allocate 
> > anything from the linear memory range since users could load 
> > unprivileged BPF programs and consume a lot of memory that way. I doubt 
> > if we can map vmalloc'ed memory into the 0xc0 address range, but I'm not 
> > entirely sure.
> > 
> > Michael,
> > Is the above possible? The question is if we can have BPF programs be 
> > allocated within 4GB of __bpf_call_base (which is a kernel symbol), so 
> > that calls to those programs can be encoded in a 32-bit immediate field 
> > in a BPF instruction. As an extension, we may be able to extend it to 
> > 48-bits by combining with another BPF instruction field (offset). In 
> > either case, the vmalloc'ed address range won't work.
> > 
> > The alternative is to pass the full 64-bit address of the BPF program in 
> > an auxiliary field (as proposed in this patch set) but we need to fix it 
> > up for 'bpftool' as well.
 
> Hi Naveen,
> 
> Our s390 kernel maintainer Martin Schwidefsky took over
> eBPF responsibility for s390 from me.
> 
> @Martin: Can you answer Naveen's question?
> 
> Michael

Damn, I forgot to add Martin to cc.
Time for vacation ;-)

Michael 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC][PATCH bpf] tools: bpftool: Fix tags for bpf-to-bpf calls
  2018-02-20 19:22           ` Naveen N. Rao
@ 2018-02-27 12:13             ` Sandipan Das
  0 siblings, 0 replies; 24+ messages in thread
From: Sandipan Das @ 2018-02-27 12:13 UTC (permalink / raw)
  To: daniel, naveen.n.rao; +Cc: netdev, jakub.kicinski, linuxppc-dev, ast

"Naveen N. Rao" wrote:
> I'm wondering if we can instead encode the bpf prog id in
> imm32. That way, we should be able to indicate the BPF
> function being called into.  Daniel, is that something we
> can consider?

Since each subprog does not get a separate id, we cannot fetch
the fd and therefore the tag of a subprog. Instead we can use
the tag of the complete program as shown below.

"Daniel Borkmann" wrote:
> I think one limitation that would still need to be addressed later
> with such approach would be regarding the xlated prog dump in bpftool,
> see 'BPF calls via JIT' in 7105e828c087 ("bpf: allow for correlation
> of maps and helpers in dump"). Any ideas for this (potentially if we
> could use off + imm for calls, we'd get to 48 bits, but that seems
> still not be enough as you say)?

As an alternative, this is what I am thinking of:

Currently, for bpf-to-bpf calls, if bpf_jit_kallsyms is enabled,
bpftool looks up the name of the corresponding symbol for the
JIT-ed subprogram and shows it in the xlated dump alongside the
actual call instruction. However, the lookup is based on the
target address which is calculated using the imm field of the
instruction. So, once again, if imm is truncated, we will end
up with the wrong address. Also, the subprog aux data (which
has been proposed as a mitigation for this) is not accessible
from this tool.

We can still access the tag for the complete bpf program and use
this with the correct offset in an objdump-like notation as an
alternative for the name of the subprog that is the target of a
bpf-to-bpf call instruction.

Currently, an xlated dump looks like this:
   0: (85) call pc+2#bpf_prog_5f76847930402518_F
   1: (b7) r0 = 1
   2: (95) exit
   3: (b7) r0 = 2
   4: (95) exit

With this patch, it will look like this:
   0: (85) call pc+2#bpf_prog_8f85936f29a7790a+3
   1: (b7) r0 = 1
   2: (95) exit
   3: (b7) r0 = 2
   4: (95) exit

where 8f85936f29a7790a is the tag of the bpf program and 3 is
the offset to the start of the subprog from the start of the
program.

Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
---
 tools/bpf/bpftool/prog.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index e8e2baaf93c2..93746d5d1e3c 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -415,9 +415,11 @@ struct kernel_sym {
 };
 
 struct dump_data {
+	unsigned char prog_tag[BPF_TAG_SIZE];
 	unsigned long address_call_base;
 	struct kernel_sym *sym_mapping;
 	__u32 sym_count;
+	unsigned int curr_line;
 	char scratch_buff[SYM_MAX_NAME];
 };
 
@@ -499,16 +501,16 @@ static void print_insn(struct bpf_verifier_env *env, const char *fmt, ...)
 }
 
 static const char *print_call_pcrel(struct dump_data *dd,
-				    struct kernel_sym *sym,
-				    unsigned long address,
 				    const struct bpf_insn *insn)
 {
-	if (sym)
-		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
-			 "%+d#%s", insn->off, sym->name);
-	else
-		snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
-			 "%+d#0x%lx", insn->off, address);
+	snprintf(dd->scratch_buff, sizeof(dd->scratch_buff),
+		 "+%d#bpf_prog_%02x%02x%02x%02x%02x%02x%02x%02x+%d",
+		 insn->off,
+		 dd->prog_tag[0], dd->prog_tag[1],
+		 dd->prog_tag[2], dd->prog_tag[3],
+		 dd->prog_tag[4], dd->prog_tag[5],
+		 dd->prog_tag[6], dd->prog_tag[7],
+		 dd->curr_line + insn->off + 1);
 	return dd->scratch_buff;
 }
 
@@ -534,7 +536,7 @@ static const char *print_call(void *private_data,
 
 	sym = kernel_syms_search(dd, address);
 	if (insn->src_reg == BPF_PSEUDO_CALL)
-		return print_call_pcrel(dd, sym, address, insn);
+		return print_call_pcrel(dd, insn);
 	else
 		return print_call_helper(dd, sym, address);
 }
@@ -576,6 +578,7 @@ static void dump_xlated_plain(struct dump_data *dd, void *buf,
 		double_insn = insn[i].code == (BPF_LD | BPF_IMM | BPF_DW);
 
 		printf("% 4d: ", i);
+		dd->curr_line = i;
 		print_bpf_insn(&cbs, NULL, insn + i, true);
 
 		if (opcodes) {
@@ -628,6 +631,7 @@ static void dump_xlated_json(struct dump_data *dd, void *buf,
 
 		jsonw_start_object(json_wtr);
 		jsonw_name(json_wtr, "disasm");
+		dd->curr_line = i;
 		print_bpf_insn(&cbs, NULL, insn + i, true);
 
 		if (opcodes) {
@@ -788,6 +792,7 @@ static int do_dump(int argc, char **argv)
 
 			disasm_print_insn(buf, *member_len, opcodes, name);
 		} else {
+			memcpy(dd.prog_tag, info.tag, sizeof(info.tag));
 			kernel_syms_load(&dd);
 			if (json_output)
 				dump_xlated_json(&dd, buf, *member_len, opcodes);
-- 
2.14.3

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC][PATCH bpf] tools: bpftool: Fix tags for bpf-to-bpf calls
  2018-02-27 12:13             ` Sandipan Das
@ 2018-02-27 14:44               ` Daniel Borkmann
  0 siblings, 0 replies; 24+ messages in thread
From: Daniel Borkmann @ 2018-02-27 14:44 UTC (permalink / raw)
  To: Sandipan Das, naveen.n.rao; +Cc: netdev, jakub.kicinski, linuxppc-dev, ast

On 02/27/2018 01:13 PM, Sandipan Das wrote:
> "Naveen N. Rao" wrote:
>> I'm wondering if we can instead encode the bpf prog id in
>> imm32. That way, we should be able to indicate the BPF
>> function being called into.  Daniel, is that something we
>> can consider?
> 
> Since each subprog does not get a separate id, we cannot fetch
> the fd and therefore the tag of a subprog. Instead we can use
> the tag of the complete program as shown below.
> 
> "Daniel Borkmann" wrote:
>> I think one limitation that would still need to be addressed later
>> with such approach would be regarding the xlated prog dump in bpftool,
>> see 'BPF calls via JIT' in 7105e828c087 ("bpf: allow for correlation
>> of maps and helpers in dump"). Any ideas for this (potentially if we
>> could use off + imm for calls, we'd get to 48 bits, but that seems
>> still not be enough as you say)?
> 
> As an alternative, this is what I am thinking of:
> 
> Currently, for bpf-to-bpf calls, if bpf_jit_kallsyms is enabled,
> bpftool looks up the name of the corresponding symbol for the
> JIT-ed subprogram and shows it in the xlated dump alongside the
> actual call instruction. However, the lookup is based on the
> target address which is calculated using the imm field of the
> instruction. So, once again, if imm is truncated, we will end
> up with the wrong address. Also, the subprog aux data (which
> has been proposed as a mitigation for this) is not accessible
> from this tool.
> 
> We can still access the tag for the complete bpf program and use
> this with the correct offset in an objdump-like notation as an
> alternative for the name of the subprog that is the target of a
> bpf-to-bpf call instruction.
> 
> Currently, an xlated dump looks like this:
>    0: (85) call pc+2#bpf_prog_5f76847930402518_F
>    1: (b7) r0 = 1
>    2: (95) exit
>    3: (b7) r0 = 2
>    4: (95) exit
> 
> With this patch, it will look like this:
>    0: (85) call pc+2#bpf_prog_8f85936f29a7790a+3

(Note the +2 is the insn->off already.)

>    1: (b7) r0 = 1
>    2: (95) exit
>    3: (b7) r0 = 2
>    4: (95) exit
> 
> where 8f85936f29a7790a is the tag of the bpf program and 3 is
> the offset to the start of the subprog from the start of the
> program.

The problem with this approach would be that right now the name is
something like bpf_prog_5f76847930402518_F where the subprog tag is
just a placeholder so in future, this may well adapt to e.g. the actual
function name from the elf file. Note that when kallsyms is enabled
then a name like bpf_prog_5f76847930402518_F will also appear in stack
traces, perf records, etc, so for correlation/debugging it would really
help to have them the same everywhere.

Worst case if there's nothing better, potentially what one could do in
bpf_prog_get_info_by_fd() is to dump an array of full addresses and
have the imm part as the index pointing to one of them, just unfortunate
that it's likely only needed in ppc64.
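
A rough sketch of that worst case, with invented field names (nothing
like this exists in the uapi today):

	/* hypothetical extension of struct bpf_prog_info; field names
	 * invented for illustration only */
	struct bpf_prog_info {
		/* ... existing fields ... */
		__u32 nr_jited_func_addrs;	/* number of subprog addresses */
		__aligned_u64 jited_func_addrs;	/* user pointer to a u64 array */
	};

bpftool could then print the real target from
jited_func_addrs[insn->imm] instead of computing __bpf_call_base + imm.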

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][PATCH bpf] tools: bpftool: Fix tags for bpf-to-bpf calls
  2018-02-27 14:44               ` Daniel Borkmann
@ 2018-03-01  8:51                 ` Naveen N. Rao
  0 siblings, 0 replies; 24+ messages in thread
From: Naveen N. Rao @ 2018-03-01  8:51 UTC (permalink / raw)
  To: Daniel Borkmann, Sandipan Das; +Cc: linuxppc-dev, netdev, jakub.kicinski, ast

Daniel Borkmann wrote:
> On 02/27/2018 01:13 PM, Sandipan Das wrote:
>> With this patch, it will look like this:
>>    0: (85) call pc+2#bpf_prog_8f85936f29a7790a+3
> 
> (Note the +2 is the insn->off already.)
> 
>>    1: (b7) r0 = 1
>>    2: (95) exit
>>    3: (b7) r0 = 2
>>    4: (95) exit
>> 
>> where 8f85936f29a7790a is the tag of the bpf program and 3 is
>> the offset to the start of the subprog from the start of the
>> program.
> 
> The problem with this approach would be that right now the name is
> something like bpf_prog_5f76847930402518_F where the subprog tag is
> just a placeholder so in future, this may well adapt to e.g. the actual
> function name from the elf file. Note that when kallsyms is enabled
> then a name like bpf_prog_5f76847930402518_F will also appear in stack
> traces, perf records, etc, so for correlation/debugging it would really
> help to have them the same everywhere.
> 
> Worst case if there's nothing better, potentially what one could do in
> bpf_prog_get_info_by_fd() is to dump an array of full addresses and
> have the imm part as the index pointing to one of them, just unfortunate
> that it's likely only needed in ppc64.

Ok. We seem to have discussed a few different aspects in this thread.  
Let me summarize the different aspects we have discussed:
1. Passing address of JIT'ed function to the JIT engines:
    Two approaches discussed:
    a. Existing approach, where the subprog address is encoded as an 
    offset from __bpf_call_base() in imm32 field of the BPF call 
    instruction. This requires the JIT'ed function to be within 2GB of 
    __bpf_call_base(), which won't be true on ppc64, at the least. So, 
    this won't work on ppc64 (and any other architectures where vmalloc'ed
    (module_alloc()) memory is from a different, far, address range).
    
    [As a side note, is it _actually_ guaranteed that JIT'ed functions 
    will be within 2GB (signed 32-bit...) on all other architectures 
    where BPF JIT is supported? I'm not quite sure how memory allocation 
    works on other architectures, but it looks like this can fail if 
    there are other larger allocations.]

    b. Pass the full 64-bit address of the call target in an auxiliary 
    field for the JIT engine to use (as implemented in this mail chain).  
    We can then use this to determine the call target if this is a 
    pseudo call.

    There is a third option we can consider:
    c. Convert BPF pseudo call instruction into a 2-instruction sequence 
    (similar to BPF_DW) and encode the full 64-bit call target in the 
    second bpf instruction. To distinguish this from other instruction 
    forms, we can set imm32 to -1 (see the sketch after point 2 below).

    If we go with (b) or (c), we will need to take a call on whether we 
    will implement this in the same manner across all architectures, or 
    if we should have ppc64 (and any other affected architectures) work 
    differently from the rest.

    Furthermore, for (b), bpftool won't be able to derive the target
    function call address, but approaches (a) and (c) are fine. More 
    about that below...

2. Indicating target function in bpftool:
    In the existing approach, bpftool can determine target address since 
    the offset is encoded in imm32 and is able to lookup the name from 
    kallsyms, if enabled.

    If we go with approach (b) for ppc64, this won't work and we will 
    have to minimally update bpftool to detect that the target address 
    is not available on ppc64.

    If we go with approach (c), the target address will be available and 
    we should be able to update bpftool to look that up.
 
    [As a side note, I suppose part of Sandipan's point with the 
    previous patch was to make the bpftool output consistent whether or 
    not JIT is enabled. It does look a bit weird that bpftool shows the 
    address of a JIT'ed function when asked to print the BPF bytecode.]
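
A rough strawman of what option (c) could look like at the instruction
level -- the layout below is purely illustrative and nothing here is
implemented:

	/* pseudo call carrying a full 64-bit target, BPF_LD_IMM64-style:
	 * insn[0] is the call, with imm32 == -1 marking the 64-bit form,
	 * and the 8 bytes of insn[1] hold the raw callee address */
	insn[0].code    = BPF_JMP | BPF_CALL;
	insn[0].src_reg = BPF_PSEUDO_CALL;
	insn[0].imm     = -1;
	memcpy(&insn[1], &target_addr, sizeof(u64));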

Thoughts?


- Naveen


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][PATCH bpf] tools: bpftool: Fix tags for bpf-to-bpf calls
  2018-03-01  8:51                 ` Naveen N. Rao
@ 2018-03-05 17:02                   ` Alexei Starovoitov
  0 siblings, 0 replies; 24+ messages in thread
From: Alexei Starovoitov @ 2018-03-05 17:02 UTC (permalink / raw)
  To: Naveen N. Rao, Daniel Borkmann, Sandipan Das
  Cc: linuxppc-dev, netdev, jakub.kicinski

On 3/1/18 12:51 AM, Naveen N. Rao wrote:
> Daniel Borkmann wrote:
>> On 02/27/2018 01:13 PM, Sandipan Das wrote:
>>> With this patch, it will look like this:
>>>    0: (85) call pc+2#bpf_prog_8f85936f29a7790a+3
>>
>> (Note the +2 is the insn->off already.)
>>
>>>    1: (b7) r0 = 1
>>>    2: (95) exit
>>>    3: (b7) r0 = 2
>>>    4: (95) exit
>>>
>>> where 8f85936f29a7790a is the tag of the bpf program and 3 is
>>> the offset to the start of the subprog from the start of the
>>> program.
>>
>> The problem with this approach would be that right now the name is
>> something like bpf_prog_5f76847930402518_F where the subprog tag is
>> just a placeholder so in future, this may well adapt to e.g. the actual
>> function name from the elf file. Note that when kallsyms is enabled
>> then a name like bpf_prog_5f76847930402518_F will also appear in stack
>> traces, perf records, etc, so for correlation/debugging it would really
>> help to have them the same everywhere.
>>
>> Worst case if there's nothing better, potentially what one could do in
>> bpf_prog_get_info_by_fd() is to dump an array of full addresses and
>> have the imm part as the index pointing to one of them, just unfortunate
>> that it's likely only needed in ppc64.
>
> Ok. We seem to have discussed a few different aspects in this thread.
> Let me summarize the different aspects we have discussed:
> 1. Passing address of JIT'ed function to the JIT engines:
>    Two approaches discussed:
>    a. Existing approach, where the subprog address is encoded as an
> offset from __bpf_call_base() in imm32 field of the BPF call
> instruction. This requires the JIT'ed function to be within 2GB of
> __bpf_call_base(), which won't be true on ppc64, at the least. So,
> this won't work on ppc64 (and any other architectures where vmalloc'ed
> (module_alloc()) memory is from a different, far, address range).

it looks like ppc64 doesn't guarantee today that all of module_alloc()
will be within 32-bit, but I think it should be trivial to add such a
guarantee. If so, we can define another __bpf_call_base specifically
for bpf-to-bpf calls when jit is on.
Then jit_subprogs() math will fit:
insn->imm = func[subprog]->bpf_func - __bpf_call_base_for_jited_progs;
and will make it easier for ppc64 jit to optimize and use
near calls for bpf-to-bpf calls while still using a trampoline
for bpf-to-kernel calls.
Also it solves the bpftool issue.
For all other archs we can keep
__bpf_call_base_for_jited_progs == __bpf_call_base
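
As a sketch, an arch JIT would then recover the callee address with
(names as proposed in this thread, nothing merged):

	/* JIT side of the proposed scheme */
	u64 callee = (u64)__bpf_call_base_for_jited_progs + insn->imm;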

>    There is a third option we can consider:
>    c. Convert BPF pseudo call instruction into a 2-instruction sequence
>    (similar to BPF_DW) and encode the full 64-bit call target in the
> second bpf instruction. To distinguish this from other instruction
> forms, we can set imm32 to -1.

Adding a new instruction just for that case looks like overkill.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC][PATCH bpf] tools: bpftool: Fix tags for bpf-to-bpf calls
     [not found]                   ` <415b415e-f47f-082c-1bc9-87d3e9d3aed1__9575.16645216874$1520270545$gmane$org@fb.com>
@ 2018-05-03 15:20                       ` Naveen N. Rao
  0 siblings, 0 replies; 24+ messages in thread
From: Naveen N. Rao @ 2018-05-03 15:20 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Sandipan Das
  Cc: jakub.kicinski, linuxppc-dev, netdev, Michael Ellerman

Alexei Starovoitov wrote:
> On 3/1/18 12:51 AM, Naveen N. Rao wrote:
>> Daniel Borkmann wrote:
>>>
>>> Worst case if there's nothing better, potentially what one could do in
>>> bpf_prog_get_info_by_fd() is to dump an array of full addresses and
>>> have the imm part as the index pointing to one of them, just unfortunate
>>> that it's likely only needed in ppc64.
>>
>> Ok. We seem to have discussed a few different aspects in this thread.
>> Let me summarize the different aspects we have discussed:
>> 1. Passing address of JIT'ed function to the JIT engines:
>>    Two approaches discussed:
>>    a. Existing approach, where the subprog address is encoded as an
>> offset from __bpf_call_base() in imm32 field of the BPF call
>> instruction. This requires the JIT'ed function to be within 2GB of
>> __bpf_call_base(), which won't be true on ppc64, at the least. So,
>> this won't work on ppc64 (and any other architectures where vmalloc'ed
>> (module_alloc()) memory is from a different, far, address range).
> 
> it looks like ppc64 doesn't guarantee today that all of module_alloc()
> will be within 32-bit, but I think it should be trivial to add such
> guarantee. If so, we can define another __bpf_call_base specifically
> for bpf-to-bpf calls when jit is on.

Ok, we prefer not to do that for powerpc (at least, not for all of
module_alloc()) at this point.

And since option (c) below is not preferable, I think we will implement 
what Daniel suggested above. This patchset already handles communicating 
the BPF function addresses to the JIT engine, and enhancing 
bpf_prog_get_info_by_fd() should address the concerns with bpftool.
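
For example, bpftool's pseudo-call printing could then become something
like this (hypothetical: 'jited_addrs' assumes the info dump grows an
address array along the lines Daniel described):

	/* imm as an index into a dumped address array, instead of an
	 * offset from __bpf_call_base */
	unsigned long address = dd->jited_addrs[insn->imm];
	sym = kernel_syms_search(dd, address);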


- Naveen

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread

Thread overview: 24+ messages
2018-02-13  4:05 [RFC][PATCH bpf v2 1/2] bpf: allow 64-bit offsets for bpf function calls Sandipan Das
2018-02-13  4:06 ` [RFC][PATCH bpf v2 2/2] bpf: powerpc64: add JIT support for multi-function programs Sandipan Das
2018-02-15 16:25 ` [RFC][PATCH bpf v2 1/2] bpf: allow 64-bit offsets for bpf function calls Daniel Borkmann
2018-02-15 20:18   ` Daniel Borkmann
2018-02-16 15:50     ` Naveen N. Rao
2018-02-20  9:29       ` Michael Ellerman
2018-02-20 19:22         ` Naveen N. Rao
2018-02-27 12:13           ` [RFC][PATCH bpf] tools: bpftool: Fix tags for bpf-to-bpf calls Sandipan Das
2018-02-27 14:44             ` Daniel Borkmann
2018-03-01  8:51               ` Naveen N. Rao
2018-03-05 17:02                 ` Alexei Starovoitov
     [not found]                 ` <415b415e-f47f-082c-1bc9-87d3e9d3aed1__9575.16645216874$1520270545$gmane$org@fb.com>
     [not found]                   ` <415b415e-f47f-082c-1bc9-87d3e9d3aed1__9575.16645216874$1520270545$gmane$org@fb.com>
2018-05-03 15:20                     ` Naveen N. Rao
2018-02-22 12:06       ` [RFC][PATCH bpf v2 1/2] bpf: allow 64-bit offsets for bpf function calls Michael Holzheu
2018-02-22 12:10         ` Michael Holzheu
