bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC] bpf, x64: allow not-converged images when BPF_JIT_ALWAYS_ON is set
@ 2020-11-13  8:38 Gary Lin
  2020-11-14  1:48 ` Alexei Starovoitov
  0 siblings, 1 reply; 3+ messages in thread
From: Gary Lin @ 2020-11-13  8:38 UTC (permalink / raw)
  Cc: Alexei Starovoitov, Daniel Borkmann, netdev, bpf, andreas.taschner

The x64 bpf jit expects the bpf images converge within the given passes.
However there is a corner case:

  l0:     ldh [4]
  l1:     jeq #0x537d, l2, l40
  l2:     ld [0]
  l3:     jeq #0xfa163e0d, l4, l40
  l4:     ldh [12]
  l5:     ldx #0xe
  l6:     jeq #0x86dd, l41, l7
  l7:     jeq #0x800, l8, l41
  l8:     ld [x+16]
  l9:     ja 41

    [... repeated ja 41 ]

  l40:    ja 41
  l41:    ret #0
  l42:    ld #len
  l43:    ret a

The bpf program contains 32 "ja 41" and do_jit() only removes one "ja 41"
right before "l41:  ret #0" for offset==0 in each pass, so
bpf_int_jit_compile() needs to run do_jit() at least 32 times to
eliminate those JMP instructions. Since the current max number of passes
is 20, the bpf program couldn't converge within 20 passes and got rejected
when BPF_JIT_ALWAYS_ON is set even though it's legit as a classic socket
filter.

A not-converged image may be not optimal but at least the bpf
instructions are translated into x64 machine code. Maybe we could just
issue a warning instead so that the program is still loaded and the user
is also notified.

On the other hand, if the size convergence is mandatory, then it
deserves a document to collect the corner cases so that the user could
know the limitations and how to work around them.

Signed-off-by: Gary Lin <glin@suse.com>
---
 arch/x86/net/bpf_jit_comp.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 796506dcfc42..90814c2daaae 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1972,6 +1972,8 @@ struct x64_jit_data {
 	struct jit_context ctx;
 };
 
+#define MAX_JIT_PASSES 20
+
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 {
 	struct bpf_binary_header *header = NULL;
@@ -2042,7 +2044,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	 * may converge on the last pass. In such case do one more
 	 * pass to emit the final image.
 	 */
-	for (pass = 0; pass < 20 || image; pass++) {
+	for (pass = 0; pass < MAX_JIT_PASSES || image; pass++) {
 		proglen = do_jit(prog, addrs, image, oldproglen, &ctx);
 		if (proglen <= 0) {
 out_image:
@@ -2054,13 +2056,22 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 		}
 		if (image) {
 			if (proglen != oldproglen) {
+#ifdef CONFIG_BPF_JIT_ALWAYS_ON
+				pr_warn("bpf_jit: proglen=%d != oldproglen=%d pass=%d\n",
+					proglen, oldproglen, pass);
+#else
 				pr_err("bpf_jit: proglen=%d != oldproglen=%d\n",
 				       proglen, oldproglen);
 				goto out_image;
+#endif
 			}
 			break;
 		}
+#ifdef CONFIG_BPF_JIT_ALWAYS_ON
+		if (proglen == oldproglen || pass >= (MAX_JIT_PASSES - 1)) {
+#else
 		if (proglen == oldproglen) {
+#endif
 			/*
 			 * The number of entries in extable is the number of BPF_LDX
 			 * insns that access kernel memory via "pointer to BTF type".
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH RFC] bpf, x64: allow not-converged images when BPF_JIT_ALWAYS_ON is set
  2020-11-13  8:38 [PATCH RFC] bpf, x64: allow not-converged images when BPF_JIT_ALWAYS_ON is set Gary Lin
@ 2020-11-14  1:48 ` Alexei Starovoitov
  2020-11-16  6:31   ` Gary Lin
  0 siblings, 1 reply; 3+ messages in thread
From: Alexei Starovoitov @ 2020-11-14  1:48 UTC (permalink / raw)
  To: Gary Lin
  Cc: Alexei Starovoitov, Daniel Borkmann, Network Development, bpf,
	andreas.taschner

On Fri, Nov 13, 2020 at 12:40 AM Gary Lin <glin@suse.com> wrote:
>
> The x64 bpf jit expects the bpf images converge within the given passes.
> However there is a corner case:
>
>   l0:     ldh [4]
>   l1:     jeq #0x537d, l2, l40
>   l2:     ld [0]
>   l3:     jeq #0xfa163e0d, l4, l40
>   l4:     ldh [12]
>   l5:     ldx #0xe
>   l6:     jeq #0x86dd, l41, l7
>   l7:     jeq #0x800, l8, l41
>   l8:     ld [x+16]
>   l9:     ja 41
>
>     [... repeated ja 41 ]
>
>   l40:    ja 41
>   l41:    ret #0
>   l42:    ld #len
>   l43:    ret a
>
> The bpf program contains 32 "ja 41" and do_jit() only removes one "ja 41"
> right before "l41:  ret #0" for offset==0 in each pass, so
> bpf_int_jit_compile() needs to run do_jit() at least 32 times to
> eliminate those JMP instructions. Since the current max number of passes
> is 20, the bpf program couldn't converge within 20 passes and got rejected
> when BPF_JIT_ALWAYS_ON is set even though it's legit as a classic socket
> filter.
>
> A not-converged image may be not optimal but at least the bpf
> instructions are translated into x64 machine code. Maybe we could just
> issue a warning instead so that the program is still loaded and the user
> is also notified.

Non-convergence is not about being optimal. It's about correctness.
If size is different it likely means that at least one jump has the
wrong offset.

Bumping from 20 to 64 also won't solve it.
There could be a case where 64 isn't enough either.
One of the test_bpf.ko tests can hit any limit, iirc.

Also we've seen a case where JIT might never converge.
The iteration N can have size 40, iteration N+1 size 38, iteration N+2 size 40
and keep oscillating like this.

I think the fix could be is to avoid optimality in size when pass
number is getting large.
Like after pass > 10 BPF_JA could always use 32-bit offset regardless
of actual addrs[i + insn->off] - addrs[i]; difference.
There could be other solutions too.

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH RFC] bpf, x64: allow not-converged images when BPF_JIT_ALWAYS_ON is set
  2020-11-14  1:48 ` Alexei Starovoitov
@ 2020-11-16  6:31   ` Gary Lin
  0 siblings, 0 replies; 3+ messages in thread
From: Gary Lin @ 2020-11-16  6:31 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Network Development, bpf,
	andreas.taschner

On Fri, Nov 13, 2020 at 05:48:31PM -0800, Alexei Starovoitov wrote:
> On Fri, Nov 13, 2020 at 12:40 AM Gary Lin <glin@suse.com> wrote:
> >
> > The x64 bpf jit expects the bpf images converge within the given passes.
> > However there is a corner case:
> >
> >   l0:     ldh [4]
> >   l1:     jeq #0x537d, l2, l40
> >   l2:     ld [0]
> >   l3:     jeq #0xfa163e0d, l4, l40
> >   l4:     ldh [12]
> >   l5:     ldx #0xe
> >   l6:     jeq #0x86dd, l41, l7
> >   l7:     jeq #0x800, l8, l41
> >   l8:     ld [x+16]
> >   l9:     ja 41
> >
> >     [... repeated ja 41 ]
> >
> >   l40:    ja 41
> >   l41:    ret #0
> >   l42:    ld #len
> >   l43:    ret a
> >
> > The bpf program contains 32 "ja 41" and do_jit() only removes one "ja 41"
> > right before "l41:  ret #0" for offset==0 in each pass, so
> > bpf_int_jit_compile() needs to run do_jit() at least 32 times to
> > eliminate those JMP instructions. Since the current max number of passes
> > is 20, the bpf program couldn't converge within 20 passes and got rejected
> > when BPF_JIT_ALWAYS_ON is set even though it's legit as a classic socket
> > filter.
> >
> > A not-converged image may be not optimal but at least the bpf
> > instructions are translated into x64 machine code. Maybe we could just
> > issue a warning instead so that the program is still loaded and the user
> > is also notified.
> 
> Non-convergence is not about being optimal. It's about correctness.
> If size is different it likely means that at least one jump has the
> wrong offset.
> 
Ah, I see.

> Bumping from 20 to 64 also won't solve it.
> There could be a case where 64 isn't enough either.
True. Increasing the number of passes is just a workaround.

> One of the test_bpf.ko tests can hit any limit, iirc.
Thanks for the pointer. Will look into the tests.

>
> Also we've seen a case where JIT might never converge.
> The iteration N can have size 40, iteration N+1 size 38, iteration N+2 size 40
> and keep oscillating like this.
> 
> I think the fix could be is to avoid optimality in size when pass
> number is getting large.
> Like after pass > 10 BPF_JA could always use 32-bit offset regardless
> of actual addrs[i + insn->off] - addrs[i]; difference.
> There could be other solutions too.
> 
So the size convergence can be ignored if all BPF_JMPs are translated
into 32-bit offset jmp?

Thanks,

Gary Lin


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2020-11-16  6:32 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-13  8:38 [PATCH RFC] bpf, x64: allow not-converged images when BPF_JIT_ALWAYS_ON is set Gary Lin
2020-11-14  1:48 ` Alexei Starovoitov
2020-11-16  6:31   ` Gary Lin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).