bpf.vger.kernel.org archive mirror
* [PATCH] bpf, x64: bump the number of passes to 64
@ 2020-12-03  9:12 Gary Lin
  2020-12-03 11:20 ` Eric Dumazet
  0 siblings, 1 reply; 5+ messages in thread
From: Gary Lin @ 2020-12-03  9:12 UTC (permalink / raw)
  To: netdev, bpf; +Cc: Alexei Starovoitov, Daniel Borkmann, andreas.taschner

The x64 bpf jit expects bpf images to converge within the given number of
passes, but it can fail to do so in some corner cases. For example:

      l0:     ldh [4]
      l1:     jeq #0x537d, l2, l40
      l2:     ld [0]
      l3:     jeq #0xfa163e0d, l4, l40
      l4:     ldh [12]
      l5:     ldx #0xe
      l6:     jeq #0x86dd, l41, l7
      l8:     ld [x+16]
      l9:     ja 41

        [... repeated ja 41 ]

      l40:    ja 41
      l41:    ret #0
      l42:    ld #len
      l43:    ret a

This bpf program contains 32 "ja 41" instructions which are effectively
NOPs and are designed to be replaced with valid code dynamically.
Ideally, the bpf jit should optimize those "ja 41" instructions out when
translating the bpf instructions into x86_64 machine code. However,
do_jit() can only remove one "ja 41" for offset==0 on each pass, so it
requires at least 32 passes to eliminate those JMPs, which exceeds the
current pass limit (20). In the end, the program gets rejected when
BPF_JIT_ALWAYS_ON is set, even though it is a legitimate classic socket
filter.

Since programs of this kind are usually handcrafted rather than
generated by LLVM, they tend to be small. To avoid increasing the
complexity of the BPF JIT, this commit just bumps the number of passes
to 64, as suggested by Daniel, to make failures on such cases less
likely.

Signed-off-by: Gary Lin <glin@suse.com>
---
 arch/x86/net/bpf_jit_comp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 796506dcfc42..43cc80387548 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -2042,7 +2042,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	 * may converge on the last pass. In such case do one more
 	 * pass to emit the final image.
 	 */
-	for (pass = 0; pass < 20 || image; pass++) {
+	for (pass = 0; pass < 64 || image; pass++) {
 		proglen = do_jit(prog, addrs, image, oldproglen, &ctx);
 		if (proglen <= 0) {
 out_image:
-- 
2.28.0



* Re: [PATCH] bpf, x64: bump the number of passes to 64
  2020-12-03  9:12 [PATCH] bpf, x64: bump the number of passes to 64 Gary Lin
@ 2020-12-03 11:20 ` Eric Dumazet
  2020-12-03 18:14   ` Alexei Starovoitov
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Dumazet @ 2020-12-03 11:20 UTC (permalink / raw)
  To: Gary Lin, netdev, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, andreas.taschner



On 12/3/20 10:12 AM, Gary Lin wrote:
> [...]
> Since programs of this kind are usually handcrafted rather than
> generated by LLVM, they tend to be small. To avoid increasing the
> complexity of the BPF JIT, this commit just bumps the number of passes
> to 64, as suggested by Daniel, to make failures on such cases less
> likely.

Another idea would be to stop trying to reduce the size of the generated
code after a given number of passes have been attempted.

Because even a limit of 64 won't ensure that all 'valid' programs can be
JITed.

> [...]


* Re: [PATCH] bpf, x64: bump the number of passes to 64
  2020-12-03 11:20 ` Eric Dumazet
@ 2020-12-03 18:14   ` Alexei Starovoitov
  2020-12-04  3:42     ` Gary Lin
  0 siblings, 1 reply; 5+ messages in thread
From: Alexei Starovoitov @ 2020-12-03 18:14 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Gary Lin, netdev, bpf, Alexei Starovoitov, Daniel Borkmann,
	andreas.taschner

On Thu, Dec 03, 2020 at 12:20:38PM +0100, Eric Dumazet wrote:
> 
> 
> On 12/3/20 10:12 AM, Gary Lin wrote:
> > [...]
> 
> Another idea would be to stop trying to reduce the size of the generated
> code after a given number of passes have been attempted.
> 
> Because even a limit of 64 won't ensure that all 'valid' programs can be
> JITed.

+1.
Bumping the limit is not solving anything.
It only allows bad actors to force the kernel to spend more time in the JIT.
If we're holding locks, the longer looping may cause issues.
I think the JIT is parallel enough, but it's still a concern.

I wonder how assemblers deal with this?
They probably face the same issue.

Instead of going back to 32-bit jumps and suddenly increasing the image
size, I think we can do nop padding instead.
After a few loops every insn is more or less optimal.
I think the fix could be something like:
  if (is_imm8(jmp_offset)) {
       EMIT2(jmp_cond, jmp_offset);
       if (loop_cnt > 5) {
          EMIT N nops
          where N = addrs[i] - addrs[i - 1]; // not sure about this math.
          N can be 0 or 4 here.
          // or may be NOPs should be emitted before EMIT2.
          // need to think it through
       }
  }
Will something like this work?
I think that's what you're suggesting, right?


* Re: [PATCH] bpf, x64: bump the number of passes to 64
  2020-12-03 18:14   ` Alexei Starovoitov
@ 2020-12-04  3:42     ` Gary Lin
  2020-12-04 10:15       ` Gary Lin
  0 siblings, 1 reply; 5+ messages in thread
From: Gary Lin @ 2020-12-04  3:42 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric Dumazet, netdev, bpf, Alexei Starovoitov, Daniel Borkmann,
	andreas.taschner

On Thu, Dec 03, 2020 at 10:14:31AM -0800, Alexei Starovoitov wrote:
> On Thu, Dec 03, 2020 at 12:20:38PM +0100, Eric Dumazet wrote:
> > [...]
> [...]
> I think the fix could be something like:
>   if (is_imm8(jmp_offset)) {
>        EMIT2(jmp_cond, jmp_offset);
>        if (loop_cnt > 5) {
>           EMIT N nops
>           where N = addrs[i] - addrs[i - 1]; // not sure about this math.
>           N can be 0 or 4 here.
>           // or may be NOPs should be emitted before EMIT2.
>           // need to think it through
>        }
>   }
This looks promising. Once we switch to nop padding, the image is likely
to converge soon. Maybe we can postpone the padding to the last 5 passes
so that do_jit() could optimize the image a bit more.

> Will something like this work?
> I think that's what you're suggesting, right?
> 
Besides nop padding, the optimization for 0-offset jumps also has to be
disabled, since it's actually the one causing the image to shrink in my
case.

Gary Lin



* Re: [PATCH] bpf, x64: bump the number of passes to 64
  2020-12-04  3:42     ` Gary Lin
@ 2020-12-04 10:15       ` Gary Lin
  0 siblings, 0 replies; 5+ messages in thread
From: Gary Lin @ 2020-12-04 10:15 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric Dumazet, netdev, bpf, Alexei Starovoitov, Daniel Borkmann,
	andreas.taschner

On Fri, Dec 04, 2020 at 11:42:13AM +0800, Gary Lin wrote:
> [...]
> Besides nop padding, the optimization for 0-offset jumps also has to be
> disabled, since it's actually the one causing the image to shrink in my
> case.

Here is my testing patch. My sample program was accepted with this patch
applied; I haven't done further testing yet.

---
 arch/x86/net/bpf_jit_comp.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 796506dcfc42..6a39c5ba6383 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -790,7 +790,7 @@ static void detect_reg_usage(struct bpf_insn *insn, int insn_cnt,
 }
 
 static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
-		  int oldproglen, struct jit_context *ctx)
+		  int oldproglen, struct jit_context *ctx, bool ja_padding)
 {
 	bool tail_call_reachable = bpf_prog->aux->tail_call_reachable;
 	struct bpf_insn *insn = bpf_prog->insnsi;
@@ -800,6 +800,7 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
 	bool seen_exit = false;
 	u8 temp[BPF_MAX_INSN_SIZE + BPF_INSN_SAFETY];
 	int i, cnt = 0, excnt = 0;
+	int p;
 	int proglen = 0;
 	u8 *prog = temp;
 
@@ -1410,6 +1411,11 @@ xadd:			if (is_imm8(insn->off))
 			jmp_offset = addrs[i + insn->off] - addrs[i];
 			if (is_imm8(jmp_offset)) {
 				EMIT2(jmp_cond, jmp_offset);
+				ilen = prog - temp;
+				if (ja_padding && (addrs[i] - addrs[i-1]) > ilen) {
+					for (p = 0; p < 4; p++)
+						EMIT1(0x90);
+				}
 			} else if (is_simm32(jmp_offset)) {
 				EMIT2_off32(0x0F, jmp_cond + 0x10, jmp_offset);
 			} else {
@@ -1431,12 +1437,17 @@ xadd:			if (is_imm8(insn->off))
 			else
 				jmp_offset = addrs[i + insn->off] - addrs[i];
 
-			if (!jmp_offset)
+			if (!jmp_offset && !ja_padding)
 				/* Optimize out nop jumps */
 				break;
 emit_jmp:
 			if (is_imm8(jmp_offset)) {
 				EMIT2(0xEB, jmp_offset);
+				ilen = prog - temp;
+				if (ja_padding && (addrs[i] - addrs[i-1]) > ilen) {
+					for (p = 0; p < 4; p++)
+						EMIT1(0x90);
+				}
 			} else if (is_simm32(jmp_offset)) {
 				EMIT1_off32(0xE9, jmp_offset);
 			} else {
@@ -1972,6 +1983,9 @@ struct x64_jit_data {
 	struct jit_context ctx;
 };
 
+#define MAX_PASSES 20
+#define PADDING_PASSES (MAX_PASSES - 5)
+
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 {
 	struct bpf_binary_header *header = NULL;
@@ -2043,7 +2057,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	 * pass to emit the final image.
 	 */
 	for (pass = 0; pass < 20 || image; pass++) {
-		proglen = do_jit(prog, addrs, image, oldproglen, &ctx);
+		if (pass < PADDING_PASSES)
+			proglen = do_jit(prog, addrs, image, oldproglen, &ctx, false);
+		else
+			proglen = do_jit(prog, addrs, image, oldproglen, &ctx, true);
 		if (proglen <= 0) {
 out_image:
 			image = NULL;


