* [PATCH 1/3] powerpc: bpf: remove redundant check for non-null image
@ 2017-01-13 17:10 Naveen N. Rao
  2017-01-13 17:10 ` [PATCH 2/3] powerpc: bpf: flush the entire JIT buffer Naveen N. Rao
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Naveen N. Rao @ 2017-01-13 17:10 UTC (permalink / raw)
  To: mpe; +Cc: linuxppc-dev, netdev, ast, daniel, davem

From: Daniel Borkmann <daniel@iogearbox.net>

We have a check earlier to ensure we don't proceed if image is NULL. As
such, the redundant check can be removed.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
[Added similar changes for classic BPF JIT]
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 arch/powerpc/net/bpf_jit_comp.c   | 17 +++++++++--------
 arch/powerpc/net/bpf_jit_comp64.c | 16 ++++++++--------
 2 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp.c b/arch/powerpc/net/bpf_jit_comp.c
index 7e706f3..f9941b3 100644
--- a/arch/powerpc/net/bpf_jit_comp.c
+++ b/arch/powerpc/net/bpf_jit_comp.c
@@ -662,16 +662,17 @@ void bpf_jit_compile(struct bpf_prog *fp)
 		 */
 		bpf_jit_dump(flen, proglen, pass, code_base);
 
-	if (image) {
-		bpf_flush_icache(code_base, code_base + (proglen/4));
+	bpf_flush_icache(code_base, code_base + (proglen/4));
+
 #ifdef CONFIG_PPC64
-		/* Function descriptor nastiness: Address + TOC */
-		((u64 *)image)[0] = (u64)code_base;
-		((u64 *)image)[1] = local_paca->kernel_toc;
+	/* Function descriptor nastiness: Address + TOC */
+	((u64 *)image)[0] = (u64)code_base;
+	((u64 *)image)[1] = local_paca->kernel_toc;
 #endif
-		fp->bpf_func = (void *)image;
-		fp->jited = 1;
-	}
+
+	fp->bpf_func = (void *)image;
+	fp->jited = 1;
+
 out:
 	kfree(addrs);
 	return;
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 0fe98a5..89b6a86 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -1046,16 +1046,16 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		 */
 		bpf_jit_dump(flen, proglen, pass, code_base);
 
-	if (image) {
-		bpf_flush_icache(bpf_hdr, image + alloclen);
+	bpf_flush_icache(bpf_hdr, image + alloclen);
+
 #ifdef PPC64_ELF_ABI_v1
-		/* Function descriptor nastiness: Address + TOC */
-		((u64 *)image)[0] = (u64)code_base;
-		((u64 *)image)[1] = local_paca->kernel_toc;
+	/* Function descriptor nastiness: Address + TOC */
+	((u64 *)image)[0] = (u64)code_base;
+	((u64 *)image)[1] = local_paca->kernel_toc;
 #endif
-		fp->bpf_func = (void *)image;
-		fp->jited = 1;
-	}
+
+	fp->bpf_func = (void *)image;
+	fp->jited = 1;
 
 out:
 	kfree(addrs);
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/3] powerpc: bpf: flush the entire JIT buffer
  2017-01-13 17:10 [PATCH 1/3] powerpc: bpf: remove redundant check for non-null image Naveen N. Rao
@ 2017-01-13 17:10 ` Naveen N. Rao
  2017-01-13 20:10   ` Alexei Starovoitov
                     ` (2 more replies)
  2017-01-13 17:10 ` [PATCH 3/3] powerpc: bpf: implement in-register swap for 64-bit endian operations Naveen N. Rao
                   ` (3 subsequent siblings)
  4 siblings, 3 replies; 19+ messages in thread
From: Naveen N. Rao @ 2017-01-13 17:10 UTC (permalink / raw)
  To: mpe; +Cc: linuxppc-dev, netdev, ast, daniel, davem

With bpf_jit_binary_alloc(), we allocate at a page granularity and fill
the rest of the space with illegal instructions to mitigate BPF spraying
attacks, while having the actual JIT'ed BPF program at a random location
within the allocated space. Under this scenario, it would be better to
flush the entire allocated buffer rather than just the part containing
the actual program. We already flush the buffer from start to the end of
the BPF program. Extend this to include the illegal instructions after
the BPF program.
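
The widened range can be modelled in plain C. This is a minimal sketch, not kernel code: the struct below is a stand-in for the kernel's struct bpf_binary_header (the real one has more fields), and PAGE_SIZE is assumed to be 4096.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

/* Stand-in for the kernel's struct bpf_binary_header; only the
 * page count matters for computing the flush range. */
struct bpf_binary_header {
	unsigned int pages;
};

/* End of the range flushed after this patch: the whole page-granular
 * allocation, including the trailing illegal instructions, rather
 * than just [image, image + alloclen). */
static uintptr_t flush_end(const struct bpf_binary_header *hdr)
{
	return (uintptr_t)hdr + (uintptr_t)hdr->pages * PAGE_SIZE;
}
```

With a two-page allocation, the flushed range covers 8192 bytes from the header, regardless of where the JIT'ed program starts inside it.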

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 arch/powerpc/net/bpf_jit_comp64.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 89b6a86..1e313db 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -1046,8 +1046,6 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 		 */
 		bpf_jit_dump(flen, proglen, pass, code_base);
 
-	bpf_flush_icache(bpf_hdr, image + alloclen);
-
 #ifdef PPC64_ELF_ABI_v1
 	/* Function descriptor nastiness: Address + TOC */
 	((u64 *)image)[0] = (u64)code_base;
@@ -1057,6 +1055,8 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 	fp->bpf_func = (void *)image;
 	fp->jited = 1;
 
+	bpf_flush_icache(bpf_hdr, (u8 *)bpf_hdr + (bpf_hdr->pages * PAGE_SIZE));
+
 out:
 	kfree(addrs);
 
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/3] powerpc: bpf: implement in-register swap for 64-bit endian operations
  2017-01-13 17:10 [PATCH 1/3] powerpc: bpf: remove redundant check for non-null image Naveen N. Rao
  2017-01-13 17:10 ` [PATCH 2/3] powerpc: bpf: flush the entire JIT buffer Naveen N. Rao
@ 2017-01-13 17:10 ` Naveen N. Rao
  2017-01-13 17:17     ` David Laight
  2017-01-13 20:09 ` [PATCH 1/3] powerpc: bpf: remove redundant check for non-null image Alexei Starovoitov
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 19+ messages in thread
From: Naveen N. Rao @ 2017-01-13 17:10 UTC (permalink / raw)
  To: mpe; +Cc: linuxppc-dev, netdev, ast, daniel, davem

Generate instructions to perform the endian conversion using registers,
rather than generating two memory accesses.

The "way easier and faster" comment was obviously for the author, not
the processor.
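
The generated sequence splits the 64-bit value into two words, byte-reverses each word with a rotate-and-insert triplet, and merges the halves with the words exchanged. The following C sketch models that dataflow; it is an illustration of the rotate/mask pattern in the patch, with register temporaries modelled as locals, not a transcription of the JIT itself.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t x, unsigned n)
{
	return (x << n) | (x >> (32 - n));
}

/* 32-bit byte reverse via rotate-and-insert, mirroring the
 * rlwinm/rlwimi triplet emitted per word (PPC bit 0 is the MSB,
 * so mask bits 0-7 are 0xff000000 and bits 16-23 are 0x0000ff00). */
static uint32_t bswap32_rot(uint32_t x)
{
	uint32_t r = rotl32(x, 8);                              /* rlwinm ...,8,0,31  */
	r = (r & ~0xff000000u) | (rotl32(x, 24) & 0xff000000u); /* rlwimi ...,24,0,7  */
	r = (r & ~0x0000ff00u) | (rotl32(x, 24) & 0x0000ff00u); /* rlwimi ...,24,16,23 */
	return r;
}

/* 64-bit swap: extract the two words (rldicl), byte-reverse each,
 * then merge with the halves exchanged (rldicr + or). */
static uint64_t bswap64_rot(uint64_t v)
{
	uint32_t hi = (uint32_t)(v >> 32);
	uint32_t lo = (uint32_t)v;

	return ((uint64_t)bswap32_rot(lo) << 32) | bswap32_rot(hi);
}
```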

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 arch/powerpc/net/bpf_jit_comp64.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 1e313db..0413a89 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -599,16 +599,22 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
 				break;
 			case 64:
 				/*
-				 * Way easier and faster(?) to store the value
-				 * into stack and then use ldbrx
+				 * We'll split it up into two words, swap those
+				 * independently and then merge them back.
 				 *
-				 * ctx->seen will be reliable in pass2, but
-				 * the instructions generated will remain the
-				 * same across all passes
+				 * First up, let's swap the most-significant word.
 				 */
-				PPC_STD(dst_reg, 1, bpf_jit_stack_local(ctx));
-				PPC_ADDI(b2p[TMP_REG_1], 1, bpf_jit_stack_local(ctx));
-				PPC_LDBRX(dst_reg, 0, b2p[TMP_REG_1]);
+				PPC_RLDICL(b2p[TMP_REG_1], dst_reg, 32, 32);
+				PPC_RLWINM(b2p[TMP_REG_2], b2p[TMP_REG_1], 8, 0, 31);
+				PPC_RLWIMI(b2p[TMP_REG_2], b2p[TMP_REG_1], 24, 0, 7);
+				PPC_RLWIMI(b2p[TMP_REG_2], b2p[TMP_REG_1], 24, 16, 23);
+				/* Then, the second half */
+				PPC_RLWINM(b2p[TMP_REG_1], dst_reg, 8, 0, 31);
+				PPC_RLWIMI(b2p[TMP_REG_1], dst_reg, 24, 0, 7);
+				PPC_RLWIMI(b2p[TMP_REG_1], dst_reg, 24, 16, 23);
+				/* Merge back */
+				PPC_RLDICR(dst_reg, b2p[TMP_REG_1], 32, 31);
+				PPC_OR(dst_reg, dst_reg, b2p[TMP_REG_2]);
 				break;
 			}
 			break;
-- 
2.10.2

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* RE: [PATCH 3/3] powerpc: bpf: implement in-register swap for 64-bit endian operations
  2017-01-13 17:10 ` [PATCH 3/3] powerpc: bpf: implement in-register swap for 64-bit endian operations Naveen N. Rao
@ 2017-01-13 17:17     ` David Laight
  0 siblings, 0 replies; 19+ messages in thread
From: David Laight @ 2017-01-13 17:17 UTC (permalink / raw)
  To: 'Naveen N. Rao', mpe; +Cc: linuxppc-dev, netdev, ast, daniel, davem

From: Naveen N. Rao
> Sent: 13 January 2017 17:10
> Generate instructions to perform the endian conversion using registers,
> rather than generating two memory accesses.
> 
> The "way easier and faster" comment was obviously for the author, not
> the processor.

That rather depends on whether the processor has a store to load forwarder
that will satisfy the read from the store buffer.
I don't know about ppc, but at least some x86 will do that.

	David

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] powerpc: bpf: implement in-register swap for 64-bit endian operations
  2017-01-13 17:17     ` David Laight
  (?)
@ 2017-01-13 17:52     ` 'Naveen N. Rao'
  2017-01-15 15:00       ` Benjamin Herrenschmidt
  -1 siblings, 1 reply; 19+ messages in thread
From: 'Naveen N. Rao' @ 2017-01-13 17:52 UTC (permalink / raw)
  To: David Laight; +Cc: mpe, linuxppc-dev, netdev, ast, daniel, davem

On 2017/01/13 05:17PM, David Laight wrote:
> From: Naveen N. Rao
> > Sent: 13 January 2017 17:10
> > Generate instructions to perform the endian conversion using registers,
> > rather than generating two memory accesses.
> > 
> > The "way easier and faster" comment was obviously for the author, not
> > the processor.
> 
> That rather depends on whether the processor has a store to load forwarder
> that will satisfy the read from the store buffer.
> I don't know about ppc, but at least some x86 will do that.

Interesting - good to know that.

However, I don't think powerpc does that and in-register swap is likely 
faster regardless. Note also that gcc prefers this form at higher 
optimization levels.

Thanks,
Naveen

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] powerpc: bpf: remove redundant check for non-null image
  2017-01-13 17:10 [PATCH 1/3] powerpc: bpf: remove redundant check for non-null image Naveen N. Rao
  2017-01-13 17:10 ` [PATCH 2/3] powerpc: bpf: flush the entire JIT buffer Naveen N. Rao
  2017-01-13 17:10 ` [PATCH 3/3] powerpc: bpf: implement in-register swap for 64-bit endian operations Naveen N. Rao
@ 2017-01-13 20:09 ` Alexei Starovoitov
  2017-01-16 18:38 ` David Miller
  2017-01-27  0:40 ` [1/3] " Michael Ellerman
  4 siblings, 0 replies; 19+ messages in thread
From: Alexei Starovoitov @ 2017-01-13 20:09 UTC (permalink / raw)
  To: Naveen N. Rao; +Cc: mpe, linuxppc-dev, netdev, ast, daniel, davem

On Fri, Jan 13, 2017 at 10:40:00PM +0530, Naveen N. Rao wrote:
> From: Daniel Borkmann <daniel@iogearbox.net>
> 
> We have a check earlier to ensure we don't proceed if image is NULL. As
> such, the redundant check can be removed.
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> [Added similar changes for classic BPF JIT]
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

Acked-by: Alexei Starovoitov <ast@kernel.org>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] powerpc: bpf: flush the entire JIT buffer
  2017-01-13 17:10 ` [PATCH 2/3] powerpc: bpf: flush the entire JIT buffer Naveen N. Rao
@ 2017-01-13 20:10   ` Alexei Starovoitov
  2017-01-13 22:55   ` Daniel Borkmann
  2017-01-27  0:40   ` [2/3] " Michael Ellerman
  2 siblings, 0 replies; 19+ messages in thread
From: Alexei Starovoitov @ 2017-01-13 20:10 UTC (permalink / raw)
  To: Naveen N. Rao; +Cc: mpe, linuxppc-dev, netdev, ast, daniel, davem

On Fri, Jan 13, 2017 at 10:40:01PM +0530, Naveen N. Rao wrote:
> With bpf_jit_binary_alloc(), we allocate at a page granularity and fill
> the rest of the space with illegal instructions to mitigate BPF spraying
> attacks, while having the actual JIT'ed BPF program at a random location
> within the allocated space. Under this scenario, it would be better to
> flush the entire allocated buffer rather than just the part containing
> the actual program. We already flush the buffer from start to the end of
> the BPF program. Extend this to include the illegal instructions after
> the BPF program.
> 
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

Acked-by: Alexei Starovoitov <ast@kernel.org>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/3] powerpc: bpf: flush the entire JIT buffer
  2017-01-13 17:10 ` [PATCH 2/3] powerpc: bpf: flush the entire JIT buffer Naveen N. Rao
  2017-01-13 20:10   ` Alexei Starovoitov
@ 2017-01-13 22:55   ` Daniel Borkmann
  2017-01-27  0:40   ` [2/3] " Michael Ellerman
  2 siblings, 0 replies; 19+ messages in thread
From: Daniel Borkmann @ 2017-01-13 22:55 UTC (permalink / raw)
  To: Naveen N. Rao, mpe; +Cc: linuxppc-dev, netdev, ast, davem

On 01/13/2017 06:10 PM, Naveen N. Rao wrote:
> With bpf_jit_binary_alloc(), we allocate at a page granularity and fill
> the rest of the space with illegal instructions to mitigate BPF spraying
> attacks, while having the actual JIT'ed BPF program at a random location
> within the allocated space. Under this scenario, it would be better to
> flush the entire allocated buffer rather than just the part containing
> the actual program. We already flush the buffer from start to the end of
> the BPF program. Extend this to include the illegal instructions after
> the BPF program.
>
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] powerpc: bpf: implement in-register swap for 64-bit endian operations
  2017-01-13 17:52     ` 'Naveen N. Rao'
@ 2017-01-15 15:00       ` Benjamin Herrenschmidt
  2017-01-23 19:22         ` 'Naveen N. Rao'
  0 siblings, 1 reply; 19+ messages in thread
From: Benjamin Herrenschmidt @ 2017-01-15 15:00 UTC (permalink / raw)
  To: 'Naveen N. Rao', David Laight
  Cc: daniel, ast, netdev, linuxppc-dev, davem

On Fri, 2017-01-13 at 23:22 +0530, 'Naveen N. Rao' wrote:
> > That rather depends on whether the processor has a store to load forwarder
> > that will satisfy the read from the store buffer.
> > I don't know about ppc, but at least some x86 will do that.
> 
> Interesting - good to know that.
> 
> However, I don't think powerpc does that and in-register swap is likely 
> faster regardless. Note also that gcc prefers this form at higher 
> optimization levels.

Of course powerpc has a load-store forwarder these days, however, I
wouldn't be surprised if the in-register form was still faster on some
implementations, but this needs to be tested.

Ideally, you'd want to try to "optimize" load+swap or swap+store
though.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] powerpc: bpf: remove redundant check for non-null image
  2017-01-13 17:10 [PATCH 1/3] powerpc: bpf: remove redundant check for non-null image Naveen N. Rao
                   ` (2 preceding siblings ...)
  2017-01-13 20:09 ` [PATCH 1/3] powerpc: bpf: remove redundant check for non-null image Alexei Starovoitov
@ 2017-01-16 18:38 ` David Miller
  2017-01-23 17:14     ` Naveen N. Rao
  2017-01-27  0:40 ` [1/3] " Michael Ellerman
  4 siblings, 1 reply; 19+ messages in thread
From: David Miller @ 2017-01-16 18:38 UTC (permalink / raw)
  To: naveen.n.rao; +Cc: mpe, linuxppc-dev, netdev, ast, daniel


I'm assuming these patches will go via the powerpc tree.

If you want them to go into net-next, I kindly ask that you always
explicitly say so, and furthermore always submit a patch series with
a proper "[PATCH 0/N] ..." header posting.

Thanks.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/3] powerpc: bpf: remove redundant check for non-null image
  2017-01-16 18:38 ` David Miller
@ 2017-01-23 17:14     ` Naveen N. Rao
  0 siblings, 0 replies; 19+ messages in thread
From: Naveen N. Rao @ 2017-01-23 17:14 UTC (permalink / raw)
  To: David Miller; +Cc: ast, linuxppc-dev, daniel, netdev

Hi David,

On 2017/01/16 01:38PM, David Miller wrote:
> 
> I'm assuming these patches will go via the powerpc tree.
> 
> If you want them to go into net-next, I kindly ask that you always
> explicitly say so, and furthermore always submit a patch series with
> a proper "[PATCH 0/N] ..." header posting.

Sure. Sorry for the confusion. I will be more explicit next time.

Thanks,
Naveen

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] powerpc: bpf: implement in-register swap for 64-bit endian operations
  2017-01-15 15:00       ` Benjamin Herrenschmidt
@ 2017-01-23 19:22         ` 'Naveen N. Rao'
  2017-01-24 16:13             ` David Laight
  0 siblings, 1 reply; 19+ messages in thread
From: 'Naveen N. Rao' @ 2017-01-23 19:22 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David Laight, netdev, linuxppc-dev, davem, daniel, ast,
	Madhavan Srinivasan, Michael Ellerman

On 2017/01/15 09:00AM, Benjamin Herrenschmidt wrote:
> On Fri, 2017-01-13 at 23:22 +0530, 'Naveen N. Rao' wrote:
> > > That rather depends on whether the processor has a store to load forwarder
> > > that will satisfy the read from the store buffer.
> > > I don't know about ppc, but at least some x86 will do that.
> > 
> > Interesting - good to know that.
> > 
> > However, I don't think powerpc does that and in-register swap is likely 
> > faster regardless. Note also that gcc prefers this form at higher 
> > optimization levels.
> 
> Of course powerpc has a load-store forwarder these days, however, I
> wouldn't be surprised if the in-register form was still faster on some
> implementations, but this needs to be tested.

Thanks for clarifying! To test this, I wrote a simple (perhaps naive) 
test that just issues a whole lot of endian swaps and in _that_ test, it 
does look like the load-store forwarder is doing pretty well.

The tests:

bpf-bswap.S:
-----------
	.file   "bpf-bswap.S"
        .abiversion 2
        .section        ".text"
        .align 2
        .globl main
        .type   main, @function
main:
        mflr    0
        std     0,16(1)
        stdu    1,-32760(1)
	addi	3,1,32
	li	4,0
	li	5,32720
	li	11,32720
	mulli	11,11,8
	li	10,0
	li	7,16
1:	ldx	6,3,4
	stdx	6,1,7
	ldbrx	6,1,7
	stdx	6,3,4
	addi	4,4,8
	cmpd	4,5
	beq	2f
	b	1b
2:	addi	10,10,1
	li	4,0
	cmpd	10,11
	beq	3f
	b	1b
3:	li	3,0
        addi	1,1,32760
        ld      0,16(1)
	mtlr	0
	blr

bpf-bswap-reg.S:
---------------
	.file   "bpf-bswap-reg.S"
        .abiversion 2
        .section        ".text"
        .align 2
        .globl main
        .type   main, @function
main:
        mflr    0
        std     0,16(1)
        stdu    1,-32760(1)
	addi	3,1,32
	li	4,0
	li	5,32720
	li	11,32720
	mulli	11,11,8
	li	10,0
1:	ldx	6,3,4
	rldicl	7,6,32,32
	rlwinm	8,6,24,0,31
	rlwimi	8,6,8,8,15
	rlwinm	9,7,24,0,31
	rlwimi	8,6,8,24,31
	rlwimi	9,7,8,8,15
	rlwimi	9,7,8,24,31
	rldicr	8,8,32,31
	or	6,8,9
	stdx	6,3,4
	addi	4,4,8
	cmpd	4,5
	beq	2f
	b	1b
2:	addi	10,10,1
	li	4,0
	cmpd	10,11
	beq	3f
	b	1b
3:	li	3,0
        addi	1,1,32760
        ld      0,16(1)
	mtlr	0
	blr

Profiling the two variants:

# perf stat ./bpf-bswap

 Performance counter stats for './bpf-bswap':

       1395.979224      task-clock (msec)         #    0.999 CPUs utilized          
                 0      context-switches          #    0.000 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
                45      page-faults               #    0.032 K/sec                  
     4,651,874,673      cycles                    #    3.332 GHz                      (66.87%)
         3,141,186      stalled-cycles-frontend   #    0.07% frontend cycles idle     (50.57%)
     1,117,289,485      stalled-cycles-backend    #   24.02% backend cycles idle      (50.57%)
     8,565,963,861      instructions              #    1.84  insn per cycle         
                                                  #    0.13  stalled cycles per insn  (67.05%)
     2,174,029,771      branches                  # 1557.351 M/sec                    (49.69%)
           262,656      branch-misses             #    0.01% of all branches          (50.05%)

       1.396893189 seconds time elapsed

# perf stat ./bpf-bswap-reg

 Performance counter stats for './bpf-bswap-reg':

       1819.758102      task-clock (msec)         #    0.999 CPUs utilized          
                 3      context-switches          #    0.002 K/sec                  
                 0      cpu-migrations            #    0.000 K/sec                  
                44      page-faults               #    0.024 K/sec                  
     6,034,777,602      cycles                    #    3.316 GHz                      (66.83%)
         2,010,983      stalled-cycles-frontend   #    0.03% frontend cycles idle     (50.47%)
     1,024,975,759      stalled-cycles-backend    #   16.98% backend cycles idle      (50.52%)
    16,043,732,849      instructions              #    2.66  insn per cycle         
                                                  #    0.06  stalled cycles per insn  (67.01%)
     2,148,710,750      branches                  # 1180.767 M/sec                    (49.57%)
           268,046      branch-misses             #    0.01% of all branches          (49.52%)

       1.821501345 seconds time elapsed


This is all in a POWER8 vm. On POWER7, the in-register variant is around 
4 times faster than the ldbrx variant.

So, yes, unless I've missed something, the ldbrx variant seems to 
perform better than, or at least on par with, the in-register swap 
variant on POWER8.

> 
> Ideally, you'd want to try to "optimize" load+swap or swap+store
> though.

Agreed. This is already the case with BPF for packet access - those use 
skb helpers which issue the appropriate lhbrx/lwbrx/ldbrx. The newer 
BPF_FROM_LE/BPF_FROM_BE are for endian operations with other BPF 
programs.

We can probably implement an extra pass to detect use of endian swap and 
try to match it up with a previous load or a subsequent store though...
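
Such a pass is hypothetical, but the core of it could be as simple as scanning for a load whose result is immediately byte-swapped in place. The sketch below is a toy illustration of that idea only: the opcode names are made up for this example and are not real eBPF encodings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy opcodes for illustration only; not real eBPF encodings. */
enum toy_op { TOY_LDX64, TOY_BSWAP64, TOY_ADD };

struct toy_insn {
	enum toy_op op;
	int dst;
};

/* Mark loads whose destination register is byte-swapped by the very
 * next instruction; a JIT pass could fuse each such pair into a
 * single byte-reversed load (ldbrx). */
static size_t find_fusable_pairs(const struct toy_insn *prog, size_t len,
				 bool *fuse)
{
	size_t n = 0;

	for (size_t i = 0; i < len; i++) {
		fuse[i] = i + 1 < len &&
			  prog[i].op == TOY_LDX64 &&
			  prog[i + 1].op == TOY_BSWAP64 &&
			  prog[i + 1].dst == prog[i].dst;
		if (fuse[i])
			n++;
	}
	return n;
}
```

A real pass would also have to prove the intermediate (pre-swap) value is dead, which is why it belongs in a later analysis pass rather than the straight-line code generator.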

Thanks!
- Naveen

^ permalink raw reply	[flat|nested] 19+ messages in thread

* RE: [PATCH 3/3] powerpc: bpf: implement in-register swap for 64-bit endian operations
  2017-01-23 19:22         ` 'Naveen N. Rao'
@ 2017-01-24 16:13             ` David Laight
  0 siblings, 0 replies; 19+ messages in thread
From: David Laight @ 2017-01-24 16:13 UTC (permalink / raw)
  To: 'Naveen N. Rao', Benjamin Herrenschmidt
  Cc: netdev, linuxppc-dev, davem, daniel, ast, Madhavan Srinivasan,
	Michael Ellerman

From: 'Naveen N. Rao'
> Sent: 23 January 2017 19:22
> On 2017/01/15 09:00AM, Benjamin Herrenschmidt wrote:
> > On Fri, 2017-01-13 at 23:22 +0530, 'Naveen N. Rao' wrote:
> > > > That rather depends on whether the processor has a store to load forwarder
> > > > that will satisfy the read from the store buffer.
> > > > I don't know about ppc, but at least some x86 will do that.
> > >
> > > Interesting - good to know that.
> > >
> > > However, I don't think powerpc does that and in-register swap is likely
> > > faster regardless. Note also that gcc prefers this form at higher
> > > optimization levels.
> >
> > Of course powerpc has a load-store forwarder these days, however, I
> > wouldn't be surprised if the in-register form was still faster on some
> > implementations, but this needs to be tested.
> 
> Thanks for clarifying! To test this, I wrote a simple (perhaps naive)
> test that just issues a whole lot of endian swaps and in _that_ test, it
> does look like the load-store forwarder is doing pretty well.
...
> This is all in a POWER8 vm. On POWER7, the in-register variant is around
> 4 times faster than the ldbrx variant.
...

I wonder which is faster on the little 1GHz embedded ppc we use here.

	David

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] powerpc: bpf: implement in-register swap for 64-bit endian operations
  2017-01-24 16:13             ` David Laight
  (?)
@ 2017-01-24 16:25             ` 'Naveen N. Rao'
  -1 siblings, 0 replies; 19+ messages in thread
From: 'Naveen N. Rao' @ 2017-01-24 16:25 UTC (permalink / raw)
  To: David Laight
  Cc: Benjamin Herrenschmidt, netdev, linuxppc-dev, davem, daniel, ast,
	Madhavan Srinivasan, Michael Ellerman

On 2017/01/24 04:13PM, David Laight wrote:
> From: 'Naveen N. Rao'
> > Sent: 23 January 2017 19:22
> > On 2017/01/15 09:00AM, Benjamin Herrenschmidt wrote:
> > > On Fri, 2017-01-13 at 23:22 +0530, 'Naveen N. Rao' wrote:
> > > > > That rather depends on whether the processor has a store to load forwarder
> > > > > that will satisfy the read from the store buffer.
> > > > > I don't know about ppc, but at least some x86 will do that.
> > > >
> > > > Interesting - good to know that.
> > > >
> > > > However, I don't think powerpc does that and in-register swap is likely
> > > > faster regardless. Note also that gcc prefers this form at higher
> > > > optimization levels.
> > >
> > > Of course powerpc has a load-store forwarder these days, however, I
> > > wouldn't be surprised if the in-register form was still faster on some
> > > implementations, but this needs to be tested.
> > 
> > Thanks for clarifying! To test this, I wrote a simple (perhaps naive)
> > test that just issues a whole lot of endian swaps and in _that_ test, it
> > does look like the load-store forwarder is doing pretty well.
> ...
> > This is all in a POWER8 vm. On POWER7, the in-register variant is around
> > 4 times faster than the ldbrx variant.
> ...
> 
> I wonder which is faster on the little 1GHz embedded ppc we use here.

Worth a test, for sure.
FWIW, this patch won't matter since eBPF JIT is for ppc64.

Thanks,
Naveen

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [2/3] powerpc: bpf: flush the entire JIT buffer
  2017-01-13 17:10 ` [PATCH 2/3] powerpc: bpf: flush the entire JIT buffer Naveen N. Rao
  2017-01-13 20:10   ` Alexei Starovoitov
  2017-01-13 22:55   ` Daniel Borkmann
@ 2017-01-27  0:40   ` Michael Ellerman
  2 siblings, 0 replies; 19+ messages in thread
From: Michael Ellerman @ 2017-01-27  0:40 UTC (permalink / raw)
  To: Naveen N. Rao; +Cc: netdev, linuxppc-dev, davem, daniel, ast

On Fri, 2017-01-13 at 17:10:01 UTC, "Naveen N. Rao" wrote:
> With bpf_jit_binary_alloc(), we allocate at a page granularity and fill
> the rest of the space with illegal instructions to mitigate BPF spraying
> attacks, while having the actual JIT'ed BPF program at a random location
> within the allocated space. Under this scenario, it would be better to
> flush the entire allocated buffer rather than just the part containing
> the actual program. We already flush the buffer from start to the end of
> the BPF program. Extend this to include the illegal instructions after
> the BPF program.
> 
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> Acked-by: Alexei Starovoitov <ast@kernel.org>
> Acked-by: Daniel Borkmann <daniel@iogearbox.net>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/10528b9c45cfb9e8f45217ef2f5ef8

cheers

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [1/3] powerpc: bpf: remove redundant check for non-null image
  2017-01-13 17:10 [PATCH 1/3] powerpc: bpf: remove redundant check for non-null image Naveen N. Rao
                   ` (3 preceding siblings ...)
  2017-01-16 18:38 ` David Miller
@ 2017-01-27  0:40 ` Michael Ellerman
  4 siblings, 0 replies; 19+ messages in thread
From: Michael Ellerman @ 2017-01-27  0:40 UTC (permalink / raw)
  To: Naveen N. Rao; +Cc: netdev, linuxppc-dev, davem, daniel, ast

On Fri, 2017-01-13 at 17:10:00 UTC, "Naveen N. Rao" wrote:
> From: Daniel Borkmann <daniel@iogearbox.net>
> 
> We have a check earlier to ensure we don't proceed if image is NULL. As
> such, the redundant check can be removed.
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> [Added similar changes for classic BPF JIT]
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> Acked-by: Alexei Starovoitov <ast@kernel.org>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/052de33ca4f840bf35587eacdf78b3

cheers

^ permalink raw reply	[flat|nested] 19+ messages in thread
