* [PATCH 0/2] arm64: bpf: add BPF_ST and BPF_XADD instructions support
@ 2015-11-10 22:41 ` Yang Shi
  0 siblings, 0 replies; 103+ messages in thread
From: Yang Shi @ 2015-11-10 22:41 UTC (permalink / raw)
  To: ast, daniel, catalin.marinas, will.deacon
  Cc: zlim.lnx, xi.wang, linux-kernel, netdev, linux-arm-kernel,
	linaro-kernel, yang.shi


The current arm64 BPF JIT doesn't support the store immediate (BPF_ST) and
XADD instructions, and aarch64 has no single native instruction for either,
so they are implemented as instruction sequences. For details, please refer
to the commit logs.

The implementation was tested with the test_bpf kernel module.

The patches apply on top of my BPF JIT stack fix [1].

[1] https://patches.linaro.org/56268/

Yang Shi (2):
      arm64: bpf: add 'store immediate' instruction
      arm64: bpf: add BPF XADD instruction

 arch/arm64/net/bpf_jit_comp.c | 39 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 34 insertions(+), 5 deletions(-)

^ permalink raw reply	[flat|nested] 103+ messages in thread

* [PATCH 1/2] arm64: bpf: add 'store immediate' instruction
  2015-11-10 22:41 ` Yang Shi
@ 2015-11-10 22:41   ` Yang Shi
  -1 siblings, 0 replies; 103+ messages in thread
From: Yang Shi @ 2015-11-10 22:41 UTC (permalink / raw)
  To: ast, daniel, catalin.marinas, will.deacon
  Cc: zlim.lnx, xi.wang, linux-kernel, netdev, linux-arm-kernel,
	linaro-kernel, yang.shi

aarch64 doesn't have a native store-immediate instruction, so the operation
has to be implemented with the following instruction sequence:

Load immediate to register
Store register

Signed-off-by: Yang Shi <yang.shi@linaro.org>
CC: Zi Shen Lim <zlim.lnx@gmail.com>
CC: Xi Wang <xi.wang@gmail.com>
---
 arch/arm64/net/bpf_jit_comp.c | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 6809647..49c1f1b 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -563,7 +563,25 @@ emit_cond_jmp:
 	case BPF_ST | BPF_MEM | BPF_H:
 	case BPF_ST | BPF_MEM | BPF_B:
 	case BPF_ST | BPF_MEM | BPF_DW:
-		goto notyet;
+		/* Load imm to a register then store it */
+		ctx->tmp_used = 1;
+		emit_a64_mov_i(1, tmp2, off, ctx);
+		emit_a64_mov_i(1, tmp, imm, ctx);
+		switch (BPF_SIZE(code)) {
+		case BPF_W:
+			emit(A64_STR32(tmp, dst, tmp2), ctx);
+			break;
+		case BPF_H:
+			emit(A64_STRH(tmp, dst, tmp2), ctx);
+			break;
+		case BPF_B:
+			emit(A64_STRB(tmp, dst, tmp2), ctx);
+			break;
+		case BPF_DW:
+			emit(A64_STR64(tmp, dst, tmp2), ctx);
+			break;
+		}
+		break;
 
 	/* STX: *(size *)(dst + off) = src */
 	case BPF_STX | BPF_MEM | BPF_W:
-- 
2.0.2


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-10 22:41 ` Yang Shi
@ 2015-11-10 22:41   ` Yang Shi
  -1 siblings, 0 replies; 103+ messages in thread
From: Yang Shi @ 2015-11-10 22:41 UTC (permalink / raw)
  To: ast, daniel, catalin.marinas, will.deacon
  Cc: zlim.lnx, xi.wang, linux-kernel, netdev, linux-arm-kernel,
	linaro-kernel, yang.shi

aarch64 doesn't have native support for the XADD instruction, so it is
implemented with the following instruction sequence:

Load (dst + off) to a register
Add src to it
Store it back to (dst + off)

Signed-off-by: Yang Shi <yang.shi@linaro.org>
CC: Zi Shen Lim <zlim.lnx@gmail.com>
CC: Xi Wang <xi.wang@gmail.com>
---
 arch/arm64/net/bpf_jit_comp.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 49c1f1b..0b1d2d3 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -609,7 +609,21 @@ emit_cond_jmp:
 	case BPF_STX | BPF_XADD | BPF_W:
 	/* STX XADD: lock *(u64 *)(dst + off) += src */
 	case BPF_STX | BPF_XADD | BPF_DW:
-		goto notyet;
+		ctx->tmp_used = 1;
+		emit_a64_mov_i(1, tmp2, off, ctx);
+		switch (BPF_SIZE(code)) {
+		case BPF_W:
+			emit(A64_LDR32(tmp, dst, tmp2), ctx);
+			emit(A64_ADD(is64, tmp, tmp, src), ctx);
+			emit(A64_STR32(tmp, dst, tmp2), ctx);
+			break;
+		case BPF_DW:
+			emit(A64_LDR64(tmp, dst, tmp2), ctx);
+			emit(A64_ADD(is64, tmp, tmp, src), ctx);
+			emit(A64_STR64(tmp, dst, tmp2), ctx);
+			break;
+		}
+		break;
 
 	/* R0 = ntohx(*(size *)(((struct sk_buff *)R6)->data + imm)) */
 	case BPF_LD | BPF_ABS | BPF_W:
@@ -679,9 +693,6 @@ emit_cond_jmp:
 		}
 		break;
 	}
-notyet:
-		pr_info_once("*** NOT YET: opcode %02x ***\n", code);
-		return -EFAULT;
 
 	default:
 		pr_err_once("unknown opcode %02x\n", code);
-- 
2.0.2


^ permalink raw reply related	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-10 22:41   ` Yang Shi
@ 2015-11-11  0:08     ` Eric Dumazet
  -1 siblings, 0 replies; 103+ messages in thread
From: Eric Dumazet @ 2015-11-11  0:08 UTC (permalink / raw)
  To: Yang Shi
  Cc: ast, daniel, catalin.marinas, will.deacon, zlim.lnx, xi.wang,
	linux-kernel, netdev, linux-arm-kernel, linaro-kernel

On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
> aarch64 doesn't have native support for XADD instruction, implement it by
> the below instruction sequence:
> 
> Load (dst + off) to a register
> Add src to it
> Store it back to (dst + off)

Not really what is needed?

See this BPF_XADD as an atomic_add() equivalent.



^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11  0:08     ` Eric Dumazet
@ 2015-11-11  0:26       ` Shi, Yang
  -1 siblings, 0 replies; 103+ messages in thread
From: Shi, Yang @ 2015-11-11  0:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: ast, daniel, catalin.marinas, will.deacon, zlim.lnx, xi.wang,
	linux-kernel, netdev, linux-arm-kernel, linaro-kernel

On 11/10/2015 4:08 PM, Eric Dumazet wrote:
> On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
>> aarch64 doesn't have native support for XADD instruction, implement it by
>> the below instruction sequence:
>>
>> Load (dst + off) to a register
>> Add src to it
>> Store it back to (dst + off)
>
> Not really what is needed ?
>
> See this BPF_XADD as an atomic_add() equivalent.

I see, thanks. The documentation doesn't say much about the add being
"exclusive". If so, it should need load-acquire/store-release.

I will rework it.

Yang

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11  0:26       ` Shi, Yang
@ 2015-11-11  0:42         ` Alexei Starovoitov
  -1 siblings, 0 replies; 103+ messages in thread
From: Alexei Starovoitov @ 2015-11-11  0:42 UTC (permalink / raw)
  To: Shi, Yang
  Cc: Eric Dumazet, ast, daniel, catalin.marinas, will.deacon,
	zlim.lnx, xi.wang, linux-kernel, netdev, linux-arm-kernel,
	linaro-kernel

On Tue, Nov 10, 2015 at 04:26:02PM -0800, Shi, Yang wrote:
> On 11/10/2015 4:08 PM, Eric Dumazet wrote:
> >On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
> >>aarch64 doesn't have native support for XADD instruction, implement it by
> >>the below instruction sequence:
> >>
> >>Load (dst + off) to a register
> >>Add src to it
> >>Store it back to (dst + off)
> >
> >Not really what is needed ?
> >
> >See this BPF_XADD as an atomic_add() equivalent.
> 
> I see. Thanks. The documentation doesn't say too much about "exclusive" add.
> If so it should need load-acquire/store-release.

I think the doc is clear enough, but it can always be improved. Please
suggest a patch. It's quite hard to write a test for atomicity in the
test_bpf framework, so code review is the key. Eric, thanks for catching it!


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 1/2] arm64: bpf: add 'store immediate' instruction
  2015-11-10 22:41   ` Yang Shi
@ 2015-11-11  2:45     ` Z Lim
  -1 siblings, 0 replies; 103+ messages in thread
From: Z Lim @ 2015-11-11  2:45 UTC (permalink / raw)
  To: Yang Shi
  Cc: Alexei Starovoitov, daniel, Catalin Marinas, Will Deacon,
	Xi Wang, LKML, Network Development, linux-arm-kernel,
	linaro-kernel

On Tue, Nov 10, 2015 at 2:41 PM, Yang Shi <yang.shi@linaro.org> wrote:
> aarch64 doesn't have native store immediate instruction, such operation

Actually, aarch64 does have "STR (immediate)". For the arm64 JIT, we can
consider using it as an optimization.

You may also want to consider adding a note about the corresponding test cases:
    commit cffc642d93f9 ("test_bpf: add 173 new testcases for eBPF").

Otherwise, the patch below looks good.
Reviewed-by: Zi Shen Lim <zlim.lnx@gmail.com>

> has to be implemented by the below instruction sequence:
>
> Load immediate to register
> Store register
>
> Signed-off-by: Yang Shi <yang.shi@linaro.org>
> CC: Zi Shen Lim <zlim.lnx@gmail.com>
> CC: Xi Wang <xi.wang@gmail.com>
> ---
>  arch/arm64/net/bpf_jit_comp.c | 20 +++++++++++++++++++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> index 6809647..49c1f1b 100644
> --- a/arch/arm64/net/bpf_jit_comp.c
> +++ b/arch/arm64/net/bpf_jit_comp.c
> @@ -563,7 +563,25 @@ emit_cond_jmp:
>         case BPF_ST | BPF_MEM | BPF_H:
>         case BPF_ST | BPF_MEM | BPF_B:
>         case BPF_ST | BPF_MEM | BPF_DW:
> -               goto notyet;
> +               /* Load imm to a register then store it */
> +               ctx->tmp_used = 1;
> +               emit_a64_mov_i(1, tmp2, off, ctx);
> +               emit_a64_mov_i(1, tmp, imm, ctx);
> +               switch (BPF_SIZE(code)) {
> +               case BPF_W:
> +                       emit(A64_STR32(tmp, dst, tmp2), ctx);
> +                       break;
> +               case BPF_H:
> +                       emit(A64_STRH(tmp, dst, tmp2), ctx);
> +                       break;
> +               case BPF_B:
> +                       emit(A64_STRB(tmp, dst, tmp2), ctx);
> +                       break;
> +               case BPF_DW:
> +                       emit(A64_STR64(tmp, dst, tmp2), ctx);
> +                       break;
> +               }
> +               break;
>
>         /* STX: *(size *)(dst + off) = src */
>         case BPF_STX | BPF_MEM | BPF_W:
> --
> 2.0.2
>

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11  0:42         ` Alexei Starovoitov
@ 2015-11-11  2:52           ` Z Lim
  -1 siblings, 0 replies; 103+ messages in thread
From: Z Lim @ 2015-11-11  2:52 UTC (permalink / raw)
  To: Alexei Starovoitov, Shi, Yang
  Cc: Eric Dumazet, Alexei Starovoitov, daniel, Catalin Marinas,
	Will Deacon, Xi Wang, LKML, Network Development,
	linux-arm-kernel, linaro-kernel

Yang,

On Tue, Nov 10, 2015 at 4:42 PM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Tue, Nov 10, 2015 at 04:26:02PM -0800, Shi, Yang wrote:
>> On 11/10/2015 4:08 PM, Eric Dumazet wrote:
>> >On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
>> >>aarch64 doesn't have native support for XADD instruction, implement it by
>> >>the below instruction sequence:

aarch64 supports atomic add in ARMv8.1.
For ARMv8(.0), please consider using an LDXR/STXR sequence.

>> >>
>> >>Load (dst + off) to a register
>> >>Add src to it
>> >>Store it back to (dst + off)
>> >
>> >Not really what is needed ?
>> >
>> >See this BPF_XADD as an atomic_add() equivalent.
>>
>> I see. Thanks. The documentation doesn't say too much about "exclusive" add.
>> If so it should need load-acquire/store-release.
>
> I think doc is clear enough, but it can always be improved. Pls suggest a patch.
> It's quite hard to write a test for atomicity in test_bpf framework, so
> code review is the key. Eric, thanks for catching it!
>

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11  2:52           ` Z Lim
@ 2015-11-11  8:49             ` Arnd Bergmann
  -1 siblings, 0 replies; 103+ messages in thread
From: Arnd Bergmann @ 2015-11-11  8:49 UTC (permalink / raw)
  To: linaro-kernel
  Cc: Z Lim, Alexei Starovoitov, Shi, Yang, Eric Dumazet, daniel,
	Catalin Marinas, Will Deacon, Alexei Starovoitov, LKML,
	linux-arm-kernel, Network Development, Xi Wang

On Tuesday 10 November 2015 18:52:45 Z Lim wrote:
> On Tue, Nov 10, 2015 at 4:42 PM, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> > On Tue, Nov 10, 2015 at 04:26:02PM -0800, Shi, Yang wrote:
> >> On 11/10/2015 4:08 PM, Eric Dumazet wrote:
> >> >On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
> >> >>aarch64 doesn't have native support for XADD instruction, implement it by
> >> >>the below instruction sequence:
> 
> aarch64 supports atomic add in ARMv8.1.
> For ARMv8(.0), please consider using LDXR/STXR sequence.

Is it worth optimizing for the 8.1 case? It would add a bit of complexity
to make the code depend on the CPU feature, but it's certainly doable.

	Arnd

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11  8:49             ` Arnd Bergmann
@ 2015-11-11 10:24               ` Will Deacon
  -1 siblings, 0 replies; 103+ messages in thread
From: Will Deacon @ 2015-11-11 10:24 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linaro-kernel, Z Lim, Alexei Starovoitov, Shi, Yang,
	Eric Dumazet, daniel, Catalin Marinas, Alexei Starovoitov, LKML,
	linux-arm-kernel, Network Development, Xi Wang

On Wed, Nov 11, 2015 at 09:49:48AM +0100, Arnd Bergmann wrote:
> On Tuesday 10 November 2015 18:52:45 Z Lim wrote:
> > On Tue, Nov 10, 2015 at 4:42 PM, Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > > On Tue, Nov 10, 2015 at 04:26:02PM -0800, Shi, Yang wrote:
> > >> On 11/10/2015 4:08 PM, Eric Dumazet wrote:
> > >> >On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
> > >> >>aarch64 doesn't have native support for XADD instruction, implement it by
> > >> >>the below instruction sequence:
> > 
> > aarch64 supports atomic add in ARMv8.1.
> > For ARMv8(.0), please consider using LDXR/STXR sequence.
> 
> Is it worth optimizing for the 8.1 case? It would add a bit of complexity
> to make the code depend on the CPU feature, but it's certainly doable.

What's the atomicity required for? Put another way, what are we racing
with (I thought bpf was single-threaded)? Do we need to worry about
memory barriers?

Apologies if these are stupid questions, but all I could find was
samples/bpf/sock_example.c and it didn't help much :(

Will

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 10:24               ` Will Deacon
  (?)
@ 2015-11-11 10:42                 ` Daniel Borkmann
  -1 siblings, 0 replies; 103+ messages in thread
From: Daniel Borkmann @ 2015-11-11 10:42 UTC (permalink / raw)
  To: Will Deacon, Arnd Bergmann
  Cc: linaro-kernel, Z Lim, Alexei Starovoitov, Shi, Yang,
	Eric Dumazet, Catalin Marinas, Alexei Starovoitov, LKML,
	linux-arm-kernel, Network Development, Xi Wang

On 11/11/2015 11:24 AM, Will Deacon wrote:
> On Wed, Nov 11, 2015 at 09:49:48AM +0100, Arnd Bergmann wrote:
>> On Tuesday 10 November 2015 18:52:45 Z Lim wrote:
>>> On Tue, Nov 10, 2015 at 4:42 PM, Alexei Starovoitov
>>> <alexei.starovoitov@gmail.com> wrote:
>>>> On Tue, Nov 10, 2015 at 04:26:02PM -0800, Shi, Yang wrote:
>>>>> On 11/10/2015 4:08 PM, Eric Dumazet wrote:
>>>>>> On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
>>>>>>> aarch64 doesn't have native support for XADD instruction, implement it by
>>>>>>> the below instruction sequence:
>>>
>>> aarch64 supports atomic add in ARMv8.1.
>>> For ARMv8(.0), please consider using LDXR/STXR sequence.
>>
>> Is it worth optimizing for the 8.1 case? It would add a bit of complexity
>> to make the code depend on the CPU feature, but it's certainly doable.
>
> What's the atomicity required for? Put another way, what are we racing
> with (I thought bpf was single-threaded)? Do we need to worry about
> memory barriers?
>
> Apologies if these are stupid questions, but all I could find was
> samples/bpf/sock_example.c and it didn't help much :(

A more readable equivalent in restricted C syntax (which can be
compiled by llvm) can be found in samples/bpf/sockex1_kern.c. There,
the built-in __sync_fetch_and_add() will be translated into a BPF_XADD
insn variant.

What you can race against is that an eBPF map can be _shared_ by
multiple eBPF programs that are attached somewhere in the system, and
they could all update a particular entry/counter from the map at the
same time.
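For illustration, the property the shared-map case relies on can be
sketched in plain user-space C (a hypothetical example, not the BPF
sample itself): two threads stand in for two eBPF programs bumping the
same map counter, and __sync_fetch_and_add() keeps the read-modify-write
atomic so no increments are lost.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space illustration of why BPF_XADD needs atomicity:
 * two "programs" (threads) update one shared counter concurrently. */

#define ITERS 100000

static long counter; /* stands in for a shared eBPF map entry */

static void *prog(void *arg)
{
	for (int i = 0; i < ITERS; i++)
		__sync_fetch_and_add(&counter, 1);
	return NULL;
}

/* Run two concurrent "programs" against the shared counter and return
 * the final value; with the atomic add, all 2 * ITERS increments survive. */
static long run_shared_update(void)
{
	pthread_t a, b;

	counter = 0;
	pthread_create(&a, NULL, prog, NULL);
	pthread_create(&b, NULL, prog, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return counter;
}
```

With a plain `counter++` instead of the builtin, the final value would
typically come up short, which is exactly the lost-update race described
above.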

Best,
Daniel

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 10:42                 ` Daniel Borkmann
  (?)
@ 2015-11-11 11:58                   ` Will Deacon
  -1 siblings, 0 replies; 103+ messages in thread
From: Will Deacon @ 2015-11-11 11:58 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Arnd Bergmann, Shi, Yang, linaro-kernel, Eric Dumazet, Z Lim,
	Alexei Starovoitov, LKML, Network Development, Xi Wang,
	Catalin Marinas, Alexei Starovoitov, linux-arm-kernel, peterz

Hi Daniel,

On Wed, Nov 11, 2015 at 11:42:11AM +0100, Daniel Borkmann wrote:
> On 11/11/2015 11:24 AM, Will Deacon wrote:
> >On Wed, Nov 11, 2015 at 09:49:48AM +0100, Arnd Bergmann wrote:
> >>On Tuesday 10 November 2015 18:52:45 Z Lim wrote:
> >>>On Tue, Nov 10, 2015 at 4:42 PM, Alexei Starovoitov
> >>><alexei.starovoitov@gmail.com> wrote:
> >>>>On Tue, Nov 10, 2015 at 04:26:02PM -0800, Shi, Yang wrote:
> >>>>>On 11/10/2015 4:08 PM, Eric Dumazet wrote:
> >>>>>>On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
> >>>>>>>aarch64 doesn't have native support for XADD instruction, implement it by
> >>>>>>>the below instruction sequence:
> >>>
> >>>aarch64 supports atomic add in ARMv8.1.
> >>>For ARMv8(.0), please consider using LDXR/STXR sequence.
> >>
> >>Is it worth optimizing for the 8.1 case? It would add a bit of complexity
> >>to make the code depend on the CPU feature, but it's certainly doable.
> >
> >What's the atomicity required for? Put another way, what are we racing
> >with (I thought bpf was single-threaded)? Do we need to worry about
> >memory barriers?
> >
> >Apologies if these are stupid questions, but all I could find was
> >samples/bpf/sock_example.c and it didn't help much :(
> 
> The equivalent code more readable in restricted C syntax (that can be
> compiled by llvm) can be found in samples/bpf/sockex1_kern.c. So the
> built-in __sync_fetch_and_add() will be translated into a BPF_XADD
> insn variant.

Yikes, so the memory-model for BPF is based around the deprecated GCC
__sync builtins, that inherit their semantics from ia64? Any reason not
to use the C11-compatible __atomic builtins[1] as a base?

> What you can race against is that an eBPF map can be _shared_ by
> multiple eBPF programs that are attached somewhere in the system, and
> they could all update a particular entry/counter from the map at the
> same time.

Ok, so it does sound like eBPF needs to define/choose a memory-model and
I worry that riding on the back of __sync isn't necessarily the right
thing to do, particularly as it's fallen out of favour with the compiler
folks. On weakly-ordered architectures, it's also going to result in
heavy-weight barriers for all atomic operations.

Will

[1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
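The contrast Will draws can be sketched with two hypothetical wrappers
(illustrative only): the legacy __sync builtin always implies
sequentially-consistent, full-barrier semantics, while the C11-style
__atomic builtin takes an explicit memory-order argument, so a
weakly-ordered target could use a cheaper relaxed read-modify-write.

```c
/* Hypothetical sketch: same arithmetic, different ordering guarantees. */

static long sync_add(long *p, long v)
{
	/* legacy builtin: atomic add with full-barrier semantics,
	 * returns the old value */
	return __sync_fetch_and_add(p, v);
}

static long relaxed_add(long *p, long v)
{
	/* C11-compatible builtin: atomic add with no ordering beyond
	 * the atomicity itself */
	return __atomic_fetch_add(p, v, __ATOMIC_RELAXED);
}
```

On arm64, the relaxed variant can avoid the heavy-weight barriers that
a full-barrier mapping of __sync would emit around every operation.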

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 1/2] arm64: bpf: add 'store immediate' instruction
  2015-11-11  2:45     ` Z Lim
  (?)
@ 2015-11-11 12:12       ` Will Deacon
  -1 siblings, 0 replies; 103+ messages in thread
From: Will Deacon @ 2015-11-11 12:12 UTC (permalink / raw)
  To: Z Lim
  Cc: Yang Shi, Alexei Starovoitov, daniel, Catalin Marinas, Xi Wang,
	LKML, Network Development, linux-arm-kernel, linaro-kernel

On Tue, Nov 10, 2015 at 06:45:39PM -0800, Z Lim wrote:
> On Tue, Nov 10, 2015 at 2:41 PM, Yang Shi <yang.shi@linaro.org> wrote:
> > aarch64 doesn't have native store immediate instruction, such operation
> 
> Actually, aarch64 does have "STR (immediate)". For arm64 JIT, we can
> consider using it as an optimization.

Yes, I'd definitely like to see that in preference to moving via a
temporary register.

Will

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 11:58                   ` Will Deacon
  (?)
@ 2015-11-11 12:21                     ` Daniel Borkmann
  -1 siblings, 0 replies; 103+ messages in thread
From: Daniel Borkmann @ 2015-11-11 12:21 UTC (permalink / raw)
  To: Will Deacon
  Cc: Arnd Bergmann, Shi, Yang, linaro-kernel, Eric Dumazet, Z Lim,
	Alexei Starovoitov, LKML, Network Development, Xi Wang,
	Catalin Marinas, Alexei Starovoitov, linux-arm-kernel, peterz

On 11/11/2015 12:58 PM, Will Deacon wrote:
> On Wed, Nov 11, 2015 at 11:42:11AM +0100, Daniel Borkmann wrote:
>> On 11/11/2015 11:24 AM, Will Deacon wrote:
>>> On Wed, Nov 11, 2015 at 09:49:48AM +0100, Arnd Bergmann wrote:
>>>> On Tuesday 10 November 2015 18:52:45 Z Lim wrote:
>>>>> On Tue, Nov 10, 2015 at 4:42 PM, Alexei Starovoitov
>>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>>> On Tue, Nov 10, 2015 at 04:26:02PM -0800, Shi, Yang wrote:
>>>>>>> On 11/10/2015 4:08 PM, Eric Dumazet wrote:
>>>>>>>> On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
>>>>>>>>> aarch64 doesn't have native support for XADD instruction, implement it by
>>>>>>>>> the below instruction sequence:
>>>>>
>>>>> aarch64 supports atomic add in ARMv8.1.
>>>>> For ARMv8(.0), please consider using LDXR/STXR sequence.
>>>>
>>>> Is it worth optimizing for the 8.1 case? It would add a bit of complexity
>>>> to make the code depend on the CPU feature, but it's certainly doable.
>>>
>>> What's the atomicity required for? Put another way, what are we racing
>>> with (I thought bpf was single-threaded)? Do we need to worry about
>>> memory barriers?
>>>
>>> Apologies if these are stupid questions, but all I could find was
>>> samples/bpf/sock_example.c and it didn't help much :(
>>
>> The equivalent code more readable in restricted C syntax (that can be
>> compiled by llvm) can be found in samples/bpf/sockex1_kern.c. So the
>> built-in __sync_fetch_and_add() will be translated into a BPF_XADD
>> insn variant.
>
> Yikes, so the memory-model for BPF is based around the deprecated GCC
> __sync builtins, that inherit their semantics from ia64? Any reason not
> to use the C11-compatible __atomic builtins[1] as a base?

Hmm, gcc doesn't have an eBPF compiler backend, so this won't work on
gcc at all. The eBPF backend in LLVM recognizes the __sync_fetch_and_add()
keyword and maps that to a BPF_XADD version (BPF_W or BPF_DW). In the
interpreter (__bpf_prog_run()), as Eric mentioned, this maps to atomic_add()
and atomic64_add(), respectively. So the struct bpf_insn prog[] you saw
from sock_example.c can be regarded as one possible equivalent program
section output from the compiler.

>> What you can race against is that an eBPF map can be _shared_ by
>> multiple eBPF programs that are attached somewhere in the system, and
>> they could all update a particular entry/counter from the map at the
>> same time.
>
> Ok, so it does sound like eBPF needs to define/choose a memory-model and
> I worry that riding on the back of __sync isn't necessarily the right
> thing to do, particularly as it's fallen out of favour with the compiler
> folks. On weakly-ordered architectures, it's also going to result in
> heavy-weight barriers for all atomic operations.
>
> Will
>
> [1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 12:21                     ` Daniel Borkmann
  (?)
@ 2015-11-11 12:38                       ` Will Deacon
  -1 siblings, 0 replies; 103+ messages in thread
From: Will Deacon @ 2015-11-11 12:38 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Arnd Bergmann, Shi, Yang, linaro-kernel, Eric Dumazet, Z Lim,
	Alexei Starovoitov, LKML, Network Development, Xi Wang,
	Catalin Marinas, Alexei Starovoitov, linux-arm-kernel, peterz

On Wed, Nov 11, 2015 at 01:21:04PM +0100, Daniel Borkmann wrote:
> On 11/11/2015 12:58 PM, Will Deacon wrote:
> >On Wed, Nov 11, 2015 at 11:42:11AM +0100, Daniel Borkmann wrote:
> >>On 11/11/2015 11:24 AM, Will Deacon wrote:
> >>>On Wed, Nov 11, 2015 at 09:49:48AM +0100, Arnd Bergmann wrote:
> >>>>On Tuesday 10 November 2015 18:52:45 Z Lim wrote:
> >>>>>On Tue, Nov 10, 2015 at 4:42 PM, Alexei Starovoitov
> >>>>><alexei.starovoitov@gmail.com> wrote:
> >>>>>>On Tue, Nov 10, 2015 at 04:26:02PM -0800, Shi, Yang wrote:
> >>>>>>>On 11/10/2015 4:08 PM, Eric Dumazet wrote:
> >>>>>>>>On Tue, 2015-11-10 at 14:41 -0800, Yang Shi wrote:
> >>>>>>>>>aarch64 doesn't have native support for XADD instruction, implement it by
> >>>>>>>>>the below instruction sequence:
> >>>>>
> >>>>>aarch64 supports atomic add in ARMv8.1.
> >>>>>For ARMv8(.0), please consider using LDXR/STXR sequence.
> >>>>
> >>>>Is it worth optimizing for the 8.1 case? It would add a bit of complexity
> >>>>to make the code depend on the CPU feature, but it's certainly doable.
> >>>
> >>>What's the atomicity required for? Put another way, what are we racing
> >>>with (I thought bpf was single-threaded)? Do we need to worry about
> >>>memory barriers?
> >>>
> >>>Apologies if these are stupid questions, but all I could find was
> >>>samples/bpf/sock_example.c and it didn't help much :(
> >>
> >>The equivalent code more readable in restricted C syntax (that can be
> >>compiled by llvm) can be found in samples/bpf/sockex1_kern.c. So the
> >>built-in __sync_fetch_and_add() will be translated into a BPF_XADD
> >>insn variant.
> >
> >Yikes, so the memory-model for BPF is based around the deprecated GCC
> >__sync builtins, that inherit their semantics from ia64? Any reason not
> >to use the C11-compatible __atomic builtins[1] as a base?
> 
> Hmm, gcc doesn't have an eBPF compiler backend, so this won't work on
> gcc at all. The eBPF backend in LLVM recognizes the __sync_fetch_and_add()
> keyword and maps that to a BPF_XADD version (BPF_W or BPF_DW). In the
> interpreter (__bpf_prog_run()), as Eric mentioned, this maps to atomic_add()
> and atomic64_add(), respectively. So the struct bpf_insn prog[] you saw
> from sock_example.c can be regarded as one possible equivalent program
> section output from the compiler.

Ok, so if I understand you correctly, then __sync_fetch_and_add() has
different semantics depending on the backend target. That seems counter
to the LLVM atomics Documentation:

  http://llvm.org/docs/Atomics.html

which specifically calls out the __sync_* primitives as being
sequentially-consistent and requiring barriers on ARM (which isn't the
case for atomic[64]_add in the kernel).

If we re-use the __sync_* naming scheme in the source language, I don't
think we can overlay our own semantics in the backend. The
__sync_fetch_and_add primitive is also expected to return the old value,
which doesn't appear to be the case for BPF_XADD.

Will

* Re: [PATCH 1/2] arm64: bpf: add 'store immediate' instruction
  2015-11-11 12:12       ` Will Deacon
@ 2015-11-11 12:39         ` Will Deacon
  -1 siblings, 0 replies; 103+ messages in thread
From: Will Deacon @ 2015-11-11 12:39 UTC (permalink / raw)
  To: Z Lim
  Cc: Yang Shi, Alexei Starovoitov, daniel, Catalin Marinas, Xi Wang,
	LKML, Network Development, linux-arm-kernel, linaro-kernel

On Wed, Nov 11, 2015 at 12:12:56PM +0000, Will Deacon wrote:
> On Tue, Nov 10, 2015 at 06:45:39PM -0800, Z Lim wrote:
> > On Tue, Nov 10, 2015 at 2:41 PM, Yang Shi <yang.shi@linaro.org> wrote:
> > > aarch64 doesn't have native store immediate instruction, such operation
> > 
> > Actually, aarch64 does have "STR (immediate)". For arm64 JIT, we can
> > consider using it as an optimization.
> 
> Yes, I'd definitely like to see that in preference to moving via a
> temporary register.

Wait a second, we're both talking rubbish here :) The STR (immediate)
form is referring to the addressing mode, whereas this patch wants to
store an immediate value to memory, which does need moving to a register
first.

So the original patch is fine.

Will

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 12:38                       ` Will Deacon
@ 2015-11-11 12:58                         ` Peter Zijlstra
  -1 siblings, 0 replies; 103+ messages in thread
From: Peter Zijlstra @ 2015-11-11 12:58 UTC (permalink / raw)
  To: Will Deacon
  Cc: Daniel Borkmann, Arnd Bergmann, Shi, Yang, linaro-kernel,
	Eric Dumazet, Z Lim, Alexei Starovoitov, LKML,
	Network Development, Xi Wang, Catalin Marinas,
	Alexei Starovoitov, linux-arm-kernel

On Wed, Nov 11, 2015 at 12:38:31PM +0000, Will Deacon wrote:
> > Hmm, gcc doesn't have an eBPF compiler backend, so this won't work on
> > gcc at all. The eBPF backend in LLVM recognizes the __sync_fetch_and_add()
> > keyword and maps that to a BPF_XADD version (BPF_W or BPF_DW). In the
> > interpreter (__bpf_prog_run()), as Eric mentioned, this maps to atomic_add()
> > and atomic64_add(), respectively. So the struct bpf_insn prog[] you saw
> > from sock_example.c can be regarded as one possible equivalent program
> > section output from the compiler.
> 
> Ok, so if I understand you correctly, then __sync_fetch_and_add() has
> different semantics depending on the backend target. That seems counter
> to the LLVM atomics Documentation:
> 
>   http://llvm.org/docs/Atomics.html
> 
> which specifically calls out the __sync_* primitives as being
> sequentially-consistent and requiring barriers on ARM (which isn't the
> case for atomic[64]_add in the kernel).
> 
> If we re-use the __sync_* naming scheme in the source language, I don't
> think we can overlay our own semantics in the backend. The
> __sync_fetch_and_add primitive is also expected to return the old value,
> which doesn't appear to be the case for BPF_XADD.

Yikes. That's double fail. Please don't do this.

If you use the __sync stuff (and I agree with Will, you should not) it
really _SHOULD_ be sequentially consistent, which means full barriers
all over the place.

And if you name something XADD (exchange and add, or fetch-add) then it
had better return the previous value.

atomic*_add() does neither.

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 12:58                         ` Peter Zijlstra
@ 2015-11-11 15:52                           ` Daniel Borkmann
  -1 siblings, 0 replies; 103+ messages in thread
From: Daniel Borkmann @ 2015-11-11 15:52 UTC (permalink / raw)
  To: Peter Zijlstra, Will Deacon
  Cc: Arnd Bergmann, Shi, Yang, linaro-kernel, Eric Dumazet, Z Lim,
	Alexei Starovoitov, LKML, Network Development, Xi Wang,
	Catalin Marinas, Alexei Starovoitov, linux-arm-kernel, yhs,
	bblanco

On 11/11/2015 01:58 PM, Peter Zijlstra wrote:
> On Wed, Nov 11, 2015 at 12:38:31PM +0000, Will Deacon wrote:
>>> Hmm, gcc doesn't have an eBPF compiler backend, so this won't work on
>>> gcc at all. The eBPF backend in LLVM recognizes the __sync_fetch_and_add()
>>> keyword and maps that to a BPF_XADD version (BPF_W or BPF_DW). In the
>>> interpreter (__bpf_prog_run()), as Eric mentioned, this maps to atomic_add()
>>> and atomic64_add(), respectively. So the struct bpf_insn prog[] you saw
>>> from sock_example.c can be regarded as one possible equivalent program
>>> section output from the compiler.
>>
>> Ok, so if I understand you correctly, then __sync_fetch_and_add() has
>> different semantics depending on the backend target. That seems counter
>> to the LLVM atomics Documentation:
>>
>>    http://llvm.org/docs/Atomics.html
>>
>> which specifically calls out the __sync_* primitives as being
>> sequentially-consistent and requiring barriers on ARM (which isn't the
>> case for atomic[64]_add in the kernel).
>>
>> If we re-use the __sync_* naming scheme in the source language, I don't
>> think we can overlay our own semantics in the backend. The
>> __sync_fetch_and_add primitive is also expected to return the old value,
>> which doesn't appear to be the case for BPF_XADD.
>
> Yikes. That's double fail. Please don't do this.
>
> If you use the __sync stuff (and I agree with Will, you should not) it
> really _SHOULD_ be sequentially consistent, which means full barriers
> all over the place.
>
> And if you name something XADD (exchange and add, or fetch-add) then it
> had better return the previous value.
>
> atomic*_add() does neither.

unsigned int ui;
unsigned long long ull;

void foo(void)
{
   (void) __sync_fetch_and_add(&ui, 1);
   (void) __sync_fetch_and_add(&ull, 1);
}

So the clang front-end translates this snippet into the following
intermediate representation ...

clang test.c -S -emit-llvm -o -
[...]
define void @foo() #0 {
   %1 = atomicrmw add i32* @ui, i32 1 seq_cst
   %2 = atomicrmw add i64* @ull, i64 1 seq_cst
   ret void
}
[...]

... which, if I see this correctly, the BPF target then maps
atomicrmw add {i32,i64} to BPF_XADD, as mentioned:

// Atomics
class XADD<bits<2> SizeOp, string OpcodeStr, PatFrag OpNode>
     : InstBPF<(outs GPR:$dst), (ins MEMri:$addr, GPR:$val),
               !strconcat(OpcodeStr, "\t$dst, $addr, $val"),
               [(set GPR:$dst, (OpNode ADDRri:$addr, GPR:$val))]> {
   bits<3> mode;
   bits<2> size;
   bits<4> src;
   bits<20> addr;

   let Inst{63-61} = mode;
   let Inst{60-59} = size;
   let Inst{51-48} = addr{19-16}; // base reg
   let Inst{55-52} = src;
   let Inst{47-32} = addr{15-0}; // offset

   let mode = 6;     // BPF_XADD
   let size = SizeOp;
   let BPFClass = 3; // BPF_STX
}

let Constraints = "$dst = $val" in {
def XADD32 : XADD<0, "xadd32", atomic_load_add_32>;
def XADD64 : XADD<3, "xadd64", atomic_load_add_64>;
// undefined def XADD16 : XADD<1, "xadd16", atomic_load_add_16>;
// undefined def XADD8  : XADD<2, "xadd8", atomic_load_add_8>;
}

I played a bit around with eBPF code to assign the __sync_fetch_and_add()
return value to a var and dump it to trace pipe, or use it as return code.
llvm compiles it (with the result assignment) and it looks like:

[...]
206: (b7) r3 = 3
207: (db) lock *(u64 *)(r0 +0) += r3
208: (bf) r1 = r10
209: (07) r1 += -16
210: (b7) r2 = 10
211: (85) call 6 // r3 dumped here
[...]

[...]
206: (b7) r5 = 3
207: (db) lock *(u64 *)(r0 +0) += r5
208: (bf) r1 = r10
209: (07) r1 += -16
210: (b7) r2 = 10
211: (b7) r3 = 43
212: (b7) r4 = 42
213: (85) call 6 // r5 dumped here
[...]

[...]
11: (b7) r0 = 3
12: (db) lock *(u64 *)(r1 +0) += r0
13: (95) exit // r0 returned here
[...]

It seems we 'get back' the value being added (== 3 here, in r3, r5, r0),
at least judging by the generated register assignments. Hmm, the
semantic differences of the bpf target should be documented somewhere
so that people writing eBPF programs are aware of them.

Best,
Daniel

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 15:52                           ` Daniel Borkmann
@ 2015-11-11 16:23                             ` Will Deacon
  -1 siblings, 0 replies; 103+ messages in thread
From: Will Deacon @ 2015-11-11 16:23 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Peter Zijlstra, Arnd Bergmann, Shi, Yang, linaro-kernel,
	Eric Dumazet, Z Lim, Alexei Starovoitov, LKML,
	Network Development, Xi Wang, Catalin Marinas,
	Alexei Starovoitov, linux-arm-kernel, yhs, bblanco

Hi Daniel,

Thanks for investigating this further.

On Wed, Nov 11, 2015 at 04:52:00PM +0100, Daniel Borkmann wrote:
> I played a bit around with eBPF code to assign the __sync_fetch_and_add()
> return value to a var and dump it to trace pipe, or use it as return code.
> llvm compiles it (with the result assignment) and it looks like:
> 
> [...]
> 206: (b7) r3 = 3
> 207: (db) lock *(u64 *)(r0 +0) += r3
> 208: (bf) r1 = r10
> 209: (07) r1 += -16
> 210: (b7) r2 = 10
> 211: (85) call 6 // r3 dumped here
> [...]
> 
> [...]
> 206: (b7) r5 = 3
> 207: (db) lock *(u64 *)(r0 +0) += r5
> 208: (bf) r1 = r10
> 209: (07) r1 += -16
> 210: (b7) r2 = 10
> 211: (b7) r3 = 43
> 212: (b7) r4 = 42
> 213: (85) call 6 // r5 dumped here
> [...]
> 
> [...]
> 11: (b7) r0 = 3
> 12: (db) lock *(u64 *)(r1 +0) += r0
> 13: (95) exit // r0 returned here
> [...]
> 
> It seems we 'get back' the value being added (== 3 here, in r3, r5, r0),
> at least judging by the generated register assignments. Hmm, the
> semantic differences of the bpf target should be documented somewhere
> so that people writing eBPF programs are aware of them.

If we're going to document it, a bug tracker might be a good place to
start. The behaviour, as it stands, is broken wrt the definition of the
__sync primitives. That is, there is no way to build __sync_fetch_and_add
out of BPF_XADD without changing its semantics.

We could fix this by either:

(1) Defining BPF_XADD to match __sync_fetch_and_add (including memory
    barriers).

(2) Introducing some new BPF_ atomics, that map to something like the
    C11 __atomic builtins and deprecating BPF_XADD in favour of these.

(3) Introducing new source-language intrinsics to match what BPF can do
    (unlikely to be popular).

As it stands, I'm not especially keen on adding BPF_XADD to the arm64
JIT backend until we have at least (1) and preferably (2) as well.

Will

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 16:23                             ` Will Deacon
  (?)
@ 2015-11-11 17:27                               ` Alexei Starovoitov
  -1 siblings, 0 replies; 103+ messages in thread
From: Alexei Starovoitov @ 2015-11-11 17:27 UTC (permalink / raw)
  To: Will Deacon
  Cc: Daniel Borkmann, Peter Zijlstra, Arnd Bergmann, Shi, Yang,
	linaro-kernel, Eric Dumazet, Z Lim, Alexei Starovoitov, LKML,
	Network Development, Xi Wang, Catalin Marinas, linux-arm-kernel,
	yhs, bblanco

On Wed, Nov 11, 2015 at 04:23:41PM +0000, Will Deacon wrote:
> 
> If we're going to document it, a bug tracker might be a good place to
> start. The behaviour, as it stands, is broken wrt the definition of the
> __sync primitives. That is, there is no way to build __sync_fetch_and_add
> out of BPF_XADD without changing its semantics.

BPF_XADD == atomic_add() in kernel. period.
we are not going to deprecate it or introduce something else.
The semantics of __sync* or atomics in the C standard and/or gcc/llvm have
nothing to do with this.
The arm64 JIT needs to compile the bpf_xadd insn to the equivalent of
atomic_add(), which is 'stadd' in ARMv8.1.
The cpu check can be done by the JIT, and for older cpus it can just fall
back to the interpreter. Trivial.

> We could fix this by either:
> 
> (1) Defining BPF_XADD to match __sync_fetch_and_add (including memory
>     barriers).

nope.

> (2) Introducing some new BPF_ atomics, that map to something like the
>     C11 __atomic builtins and deprecating BPF_XADD in favour of these.

nope.

> (3) Introducing new source-language intrinsics to match what BPF can do
>     (unlikely to be popular).

llvm's __sync intrinsic is used temporarily until we have time to do a
new intrinsic in llvm that matches the kernel's atomic_add() properly.
It will be done similarly to the llvm-bpf load_byte/word intrinsics.
Note that we've been hiding it under a lock_xadd() wrapper, like here:
https://github.com/iovisor/bcc/blob/master/examples/networking/tunnel_monitor/monitor.c#L130


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 17:27                               ` Alexei Starovoitov
@ 2015-11-11 17:35                                 ` David Miller
  -1 siblings, 0 replies; 103+ messages in thread
From: David Miller @ 2015-11-11 17:35 UTC (permalink / raw)
  To: alexei.starovoitov
  Cc: will.deacon, daniel, peterz, arnd, yang.shi, linaro-kernel,
	eric.dumazet, zlim.lnx, ast, linux-kernel, netdev, xi.wang,
	catalin.marinas, linux-arm-kernel, yhs, bblanco

From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Date: Wed, 11 Nov 2015 09:27:00 -0800

> BPF_XADD == atomic_add() in kernel. period.
> we are not going to deprecate it or introduce something else.

Agreed, it makes no sense to try and tie C99 or whatever atomic
semantics to something that is already clearly defined to have
exactly kernel atomic_add() semantics.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 17:35                                 ` David Miller
@ 2015-11-11 17:44                                   ` Will Deacon
  -1 siblings, 0 replies; 103+ messages in thread
From: Will Deacon @ 2015-11-11 17:44 UTC (permalink / raw)
  To: David Miller
  Cc: alexei.starovoitov, daniel, peterz, arnd, yang.shi,
	linaro-kernel, eric.dumazet, zlim.lnx, ast, linux-kernel, netdev,
	xi.wang, catalin.marinas, linux-arm-kernel, yhs, bblanco

On Wed, Nov 11, 2015 at 12:35:48PM -0500, David Miller wrote:
> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Date: Wed, 11 Nov 2015 09:27:00 -0800
> 
> > BPF_XADD == atomic_add() in kernel. period.
> > we are not going to deprecate it or introduce something else.
> 
> Agreed, it makes no sense to try and tie C99 or whatever atomic
> semantics to something that is already clearly defined to have
> exactly kernel atomic_add() semantics.

... and which is emitted by LLVM when asked to compile __sync_fetch_and_add,
which has clearly defined (yet conflicting) semantics.

If the discrepancy is in LLVM (and it sounds like it is), then I'll raise
a bug over there instead.

Will

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 17:35                                 ` David Miller
@ 2015-11-11 17:57                                   ` Peter Zijlstra
  -1 siblings, 0 replies; 103+ messages in thread
From: Peter Zijlstra @ 2015-11-11 17:57 UTC (permalink / raw)
  To: David Miller
  Cc: alexei.starovoitov, will.deacon, daniel, arnd, yang.shi,
	linaro-kernel, eric.dumazet, zlim.lnx, ast, linux-kernel, netdev,
	xi.wang, catalin.marinas, linux-arm-kernel, yhs, bblanco

On Wed, Nov 11, 2015 at 12:35:48PM -0500, David Miller wrote:
> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> Date: Wed, 11 Nov 2015 09:27:00 -0800
> 
> > BPF_XADD == atomic_add() in kernel. period.
> > we are not going to deprecate it or introduce something else.
> 
> Agreed, it makes no sense to try and tie C99 or whatever atomic
> semantics to something that is already clearly defined to have
> exactly kernel atomic_add() semantics.

Dave, this really doesn't make any sense to me. __sync primitives have
well defined semantics and (e)BPF is violating this.

Furthermore, the fetch_and_add (or XADD) name has well defined
semantics, which (e)BPF also violates.

Atomicity is hard enough as it is, backends giving random interpretations
to them isn't helping anybody.

It also baffles me that Alexei is seemingly unwilling to change/rev the
(e)BPF instructions, which would be invisible to the regular user, he
does want to change the language itself, which will impact all
'scripts'.


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 17:57                                   ` Peter Zijlstra
@ 2015-11-11 18:11                                     ` Alexei Starovoitov
  -1 siblings, 0 replies; 103+ messages in thread
From: Alexei Starovoitov @ 2015-11-11 18:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Miller, will.deacon, daniel, arnd, yang.shi, linaro-kernel,
	eric.dumazet, zlim.lnx, ast, linux-kernel, netdev, xi.wang,
	catalin.marinas, linux-arm-kernel, yhs, bblanco

On Wed, Nov 11, 2015 at 06:57:41PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 11, 2015 at 12:35:48PM -0500, David Miller wrote:
> > From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > Date: Wed, 11 Nov 2015 09:27:00 -0800
> > 
> > > BPF_XADD == atomic_add() in kernel. period.
> > > we are not going to deprecate it or introduce something else.
> > 
> > Agreed, it makes no sense to try and tie C99 or whatever atomic
> > semantics to something that is already clearly defined to have
> > exactly kernel atomic_add() semantics.
> 
> Dave, this really doesn't make any sense to me. __sync primitives have
> well defined semantics and (e)BPF is violating this.

bpf_xadd was never meant to be a __sync_fetch_and_add equivalent.
From day one it was meant to be atomic_add() as the kernel does it.
I did piggy-back on __sync in the llvm backend because it was the quick
and dirty way to move forward.
In retrospect I should have introduced a clean intrinsic for that instead,
but it's not too late to do it now. User space we can change at any time,
unlike the kernel.

> Furthermore, the fetch_and_add (or XADD) name has well defined
> semantics, which (e)BPF also violates.

bpf_xadd also was never meant to 'fetch'. It has been a void return from the beginning.

> Atomicity is hard enough as it is, backends giving random interpretations
> to them isn't helping anybody.

no randomness. bpf_xadd == atomic_add() in kernel.
imo that is the simplest and cleanest interpretation one can have, no?

> It also baffles me that Alexei is seemingly unwilling to change/rev the
> (e)BPF instructions, which would be invisible to the regular user, he
> does want to change the language itself, which will impact all
> 'scripts'.

well, we cannot change it in kernel because it's ABI.
I'm not against adding new insns. We definitely can, but let's figure out why?
Is anything broken? No. So what new insns make sense?
Add new one that does 'fetch_and_add' ? What is the real use case it
will be used for?
Adding new intrinsic to llvm is not a big deal. I'll add it as soon
as I have time to work on it or if somebody beats me to it I would be
glad to test it and apply it.


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 18:11                                     ` Alexei Starovoitov
@ 2015-11-11 18:31                                       ` Peter Zijlstra
  -1 siblings, 0 replies; 103+ messages in thread
From: Peter Zijlstra @ 2015-11-11 18:31 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Miller, will.deacon, daniel, arnd, yang.shi, linaro-kernel,
	eric.dumazet, zlim.lnx, ast, linux-kernel, netdev, xi.wang,
	catalin.marinas, linux-arm-kernel, yhs, bblanco

On Wed, Nov 11, 2015 at 10:11:33AM -0800, Alexei Starovoitov wrote:
> On Wed, Nov 11, 2015 at 06:57:41PM +0100, Peter Zijlstra wrote:
> > On Wed, Nov 11, 2015 at 12:35:48PM -0500, David Miller wrote:
> > > From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > > Date: Wed, 11 Nov 2015 09:27:00 -0800
> > > 
> > > > BPF_XADD == atomic_add() in kernel. period.
> > > > we are not going to deprecate it or introduce something else.
> > > 
> > > Agreed, it makes no sense to try and tie C99 or whatever atomic
> > > semantics to something that is already clearly defined to have
> > > exactly kernel atomic_add() semantics.
> > 
> > Dave, this really doesn't make any sense to me. __sync primitives have
> > well defined semantics and (e)BPF is violating this.
> 
> bpf_xadd was never meant to be a __sync_fetch_and_add equivalent.
> From day one it was meant to be atomic_add() as the kernel does it.
> I did piggy-back on __sync in the llvm backend because it was the quick
> and dirty way to move forward.
> In retrospect I should have introduced a clean intrinsic for that instead,
> but it's not too late to do it now. User space we can change at any time,
> unlike the kernel.

I would argue that breaking userspace (language in this case) is equally
bad. Programs that used to work will now no longer work.

> > Furthermore, the fetch_and_add (or XADD) name has well defined
> > semantics, which (e)BPF also violates.
> 
> bpf_xadd also was never meant to 'fetch'. It has been a void return from the beginning.

Then why the 'X'? The XADD name does and always has meant eXchange-ADD,
which means it must have a return value.

You using the XADD name for something that is not in fact XADD is just
wrong.

> > Atomicity is hard enough as it is, backends giving random interpretations
> > to them isn't helping anybody.
> 
> no randomness. 

You mean every other backend translating __sync_fetch_and_add()
differently than you isn't random on your part?

> bpf_xadd == atomic_add() in kernel.
> imo that is the simplest and cleanest interpretation one can have, no?

Wrong though, if you'd named it BPF_ADD, sure, XADD, not so much. That
is 'randomly' co-opting something that has well defined meaning and
semantics with something else.

> > It also baffles me that Alexei is seemingly unwilling to change/rev the
> > (e)BPF instructions, which would be invisible to the regular user, he
> > does want to change the language itself, which will impact all
> > 'scripts'.
> 
> well, we cannot change it in kernel because it's ABI.

You can always rev it. Introduce a new set, and wait for users of the
old set to die, then remove it. We do that all the time with Linux ABI.

> I'm not against adding new insns. We definitely can, but let's figure out why?
> Is anything broken? No. 

Yes, __sync_fetch_and_add() is broken when pulled through the eBPF
backend.

> So what new insns make sense?

Depends a bit on how fancy you want to go. If you want to support weakly
ordered architectures at full speed you'll need more (and more
complexity) than if you decide to not go that way.

The simplest option would be a fully ordered compare-and-swap operation.
That is enough to implement everything else (at a cost). The other
extreme is a weak ll/sc with an optimizer pass recognising various forms
to translate into 'better' native instructions.

> Add new one that does 'fetch_and_add' ? What is the real use case it
> will be used for?

Look at all the atomic_{add,dec}_return*() users in the kernel. A typical
example would be a reader-writer lock implementation. See
include/asm-generic/rwsem.h for examples.

> Adding new intrinsic to llvm is not a big deal. I'll add it as soon
> as I have time to work on it or if somebody beats me to it I would be
> glad to test it and apply it.

This isn't a speed coding contest. You want to think about this
properly.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 18:31                                       ` Peter Zijlstra
@ 2015-11-11 18:41                                         ` Peter Zijlstra
  -1 siblings, 0 replies; 103+ messages in thread
From: Peter Zijlstra @ 2015-11-11 18:41 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Miller, will.deacon, daniel, arnd, yang.shi, linaro-kernel,
	eric.dumazet, zlim.lnx, ast, linux-kernel, netdev, xi.wang,
	catalin.marinas, linux-arm-kernel, yhs, bblanco

On Wed, Nov 11, 2015 at 07:31:28PM +0100, Peter Zijlstra wrote:
> > Adding new intrinsic to llvm is not a big deal. I'll add it as soon
> > as I have time to work on it or if somebody beats me to it I would be
> > glad to test it and apply it.
> 
> This isn't a speed coding contest. You want to think about this
> properly.

That is, I don't think you want to go add LLVM intrinsics at all. You
want to piggy back on the memory model work done by the C/C++11 people.

What you want to think about is what the memory model of your virtual
machine is and how many instructions you want to expose for that.

Concurrency is a right pain; a little time and effort now will save
heaps of pain down the road.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 18:31                                       ` Peter Zijlstra
@ 2015-11-11 18:44                                         ` Peter Zijlstra
  -1 siblings, 0 replies; 103+ messages in thread
From: Peter Zijlstra @ 2015-11-11 18:44 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Miller, will.deacon, daniel, arnd, yang.shi, linaro-kernel,
	eric.dumazet, zlim.lnx, ast, linux-kernel, netdev, xi.wang,
	catalin.marinas, linux-arm-kernel, yhs, bblanco

On Wed, Nov 11, 2015 at 07:31:28PM +0100, Peter Zijlstra wrote:
> > Add new one that does 'fetch_and_add' ? What is the real use case it
> > will be used for?
> 
> Look at all the atomic_{add,dec}_return*() users in the kernel. A typical
> example would be a reader-writer lock implementation. See
> include/asm-generic/rwsem.h for examples.

Maybe a better example would be refcounting, where you free on 0.

	if (!fetch_add(&obj->ref, -1))
		free(obj);



^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 18:11                                     ` Alexei Starovoitov
@ 2015-11-11 18:46                                       ` Will Deacon
  -1 siblings, 0 replies; 103+ messages in thread
From: Will Deacon @ 2015-11-11 18:46 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Peter Zijlstra, David Miller, daniel, arnd, yang.shi,
	linaro-kernel, eric.dumazet, zlim.lnx, ast, linux-kernel, netdev,
	xi.wang, catalin.marinas, linux-arm-kernel, yhs, bblanco

On Wed, Nov 11, 2015 at 10:11:33AM -0800, Alexei Starovoitov wrote:
> On Wed, Nov 11, 2015 at 06:57:41PM +0100, Peter Zijlstra wrote:
> > On Wed, Nov 11, 2015 at 12:35:48PM -0500, David Miller wrote:
> > > From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
> > > Date: Wed, 11 Nov 2015 09:27:00 -0800
> > > 
> > > > BPF_XADD == atomic_add() in kernel. period.
> > > > we are not going to deprecate it or introduce something else.
> > > 
> > > Agreed, it makes no sense to try and tie C99 or whatever atomic
> > > semantics to something that is already clearly defined to have
> > > exactly kernel atomic_add() semantics.
> > 
> > Dave, this really doesn't make any sense to me. __sync primitives have
> > well defined semantics and (e)BPF is violating this.
> 
> bpf_xadd was never meant to be __sync_fetch_and_add equivalent.
> From day one it was meant to be atomic_add() as the kernel does it.
> I did piggy back on __sync in the llvm backend because it was the quick
> and dirty way to move forward.
> In retrospect I should have introduced a clean intrinsic for that instead,
> but it's not too late to do it now. User space we can change at any time,
> unlike the kernel.

But it's not just "user space"; it's the source language definition!
I also don't see how you can change it now, without simply rejecting
the __sync primitives outright.

> > Furthermore, the fetch_and_add (or XADD) name has well defined
> > semantics, which (e)BPF also violates.
> 
> bpf_xadd also wasn't meant to 'fetch'. It was void-returning from the beginning.

Right, so it's just a misnomer.

> > Atomicity is hard enough as it is, backends giving random interpretations
> > to them isn't helping anybody.
> 
> no randomness. bpf_xadd == atomic_add() in kernel.
> imo that is the simplest and cleanest interpretation one can have, no?

I don't really mind, as long as there is a semantic that everybody agrees
on. Really, I just want this to be consistent because memory models are
a PITA enough without having multiple interpretations flying around.

> > It also baffles me that Alexei is seemingly unwilling to change/rev the
> > (e)BPF instructions, which would be invisible to the regular user, he
> > does want to change the language itself, which will impact all
> > 'scripts'.
> 
> well, we cannot change it in kernel because it's ABI.
> I'm not against adding new insns. We definitely can, but let's figure out why?
> Is anything broken? No. So what new insns make sense?

If you end up needing a suite of atomics, I would suggest the __atomic
builtins because they are likely to be more portable and more flexible
than trying to use the kernel memory model outside of the environment
for which it was developed. However, I agree with you that we can cross
that bridge when we get there.
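For reference (editor's illustration): the __atomic builtins take an explicit memory-order argument, so a kernel-style relaxed atomic_add() has a direct, portable spelling, whereas the __sync operations are always fully ordered:

```c
#include <stdint.h>

/* Kernel atomic_add() is relaxed and returns nothing; with the
 * GCC/Clang __atomic builtins that is expressible directly, unlike
 * __sync, whose operations are all fully ordered. */
static void relaxed_add(uint64_t *p, uint64_t v)
{
	(void)__atomic_fetch_add(p, v, __ATOMIC_RELAXED);
}
```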

> Adding new intrinsic to llvm is not a big deal. I'll add it as soon
> as I have time to work on it or if somebody beats me to it I would be
> glad to test it and apply it.

I'm more interested in what you do about the existing intrinsic. Anyway,
I'll raise a ticket against LLVM so that they're aware (and maybe
somebody else will fix it :).

Will

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 18:31                                       ` Peter Zijlstra
@ 2015-11-11 18:50                                         ` Daniel Borkmann
  -1 siblings, 0 replies; 103+ messages in thread
From: Daniel Borkmann @ 2015-11-11 18:50 UTC (permalink / raw)
  To: Peter Zijlstra, Alexei Starovoitov
  Cc: David Miller, will.deacon, arnd, yang.shi, linaro-kernel,
	eric.dumazet, zlim.lnx, ast, linux-kernel, netdev, xi.wang,
	catalin.marinas, linux-arm-kernel, yhs, bblanco

On 11/11/2015 07:31 PM, Peter Zijlstra wrote:
> On Wed, Nov 11, 2015 at 10:11:33AM -0800, Alexei Starovoitov wrote:
>> On Wed, Nov 11, 2015 at 06:57:41PM +0100, Peter Zijlstra wrote:
>>> On Wed, Nov 11, 2015 at 12:35:48PM -0500, David Miller wrote:
>>>> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>>>> Date: Wed, 11 Nov 2015 09:27:00 -0800
>>>>
>>>>> BPF_XADD == atomic_add() in kernel. period.
>>>>> we are not going to deprecate it or introduce something else.
>>>>
>>>> Agreed, it makes no sense to try and tie C99 or whatever atomic
>>>> semantics to something that is already clearly defined to have
>>>> exactly kernel atomic_add() semantics.
>>>
>>> Dave, this really doesn't make any sense to me. __sync primitives have
>>> well defined semantics and (e)BPF is violating this.
>>
>> bpf_xadd was never meant to be __sync_fetch_and_add equivalent.
>>  From day one it was meant to be atomic_add() as the kernel does it.
>> I did piggy back on __sync in the llvm backend because it was the quick
>> and dirty way to move forward.
>> In retrospect I should have introduced a clean intrinsic for that instead,
>> but it's not too late to do it now. User space we can change at any time,
>> unlike the kernel.
>
> I would argue that breaking userspace (language in this case) is equally
> bad. Programs that used to work will now no longer work.

Well, on that note, it's not like you just change the target to bpf in your
Makefile and can compile (& load into the kernel) anything you want with it.
You do have to write small, restricted programs from scratch for a specific
use-case with the limited set of helper functions and intrinsics that are
available from the kernel. So I don't think that "Programs that used to work
will now no longer work." holds if you regard it as such.

>>> Furthermore, the fetch_and_add (or XADD) name has well defined
>>> semantics, which (e)BPF also violates.
>>
>> bpf_xadd also wasn't meant to 'fetch'. It was void-returning from the beginning.
>
> Then why the 'X'? The XADD name does, and always has, meant eXchange-ADD;
> that means it must have a return value.
>
> You using the XADD name for something that is not in fact XADD is just
> wrong.
>
>>> Atomicity is hard enough as it is, backends giving random interpretations
>>> to them isn't helping anybody.
>>
>> no randomness.
>
> You mean every other backend translating __sync_fetch_and_add()
> differently than you isn't random on your part?
>
>> bpf_xadd == atomic_add() in kernel.
>> imo that is the simplest and cleanest interpretation one can have, no?
>
> Wrong, though. If you'd named it BPF_ADD, sure; XADD, not so much. That
> is 'randomly' co-opting something that has a well-defined meaning and
> semantics for something else.
>
>>> It also baffles me that Alexei is seemingly unwilling to change/rev the
>>> (e)BPF instructions, which would be invisible to the regular user, he
>>> does want to change the language itself, which will impact all
>>> 'scripts'.
>>
>> well, we cannot change it in kernel because it's ABI.
>
> You can always rev it. Introduce a new set, and wait for users of the
> old set to die, then remove it. We do that all the time with Linux ABI.
>
>> I'm not against adding new insns. We definitely can, but let's figure out why?
>> Is anything broken? No.
>
> Yes, __sync_fetch_and_add() is broken when pulled through the eBPF
> backend.
>
>> So what new insns make sense?
>
> Depends a bit on how fancy you want to go. If you want to support weakly
> ordered architectures at full speed you'll need more (and more
> complexity) than if you decide to not go that way.
>
> The simplest option would be a fully ordered compare-and-swap operation.
> That is enough to implement everything else (at a cost). The other
> extreme is a weak ll/sc with an optimizer pass recognising various forms
> to translate into 'better' native instructions.
>
>> Add new one that does 'fetch_and_add' ? What is the real use case it
>> will be used for?
>
> Look at all the atomic_{add,dec}_return*() users in the kernel. A typical
> example would be a reader-writer lock implementation. See
> include/asm-generic/rwsem.h for examples.
>
>> Adding new intrinsic to llvm is not a big deal. I'll add it as soon
>> as I have time to work on it or if somebody beats me to it I would be
>> glad to test it and apply it.
>
> This isn't a speed coding contest. You want to think about this
> properly.
>


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 18:44                                         ` Peter Zijlstra
@ 2015-11-11 18:54                                           ` Peter Zijlstra
  -1 siblings, 0 replies; 103+ messages in thread
From: Peter Zijlstra @ 2015-11-11 18:54 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Miller, will.deacon, daniel, arnd, yang.shi, linaro-kernel,
	eric.dumazet, zlim.lnx, ast, linux-kernel, netdev, xi.wang,
	catalin.marinas, linux-arm-kernel, yhs, bblanco

On Wed, Nov 11, 2015 at 07:44:27PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 11, 2015 at 07:31:28PM +0100, Peter Zijlstra wrote:
> > > Add new one that does 'fetch_and_add' ? What is the real use case it
> > > will be used for?
> > 
> > Look at all the atomic_{add,dec}_return*() users in the kernel. A typical
> > example would be a reader-writer lock implementation. See
> > include/asm-generic/rwsem.h for examples.
> 
> Maybe a better example would be refcounting, where you free on 0.
> 
> 	if (!fetch_add(&obj->ref, -1))
> 		free(obj);

Urgh, I'm too used to atomic_add_return(), which returns the post-op
value. That wants to be:

	if (fetch_add(&obj->ref, -1) == 1)
		free(obj);

Note that I would very much recommend _against_ encoding the post-op
thing in instructions. It works for reversible operations (like add) but
is pointless for irreversible operations (like or).

That is, given or_return(), you cannot reconstruct the state
prior to the operation, so or_return() provides less information than
fetch_or().
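The point can be made concrete with C11 atomics (an illustrative sketch): a bit-lock needs the pre-op value to know whether it won the race, and no post-op result can recover that bit once it is set:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Try to take bit 0 as a lock. Only the PRE-op value distinguishes
 * "we set it" from "it was already set"; a hypothetical or_return()
 * yielding the post-op value would see bit 0 set in both cases. */
static bool try_lock_bit0(_Atomic unsigned *word)
{
	unsigned old = atomic_fetch_or(word, 1u);
	return (old & 1u) == 0;
}
```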



^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 17:44                                   ` Will Deacon
@ 2015-11-11 19:01                                     ` David Miller
  -1 siblings, 0 replies; 103+ messages in thread
From: David Miller @ 2015-11-11 19:01 UTC (permalink / raw)
  To: will.deacon
  Cc: alexei.starovoitov, daniel, peterz, arnd, yang.shi,
	linaro-kernel, eric.dumazet, zlim.lnx, ast, linux-kernel, netdev,
	xi.wang, catalin.marinas, linux-arm-kernel, yhs, bblanco

From: Will Deacon <will.deacon@arm.com>
Date: Wed, 11 Nov 2015 17:44:01 +0000

> On Wed, Nov 11, 2015 at 12:35:48PM -0500, David Miller wrote:
>> From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
>> Date: Wed, 11 Nov 2015 09:27:00 -0800
>> 
>> > BPF_XADD == atomic_add() in kernel. period.
>> > we are not going to deprecate it or introduce something else.
>> 
>> Agreed, it makes no sense to try and tie C99 or whatever atomic
>> semantics to something that is already clearly defined to have
>> exactly kernel atomic_add() semantics.
> 
> ... and which is emitted by LLVM when asked to compile __sync_fetch_and_add,
> which has clearly defined (yet conflicting) semantics.

Alexei clearly stated that he knows about this issue and will fully
fix this up in LLVM.

What more do you need to hear from him once he's stated that he is
aware and is working on it?  Meanwhile you should make your JIT emit
what is expected, rather than arguing to change the semantics.

Thanks.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 18:11                                     ` Alexei Starovoitov
@ 2015-11-11 19:01                                       ` David Miller
  -1 siblings, 0 replies; 103+ messages in thread
From: David Miller @ 2015-11-11 19:01 UTC (permalink / raw)
  To: alexei.starovoitov
  Cc: peterz, will.deacon, daniel, arnd, yang.shi, linaro-kernel,
	eric.dumazet, zlim.lnx, ast, linux-kernel, netdev, xi.wang,
	catalin.marinas, linux-arm-kernel, yhs, bblanco

From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Date: Wed, 11 Nov 2015 10:11:33 -0800

> bpf_xadd was never meant to be __sync_fetch_and_add equivalent.
> From day one it was meant to be atomic_add() as the kernel does it.

+1

> I did piggy back on __sync in the llvm backend because it was the quick
> and dirty way to move forward.
> In retrospect I should have introduced a clean intrinsic for that instead,
> but it's not too late to do it now. User space we can change at any time,
> unlike the kernel.

+1

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 18:50                                         ` Daniel Borkmann
@ 2015-11-11 19:04                                           ` David Miller
  -1 siblings, 0 replies; 103+ messages in thread
From: David Miller @ 2015-11-11 19:04 UTC (permalink / raw)
  To: daniel
  Cc: peterz, alexei.starovoitov, will.deacon, arnd, yang.shi,
	linaro-kernel, eric.dumazet, zlim.lnx, ast, linux-kernel, netdev,
	xi.wang, catalin.marinas, linux-arm-kernel, yhs, bblanco

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Wed, 11 Nov 2015 19:50:15 +0100

> Well, on that note, it's not like you just change the target to bpf
> in your Makefile and can compile (& load into the kernel) anything
> you want with it.  You do have to write small, restricted programs
> from scratch for a specific use-case with the limited set of helper
> functions and intrinsics that are available from the kernel. So I
> don't think that "Programs that used to work will now no longer
> work." holds if you regard it as such.

+1

Strict C language semantics do not apply here at all; we are talking
about purposefully built modules of "C-like" code that have whatever
semantics we want and make the most sense for us.

Maybe BPF_XADD is unfortunately named, but this is tangential to
our ability to choose what atomic operations mean and what semantics
they match up to.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 18:50                                         ` Daniel Borkmann
@ 2015-11-11 19:23                                           ` Peter Zijlstra
  -1 siblings, 0 replies; 103+ messages in thread
From: Peter Zijlstra @ 2015-11-11 19:23 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, David Miller, will.deacon, arnd, yang.shi,
	linaro-kernel, eric.dumazet, zlim.lnx, ast, linux-kernel, netdev,
	xi.wang, catalin.marinas, linux-arm-kernel, yhs, bblanco

On Wed, Nov 11, 2015 at 07:50:15PM +0100, Daniel Borkmann wrote:
> Well, on that note, it's not like you just change the target to bpf in your
> Makefile and can compile (& load into the kernel) anything you want with it.
> You do have to write small, restricted programs from scratch for a specific
> use-case with the limited set of helper functions and intrinsics that are
> available from the kernel. So I don't think that "Programs that used to work
> will now no longer work." holds if you regard it as such.

So I don't get this argument. If everything is so targeted, then why are
the BPF instructions an ABI.

If OTOH you're expected to be able to transfer these small proglets,
then too I would expect to transfer the source of these proglets.

You cannot argue both ways.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 19:23                                           ` Peter Zijlstra
@ 2015-11-11 19:41                                             ` Daniel Borkmann
  -1 siblings, 0 replies; 103+ messages in thread
From: Daniel Borkmann @ 2015-11-11 19:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexei Starovoitov, David Miller, will.deacon, arnd, yang.shi,
	linaro-kernel, eric.dumazet, zlim.lnx, ast, linux-kernel, netdev,
	xi.wang, catalin.marinas, linux-arm-kernel, yhs, bblanco

On 11/11/2015 08:23 PM, Peter Zijlstra wrote:
> On Wed, Nov 11, 2015 at 07:50:15PM +0100, Daniel Borkmann wrote:
>> Well, on that note, it's not like you just change the target to bpf in your
>> Makefile and can compile (& load into the kernel) anything you want with it.
>> You do have to write small, restricted programs from scratch for a specific
>> use-case with the limited set of helper functions and intrinsics that are
>> available from the kernel. So I don't think that "Programs that used to work
>> will now no longer work." holds if you regard it as such.
>
> So I don't get this argument. If everything is so targeted, then why are
> the BPF instructions an ABI.
>
> If OTOH you're expected to be able to transfer these small proglets,
> then too I would expect to transfer the source of these proglets.
>
> You cannot argue both ways.

Ohh, I think we were talking past each other. ;) So, yeah, you'd likely need
to add new intrinsics that then map to the existing BPF_XADD instructions,
and perhaps spill a warning when __sync_fetch_and_add() is being used to
advise the developer to switch to the new intrinsics instead. From a kernel
ABI PoV nothing would change.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 18:54                                           ` Peter Zijlstra
@ 2015-11-11 19:55                                             ` Alexei Starovoitov
  -1 siblings, 0 replies; 103+ messages in thread
From: Alexei Starovoitov @ 2015-11-11 19:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Miller, will.deacon, daniel, arnd, yang.shi, linaro-kernel,
	eric.dumazet, zlim.lnx, ast, linux-kernel, netdev, xi.wang,
	catalin.marinas, linux-arm-kernel, yhs, bblanco

On Wed, Nov 11, 2015 at 07:54:15PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 11, 2015 at 07:44:27PM +0100, Peter Zijlstra wrote:
> > On Wed, Nov 11, 2015 at 07:31:28PM +0100, Peter Zijlstra wrote:
> > > > Add new one that does 'fetch_and_add' ? What is the real use case it
> > > > will be used for?
> > > 
> > > Look at all the atomic_{add,dec}_return*() users in the kernel. A typical
> > > example would be a reader-writer lock implementations. See
> > > include/asm-generic/rwsem.h for examples.
> > 
> > Maybe a better example would be refcounting, where you free on 0.
> > 
> > 	if (!fetch_add(&obj->ref, -1))
> > 		free(obj);
> 
> Urgh, too used to the atomic_add_return(), which returns post op. That
> wants to be:
> 
> 	if (fetch_add(&obj->ref, -1) == 1)
> 		free(obj);

this type of code will never be acceptable in bpf world.
If C code does cmpxchg-like things, it's clearly beyond bpf abilities.
There are no locks or support for locks in bpf design and will not be.
We don't want a program to grab a lock and then terminate automatically
because it did divide by zero.
Programs are not allowed to directly allocate/free memory either.
We don't want dangling pointers.
Therefore things like memory barriers and a full set of atomics are not
applicable in the bpf world.
The only goal for bpf_xadd (could have been named better, agreed) was to
do counters. Like counting packets or bytes or events. In all such cases
there is no need to do 'fetch' part.
Another reason for the lack of a 'fetch' part is simplifying the JIT.
It's easier to emit an 'atomic_add' equivalent than an 'atomic_add_return'.
The only shared data structure two programs can see is a map element.
They can increment counters via bpf_xadd or replace the whole map element
atomically via bpf_update_map_elem() helper. That's it.
If the program needs to grab a lock, do some writes and release it,
then bpf is probably not suitable for such a use case.
The bpf programs should be "fast by design" meaning that there should
be no mechanisms in bpf architecture that would allow a program to slow
down other programs or the kernel in general.


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 19:55                                             ` Alexei Starovoitov
@ 2015-11-11 22:21                                               ` Peter Zijlstra
  -1 siblings, 0 replies; 103+ messages in thread
From: Peter Zijlstra @ 2015-11-11 22:21 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Miller, will.deacon, daniel, arnd, yang.shi, linaro-kernel,
	eric.dumazet, zlim.lnx, ast, linux-kernel, netdev, xi.wang,
	catalin.marinas, linux-arm-kernel, yhs, bblanco

On Wed, Nov 11, 2015 at 11:55:59AM -0800, Alexei Starovoitov wrote:
> Therefore things like memory barriers, full set of atomics are not applicable
> in bpf world.

There are still plenty of wait-free constructs one can make using them.

Say a barrier/rendezvous construct for knowing when an event has
happened on all CPUs.

But if you really do not want any of that, I suppose that is a valid
choice.


Is even privileged (e)BPF not allowed things like this? I was thinking
the strict no loops stuff was for unpriv (e)BPF only.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 22:21                                               ` Peter Zijlstra
@ 2015-11-11 23:40                                                 ` Alexei Starovoitov
  -1 siblings, 0 replies; 103+ messages in thread
From: Alexei Starovoitov @ 2015-11-11 23:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: David Miller, will.deacon, daniel, arnd, yang.shi, linaro-kernel,
	eric.dumazet, zlim.lnx, ast, linux-kernel, netdev, xi.wang,
	catalin.marinas, linux-arm-kernel, yhs, bblanco

On Wed, Nov 11, 2015 at 11:21:35PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 11, 2015 at 11:55:59AM -0800, Alexei Starovoitov wrote:
> > Therefore things like memory barriers, full set of atomics are not applicable
> > in bpf world.
> 
> There are still plenty of wait-free constructs one can make using them.

yes, but all such lock-free algos are typically based on cmpxchg8b and a
tight loop, so it would be very hard for the verifier to prove termination
of such loops. I think when we need to add something like this, we'll
add a new bpf insn that will be membarrier+cmpxchg8b+check+loop as
a single insn, so it cannot be misused.
I don't know of any concrete use case yet. All possible though.

> Say a barrier/rendezvous construct for knowing when an event has
> happened on all CPUs.
> 
> But if you really do not want any of that, I suppose that is a valid
> choice.

I do want it :) and I think in the future we'll add a bunch
of interesting stuff. Maybe including things like the above. I just
don't want to rush things in just because x86 has such an insn
or because gcc has a builtin for it.
Like we discussed adding a popcnt insn. It can be useful in some cases,
but doesn't seem to be worth the pain of adding it to the interpreter, JITs
and llvm backends... as of today... Maybe tomorrow it will be a must have.

> Is even privileged (e)BPF not allowed things like this? I was thinking
> the strict no loops stuff was for unpriv (e)BPF only.

the only difference between unpriv and priv is the ability to send
all values (including kernel addresses) to user space (like tracing
needs to see all registers). The rest is the same.
root should never crash the kernel either. If we relax even a little bit
for root then the whole bpf stuff is no better than a kernel module.

btw, support for mini loops was requested many times in the past.
I guess we'd have to add something like this, but it's tricky.
Mainly because control flow graph analysis becomes much more complicated.


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 2/2] arm64: bpf: add BPF XADD instruction
  2015-11-11 23:40                                                 ` Alexei Starovoitov
@ 2015-11-12  8:57                                                   ` Peter Zijlstra
  -1 siblings, 0 replies; 103+ messages in thread
From: Peter Zijlstra @ 2015-11-12  8:57 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Miller, will.deacon, daniel, arnd, yang.shi, linaro-kernel,
	eric.dumazet, zlim.lnx, ast, linux-kernel, netdev, xi.wang,
	catalin.marinas, linux-arm-kernel, yhs, bblanco

On Wed, Nov 11, 2015 at 03:40:15PM -0800, Alexei Starovoitov wrote:
> On Wed, Nov 11, 2015 at 11:21:35PM +0100, Peter Zijlstra wrote:
> > On Wed, Nov 11, 2015 at 11:55:59AM -0800, Alexei Starovoitov wrote:
> > > Therefore things like memory barriers, full set of atomics are not applicable
> > > in bpf world.
> > 
> > There are still plenty of wait-free constructs one can make using them.
> 
> yes, but all such lock-free algos are typically based on cmpxchg8b and
> tight loop, so it would be very hard for verifier to proof termination
> of such loops. I think when we'd need to add something like this, we'll
> add new bpf insn that will be membarrier+cmpxhg8b+check+loop as
> a single insn, so it cannot be misused.
> I don't know of any concrete use case yet. All possible though.

So this is where the 'unconditional' atomic ops come in handy.

Like the x86: xchg, lock {xadd,add,sub,inc,dec,or,and,xor}

Those do not have a loop, and then you can create truly wait-free
things; even some applications of cmpxchg do not actually need the loop.

But this class of wait-free constructs is indeed significantly smaller
than the class of lock-less constructs.

> btw, support for mini loops was requested many times in the past.
> I guess we'd have to add something like this, but it's tricky.
> Mainly because control flow graph analysis becomes much more complicated.

Agreed, that does sound like an 'interesting' problem :-)

Something like:

atomic_op(ptr, f)
{
	for (;;) {
		val = *ptr;
		new = f(val)
		old = cmpxchg(ptr, val, new);
		if (old == val)
			break;

		cpu_relax();
	}
}

might be castable as an instruction I suppose, but I'm not sure you have
function references in (e)BPF.

The above is 'sane' if f is sane (although there is a
starvation case, which is why things like sparc (iirc) need an
increasing backoff instead of cpu_relax()).

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 1/2] arm64: bpf: add 'store immediate' instruction
  2015-11-11 12:39         ` Will Deacon
@ 2015-11-12 19:33           ` Shi, Yang
  -1 siblings, 0 replies; 103+ messages in thread
From: Shi, Yang @ 2015-11-12 19:33 UTC (permalink / raw)
  To: Will Deacon, Z Lim
  Cc: Alexei Starovoitov, daniel, Catalin Marinas, Xi Wang, LKML,
	Network Development, linux-arm-kernel, linaro-kernel

On 11/11/2015 4:39 AM, Will Deacon wrote:
> On Wed, Nov 11, 2015 at 12:12:56PM +0000, Will Deacon wrote:
>> On Tue, Nov 10, 2015 at 06:45:39PM -0800, Z Lim wrote:
>>> On Tue, Nov 10, 2015 at 2:41 PM, Yang Shi <yang.shi@linaro.org> wrote:
>>>> aarch64 doesn't have native store immediate instruction, such operation
>>>
>>> Actually, aarch64 does have "STR (immediate)". For arm64 JIT, we can
>>> consider using it as an optimization.
>>
>> Yes, I'd definitely like to see that in preference to moving via a
>> temporary register.
>
> Wait a second, we're both talking rubbish here :) The STR (immediate)
> form is referring to the addressing mode, whereas this patch wants to
> store an immediate value to memory, which does need moving to a register
> first.

Yes, the immediate means an immediate offset for the addressing index. It
doesn't mean storing an immediate to memory.

I don't think any load-store architecture has a store-immediate instruction.

Thanks,
Yang

>
> So the original patch is fine.
>
> Will
>


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 1/2] arm64: bpf: add 'store immediate' instruction
  2015-11-12 19:33           ` Shi, Yang
@ 2015-11-13  3:45             ` Z Lim
  -1 siblings, 0 replies; 103+ messages in thread
From: Z Lim @ 2015-11-13  3:45 UTC (permalink / raw)
  To: Shi, Yang
  Cc: Will Deacon, Alexei Starovoitov, daniel, Catalin Marinas,
	Xi Wang, LKML, Network Development, linux-arm-kernel,
	linaro-kernel

On Thu, Nov 12, 2015 at 11:33 AM, Shi, Yang <yang.shi@linaro.org> wrote:
> On 11/11/2015 4:39 AM, Will Deacon wrote:
>>
>> Wait a second, we're both talking rubbish here :) The STR (immediate)
>> form is referring to the addressing mode, whereas this patch wants to
>> store an immediate value to memory, which does need moving to a register
>> first.
>
>
> Yes, the immediate means immediate offset for addressing index. Doesn't mean
> to store immediate to memory.
>
> I don't think any load-store architecture has store immediate instruction.
>

Indeed. Sorry for the noise.

Somehow Will caught a whiff of whatever I was smoking then :)

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 1/2] arm64: bpf: add 'store immediate' instruction
  2015-11-13  3:45             ` Z Lim
@ 2015-11-23 19:34               ` Shi, Yang
  -1 siblings, 0 replies; 103+ messages in thread
From: Shi, Yang @ 2015-11-23 19:34 UTC (permalink / raw)
  To: Z Lim, Will Deacon, Alexei Starovoitov, David S. Miller
  Cc: daniel, Catalin Marinas, Xi Wang, LKML, Network Development,
	linux-arm-kernel, linaro-kernel

Hi folks,

Any more comments on this patch (store immediate only)?

I need more time for XADD (I assume everyone agrees it is equivalent to
atomic_add). However, this patch is independent of XADD, so could it be
applied first?

Thanks,
Yang


On 11/12/2015 7:45 PM, Z Lim wrote:
> On Thu, Nov 12, 2015 at 11:33 AM, Shi, Yang <yang.shi@linaro.org> wrote:
>> On 11/11/2015 4:39 AM, Will Deacon wrote:
>>>
>>> Wait a second, we're both talking rubbish here :) The STR (immediate)
>>> form is referring to the addressing mode, whereas this patch wants to
>>> store an immediate value to memory, which does need moving to a register
>>> first.
>>
>>
>> Yes, "immediate" here means an immediate offset for the addressing index; it
>> doesn't mean storing an immediate value to memory.
>>
>> I don't think any load-store architecture has a store-immediate instruction.
>>
>
> Indeed. Sorry for the noise.
>
> Somehow Will caught a whiff of whatever I was smoking then :)
>


^ permalink raw reply	[flat|nested] 103+ messages in thread

end of thread, other threads:[~2015-11-23 19:35 UTC | newest]

Thread overview: 103+ messages (download: mbox.gz / follow: Atom feed)
2015-11-10 22:41 [PATCH 0/2] arm64: bpf: add BPF_ST and BPF_XADD instructions support Yang Shi
2015-11-10 22:41 ` [PATCH 1/2] arm64: bpf: add 'store immediate' instruction Yang Shi
2015-11-11  2:45   ` Z Lim
2015-11-11 12:12     ` Will Deacon
2015-11-11 12:39       ` Will Deacon
2015-11-12 19:33         ` Shi, Yang
2015-11-13  3:45           ` Z Lim
2015-11-23 19:34             ` Shi, Yang
2015-11-10 22:41 ` [PATCH 2/2] arm64: bpf: add BPF XADD instruction Yang Shi
2015-11-11  0:08   ` Eric Dumazet
2015-11-11  0:26     ` Shi, Yang
2015-11-11  0:42       ` Alexei Starovoitov
2015-11-11  2:52         ` Z Lim
2015-11-11  8:49           ` Arnd Bergmann
2015-11-11 10:24             ` Will Deacon
2015-11-11 10:42               ` Daniel Borkmann
2015-11-11 11:58                 ` Will Deacon
2015-11-11 12:21                   ` Daniel Borkmann
2015-11-11 12:38                     ` Will Deacon
2015-11-11 12:58                       ` Peter Zijlstra
2015-11-11 15:52                         ` Daniel Borkmann
2015-11-11 16:23                           ` Will Deacon
2015-11-11 17:27                             ` Alexei Starovoitov
2015-11-11 17:35                               ` David Miller
2015-11-11 17:44                                 ` Will Deacon
2015-11-11 19:01                                   ` David Miller
2015-11-11 17:57                                 ` Peter Zijlstra
2015-11-11 18:11                                   ` Alexei Starovoitov
2015-11-11 18:31                                     ` Peter Zijlstra
2015-11-11 18:41                                       ` Peter Zijlstra
2015-11-11 18:44                                       ` Peter Zijlstra
2015-11-11 18:54                                         ` Peter Zijlstra
2015-11-11 19:55                                           ` Alexei Starovoitov
2015-11-11 22:21                                             ` Peter Zijlstra
2015-11-11 23:40                                               ` Alexei Starovoitov
2015-11-12  8:57                                                 ` Peter Zijlstra
2015-11-11 18:50                                       ` Daniel Borkmann
2015-11-11 19:04                                         ` David Miller
2015-11-11 19:23                                         ` Peter Zijlstra
2015-11-11 19:41                                           ` Daniel Borkmann
2015-11-11 18:46                                     ` Will Deacon
2015-11-11 19:01                                       ` David Miller
