All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH bpf-next 0/3] arm64 BPF JIT Optimizations
@ 2020-05-07  1:05 ` Luke Nelson
  0 siblings, 0 replies; 24+ messages in thread
From: Luke Nelson @ 2020-05-07  1:05 UTC (permalink / raw)
  To: bpf
  Cc: Luke Nelson, Xi Wang, Catalin Marinas, Will Deacon,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, Mark Rutland, Greg Kroah-Hartman,
	Allison Randal, Ard Biesheuvel, Thomas Gleixner, Marc Zyngier,
	Christoffer Dall, linux-arm-kernel, linux-kernel, netdev,
	clang-built-linux

This patch series introduces several optimizations to the arm64 BPF JIT.
The optimizations make use of arm64 immediate instructions to avoid
loading BPF immediates to temporary registers, when possible.

In the process, we discovered two bugs in the logical immediate encoding
function in arch/arm64/kernel/insn.c using Serval. The series also fixes
the two bugs before introducing the optimizations.

Tested on aarch64 QEMU virt machine using test_bpf and test_verifier.

Luke Nelson (3):
  arm64: insn: Fix two bugs in encoding 32-bit logical immediates
  bpf, arm64: Optimize AND,OR,XOR,JSET BPF_K using arm64 logical
    immediates
  bpf, arm64: Optimize ADD,SUB,JMP BPF_K using arm64 add/sub immediates

 arch/arm64/kernel/insn.c      |  6 ++-
 arch/arm64/net/bpf_jit.h      | 22 +++++++++++
 arch/arm64/net/bpf_jit_comp.c | 73 ++++++++++++++++++++++++++++-------
 3 files changed, 85 insertions(+), 16 deletions(-)

Cc: Xi Wang <xi.wang@gmail.com>

-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next 0/3] arm64 BPF JIT Optimizations
@ 2020-05-07  1:05 ` Luke Nelson
  0 siblings, 0 replies; 24+ messages in thread
From: Luke Nelson @ 2020-05-07  1:05 UTC (permalink / raw)
  To: bpf
  Cc: Mark Rutland, Song Liu, Catalin Marinas, Alexei Starovoitov,
	Will Deacon, Ard Biesheuvel, Daniel Borkmann, Marc Zyngier,
	John Fastabend, clang-built-linux, linux-arm-kernel, Zi Shen Lim,
	Yonghong Song, Andrii Nakryiko, Xi Wang, Luke Nelson, KP Singh,
	Thomas Gleixner, Allison Randal, Greg Kroah-Hartman,
	linux-kernel, netdev, Martin KaFai Lau, Christoffer Dall

This patch series introduces several optimizations to the arm64 BPF JIT.
The optimizations make use of arm64 immediate instructions to avoid
loading BPF immediates to temporary registers, when possible.

In the process, we discovered two bugs in the logical immediate encoding
function in arch/arm64/kernel/insn.c using Serval. The series also fixes
the two bugs before introducing the optimizations.

Tested on aarch64 QEMU virt machine using test_bpf and test_verifier.

Luke Nelson (3):
  arm64: insn: Fix two bugs in encoding 32-bit logical immediates
  bpf, arm64: Optimize AND,OR,XOR,JSET BPF_K using arm64 logical
    immediates
  bpf, arm64: Optimize ADD,SUB,JMP BPF_K using arm64 add/sub immediates

 arch/arm64/kernel/insn.c      |  6 ++-
 arch/arm64/net/bpf_jit.h      | 22 +++++++++++
 arch/arm64/net/bpf_jit_comp.c | 73 ++++++++++++++++++++++++++++-------
 3 files changed, 85 insertions(+), 16 deletions(-)

Cc: Xi Wang <xi.wang@gmail.com>

-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next 1/3] arm64: insn: Fix two bugs in encoding 32-bit logical immediates
  2020-05-07  1:05 ` Luke Nelson
@ 2020-05-07  1:05   ` Luke Nelson
  -1 siblings, 0 replies; 24+ messages in thread
From: Luke Nelson @ 2020-05-07  1:05 UTC (permalink / raw)
  To: bpf
  Cc: Luke Nelson, Xi Wang, Catalin Marinas, Will Deacon,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, Mark Rutland, Greg Kroah-Hartman,
	Thomas Gleixner, Christoffer Dall, Marc Zyngier,
	linux-arm-kernel, linux-kernel, netdev, clang-built-linux

This patch fixes two issues present in the current function for encoding
arm64 logical immediates when using the 32-bit variants of instructions.

First, the code does not correctly reject an all-ones 32-bit immediate
and returns an undefined instruction encoding, which can crash the kernel.
The fix is to add a check for this case.

Second, the code incorrectly rejects some 32-bit immediates that are
actually encodable as logical immediates. The root cause is that the code
uses a default mask of 64-bit all-ones, even for 32-bit immediates. This
causes an issue later on when the mask is used to fill the top bits of
the immediate with ones, shown here:

  /*
   * Pattern: 0..01..10..01..1
   *
   * Fill the unused top bits with ones, and check if
   * the result is a valid immediate (all ones with a
   * contiguous ranges of zeroes).
   */
  imm |= ~mask;
  if (!range_of_ones(~imm))
          return AARCH64_BREAK_FAULT;

To see the problem, consider an immediate of the form 0..01..10..01..1,
where the upper 32 bits are zero, such as 0x80000001. The code checks
if ~(imm | ~mask) contains a range of ones: the incorrect mask yields
1..10..01..10..0, which fails the check; the correct mask yields
0..01..10..0, which succeeds.

The fix is to use a 32-bit all-ones default mask for 32-bit immediates.

Currently, the only user of this function is in
arch/arm64/kvm/va_layout.c, which uses 64-bit immediates and won't
trigger these bugs.

We tested the new code against llvm-mc with all 1,302 encodable 32-bit
logical immediates and all 5,334 encodable 64-bit logical immediates.

Fixes: ef3935eeebff ("arm64: insn: Add encoder for bitwise operations using literals")
Co-developed-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
---
 arch/arm64/kernel/insn.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index 4a9e773a177f..42fad79546bb 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -1535,7 +1535,7 @@ static u32 aarch64_encode_immediate(u64 imm,
 				    u32 insn)
 {
 	unsigned int immr, imms, n, ones, ror, esz, tmp;
-	u64 mask = ~0UL;
+	u64 mask;
 
 	/* Can't encode full zeroes or full ones */
 	if (!imm || !~imm)
@@ -1543,13 +1543,15 @@ static u32 aarch64_encode_immediate(u64 imm,
 
 	switch (variant) {
 	case AARCH64_INSN_VARIANT_32BIT:
-		if (upper_32_bits(imm))
+		if (upper_32_bits(imm) || imm == 0xffffffffUL)
 			return AARCH64_BREAK_FAULT;
 		esz = 32;
+		mask = 0xffffffffUL;
 		break;
 	case AARCH64_INSN_VARIANT_64BIT:
 		insn |= AARCH64_INSN_SF_BIT;
 		esz = 64;
+		mask = ~0UL;
 		break;
 	default:
 		pr_err("%s: unknown variant encoding %d\n", __func__, variant);
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next 1/3] arm64: insn: Fix two bugs in encoding 32-bit logical immediates
@ 2020-05-07  1:05   ` Luke Nelson
  0 siblings, 0 replies; 24+ messages in thread
From: Luke Nelson @ 2020-05-07  1:05 UTC (permalink / raw)
  To: bpf
  Cc: Mark Rutland, Song Liu, Catalin Marinas, Alexei Starovoitov,
	Will Deacon, Daniel Borkmann, Marc Zyngier, John Fastabend,
	clang-built-linux, Zi Shen Lim, Yonghong Song, Andrii Nakryiko,
	Xi Wang, Luke Nelson, KP Singh, Thomas Gleixner,
	linux-arm-kernel, netdev, linux-kernel, Greg Kroah-Hartman,
	Martin KaFai Lau, Christoffer Dall

This patch fixes two issues present in the current function for encoding
arm64 logical immediates when using the 32-bit variants of instructions.

First, the code does not correctly reject an all-ones 32-bit immediate
and returns an undefined instruction encoding, which can crash the kernel.
The fix is to add a check for this case.

Second, the code incorrectly rejects some 32-bit immediates that are
actually encodable as logical immediates. The root cause is that the code
uses a default mask of 64-bit all-ones, even for 32-bit immediates. This
causes an issue later on when the mask is used to fill the top bits of
the immediate with ones, shown here:

  /*
   * Pattern: 0..01..10..01..1
   *
   * Fill the unused top bits with ones, and check if
   * the result is a valid immediate (all ones with a
   * contiguous ranges of zeroes).
   */
  imm |= ~mask;
  if (!range_of_ones(~imm))
          return AARCH64_BREAK_FAULT;

To see the problem, consider an immediate of the form 0..01..10..01..1,
where the upper 32 bits are zero, such as 0x80000001. The code checks
if ~(imm | ~mask) contains a range of ones: the incorrect mask yields
1..10..01..10..0, which fails the check; the correct mask yields
0..01..10..0, which succeeds.

The fix is to use a 32-bit all-ones default mask for 32-bit immediates.

Currently, the only user of this function is in
arch/arm64/kvm/va_layout.c, which uses 64-bit immediates and won't
trigger these bugs.

We tested the new code against llvm-mc with all 1,302 encodable 32-bit
logical immediates and all 5,334 encodable 64-bit logical immediates.

Fixes: ef3935eeebff ("arm64: insn: Add encoder for bitwise operations using literals")
Co-developed-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
---
 arch/arm64/kernel/insn.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index 4a9e773a177f..42fad79546bb 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -1535,7 +1535,7 @@ static u32 aarch64_encode_immediate(u64 imm,
 				    u32 insn)
 {
 	unsigned int immr, imms, n, ones, ror, esz, tmp;
-	u64 mask = ~0UL;
+	u64 mask;
 
 	/* Can't encode full zeroes or full ones */
 	if (!imm || !~imm)
@@ -1543,13 +1543,15 @@ static u32 aarch64_encode_immediate(u64 imm,
 
 	switch (variant) {
 	case AARCH64_INSN_VARIANT_32BIT:
-		if (upper_32_bits(imm))
+		if (upper_32_bits(imm) || imm == 0xffffffffUL)
 			return AARCH64_BREAK_FAULT;
 		esz = 32;
+		mask = 0xffffffffUL;
 		break;
 	case AARCH64_INSN_VARIANT_64BIT:
 		insn |= AARCH64_INSN_SF_BIT;
 		esz = 64;
+		mask = ~0UL;
 		break;
 	default:
 		pr_err("%s: unknown variant encoding %d\n", __func__, variant);
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next 2/3] bpf, arm64: Optimize AND,OR,XOR,JSET BPF_K using arm64 logical immediates
  2020-05-07  1:05 ` Luke Nelson
@ 2020-05-07  1:05   ` Luke Nelson
  -1 siblings, 0 replies; 24+ messages in thread
From: Luke Nelson @ 2020-05-07  1:05 UTC (permalink / raw)
  To: bpf
  Cc: Luke Nelson, Xi Wang, Catalin Marinas, Will Deacon,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, Mark Rutland, Enrico Weigelt,
	Alexios Zavras, Thomas Gleixner, Marc Zyngier, Christoffer Dall,
	linux-arm-kernel, linux-kernel, netdev, clang-built-linux

The current code for BPF_{AND,OR,XOR,JSET} BPF_K loads the immediate to
a temporary register before use.

This patch changes the code to avoid using a temporary register
when the BPF immediate is encodable using an arm64 logical immediate
instruction. If the encoding fails (due to the immediate not being
encodable), it falls back to using a temporary register.

Example of generated code for BPF_ALU32_IMM(BPF_AND, R0, 0x80000001):

without optimization:

  24: mov  w10, #0x8000ffff
  28: movk w10, #0x1
  2c: and  w7, w7, w10

with optimization:

  24: and  w7, w7, #0x80000001

Since the encoding process is quite complex, the JIT reuses existing
functionality in arch/arm64/kernel/insn.c for encoding logical immediates
rather than duplicate it in the JIT.

Co-developed-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
---
 arch/arm64/net/bpf_jit.h      | 14 +++++++++++++
 arch/arm64/net/bpf_jit_comp.c | 37 +++++++++++++++++++++++++++--------
 2 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
index eb73f9f72c46..f36a779949e6 100644
--- a/arch/arm64/net/bpf_jit.h
+++ b/arch/arm64/net/bpf_jit.h
@@ -189,4 +189,18 @@
 /* Rn & Rm; set condition flags */
 #define A64_TST(sf, Rn, Rm) A64_ANDS(sf, A64_ZR, Rn, Rm)
 
+/* Logical (immediate) */
+#define A64_LOGIC_IMM(sf, Rd, Rn, imm, type) ({ \
+	u64 imm64 = (sf) ? (u64)imm : (u64)(u32)imm; \
+	aarch64_insn_gen_logical_immediate(AARCH64_INSN_LOGIC_##type, \
+		A64_VARIANT(sf), Rn, Rd, imm64); \
+})
+/* Rd = Rn OP imm */
+#define A64_AND_I(sf, Rd, Rn, imm) A64_LOGIC_IMM(sf, Rd, Rn, imm, AND)
+#define A64_ORR_I(sf, Rd, Rn, imm) A64_LOGIC_IMM(sf, Rd, Rn, imm, ORR)
+#define A64_EOR_I(sf, Rd, Rn, imm) A64_LOGIC_IMM(sf, Rd, Rn, imm, EOR)
+#define A64_ANDS_I(sf, Rd, Rn, imm) A64_LOGIC_IMM(sf, Rd, Rn, imm, AND_SETFLAGS)
+/* Rn & imm; set condition flags */
+#define A64_TST_I(sf, Rn, imm) A64_ANDS_I(sf, A64_ZR, Rn, imm)
+
 #endif /* _BPF_JIT_H */
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index cdc79de0c794..083e5d8a5e2c 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -356,6 +356,7 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	const bool isdw = BPF_SIZE(code) == BPF_DW;
 	u8 jmp_cond, reg;
 	s32 jmp_offset;
+	u32 a64_insn;
 
 #define check_imm(bits, imm) do {				\
 	if ((((imm) > 0) && ((imm) >> (bits))) ||		\
@@ -488,18 +489,33 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		break;
 	case BPF_ALU | BPF_AND | BPF_K:
 	case BPF_ALU64 | BPF_AND | BPF_K:
-		emit_a64_mov_i(is64, tmp, imm, ctx);
-		emit(A64_AND(is64, dst, dst, tmp), ctx);
+		a64_insn = A64_AND_I(is64, dst, dst, imm);
+		if (a64_insn != AARCH64_BREAK_FAULT) {
+			emit(a64_insn, ctx);
+		} else {
+			emit_a64_mov_i(is64, tmp, imm, ctx);
+			emit(A64_AND(is64, dst, dst, tmp), ctx);
+		}
 		break;
 	case BPF_ALU | BPF_OR | BPF_K:
 	case BPF_ALU64 | BPF_OR | BPF_K:
-		emit_a64_mov_i(is64, tmp, imm, ctx);
-		emit(A64_ORR(is64, dst, dst, tmp), ctx);
+		a64_insn = A64_ORR_I(is64, dst, dst, imm);
+		if (a64_insn != AARCH64_BREAK_FAULT) {
+			emit(a64_insn, ctx);
+		} else {
+			emit_a64_mov_i(is64, tmp, imm, ctx);
+			emit(A64_ORR(is64, dst, dst, tmp), ctx);
+		}
 		break;
 	case BPF_ALU | BPF_XOR | BPF_K:
 	case BPF_ALU64 | BPF_XOR | BPF_K:
-		emit_a64_mov_i(is64, tmp, imm, ctx);
-		emit(A64_EOR(is64, dst, dst, tmp), ctx);
+		a64_insn = A64_EOR_I(is64, dst, dst, imm);
+		if (a64_insn != AARCH64_BREAK_FAULT) {
+			emit(a64_insn, ctx);
+		} else {
+			emit_a64_mov_i(is64, tmp, imm, ctx);
+			emit(A64_EOR(is64, dst, dst, tmp), ctx);
+		}
 		break;
 	case BPF_ALU | BPF_MUL | BPF_K:
 	case BPF_ALU64 | BPF_MUL | BPF_K:
@@ -628,8 +644,13 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		goto emit_cond_jmp;
 	case BPF_JMP | BPF_JSET | BPF_K:
 	case BPF_JMP32 | BPF_JSET | BPF_K:
-		emit_a64_mov_i(is64, tmp, imm, ctx);
-		emit(A64_TST(is64, dst, tmp), ctx);
+		a64_insn = A64_TST_I(is64, dst, imm);
+		if (a64_insn != AARCH64_BREAK_FAULT) {
+			emit(a64_insn, ctx);
+		} else {
+			emit_a64_mov_i(is64, tmp, imm, ctx);
+			emit(A64_TST(is64, dst, tmp), ctx);
+		}
 		goto emit_cond_jmp;
 	/* function call */
 	case BPF_JMP | BPF_CALL:
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next 2/3] bpf, arm64: Optimize AND, OR, XOR, JSET BPF_K using arm64 logical immediates
@ 2020-05-07  1:05   ` Luke Nelson
  0 siblings, 0 replies; 24+ messages in thread
From: Luke Nelson @ 2020-05-07  1:05 UTC (permalink / raw)
  To: bpf
  Cc: Mark Rutland, Song Liu, Catalin Marinas, Alexei Starovoitov,
	Will Deacon, Daniel Borkmann, Marc Zyngier, John Fastabend,
	clang-built-linux, Zi Shen Lim, Yonghong Song, Andrii Nakryiko,
	Xi Wang, Luke Nelson, Alexios Zavras, KP Singh, Thomas Gleixner,
	linux-arm-kernel, netdev, linux-kernel, Enrico Weigelt,
	Martin KaFai Lau, Christoffer Dall

The current code for BPF_{AND,OR,XOR,JSET} BPF_K loads the immediate to
a temporary register before use.

This patch changes the code to avoid using a temporary register
when the BPF immediate is encodable using an arm64 logical immediate
instruction. If the encoding fails (due to the immediate not being
encodable), it falls back to using a temporary register.

Example of generated code for BPF_ALU32_IMM(BPF_AND, R0, 0x80000001):

without optimization:

  24: mov  w10, #0x8000ffff
  28: movk w10, #0x1
  2c: and  w7, w7, w10

with optimization:

  24: and  w7, w7, #0x80000001

Since the encoding process is quite complex, the JIT reuses existing
functionality in arch/arm64/kernel/insn.c for encoding logical immediates
rather than duplicate it in the JIT.

Co-developed-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
---
 arch/arm64/net/bpf_jit.h      | 14 +++++++++++++
 arch/arm64/net/bpf_jit_comp.c | 37 +++++++++++++++++++++++++++--------
 2 files changed, 43 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
index eb73f9f72c46..f36a779949e6 100644
--- a/arch/arm64/net/bpf_jit.h
+++ b/arch/arm64/net/bpf_jit.h
@@ -189,4 +189,18 @@
 /* Rn & Rm; set condition flags */
 #define A64_TST(sf, Rn, Rm) A64_ANDS(sf, A64_ZR, Rn, Rm)
 
+/* Logical (immediate) */
+#define A64_LOGIC_IMM(sf, Rd, Rn, imm, type) ({ \
+	u64 imm64 = (sf) ? (u64)imm : (u64)(u32)imm; \
+	aarch64_insn_gen_logical_immediate(AARCH64_INSN_LOGIC_##type, \
+		A64_VARIANT(sf), Rn, Rd, imm64); \
+})
+/* Rd = Rn OP imm */
+#define A64_AND_I(sf, Rd, Rn, imm) A64_LOGIC_IMM(sf, Rd, Rn, imm, AND)
+#define A64_ORR_I(sf, Rd, Rn, imm) A64_LOGIC_IMM(sf, Rd, Rn, imm, ORR)
+#define A64_EOR_I(sf, Rd, Rn, imm) A64_LOGIC_IMM(sf, Rd, Rn, imm, EOR)
+#define A64_ANDS_I(sf, Rd, Rn, imm) A64_LOGIC_IMM(sf, Rd, Rn, imm, AND_SETFLAGS)
+/* Rn & imm; set condition flags */
+#define A64_TST_I(sf, Rn, imm) A64_ANDS_I(sf, A64_ZR, Rn, imm)
+
 #endif /* _BPF_JIT_H */
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index cdc79de0c794..083e5d8a5e2c 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -356,6 +356,7 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	const bool isdw = BPF_SIZE(code) == BPF_DW;
 	u8 jmp_cond, reg;
 	s32 jmp_offset;
+	u32 a64_insn;
 
 #define check_imm(bits, imm) do {				\
 	if ((((imm) > 0) && ((imm) >> (bits))) ||		\
@@ -488,18 +489,33 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		break;
 	case BPF_ALU | BPF_AND | BPF_K:
 	case BPF_ALU64 | BPF_AND | BPF_K:
-		emit_a64_mov_i(is64, tmp, imm, ctx);
-		emit(A64_AND(is64, dst, dst, tmp), ctx);
+		a64_insn = A64_AND_I(is64, dst, dst, imm);
+		if (a64_insn != AARCH64_BREAK_FAULT) {
+			emit(a64_insn, ctx);
+		} else {
+			emit_a64_mov_i(is64, tmp, imm, ctx);
+			emit(A64_AND(is64, dst, dst, tmp), ctx);
+		}
 		break;
 	case BPF_ALU | BPF_OR | BPF_K:
 	case BPF_ALU64 | BPF_OR | BPF_K:
-		emit_a64_mov_i(is64, tmp, imm, ctx);
-		emit(A64_ORR(is64, dst, dst, tmp), ctx);
+		a64_insn = A64_ORR_I(is64, dst, dst, imm);
+		if (a64_insn != AARCH64_BREAK_FAULT) {
+			emit(a64_insn, ctx);
+		} else {
+			emit_a64_mov_i(is64, tmp, imm, ctx);
+			emit(A64_ORR(is64, dst, dst, tmp), ctx);
+		}
 		break;
 	case BPF_ALU | BPF_XOR | BPF_K:
 	case BPF_ALU64 | BPF_XOR | BPF_K:
-		emit_a64_mov_i(is64, tmp, imm, ctx);
-		emit(A64_EOR(is64, dst, dst, tmp), ctx);
+		a64_insn = A64_EOR_I(is64, dst, dst, imm);
+		if (a64_insn != AARCH64_BREAK_FAULT) {
+			emit(a64_insn, ctx);
+		} else {
+			emit_a64_mov_i(is64, tmp, imm, ctx);
+			emit(A64_EOR(is64, dst, dst, tmp), ctx);
+		}
 		break;
 	case BPF_ALU | BPF_MUL | BPF_K:
 	case BPF_ALU64 | BPF_MUL | BPF_K:
@@ -628,8 +644,13 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		goto emit_cond_jmp;
 	case BPF_JMP | BPF_JSET | BPF_K:
 	case BPF_JMP32 | BPF_JSET | BPF_K:
-		emit_a64_mov_i(is64, tmp, imm, ctx);
-		emit(A64_TST(is64, dst, tmp), ctx);
+		a64_insn = A64_TST_I(is64, dst, imm);
+		if (a64_insn != AARCH64_BREAK_FAULT) {
+			emit(a64_insn, ctx);
+		} else {
+			emit_a64_mov_i(is64, tmp, imm, ctx);
+			emit(A64_TST(is64, dst, tmp), ctx);
+		}
 		goto emit_cond_jmp;
 	/* function call */
 	case BPF_JMP | BPF_CALL:
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next 3/3] bpf, arm64: Optimize ADD,SUB,JMP BPF_K using arm64 add/sub immediates
  2020-05-07  1:05 ` Luke Nelson
@ 2020-05-07  1:05   ` Luke Nelson
  -1 siblings, 0 replies; 24+ messages in thread
From: Luke Nelson @ 2020-05-07  1:05 UTC (permalink / raw)
  To: bpf
  Cc: Luke Nelson, Xi Wang, Catalin Marinas, Will Deacon,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, Mark Rutland, Enrico Weigelt,
	Torsten Duwe, Allison Randal, Thomas Gleixner, Christoffer Dall,
	Marc Zyngier, linux-arm-kernel, linux-kernel, netdev,
	clang-built-linux

The current code for BPF_{ADD,SUB} BPF_K loads the BPF immediate to a
temporary register before performing the addition/subtraction. Similarly,
BPF_JMP BPF_K cases load the immediate to a temporary register before
comparison.

This patch introduces optimizations that use arm64 immediate add, sub,
cmn, or cmp instructions when the BPF immediate fits. If the immediate
does not fit, it falls back to using a temporary register.

Example of generated code for BPF_ALU64_IMM(BPF_ADD, R0, 2):

without optimization:

  24: mov x10, #0x2
  28: add x7, x7, x10

with optimization:

  24: add x7, x7, #0x2

The code could use A64_{ADD,SUB}_I directly and check if it returns
AARCH64_BREAK_FAULT, similar to how logical immediates are handled.
However, aarch64_insn_gen_add_sub_imm from insn.c prints error messages
when the immediate does not fit, and it's simpler to check if the
immediate fits ahead of time.

Co-developed-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
---
 arch/arm64/net/bpf_jit.h      |  8 ++++++++
 arch/arm64/net/bpf_jit_comp.c | 36 +++++++++++++++++++++++++++++------
 2 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
index f36a779949e6..923ae7ff68c8 100644
--- a/arch/arm64/net/bpf_jit.h
+++ b/arch/arm64/net/bpf_jit.h
@@ -100,6 +100,14 @@
 /* Rd = Rn OP imm12 */
 #define A64_ADD_I(sf, Rd, Rn, imm12) A64_ADDSUB_IMM(sf, Rd, Rn, imm12, ADD)
 #define A64_SUB_I(sf, Rd, Rn, imm12) A64_ADDSUB_IMM(sf, Rd, Rn, imm12, SUB)
+#define A64_ADDS_I(sf, Rd, Rn, imm12) \
+	A64_ADDSUB_IMM(sf, Rd, Rn, imm12, ADD_SETFLAGS)
+#define A64_SUBS_I(sf, Rd, Rn, imm12) \
+	A64_ADDSUB_IMM(sf, Rd, Rn, imm12, SUB_SETFLAGS)
+/* Rn + imm12; set condition flags */
+#define A64_CMN_I(sf, Rn, imm12) A64_ADDS_I(sf, A64_ZR, Rn, imm12)
+/* Rn - imm12; set condition flags */
+#define A64_CMP_I(sf, Rn, imm12) A64_SUBS_I(sf, A64_ZR, Rn, imm12)
 /* Rd = Rn */
 #define A64_MOV(sf, Rd, Rn) A64_ADD_I(sf, Rd, Rn, 0)
 
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 083e5d8a5e2c..561a2fea9cdd 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -167,6 +167,12 @@ static inline int epilogue_offset(const struct jit_ctx *ctx)
 	return to - from;
 }
 
+static bool is_addsub_imm(u32 imm)
+{
+	/* Either imm12 or shifted imm12. */
+	return !(imm & ~0xfff) || !(imm & ~0xfff000);
+}
+
 /* Stack must be multiples of 16B */
 #define STACK_ALIGN(sz) (((sz) + 15) & ~15)
 
@@ -479,13 +485,25 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	/* dst = dst OP imm */
 	case BPF_ALU | BPF_ADD | BPF_K:
 	case BPF_ALU64 | BPF_ADD | BPF_K:
-		emit_a64_mov_i(is64, tmp, imm, ctx);
-		emit(A64_ADD(is64, dst, dst, tmp), ctx);
+		if (is_addsub_imm(imm)) {
+			emit(A64_ADD_I(is64, dst, dst, imm), ctx);
+		} else if (is_addsub_imm(-imm)) {
+			emit(A64_SUB_I(is64, dst, dst, -imm), ctx);
+		} else {
+			emit_a64_mov_i(is64, tmp, imm, ctx);
+			emit(A64_ADD(is64, dst, dst, tmp), ctx);
+		}
 		break;
 	case BPF_ALU | BPF_SUB | BPF_K:
 	case BPF_ALU64 | BPF_SUB | BPF_K:
-		emit_a64_mov_i(is64, tmp, imm, ctx);
-		emit(A64_SUB(is64, dst, dst, tmp), ctx);
+		if (is_addsub_imm(imm)) {
+			emit(A64_SUB_I(is64, dst, dst, imm), ctx);
+		} else if (is_addsub_imm(-imm)) {
+			emit(A64_ADD_I(is64, dst, dst, -imm), ctx);
+		} else {
+			emit_a64_mov_i(is64, tmp, imm, ctx);
+			emit(A64_SUB(is64, dst, dst, tmp), ctx);
+		}
 		break;
 	case BPF_ALU | BPF_AND | BPF_K:
 	case BPF_ALU64 | BPF_AND | BPF_K:
@@ -639,8 +657,14 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_JMP32 | BPF_JSLT | BPF_K:
 	case BPF_JMP32 | BPF_JSGE | BPF_K:
 	case BPF_JMP32 | BPF_JSLE | BPF_K:
-		emit_a64_mov_i(is64, tmp, imm, ctx);
-		emit(A64_CMP(is64, dst, tmp), ctx);
+		if (is_addsub_imm(imm)) {
+			emit(A64_CMP_I(is64, dst, imm), ctx);
+		} else if (is_addsub_imm(-imm)) {
+			emit(A64_CMN_I(is64, dst, -imm), ctx);
+		} else {
+			emit_a64_mov_i(is64, tmp, imm, ctx);
+			emit(A64_CMP(is64, dst, tmp), ctx);
+		}
 		goto emit_cond_jmp;
 	case BPF_JMP | BPF_JSET | BPF_K:
 	case BPF_JMP32 | BPF_JSET | BPF_K:
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [RFC PATCH bpf-next 3/3] bpf, arm64: Optimize ADD, SUB, JMP BPF_K using arm64 add/sub immediates
@ 2020-05-07  1:05   ` Luke Nelson
  0 siblings, 0 replies; 24+ messages in thread
From: Luke Nelson @ 2020-05-07  1:05 UTC (permalink / raw)
  To: bpf
  Cc: Mark Rutland, Song Liu, Catalin Marinas, Alexei Starovoitov,
	Will Deacon, Daniel Borkmann, Marc Zyngier, John Fastabend,
	clang-built-linux, linux-arm-kernel, Zi Shen Lim, Yonghong Song,
	Andrii Nakryiko, Xi Wang, Luke Nelson, Torsten Duwe, KP Singh,
	Thomas Gleixner, Allison Randal, netdev, linux-kernel,
	Enrico Weigelt, Martin KaFai Lau, Christoffer Dall

The current code for BPF_{ADD,SUB} BPF_K loads the BPF immediate to a
temporary register before performing the addition/subtraction. Similarly,
BPF_JMP BPF_K cases load the immediate to a temporary register before
comparison.

This patch introduces optimizations that use arm64 immediate add, sub,
cmn, or cmp instructions when the BPF immediate fits. If the immediate
does not fit, it falls back to using a temporary register.

Example of generated code for BPF_ALU64_IMM(BPF_ADD, R0, 2):

without optimization:

  24: mov x10, #0x2
  28: add x7, x7, x10

with optimization:

  24: add x7, x7, #0x2

The code could use A64_{ADD,SUB}_I directly and check if it returns
AARCH64_BREAK_FAULT, similar to how logical immediates are handled.
However, aarch64_insn_gen_add_sub_imm from insn.c prints error messages
when the immediate does not fit, and it's simpler to check if the
immediate fits ahead of time.

Co-developed-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
---
 arch/arm64/net/bpf_jit.h      |  8 ++++++++
 arch/arm64/net/bpf_jit_comp.c | 36 +++++++++++++++++++++++++++++------
 2 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
index f36a779949e6..923ae7ff68c8 100644
--- a/arch/arm64/net/bpf_jit.h
+++ b/arch/arm64/net/bpf_jit.h
@@ -100,6 +100,14 @@
 /* Rd = Rn OP imm12 */
 #define A64_ADD_I(sf, Rd, Rn, imm12) A64_ADDSUB_IMM(sf, Rd, Rn, imm12, ADD)
 #define A64_SUB_I(sf, Rd, Rn, imm12) A64_ADDSUB_IMM(sf, Rd, Rn, imm12, SUB)
+#define A64_ADDS_I(sf, Rd, Rn, imm12) \
+	A64_ADDSUB_IMM(sf, Rd, Rn, imm12, ADD_SETFLAGS)
+#define A64_SUBS_I(sf, Rd, Rn, imm12) \
+	A64_ADDSUB_IMM(sf, Rd, Rn, imm12, SUB_SETFLAGS)
+/* Rn + imm12; set condition flags */
+#define A64_CMN_I(sf, Rn, imm12) A64_ADDS_I(sf, A64_ZR, Rn, imm12)
+/* Rn - imm12; set condition flags */
+#define A64_CMP_I(sf, Rn, imm12) A64_SUBS_I(sf, A64_ZR, Rn, imm12)
 /* Rd = Rn */
 #define A64_MOV(sf, Rd, Rn) A64_ADD_I(sf, Rd, Rn, 0)
 
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 083e5d8a5e2c..561a2fea9cdd 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -167,6 +167,12 @@ static inline int epilogue_offset(const struct jit_ctx *ctx)
 	return to - from;
 }
 
+static bool is_addsub_imm(u32 imm)
+{
+	/* Either imm12 or shifted imm12. */
+	return !(imm & ~0xfff) || !(imm & ~0xfff000);
+}
+
 /* Stack must be multiples of 16B */
 #define STACK_ALIGN(sz) (((sz) + 15) & ~15)
 
@@ -479,13 +485,25 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	/* dst = dst OP imm */
 	case BPF_ALU | BPF_ADD | BPF_K:
 	case BPF_ALU64 | BPF_ADD | BPF_K:
-		emit_a64_mov_i(is64, tmp, imm, ctx);
-		emit(A64_ADD(is64, dst, dst, tmp), ctx);
+		if (is_addsub_imm(imm)) {
+			emit(A64_ADD_I(is64, dst, dst, imm), ctx);
+		} else if (is_addsub_imm(-imm)) {
+			emit(A64_SUB_I(is64, dst, dst, -imm), ctx);
+		} else {
+			emit_a64_mov_i(is64, tmp, imm, ctx);
+			emit(A64_ADD(is64, dst, dst, tmp), ctx);
+		}
 		break;
 	case BPF_ALU | BPF_SUB | BPF_K:
 	case BPF_ALU64 | BPF_SUB | BPF_K:
-		emit_a64_mov_i(is64, tmp, imm, ctx);
-		emit(A64_SUB(is64, dst, dst, tmp), ctx);
+		if (is_addsub_imm(imm)) {
+			emit(A64_SUB_I(is64, dst, dst, imm), ctx);
+		} else if (is_addsub_imm(-imm)) {
+			emit(A64_ADD_I(is64, dst, dst, -imm), ctx);
+		} else {
+			emit_a64_mov_i(is64, tmp, imm, ctx);
+			emit(A64_SUB(is64, dst, dst, tmp), ctx);
+		}
 		break;
 	case BPF_ALU | BPF_AND | BPF_K:
 	case BPF_ALU64 | BPF_AND | BPF_K:
@@ -639,8 +657,14 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_JMP32 | BPF_JSLT | BPF_K:
 	case BPF_JMP32 | BPF_JSGE | BPF_K:
 	case BPF_JMP32 | BPF_JSLE | BPF_K:
-		emit_a64_mov_i(is64, tmp, imm, ctx);
-		emit(A64_CMP(is64, dst, tmp), ctx);
+		if (is_addsub_imm(imm)) {
+			emit(A64_CMP_I(is64, dst, imm), ctx);
+		} else if (is_addsub_imm(-imm)) {
+			emit(A64_CMN_I(is64, dst, -imm), ctx);
+		} else {
+			emit_a64_mov_i(is64, tmp, imm, ctx);
+			emit(A64_CMP(is64, dst, tmp), ctx);
+		}
 		goto emit_cond_jmp;
 	case BPF_JMP | BPF_JSET | BPF_K:
 	case BPF_JMP32 | BPF_JSET | BPF_K:
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 1/3] arm64: insn: Fix two bugs in encoding 32-bit logical immediates
  2020-05-07  1:05   ` Luke Nelson
@ 2020-05-07  8:19     ` Marc Zyngier
  -1 siblings, 0 replies; 24+ messages in thread
From: Marc Zyngier @ 2020-05-07  8:19 UTC (permalink / raw)
  To: Luke Nelson
  Cc: bpf, Luke Nelson, Xi Wang, Catalin Marinas, Will Deacon,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, Mark Rutland, Greg Kroah-Hartman,
	Thomas Gleixner, Christoffer Dall, linux-arm-kernel,
	linux-kernel, netdev, clang-built-linux

Hi Luke,

Thanks a lot for nailing these bugs.

On Wed,  6 May 2020 18:05:01 -0700
Luke Nelson <lukenels@cs.washington.edu> wrote:

> This patch fixes two issues present in the current function for encoding
> arm64 logical immediates when using the 32-bit variants of instructions.
> 
> First, the code does not correctly reject an all-ones 32-bit immediate
> and returns an undefined instruction encoding, which can crash the kernel.

You make it sound more dramatic than it needs to be! ;-) As you pointed
out below, nothing in the kernel calls this code to encode a 32bit
immediate, so triggering a crash is not possible (unless you manage to
exploit something else to call into this code). It definitely needs
fixing though!

> The fix is to add a check for this case.
> 
> Second, the code incorrectly rejects some 32-bit immediates that are
> actually encodable as logical immediates. The root cause is that the code
> uses a default mask of 64-bit all-ones, even for 32-bit immediates. This
> causes an issue later on when the mask is used to fill the top bits of
> the immediate with ones, shown here:
> 
>   /*
>    * Pattern: 0..01..10..01..1
>    *
>    * Fill the unused top bits with ones, and check if
>    * the result is a valid immediate (all ones with a
>    * contiguous ranges of zeroes).
>    */
>   imm |= ~mask;
>   if (!range_of_ones(~imm))
>           return AARCH64_BREAK_FAULT;
> 
> To see the problem, consider an immediate of the form 0..01..10..01..1,
> where the upper 32 bits are zero, such as 0x80000001. The code checks
> if ~(imm | ~mask) contains a range of ones: the incorrect mask yields
> 1..10..01..10..0, which fails the check; the correct mask yields
> 0..01..10..0, which succeeds.
> 
> The fix is to use a 32-bit all-ones default mask for 32-bit immediates.

Paging this thing back in is really hard (I only had one coffee, more
needed). Yes, I see what you mean. Duh! I think this only happens if
mask hasn't been adjusted by the "pattern spotting" code the first place
though.

> 
> Currently, the only user of this function is in
> arch/arm64/kvm/va_layout.c, which uses 64-bit immediates and won't
> trigger these bugs.
> 
> We tested the new code against llvm-mc with all 1,302 encodable 32-bit
> logical immediates and all 5,334 encodable 64-bit logical immediates.

That, on its own, is awesome information. Do you have any pointer on
how to set this up?

> 
> Fixes: ef3935eeebff ("arm64: insn: Add encoder for bitwise operations using literals")
> Co-developed-by: Xi Wang <xi.wang@gmail.com>
> Signed-off-by: Xi Wang <xi.wang@gmail.com>
> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
> ---
>  arch/arm64/kernel/insn.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> index 4a9e773a177f..42fad79546bb 100644
> --- a/arch/arm64/kernel/insn.c
> +++ b/arch/arm64/kernel/insn.c
> @@ -1535,7 +1535,7 @@ static u32 aarch64_encode_immediate(u64 imm,
>  				    u32 insn)
>  {
>  	unsigned int immr, imms, n, ones, ror, esz, tmp;
> -	u64 mask = ~0UL;
> +	u64 mask;
>  
>  	/* Can't encode full zeroes or full ones */
>  	if (!imm || !~imm)
> @@ -1543,13 +1543,15 @@ static u32 aarch64_encode_immediate(u64 imm,
>  
>  	switch (variant) {
>  	case AARCH64_INSN_VARIANT_32BIT:
> -		if (upper_32_bits(imm))
> +		if (upper_32_bits(imm) || imm == 0xffffffffUL)

nit: I don't like the fact that this create a small dissymmetry in the
way we check things (we start by checking !~imm, which is not relevant
to 32bit constants).

>  			return AARCH64_BREAK_FAULT;
>  		esz = 32;
> +		mask = 0xffffffffUL;
>  		break;
>  	case AARCH64_INSN_VARIANT_64BIT:
>  		insn |= AARCH64_INSN_SF_BIT;
>  		esz = 64;
> +		mask = ~0UL;

I'd rather we generate the mask in a programmatic way, which is pretty
easy to do since we have the initial element size.

>  		break;
>  	default:
>  		pr_err("%s: unknown variant encoding %d\n", __func__, variant);

To account for the above remarks, I came up with the following patch:

diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index 4a9e773a177f..422bf9a79ed6 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -1535,11 +1535,7 @@ static u32 aarch64_encode_immediate(u64 imm,
 				    u32 insn)
 {
 	unsigned int immr, imms, n, ones, ror, esz, tmp;
-	u64 mask = ~0UL;
-
-	/* Can't encode full zeroes or full ones */
-	if (!imm || !~imm)
-		return AARCH64_BREAK_FAULT;
+	u64 mask;
 
 	switch (variant) {
 	case AARCH64_INSN_VARIANT_32BIT:
@@ -1556,6 +1552,11 @@ static u32 aarch64_encode_immediate(u64 imm,
 		return AARCH64_BREAK_FAULT;
 	}
 
+	/* Can't encode full zeroes or full ones */
+	mask = GENMASK_ULL(esz - 1, 0);
+	if (!imm || !(~imm & mask))
+		return AARCH64_BREAK_FAULT;
+
 	/*
 	 * Inverse of Replicate(). Try to spot a repeating pattern
 	 * with a pow2 stride.

which is of course completely untested (it does compile though).

Thoughts?

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 1/3] arm64: insn: Fix two bugs in encoding 32-bit logical immediates
@ 2020-05-07  8:19     ` Marc Zyngier
  0 siblings, 0 replies; 24+ messages in thread
From: Marc Zyngier @ 2020-05-07  8:19 UTC (permalink / raw)
  To: Luke Nelson
  Cc: Mark Rutland, Song Liu, Catalin Marinas, Alexei Starovoitov,
	Will Deacon, Daniel Borkmann, John Fastabend, clang-built-linux,
	Zi Shen Lim, Yonghong Song, Andrii Nakryiko, Xi Wang,
	Luke Nelson, KP Singh, Thomas Gleixner, linux-arm-kernel, netdev,
	linux-kernel, Greg Kroah-Hartman, bpf, Martin KaFai Lau,
	Christoffer Dall

Hi Luke,

Thanks a lot for nailing these bugs.

On Wed,  6 May 2020 18:05:01 -0700
Luke Nelson <lukenels@cs.washington.edu> wrote:

> This patch fixes two issues present in the current function for encoding
> arm64 logical immediates when using the 32-bit variants of instructions.
> 
> First, the code does not correctly reject an all-ones 32-bit immediate
> and returns an undefined instruction encoding, which can crash the kernel.

You make it sound more dramatic than it needs to be! ;-) As you pointed
out below, nothing in the kernel calls this code to encode a 32bit
immediate, so triggering a crash is not possible (unless you manage to
exploit something else to call into this code). It definitely needs
fixing though!

> The fix is to add a check for this case.
> 
> Second, the code incorrectly rejects some 32-bit immediates that are
> actually encodable as logical immediates. The root cause is that the code
> uses a default mask of 64-bit all-ones, even for 32-bit immediates. This
> causes an issue later on when the mask is used to fill the top bits of
> the immediate with ones, shown here:
> 
>   /*
>    * Pattern: 0..01..10..01..1
>    *
>    * Fill the unused top bits with ones, and check if
>    * the result is a valid immediate (all ones with a
>    * contiguous ranges of zeroes).
>    */
>   imm |= ~mask;
>   if (!range_of_ones(~imm))
>           return AARCH64_BREAK_FAULT;
> 
> To see the problem, consider an immediate of the form 0..01..10..01..1,
> where the upper 32 bits are zero, such as 0x80000001. The code checks
> if ~(imm | ~mask) contains a range of ones: the incorrect mask yields
> 1..10..01..10..0, which fails the check; the correct mask yields
> 0..01..10..0, which succeeds.
> 
> The fix is to use a 32-bit all-ones default mask for 32-bit immediates.

Paging this thing back in is really hard (I only had one coffee, more
needed). Yes, I see what you mean. Duh! I think this only happens if
mask hasn't been adjusted by the "pattern spotting" code the first place
though.

> 
> Currently, the only user of this function is in
> arch/arm64/kvm/va_layout.c, which uses 64-bit immediates and won't
> trigger these bugs.
> 
> We tested the new code against llvm-mc with all 1,302 encodable 32-bit
> logical immediates and all 5,334 encodable 64-bit logical immediates.

That, on its own, is awesome information. Do you have any pointer on
how to set this up?

> 
> Fixes: ef3935eeebff ("arm64: insn: Add encoder for bitwise operations using literals")
> Co-developed-by: Xi Wang <xi.wang@gmail.com>
> Signed-off-by: Xi Wang <xi.wang@gmail.com>
> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
> ---
>  arch/arm64/kernel/insn.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> index 4a9e773a177f..42fad79546bb 100644
> --- a/arch/arm64/kernel/insn.c
> +++ b/arch/arm64/kernel/insn.c
> @@ -1535,7 +1535,7 @@ static u32 aarch64_encode_immediate(u64 imm,
>  				    u32 insn)
>  {
>  	unsigned int immr, imms, n, ones, ror, esz, tmp;
> -	u64 mask = ~0UL;
> +	u64 mask;
>  
>  	/* Can't encode full zeroes or full ones */
>  	if (!imm || !~imm)
> @@ -1543,13 +1543,15 @@ static u32 aarch64_encode_immediate(u64 imm,
>  
>  	switch (variant) {
>  	case AARCH64_INSN_VARIANT_32BIT:
> -		if (upper_32_bits(imm))
> +		if (upper_32_bits(imm) || imm == 0xffffffffUL)

nit: I don't like the fact that this create a small dissymmetry in the
way we check things (we start by checking !~imm, which is not relevant
to 32bit constants).

>  			return AARCH64_BREAK_FAULT;
>  		esz = 32;
> +		mask = 0xffffffffUL;
>  		break;
>  	case AARCH64_INSN_VARIANT_64BIT:
>  		insn |= AARCH64_INSN_SF_BIT;
>  		esz = 64;
> +		mask = ~0UL;

I'd rather we generate the mask in a programmatic way, which is pretty
easy to do since we have the initial element size.

>  		break;
>  	default:
>  		pr_err("%s: unknown variant encoding %d\n", __func__, variant);

To account for the above remarks, I came up with the following patch:

diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index 4a9e773a177f..422bf9a79ed6 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -1535,11 +1535,7 @@ static u32 aarch64_encode_immediate(u64 imm,
 				    u32 insn)
 {
 	unsigned int immr, imms, n, ones, ror, esz, tmp;
-	u64 mask = ~0UL;
-
-	/* Can't encode full zeroes or full ones */
-	if (!imm || !~imm)
-		return AARCH64_BREAK_FAULT;
+	u64 mask;
 
 	switch (variant) {
 	case AARCH64_INSN_VARIANT_32BIT:
@@ -1556,6 +1552,11 @@ static u32 aarch64_encode_immediate(u64 imm,
 		return AARCH64_BREAK_FAULT;
 	}
 
+	/* Can't encode full zeroes or full ones */
+	mask = GENMASK_ULL(esz - 1, 0);
+	if (!imm || !(~imm & mask))
+		return AARCH64_BREAK_FAULT;
+
 	/*
 	 * Inverse of Replicate(). Try to spot a repeating pattern
 	 * with a pow2 stride.

which is of course completely untested (it does compile though).

Thoughts?

	M.
-- 
Jazz is not dead. It just smells funny...

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 1/3] arm64: insn: Fix two bugs in encoding 32-bit logical immediates
  2020-05-07  1:05   ` Luke Nelson
@ 2020-05-07  8:29     ` Will Deacon
  -1 siblings, 0 replies; 24+ messages in thread
From: Will Deacon @ 2020-05-07  8:29 UTC (permalink / raw)
  To: Luke Nelson
  Cc: bpf, Luke Nelson, Xi Wang, Catalin Marinas, Daniel Borkmann,
	Alexei Starovoitov, Zi Shen Lim, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh,
	Mark Rutland, Greg Kroah-Hartman, Thomas Gleixner,
	Christoffer Dall, Marc Zyngier, linux-arm-kernel, linux-kernel,
	netdev, clang-built-linux

Hi Luke,

Thanks for the patches.

On Wed, May 06, 2020 at 06:05:01PM -0700, Luke Nelson wrote:
> This patch fixes two issues present in the current function for encoding
> arm64 logical immediates when using the 32-bit variants of instructions.
> 
> First, the code does not correctly reject an all-ones 32-bit immediate
> and returns an undefined instruction encoding, which can crash the kernel.
> The fix is to add a check for this case.
> 
> Second, the code incorrectly rejects some 32-bit immediates that are
> actually encodable as logical immediates. The root cause is that the code
> uses a default mask of 64-bit all-ones, even for 32-bit immediates. This
> causes an issue later on when the mask is used to fill the top bits of
> the immediate with ones, shown here:
> 
>   /*
>    * Pattern: 0..01..10..01..1
>    *
>    * Fill the unused top bits with ones, and check if
>    * the result is a valid immediate (all ones with a
>    * contiguous ranges of zeroes).
>    */
>   imm |= ~mask;
>   if (!range_of_ones(~imm))
>           return AARCH64_BREAK_FAULT;
> 
> To see the problem, consider an immediate of the form 0..01..10..01..1,
> where the upper 32 bits are zero, such as 0x80000001. The code checks
> if ~(imm | ~mask) contains a range of ones: the incorrect mask yields
> 1..10..01..10..0, which fails the check; the correct mask yields
> 0..01..10..0, which succeeds.
> 
> The fix is to use a 32-bit all-ones default mask for 32-bit immediates.
> 
> Currently, the only user of this function is in
> arch/arm64/kvm/va_layout.c, which uses 64-bit immediates and won't
> trigger these bugs.

Ah, so this isn't a fix or a bpf patch ;)

I can queue it via arm64 for 5.8, along with the bpf patches since there
are some other small changes pending in the arm64 bpf backend for BTI.

> We tested the new code against llvm-mc with all 1,302 encodable 32-bit
> logical immediates and all 5,334 encodable 64-bit logical immediates.
> 
> Fixes: ef3935eeebff ("arm64: insn: Add encoder for bitwise operations using literals")
> Co-developed-by: Xi Wang <xi.wang@gmail.com>
> Signed-off-by: Xi Wang <xi.wang@gmail.com>
> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
> ---
>  arch/arm64/kernel/insn.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> index 4a9e773a177f..42fad79546bb 100644
> --- a/arch/arm64/kernel/insn.c
> +++ b/arch/arm64/kernel/insn.c
> @@ -1535,7 +1535,7 @@ static u32 aarch64_encode_immediate(u64 imm,
>  				    u32 insn)
>  {
>  	unsigned int immr, imms, n, ones, ror, esz, tmp;
> -	u64 mask = ~0UL;
> +	u64 mask;
>  
>  	/* Can't encode full zeroes or full ones */
>  	if (!imm || !~imm)

It's a bit grotty spreading the checks out now. How about we tweak things
slightly along the lines of:


diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index 4a9e773a177f..60ec788eaf33 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -1535,16 +1535,10 @@ static u32 aarch64_encode_immediate(u64 imm,
 				    u32 insn)
 {
 	unsigned int immr, imms, n, ones, ror, esz, tmp;
-	u64 mask = ~0UL;
-
-	/* Can't encode full zeroes or full ones */
-	if (!imm || !~imm)
-		return AARCH64_BREAK_FAULT;
+	u64 mask;
 
 	switch (variant) {
 	case AARCH64_INSN_VARIANT_32BIT:
-		if (upper_32_bits(imm))
-			return AARCH64_BREAK_FAULT;
 		esz = 32;
 		break;
 	case AARCH64_INSN_VARIANT_64BIT:
@@ -1556,6 +1550,12 @@ static u32 aarch64_encode_immediate(u64 imm,
 		return AARCH64_BREAK_FAULT;
 	}
 
+	mask = GENMASK(esz - 1, 0);
+
+	/* Can't encode full zeroes or full ones */
+	if (imm & ~mask || !imm || imm == mask)
+		return AARCH64_BREAK_FAULT;
+
 	/*
 	 * Inverse of Replicate(). Try to spot a repeating pattern
 	 * with a pow2 stride.


What do you think?

Will

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 1/3] arm64: insn: Fix two bugs in encoding 32-bit logical immediates
@ 2020-05-07  8:29     ` Will Deacon
  0 siblings, 0 replies; 24+ messages in thread
From: Will Deacon @ 2020-05-07  8:29 UTC (permalink / raw)
  To: Luke Nelson
  Cc: Mark Rutland, Song Liu, Catalin Marinas, Alexei Starovoitov,
	Daniel Borkmann, Marc Zyngier, John Fastabend, clang-built-linux,
	Zi Shen Lim, Yonghong Song, Andrii Nakryiko, Xi Wang,
	Luke Nelson, KP Singh, Thomas Gleixner, linux-arm-kernel, netdev,
	linux-kernel, Greg Kroah-Hartman, bpf, Martin KaFai Lau,
	Christoffer Dall

Hi Luke,

Thanks for the patches.

On Wed, May 06, 2020 at 06:05:01PM -0700, Luke Nelson wrote:
> This patch fixes two issues present in the current function for encoding
> arm64 logical immediates when using the 32-bit variants of instructions.
> 
> First, the code does not correctly reject an all-ones 32-bit immediate
> and returns an undefined instruction encoding, which can crash the kernel.
> The fix is to add a check for this case.
> 
> Second, the code incorrectly rejects some 32-bit immediates that are
> actually encodable as logical immediates. The root cause is that the code
> uses a default mask of 64-bit all-ones, even for 32-bit immediates. This
> causes an issue later on when the mask is used to fill the top bits of
> the immediate with ones, shown here:
> 
>   /*
>    * Pattern: 0..01..10..01..1
>    *
>    * Fill the unused top bits with ones, and check if
>    * the result is a valid immediate (all ones with a
>    * contiguous ranges of zeroes).
>    */
>   imm |= ~mask;
>   if (!range_of_ones(~imm))
>           return AARCH64_BREAK_FAULT;
> 
> To see the problem, consider an immediate of the form 0..01..10..01..1,
> where the upper 32 bits are zero, such as 0x80000001. The code checks
> if ~(imm | ~mask) contains a range of ones: the incorrect mask yields
> 1..10..01..10..0, which fails the check; the correct mask yields
> 0..01..10..0, which succeeds.
> 
> The fix is to use a 32-bit all-ones default mask for 32-bit immediates.
> 
> Currently, the only user of this function is in
> arch/arm64/kvm/va_layout.c, which uses 64-bit immediates and won't
> trigger these bugs.

Ah, so this isn't a fix or a bpf patch ;)

I can queue it via arm64 for 5.8, along with the bpf patches since there
are some other small changes pending in the arm64 bpf backend for BTI.

> We tested the new code against llvm-mc with all 1,302 encodable 32-bit
> logical immediates and all 5,334 encodable 64-bit logical immediates.
> 
> Fixes: ef3935eeebff ("arm64: insn: Add encoder for bitwise operations using literals")
> Co-developed-by: Xi Wang <xi.wang@gmail.com>
> Signed-off-by: Xi Wang <xi.wang@gmail.com>
> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
> ---
>  arch/arm64/kernel/insn.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> index 4a9e773a177f..42fad79546bb 100644
> --- a/arch/arm64/kernel/insn.c
> +++ b/arch/arm64/kernel/insn.c
> @@ -1535,7 +1535,7 @@ static u32 aarch64_encode_immediate(u64 imm,
>  				    u32 insn)
>  {
>  	unsigned int immr, imms, n, ones, ror, esz, tmp;
> -	u64 mask = ~0UL;
> +	u64 mask;
>  
>  	/* Can't encode full zeroes or full ones */
>  	if (!imm || !~imm)

It's a bit grotty spreading the checks out now. How about we tweak things
slightly along the lines of:


diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
index 4a9e773a177f..60ec788eaf33 100644
--- a/arch/arm64/kernel/insn.c
+++ b/arch/arm64/kernel/insn.c
@@ -1535,16 +1535,10 @@ static u32 aarch64_encode_immediate(u64 imm,
 				    u32 insn)
 {
 	unsigned int immr, imms, n, ones, ror, esz, tmp;
-	u64 mask = ~0UL;
-
-	/* Can't encode full zeroes or full ones */
-	if (!imm || !~imm)
-		return AARCH64_BREAK_FAULT;
+	u64 mask;
 
 	switch (variant) {
 	case AARCH64_INSN_VARIANT_32BIT:
-		if (upper_32_bits(imm))
-			return AARCH64_BREAK_FAULT;
 		esz = 32;
 		break;
 	case AARCH64_INSN_VARIANT_64BIT:
@@ -1556,6 +1550,12 @@ static u32 aarch64_encode_immediate(u64 imm,
 		return AARCH64_BREAK_FAULT;
 	}
 
+	mask = GENMASK(esz - 1, 0);
+
+	/* Can't encode full zeroes or full ones */
+	if (imm & ~mask || !imm || imm == mask)
+		return AARCH64_BREAK_FAULT;
+
 	/*
 	 * Inverse of Replicate(). Try to spot a repeating pattern
 	 * with a pow2 stride.


What do you think?

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 1/3] arm64: insn: Fix two bugs in encoding 32-bit logical immediates
  2020-05-07  8:29     ` Will Deacon
@ 2020-05-07  9:12       ` Marc Zyngier
  -1 siblings, 0 replies; 24+ messages in thread
From: Marc Zyngier @ 2020-05-07  9:12 UTC (permalink / raw)
  To: Will Deacon
  Cc: Luke Nelson, bpf, Luke Nelson, Xi Wang, Catalin Marinas,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, Mark Rutland, Greg Kroah-Hartman,
	Thomas Gleixner, linux-arm-kernel, linux-kernel, netdev,
	clang-built-linux

On Thu, 7 May 2020 09:29:35 +0100
Will Deacon <will@kernel.org> wrote:

Hi Will,

> Hi Luke,
> 
> Thanks for the patches.
> 
> On Wed, May 06, 2020 at 06:05:01PM -0700, Luke Nelson wrote:
> > This patch fixes two issues present in the current function for encoding
> > arm64 logical immediates when using the 32-bit variants of instructions.
> > 
> > First, the code does not correctly reject an all-ones 32-bit immediate
> > and returns an undefined instruction encoding, which can crash the kernel.
> > The fix is to add a check for this case.
> > 
> > Second, the code incorrectly rejects some 32-bit immediates that are
> > actually encodable as logical immediates. The root cause is that the code
> > uses a default mask of 64-bit all-ones, even for 32-bit immediates. This
> > causes an issue later on when the mask is used to fill the top bits of
> > the immediate with ones, shown here:
> > 
> >   /*
> >    * Pattern: 0..01..10..01..1
> >    *
> >    * Fill the unused top bits with ones, and check if
> >    * the result is a valid immediate (all ones with a
> >    * contiguous ranges of zeroes).
> >    */
> >   imm |= ~mask;
> >   if (!range_of_ones(~imm))
> >           return AARCH64_BREAK_FAULT;
> > 
> > To see the problem, consider an immediate of the form 0..01..10..01..1,
> > where the upper 32 bits are zero, such as 0x80000001. The code checks
> > if ~(imm | ~mask) contains a range of ones: the incorrect mask yields
> > 1..10..01..10..0, which fails the check; the correct mask yields
> > 0..01..10..0, which succeeds.
> > 
> > The fix is to use a 32-bit all-ones default mask for 32-bit immediates.
> > 
> > Currently, the only user of this function is in
> > arch/arm64/kvm/va_layout.c, which uses 64-bit immediates and won't
> > trigger these bugs.  
> 
> Ah, so this isn't a fix or a bpf patch ;)
> 
> I can queue it via arm64 for 5.8, along with the bpf patches since there
> are some other small changes pending in the arm64 bpf backend for BTI.
> 
> > We tested the new code against llvm-mc with all 1,302 encodable 32-bit
> > logical immediates and all 5,334 encodable 64-bit logical immediates.
> > 
> > Fixes: ef3935eeebff ("arm64: insn: Add encoder for bitwise operations using literals")
> > Co-developed-by: Xi Wang <xi.wang@gmail.com>
> > Signed-off-by: Xi Wang <xi.wang@gmail.com>
> > Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
> > ---
> >  arch/arm64/kernel/insn.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> > index 4a9e773a177f..42fad79546bb 100644
> > --- a/arch/arm64/kernel/insn.c
> > +++ b/arch/arm64/kernel/insn.c
> > @@ -1535,7 +1535,7 @@ static u32 aarch64_encode_immediate(u64 imm,
> >  				    u32 insn)
> >  {
> >  	unsigned int immr, imms, n, ones, ror, esz, tmp;
> > -	u64 mask = ~0UL;
> > +	u64 mask;
> >  
> >  	/* Can't encode full zeroes or full ones */
> >  	if (!imm || !~imm)  
> 
> It's a bit grotty spreading the checks out now. How about we tweak things
> slightly along the lines of:
> 
> 
> diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> index 4a9e773a177f..60ec788eaf33 100644
> --- a/arch/arm64/kernel/insn.c
> +++ b/arch/arm64/kernel/insn.c
> @@ -1535,16 +1535,10 @@ static u32 aarch64_encode_immediate(u64 imm,
>  				    u32 insn)
>  {
>  	unsigned int immr, imms, n, ones, ror, esz, tmp;
> -	u64 mask = ~0UL;
> -
> -	/* Can't encode full zeroes or full ones */
> -	if (!imm || !~imm)
> -		return AARCH64_BREAK_FAULT;
> +	u64 mask;
>  
>  	switch (variant) {
>  	case AARCH64_INSN_VARIANT_32BIT:
> -		if (upper_32_bits(imm))
> -			return AARCH64_BREAK_FAULT;
>  		esz = 32;
>  		break;
>  	case AARCH64_INSN_VARIANT_64BIT:
> @@ -1556,6 +1550,12 @@ static u32 aarch64_encode_immediate(u64 imm,
>  		return AARCH64_BREAK_FAULT;
>  	}
>  
> +	mask = GENMASK(esz - 1, 0);
> +
> +	/* Can't encode full zeroes or full ones */

... nor a value wider than the mask.

> +	if (imm & ~mask || !imm || imm == mask)
> +		return AARCH64_BREAK_FAULT;
> +
>  	/*
>  	 * Inverse of Replicate(). Try to spot a repeating pattern
>  	 * with a pow2 stride.
> 
> 
> What do you think?

I'd be pretty happy with that.

Reviewed-by: Marc Zyngier <maz@kernel.org>

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 2/3] bpf, arm64: Optimize AND,OR,XOR,JSET BPF_K using arm64 logical immediates
  2020-05-07  1:05   ` [RFC PATCH bpf-next 2/3] bpf, arm64: Optimize AND, OR, XOR, JSET " Luke Nelson
@ 2020-05-07 20:19     ` Daniel Borkmann
  -1 siblings, 0 replies; 24+ messages in thread
From: Daniel Borkmann @ 2020-05-07 20:19 UTC (permalink / raw)
  To: Luke Nelson, bpf
  Cc: Luke Nelson, Xi Wang, Catalin Marinas, Will Deacon,
	Alexei Starovoitov, Zi Shen Lim, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh,
	Mark Rutland, Enrico Weigelt, Alexios Zavras, Thomas Gleixner,
	Marc Zyngier, Christoffer Dall, linux-arm-kernel, linux-kernel,
	netdev, clang-built-linux

On 5/7/20 3:05 AM, Luke Nelson wrote:
> The current code for BPF_{AND,OR,XOR,JSET} BPF_K loads the immediate to
> a temporary register before use.
> 
> This patch changes the code to avoid using a temporary register
> when the BPF immediate is encodable using an arm64 logical immediate
> instruction. If the encoding fails (due to the immediate not being
> encodable), it falls back to using a temporary register.
> 
> Example of generated code for BPF_ALU32_IMM(BPF_AND, R0, 0x80000001):
> 
> without optimization:
> 
>    24: mov  w10, #0x8000ffff
>    28: movk w10, #0x1
>    2c: and  w7, w7, w10
> 
> with optimization:
> 
>    24: and  w7, w7, #0x80000001
> 
> Since the encoding process is quite complex, the JIT reuses existing
> functionality in arch/arm64/kernel/insn.c for encoding logical immediates
> rather than duplicate it in the JIT.
> 
> Co-developed-by: Xi Wang <xi.wang@gmail.com>
> Signed-off-by: Xi Wang <xi.wang@gmail.com>
> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
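
The try-then-fall-back shape described in the quoted message can be
sketched as a small standalone program. Everything here is a stand-in
invented for the illustration, not the JIT's real helpers: a64_and_i()
is a fake encoder that pretends only 0x80000001 is encodable (so both
paths run), and emit() just prints:

#include <stdint.h>
#include <stdio.h>

#define AARCH64_BREAK_FAULT 0xbad0bad0u	/* stand-in failure value */

/* Fake encoder for the demo; returns a dummy encoding on success. */
static uint32_t a64_and_i(uint32_t imm)
{
	return imm == 0x80000001u ? 0xdeadbeefu : AARCH64_BREAK_FAULT;
}

static void emit(const char *what) { printf("  %s\n", what); }

static void jit_and_k(uint32_t imm)
{
	printf("BPF_AND dst, #%#x:\n", imm);
	if (a64_and_i(imm) != AARCH64_BREAK_FAULT) {
		emit("and wD, wD, #imm   ; single instruction");
	} else {
		emit("mov/movk wT, #imm  ; load immediate to temp register");
		emit("and wD, wD, wT     ; register-register fallback");
	}
}

int main(void)
{
	jit_and_k(0x80000001u);	/* encodable as a logical immediate */
	jit_and_k(0x12345678u);	/* not encodable: falls back */
	return 0;
}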

Great find, thanks! Given Will wanted to queue them:

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 3/3] bpf, arm64: Optimize ADD,SUB,JMP BPF_K using arm64 add/sub immediates
  2020-05-07  1:05   ` [RFC PATCH bpf-next 3/3] bpf, arm64: Optimize ADD, SUB, JMP " Luke Nelson
@ 2020-05-07 20:22     ` Daniel Borkmann
  -1 siblings, 0 replies; 24+ messages in thread
From: Daniel Borkmann @ 2020-05-07 20:22 UTC (permalink / raw)
  To: Luke Nelson, bpf
  Cc: Luke Nelson, Xi Wang, Catalin Marinas, Will Deacon,
	Alexei Starovoitov, Zi Shen Lim, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh,
	Mark Rutland, Enrico Weigelt, Torsten Duwe, Allison Randal,
	Thomas Gleixner, Christoffer Dall, Marc Zyngier,
	linux-arm-kernel, linux-kernel, netdev, clang-built-linux

On 5/7/20 3:05 AM, Luke Nelson wrote:
> The current code for BPF_{ADD,SUB} BPF_K loads the BPF immediate to a
> temporary register before performing the addition/subtraction. Similarly,
> BPF_JMP BPF_K cases load the immediate to a temporary register before
> comparison.
> 
> This patch introduces optimizations that use arm64 immediate add, sub,
> cmn, or cmp instructions when the BPF immediate fits. If the immediate
> does not fit, it falls back to using a temporary register.
> 
> Example of generated code for BPF_ALU64_IMM(BPF_ADD, R0, 2):
> 
> without optimization:
> 
>    24: mov x10, #0x2
>    28: add x7, x7, x10
> 
> with optimization:
> 
>    24: add x7, x7, #0x2
> 
> The code could use A64_{ADD,SUB}_I directly and check if it returns
> AARCH64_BREAK_FAULT, similar to how logical immediates are handled.
> However, aarch64_insn_gen_add_sub_imm from insn.c prints error messages
> when the immediate does not fit, and it's simpler to check if the
> immediate fits ahead of time.
> 
> Co-developed-by: Xi Wang <xi.wang@gmail.com>
> Signed-off-by: Xi Wang <xi.wang@gmail.com>
> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com>
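
The "check if the immediate fits ahead of time" test can be sketched
from the A64 ADD/SUB (immediate) encoding, which takes a 12-bit
unsigned value, optionally shifted left by 12. The helper name below is
illustrative and may not match what the patch actually adds:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* True iff imm fits an A64 add/sub immediate: imm12 or imm12 << 12. */
static bool is_addsub_imm(uint32_t imm)
{
	return !(imm & ~0xfffu) || !(imm & ~0xfff000u);
}

int main(void)
{
	printf("%d\n", is_addsub_imm(0x2));	/* 1: fits imm12 */
	printf("%d\n", is_addsub_imm(0x45000));	/* 1: fits imm12 << 12 */
	printf("%d\n", is_addsub_imm(0x1001));	/* 0: needs a temp register */
	return 0;
}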

Same here:

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

Thanks!

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 1/3] arm64: insn: Fix two bugs in encoding 32-bit logical immediates
  2020-05-07  9:12       ` Marc Zyngier
@ 2020-05-07 21:48         ` Luke Nelson
  -1 siblings, 0 replies; 24+ messages in thread
From: Luke Nelson @ 2020-05-07 21:48 UTC (permalink / raw)
  To: Marc Zyngier, Will Deacon
  Cc: Luke Nelson, bpf, Xi Wang, Catalin Marinas, Daniel Borkmann,
	Alexei Starovoitov, Zi Shen Lim, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh,
	Mark Rutland, Greg Kroah-Hartman, Thomas Gleixner,
	linux-arm-kernel, open list, Networking, clang-built-linux

Hi everyone,

Thanks for the comments! Responses below:

> It's a bit grotty spreading the checks out now. How about we tweak things
> slightly along the lines of:
>
>
> diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> index 4a9e773a177f..60ec788eaf33 100644
> --- a/arch/arm64/kernel/insn.c
> +++ b/arch/arm64/kernel/insn.c
> [...]

Agreed; this new version looks much cleaner. I re-ran all the tests /
verification and everything seems good. Would you like me to submit a
v2 of this series with this new code?


>> We tested the new code against llvm-mc with all 1,302 encodable 32-bit
>> logical immediates and all 5,334 encodable 64-bit logical immediates.
>
> That, on its own, is awesome information. Do you have any pointer on
> how to set this up?

Sure! The process of running the tests is pretty involved, but I'll
describe it below and give some links here.

We found the bugs in insn.c while adding support for logical immediates
to the BPF JIT and verifying the changes with our tool, Serval:
https://github.com/uw-unsat/serval-bpf. The basic idea for how we tested /
verified logical immediates is the following:

First, we have a Python script [1] for generating every encodable
logical immediate and the corresponding instruction fields that encode
that immediate. The script validates the list by checking that llvm-mc
decodes each instruction back to the expected immediate.
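
As a sanity check on the 1,302 / 5,334 counts quoted above, a small
standalone program reproduces them, assuming (as the encoding implies)
that every combination of element size, run length and rotation yields
a distinct immediate:

#include <stdio.h>

int main(void)
{
	unsigned long n32 = 0, n64 = 0;

	/* Logical immediates are rotations of a replicated run of ones:
	 * element size esz gives (esz - 1) run lengths and esz rotations. */
	for (unsigned esz = 2; esz <= 64; esz *= 2) {
		unsigned long n = (unsigned long)esz * (esz - 1);
		n64 += n;
		if (esz <= 32)
			n32 += n;
	}
	printf("32-bit: %lu, 64-bit: %lu\n", n32, n64);	/* 1302, 5334 */
	return 0;
}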

Next, we use the list [2] from the first step to check a Racket
translation [3] of the logical immediate encoding function in insn.c.
We found the second mask bug by noticing that some (encodable) 32-bit
immediates were being rejected by the encoding function.

Last, we use the Racket translation of the encoding function to verify
the correctness of the BPF JIT implementation [4], i.e., the JIT
correctly compiles BPF_{AND,OR,XOR,JSET} BPF_K instructions to arm64
instructions with equivalent semantics. We found the first bug as the
verifier complained that the function was producing invalid encodings
for 32-bit -1 immediates, and we were able to reproduce a kernel crash
using the BPF tests.

We manually translated the C code to Racket because our verifier, Serval,
currently only works on Racket code.

Thanks again,
- Luke

[1]: https://github.com/uw-unsat/serval-bpf/blob/00838174659034e9527e67d9eccd2def2354cec6/racket/test/arm64/gen-logic-imm.py
[2]: https://github.com/uw-unsat/serval-bpf/blob/00838174659034e9527e67d9eccd2def2354cec6/racket/test/arm64/logic-imm.rkt
[3]: https://github.com/uw-unsat/serval-bpf/blob/00838174659034e9527e67d9eccd2def2354cec6/racket/arm64/insn.rkt#L66
[4]: https://github.com/uw-unsat/serval-bpf/blob/00838174659034e9527e67d9eccd2def2354cec6/racket/arm64/bpf_jit_comp.rkt

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 1/3] arm64: insn: Fix two bugs in encoding 32-bit logical immediates
  2020-05-07 21:48         ` Luke Nelson
@ 2020-05-08 11:47           ` Will Deacon
  -1 siblings, 0 replies; 24+ messages in thread
From: Will Deacon @ 2020-05-08 11:47 UTC (permalink / raw)
  To: Luke Nelson
  Cc: Marc Zyngier, Luke Nelson, bpf, Xi Wang, Catalin Marinas,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, Mark Rutland, Greg Kroah-Hartman,
	Thomas Gleixner, linux-arm-kernel, open list, Networking,
	clang-built-linux

On Thu, May 07, 2020 at 02:48:07PM -0700, Luke Nelson wrote:
> Thanks for the comments! Responses below:
> 
> > It's a bit grotty spreading the checks out now. How about we tweak things
> > slightly along the lines of:
> >
> >
> > diff --git a/arch/arm64/kernel/insn.c b/arch/arm64/kernel/insn.c
> > index 4a9e773a177f..60ec788eaf33 100644
> > --- a/arch/arm64/kernel/insn.c
> > +++ b/arch/arm64/kernel/insn.c
> > [...]
> 
> Agreed; this new version looks much cleaner. I re-ran all the tests /
> verification and everything seems good. Would you like me to submit a
> v2 of this series with this new code?

Yes, please! And please include Daniel's acks on the BPF changes too. It's a
public holiday here in the UK today, but I can pick this up next week.

> >> We tested the new code against llvm-mc with all 1,302 encodable 32-bit
> >> logical immediates and all 5,334 encodable 64-bit logical immediates.
> >
> > That, on its own, is awesome information. Do you have any pointer on
> > how to set this up?
> 
> Sure! The process of running the tests is pretty involved, but I'll
> describe it below and give some links here.
> 
> We found the bugs in insn.c while adding support for logical immediates
> to the BPF JIT and verifying the changes with our tool, Serval:
> https://github.com/uw-unsat/serval-bpf. The basic idea for how we tested /
> verified logical immediates is the following:
> 
> First, we have a Python script [1] for generating every encodable
> logical immediate and the corresponding instruction fields that encode
> that immediate. The script validates the list by checking that llvm-mc
> decodes each instruction back to the expected immediate.
> 
> Next, we use the list [2] from the first step to check a Racket
> translation [3] of the logical immediate encoding function in insn.c.
> We found the second mask bug by noticing that some (encodable) 32-bit
> immediates were being rejected by the encoding function.
> 
> Last, we use the Racket translation of the encoding function to verify
> the correctness of the BPF JIT implementation [4], i.e., the JIT
> correctly compiles BPF_{AND,OR,XOR,JSET} BPF_K instructions to arm64
> instructions with equivalent semantics. We found the first bug as the
> verifier complained that the function was producing invalid encodings
> for 32-bit -1 immediates, and we were able to reproduce a kernel crash
> using the BPF tests.
> 
> We manually translated the C code to Racket because our verifier, Serval,
> currently only works on Racket code.

Nice! Two things:

(1) I really think you should give a talk on this at a Linux conference
(2) Did you look at any instruction generation functions other than the
    logical immediate encoding function?

Cheers,

Will

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH bpf-next 1/3] arm64: insn: Fix two bugs in encoding 32-bit logical immediates
  2020-05-08 11:47           ` Will Deacon
@ 2020-05-08 18:12             ` Luke Nelson
  -1 siblings, 0 replies; 24+ messages in thread
From: Luke Nelson @ 2020-05-08 18:12 UTC (permalink / raw)
  To: Will Deacon
  Cc: Marc Zyngier, Luke Nelson, bpf, Xi Wang, Catalin Marinas,
	Daniel Borkmann, Alexei Starovoitov, Zi Shen Lim,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	John Fastabend, KP Singh, Mark Rutland, Greg Kroah-Hartman,
	Thomas Gleixner, linux-arm-kernel, open list, Networking,
	clang-built-linux

Hi Will,

On Fri, May 8, 2020 at 4:47 AM Will Deacon <will@kernel.org> wrote:
>
> Yes, please! And please include Daniel's acks on the BPF changes too. It's a
> public holiday here in the UK today, but I can pick this up next week.

Thanks!

> Nice! Two things:
>
> (1) I really think you should give a talk on this at a Linux conference

That would be great, I'd be happy to give a talk on our verification
work some time in the future :)

> (2) Did you look at any instruction generation functions other than the
>     logical immediate encoding function?

Other instruction generation functions are on our todo list, but we
haven't had a chance to spend more time on them yet.

Thanks again,

- Luke

^ permalink raw reply	[flat|nested] 24+ messages in thread
