* [PATCH AUTOSEL 5.4 17/74] bpf: Check for BPF_F_ADJ_ROOM_FIXED_GSO when bpf_skb_change_proto
From: Sasha Levin @ 2021-07-06 11:24 UTC
  To: linux-kernel, stable
  Cc: Dongseok Yi, Daniel Borkmann, Willem de Bruijn, Sasha Levin, netdev, bpf

From: Dongseok Yi <dseok.yi@samsung.com>

[ Upstream commit fa7b83bf3b156c767f3e4a25bbf3817b08f3ff8e ]

In the forwarding path GRO -> BPF 6 to 4 -> GSO for TCP traffic, the
coalesced packet payload can be > MSS, but < MSS + 20.

bpf_skb_proto_6_to_4() upgrades the MSS, which can then exceed the
payload length: for example, with an MSS of 1400 and a coalesced
payload of 1410 bytes, the 6-to-4 translation raises the MSS by 20 to
1420. tcp_gso_segment() then checks whether the payload length is
<= MSS, and this condition causes the packet to be dropped.

tcp_gso_segment():
        [...]
        mss = skb_shinfo(skb)->gso_size;
        if (unlikely(skb->len <= mss))
                goto out;
        [...]

Allow the MSS to be upgraded or downgraded only when
BPF_F_ADJ_ROOM_FIXED_GSO is not set.
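
As a minimal sketch (not part of this patch; the section name, includes
and program structure are illustrative), a tc BPF program doing the
6-to-4 translation would opt out of the MSS adjustment like this:

  #include <linux/bpf.h>
  #include <linux/if_ether.h>
  #include <linux/pkt_cls.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_endian.h>

  SEC("tc")
  int xlat_6_to_4(struct __sk_buff *skb)
  {
          /* Keep gso_size fixed across the protocol translation. */
          if (bpf_skb_change_proto(skb, bpf_htons(ETH_P_IP),
                                   BPF_F_ADJ_ROOM_FIXED_GSO))
                  return TC_ACT_SHOT;

          /* ... rewrite the network header to IPv4 here ... */
          return TC_ACT_OK;
  }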

Signed-off-by: Dongseok Yi <dseok.yi@samsung.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/bpf/1620804453-57566-1-git-send-email-dseok.yi@samsung.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/core/filter.c | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 108bcf600052..823e77846525 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2833,7 +2833,7 @@ static int bpf_skb_net_hdr_pop(struct sk_buff *skb, u32 off, u32 len)
 	return ret;
 }
 
-static int bpf_skb_proto_4_to_6(struct sk_buff *skb)
+static int bpf_skb_proto_4_to_6(struct sk_buff *skb, u64 flags)
 {
 	const u32 len_diff = sizeof(struct ipv6hdr) - sizeof(struct iphdr);
 	u32 off = skb_mac_header_len(skb);
@@ -2862,7 +2862,9 @@ static int bpf_skb_proto_4_to_6(struct sk_buff *skb)
 		}
 
 		/* Due to IPv6 header, MSS needs to be downgraded. */
-		skb_decrease_gso_size(shinfo, len_diff);
+		if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
+			skb_decrease_gso_size(shinfo, len_diff);
+
 		/* Header must be checked, and gso_segs recomputed. */
 		shinfo->gso_type |= SKB_GSO_DODGY;
 		shinfo->gso_segs = 0;
@@ -2874,7 +2876,7 @@ static int bpf_skb_proto_4_to_6(struct sk_buff *skb)
 	return 0;
 }
 
-static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
+static int bpf_skb_proto_6_to_4(struct sk_buff *skb, u64 flags)
 {
 	const u32 len_diff = sizeof(struct ipv6hdr) - sizeof(struct iphdr);
 	u32 off = skb_mac_header_len(skb);
@@ -2903,7 +2905,9 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
 		}
 
 		/* Due to IPv4 header, MSS can be upgraded. */
-		skb_increase_gso_size(shinfo, len_diff);
+		if (!(flags & BPF_F_ADJ_ROOM_FIXED_GSO))
+			skb_increase_gso_size(shinfo, len_diff);
+
 		/* Header must be checked, and gso_segs recomputed. */
 		shinfo->gso_type |= SKB_GSO_DODGY;
 		shinfo->gso_segs = 0;
@@ -2915,17 +2919,17 @@ static int bpf_skb_proto_6_to_4(struct sk_buff *skb)
 	return 0;
 }
 
-static int bpf_skb_proto_xlat(struct sk_buff *skb, __be16 to_proto)
+static int bpf_skb_proto_xlat(struct sk_buff *skb, __be16 to_proto, u64 flags)
 {
 	__be16 from_proto = skb->protocol;
 
 	if (from_proto == htons(ETH_P_IP) &&
 	      to_proto == htons(ETH_P_IPV6))
-		return bpf_skb_proto_4_to_6(skb);
+		return bpf_skb_proto_4_to_6(skb, flags);
 
 	if (from_proto == htons(ETH_P_IPV6) &&
 	      to_proto == htons(ETH_P_IP))
-		return bpf_skb_proto_6_to_4(skb);
+		return bpf_skb_proto_6_to_4(skb, flags);
 
 	return -ENOTSUPP;
 }
@@ -2935,7 +2939,7 @@ BPF_CALL_3(bpf_skb_change_proto, struct sk_buff *, skb, __be16, proto,
 {
 	int ret;
 
-	if (unlikely(flags))
+	if (unlikely(flags & ~(BPF_F_ADJ_ROOM_FIXED_GSO)))
 		return -EINVAL;
 
 	/* General idea is that this helper does the basic groundwork
@@ -2955,7 +2959,7 @@ BPF_CALL_3(bpf_skb_change_proto, struct sk_buff *, skb, __be16, proto,
 	 * that. For offloads, we mark packet as dodgy, so that headers
 	 * need to be verified first.
 	 */
-	ret = bpf_skb_proto_xlat(skb, proto);
+	ret = bpf_skb_proto_xlat(skb, proto, flags);
 	bpf_compute_data_pointers(skb);
 	return ret;
 }
-- 
2.30.2



* [PATCH AUTOSEL 5.4 47/74] bpf: Fix up register-based shifts in interpreter to silence KUBSAN
From: Sasha Levin @ 2021-07-06 11:24 UTC
  To: linux-kernel, stable
  Cc: Daniel Borkmann, syzbot+bed360704c521841c85d, Kurt Manucredo,
	Eric Biggers, Alexei Starovoitov, Andrii Nakryiko, Edward Cree,
	Sasha Levin, netdev, bpf

From: Daniel Borkmann <daniel@iogearbox.net>

[ Upstream commit 28131e9d933339a92f78e7ab6429f4aaaa07061c ]

syzbot reported a shift-out-of-bounds that KUBSAN observed in the
interpreter:

  [...]
  UBSAN: shift-out-of-bounds in kernel/bpf/core.c:1420:2
  shift exponent 255 is too large for 64-bit type 'long long unsigned int'
  CPU: 1 PID: 11097 Comm: syz-executor.4 Not tainted 5.12.0-rc2-syzkaller #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
  Call Trace:
   __dump_stack lib/dump_stack.c:79 [inline]
   dump_stack+0x141/0x1d7 lib/dump_stack.c:120
   ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
   __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:327
   ___bpf_prog_run.cold+0x19/0x56c kernel/bpf/core.c:1420
   __bpf_prog_run32+0x8f/0xd0 kernel/bpf/core.c:1735
   bpf_dispatcher_nop_func include/linux/bpf.h:644 [inline]
   bpf_prog_run_pin_on_cpu include/linux/filter.h:624 [inline]
   bpf_prog_run_clear_cb include/linux/filter.h:755 [inline]
   run_filter+0x1a1/0x470 net/packet/af_packet.c:2031
   packet_rcv+0x313/0x13e0 net/packet/af_packet.c:2104
   dev_queue_xmit_nit+0x7c2/0xa90 net/core/dev.c:2387
   xmit_one net/core/dev.c:3588 [inline]
   dev_hard_start_xmit+0xad/0x920 net/core/dev.c:3609
   __dev_queue_xmit+0x2121/0x2e00 net/core/dev.c:4182
   __bpf_tx_skb net/core/filter.c:2116 [inline]
   __bpf_redirect_no_mac net/core/filter.c:2141 [inline]
   __bpf_redirect+0x548/0xc80 net/core/filter.c:2164
   ____bpf_clone_redirect net/core/filter.c:2448 [inline]
   bpf_clone_redirect+0x2ae/0x420 net/core/filter.c:2420
   ___bpf_prog_run+0x34e1/0x77d0 kernel/bpf/core.c:1523
   __bpf_prog_run512+0x99/0xe0 kernel/bpf/core.c:1737
   bpf_dispatcher_nop_func include/linux/bpf.h:644 [inline]
   bpf_test_run+0x3ed/0xc50 net/bpf/test_run.c:50
   bpf_prog_test_run_skb+0xabc/0x1c50 net/bpf/test_run.c:582
   bpf_prog_test_run kernel/bpf/syscall.c:3127 [inline]
   __do_sys_bpf+0x1ea9/0x4f00 kernel/bpf/syscall.c:4406
   do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
   entry_SYSCALL_64_after_hwframe+0x44/0xae
  [...]

Generally speaking, KUBSAN reports from the kernel should be fixed.
However, in the case of BPF, this particular report caused concerns
since the large shift is not wrong from the BPF point of view, just
undefined. In the verifier, K-based shifts that are >= {64,32}
(depending on the bitwidth of the instruction) are already rejected.
The register-based cases were not, given their contents might not be
known at verification time. Ideas such as a verifier instruction
rewrite with an additional AND instruction for the source register
were brought up, but were regularly rejected due to the additional
runtime overhead they would incur.
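
To illustrate with a hypothetical instruction pair (not from the
report), the immediate form below is rejected at load time with an
invalid-shift error, while the register form has to be accepted since
r1 may hold any value at runtime:

  BPF_ALU64_IMM(BPF_LSH, BPF_REG_0, 64),        /* rejected by verifier */
  BPF_ALU64_REG(BPF_LSH, BPF_REG_0, BPF_REG_1), /* accepted, r1 unknown */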

As Edward Cree rightly put it:

  Shifts by more than insn bitness are legal in the BPF ISA; they are
  implementation-defined behaviour [of the underlying architecture],
  rather than UB, and have been made legal for performance reasons.
  Each of the JIT backends compiles the BPF shift operations to machine
  instructions which produce implementation-defined results in such a
  case; the resulting contents of the register may be arbitrary but
  program behaviour as a whole remains defined.

  Guard checks in the fast path (i.e. affecting JITted code) will thus
  not be accepted.

  The case of division by zero is not truly analogous here, as division
  instructions on many of the JIT-targeted architectures will raise a
  machine exception / fault on division by zero, whereas (to the best
  of my knowledge) none will do so on an out-of-bounds shift.

Given the KUBSAN report only affects the BPF interpreter, not the
JITs, one solution is to add ANDs with 63 or 31 into ___bpf_prog_run().
That makes the shifts defined, and thus silences KUBSAN, while the
compiler optimizes the AND away on any CPU that interprets the shift
amounts modulo the width anyway (e.g., disassembly confirms that on
x86-64 and arm64 the generated interpreter code is the same before and
after this fix).
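
As a stand-alone C sketch of the reasoning (illustrative, mirroring
the interpreter change below):

  #include <stdint.h>

  /* dst << src is undefined behavior in C for src >= 64; the mask
   * makes it defined, and on x86-64/arm64 the compiler drops the AND
   * since the hardware shift instruction masks the amount anyway. */
  static uint64_t lsh64(uint64_t dst, uint64_t src)
  {
          return dst << (src & 63);
  }

  static uint32_t lsh32(uint32_t dst, uint32_t src)
  {
          return dst << (src & 31);
  }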

The BPF interpreter is a slow path, and is most likely compiled out
anyway, as distros select BPF_JIT_ALWAYS_ON to avoid speculative
execution of BPF instructions by the interpreter. Given the main
argument was to avoid sacrificing performance, the fact that the
compiler optimizes the AND away on mainstream archs also helps this
solution moving forward. Also add a comment on the LSH/RSH/ARSH
translation to give JIT authors guidance when they see the
___bpf_prog_run() interpreter code and use it as a model for a new
JIT backend.

Reported-by: syzbot+bed360704c521841c85d@syzkaller.appspotmail.com
Reported-by: Kurt Manucredo <fuzzybritches0@gmail.com>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Co-developed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: syzbot+bed360704c521841c85d@syzkaller.appspotmail.com
Cc: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/bpf/0000000000008f912605bd30d5d7@google.com
Link: https://lore.kernel.org/bpf/bac16d8d-c174-bdc4-91bd-bfa62b410190@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 kernel/bpf/core.c | 61 +++++++++++++++++++++++++++++++++--------------
 1 file changed, 43 insertions(+), 18 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 56bc96f5ad20..323913ba13b3 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1321,29 +1321,54 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack)
 select_insn:
 	goto *jumptable[insn->code];
 
-	/* ALU */
-#define ALU(OPCODE, OP)			\
-	ALU64_##OPCODE##_X:		\
-		DST = DST OP SRC;	\
-		CONT;			\
-	ALU_##OPCODE##_X:		\
-		DST = (u32) DST OP (u32) SRC;	\
-		CONT;			\
-	ALU64_##OPCODE##_K:		\
-		DST = DST OP IMM;		\
-		CONT;			\
-	ALU_##OPCODE##_K:		\
-		DST = (u32) DST OP (u32) IMM;	\
+	/* Explicitly mask the register-based shift amounts with 63 or 31
+	 * to avoid undefined behavior. Normally this won't affect the
+	 * generated code, for example, in case of native 64 bit archs such
+	 * as x86-64 or arm64, the compiler is optimizing the AND away for
+	 * the interpreter. In case of JITs, each of the JIT backends compiles
+	 * the BPF shift operations to machine instructions which produce
+	 * implementation-defined results in such a case; the resulting
+	 * contents of the register may be arbitrary, but program behaviour
+	 * as a whole remains defined. In other words, in case of JIT backends,
+	 * the AND must /not/ be added to the emitted LSH/RSH/ARSH translation.
+	 */
+	/* ALU (shifts) */
+#define SHT(OPCODE, OP)					\
+	ALU64_##OPCODE##_X:				\
+		DST = DST OP (SRC & 63);		\
+		CONT;					\
+	ALU_##OPCODE##_X:				\
+		DST = (u32) DST OP ((u32) SRC & 31);	\
+		CONT;					\
+	ALU64_##OPCODE##_K:				\
+		DST = DST OP IMM;			\
+		CONT;					\
+	ALU_##OPCODE##_K:				\
+		DST = (u32) DST OP (u32) IMM;		\
+		CONT;
+	/* ALU (rest) */
+#define ALU(OPCODE, OP)					\
+	ALU64_##OPCODE##_X:				\
+		DST = DST OP SRC;			\
+		CONT;					\
+	ALU_##OPCODE##_X:				\
+		DST = (u32) DST OP (u32) SRC;		\
+		CONT;					\
+	ALU64_##OPCODE##_K:				\
+		DST = DST OP IMM;			\
+		CONT;					\
+	ALU_##OPCODE##_K:				\
+		DST = (u32) DST OP (u32) IMM;		\
 		CONT;
-
 	ALU(ADD,  +)
 	ALU(SUB,  -)
 	ALU(AND,  &)
 	ALU(OR,   |)
-	ALU(LSH, <<)
-	ALU(RSH, >>)
 	ALU(XOR,  ^)
 	ALU(MUL,  *)
+	SHT(LSH, <<)
+	SHT(RSH, >>)
+#undef SHT
 #undef ALU
 	ALU_NEG:
 		DST = (u32) -DST;
@@ -1368,13 +1393,13 @@ static u64 ___bpf_prog_run(u64 *regs, const struct bpf_insn *insn, u64 *stack)
 		insn++;
 		CONT;
 	ALU_ARSH_X:
-		DST = (u64) (u32) (((s32) DST) >> SRC);
+		DST = (u64) (u32) (((s32) DST) >> (SRC & 31));
 		CONT;
 	ALU_ARSH_K:
 		DST = (u64) (u32) (((s32) DST) >> IMM);
 		CONT;
 	ALU64_ARSH_X:
-		(*(s64 *) &DST) >>= SRC;
+		(*(s64 *) &DST) >>= (SRC & 63);
 		CONT;
 	ALU64_ARSH_K:
 		(*(s64 *) &DST) >>= IMM;
-- 
2.30.2



* [PATCH AUTOSEL 5.4 64/74] media, bpf: Do not copy more entries than user space requested
From: Sasha Levin @ 2021-07-06 11:24 UTC
  To: linux-kernel, stable
  Cc: Sean Young, Daniel Borkmann, Sasha Levin, linux-media, netdev, bpf

From: Sean Young <sean@mess.org>

[ Upstream commit 647d446d66e493d23ca1047fa8492b0269674530 ]

The syscall bpf(BPF_PROG_QUERY, &attr) should use the prog_cnt field
to see how many entries user space provided and return ENOSPC if there
are more programs than that. Before this patch, this was not checked
and ENOSPC was never returned.

Note that one lirc device is limited to 64 bpf programs, and the user
space I'm aware of -- ir-keytable -- already provides enough space for
64 entries. However, we should not copy more program ids than are
requested.
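
For reference, a minimal user-space sketch of the query interface this
fixes (buffer size and error handling are illustrative):

  #include <stdint.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>
  #include <linux/bpf.h>

  int query_lirc_progs(int lirc_fd)
  {
          union bpf_attr attr;
          uint32_t ids[8];

          memset(&attr, 0, sizeof(attr));
          attr.query.target_fd   = lirc_fd;          /* lirc device fd */
          attr.query.attach_type = BPF_LIRC_MODE2;
          attr.query.prog_ids    = (uint64_t)(unsigned long)ids;
          attr.query.prog_cnt    = 8;                /* capacity of ids[] */

          /* With this fix, at most 8 ids are copied and the syscall
           * fails with errno == ENOSPC if more programs are attached. */
          return syscall(__NR_bpf, BPF_PROG_QUERY, &attr, sizeof(attr));
  }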

Signed-off-by: Sean Young <sean@mess.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20210623213754.632-1-sean@mess.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/media/rc/bpf-lirc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/media/rc/bpf-lirc.c b/drivers/media/rc/bpf-lirc.c
index 0a0ce620e4a2..d5f839fdcde7 100644
--- a/drivers/media/rc/bpf-lirc.c
+++ b/drivers/media/rc/bpf-lirc.c
@@ -329,7 +329,8 @@ int lirc_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr)
 	}
 
 	if (attr->query.prog_cnt != 0 && prog_ids && cnt)
-		ret = bpf_prog_array_copy_to_user(progs, prog_ids, cnt);
+		ret = bpf_prog_array_copy_to_user(progs, prog_ids,
+						  attr->query.prog_cnt);
 
 unlock:
 	mutex_unlock(&ir_raw_handler_lock);
-- 
2.30.2


