All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC bpf-next 0/3] bpf_sk_lookup.remote_port fixes
@ 2022-02-22 18:25 Ilya Leoshkevich
  2022-02-22 18:25 ` [PATCH RFC bpf-next 1/3] bpf: Fix certain narrow loads with offsets Ilya Leoshkevich
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Ilya Leoshkevich @ 2022-02-22 18:25 UTC (permalink / raw)
  To: Jakub Sitnicki, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: bpf, Heiko Carstens, Vasily Gorbik, Ilya Leoshkevich

Hi,

These are my current patches for the discussion from [1] and [2].

With these changes tests pass on both x86_64 and s390x, but there are
quite a few things I'm not sure about, hence it's just an RFC:

- Can moving size < target_size check lead to incorrect shifts in some
  corner cases that I missed?

- Are different layouts of remote_port on little- and big-endian a bug
  or a feature? Do we want things to be this way? If not, are we bound
  by the ABI anyway?

- Is there any way to make uapi changes look nicer? A wall of nested
  structs, unions and ifdefs in an otherwise clean struct definition
  isn't looking particularly good.

- What is the Officially Approved way to access the remote_port field
  from C code? I'm leaning towards bpf_ntohs((__u16)remote_port), like
  in [3], and I adjusted the test accordingly.

[1] https://lore.kernel.org/bpf/CAEf4BzaRNLw9_EnaMo5e46CdEkzbJiVU3j9oxnsemBKjNFf3wQ@mail.gmail.com/
[2] https://lore.kernel.org/bpf/20220221180358.169101-1-jakub@cloudflare.com/
[3] https://lore.kernel.org/bpf/20220113070245.791577-1-imagedong@tencent.com/

Ilya Leoshkevich (3):
  bpf: Fix certain narrow loads with offsets
  bpf: Fix bpf_sk_lookup.remote_port on big-endian
  selftests/bpf: Adapt bpf_sk_lookup.remote_port loads

 include/uapi/linux/bpf.h                        | 17 +++++++++++++++--
 kernel/bpf/verifier.c                           | 14 +++++++++-----
 net/core/filter.c                               |  5 ++---
 tools/include/uapi/linux/bpf.h                  | 17 +++++++++++++++--
 .../selftests/bpf/progs/test_sk_lookup.c        | 17 +++++++++++------
 5 files changed, 52 insertions(+), 18 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH RFC bpf-next 1/3] bpf: Fix certain narrow loads with offsets
  2022-02-22 18:25 [PATCH RFC bpf-next 0/3] bpf_sk_lookup.remote_port fixes Ilya Leoshkevich
@ 2022-02-22 18:25 ` Ilya Leoshkevich
  2022-03-08 15:01   ` Jakub Sitnicki
  2022-02-22 18:25 ` [PATCH RFC bpf-next 2/3] bpf: Fix bpf_sk_lookup.remote_port on big-endian Ilya Leoshkevich
  2022-02-22 18:25 ` [PATCH RFC bpf-next 3/3] selftests/bpf: Adapt bpf_sk_lookup.remote_port loads Ilya Leoshkevich
  2 siblings, 1 reply; 18+ messages in thread
From: Ilya Leoshkevich @ 2022-02-22 18:25 UTC (permalink / raw)
  To: Jakub Sitnicki, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: bpf, Heiko Carstens, Vasily Gorbik, Ilya Leoshkevich

Verifier treats bpf_sk_lookup.remote_port as a 32-bit field for
backward compatibility, regardless of what the uapi headers say.
This field is mapped onto the 16-bit bpf_sk_lookup_kern.sport field.
Therefore, accessing the most significant 16 bits of
bpf_sk_lookup.remote_port must produce 0, which is currently not
the case.

The problem is that narrow loads with offset - commit 46f53a65d2de
("bpf: Allow narrow loads with offset > 0"), don't play nicely with
the masking optimization - commit 239946314e57 ("bpf: possibly avoid
extra masking for narrower load in verifier"). In particular, when we
suppress extra masking, we suppress shifting as well, which is not
correct.

Fix by moving the masking suppression check to BPF_AND generation.

Fixes: 46f53a65d2de ("bpf: Allow narrow loads with offset > 0")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 kernel/bpf/verifier.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d7473fee247c..195f2e9b5a47 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12848,7 +12848,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 			return -EINVAL;
 		}
 
-		if (is_narrower_load && size < target_size) {
+		if (is_narrower_load) {
 			u8 shift = bpf_ctx_narrow_access_offset(
 				off, size, size_default) * 8;
 			if (shift && cnt + 1 >= ARRAY_SIZE(insn_buf)) {
@@ -12860,15 +12860,19 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
 					insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH,
 									insn->dst_reg,
 									shift);
-				insn_buf[cnt++] = BPF_ALU32_IMM(BPF_AND, insn->dst_reg,
-								(1 << size * 8) - 1);
+				if (size < target_size)
+					insn_buf[cnt++] = BPF_ALU32_IMM(
+						BPF_AND, insn->dst_reg,
+						(1 << size * 8) - 1);
 			} else {
 				if (shift)
 					insn_buf[cnt++] = BPF_ALU64_IMM(BPF_RSH,
 									insn->dst_reg,
 									shift);
-				insn_buf[cnt++] = BPF_ALU64_IMM(BPF_AND, insn->dst_reg,
-								(1ULL << size * 8) - 1);
+				if (size < target_size)
+					insn_buf[cnt++] = BPF_ALU64_IMM(
+						BPF_AND, insn->dst_reg,
+						(1ULL << size * 8) - 1);
 			}
 		}
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RFC bpf-next 2/3] bpf: Fix bpf_sk_lookup.remote_port on big-endian
  2022-02-22 18:25 [PATCH RFC bpf-next 0/3] bpf_sk_lookup.remote_port fixes Ilya Leoshkevich
  2022-02-22 18:25 ` [PATCH RFC bpf-next 1/3] bpf: Fix certain narrow loads with offsets Ilya Leoshkevich
@ 2022-02-22 18:25 ` Ilya Leoshkevich
  2022-02-27  2:44   ` Alexei Starovoitov
  2022-02-22 18:25 ` [PATCH RFC bpf-next 3/3] selftests/bpf: Adapt bpf_sk_lookup.remote_port loads Ilya Leoshkevich
  2 siblings, 1 reply; 18+ messages in thread
From: Ilya Leoshkevich @ 2022-02-22 18:25 UTC (permalink / raw)
  To: Jakub Sitnicki, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: bpf, Heiko Carstens, Vasily Gorbik, Ilya Leoshkevich

On big-endian, the port is available in the second __u16, not the first
one. Therefore, provide a big-endian-specific definition that reflects
that. Also, define remote_port_compat in order to have nicer
architecture-agnostic code in the verifier and in tests.

Fixes: 9a69e2b385f4 ("bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 include/uapi/linux/bpf.h       | 17 +++++++++++++++--
 net/core/filter.c              |  5 ++---
 tools/include/uapi/linux/bpf.h | 17 +++++++++++++++--
 3 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index afe3d0d7f5f2..7b0e5efa58e0 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -10,6 +10,7 @@
 
 #include <linux/types.h>
 #include <linux/bpf_common.h>
+#include <asm/byteorder.h>
 
 /* Extended instruction set based on top of classic BPF */
 
@@ -6453,8 +6454,20 @@ struct bpf_sk_lookup {
 	__u32 protocol;		/* IP protocol (IPPROTO_TCP, IPPROTO_UDP) */
 	__u32 remote_ip4;	/* Network byte order */
 	__u32 remote_ip6[4];	/* Network byte order */
-	__be16 remote_port;	/* Network byte order */
-	__u16 :16;		/* Zero padding */
+	union {
+		struct {
+#if defined(__BYTE_ORDER) ? __BYTE_ORDER == __LITTLE_ENDIAN : defined(__LITTLE_ENDIAN)
+			__be16 remote_port;	/* Network byte order */
+			__u16 :16;		/* Zero padding */
+#elif defined(__BYTE_ORDER) ? __BYTE_ORDER == __BIG_ENDIAN : defined(__BIG_ENDIAN)
+			__u16 :16;		/* Zero padding */
+			__be16 remote_port;	/* Network byte order */
+#else
+#error unspecified endianness
+#endif
+		};
+		__u32 remote_port_compat;
+	};
 	__u32 local_ip4;	/* Network byte order */
 	__u32 local_ip6[4];	/* Network byte order */
 	__u32 local_port;	/* Host byte order */
diff --git a/net/core/filter.c b/net/core/filter.c
index 65869fd510e8..4b247d5aebe8 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -10856,8 +10856,7 @@ static bool sk_lookup_is_valid_access(int off, int size,
 	case bpf_ctx_range(struct bpf_sk_lookup, local_ip4):
 	case bpf_ctx_range_till(struct bpf_sk_lookup, remote_ip6[0], remote_ip6[3]):
 	case bpf_ctx_range_till(struct bpf_sk_lookup, local_ip6[0], local_ip6[3]):
-	case offsetof(struct bpf_sk_lookup, remote_port) ...
-	     offsetof(struct bpf_sk_lookup, local_ip4) - 1:
+	case bpf_ctx_range(struct bpf_sk_lookup, remote_port_compat):
 	case bpf_ctx_range(struct bpf_sk_lookup, local_port):
 	case bpf_ctx_range(struct bpf_sk_lookup, ingress_ifindex):
 		bpf_ctx_record_field_size(info, sizeof(__u32));
@@ -10938,7 +10937,7 @@ static u32 sk_lookup_convert_ctx_access(enum bpf_access_type type,
 #endif
 		break;
 	}
-	case offsetof(struct bpf_sk_lookup, remote_port):
+	case offsetof(struct bpf_sk_lookup, remote_port_compat):
 		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
 				      bpf_target_off(struct bpf_sk_lookup_kern,
 						     sport, 2, target_size));
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index afe3d0d7f5f2..7b0e5efa58e0 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -10,6 +10,7 @@
 
 #include <linux/types.h>
 #include <linux/bpf_common.h>
+#include <asm/byteorder.h>
 
 /* Extended instruction set based on top of classic BPF */
 
@@ -6453,8 +6454,20 @@ struct bpf_sk_lookup {
 	__u32 protocol;		/* IP protocol (IPPROTO_TCP, IPPROTO_UDP) */
 	__u32 remote_ip4;	/* Network byte order */
 	__u32 remote_ip6[4];	/* Network byte order */
-	__be16 remote_port;	/* Network byte order */
-	__u16 :16;		/* Zero padding */
+	union {
+		struct {
+#if defined(__BYTE_ORDER) ? __BYTE_ORDER == __LITTLE_ENDIAN : defined(__LITTLE_ENDIAN)
+			__be16 remote_port;	/* Network byte order */
+			__u16 :16;		/* Zero padding */
+#elif defined(__BYTE_ORDER) ? __BYTE_ORDER == __BIG_ENDIAN : defined(__BIG_ENDIAN)
+			__u16 :16;		/* Zero padding */
+			__be16 remote_port;	/* Network byte order */
+#else
+#error unspecified endianness
+#endif
+		};
+		__u32 remote_port_compat;
+	};
 	__u32 local_ip4;	/* Network byte order */
 	__u32 local_ip6[4];	/* Network byte order */
 	__u32 local_port;	/* Host byte order */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH RFC bpf-next 3/3] selftests/bpf: Adapt bpf_sk_lookup.remote_port loads
  2022-02-22 18:25 [PATCH RFC bpf-next 0/3] bpf_sk_lookup.remote_port fixes Ilya Leoshkevich
  2022-02-22 18:25 ` [PATCH RFC bpf-next 1/3] bpf: Fix certain narrow loads with offsets Ilya Leoshkevich
  2022-02-22 18:25 ` [PATCH RFC bpf-next 2/3] bpf: Fix bpf_sk_lookup.remote_port on big-endian Ilya Leoshkevich
@ 2022-02-22 18:25 ` Ilya Leoshkevich
  2 siblings, 0 replies; 18+ messages in thread
From: Ilya Leoshkevich @ 2022-02-22 18:25 UTC (permalink / raw)
  To: Jakub Sitnicki, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko
  Cc: bpf, Heiko Carstens, Vasily Gorbik, Ilya Leoshkevich

- Remove some remote_port tests that are not applicable to u16.
- Use remote_port_compat for backward compatibility tests.
- Add LSB/LSW backward compatibility tests.
- u32 load produces SRC_PORT on both little- and big-endian machines,
  so check just that.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
---
 .../selftests/bpf/progs/test_sk_lookup.c        | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/bpf/progs/test_sk_lookup.c b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
index bf5b7caefdd0..7106fedfd2cc 100644
--- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
+++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
@@ -392,7 +392,7 @@ int ctx_narrow_access(struct bpf_sk_lookup *ctx)
 {
 	struct bpf_sock *sk;
 	int err, family;
-	__u32 val_u32;
+	__u32 *ptr_u32;
 	bool v4;
 
 	v4 = (ctx->family == AF_INET);
@@ -413,15 +413,20 @@ int ctx_narrow_access(struct bpf_sk_lookup *ctx)
 
 	/* Narrow loads from remote_port field. Expect SRC_PORT. */
 	if (LSB(ctx->remote_port, 0) != ((SRC_PORT >> 0) & 0xff) ||
-	    LSB(ctx->remote_port, 1) != ((SRC_PORT >> 8) & 0xff) ||
-	    LSB(ctx->remote_port, 2) != 0 || LSB(ctx->remote_port, 3) != 0)
+	    LSB(ctx->remote_port, 1) != ((SRC_PORT >> 8) & 0xff))
 		return SK_DROP;
-	if (LSW(ctx->remote_port, 0) != SRC_PORT)
+	if (ctx->remote_port != SRC_PORT)
 		return SK_DROP;
 
 	/* Load from remote_port field with zero padding (backward compatibility) */
-	val_u32 = *(__u32 *)&ctx->remote_port;
-	if (val_u32 != bpf_htonl(bpf_ntohs(SRC_PORT) << 16))
+	ptr_u32 = &ctx->remote_port_compat;
+	if (LSB(*ptr_u32, 0) != ((SRC_PORT >> 0) & 0xff) ||
+	    LSB(*ptr_u32, 1) != ((SRC_PORT >> 8) & 0xff) ||
+	    LSB(*ptr_u32, 2) != 0 || LSB(*ptr_u32, 3) != 0)
+		return SK_DROP;
+	if (LSW(*ptr_u32, 0) != SRC_PORT || LSW(*ptr_u32, 1) != 0)
+		return SK_DROP;
+	if (*ptr_u32 != SRC_PORT)
 		return SK_DROP;
 
 	/* Narrow loads from local_port field. Expect DST_PORT. */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC bpf-next 2/3] bpf: Fix bpf_sk_lookup.remote_port on big-endian
  2022-02-22 18:25 ` [PATCH RFC bpf-next 2/3] bpf: Fix bpf_sk_lookup.remote_port on big-endian Ilya Leoshkevich
@ 2022-02-27  2:44   ` Alexei Starovoitov
  2022-02-27 20:30     ` Jakub Sitnicki
  0 siblings, 1 reply; 18+ messages in thread
From: Alexei Starovoitov @ 2022-02-27  2:44 UTC (permalink / raw)
  To: Ilya Leoshkevich
  Cc: Jakub Sitnicki, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, Heiko Carstens, Vasily Gorbik

On Tue, Feb 22, 2022 at 07:25:58PM +0100, Ilya Leoshkevich wrote:
> On big-endian, the port is available in the second __u16, not the first
> one. Therefore, provide a big-endian-specific definition that reflects
> that. Also, define remote_port_compat in order to have nicer
> architecture-agnostic code in the verifier and in tests.
> 
> Fixes: 9a69e2b385f4 ("bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide")
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> ---
>  include/uapi/linux/bpf.h       | 17 +++++++++++++++--
>  net/core/filter.c              |  5 ++---
>  tools/include/uapi/linux/bpf.h | 17 +++++++++++++++--
>  3 files changed, 32 insertions(+), 7 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index afe3d0d7f5f2..7b0e5efa58e0 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -10,6 +10,7 @@
>  
>  #include <linux/types.h>
>  #include <linux/bpf_common.h>
> +#include <asm/byteorder.h>
>  
>  /* Extended instruction set based on top of classic BPF */
>  
> @@ -6453,8 +6454,20 @@ struct bpf_sk_lookup {
>  	__u32 protocol;		/* IP protocol (IPPROTO_TCP, IPPROTO_UDP) */
>  	__u32 remote_ip4;	/* Network byte order */
>  	__u32 remote_ip6[4];	/* Network byte order */
> -	__be16 remote_port;	/* Network byte order */
> -	__u16 :16;		/* Zero padding */
> +	union {
> +		struct {
> +#if defined(__BYTE_ORDER) ? __BYTE_ORDER == __LITTLE_ENDIAN : defined(__LITTLE_ENDIAN)
> +			__be16 remote_port;	/* Network byte order */
> +			__u16 :16;		/* Zero padding */
> +#elif defined(__BYTE_ORDER) ? __BYTE_ORDER == __BIG_ENDIAN : defined(__BIG_ENDIAN)
> +			__u16 :16;		/* Zero padding */
> +			__be16 remote_port;	/* Network byte order */
> +#else
> +#error unspecified endianness
> +#endif
> +		};
> +		__u32 remote_port_compat;

Sorry this hack is not an option.
Don't have any suggestions at this point. Pls come up with something else.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC bpf-next 2/3] bpf: Fix bpf_sk_lookup.remote_port on big-endian
  2022-02-27  2:44   ` Alexei Starovoitov
@ 2022-02-27 20:30     ` Jakub Sitnicki
  2022-02-28 10:19       ` Ilya Leoshkevich
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Sitnicki @ 2022-02-27 20:30 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Ilya Leoshkevich, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, Heiko Carstens, Vasily Gorbik


On Sat, Feb 26, 2022 at 06:44 PM -08, Alexei Starovoitov wrote:
> On Tue, Feb 22, 2022 at 07:25:58PM +0100, Ilya Leoshkevich wrote:
>> On big-endian, the port is available in the second __u16, not the first
>> one. Therefore, provide a big-endian-specific definition that reflects
>> that. Also, define remote_port_compat in order to have nicer
>> architecture-agnostic code in the verifier and in tests.
>> 
>> Fixes: 9a69e2b385f4 ("bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide")
>> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
>> ---
>>  include/uapi/linux/bpf.h       | 17 +++++++++++++++--
>>  net/core/filter.c              |  5 ++---
>>  tools/include/uapi/linux/bpf.h | 17 +++++++++++++++--
>>  3 files changed, 32 insertions(+), 7 deletions(-)
>> 
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index afe3d0d7f5f2..7b0e5efa58e0 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -10,6 +10,7 @@
>>  
>>  #include <linux/types.h>
>>  #include <linux/bpf_common.h>
>> +#include <asm/byteorder.h>
>>  
>>  /* Extended instruction set based on top of classic BPF */
>>  
>> @@ -6453,8 +6454,20 @@ struct bpf_sk_lookup {
>>  	__u32 protocol;		/* IP protocol (IPPROTO_TCP, IPPROTO_UDP) */
>>  	__u32 remote_ip4;	/* Network byte order */
>>  	__u32 remote_ip6[4];	/* Network byte order */
>> -	__be16 remote_port;	/* Network byte order */
>> -	__u16 :16;		/* Zero padding */
>> +	union {
>> +		struct {
>> +#if defined(__BYTE_ORDER) ? __BYTE_ORDER == __LITTLE_ENDIAN : defined(__LITTLE_ENDIAN)
>> +			__be16 remote_port;	/* Network byte order */
>> +			__u16 :16;		/* Zero padding */
>> +#elif defined(__BYTE_ORDER) ? __BYTE_ORDER == __BIG_ENDIAN : defined(__BIG_ENDIAN)
>> +			__u16 :16;		/* Zero padding */
>> +			__be16 remote_port;	/* Network byte order */
>> +#else
>> +#error unspecified endianness
>> +#endif
>> +		};
>> +		__u32 remote_port_compat;
>
> Sorry this hack is not an option.
> Don't have any suggestions at this point. Pls come up with something else.

I think we can keep the bpf_sk_lookup definition as is, if we leave the
4-byte load from remote_port offset quirky behavior on little-endian.

Please take a look at the test fix I've posted for 4-byte load from
bpf_sock dst_port that works for me on x86_64 and s390. It is exactly
the same case as we're dealing with here:

https://lore.kernel.org/bpf/20220227202757.519015-4-jakub@cloudflare.com/T/#u


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC bpf-next 2/3] bpf: Fix bpf_sk_lookup.remote_port on big-endian
  2022-02-27 20:30     ` Jakub Sitnicki
@ 2022-02-28 10:19       ` Ilya Leoshkevich
  2022-02-28 13:26         ` Jakub Sitnicki
  0 siblings, 1 reply; 18+ messages in thread
From: Ilya Leoshkevich @ 2022-02-28 10:19 UTC (permalink / raw)
  To: Jakub Sitnicki, Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf,
	Heiko Carstens, Vasily Gorbik

On Sun, 2022-02-27 at 21:30 +0100, Jakub Sitnicki wrote:
> 
> On Sat, Feb 26, 2022 at 06:44 PM -08, Alexei Starovoitov wrote:
> > On Tue, Feb 22, 2022 at 07:25:58PM +0100, Ilya Leoshkevich wrote:
> > > On big-endian, the port is available in the second __u16, not the
> > > first
> > > one. Therefore, provide a big-endian-specific definition that
> > > reflects
> > > that. Also, define remote_port_compat in order to have nicer
> > > architecture-agnostic code in the verifier and in tests.
> > > 
> > > Fixes: 9a69e2b385f4 ("bpf: Make remote_port field in struct
> > > bpf_sk_lookup 16-bit wide")
> > > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> > > ---
> > >  include/uapi/linux/bpf.h       | 17 +++++++++++++++--
> > >  net/core/filter.c              |  5 ++---
> > >  tools/include/uapi/linux/bpf.h | 17 +++++++++++++++--
> > >  3 files changed, 32 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > index afe3d0d7f5f2..7b0e5efa58e0 100644
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -10,6 +10,7 @@
> > >  
> > >  #include <linux/types.h>
> > >  #include <linux/bpf_common.h>
> > > +#include <asm/byteorder.h>
> > >  
> > >  /* Extended instruction set based on top of classic BPF */
> > >  
> > > @@ -6453,8 +6454,20 @@ struct bpf_sk_lookup {
> > >         __u32 protocol;         /* IP protocol (IPPROTO_TCP,
> > > IPPROTO_UDP) */
> > >         __u32 remote_ip4;       /* Network byte order */
> > >         __u32 remote_ip6[4];    /* Network byte order */
> > > -       __be16 remote_port;     /* Network byte order */
> > > -       __u16 :16;              /* Zero padding */
> > > +       union {
> > > +               struct {
> > > +#if defined(__BYTE_ORDER) ? __BYTE_ORDER == __LITTLE_ENDIAN :
> > > defined(__LITTLE_ENDIAN)
> > > +                       __be16 remote_port;     /* Network byte
> > > order */
> > > +                       __u16 :16;              /* Zero padding
> > > */
> > > +#elif defined(__BYTE_ORDER) ? __BYTE_ORDER == __BIG_ENDIAN :
> > > defined(__BIG_ENDIAN)
> > > +                       __u16 :16;              /* Zero padding
> > > */
> > > +                       __be16 remote_port;     /* Network byte
> > > order */
> > > +#else
> > > +#error unspecified endianness
> > > +#endif
> > > +               };
> > > +               __u32 remote_port_compat;
> > 
> > Sorry this hack is not an option.
> > Don't have any suggestions at this point. Pls come up with
> > something else.
> 
> I think we can keep the bpf_sk_lookup definition as is, if we leave
> the
> 4-byte load from remote_port offset quirky behavior on little-endian.
> 
> Please take a look at the test fix I've posted for 4-byte load from
> bpf_sock dst_port that works for me on x86_64 and s390. It is exactly
> the same case as we're dealing with here:
> 
> https://lore.kernel.org/bpf/20220227202757.519015-4-jakub@cloudflare.com/T/#u
> 

What about 2-byte loads?

static __noinline bool sk_dst_port__load_half(struct bpf_sock *sk)
{
	__u16 *half = (__u16 *)&sk->dst_port;
	return half[0] == bpf_htons(0xcafe);
}

requires "ca fe ?? ??" in memory on BE, while

static __noinline bool sk_dst_port__load_word(struct bpf_sock *sk)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	const __u8 SHIFT = 16;
#elif __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
	const __u8 SHIFT = 0;
#else
#error "Unrecognized __BYTE_ORDER__"
#endif
	__u32 *word = (__u32 *)&sk->dst_port;
	return word[0] == bpf_htonl(0xcafe << SHIFT);
}

requires "00 00 ca fe". This is inconsistent. Furthermore, one
cannot see it with bpf_sock thanks to

	case offsetofend(struct bpf_sock, dst_port) ...
	     offsetof(struct bpf_sock, dst_ip4) - 1:
		return false;

however, with sk_lookup this is the case: loading the most significant
half of the port produces non-zero! So, it's not simply a quirkiness of
the 4-byte load, it's a mutual inconsistency between LSW loads, MSW
loads and 4-byte loads.

One might argue that we can live with that, especially since all the
user-relevant tests pass - here I can only say that an inconsistency on
such a fundamental level makes me nervous.

In order to resolve this inconsistency I've implemented patch 1 of this
series. With that, "sk->dst_port == bpf_htons(0xcafe)" starts to fail,
and that's where one needs something like this patch.

Best regards,
Ilya

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC bpf-next 2/3] bpf: Fix bpf_sk_lookup.remote_port on big-endian
  2022-02-28 10:19       ` Ilya Leoshkevich
@ 2022-02-28 13:26         ` Jakub Sitnicki
  2022-03-01  0:39           ` Ilya Leoshkevich
  2022-03-01  0:40           ` Ilya Leoshkevich
  0 siblings, 2 replies; 18+ messages in thread
From: Jakub Sitnicki @ 2022-02-28 13:26 UTC (permalink / raw)
  To: Ilya Leoshkevich
  Cc: Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, Heiko Carstens, Vasily Gorbik

On Mon, Feb 28, 2022 at 11:19 AM +01, Ilya Leoshkevich wrote:
> In order to resolve this inconsistency I've implemented patch 1 of this
> series. With that, "sk->dst_port == bpf_htons(0xcafe)" starts to fail,
> and that's where one needs something like this patch.

Truth be told I can't reproduce the said failure.
I've extended the test with an additional check:

   306          bool ok = sk->dst_port == bpf_htons(0xcafe);
   307          if (!ok)
   308                  RET_LOG();
   309          if (!sk_dst_port__load_word(sk))
   310                  RET_LOG();

... but it translates to the same BPF assembly as
sk_dst_port__load_half. That is:

; bool ok = sk->dst_port == bpf_htons(0xcafe);
   9: (69) r1 = *(u16 *)(r6 +12)
  10: (bc) w1 = w1
; if (!ok)
  11: (16) if w1 == 0xcafe goto pc+2
  12: (b4) w1 = 308
  13: (05) goto pc+14

During the test I had patch 1 from this series applied on top of [1].
The latter series should not matter, though, I didn't touch the access
converter.

Mind sharing what does the `bpftool prog objdump` output look like for
the failing test on your side?

[1] https://lore.kernel.org/bpf/20220227202757.519015-1-jakub@cloudflare.com/


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC bpf-next 2/3] bpf: Fix bpf_sk_lookup.remote_port on big-endian
  2022-02-28 13:26         ` Jakub Sitnicki
@ 2022-03-01  0:39           ` Ilya Leoshkevich
  2022-03-01  0:40           ` Ilya Leoshkevich
  1 sibling, 0 replies; 18+ messages in thread
From: Ilya Leoshkevich @ 2022-03-01  0:39 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, Heiko Carstens, Vasily Gorbik

On Mon, 2022-02-28 at 14:26 +0100, Jakub Sitnicki wrote:
> On Mon, Feb 28, 2022 at 11:19 AM +01, Ilya Leoshkevich wrote:
> > In order to resolve this inconsistency I've implemented patch 1 of
> > this
> > series. With that, "sk->dst_port == bpf_htons(0xcafe)" starts to
> > fail,
> > and that's where one needs something like this patch.
> 
> Truth be told I can't reproduce the said failure.
> I've extended the test with an additional check:
> 
>    306          bool ok = sk->dst_port == bpf_htons(0xcafe);
>    307          if (!ok)
>    308                  RET_LOG();
>    309          if (!sk_dst_port__load_word(sk))
>    310                  RET_LOG();
> 
> ... but it translates to the same BPF assembly as
> sk_dst_port__load_half. That is:
> 
> ; bool ok = sk->dst_port == bpf_htons(0xcafe);
>    9: (69) r1 = *(u16 *)(r6 +12)
>   10: (bc) w1 = w1
> ; if (!ok)
>   11: (16) if w1 == 0xcafe goto pc+2
>   12: (b4) w1 = 308
>   13: (05) goto pc+14
> 
> During the test I had patch 1 from this series applied on top of [1].
> The latter series should not matter, though, I didn't touch the
> access
> converter.
> 
> Mind sharing what does the `bpftool prog objdump` output look like
> for
> the failing test on your side?
> 
> [1]
> https://lore.kernel.org/bpf/20220227202757.519015-1-jakub@cloudflare.com/
> 

Sure.

As I mentioned, it's best demonstrated with sk_lookup, so that's what
I'll be using. Apply this on top of bpf-next:

--- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
+++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
@@ -392,7 +392,7 @@ int ctx_narrow_access(struct bpf_sk_lookup *ctx)
 {
 	struct bpf_sock *sk;
 	int err, family;
-	__u32 val_u32;
+	__u32 *ptr_u32;
 	bool v4;
 
 	v4 = (ctx->family == AF_INET);
@@ -411,17 +411,12 @@ int ctx_narrow_access(struct bpf_sk_lookup *ctx)
 	if (LSW(ctx->protocol, 0) != IPPROTO_TCP)
 		return SK_DROP;
 
-	/* Narrow loads from remote_port field. Expect SRC_PORT. */
-	if (LSB(ctx->remote_port, 0) != ((SRC_PORT >> 0) & 0xff) ||
-	    LSB(ctx->remote_port, 1) != ((SRC_PORT >> 8) & 0xff) ||
-	    LSB(ctx->remote_port, 2) != 0 || LSB(ctx->remote_port, 3)
!= 0)
+	ptr_u32 = (__u32 *)&ctx->remote_port;
+	if (LSW(*ptr_u32, 0) != SRC_PORT)
 		return SK_DROP;
-	if (LSW(ctx->remote_port, 0) != SRC_PORT)
+	if (LSW(*ptr_u32, 1) != 0)
 		return SK_DROP;
-
-	/* Load from remote_port field with zero padding (backward
compatibility) */
-	val_u32 = *(__u32 *)&ctx->remote_port;
-	if (val_u32 != bpf_htonl(bpf_ntohs(SRC_PORT) << 16))
+	if (*ptr_u32 != SRC_PORT)
 		return SK_DROP;
 
 	/* Narrow loads from local_port field. Expect DST_PORT. */

The checks look as follows:

$ llvm-objdump -d -l tools/testing/selftests/bpf/test_sk_lookup.o

; if (LSW(*ptr_u32, 0) != SRC_PORT)
     370:	69 26 00 26 00 00 00 00	r2 = *(u16 *)(r6 + 38)
     371:	56 20 01 00 00 00 1f 48	if w2 != 8008 goto
+256 <LBB15_135>

; if (LSW(*ptr_u32, 1) != 0)
     372:	69 26 00 24 00 00 00 00	r2 = *(u16 *)(r6 + 36)
     373:	56 20 00 fe 00 00 00 00	if w2 != 0 goto +254
<LBB15_135>

; if (*ptr_u32 != SRC_PORT)
     374:	61 26 00 24 00 00 00 00	r2 = *(u32 *)(r6 + 36)
     375:	56 20 00 fc 00 00 1f 48	if w2 != 8008 goto
+252 <LBB15_135>

After loading:

# tools/testing/selftests/bpf/tools/sbin/bpftool prog dump xlated id
141

; if (LSW(*ptr_u32, 0) != SRC_PORT)
  49: (69) r2 = *(u16 *)(r6 +4)
  50: (bc) w2 = w2
  51: (56) if w2 != 0x1f48 goto pc+938

; if (LSW(*ptr_u32, 1) != 0)
  52: (69) r2 = *(u16 *)(r6 +4)
  53: (bc) w2 = w2
  54: (56) if w2 != 0x0 goto pc+935

; if (*ptr_u32 != SRC_PORT)
  55: (69) r2 = *(u16 *)(r6 +4)
  56: (bc) w2 = w2
  57: (56) if w2 != 0x1f48 goto pc+932

Note that both LSW(0) and LSW(1) loads result in the same code after
rewriting. This is because the offset is ignored for this particular
combination of sizes.

Best regards,
Ilya

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC bpf-next 2/3] bpf: Fix bpf_sk_lookup.remote_port on big-endian
  2022-02-28 13:26         ` Jakub Sitnicki
  2022-03-01  0:39           ` Ilya Leoshkevich
@ 2022-03-01  0:40           ` Ilya Leoshkevich
  1 sibling, 0 replies; 18+ messages in thread
From: Ilya Leoshkevich @ 2022-03-01  0:40 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, bpf, Heiko Carstens, Vasily Gorbik

On Mon, 2022-02-28 at 14:26 +0100, Jakub Sitnicki wrote:
> On Mon, Feb 28, 2022 at 11:19 AM +01, Ilya Leoshkevich wrote:
> > In order to resolve this inconsistency I've implemented patch 1 of
> > this
> > series. With that, "sk->dst_port == bpf_htons(0xcafe)" starts to
> > fail,
> > and that's where one needs something like this patch.
> 
> Truth be told I can't reproduce the said failure.
> I've extended the test with an additional check:
> 
>    306          bool ok = sk->dst_port == bpf_htons(0xcafe);
>    307          if (!ok)
>    308                  RET_LOG();
>    309          if (!sk_dst_port__load_word(sk))
>    310                  RET_LOG();
> 
> ... but it translates to the same BPF assembly as
> sk_dst_port__load_half. That is:
> 
> ; bool ok = sk->dst_port == bpf_htons(0xcafe);
>    9: (69) r1 = *(u16 *)(r6 +12)
>   10: (bc) w1 = w1
> ; if (!ok)
>   11: (16) if w1 == 0xcafe goto pc+2
>   12: (b4) w1 = 308
>   13: (05) goto pc+14
> 
> During the test I had patch 1 from this series applied on top of [1].
> The latter series should not matter, though, I didn't touch the
> access
> converter.
> 
> Mind sharing what does the `bpftool prog objdump` output look like
> for
> the failing test on your side?
> 
> [1]
> https://lore.kernel.org/bpf/20220227202757.519015-1-jakub@cloudflare.com/
> 

Sure.

As I mentioned, it's best demonstrated with sk_lookup, so that's what
I'll be using. Apply this on top of bpf-next:

--- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
+++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
@@ -392,7 +392,7 @@ int ctx_narrow_access(struct bpf_sk_lookup *ctx)
 {
 	struct bpf_sock *sk;
 	int err, family;
-	__u32 val_u32;
+	__u32 *ptr_u32;
 	bool v4;
 
 	v4 = (ctx->family == AF_INET);
@@ -411,17 +411,12 @@ int ctx_narrow_access(struct bpf_sk_lookup *ctx)
 	if (LSW(ctx->protocol, 0) != IPPROTO_TCP)
 		return SK_DROP;
 
-	/* Narrow loads from remote_port field. Expect SRC_PORT. */
-	if (LSB(ctx->remote_port, 0) != ((SRC_PORT >> 0) & 0xff) ||
-	    LSB(ctx->remote_port, 1) != ((SRC_PORT >> 8) & 0xff) ||
-	    LSB(ctx->remote_port, 2) != 0 || LSB(ctx->remote_port, 3)
!= 0)
+	ptr_u32 = (__u32 *)&ctx->remote_port;
+	if (LSW(*ptr_u32, 0) != SRC_PORT)
 		return SK_DROP;
-	if (LSW(ctx->remote_port, 0) != SRC_PORT)
+	if (LSW(*ptr_u32, 1) != 0)
 		return SK_DROP;
-
-	/* Load from remote_port field with zero padding (backward
compatibility) */
-	val_u32 = *(__u32 *)&ctx->remote_port;
-	if (val_u32 != bpf_htonl(bpf_ntohs(SRC_PORT) << 16))
+	if (*ptr_u32 != SRC_PORT)
 		return SK_DROP;
 
 	/* Narrow loads from local_port field. Expect DST_PORT. */

The checks look as follows:

$ llvm-objdump -d -l tools/testing/selftests/bpf/test_sk_lookup.o

; if (LSW(*ptr_u32, 0) != SRC_PORT)
     370:	69 26 00 26 00 00 00 00	r2 = *(u16 *)(r6 + 38)
     371:	56 20 01 00 00 00 1f 48	if w2 != 8008 goto
+256 <LBB15_135>

; if (LSW(*ptr_u32, 1) != 0)
     372:	69 26 00 24 00 00 00 00	r2 = *(u16 *)(r6 + 36)
     373:	56 20 00 fe 00 00 00 00	if w2 != 0 goto +254
<LBB15_135>

; if (*ptr_u32 != SRC_PORT)
     374:	61 26 00 24 00 00 00 00	r2 = *(u32 *)(r6 + 36)
     375:	56 20 00 fc 00 00 1f 48	if w2 != 8008 goto
+252 <LBB15_135>

After loading:

# tools/testing/selftests/bpf/tools/sbin/bpftool prog dump xlated id
141

; if (LSW(*ptr_u32, 0) != SRC_PORT)
  49: (69) r2 = *(u16 *)(r6 +4)
  50: (bc) w2 = w2
  51: (56) if w2 != 0x1f48 goto pc+938

; if (LSW(*ptr_u32, 1) != 0)
  52: (69) r2 = *(u16 *)(r6 +4)
  53: (bc) w2 = w2
  54: (56) if w2 != 0x0 goto pc+935

; if (*ptr_u32 != SRC_PORT)
  55: (69) r2 = *(u16 *)(r6 +4)
  56: (bc) w2 = w2
  57: (56) if w2 != 0x1f48 goto pc+932

Note that both LSW(0) and LSW(1) loads result in the same code after
rewriting. This is because the offset is ignored for this particular
combination of sizes.

Best regards,
Ilya

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC bpf-next 1/3] bpf: Fix certain narrow loads with offsets
  2022-02-22 18:25 ` [PATCH RFC bpf-next 1/3] bpf: Fix certain narrow loads with offsets Ilya Leoshkevich
@ 2022-03-08 15:01   ` Jakub Sitnicki
  2022-03-08 23:58     ` Ilya Leoshkevich
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Sitnicki @ 2022-03-08 15:01 UTC (permalink / raw)
  To: Ilya Leoshkevich
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf,
	Heiko Carstens, Vasily Gorbik

On Tue, Feb 22, 2022 at 07:25 PM +01, Ilya Leoshkevich wrote:
> Verifier treats bpf_sk_lookup.remote_port as a 32-bit field for
> backward compatibility, regardless of what the uapi headers say.
> This field is mapped onto the 16-bit bpf_sk_lookup_kern.sport field.
> Therefore, accessing the most significant 16 bits of
> bpf_sk_lookup.remote_port must produce 0, which is currently not
> the case.
>
> The problem is that narrow loads with offset - commit 46f53a65d2de
> ("bpf: Allow narrow loads with offset > 0"), don't play nicely with
> the masking optimization - commit 239946314e57 ("bpf: possibly avoid
> extra masking for narrower load in verifier"). In particular, when we
> suppress extra masking, we suppress shifting as well, which is not
> correct.
>
> Fix by moving the masking suppression check to BPF_AND generation.
>
> Fixes: 46f53a65d2de ("bpf: Allow narrow loads with offset > 0")
> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> ---
>  kernel/bpf/verifier.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index d7473fee247c..195f2e9b5a47 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -12848,7 +12848,7 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>  			return -EINVAL;
>  		}
>  
> -		if (is_narrower_load && size < target_size) {
> +		if (is_narrower_load) {
>  			u8 shift = bpf_ctx_narrow_access_offset(
>  				off, size, size_default) * 8;
>  			if (shift && cnt + 1 >= ARRAY_SIZE(insn_buf)) {
> @@ -12860,15 +12860,19 @@ static int convert_ctx_accesses(struct bpf_verifier_env *env)
>  					insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH,
>  									insn->dst_reg,
>  									shift);
> -				insn_buf[cnt++] = BPF_ALU32_IMM(BPF_AND, insn->dst_reg,
> -								(1 << size * 8) - 1);
> +				if (size < target_size)
> +					insn_buf[cnt++] = BPF_ALU32_IMM(
> +						BPF_AND, insn->dst_reg,
> +						(1 << size * 8) - 1);
>  			} else {
>  				if (shift)
>  					insn_buf[cnt++] = BPF_ALU64_IMM(BPF_RSH,
>  									insn->dst_reg,
>  									shift);
> -				insn_buf[cnt++] = BPF_ALU64_IMM(BPF_AND, insn->dst_reg,
> -								(1ULL << size * 8) - 1);
> +				if (size < target_size)
> +					insn_buf[cnt++] = BPF_ALU64_IMM(
> +						BPF_AND, insn->dst_reg,
> +						(1ULL << size * 8) - 1);
>  			}
>  		}

Thanks for patience. I'm coming back to this.

This fix affects the 2-byte load from bpf_sk_lookup.remote_port.
Dumping the xlated BPF code confirms it.

On LE (x86-64) things look well.

Before this patch:

* size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
   0: (69) r2 = *(u16 *)(r1 +4)
   1: (b7) r0 = 0
   2: (95) exit

* size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
   0: (69) r2 = *(u16 *)(r1 +4)
   1: (b7) r0 = 0
   2: (95) exit

After this patch:

* size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
   0: (69) r2 = *(u16 *)(r1 +4)
   1: (b7) r0 = 0
   2: (95) exit

* size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
   0: (69) r2 = *(u16 *)(r1 +4)
   1: (74) w2 >>= 16
   2: (b7) r0 = 0
   3: (95) exit

Which works great because the JIT generates a zero-extended load movzwq:

* size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
bpf_prog_5e4fe3dbdcb18fd3:
   0:   nopl   0x0(%rax,%rax,1)
   5:   xchg   %ax,%ax
   7:   push   %rbp
   8:   mov    %rsp,%rbp
   b:   movzwq 0x4(%rdi),%rsi
  10:   xor    %eax,%eax
  12:   leave
  13:   ret


* size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
bpf_prog_4a6336c64a340b96:
   0:   nopl   0x0(%rax,%rax,1)
   5:   xchg   %ax,%ax
   7:   push   %rbp
   8:   mov    %rsp,%rbp
   b:   movzwq 0x4(%rdi),%rsi
  10:   shr    $0x10,%esi
  13:   xor    %eax,%eax
  15:   leave
  16:   ret

Runtime checks for bpf_sk_lookup.remote_port load and the 2-bytes of
zero padding following it, like below, pass with flying colors:

	ok = ctx->remote_port == bpf_htons(8008);
	if (!ok)
		return SK_DROP;
	ok = *((__u16 *)&ctx->remote_port + 1) == 0;
	if (!ok)
		return SK_DROP;

(The above checks compile to half-word (2-byte) loads.)


On BE (s390x) things look different:

Before the patch:

* size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
   0: (69) r2 = *(u16 *)(r1 +4)
   1: (bc) w2 = w2
   2: (b7) r0 = 0
   3: (95) exit

* size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
   0: (69) r2 = *(u16 *)(r1 +4)
   1: (bc) w2 = w2
   2: (b7) r0 = 0
   3: (95) exit

After the patch:

* size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
   0: (69) r2 = *(u16 *)(r1 +4)
   1: (bc) w2 = w2
   2: (74) w2 >>= 16
   3: (bc) w2 = w2
   4: (b7) r0 = 0
   5: (95) exit

* size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
   0: (69) r2 = *(u16 *)(r1 +4)
   1: (bc) w2 = w2
   2: (b7) r0 = 0
   3: (95) exit

These compile to:

* size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
bpf_prog_fdd58b8caca29f00:
   0:   j       0x0000000000000006
   4:   nopr
   6:   stmg    %r11,%r15,112(%r15)
   c:   la      %r13,64(%r15)
  10:   aghi    %r15,-96
  14:   llgh    %r3,4(%r2,%r0)
  1a:   srl     %r3,16
  1e:   llgfr   %r3,%r3
  22:   lgfi    %r14,0
  28:   lgr     %r2,%r14
  2c:   lmg     %r11,%r15,208(%r15)
  32:   br      %r14


* size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
bpf_prog_5e3d8e92223c6841:
   0:   j       0x0000000000000006
   4:   nopr
   6:   stmg    %r11,%r15,112(%r15)
   c:   la      %r13,64(%r15)
  10:   aghi    %r15,-96
  14:   llgh    %r3,4(%r2,%r0)
  1a:   lgfi    %r14,0
  20:   lgr     %r2,%r14
  24:   lmg     %r11,%r15,208(%r15)
  2a:   br      %r14

Now, we right shift the value when loading

  *(u16 *)(r1 +36)

which in C BPF is equivalent to

  *((__u16 *)&ctx->remote_port + 0)

due to how the shift is calculated by bpf_ctx_narrow_access_offset().

This makes the expected typical use-case

  ctx->remote_port == bpf_htons(8008)

fail on s390x because llgh (Load Logical Halfword (64<-16)) seems to lay
out the data in the destination register so that it holds
0x0000_0000_0000_1f48.

I don't know that was the intention here, as it makes the BPF C code
non-portable.

WDYT?


BTW. Out of curiosity, how does a Logical Load Halfword (llgh) differ
differ from a non-logical Load Halfword (lgh) on s390x? Compiler
Explorer generates a non-logical load for similar C code.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC bpf-next 1/3] bpf: Fix certain narrow loads with offsets
  2022-03-08 15:01   ` Jakub Sitnicki
@ 2022-03-08 23:58     ` Ilya Leoshkevich
  2022-03-09  8:36       ` Jakub Sitnicki
  0 siblings, 1 reply; 18+ messages in thread
From: Ilya Leoshkevich @ 2022-03-08 23:58 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf,
	Heiko Carstens, Vasily Gorbik

On Tue, 2022-03-08 at 16:01 +0100, Jakub Sitnicki wrote:
> On Tue, Feb 22, 2022 at 07:25 PM +01, Ilya Leoshkevich wrote:
> > Verifier treats bpf_sk_lookup.remote_port as a 32-bit field for
> > backward compatibility, regardless of what the uapi headers say.
> > This field is mapped onto the 16-bit bpf_sk_lookup_kern.sport
> > field.
> > Therefore, accessing the most significant 16 bits of
> > bpf_sk_lookup.remote_port must produce 0, which is currently not
> > the case.
> > 
> > The problem is that narrow loads with offset - commit 46f53a65d2de
> > ("bpf: Allow narrow loads with offset > 0"), don't play nicely with
> > the masking optimization - commit 239946314e57 ("bpf: possibly
> > avoid
> > extra masking for narrower load in verifier"). In particular, when
> > we
> > suppress extra masking, we suppress shifting as well, which is not
> > correct.
> > 
> > Fix by moving the masking suppression check to BPF_AND generation.
> > 
> > Fixes: 46f53a65d2de ("bpf: Allow narrow loads with offset > 0")
> > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> > ---
> >  kernel/bpf/verifier.c | 14 +++++++++-----
> >  1 file changed, 9 insertions(+), 5 deletions(-)
> > 
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index d7473fee247c..195f2e9b5a47 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -12848,7 +12848,7 @@ static int convert_ctx_accesses(struct
> > bpf_verifier_env *env)
> >                         return -EINVAL;
> >                 }
> >  
> > -               if (is_narrower_load && size < target_size) {
> > +               if (is_narrower_load) {
> >                         u8 shift = bpf_ctx_narrow_access_offset(
> >                                 off, size, size_default) * 8;
> >                         if (shift && cnt + 1 >=
> > ARRAY_SIZE(insn_buf)) {
> > @@ -12860,15 +12860,19 @@ static int convert_ctx_accesses(struct
> > bpf_verifier_env *env)
> >                                         insn_buf[cnt++] =
> > BPF_ALU32_IMM(BPF_RSH,
> >                                                                    
> >      insn->dst_reg,
> >                                                                    
> >      shift);
> > -                               insn_buf[cnt++] =
> > BPF_ALU32_IMM(BPF_AND, insn->dst_reg,
> > -                                                               (1
> > << size * 8) - 1);
> > +                               if (size < target_size)
> > +                                       insn_buf[cnt++] =
> > BPF_ALU32_IMM(
> > +                                               BPF_AND, insn-
> > >dst_reg,
> > +                                               (1 << size * 8) -
> > 1);
> >                         } else {
> >                                 if (shift)
> >                                         insn_buf[cnt++] =
> > BPF_ALU64_IMM(BPF_RSH,
> >                                                                    
> >      insn->dst_reg,
> >                                                                    
> >      shift);
> > -                               insn_buf[cnt++] =
> > BPF_ALU64_IMM(BPF_AND, insn->dst_reg,
> > -
> >                                                                (1ULL
> >  << size * 8) - 1);
> > +                               if (size < target_size)
> > +                                       insn_buf[cnt++] =
> > BPF_ALU64_IMM(
> > +                                               BPF_AND, insn-
> > >dst_reg,
> > +                                               (1ULL << size * 8)
> > - 1);
> >                         }
> >                 }
> 
> Thanks for patience. I'm coming back to this.
> 
> This fix affects the 2-byte load from bpf_sk_lookup.remote_port.
> Dumping the xlated BPF code confirms it.
> 
> On LE (x86-64) things look well.
> 
> Before this patch:
> 
> * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
>    0: (69) r2 = *(u16 *)(r1 +4)
>    1: (b7) r0 = 0
>    2: (95) exit
> 
> * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
>    0: (69) r2 = *(u16 *)(r1 +4)
>    1: (b7) r0 = 0
>    2: (95) exit
> 
> After this patch:
> 
> * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
>    0: (69) r2 = *(u16 *)(r1 +4)
>    1: (b7) r0 = 0
>    2: (95) exit
> 
> * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
>    0: (69) r2 = *(u16 *)(r1 +4)
>    1: (74) w2 >>= 16
>    2: (b7) r0 = 0
>    3: (95) exit
> 
> Which works great because the JIT generates a zero-extended load
> movzwq:
> 
> * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
> bpf_prog_5e4fe3dbdcb18fd3:
>    0:   nopl   0x0(%rax,%rax,1)
>    5:   xchg   %ax,%ax
>    7:   push   %rbp
>    8:   mov    %rsp,%rbp
>    b:   movzwq 0x4(%rdi),%rsi
>   10:   xor    %eax,%eax
>   12:   leave
>   13:   ret
> 
> 
> * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
> bpf_prog_4a6336c64a340b96:
>    0:   nopl   0x0(%rax,%rax,1)
>    5:   xchg   %ax,%ax
>    7:   push   %rbp
>    8:   mov    %rsp,%rbp
>    b:   movzwq 0x4(%rdi),%rsi
>   10:   shr    $0x10,%esi
>   13:   xor    %eax,%eax
>   15:   leave
>   16:   ret
> 
> Runtime checks for bpf_sk_lookup.remote_port load and the 2-bytes of
> zero padding following it, like below, pass with flying colors:
> 
>         ok = ctx->remote_port == bpf_htons(8008);
>         if (!ok)
>                 return SK_DROP;
>         ok = *((__u16 *)&ctx->remote_port + 1) == 0;
>         if (!ok)
>                 return SK_DROP;
> 
> (The above checks compile to half-word (2-byte) loads.)
> 
> 
> On BE (s390x) things look different:
> 
> Before the patch:
> 
> * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
>    0: (69) r2 = *(u16 *)(r1 +4)
>    1: (bc) w2 = w2
>    2: (b7) r0 = 0
>    3: (95) exit
> 
> * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
>    0: (69) r2 = *(u16 *)(r1 +4)
>    1: (bc) w2 = w2
>    2: (b7) r0 = 0
>    3: (95) exit
> 
> After the patch:
> 
> * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
>    0: (69) r2 = *(u16 *)(r1 +4)
>    1: (bc) w2 = w2
>    2: (74) w2 >>= 16
>    3: (bc) w2 = w2
>    4: (b7) r0 = 0
>    5: (95) exit
> 
> * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
>    0: (69) r2 = *(u16 *)(r1 +4)
>    1: (bc) w2 = w2
>    2: (b7) r0 = 0
>    3: (95) exit
> 
> These compile to:
> 
> * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
> bpf_prog_fdd58b8caca29f00:
>    0:   j       0x0000000000000006
>    4:   nopr
>    6:   stmg    %r11,%r15,112(%r15)
>    c:   la      %r13,64(%r15)
>   10:   aghi    %r15,-96
>   14:   llgh    %r3,4(%r2,%r0)
>   1a:   srl     %r3,16
>   1e:   llgfr   %r3,%r3
>   22:   lgfi    %r14,0
>   28:   lgr     %r2,%r14
>   2c:   lmg     %r11,%r15,208(%r15)
>   32:   br      %r14
> 
> 
> * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
> bpf_prog_5e3d8e92223c6841:
>    0:   j       0x0000000000000006
>    4:   nopr
>    6:   stmg    %r11,%r15,112(%r15)
>    c:   la      %r13,64(%r15)
>   10:   aghi    %r15,-96
>   14:   llgh    %r3,4(%r2,%r0)
>   1a:   lgfi    %r14,0
>   20:   lgr     %r2,%r14
>   24:   lmg     %r11,%r15,208(%r15)
>   2a:   br      %r14
> 
> Now, we right shift the value when loading
> 
>   *(u16 *)(r1 +36)
> 
> which in C BPF is equivalent to
> 
>   *((__u16 *)&ctx->remote_port + 0)
> 
> due to how the shift is calculated by bpf_ctx_narrow_access_offset().

Right, that's exactly the intention here.
The way I see the situation is: the ABI forces us to treat remote_port
as a 32-bit field, even though the updated header now says otherwise.
And this:

    unsigned int remote_port;
    unsigned short result = *(unsigned short *)remote_port;

should be the same as:

    unsigned short result = remote_port >> 16;

on big-endian. Note that this is inherently non-portable.

> This makes the expected typical use-case
> 
>   ctx->remote_port == bpf_htons(8008)
> 
> fail on s390x because llgh (Load Logical Halfword (64<-16)) seems to
> lay
> out the data in the destination register so that it holds
> 0x0000_0000_0000_1f48.
> 
> I don't know that was the intention here, as it makes the BPF C code
> non-portable.
> 
> WDYT?

This depends on how we define the remote_port field. I would argue that
the definition from patch 2 - even though ugly - is the correct one.
It is consistent with both the little-endian (1f 48 00 00) and
big-endian (00 00 1f 48) ABIs.

I don't think the current definition is correct, because it expects
1f 48 00 00 on big-endian, and this is not the case. We can verify this
by taking 9a69e2^ and applying

--- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
+++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
@@ -417,6 +417,8 @@ int ctx_narrow_access(struct bpf_sk_lookup *ctx)
                return SK_DROP;
        if (LSW(ctx->remote_port, 0) != SRC_PORT)
                return SK_DROP;
+       if (ctx->remote_port != SRC_PORT)
+               return SK_DROP;
 
        /* Narrow loads from local_port field. Expect DST_PORT. */
        if (LSB(ctx->local_port, 0) != ((DST_PORT >> 0) & 0xff) ||

Therefore that

  ctx->remote_port == bpf_htons(8008)

fails without patch 2 is as expected.

> BTW. Out of curiosity, how does a Logical Load Halfword (llgh) differ
> differ from a non-logical Load Halfword (lgh) on s390x? Compiler
> Explorer generates a non-logical load for similar C code.

The logical one does zero extension, and the regular one does sign
extension.

The following

  unsigned long foo(unsigned short *bar) {
          return *bar;
  }

is compiled to

  foo(unsigned short*):
          llgh    %r2,0(%r2)
          br      %r14

with -O3. Without -O3 zeroing out the upper bits is done using the sllg
and srlg (left and right logical shifts respectively).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC bpf-next 1/3] bpf: Fix certain narrow loads with offsets
  2022-03-08 23:58     ` Ilya Leoshkevich
@ 2022-03-09  8:36       ` Jakub Sitnicki
  2022-03-09 12:34         ` Ilya Leoshkevich
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Sitnicki @ 2022-03-09  8:36 UTC (permalink / raw)
  To: Ilya Leoshkevich
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf,
	Heiko Carstens, Vasily Gorbik

On Wed, Mar 09, 2022 at 12:58 AM +01, Ilya Leoshkevich wrote:
> On Tue, 2022-03-08 at 16:01 +0100, Jakub Sitnicki wrote:
>> On Tue, Feb 22, 2022 at 07:25 PM +01, Ilya Leoshkevich wrote:
>> > Verifier treats bpf_sk_lookup.remote_port as a 32-bit field for
>> > backward compatibility, regardless of what the uapi headers say.
>> > This field is mapped onto the 16-bit bpf_sk_lookup_kern.sport
>> > field.
>> > Therefore, accessing the most significant 16 bits of
>> > bpf_sk_lookup.remote_port must produce 0, which is currently not
>> > the case.
>> > 
>> > The problem is that narrow loads with offset - commit 46f53a65d2de
>> > ("bpf: Allow narrow loads with offset > 0"), don't play nicely with
>> > the masking optimization - commit 239946314e57 ("bpf: possibly
>> > avoid
>> > extra masking for narrower load in verifier"). In particular, when
>> > we
>> > suppress extra masking, we suppress shifting as well, which is not
>> > correct.
>> > 
>> > Fix by moving the masking suppression check to BPF_AND generation.
>> > 
>> > Fixes: 46f53a65d2de ("bpf: Allow narrow loads with offset > 0")
>> > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
>> > ---
>> >  kernel/bpf/verifier.c | 14 +++++++++-----
>> >  1 file changed, 9 insertions(+), 5 deletions(-)
>> > 
>> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
>> > index d7473fee247c..195f2e9b5a47 100644
>> > --- a/kernel/bpf/verifier.c
>> > +++ b/kernel/bpf/verifier.c
>> > @@ -12848,7 +12848,7 @@ static int convert_ctx_accesses(struct
>> > bpf_verifier_env *env)
>> >                         return -EINVAL;
>> >                 }
>> >  
>> > -               if (is_narrower_load && size < target_size) {
>> > +               if (is_narrower_load) {
>> >                         u8 shift = bpf_ctx_narrow_access_offset(
>> >                                 off, size, size_default) * 8;
>> >                         if (shift && cnt + 1 >=
>> > ARRAY_SIZE(insn_buf)) {
>> > @@ -12860,15 +12860,19 @@ static int convert_ctx_accesses(struct
>> > bpf_verifier_env *env)
>> >                                         insn_buf[cnt++] =
>> > BPF_ALU32_IMM(BPF_RSH,
>> >                                                                    
>> >      insn->dst_reg,
>> >                                                                    
>> >      shift);
>> > -                               insn_buf[cnt++] =
>> > BPF_ALU32_IMM(BPF_AND, insn->dst_reg,
>> > -                                                               (1
>> > << size * 8) - 1);
>> > +                               if (size < target_size)
>> > +                                       insn_buf[cnt++] =
>> > BPF_ALU32_IMM(
>> > +                                               BPF_AND, insn-
>> > >dst_reg,
>> > +                                               (1 << size * 8) -
>> > 1);
>> >                         } else {
>> >                                 if (shift)
>> >                                         insn_buf[cnt++] =
>> > BPF_ALU64_IMM(BPF_RSH,
>> >                                                                    
>> >      insn->dst_reg,
>> >                                                                    
>> >      shift);
>> > -                               insn_buf[cnt++] =
>> > BPF_ALU64_IMM(BPF_AND, insn->dst_reg,
>> > -
>> >                                                                (1ULL
>> >  << size * 8) - 1);
>> > +                               if (size < target_size)
>> > +                                       insn_buf[cnt++] =
>> > BPF_ALU64_IMM(
>> > +                                               BPF_AND, insn-
>> > >dst_reg,
>> > +                                               (1ULL << size * 8)
>> > - 1);
>> >                         }
>> >                 }
>> 
>> Thanks for patience. I'm coming back to this.
>> 
>> This fix affects the 2-byte load from bpf_sk_lookup.remote_port.
>> Dumping the xlated BPF code confirms it.
>> 
>> On LE (x86-64) things look well.
>> 
>> Before this patch:
>> 
>> * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
>>    0: (69) r2 = *(u16 *)(r1 +4)
>>    1: (b7) r0 = 0
>>    2: (95) exit
>> 
>> * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
>>    0: (69) r2 = *(u16 *)(r1 +4)
>>    1: (b7) r0 = 0
>>    2: (95) exit
>> 
>> After this patch:
>> 
>> * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
>>    0: (69) r2 = *(u16 *)(r1 +4)
>>    1: (b7) r0 = 0
>>    2: (95) exit
>> 
>> * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
>>    0: (69) r2 = *(u16 *)(r1 +4)
>>    1: (74) w2 >>= 16
>>    2: (b7) r0 = 0
>>    3: (95) exit
>> 
>> Which works great because the JIT generates a zero-extended load
>> movzwq:
>> 
>> * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
>> bpf_prog_5e4fe3dbdcb18fd3:
>>    0:   nopl   0x0(%rax,%rax,1)
>>    5:   xchg   %ax,%ax
>>    7:   push   %rbp
>>    8:   mov    %rsp,%rbp
>>    b:   movzwq 0x4(%rdi),%rsi
>>   10:   xor    %eax,%eax
>>   12:   leave
>>   13:   ret
>> 
>> 
>> * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
>> bpf_prog_4a6336c64a340b96:
>>    0:   nopl   0x0(%rax,%rax,1)
>>    5:   xchg   %ax,%ax
>>    7:   push   %rbp
>>    8:   mov    %rsp,%rbp
>>    b:   movzwq 0x4(%rdi),%rsi
>>   10:   shr    $0x10,%esi
>>   13:   xor    %eax,%eax
>>   15:   leave
>>   16:   ret
>> 
>> Runtime checks for bpf_sk_lookup.remote_port load and the 2-bytes of
>> zero padding following it, like below, pass with flying colors:
>> 
>>         ok = ctx->remote_port == bpf_htons(8008);
>>         if (!ok)
>>                 return SK_DROP;
>>         ok = *((__u16 *)&ctx->remote_port + 1) == 0;
>>         if (!ok)
>>                 return SK_DROP;
>> 
>> (The above checks compile to half-word (2-byte) loads.)
>> 
>> 
>> On BE (s390x) things look different:
>> 
>> Before the patch:
>> 
>> * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
>>    0: (69) r2 = *(u16 *)(r1 +4)
>>    1: (bc) w2 = w2
>>    2: (b7) r0 = 0
>>    3: (95) exit
>> 
>> * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
>>    0: (69) r2 = *(u16 *)(r1 +4)
>>    1: (bc) w2 = w2
>>    2: (b7) r0 = 0
>>    3: (95) exit
>> 
>> After the patch:
>> 
>> * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
>>    0: (69) r2 = *(u16 *)(r1 +4)
>>    1: (bc) w2 = w2
>>    2: (74) w2 >>= 16
>>    3: (bc) w2 = w2
>>    4: (b7) r0 = 0
>>    5: (95) exit
>> 
>> * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
>>    0: (69) r2 = *(u16 *)(r1 +4)
>>    1: (bc) w2 = w2
>>    2: (b7) r0 = 0
>>    3: (95) exit
>> 
>> These compile to:
>> 
>> * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
>> bpf_prog_fdd58b8caca29f00:
>>    0:   j       0x0000000000000006
>>    4:   nopr
>>    6:   stmg    %r11,%r15,112(%r15)
>>    c:   la      %r13,64(%r15)
>>   10:   aghi    %r15,-96
>>   14:   llgh    %r3,4(%r2,%r0)
>>   1a:   srl     %r3,16
>>   1e:   llgfr   %r3,%r3
>>   22:   lgfi    %r14,0
>>   28:   lgr     %r2,%r14
>>   2c:   lmg     %r11,%r15,208(%r15)
>>   32:   br      %r14
>> 
>> 
>> * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
>> bpf_prog_5e3d8e92223c6841:
>>    0:   j       0x0000000000000006
>>    4:   nopr
>>    6:   stmg    %r11,%r15,112(%r15)
>>    c:   la      %r13,64(%r15)
>>   10:   aghi    %r15,-96
>>   14:   llgh    %r3,4(%r2,%r0)
>>   1a:   lgfi    %r14,0
>>   20:   lgr     %r2,%r14
>>   24:   lmg     %r11,%r15,208(%r15)
>>   2a:   br      %r14
>> 
>> Now, we right shift the value when loading
>> 
>>   *(u16 *)(r1 +36)
>> 
>> which in C BPF is equivalent to
>> 
>>   *((__u16 *)&ctx->remote_port + 0)
>> 
>> due to how the shift is calculated by bpf_ctx_narrow_access_offset().
>
> Right, that's exactly the intention here.
> The way I see the situation is: the ABI forces us to treat remote_port
> as a 32-bit field, even though the updated header now says otherwise.
> And this:
>
>     unsigned int remote_port;
>     unsigned short result = *(unsigned short *)remote_port;
>
> should be the same as:
>
>     unsigned short result = remote_port >> 16;
>
> on big-endian. Note that this is inherently non-portable.





>
>> This makes the expected typical use-case
>> 
>>   ctx->remote_port == bpf_htons(8008)
>> 
>> fail on s390x because llgh (Load Logical Halfword (64<-16)) seems to
>> lay
>> out the data in the destination register so that it holds
>> 0x0000_0000_0000_1f48.
>> 
>> I don't know that was the intention here, as it makes the BPF C code
>> non-portable.
>> 
>> WDYT?
>
> This depends on how we define the remote_port field. I would argue that
> the definition from patch 2 - even though ugly - is the correct one.
> It is consistent with both the little-endian (1f 48 00 00) and
> big-endian (00 00 1f 48) ABIs.
>
> I don't think the current definition is correct, because it expects
> 1f 48 00 00 on big-endian, and this is not the case. We can verify this
> by taking 9a69e2^ and applying
>
> --- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> +++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> @@ -417,6 +417,8 @@ int ctx_narrow_access(struct bpf_sk_lookup *ctx)
>                 return SK_DROP;
>         if (LSW(ctx->remote_port, 0) != SRC_PORT)
>                 return SK_DROP;
> +       if (ctx->remote_port != SRC_PORT)
> +               return SK_DROP;
>  
>         /* Narrow loads from local_port field. Expect DST_PORT. */
>         if (LSB(ctx->local_port, 0) != ((DST_PORT >> 0) & 0xff) ||
>
> Therefore that
>
>   ctx->remote_port == bpf_htons(8008)
>
> fails without patch 2 is as expected.
>

Consider this - today the below is true on both LE and BE, right?

  *(u32 *)&ctx->remote_port == *(u16 *)&ctx->remote_port

because the loads get converted to:

  *(u16 *)&ctx_kern->sport == *(u16 *)&ctx_kern->sport

IOW, today, because of the bug that you are fixing here, the data layout
changes from the PoV of the BPF program depending on the load size.

With 2-byte loads, without this patch, the data layout appears as:

  struct bpf_sk_lookup {
    ...
    __be16 remote_port;
    __be16 remote_port;
    ...
  }

While for 4-byte loads, it appears as in your 2nd patch:

  struct bpf_sk_lookup {
    ...
    #if little-endian
    __be16 remote_port;
    __u16  :16; /* zero padding */
    #elif big-endian
    __u16  :16; /* zero padding */
    __be16 remote_port;
    #endif
    ...
  }

Because of that I don't see how we could keep complete ABI compatiblity,
and have just one definition of struct bpf_sk_lookup that reflects
it. These are conflicting requirements.

I'd bite the bullet for 4-byte loads, for the sake of having an
endian-agnostic struct bpf_sk_lookup and struct bpf_sock definition in
the uAPI header.

The sacrifice here is that the access converter will have to keep
rewriting 4-byte access to bpf_sk_lookup.remote_port and
bpf_sock.dst_port in this unexpected, quirky manner.

The expectation is that with time users will recompile their BPF progs
against the updated bpf.h, and switch to 2-byte loads. That will make
the quirk in the access converter dead code in time.

I don't have any better ideas. Sorry.

[...]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC bpf-next 1/3] bpf: Fix certain narrow loads with offsets
  2022-03-09  8:36       ` Jakub Sitnicki
@ 2022-03-09 12:34         ` Ilya Leoshkevich
  2022-03-10 22:57           ` Jakub Sitnicki
  0 siblings, 1 reply; 18+ messages in thread
From: Ilya Leoshkevich @ 2022-03-09 12:34 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf,
	Heiko Carstens, Vasily Gorbik

On Wed, 2022-03-09 at 09:36 +0100, Jakub Sitnicki wrote:
> On Wed, Mar 09, 2022 at 12:58 AM +01, Ilya Leoshkevich wrote:
> > On Tue, 2022-03-08 at 16:01 +0100, Jakub Sitnicki wrote:
> > > On Tue, Feb 22, 2022 at 07:25 PM +01, Ilya Leoshkevich wrote:
> > > > Verifier treats bpf_sk_lookup.remote_port as a 32-bit field for
> > > > backward compatibility, regardless of what the uapi headers
> > > > say.
> > > > This field is mapped onto the 16-bit bpf_sk_lookup_kern.sport
> > > > field.
> > > > Therefore, accessing the most significant 16 bits of
> > > > bpf_sk_lookup.remote_port must produce 0, which is currently
> > > > not
> > > > the case.
> > > > 
> > > > The problem is that narrow loads with offset - commit
> > > > 46f53a65d2de
> > > > ("bpf: Allow narrow loads with offset > 0"), don't play nicely
> > > > with
> > > > the masking optimization - commit 239946314e57 ("bpf: possibly
> > > > avoid
> > > > extra masking for narrower load in verifier"). In particular,
> > > > when
> > > > we
> > > > suppress extra masking, we suppress shifting as well, which is
> > > > not
> > > > correct.
> > > > 
> > > > Fix by moving the masking suppression check to BPF_AND
> > > > generation.
> > > > 
> > > > Fixes: 46f53a65d2de ("bpf: Allow narrow loads with offset > 0")
> > > > Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
> > > > ---
> > > >  kernel/bpf/verifier.c | 14 +++++++++-----
> > > >  1 file changed, 9 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > index d7473fee247c..195f2e9b5a47 100644
> > > > --- a/kernel/bpf/verifier.c
> > > > +++ b/kernel/bpf/verifier.c
> > > > @@ -12848,7 +12848,7 @@ static int convert_ctx_accesses(struct
> > > > bpf_verifier_env *env)
> > > >                         return -EINVAL;
> > > >                 }
> > > >  
> > > > -               if (is_narrower_load && size < target_size) {
> > > > +               if (is_narrower_load) {
> > > >                         u8 shift =
> > > > bpf_ctx_narrow_access_offset(
> > > >                                 off, size, size_default) * 8;
> > > >                         if (shift && cnt + 1 >=
> > > > ARRAY_SIZE(insn_buf)) {
> > > > @@ -12860,15 +12860,19 @@ static int
> > > > convert_ctx_accesses(struct
> > > > bpf_verifier_env *env)
> > > >                                         insn_buf[cnt++] =
> > > > BPF_ALU32_IMM(BPF_RSH,
> > > >                                                                
> > > >     
> > > >      insn->dst_reg,
> > > >                                                                
> > > >     
> > > >      shift);
> > > > -                               insn_buf[cnt++] =
> > > > BPF_ALU32_IMM(BPF_AND, insn->dst_reg,
> > > > -
> > > >                                                                (
> > > > 1
> > > > << size * 8) - 1);
> > > > +                               if (size < target_size)
> > > > +                                       insn_buf[cnt++] =
> > > > BPF_ALU32_IMM(
> > > > +                                               BPF_AND, insn-
> > > > > dst_reg,
> > > > +                                               (1 << size * 8)
> > > > -
> > > > 1);
> > > >                         } else {
> > > >                                 if (shift)
> > > >                                         insn_buf[cnt++] =
> > > > BPF_ALU64_IMM(BPF_RSH,
> > > >                                                                
> > > >     
> > > >      insn->dst_reg,
> > > >                                                                
> > > >     
> > > >      shift);
> > > > -                               insn_buf[cnt++] =
> > > > BPF_ALU64_IMM(BPF_AND, insn->dst_reg,
> > > > -
> > > >                                                                
> > > > (1ULL
> > > >  << size * 8) - 1);
> > > > +                               if (size < target_size)
> > > > +                                       insn_buf[cnt++] =
> > > > BPF_ALU64_IMM(
> > > > +                                               BPF_AND, insn-
> > > > > dst_reg,
> > > > +                                               (1ULL << size *
> > > > 8)
> > > > - 1);
> > > >                         }
> > > >                 }
> > > 
> > > Thanks for patience. I'm coming back to this.
> > > 
> > > This fix affects the 2-byte load from bpf_sk_lookup.remote_port.
> > > Dumping the xlated BPF code confirms it.
> > > 
> > > On LE (x86-64) things look well.
> > > 
> > > Before this patch:
> > > 
> > > * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
> > >    0: (69) r2 = *(u16 *)(r1 +4)
> > >    1: (b7) r0 = 0
> > >    2: (95) exit
> > > 
> > > * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
> > >    0: (69) r2 = *(u16 *)(r1 +4)
> > >    1: (b7) r0 = 0
> > >    2: (95) exit
> > > 
> > > After this patch:
> > > 
> > > * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
> > >    0: (69) r2 = *(u16 *)(r1 +4)
> > >    1: (b7) r0 = 0
> > >    2: (95) exit
> > > 
> > > * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
> > >    0: (69) r2 = *(u16 *)(r1 +4)
> > >    1: (74) w2 >>= 16
> > >    2: (b7) r0 = 0
> > >    3: (95) exit
> > > 
> > > Which works great because the JIT generates a zero-extended load
> > > movzwq:
> > > 
> > > * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
> > > bpf_prog_5e4fe3dbdcb18fd3:
> > >    0:   nopl   0x0(%rax,%rax,1)
> > >    5:   xchg   %ax,%ax
> > >    7:   push   %rbp
> > >    8:   mov    %rsp,%rbp
> > >    b:   movzwq 0x4(%rdi),%rsi
> > >   10:   xor    %eax,%eax
> > >   12:   leave
> > >   13:   ret
> > > 
> > > 
> > > * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
> > > bpf_prog_4a6336c64a340b96:
> > >    0:   nopl   0x0(%rax,%rax,1)
> > >    5:   xchg   %ax,%ax
> > >    7:   push   %rbp
> > >    8:   mov    %rsp,%rbp
> > >    b:   movzwq 0x4(%rdi),%rsi
> > >   10:   shr    $0x10,%esi
> > >   13:   xor    %eax,%eax
> > >   15:   leave
> > >   16:   ret
> > > 
> > > Runtime checks for bpf_sk_lookup.remote_port load and the 2-bytes
> > > of
> > > zero padding following it, like below, pass with flying colors:
> > > 
> > >         ok = ctx->remote_port == bpf_htons(8008);
> > >         if (!ok)
> > >                 return SK_DROP;
> > >         ok = *((__u16 *)&ctx->remote_port + 1) == 0;
> > >         if (!ok)
> > >                 return SK_DROP;
> > > 
> > > (The above checks compile to half-word (2-byte) loads.)
> > > 
> > > 
> > > On BE (s390x) things look different:
> > > 
> > > Before the patch:
> > > 
> > > * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
> > >    0: (69) r2 = *(u16 *)(r1 +4)
> > >    1: (bc) w2 = w2
> > >    2: (b7) r0 = 0
> > >    3: (95) exit
> > > 
> > > * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
> > >    0: (69) r2 = *(u16 *)(r1 +4)
> > >    1: (bc) w2 = w2
> > >    2: (b7) r0 = 0
> > >    3: (95) exit
> > > 
> > > After the patch:
> > > 
> > > * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
> > >    0: (69) r2 = *(u16 *)(r1 +4)
> > >    1: (bc) w2 = w2
> > >    2: (74) w2 >>= 16
> > >    3: (bc) w2 = w2
> > >    4: (b7) r0 = 0
> > >    5: (95) exit
> > > 
> > > * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
> > >    0: (69) r2 = *(u16 *)(r1 +4)
> > >    1: (bc) w2 = w2
> > >    2: (b7) r0 = 0
> > >    3: (95) exit
> > > 
> > > These compile to:
> > > 
> > > * size=2, offset=0, 0: (69) r2 = *(u16 *)(r1 +36)
> > > bpf_prog_fdd58b8caca29f00:
> > >    0:   j       0x0000000000000006
> > >    4:   nopr
> > >    6:   stmg    %r11,%r15,112(%r15)
> > >    c:   la      %r13,64(%r15)
> > >   10:   aghi    %r15,-96
> > >   14:   llgh    %r3,4(%r2,%r0)
> > >   1a:   srl     %r3,16
> > >   1e:   llgfr   %r3,%r3
> > >   22:   lgfi    %r14,0
> > >   28:   lgr     %r2,%r14
> > >   2c:   lmg     %r11,%r15,208(%r15)
> > >   32:   br      %r14
> > > 
> > > 
> > > * size=2, offset=2, 0: (69) r2 = *(u16 *)(r1 +38)
> > > bpf_prog_5e3d8e92223c6841:
> > >    0:   j       0x0000000000000006
> > >    4:   nopr
> > >    6:   stmg    %r11,%r15,112(%r15)
> > >    c:   la      %r13,64(%r15)
> > >   10:   aghi    %r15,-96
> > >   14:   llgh    %r3,4(%r2,%r0)
> > >   1a:   lgfi    %r14,0
> > >   20:   lgr     %r2,%r14
> > >   24:   lmg     %r11,%r15,208(%r15)
> > >   2a:   br      %r14
> > > 
> > > Now, we right shift the value when loading
> > > 
> > >   *(u16 *)(r1 +36)
> > > 
> > > which in C BPF is equivalent to
> > > 
> > >   *((__u16 *)&ctx->remote_port + 0)
> > > 
> > > due to how the shift is calculated by
> > > bpf_ctx_narrow_access_offset().
> > 
> > Right, that's exactly the intention here.
> > The way I see the situation is: the ABI forces us to treat
> > remote_port
> > as a 32-bit field, even though the updated header now says
> > otherwise.
> > And this:
> > 
> >     unsigned int remote_port;
> >     unsigned short result = *(unsigned short *)remote_port;
> > 
> > should be the same as:
> > 
> >     unsigned short result = remote_port >> 16;
> > 
> > on big-endian. Note that this is inherently non-portable.
> 
> 
> 
> 
> 
> > 
> > > This makes the expected typical use-case
> > > 
> > >   ctx->remote_port == bpf_htons(8008)
> > > 
> > > fail on s390x because llgh (Load Logical Halfword (64<-16)) seems
> > > to
> > > lay
> > > out the data in the destination register so that it holds
> > > 0x0000_0000_0000_1f48.
> > > 
> > > I don't know that was the intention here, as it makes the BPF C
> > > code
> > > non-portable.
> > > 
> > > WDYT?
> > 
> > This depends on how we define the remote_port field. I would argue
> > that
> > the definition from patch 2 - even though ugly - is the correct
> > one.
> > It is consistent with both the little-endian (1f 48 00 00) and
> > big-endian (00 00 1f 48) ABIs.
> > 
> > I don't think the current definition is correct, because it expects
> > 1f 48 00 00 on big-endian, and this is not the case. We can verify
> > this
> > by taking 9a69e2^ and applying
> > 
> > --- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> > +++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
> > @@ -417,6 +417,8 @@ int ctx_narrow_access(struct bpf_sk_lookup
> > *ctx)
> >                 return SK_DROP;
> >         if (LSW(ctx->remote_port, 0) != SRC_PORT)
> >                 return SK_DROP;
> > +       if (ctx->remote_port != SRC_PORT)
> > +               return SK_DROP;
> >  
> >         /* Narrow loads from local_port field. Expect DST_PORT. */
> >         if (LSB(ctx->local_port, 0) != ((DST_PORT >> 0) & 0xff) ||
> > 
> > Therefore that
> > 
> >   ctx->remote_port == bpf_htons(8008)
> > 
> > fails without patch 2 is as expected.
> > 
> 
> Consider this - today the below is true on both LE and BE, right?
> 
>   *(u32 *)&ctx->remote_port == *(u16 *)&ctx->remote_port
> 
> because the loads get converted to:
> 
>   *(u16 *)&ctx_kern->sport == *(u16 *)&ctx_kern->sport
> 
> IOW, today, because of the bug that you are fixing here, the data
> layout
> changes from the PoV of the BPF program depending on the load size.
> 
> With 2-byte loads, without this patch, the data layout appears as:
> 
>   struct bpf_sk_lookup {
>     ...
>     __be16 remote_port;
>     __be16 remote_port;
>     ...
>   }

I see, one can indeed argue that this is also a part of the ABI now.
So we're stuck between a rock and a hard place.

> While for 4-byte loads, it appears as in your 2nd patch:
> 
>   struct bpf_sk_lookup {
>     ...
>     #if little-endian
>     __be16 remote_port;
>     __u16  :16; /* zero padding */
>     #elif big-endian
>     __u16  :16; /* zero padding */
>     __be16 remote_port;
>     #endif
>     ...
>   }
> 
> Because of that I don't see how we could keep complete ABI
> compatiblity,
> and have just one definition of struct bpf_sk_lookup that reflects
> it. These are conflicting requirements.
> 
> I'd bite the bullet for 4-byte loads, for the sake of having an
> endian-agnostic struct bpf_sk_lookup and struct bpf_sock definition
> in
> the uAPI header.
> 
> The sacrifice here is that the access converter will have to keep
> rewriting 4-byte access to bpf_sk_lookup.remote_port and
> bpf_sock.dst_port in this unexpected, quirky manner.
> 
> The expectation is that with time users will recompile their BPF
> progs
> against the updated bpf.h, and switch to 2-byte loads. That will make
> the quirk in the access converter dead code in time.
> 
> I don't have any better ideas. Sorry.
> 
> [...]

I agree, let's go ahead with this solution.

The only remaining problem that I see is: the bug is in the common
code, and it will affect the fields that we add in the future.

Can we either document this state of things in a comment, or fix the
bug and emulate the old behavior for certain fields?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC bpf-next 1/3] bpf: Fix certain narrow loads with offsets
  2022-03-09 12:34         ` Ilya Leoshkevich
@ 2022-03-10 22:57           ` Jakub Sitnicki
  2022-03-14 17:35             ` Jakub Sitnicki
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Sitnicki @ 2022-03-10 22:57 UTC (permalink / raw)
  To: Ilya Leoshkevich
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf,
	Heiko Carstens, Vasily Gorbik


On Wed, Mar 09, 2022 at 01:34 PM +01, Ilya Leoshkevich wrote:
> On Wed, 2022-03-09 at 09:36 +0100, Jakub Sitnicki wrote:

[...]

>> 
>> Consider this - today the below is true on both LE and BE, right?
>> 
>>   *(u32 *)&ctx->remote_port == *(u16 *)&ctx->remote_port
>> 
>> because the loads get converted to:
>> 
>>   *(u16 *)&ctx_kern->sport == *(u16 *)&ctx_kern->sport
>> 
>> IOW, today, because of the bug that you are fixing here, the data
>> layout
>> changes from the PoV of the BPF program depending on the load size.
>> 
>> With 2-byte loads, without this patch, the data layout appears as:
>> 
>>   struct bpf_sk_lookup {
>>     ...
>>     __be16 remote_port;
>>     __be16 remote_port;
>>     ...
>>   }
>
> I see, one can indeed argue that this is also a part of the ABI now.
> So we're stuck between a rock and a hard place.
>
>> While for 4-byte loads, it appears as in your 2nd patch:
>> 
>>   struct bpf_sk_lookup {
>>     ...
>>     #if little-endian
>>     __be16 remote_port;
>>     __u16  :16; /* zero padding */
>>     #elif big-endian
>>     __u16  :16; /* zero padding */
>>     __be16 remote_port;
>>     #endif
>>     ...
>>   }
>> 
>> Because of that I don't see how we could keep complete ABI
>> compatiblity,
>> and have just one definition of struct bpf_sk_lookup that reflects
>> it. These are conflicting requirements.
>> 
>> I'd bite the bullet for 4-byte loads, for the sake of having an
>> endian-agnostic struct bpf_sk_lookup and struct bpf_sock definition
>> in
>> the uAPI header.
>> 
>> The sacrifice here is that the access converter will have to keep
>> rewriting 4-byte access to bpf_sk_lookup.remote_port and
>> bpf_sock.dst_port in this unexpected, quirky manner.
>> 
>> The expectation is that with time users will recompile their BPF
>> progs
>> against the updated bpf.h, and switch to 2-byte loads. That will make
>> the quirk in the access converter dead code in time.
>> 
>> I don't have any better ideas. Sorry.
>> 
>> [...]
>
> I agree, let's go ahead with this solution.
>
> The only remaining problem that I see is: the bug is in the common
> code, and it will affect the fields that we add in the future.
>
> Can we either document this state of things in a comment, or fix the
> bug and emulate the old behavior for certain fields?

I think we can fix the bug in the common code, and compensate for the
quirky 4-byte access to bpf_sk_lookup.remote_port and bpf_sock.dst_port
in the is_valid_access and convert_ctx_access.

With the patch as below, access to remote_port gets rewritten to:

* size=1, offset=0, r2 = *(u8 *)(r1 +36)
   0: (69) r2 = *(u16 *)(r1 +4)
   1: (54) w2 &= 255
   2: (b7) r0 = 0
   3: (95) exit

* size=1, offset=1, r2 = *(u8 *)(r1 +37)
   0: (69) r2 = *(u16 *)(r1 +4)
   1: (74) w2 >>= 8
   2: (54) w2 &= 255
   3: (b7) r0 = 0
   4: (95) exit

* size=1, offset=2, r2 = *(u8 *)(r1 +38)
   0: (b4) w2 = 0
   1: (54) w2 &= 255
   2: (b7) r0 = 0
   3: (95) exit

* size=1, offset=3, r2 = *(u8 *)(r1 +39)
   0: (b4) w2 = 0
   1: (74) w2 >>= 8
   2: (54) w2 &= 255
   3: (b7) r0 = 0
   4: (95) exit

* size=2, offset=0, r2 = *(u16 *)(r1 +36)
   0: (69) r2 = *(u16 *)(r1 +4)
   1: (b7) r0 = 0
   2: (95) exit

* size=2, offset=2, r2 = *(u16 *)(r1 +38)
   0: (b4) w2 = 0
   1: (b7) r0 = 0
   2: (95) exit

* size=4, offset=0, r2 = *(u32 *)(r1 +36)
   0: (69) r2 = *(u16 *)(r1 +4)
   1: (b7) r0 = 0
   2: (95) exit

How does that look to you?

Still need to give it a test on s390x.

--8<--

diff --git a/net/core/filter.c b/net/core/filter.c
index 65869fd510e8..2625a1d2dabc 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -10856,13 +10856,24 @@ static bool sk_lookup_is_valid_access(int off, int size,
 	case bpf_ctx_range(struct bpf_sk_lookup, local_ip4):
 	case bpf_ctx_range_till(struct bpf_sk_lookup, remote_ip6[0], remote_ip6[3]):
 	case bpf_ctx_range_till(struct bpf_sk_lookup, local_ip6[0], local_ip6[3]):
-	case offsetof(struct bpf_sk_lookup, remote_port) ...
-	     offsetof(struct bpf_sk_lookup, local_ip4) - 1:
 	case bpf_ctx_range(struct bpf_sk_lookup, local_port):
 	case bpf_ctx_range(struct bpf_sk_lookup, ingress_ifindex):
 		bpf_ctx_record_field_size(info, sizeof(__u32));
 		return bpf_ctx_narrow_access_ok(off, size, sizeof(__u32));
 
+	case bpf_ctx_range(struct bpf_sk_lookup, remote_port):
+		/* Allow 4-byte access to 2-byte field for backward compatibility */
+		if (size == sizeof(__u32))
+			return off == offsetof(struct bpf_sk_lookup, remote_port);
+		bpf_ctx_record_field_size(info, sizeof(__be16));
+		return bpf_ctx_narrow_access_ok(off, size, sizeof(__be16));
+
+	case offsetofend(struct bpf_sk_lookup, remote_port) ...
+	     offsetof(struct bpf_sk_lookup, local_ip4) - 1:
+		/* Allow access to zero padding for backward compatiblity */
+		bpf_ctx_record_field_size(info, sizeof(__u16));
+		return bpf_ctx_narrow_access_ok(off, size, sizeof(__u16));
+
 	default:
 		return false;
 	}
@@ -10944,6 +10955,11 @@ static u32 sk_lookup_convert_ctx_access(enum bpf_access_type type,
 						     sport, 2, target_size));
 		break;
 
+	case offsetofend(struct bpf_sk_lookup, remote_port):
+		*target_size = 2;
+		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+		break;
+
 	case offsetof(struct bpf_sk_lookup, local_port):
 		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
 				      bpf_target_off(struct bpf_sk_lookup_kern,

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC bpf-next 1/3] bpf: Fix certain narrow loads with offsets
  2022-03-10 22:57           ` Jakub Sitnicki
@ 2022-03-14 17:35             ` Jakub Sitnicki
  2022-03-14 18:25               ` Ilya Leoshkevich
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Sitnicki @ 2022-03-14 17:35 UTC (permalink / raw)
  To: Ilya Leoshkevich
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf,
	Heiko Carstens, Vasily Gorbik

On Thu, Mar 10, 2022 at 11:57 PM +01, Jakub Sitnicki wrote:
> On Wed, Mar 09, 2022 at 01:34 PM +01, Ilya Leoshkevich wrote:
>> On Wed, 2022-03-09 at 09:36 +0100, Jakub Sitnicki wrote:
>
> [...]
>
>>> 
>>> Consider this - today the below is true on both LE and BE, right?
>>> 
>>>   *(u32 *)&ctx->remote_port == *(u16 *)&ctx->remote_port
>>> 
>>> because the loads get converted to:
>>> 
>>>   *(u16 *)&ctx_kern->sport == *(u16 *)&ctx_kern->sport
>>> 
>>> IOW, today, because of the bug that you are fixing here, the data
>>> layout
>>> changes from the PoV of the BPF program depending on the load size.
>>> 
>>> With 2-byte loads, without this patch, the data layout appears as:
>>> 
>>>   struct bpf_sk_lookup {
>>>     ...
>>>     __be16 remote_port;
>>>     __be16 remote_port;
>>>     ...
>>>   }
>>
>> I see, one can indeed argue that this is also a part of the ABI now.
>> So we're stuck between a rock and a hard place.
>>
>>> While for 4-byte loads, it appears as in your 2nd patch:
>>> 
>>>   struct bpf_sk_lookup {
>>>     ...
>>>     #if little-endian
>>>     __be16 remote_port;
>>>     __u16  :16; /* zero padding */
>>>     #elif big-endian
>>>     __u16  :16; /* zero padding */
>>>     __be16 remote_port;
>>>     #endif
>>>     ...
>>>   }
>>> 
>>> Because of that I don't see how we could keep complete ABI
>>> compatiblity,
>>> and have just one definition of struct bpf_sk_lookup that reflects
>>> it. These are conflicting requirements.
>>> 
>>> I'd bite the bullet for 4-byte loads, for the sake of having an
>>> endian-agnostic struct bpf_sk_lookup and struct bpf_sock definition
>>> in
>>> the uAPI header.
>>> 
>>> The sacrifice here is that the access converter will have to keep
>>> rewriting 4-byte access to bpf_sk_lookup.remote_port and
>>> bpf_sock.dst_port in this unexpected, quirky manner.
>>> 
>>> The expectation is that with time users will recompile their BPF
>>> progs
>>> against the updated bpf.h, and switch to 2-byte loads. That will make
>>> the quirk in the access converter dead code in time.
>>> 
>>> I don't have any better ideas. Sorry.
>>> 
>>> [...]
>>
>> I agree, let's go ahead with this solution.
>>
>> The only remaining problem that I see is: the bug is in the common
>> code, and it will affect the fields that we add in the future.
>>
>> Can we either document this state of things in a comment, or fix the
>> bug and emulate the old behavior for certain fields?
>
> I think we can fix the bug in the common code, and compensate for the
> quirky 4-byte access to bpf_sk_lookup.remote_port and bpf_sock.dst_port
> in the is_valid_access and convert_ctx_access.
>
> With the patch as below, access to remote_port gets rewritten to:
>
> * size=1, offset=0, r2 = *(u8 *)(r1 +36)
>    0: (69) r2 = *(u16 *)(r1 +4)
>    1: (54) w2 &= 255
>    2: (b7) r0 = 0
>    3: (95) exit
>
> * size=1, offset=1, r2 = *(u8 *)(r1 +37)
>    0: (69) r2 = *(u16 *)(r1 +4)
>    1: (74) w2 >>= 8
>    2: (54) w2 &= 255
>    3: (b7) r0 = 0
>    4: (95) exit
>
> * size=1, offset=2, r2 = *(u8 *)(r1 +38)
>    0: (b4) w2 = 0
>    1: (54) w2 &= 255
>    2: (b7) r0 = 0
>    3: (95) exit
>
> * size=1, offset=3, r2 = *(u8 *)(r1 +39)
>    0: (b4) w2 = 0
>    1: (74) w2 >>= 8
>    2: (54) w2 &= 255
>    3: (b7) r0 = 0
>    4: (95) exit
>
> * size=2, offset=0, r2 = *(u16 *)(r1 +36)
>    0: (69) r2 = *(u16 *)(r1 +4)
>    1: (b7) r0 = 0
>    2: (95) exit
>
> * size=2, offset=2, r2 = *(u16 *)(r1 +38)
>    0: (b4) w2 = 0
>    1: (b7) r0 = 0
>    2: (95) exit
>
> * size=4, offset=0, r2 = *(u32 *)(r1 +36)
>    0: (69) r2 = *(u16 *)(r1 +4)
>    1: (b7) r0 = 0
>    2: (95) exit
>
> How does that look to you?
>
> Still need to give it a test on s390x.

Context conversion with patch below applied looks correct to me on s390x
as well:

* size=1, offset=0, r2 = *(u8 *)(r1 +36)
   0: (69) r2 = *(u16 *)(r1 +4)
   1: (bc) w2 = w2
   2: (74) w2 >>= 8
   3: (bc) w2 = w2
   4: (54) w2 &= 255
   5: (bc) w2 = w2
   6: (b7) r0 = 0
   7: (95) exit

* size=1, offset=1, r2 = *(u8 *)(r1 +37)
   0: (69) r2 = *(u16 *)(r1 +4)
   1: (bc) w2 = w2
   2: (54) w2 &= 255
   3: (bc) w2 = w2
   4: (b7) r0 = 0
   5: (95) exit

* size=1, offset=2, r2 = *(u8 *)(r1 +38)
   0: (b4) w2 = 0
   1: (bc) w2 = w2
   2: (74) w2 >>= 8
   3: (bc) w2 = w2
   4: (54) w2 &= 255
   5: (bc) w2 = w2
   6: (b7) r0 = 0
   7: (95) exit

* size=1, offset=3, r2 = *(u8 *)(r1 +39)
   0: (b4) w2 = 0
   1: (bc) w2 = w2
   2: (54) w2 &= 255
   3: (bc) w2 = w2
   4: (b7) r0 = 0
   5: (95) exit

* size=2, offset=0, r2 = *(u16 *)(r1 +36)
   0: (69) r2 = *(u16 *)(r1 +4)
   1: (bc) w2 = w2
   2: (b7) r0 = 0
   3: (95) exit

* size=2, offset=2, r2 = *(u16 *)(r1 +38)
   0: (b4) w2 = 0
   1: (bc) w2 = w2
   2: (b7) r0 = 0
   3: (95) exit

* size=4, offset=0, r2 = *(u32 *)(r1 +36)
   0: (69) r2 = *(u16 *)(r1 +4)
   1: (bc) w2 = w2
   2: (b7) r0 = 0
   3: (95) exit

If we go this way, this should unbreak the bpf selftests on BE,
independently of the patch 1 from this series.

Will send it as a patch, so that we continue the review discussion.

>
> --8<--
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 65869fd510e8..2625a1d2dabc 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -10856,13 +10856,24 @@ static bool sk_lookup_is_valid_access(int off, int size,
>  	case bpf_ctx_range(struct bpf_sk_lookup, local_ip4):
>  	case bpf_ctx_range_till(struct bpf_sk_lookup, remote_ip6[0], remote_ip6[3]):
>  	case bpf_ctx_range_till(struct bpf_sk_lookup, local_ip6[0], local_ip6[3]):
> -	case offsetof(struct bpf_sk_lookup, remote_port) ...
> -	     offsetof(struct bpf_sk_lookup, local_ip4) - 1:
>  	case bpf_ctx_range(struct bpf_sk_lookup, local_port):
>  	case bpf_ctx_range(struct bpf_sk_lookup, ingress_ifindex):
>  		bpf_ctx_record_field_size(info, sizeof(__u32));
>  		return bpf_ctx_narrow_access_ok(off, size, sizeof(__u32));
>  
> +	case bpf_ctx_range(struct bpf_sk_lookup, remote_port):
> +		/* Allow 4-byte access to 2-byte field for backward compatibility */
> +		if (size == sizeof(__u32))
> +			return off == offsetof(struct bpf_sk_lookup, remote_port);
> +		bpf_ctx_record_field_size(info, sizeof(__be16));
> +		return bpf_ctx_narrow_access_ok(off, size, sizeof(__be16));
> +
> +	case offsetofend(struct bpf_sk_lookup, remote_port) ...
> +	     offsetof(struct bpf_sk_lookup, local_ip4) - 1:
> +		/* Allow access to zero padding for backward compatiblity */
> +		bpf_ctx_record_field_size(info, sizeof(__u16));
> +		return bpf_ctx_narrow_access_ok(off, size, sizeof(__u16));
> +
>  	default:
>  		return false;
>  	}
> @@ -10944,6 +10955,11 @@ static u32 sk_lookup_convert_ctx_access(enum bpf_access_type type,
>  						     sport, 2, target_size));
>  		break;
>  
> +	case offsetofend(struct bpf_sk_lookup, remote_port):
> +		*target_size = 2;
> +		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
> +		break;
> +
>  	case offsetof(struct bpf_sk_lookup, local_port):
>  		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
>  				      bpf_target_off(struct bpf_sk_lookup_kern,


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC bpf-next 1/3] bpf: Fix certain narrow loads with offsets
  2022-03-14 17:35             ` Jakub Sitnicki
@ 2022-03-14 18:25               ` Ilya Leoshkevich
  2022-03-14 20:57                 ` Jakub Sitnicki
  0 siblings, 1 reply; 18+ messages in thread
From: Ilya Leoshkevich @ 2022-03-14 18:25 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf,
	Heiko Carstens, Vasily Gorbik

On Mon, 2022-03-14 at 18:35 +0100, Jakub Sitnicki wrote:
> On Thu, Mar 10, 2022 at 11:57 PM +01, Jakub Sitnicki wrote:
> > On Wed, Mar 09, 2022 at 01:34 PM +01, Ilya Leoshkevich wrote:
> > > On Wed, 2022-03-09 at 09:36 +0100, Jakub Sitnicki wrote:
> > 
> > [...]
> > 
> > > > 
> > > > Consider this - today the below is true on both LE and BE,
> > > > right?
> > > > 
> > > >   *(u32 *)&ctx->remote_port == *(u16 *)&ctx->remote_port
> > > > 
> > > > because the loads get converted to:
> > > > 
> > > >   *(u16 *)&ctx_kern->sport == *(u16 *)&ctx_kern->sport
> > > > 
> > > > IOW, today, because of the bug that you are fixing here, the
> > > > data
> > > > layout
> > > > changes from the PoV of the BPF program depending on the load
> > > > size.
> > > > 
> > > > With 2-byte loads, without this patch, the data layout appears
> > > > as:
> > > > 
> > > >   struct bpf_sk_lookup {
> > > >     ...
> > > >     __be16 remote_port;
> > > >     __be16 remote_port;
> > > >     ...
> > > >   }
> > > 
> > > I see, one can indeed argue that this is also a part of the ABI
> > > now.
> > > So we're stuck between a rock and a hard place.
> > > 
> > > > While for 4-byte loads, it appears as in your 2nd patch:
> > > > 
> > > >   struct bpf_sk_lookup {
> > > >     ...
> > > >     #if little-endian
> > > >     __be16 remote_port;
> > > >     __u16  :16; /* zero padding */
> > > >     #elif big-endian
> > > >     __u16  :16; /* zero padding */
> > > >     __be16 remote_port;
> > > >     #endif
> > > >     ...
> > > >   }
> > > > 
> > > > Because of that I don't see how we could keep complete ABI
> > > > compatiblity,
> > > > and have just one definition of struct bpf_sk_lookup that
> > > > reflects
> > > > it. These are conflicting requirements.
> > > > 
> > > > I'd bite the bullet for 4-byte loads, for the sake of having an
> > > > endian-agnostic struct bpf_sk_lookup and struct bpf_sock
> > > > definition
> > > > in
> > > > the uAPI header.
> > > > 
> > > > The sacrifice here is that the access converter will have to
> > > > keep
> > > > rewriting 4-byte access to bpf_sk_lookup.remote_port and
> > > > bpf_sock.dst_port in this unexpected, quirky manner.
> > > > 
> > > > The expectation is that with time users will recompile their
> > > > BPF
> > > > progs
> > > > against the updated bpf.h, and switch to 2-byte loads. That
> > > > will make
> > > > the quirk in the access converter dead code in time.
> > > > 
> > > > I don't have any better ideas. Sorry.
> > > > 
> > > > [...]
> > > 
> > > I agree, let's go ahead with this solution.
> > > 
> > > The only remaining problem that I see is: the bug is in the
> > > common
> > > code, and it will affect the fields that we add in the future.
> > > 
> > > Can we either document this state of things in a comment, or fix
> > > the
> > > bug and emulate the old behavior for certain fields?
> > 
> > I think we can fix the bug in the common code, and compensate for
> > the
> > quirky 4-byte access to bpf_sk_lookup.remote_port and
> > bpf_sock.dst_port
> > in the is_valid_access and convert_ctx_access.
> > 
> > With the patch as below, access to remote_port gets rewritten to:
> > 
> > * size=1, offset=0, r2 = *(u8 *)(r1 +36)
> >    0: (69) r2 = *(u16 *)(r1 +4)
> >    1: (54) w2 &= 255
> >    2: (b7) r0 = 0
> >    3: (95) exit
> > 
> > * size=1, offset=1, r2 = *(u8 *)(r1 +37)
> >    0: (69) r2 = *(u16 *)(r1 +4)
> >    1: (74) w2 >>= 8
> >    2: (54) w2 &= 255
> >    3: (b7) r0 = 0
> >    4: (95) exit
> > 
> > * size=1, offset=2, r2 = *(u8 *)(r1 +38)
> >    0: (b4) w2 = 0
> >    1: (54) w2 &= 255
> >    2: (b7) r0 = 0
> >    3: (95) exit
> > 
> > * size=1, offset=3, r2 = *(u8 *)(r1 +39)
> >    0: (b4) w2 = 0
> >    1: (74) w2 >>= 8
> >    2: (54) w2 &= 255
> >    3: (b7) r0 = 0
> >    4: (95) exit
> > 
> > * size=2, offset=0, r2 = *(u16 *)(r1 +36)
> >    0: (69) r2 = *(u16 *)(r1 +4)
> >    1: (b7) r0 = 0
> >    2: (95) exit
> > 
> > * size=2, offset=2, r2 = *(u16 *)(r1 +38)
> >    0: (b4) w2 = 0
> >    1: (b7) r0 = 0
> >    2: (95) exit
> > 
> > * size=4, offset=0, r2 = *(u32 *)(r1 +36)
> >    0: (69) r2 = *(u16 *)(r1 +4)
> >    1: (b7) r0 = 0
> >    2: (95) exit
> > 
> > How does that look to you?
> > 
> > Still need to give it a test on s390x.
> 
> Context conversion with patch below applied looks correct to me on
> s390x
> as well:
> 
> * size=1, offset=0, r2 = *(u8 *)(r1 +36)
>    0: (69) r2 = *(u16 *)(r1 +4)
>    1: (bc) w2 = w2
>    2: (74) w2 >>= 8
>    3: (bc) w2 = w2
>    4: (54) w2 &= 255
>    5: (bc) w2 = w2
>    6: (b7) r0 = 0
>    7: (95) exit
> 
> * size=1, offset=1, r2 = *(u8 *)(r1 +37)
>    0: (69) r2 = *(u16 *)(r1 +4)
>    1: (bc) w2 = w2
>    2: (54) w2 &= 255
>    3: (bc) w2 = w2
>    4: (b7) r0 = 0
>    5: (95) exit
> 
> * size=1, offset=2, r2 = *(u8 *)(r1 +38)
>    0: (b4) w2 = 0
>    1: (bc) w2 = w2
>    2: (74) w2 >>= 8
>    3: (bc) w2 = w2
>    4: (54) w2 &= 255
>    5: (bc) w2 = w2
>    6: (b7) r0 = 0
>    7: (95) exit
> 
> * size=1, offset=3, r2 = *(u8 *)(r1 +39)
>    0: (b4) w2 = 0
>    1: (bc) w2 = w2
>    2: (54) w2 &= 255
>    3: (bc) w2 = w2
>    4: (b7) r0 = 0
>    5: (95) exit
> 
> * size=2, offset=0, r2 = *(u16 *)(r1 +36)
>    0: (69) r2 = *(u16 *)(r1 +4)
>    1: (bc) w2 = w2
>    2: (b7) r0 = 0
>    3: (95) exit
> 
> * size=2, offset=2, r2 = *(u16 *)(r1 +38)
>    0: (b4) w2 = 0
>    1: (bc) w2 = w2
>    2: (b7) r0 = 0
>    3: (95) exit
> 
> * size=4, offset=0, r2 = *(u32 *)(r1 +36)
>    0: (69) r2 = *(u16 *)(r1 +4)
>    1: (bc) w2 = w2
>    2: (b7) r0 = 0
>    3: (95) exit
> 
> If we go this way, this should unbreak the bpf selftests on BE,
> independently of the patch 1 from this series.
> 
> Will send it as a patch, so that we continue the review discussion.

I applied this patch to bpf-next, and while it looks reasonable, the
test still fails, e.g. here:

	/* Load from remote_port field with zero padding (backward
compatibility) */
	val_u32 = *(__u32 *)&ctx->remote_port;
	if (val_u32 != bpf_htonl(bpf_ntohs(SRC_PORT) << 16))
		return SK_DROP;

> 
> > 
> > --8<--
> > 
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 65869fd510e8..2625a1d2dabc 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -10856,13 +10856,24 @@ static bool sk_lookup_is_valid_access(int
> > off, int size,
> >         case bpf_ctx_range(struct bpf_sk_lookup, local_ip4):
> >         case bpf_ctx_range_till(struct bpf_sk_lookup,
> > remote_ip6[0], remote_ip6[3]):
> >         case bpf_ctx_range_till(struct bpf_sk_lookup, local_ip6[0],
> > local_ip6[3]):
> > -       case offsetof(struct bpf_sk_lookup, remote_port) ...
> > -            offsetof(struct bpf_sk_lookup, local_ip4) - 1:
> >         case bpf_ctx_range(struct bpf_sk_lookup, local_port):
> >         case bpf_ctx_range(struct bpf_sk_lookup, ingress_ifindex):
> >                 bpf_ctx_record_field_size(info, sizeof(__u32));
> >                 return bpf_ctx_narrow_access_ok(off, size,
> > sizeof(__u32));
> >  
> > +       case bpf_ctx_range(struct bpf_sk_lookup, remote_port):
> > +               /* Allow 4-byte access to 2-byte field for backward
> > compatibility */
> > +               if (size == sizeof(__u32))
> > +                       return off == offsetof(struct
> > bpf_sk_lookup, remote_port);
> > +               bpf_ctx_record_field_size(info, sizeof(__be16));
> > +               return bpf_ctx_narrow_access_ok(off, size,
> > sizeof(__be16));
> > +
> > +       case offsetofend(struct bpf_sk_lookup, remote_port) ...
> > +            offsetof(struct bpf_sk_lookup, local_ip4) - 1:
> > +               /* Allow access to zero padding for backward
> > compatiblity */
> > +               bpf_ctx_record_field_size(info, sizeof(__u16));
> > +               return bpf_ctx_narrow_access_ok(off, size,
> > sizeof(__u16));
> > +
> >         default:
> >                 return false;
> >         }
> > @@ -10944,6 +10955,11 @@ static u32
> > sk_lookup_convert_ctx_access(enum bpf_access_type type,
> >                                                      sport, 2,
> > target_size));
> >                 break;
> >  
> > +       case offsetofend(struct bpf_sk_lookup, remote_port):
> > +               *target_size = 2;
> > +               *insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
> > +               break;
> > +
> >         case offsetof(struct bpf_sk_lookup, local_port):
> >                 *insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si-
> > >src_reg,
> >                                       bpf_target_off(struct
> > bpf_sk_lookup_kern,
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RFC bpf-next 1/3] bpf: Fix certain narrow loads with offsets
  2022-03-14 18:25               ` Ilya Leoshkevich
@ 2022-03-14 20:57                 ` Jakub Sitnicki
  0 siblings, 0 replies; 18+ messages in thread
From: Jakub Sitnicki @ 2022-03-14 20:57 UTC (permalink / raw)
  To: Ilya Leoshkevich
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, bpf,
	Heiko Carstens, Vasily Gorbik


On Mon, Mar 14, 2022 at 07:25 PM +01, Ilya Leoshkevich wrote:
> On Mon, 2022-03-14 at 18:35 +0100, Jakub Sitnicki wrote:
>> On Thu, Mar 10, 2022 at 11:57 PM +01, Jakub Sitnicki wrote:
>> > On Wed, Mar 09, 2022 at 01:34 PM +01, Ilya Leoshkevich wrote:
>> > > On Wed, 2022-03-09 at 09:36 +0100, Jakub Sitnicki wrote:
>> > 
>> > [...]
>> > 
>> > > > 
>> > > > Consider this - today the below is true on both LE and BE,
>> > > > right?
>> > > > 
>> > > >   *(u32 *)&ctx->remote_port == *(u16 *)&ctx->remote_port
>> > > > 
>> > > > because the loads get converted to:
>> > > > 
>> > > >   *(u16 *)&ctx_kern->sport == *(u16 *)&ctx_kern->sport
>> > > > 
>> > > > IOW, today, because of the bug that you are fixing here, the
>> > > > data
>> > > > layout
>> > > > changes from the PoV of the BPF program depending on the load
>> > > > size.
>> > > > 
>> > > > With 2-byte loads, without this patch, the data layout appears
>> > > > as:
>> > > > 
>> > > >   struct bpf_sk_lookup {
>> > > >     ...
>> > > >     __be16 remote_port;
>> > > >     __be16 remote_port;
>> > > >     ...
>> > > >   }
>> > > 
>> > > I see, one can indeed argue that this is also a part of the ABI
>> > > now.
>> > > So we're stuck between a rock and a hard place.
>> > > 
>> > > > While for 4-byte loads, it appears as in your 2nd patch:
>> > > > 
>> > > >   struct bpf_sk_lookup {
>> > > >     ...
>> > > >     #if little-endian
>> > > >     __be16 remote_port;
>> > > >     __u16  :16; /* zero padding */
>> > > >     #elif big-endian
>> > > >     __u16  :16; /* zero padding */
>> > > >     __be16 remote_port;
>> > > >     #endif
>> > > >     ...
>> > > >   }
>> > > > 
>> > > > Because of that I don't see how we could keep complete ABI
>> > > > compatiblity,
>> > > > and have just one definition of struct bpf_sk_lookup that
>> > > > reflects
>> > > > it. These are conflicting requirements.
>> > > > 
>> > > > I'd bite the bullet for 4-byte loads, for the sake of having an
>> > > > endian-agnostic struct bpf_sk_lookup and struct bpf_sock
>> > > > definition
>> > > > in
>> > > > the uAPI header.
>> > > > 
>> > > > The sacrifice here is that the access converter will have to
>> > > > keep
>> > > > rewriting 4-byte access to bpf_sk_lookup.remote_port and
>> > > > bpf_sock.dst_port in this unexpected, quirky manner.
>> > > > 
>> > > > The expectation is that with time users will recompile their
>> > > > BPF
>> > > > progs
>> > > > against the updated bpf.h, and switch to 2-byte loads. That
>> > > > will make
>> > > > the quirk in the access converter dead code in time.
>> > > > 
>> > > > I don't have any better ideas. Sorry.
>> > > > 
>> > > > [...]
>> > > 
>> > > I agree, let's go ahead with this solution.
>> > > 
>> > > The only remaining problem that I see is: the bug is in the
>> > > common
>> > > code, and it will affect the fields that we add in the future.
>> > > 
>> > > Can we either document this state of things in a comment, or fix
>> > > the
>> > > bug and emulate the old behavior for certain fields?
>> > 
>> > I think we can fix the bug in the common code, and compensate for
>> > the
>> > quirky 4-byte access to bpf_sk_lookup.remote_port and
>> > bpf_sock.dst_port
>> > in the is_valid_access and convert_ctx_access.
>> > 
>> > With the patch as below, access to remote_port gets rewritten to:
>> > 
>> > * size=1, offset=0, r2 = *(u8 *)(r1 +36)
>> >    0: (69) r2 = *(u16 *)(r1 +4)
>> >    1: (54) w2 &= 255
>> >    2: (b7) r0 = 0
>> >    3: (95) exit
>> > 
>> > * size=1, offset=1, r2 = *(u8 *)(r1 +37)
>> >    0: (69) r2 = *(u16 *)(r1 +4)
>> >    1: (74) w2 >>= 8
>> >    2: (54) w2 &= 255
>> >    3: (b7) r0 = 0
>> >    4: (95) exit
>> > 
>> > * size=1, offset=2, r2 = *(u8 *)(r1 +38)
>> >    0: (b4) w2 = 0
>> >    1: (54) w2 &= 255
>> >    2: (b7) r0 = 0
>> >    3: (95) exit
>> > 
>> > * size=1, offset=3, r2 = *(u8 *)(r1 +39)
>> >    0: (b4) w2 = 0
>> >    1: (74) w2 >>= 8
>> >    2: (54) w2 &= 255
>> >    3: (b7) r0 = 0
>> >    4: (95) exit
>> > 
>> > * size=2, offset=0, r2 = *(u16 *)(r1 +36)
>> >    0: (69) r2 = *(u16 *)(r1 +4)
>> >    1: (b7) r0 = 0
>> >    2: (95) exit
>> > 
>> > * size=2, offset=2, r2 = *(u16 *)(r1 +38)
>> >    0: (b4) w2 = 0
>> >    1: (b7) r0 = 0
>> >    2: (95) exit
>> > 
>> > * size=4, offset=0, r2 = *(u32 *)(r1 +36)
>> >    0: (69) r2 = *(u16 *)(r1 +4)
>> >    1: (b7) r0 = 0
>> >    2: (95) exit
>> > 
>> > How does that look to you?
>> > 
>> > Still need to give it a test on s390x.
>> 
>> Context conversion with patch below applied looks correct to me on
>> s390x
>> as well:
>> 
>> * size=1, offset=0, r2 = *(u8 *)(r1 +36)
>>    0: (69) r2 = *(u16 *)(r1 +4)
>>    1: (bc) w2 = w2
>>    2: (74) w2 >>= 8
>>    3: (bc) w2 = w2
>>    4: (54) w2 &= 255
>>    5: (bc) w2 = w2
>>    6: (b7) r0 = 0
>>    7: (95) exit
>> 
>> * size=1, offset=1, r2 = *(u8 *)(r1 +37)
>>    0: (69) r2 = *(u16 *)(r1 +4)
>>    1: (bc) w2 = w2
>>    2: (54) w2 &= 255
>>    3: (bc) w2 = w2
>>    4: (b7) r0 = 0
>>    5: (95) exit
>> 
>> * size=1, offset=2, r2 = *(u8 *)(r1 +38)
>>    0: (b4) w2 = 0
>>    1: (bc) w2 = w2
>>    2: (74) w2 >>= 8
>>    3: (bc) w2 = w2
>>    4: (54) w2 &= 255
>>    5: (bc) w2 = w2
>>    6: (b7) r0 = 0
>>    7: (95) exit
>> 
>> * size=1, offset=3, r2 = *(u8 *)(r1 +39)
>>    0: (b4) w2 = 0
>>    1: (bc) w2 = w2
>>    2: (54) w2 &= 255
>>    3: (bc) w2 = w2
>>    4: (b7) r0 = 0
>>    5: (95) exit
>> 
>> * size=2, offset=0, r2 = *(u16 *)(r1 +36)
>>    0: (69) r2 = *(u16 *)(r1 +4)
>>    1: (bc) w2 = w2
>>    2: (b7) r0 = 0
>>    3: (95) exit
>> 
>> * size=2, offset=2, r2 = *(u16 *)(r1 +38)
>>    0: (b4) w2 = 0
>>    1: (bc) w2 = w2
>>    2: (b7) r0 = 0
>>    3: (95) exit
>> 
>> * size=4, offset=0, r2 = *(u32 *)(r1 +36)
>>    0: (69) r2 = *(u16 *)(r1 +4)
>>    1: (bc) w2 = w2
>>    2: (b7) r0 = 0
>>    3: (95) exit
>> 
>> If we go this way, this should unbreak the bpf selftests on BE,
>> independently of the patch 1 from this series.
>> 
>> Will send it as a patch, so that we continue the review discussion.
>
> I applied this patch to bpf-next, and while it looks reasonable, the
> test still fails, e.g. here:
>
> 	/* Load from remote_port field with zero padding (backward
> compatibility) */
> 	val_u32 = *(__u32 *)&ctx->remote_port;
> 	if (val_u32 != bpf_htonl(bpf_ntohs(SRC_PORT) << 16))
> 		return SK_DROP;
>

You are right. That's that the check I recently added that broke the bpf
CI runs for s390x [1], and started our thread.

I had a stale build of test_progs with a fix akin to patch [2] that I
was testing and missed that. Thanks for giving it a run.

If we go with Martin's suggestion [3] here and avoid bpf_htonl(), then
we could make it work and save ourselves endianess checks.

IOW, a patch like below would be also needed to unbreak the tests.

First chunk is copied from your fixes to test_sk_lookup in patch 3 in
this RFC series, naturally.

[1] https://lore.kernel.org/bpf/CAEf4BzaRNLw9_EnaMo5e46CdEkzbJiVU3j9oxnsemBKjNFf3wQ@mail.gmail.com/
[2] https://lore.kernel.org/bpf/20220227202757.519015-4-jakub@cloudflare.com/
[3] https://lore.kernel.org/bpf/20220301062207.d3aqge5qg623asr6@kafai-mbp.dhcp.thefacebook.com/

---8<---

diff --git a/tools/testing/selftests/bpf/progs/test_sk_lookup.c b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
index bf5b7caefdd0..2765a4bd500c 100644
--- a/tools/testing/selftests/bpf/progs/test_sk_lookup.c
+++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
@@ -413,15 +413,14 @@ int ctx_narrow_access(struct bpf_sk_lookup *ctx)

        /* Narrow loads from remote_port field. Expect SRC_PORT. */
        if (LSB(ctx->remote_port, 0) != ((SRC_PORT >> 0) & 0xff) ||
-           LSB(ctx->remote_port, 1) != ((SRC_PORT >> 8) & 0xff) ||
-           LSB(ctx->remote_port, 2) != 0 || LSB(ctx->remote_port, 3) != 0)
+           LSB(ctx->remote_port, 1) != ((SRC_PORT >> 8) & 0xff))
                return SK_DROP;
        if (LSW(ctx->remote_port, 0) != SRC_PORT)
                return SK_DROP;

        /* Load from remote_port field with zero padding (backward compatibility) */
        val_u32 = *(__u32 *)&ctx->remote_port;
-       if (val_u32 != bpf_htonl(bpf_ntohs(SRC_PORT) << 16))
+       if (val_u32 != SRC_PORT)
                return SK_DROP;

        /* Narrow loads from local_port field. Expect DST_PORT. */

^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2022-03-14 21:49 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-22 18:25 [PATCH RFC bpf-next 0/3] bpf_sk_lookup.remote_port fixes Ilya Leoshkevich
2022-02-22 18:25 ` [PATCH RFC bpf-next 1/3] bpf: Fix certain narrow loads with offsets Ilya Leoshkevich
2022-03-08 15:01   ` Jakub Sitnicki
2022-03-08 23:58     ` Ilya Leoshkevich
2022-03-09  8:36       ` Jakub Sitnicki
2022-03-09 12:34         ` Ilya Leoshkevich
2022-03-10 22:57           ` Jakub Sitnicki
2022-03-14 17:35             ` Jakub Sitnicki
2022-03-14 18:25               ` Ilya Leoshkevich
2022-03-14 20:57                 ` Jakub Sitnicki
2022-02-22 18:25 ` [PATCH RFC bpf-next 2/3] bpf: Fix bpf_sk_lookup.remote_port on big-endian Ilya Leoshkevich
2022-02-27  2:44   ` Alexei Starovoitov
2022-02-27 20:30     ` Jakub Sitnicki
2022-02-28 10:19       ` Ilya Leoshkevich
2022-02-28 13:26         ` Jakub Sitnicki
2022-03-01  0:39           ` Ilya Leoshkevich
2022-03-01  0:40           ` Ilya Leoshkevich
2022-02-22 18:25 ` [PATCH RFC bpf-next 3/3] selftests/bpf: Adapt bpf_sk_lookup.remote_port loads Ilya Leoshkevich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.