BPF Archive on lore.kernel.org
* [bpf PATCH v2 0/5] Fix sock_ops field read splat
@ 2020-07-29 16:22 John Fastabend
  2020-07-29 16:22 ` [bpf PATCH v2 1/5] bpf: sock_ops ctx access may stomp registers in corner case John Fastabend
                   ` (5 more replies)
  0 siblings, 6 replies; 14+ messages in thread
From: John Fastabend @ 2020-07-29 16:22 UTC (permalink / raw)
  To: john.fastabend, kafai, daniel, ast; +Cc: netdev, bpf

Doing some refactoring resulted in a kernel splat when reading sock_ops
fields.

Patch 1 has the details and proposed fix for sock_ops sk field access.

Patch 2 has the details and proposed fix for reading the sock_ops->sk field.

Patch 3 gives a reproducer and test to verify the fix. I used the netcnt
program to test this because I wanted a splat to be generated, which can
only be done if we have real traffic exercising the code.

Patch 4 is an optional patch. While doing the above I also wanted to verify
that loads were OK. The code looked good, but I wanted some xlated code to
review as well. It seems like a good idea to add it here, or at least it
shouldn't hurt. I could push it into bpf-next if folks want.

Patch 5 adds reproducers for reading the sock_ops->sk field.

I split Patch 1 and Patch 2 into two patches because they have different
Fixes tags. Seems like this will help with backporting. They could be
squashed though if folks want.

For selftests I was fairly verbose, creating three patches each with the
associated xlated code to handle each of the three cases. My hope is this
helps the reader understand issues and review fixes. It's more or less
how I debugged the issue and created reproducers, so it at least helped
me to have them as logically different patches.

---

John Fastabend (5):
      bpf: sock_ops ctx access may stomp registers in corner case
      bpf: sock_ops sk access may stomp registers when dst_reg = src_reg
      bpf, selftests: Add tests for ctx access in sock_ops with single register
      bpf, selftests: Add tests for sock_ops load with r9,r8.r7 registers
      bpf, selftests: Add tests to sock_ops for loading sk


 net/core/filter.c                                  |   75 +++++++++++++++++---
 .../testing/selftests/bpf/progs/test_tcpbpf_kern.c |   41 +++++++++++
 2 files changed, 103 insertions(+), 13 deletions(-)

--
Signature


* [bpf PATCH v2 1/5] bpf: sock_ops ctx access may stomp registers in corner case
  2020-07-29 16:22 [bpf PATCH v2 0/5] Fix sock_ops field read splat John Fastabend
@ 2020-07-29 16:22 ` John Fastabend
  2020-07-29 21:29   ` Song Liu
  2020-07-31 12:25   ` Daniel Borkmann
  2020-07-29 16:23 ` [bpf PATCH v2 2/5] bpf: sock_ops sk access may stomp registers when dst_reg = src_reg John Fastabend
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 14+ messages in thread
From: John Fastabend @ 2020-07-29 16:22 UTC (permalink / raw)
  To: john.fastabend, kafai, daniel, ast; +Cc: netdev, bpf

I had a sockmap program that after doing some refactoring started spewing
this splat at me:

[18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
[...]
[18610.807359] Call Trace:
[18610.807370]  ? 0xffffffffc114d0d5
[18610.807382]  __cgroup_bpf_run_filter_sock_ops+0x7d/0xb0
[18610.807391]  tcp_connect+0x895/0xd50
[18610.807400]  tcp_v4_connect+0x465/0x4e0
[18610.807407]  __inet_stream_connect+0xd6/0x3a0
[18610.807412]  ? __inet_stream_connect+0x5/0x3a0
[18610.807417]  inet_stream_connect+0x3b/0x60
[18610.807425]  __sys_connect+0xed/0x120

After some debugging I was able to build this simple reproducer,

 __section("sockops/reproducer_bad")
 int bpf_reproducer_bad(struct bpf_sock_ops *skops)
 {
        volatile __maybe_unused __u32 i = skops->snd_ssthresh;
        return 0;
 }

And along the way I noticed that the program below ran without a splat,

__section("sockops/reproducer_good")
int bpf_reproducer_good(struct bpf_sock_ops *skops)
{
        volatile __maybe_unused __u32 i = skops->snd_ssthresh;
        volatile __maybe_unused __u32 family;

        compiler_barrier();

        family = skops->family;
        return 0;
}

So I decided to check out the code we generate for the above two
programs and noticed each generates the BPF code you would expect,

0000000000000000 <bpf_reproducer_bad>:
;       volatile __maybe_unused __u32 i = skops->snd_ssthresh;
       0:       r1 = *(u32 *)(r1 + 96)
       1:       *(u32 *)(r10 - 4) = r1
;       return 0;
       2:       r0 = 0
       3:       exit

0000000000000000 <bpf_reproducer_good>:
;       volatile __maybe_unused __u32 i = skops->snd_ssthresh;
       0:       r2 = *(u32 *)(r1 + 96)
       1:       *(u32 *)(r10 - 4) = r2
;       family = skops->family;
       2:       r1 = *(u32 *)(r1 + 20)
       3:       *(u32 *)(r10 - 8) = r1
;       return 0;
       4:       r0 = 0
       5:       exit

So we get reasonable assembly, but still something was causing the null
pointer dereference. So, we load the programs and dump the xlated version
observing that line 0 above 'r* = *(u32 *)(r1 +96)' is going to be
translated by the skops access helpers.

int bpf_reproducer_bad(struct bpf_sock_ops * skops):
; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
   0: (61) r1 = *(u32 *)(r1 +28)
   1: (15) if r1 == 0x0 goto pc+2
   2: (79) r1 = *(u64 *)(r1 +0)
   3: (61) r1 = *(u32 *)(r1 +2340)
; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
   4: (63) *(u32 *)(r10 -4) = r1
; return 0;
   5: (b7) r0 = 0
   6: (95) exit

int bpf_reproducer_good(struct bpf_sock_ops * skops):
; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
   0: (61) r2 = *(u32 *)(r1 +28)
   1: (15) if r2 == 0x0 goto pc+2
   2: (79) r2 = *(u64 *)(r1 +0)
   3: (61) r2 = *(u32 *)(r2 +2340)
; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
   4: (63) *(u32 *)(r10 -4) = r2
; family = skops->family;
   5: (79) r1 = *(u64 *)(r1 +0)
   6: (69) r1 = *(u16 *)(r1 +16)
; family = skops->family;
   7: (63) *(u32 *)(r10 -8) = r1
; return 0;
   8: (b7) r0 = 0
   9: (95) exit

Then we look at lines 0 and 2 above. In the good case we do the zero
check in r2 and then load 'r1 + 0' at line 2. A quick cross-check
against struct bpf_sock_ops_kern confirms that this is the 'struct
sock *sk' pointer field. But, in the bad case,

   0: (61) r1 = *(u32 *)(r1 +28)
   1: (15) if r1 == 0x0 goto pc+2
   2: (79) r1 = *(u64 *)(r1 +0)

Oh no, we read 'r1 +28' into r1, this is skops->fullsock and then in
line 2 we read the 'r1 +0' as a pointer. Now jumping back to our splat,

[18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001

The 0x01 makes sense because that is exactly the fullsock value. And
it's not a valid dereference so we splat.
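
As a rough illustration of the failure (a user-space model, not kernel
code), the three translated instructions can be replayed on a small
register array. The struct is a stand-in whose offsets mirror the +0, +28
and +32 seen in the xlated dumps; its name and the helper name are
assumptions made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel-side ctx; only the offsets matter. */
struct mock_sock_ops_kern {
	uint64_t sk;          /* +0:  socket pointer */
	uint32_t op;          /* +8 */
	uint32_t args[4];     /* +12 */
	uint32_t is_fullsock; /* +28 */
	uint64_t temp;        /* +32 */
};

_Static_assert(offsetof(struct mock_sock_ops_kern, sk) == 0, "sk at +0");
_Static_assert(offsetof(struct mock_sock_ops_kern, is_fullsock) == 28,
	       "is_fullsock at +28");
_Static_assert(offsetof(struct mock_sock_ops_kern, temp) == 32, "temp at +32");

/*
 * Replay the unfixed sequence and return the address that instruction 2
 * would dereference, or 0 if the branch skips it:
 *   0: dst = *(u32 *)(src + 28)
 *   1: if dst == 0 goto pc+2
 *   2: dst = *(u64 *)(src + 0)
 */
uint64_t addr_read_by_insn2(const struct mock_sock_ops_kern *ctx,
			    int dst, int src)
{
	uint64_t regs[10] = {0};
	uint32_t fullsock;

	assert(dst >= 0 && dst <= 9 && src >= 0 && src <= 9);
	regs[src] = (uint64_t)(uintptr_t)ctx;

	/* insn 0: load is_fullsock through src into dst */
	memcpy(&fullsock,
	       (const char *)(uintptr_t)regs[src] +
	       offsetof(struct mock_sock_ops_kern, is_fullsock),
	       sizeof(fullsock));
	regs[dst] = fullsock;

	/* insn 1: non-full sockets skip the sk load */
	if (regs[dst] == 0)
		return 0;

	/* insn 2 uses whatever src holds now as the base address */
	return regs[src];
}
```

With dst == src and is_fullsock == 1 the helper returns 1, the same
address reported in the splat; with distinct registers it returns the ctx
address.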

To fix this we need to guard the case when a program is doing a sock_ops
field access with src_reg == dst_reg. This is already handled in the load
case, where the ctx_access handler uses a tmp register, being careful to
store the old value and restore it. To fix the get case, test if
src_reg == dst_reg and in this case do the is_fullsock test in the
temporary register. Remember to restore the temporary register before
writing to either dst_reg or src_reg, to avoid smashing the pointer into
the struct holding the tmp variable.
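
The temporary register choice can be sketched on its own, outside the
macro. This is only a model of the selection step in the diff below: start
at r9 and step down past whichever of dst_reg and src_reg is in the way.
REG_9 and pick_temp_reg() are hypothetical names for this illustration.

```c
#include <assert.h>

enum { REG_9 = 9 }; /* hypothetical stand-in for BPF_REG_9 */

/* Pick a scratch register that is neither dst_reg nor src_reg. */
int pick_temp_reg(int dst_reg, int src_reg)
{
	int reg = REG_9;

	assert(dst_reg >= 0 && dst_reg <= 9);
	assert(src_reg >= 0 && src_reg <= 9);

	/* r9 is taken: fall back to r8 */
	if (dst_reg == reg || src_reg == reg)
		reg--;
	/* r8 is taken as well: fall back to r7 */
	if (dst_reg == reg || src_reg == reg)
		reg--;
	return reg;
}
```

Two decrements are enough because at most two registers (dst_reg and
src_reg) can collide with the candidate.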

With this inline code added to test_tcpbpf_kern, the access is now
translated correctly from,

  9: r2 = *(u32 *)(r2 + 96)

to xlated code,

  13: (61) r9 = *(u32 *)(r2 +28)
  14: (15) if r9 == 0x0 goto pc+4
  15: (79) r9 = *(u64 *)(r2 +32)
  16: (79) r2 = *(u64 *)(r2 +0)
  17: (61) r2 = *(u32 *)(r2 +2348)
  18: (05) goto pc+1
  19: (79) r9 = *(u64 *)(r2 +32)

And in the normal case we keep the original code, because really this
is an edge case. From this,

  9: r2 = *(u32 *)(r6 + 96)

to xlated code,

  22: (61) r2 = *(u32 *)(r6 +28)
  23: (15) if r2 == 0x0 goto pc+2
  24: (79) r2 = *(u64 *)(r6 +0)
  25: (61) r2 = *(u32 *)(r2 +2348)

So four additional instructions if dst == src register (one save, two
restores and a goto), but I scanned my current code base and did not see
this pattern anywhere so it should not be a big deal. Further, it seems
no one else has hit this, or at least reported it, so it must be a fairly
rare pattern.

Fixes: 9b1f3d6e5af29 ("bpf: Refactor sock_ops_convert_ctx_access")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 net/core/filter.c |   26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 29e34551..15a0842 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8314,15 +8314,31 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
 /* Helper macro for adding read access to tcp_sock or sock fields. */
 #define SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ)			      \
 	do {								      \
+		int fullsock_reg = si->dst_reg, reg = BPF_REG_9, jmp = 2;     \
 		BUILD_BUG_ON(sizeof_field(OBJ, OBJ_FIELD) >		      \
 			     sizeof_field(struct bpf_sock_ops, BPF_FIELD));   \
+		if (si->dst_reg == reg || si->src_reg == reg)		      \
+			reg--;						      \
+		if (si->dst_reg == reg || si->src_reg == reg)		      \
+			reg--;						      \
+		if (si->dst_reg == si->src_reg) {			      \
+			*insn++ = BPF_STX_MEM(BPF_DW, si->src_reg, reg,	      \
+					  offsetof(struct bpf_sock_ops_kern,  \
+					  temp));			      \
+			fullsock_reg = reg;				      \
+			jmp += 2;					      \
+		}							      \
 		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(			      \
 						struct bpf_sock_ops_kern,     \
 						is_fullsock),		      \
-				      si->dst_reg, si->src_reg,		      \
+				      fullsock_reg, si->src_reg,	      \
 				      offsetof(struct bpf_sock_ops_kern,      \
 					       is_fullsock));		      \
-		*insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 2);	      \
+		*insn++ = BPF_JMP_IMM(BPF_JEQ, fullsock_reg, 0, jmp);	      \
+		if (si->dst_reg == si->src_reg)				      \
+			*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,	      \
+				      offsetof(struct bpf_sock_ops_kern,      \
+				      temp));				      \
 		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(			      \
 						struct bpf_sock_ops_kern, sk),\
 				      si->dst_reg, si->src_reg,		      \
@@ -8331,6 +8347,12 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
 						       OBJ_FIELD),	      \
 				      si->dst_reg, si->dst_reg,		      \
 				      offsetof(OBJ, OBJ_FIELD));	      \
+		if (si->dst_reg == si->src_reg)	{			      \
+			*insn++ = BPF_JMP_A(1);				      \
+			*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,	      \
+				      offsetof(struct bpf_sock_ops_kern,      \
+				      temp));				      \
+		}							      \
 	} while (0)
 
 #define SOCK_OPS_GET_TCP_SOCK_FIELD(FIELD) \



* [bpf PATCH v2 2/5] bpf: sock_ops sk access may stomp registers when dst_reg = src_reg
  2020-07-29 16:22 [bpf PATCH v2 0/5] Fix sock_ops field read splat John Fastabend
  2020-07-29 16:22 ` [bpf PATCH v2 1/5] bpf: sock_ops ctx access may stomp registers in corner case John Fastabend
@ 2020-07-29 16:23 ` John Fastabend
  2020-07-29 21:30   ` Song Liu
  2020-07-29 16:23 ` [bpf PATCH v2 3/5] bpf, selftests: Add tests for ctx access in sock_ops with single register John Fastabend
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: John Fastabend @ 2020-07-29 16:23 UTC (permalink / raw)
  To: john.fastabend, kafai, daniel, ast; +Cc: netdev, bpf

Similar to patch ("bpf: sock_ops ctx access may stomp registers") if the
src_reg = dst_reg when reading the sk field of a sock_ops struct we
generate xlated code,

  53: (61) r9 = *(u32 *)(r9 +28)
  54: (15) if r9 == 0x0 goto pc+3
  56: (79) r9 = *(u64 *)(r9 +0)

This stomps on the r9 reg to do the sk_fullsock check and then when
reading the skops->sk field instead of the sk pointer we get the
sk_fullsock. To fix, use a pattern similar to the one in the previous
fix and use the temp field to save/restore a register used to do the
sk_fullsock check.

After the fix the generated xlated code reads,

  52: (7b) *(u64 *)(r9 +32) = r8
  53: (61) r8 = *(u32 *)(r9 +28)
  54: (15) if r8 == 0x0 goto pc+3
  55: (79) r8 = *(u64 *)(r9 +32)
  56: (79) r9 = *(u64 *)(r9 +0)
  57: (05) goto pc+1
  58: (79) r8 = *(u64 *)(r9 +32)

Here the r9 register was in use so r8 is chosen as the temporary register.
In line 52 r8 is saved in the temp variable and at line 55 restored in case
fullsock != 0. Finally we handle the fullsock == 0 case by restoring at
line 58.
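
The save/restore flow can be checked with a small user-space model. This
is a sketch under assumptions, not the kernel code: struct mock_ctx only
mirrors the +0, +28 and +32 offsets from the dump, and read_sk_same_reg()
is a hypothetical helper that replays lines 52 to 58 on a register array.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ctx stand-in; layout mirrors the +0/+28/+32 offsets. */
struct mock_ctx {
	uint64_t sk;          /* +0 */
	uint32_t op;          /* +8 */
	uint32_t args[4];     /* +12 */
	uint32_t is_fullsock; /* +28 */
	uint64_t temp;        /* +32 */
};

/* Replay the fixed sequence when dst_reg == src_reg == reg. */
void read_sk_same_reg(uint64_t regs[10], int reg, int tmp)
{
	struct mock_ctx *ctx = (struct mock_ctx *)(uintptr_t)regs[reg];

	assert(reg != tmp);

	ctx->temp = regs[tmp];         /* 52: save tmp in ctx->temp         */
	regs[tmp] = ctx->is_fullsock;  /* 53: fullsock value goes into tmp  */
	if (regs[tmp] != 0) {          /* 54: skip the load if not fullsock */
		regs[tmp] = ctx->temp; /* 55: restore tmp while reg is ctx  */
		regs[reg] = ctx->sk;   /* 56: now reg may be overwritten    */
		/* 57: goto pc+1 skips the second restore */
	} else {
		regs[tmp] = ctx->temp; /* 58: restore on the fullsock == 0 path */
	}
}
```

In both branches the scratch register ends up with its original value,
and the ctx pointer is only overwritten after the last read through it.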

This adds a new macro, SOCK_OPS_GET_SK. It is almost possible to merge
this with SOCK_OPS_GET_FIELD, but I found the extra branch logic a
bit more confusing than just adding a new macro, despite a bit of
duplicated code.

Fixes: 1314ef561102e ("bpf: export bpf_sock for BPF_PROG_TYPE_SOCK_OPS prog type")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 net/core/filter.c |   49 ++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 38 insertions(+), 11 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 15a0842..0ddaed3 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8355,6 +8355,43 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
 		}							      \
 	} while (0)
 
+#define SOCK_OPS_GET_SK()							      \
+	do {								      \
+		int fullsock_reg = si->dst_reg, reg = BPF_REG_9, jmp = 1;     \
+		if (si->dst_reg == reg || si->src_reg == reg)		      \
+			reg--;						      \
+		if (si->dst_reg == reg || si->src_reg == reg)		      \
+			reg--;						      \
+		if (si->dst_reg == si->src_reg) {			      \
+			*insn++ = BPF_STX_MEM(BPF_DW, si->src_reg, reg,	      \
+					  offsetof(struct bpf_sock_ops_kern,  \
+					  temp));			      \
+			fullsock_reg = reg;				      \
+			jmp += 2;					      \
+		}							      \
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(			      \
+						struct bpf_sock_ops_kern,     \
+						is_fullsock),		      \
+				      fullsock_reg, si->src_reg,	      \
+				      offsetof(struct bpf_sock_ops_kern,      \
+					       is_fullsock));		      \
+		*insn++ = BPF_JMP_IMM(BPF_JEQ, fullsock_reg, 0, jmp);	      \
+		if (si->dst_reg == si->src_reg)				      \
+			*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,	      \
+				      offsetof(struct bpf_sock_ops_kern,      \
+				      temp));				      \
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(			      \
+						struct bpf_sock_ops_kern, sk),\
+				      si->dst_reg, si->src_reg,		      \
+				      offsetof(struct bpf_sock_ops_kern, sk));\
+		if (si->dst_reg == si->src_reg)	{			      \
+			*insn++ = BPF_JMP_A(1);				      \
+			*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,	      \
+				      offsetof(struct bpf_sock_ops_kern,      \
+				      temp));				      \
+		}							      \
+	} while (0)
+
 #define SOCK_OPS_GET_TCP_SOCK_FIELD(FIELD) \
 		SOCK_OPS_GET_FIELD(FIELD, FIELD, struct tcp_sock)
 
@@ -8639,17 +8676,7 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
 		SOCK_OPS_GET_TCP_SOCK_FIELD(bytes_acked);
 		break;
 	case offsetof(struct bpf_sock_ops, sk):
-		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
-						struct bpf_sock_ops_kern,
-						is_fullsock),
-				      si->dst_reg, si->src_reg,
-				      offsetof(struct bpf_sock_ops_kern,
-					       is_fullsock));
-		*insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 1);
-		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(
-						struct bpf_sock_ops_kern, sk),
-				      si->dst_reg, si->src_reg,
-				      offsetof(struct bpf_sock_ops_kern, sk));
+		SOCK_OPS_GET_SK();
 		break;
 	}
 	return insn - insn_buf;



* [bpf PATCH v2 3/5] bpf, selftests: Add tests for ctx access in sock_ops with single register
  2020-07-29 16:22 [bpf PATCH v2 0/5] Fix sock_ops field read splat John Fastabend
  2020-07-29 16:22 ` [bpf PATCH v2 1/5] bpf: sock_ops ctx access may stomp registers in corner case John Fastabend
  2020-07-29 16:23 ` [bpf PATCH v2 2/5] bpf: sock_ops sk access may stomp registers when dst_reg = src_reg John Fastabend
@ 2020-07-29 16:23 ` John Fastabend
  2020-07-29 21:35   ` Song Liu
  2020-07-29 16:23 ` [bpf PATCH v2 4/5] bpf, selftests: Add tests for sock_ops load with r9, r8.r7 registers John Fastabend
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 14+ messages in thread
From: John Fastabend @ 2020-07-29 16:23 UTC (permalink / raw)
  To: john.fastabend, kafai, daniel, ast; +Cc: netdev, bpf

To verify the fix ("bpf: sock_ops ctx access may stomp registers in corner
case") we want to force the compiler to generate the following code when
accessing a field with BPF_TCP_SOCK_GET_COMMON,

     r1 = *(u32 *)(r1 + 96) // r1 is skops ptr

Rather than depend on clang to do this we add the test with inline asm to
the tcpbpf test. This saves us from having to create another runner and
ensures that if we break this again test_tcpbpf will crash.

With above code we get the xlated code,

  11: (7b) *(u64 *)(r1 +32) = r9
  12: (61) r9 = *(u32 *)(r1 +28)
  13: (15) if r9 == 0x0 goto pc+4
  14: (79) r9 = *(u64 *)(r1 +32)
  15: (79) r1 = *(u64 *)(r1 +0)
  16: (61) r1 = *(u32 *)(r1 +2348)
  17: (05) goto pc+1
  18: (79) r9 = *(u64 *)(r1 +32)

We also add the normal case where src_reg != dst_reg so we can compare
code generation easily from llvm-objdump and ensure that case continues
to work correctly. The normal code is xlated to,

  20: (b7) r1 = 0
  21: (61) r1 = *(u32 *)(r3 +28)
  22: (15) if r1 == 0x0 goto pc+2
  23: (79) r1 = *(u64 *)(r3 +0)
  24: (61) r1 = *(u32 *)(r1 +2348)

Where the temp variable is not used.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 .../testing/selftests/bpf/progs/test_tcpbpf_kern.c |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c b/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
index 1f1966e..f8b13682 100644
--- a/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
@@ -54,6 +54,7 @@ SEC("sockops")
 int bpf_testcb(struct bpf_sock_ops *skops)
 {
 	char header[sizeof(struct ipv6hdr) + sizeof(struct tcphdr)];
+	struct bpf_sock_ops *reuse = skops;
 	struct tcphdr *thdr;
 	int good_call_rv = 0;
 	int bad_call_rv = 0;
@@ -62,6 +63,18 @@ int bpf_testcb(struct bpf_sock_ops *skops)
 	int v = 0;
 	int op;
 
+	/* Test reading fields in bpf_sock_ops using single register */
+	asm volatile (
+		"%[reuse] = *(u32 *)(%[reuse] +96)"
+		: [reuse] "+r"(reuse)
+		:);
+
+	asm volatile (
+		"%[op] = *(u32 *)(%[skops] +96)"
+		: [op] "+r"(op)
+		: [skops] "r"(skops)
+		:);
+
 	op = (int) skops->op;
 
 	update_event_map(op);



* [bpf PATCH v2 4/5] bpf, selftests: Add tests for sock_ops load with r9, r8.r7 registers
  2020-07-29 16:22 [bpf PATCH v2 0/5] Fix sock_ops field read splat John Fastabend
                   ` (2 preceding siblings ...)
  2020-07-29 16:23 ` [bpf PATCH v2 3/5] bpf, selftests: Add tests for ctx access in sock_ops with single register John Fastabend
@ 2020-07-29 16:23 ` John Fastabend
  2020-07-29 21:36   ` Song Liu
  2020-07-29 16:24 ` [bpf PATCH v2 5/5] bpf, selftests: Add tests to sock_ops for loading sk John Fastabend
  2020-07-29 21:57 ` [bpf PATCH v2 0/5] Fix sock_ops field read splat Martin KaFai Lau
  5 siblings, 1 reply; 14+ messages in thread
From: John Fastabend @ 2020-07-29 16:23 UTC (permalink / raw)
  To: john.fastabend, kafai, daniel, ast; +Cc: netdev, bpf

Loads in the sock_ops case when using high registers require extra logic to
ensure the correct temporary value is used. We need to ensure the temp
register does not use either the src_reg or dst_reg. Let's add an asm
test to force the logic to be triggered.

The xlated code is here,

  30: (7b) *(u64 *)(r9 +32) = r7
  31: (61) r7 = *(u32 *)(r9 +28)
  32: (15) if r7 == 0x0 goto pc+2
  33: (79) r7 = *(u64 *)(r9 +0)
  34: (63) *(u32 *)(r7 +916) = r8
  35: (79) r7 = *(u64 *)(r9 +32)

Notice r9 and r8 are not used for temp registers and r7 is chosen.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 .../testing/selftests/bpf/progs/test_tcpbpf_kern.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c b/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
index f8b13682..6420b61 100644
--- a/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
@@ -75,6 +75,13 @@ int bpf_testcb(struct bpf_sock_ops *skops)
 		: [skops] "r"(skops)
 		:);
 
+	asm volatile (
+		"r9 = %[skops];\n"
+		"r8 = *(u32 *)(r9 +164);\n"
+		"*(u32 *)(r9 +164) = r8;\n"
+		:: [skops] "r"(skops)
+		: "r9", "r8");
+
 	op = (int) skops->op;
 
 	update_event_map(op);



* [bpf PATCH v2 5/5] bpf, selftests: Add tests to sock_ops for loading sk
  2020-07-29 16:22 [bpf PATCH v2 0/5] Fix sock_ops field read splat John Fastabend
                   ` (3 preceding siblings ...)
  2020-07-29 16:23 ` [bpf PATCH v2 4/5] bpf, selftests: Add tests for sock_ops load with r9, r8.r7 registers John Fastabend
@ 2020-07-29 16:24 ` John Fastabend
  2020-07-29 21:36   ` Song Liu
  2020-07-29 21:57 ` [bpf PATCH v2 0/5] Fix sock_ops field read splat Martin KaFai Lau
  5 siblings, 1 reply; 14+ messages in thread
From: John Fastabend @ 2020-07-29 16:24 UTC (permalink / raw)
  To: john.fastabend, kafai, daniel, ast; +Cc: netdev, bpf

Add tests to directly access the sock_ops sk field. Then use it to
ensure a bad pointer access will fault if something goes wrong.
We do three tests:

The first test ensures when we read sock_ops sk pointer into the
same register that we don't fault as described earlier. Here r9
is chosen as the temp register.  The xlated code is,

  36: (7b) *(u64 *)(r1 +32) = r9
  37: (61) r9 = *(u32 *)(r1 +28)
  38: (15) if r9 == 0x0 goto pc+3
  39: (79) r9 = *(u64 *)(r1 +32)
  40: (79) r1 = *(u64 *)(r1 +0)
  41: (05) goto pc+1
  42: (79) r9 = *(u64 *)(r1 +32)

The second test ensures the temp register selection does not collide
with in-use register r9. Shown here r8 is chosen because r9 is the
sock_ops pointer. The xlated code is as follows,

  46: (7b) *(u64 *)(r9 +32) = r8
  47: (61) r8 = *(u32 *)(r9 +28)
  48: (15) if r8 == 0x0 goto pc+3
  49: (79) r8 = *(u64 *)(r9 +32)
  50: (79) r9 = *(u64 *)(r9 +0)
  51: (05) goto pc+1
  52: (79) r8 = *(u64 *)(r9 +32)

And finally, ensure we didn't break the base case where dst_reg does
not equal the source register,

  56: (61) r2 = *(u32 *)(r1 +28)
  57: (15) if r2 == 0x0 goto pc+1
  58: (79) r2 = *(u64 *)(r1 +0)

Notice it takes us an extra four instructions when src reg is the
same as dst reg. One to save the reg, two to restore depending on
the branch taken and a goto to jump over the second restore.
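
For a quick sanity check of those counts, the shape of the emitted
sequence can be written down as a tiny function. This is only a sketch of
the arithmetic in SOCK_OPS_GET_SK; the struct and function names are
hypothetical.

```c
#include <assert.h>

/* Shape of the code emitted for one skops->sk read. */
struct sk_read_shape {
	int insns; /* number of instructions emitted */
	int jmp;   /* offset used by the is_fullsock branch */
};

struct sk_read_shape sk_read_shape(int dst_reg, int src_reg)
{
	/* base case: fullsock load, branch, sk load */
	struct sk_read_shape s = { .insns = 3, .jmp = 1 };

	assert(dst_reg >= 0 && dst_reg <= 9);
	assert(src_reg >= 0 && src_reg <= 9);

	if (dst_reg == src_reg) {
		s.insns += 4; /* save, restore, goto, second restore */
		s.jmp += 2;   /* branch must also skip the restore and the goto */
	}
	return s;
}
```

The results line up with the dumps above: seven instructions and 'goto
pc+3' when the registers match, three instructions and 'goto pc+1'
otherwise.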

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 .../testing/selftests/bpf/progs/test_tcpbpf_kern.c |   21 ++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c b/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
index 6420b61..3e6912e 100644
--- a/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
+++ b/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
@@ -82,6 +82,27 @@ int bpf_testcb(struct bpf_sock_ops *skops)
 		:: [skops] "r"(skops)
 		: "r9", "r8");
 
+	asm volatile (
+		"r1 = %[skops];\n"
+		"r1 = *(u64 *)(r1 +184);\n"
+		"if r1 == 0 goto +1;\n"
+		"r1 = *(u32 *)(r1 +4);\n"
+		:: [skops] "r"(skops):"r1");
+
+	asm volatile (
+		"r9 = %[skops];\n"
+		"r9 = *(u64 *)(r9 +184);\n"
+		"if r9 == 0 goto +1;\n"
+		"r9 = *(u32 *)(r9 +4);\n"
+		:: [skops] "r"(skops):"r9");
+
+	asm volatile (
+		"r1 = %[skops];\n"
+		"r2 = *(u64 *)(r1 +184);\n"
+		"if r2 == 0 goto +1;\n"
+		"r2 = *(u32 *)(r2 +4);\n"
+		:: [skops] "r"(skops):"r1", "r2");
+
 	op = (int) skops->op;
 
 	update_event_map(op);



* Re: [bpf PATCH v2 1/5] bpf: sock_ops ctx access may stomp registers in corner case
  2020-07-29 16:22 ` [bpf PATCH v2 1/5] bpf: sock_ops ctx access may stomp registers in corner case John Fastabend
@ 2020-07-29 21:29   ` Song Liu
  2020-07-31 12:25   ` Daniel Borkmann
  1 sibling, 0 replies; 14+ messages in thread
From: Song Liu @ 2020-07-29 21:29 UTC (permalink / raw)
  To: John Fastabend
  Cc: Martin KaFai Lau, Daniel Borkmann, Alexei Starovoitov, Networking, bpf

On Wed, Jul 29, 2020 at 9:24 AM John Fastabend <john.fastabend@gmail.com> wrote:
>
> I had a sockmap program that after doing some refactoring started spewing
> this splat at me:
[...]
> least reported it so it must be a fairly rare pattern.
>
> Fixes: 9b1f3d6e5af29 ("bpf: Refactor sock_ops_convert_ctx_access")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Song Liu <songliubraving@fb.com>


* Re: [bpf PATCH v2 2/5] bpf: sock_ops sk access may stomp registers when dst_reg = src_reg
  2020-07-29 16:23 ` [bpf PATCH v2 2/5] bpf: sock_ops sk access may stomp registers when dst_reg = src_reg John Fastabend
@ 2020-07-29 21:30   ` Song Liu
  0 siblings, 0 replies; 14+ messages in thread
From: Song Liu @ 2020-07-29 21:30 UTC (permalink / raw)
  To: John Fastabend
  Cc: Martin KaFai Lau, Daniel Borkmann, Alexei Starovoitov, Networking, bpf

On Wed, Jul 29, 2020 at 9:25 AM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Similar to patch ("bpf: sock_ops ctx access may stomp registers") if the
> src_reg = dst_reg when reading the sk field of a sock_ops struct we
> generate xlated code,
>
>   53: (61) r9 = *(u32 *)(r9 +28)
>   54: (15) if r9 == 0x0 goto pc+3
>   56: (79) r9 = *(u64 *)(r9 +0)
>
> This stomps on the r9 reg to do the sk_fullsock check and then when
> reading the skops->sk field instead of the sk pointer we get the
> sk_fullsock. To fix, use a pattern similar to the one in the previous
> fix and use the temp field to save/restore a register used to do the
> sk_fullsock check.
>
> After the fix the generated xlated code reads,
>
>   52: (7b) *(u64 *)(r9 +32) = r8
>   53: (61) r8 = *(u32 *)(r9 +28)
>   54: (15) if r8 == 0x0 goto pc+3
>   55: (79) r8 = *(u64 *)(r9 +32)
>   56: (79) r9 = *(u64 *)(r9 +0)
>   57: (05) goto pc+1
>   58: (79) r8 = *(u64 *)(r9 +32)
>
> Here the r9 register was in use so r8 is chosen as the temporary register.
> In line 52 r8 is saved in the temp variable and at line 55 restored in case
> fullsock != 0. Finally we handle the fullsock == 0 case by restoring at
> line 58.
>
> This adds a new macro, SOCK_OPS_GET_SK. It is almost possible to merge
> this with SOCK_OPS_GET_FIELD, but I found the extra branch logic a
> bit more confusing than just adding a new macro, despite a bit of
> duplicated code.
>
> Fixes: 1314ef561102e ("bpf: export bpf_sock for BPF_PROG_TYPE_SOCK_OPS prog type")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Song Liu <songliubraving@fb.com>
[...]


* Re: [bpf PATCH v2 3/5] bpf, selftests: Add tests for ctx access in sock_ops with single register
  2020-07-29 16:23 ` [bpf PATCH v2 3/5] bpf, selftests: Add tests for ctx access in sock_ops with single register John Fastabend
@ 2020-07-29 21:35   ` Song Liu
  0 siblings, 0 replies; 14+ messages in thread
From: Song Liu @ 2020-07-29 21:35 UTC (permalink / raw)
  To: John Fastabend
  Cc: Martin KaFai Lau, Daniel Borkmann, Alexei Starovoitov, Networking, bpf

On Wed, Jul 29, 2020 at 9:24 AM John Fastabend <john.fastabend@gmail.com> wrote:
>
[...]

>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Song Liu <songliubraving@fb.com>

> ---
>  .../testing/selftests/bpf/progs/test_tcpbpf_kern.c |   13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c b/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
> index 1f1966e..f8b13682 100644
> --- a/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
> +++ b/tools/testing/selftests/bpf/progs/test_tcpbpf_kern.c
> @@ -54,6 +54,7 @@ SEC("sockops")
>  int bpf_testcb(struct bpf_sock_ops *skops)
>  {
>         char header[sizeof(struct ipv6hdr) + sizeof(struct tcphdr)];
> +       struct bpf_sock_ops *reuse = skops;
>         struct tcphdr *thdr;
>         int good_call_rv = 0;
>         int bad_call_rv = 0;
> @@ -62,6 +63,18 @@ int bpf_testcb(struct bpf_sock_ops *skops)
>         int v = 0;
>         int op;
>
> +       /* Test reading fields in bpf_sock_ops using single register */
> +       asm volatile (
> +               "%[reuse] = *(u32 *)(%[reuse] +96)"
> +               : [reuse] "+r"(reuse)
> +               :);
> +
> +       asm volatile (
> +               "%[op] = *(u32 *)(%[skops] +96)"
> +               : [op] "+r"(op)
> +               : [skops] "r"(skops)
> +               :);
> +

Shall we add a separate test for this? It does seem to fix in bpf_testcb().

>         op = (int) skops->op;
>
>         update_event_map(op);
>


* Re: [bpf PATCH v2 4/5] bpf, selftests: Add tests for sock_ops load with r9, r8.r7 registers
  2020-07-29 16:23 ` [bpf PATCH v2 4/5] bpf, selftests: Add tests for sock_ops load with r9, r8.r7 registers John Fastabend
@ 2020-07-29 21:36   ` Song Liu
  0 siblings, 0 replies; 14+ messages in thread
From: Song Liu @ 2020-07-29 21:36 UTC (permalink / raw)
  To: John Fastabend
  Cc: Martin KaFai Lau, Daniel Borkmann, Alexei Starovoitov, Networking, bpf

On Wed, Jul 29, 2020 at 9:24 AM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Loads in the sock_ops case when using high registers require extra logic to
> ensure the correct temporary value is used. We need to ensure the temp
> register does not use either the src_reg or dst_reg. Let's add an asm
> test to force the logic to be triggered.
>
> The xlated code is here,
>
>   30: (7b) *(u64 *)(r9 +32) = r7
>   31: (61) r7 = *(u32 *)(r9 +28)
>   32: (15) if r7 == 0x0 goto pc+2
>   33: (79) r7 = *(u64 *)(r9 +0)
>   34: (63) *(u32 *)(r7 +916) = r8
>   35: (79) r7 = *(u64 *)(r9 +32)
>
> Notice r9 and r8 are not used for temp registers and r7 is chosen.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Song Liu <songliubraving@fb.com>

[...]


* Re: [bpf PATCH v2 5/5] bpf, selftests: Add tests to sock_ops for loading sk
  2020-07-29 16:24 ` [bpf PATCH v2 5/5] bpf, selftests: Add tests to sock_ops for loading sk John Fastabend
@ 2020-07-29 21:36   ` Song Liu
  0 siblings, 0 replies; 14+ messages in thread
From: Song Liu @ 2020-07-29 21:36 UTC (permalink / raw)
  To: John Fastabend
  Cc: Martin KaFai Lau, Daniel Borkmann, Alexei Starovoitov, Networking, bpf

On Wed, Jul 29, 2020 at 9:26 AM John Fastabend <john.fastabend@gmail.com> wrote:
>
[...]
>
> Notice it takes us an extra four instructions when src reg is the
> same as dst reg. One to save the reg, two to restore depending on
> the branch taken and a goto to jump over the second restore.
>
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>

Acked-by: Song Liu <songliubraving@fb.com>

[...]


* Re: [bpf PATCH v2 0/5] Fix sock_ops field read splat
  2020-07-29 16:22 [bpf PATCH v2 0/5] Fix sock_ops field read splat John Fastabend
                   ` (4 preceding siblings ...)
  2020-07-29 16:24 ` [bpf PATCH v2 5/5] bpf, selftests: Add tests to sock_ops for loading sk John Fastabend
@ 2020-07-29 21:57 ` Martin KaFai Lau
  5 siblings, 0 replies; 14+ messages in thread
From: Martin KaFai Lau @ 2020-07-29 21:57 UTC (permalink / raw)
  To: John Fastabend; +Cc: daniel, ast, netdev, bpf

On Wed, Jul 29, 2020 at 09:22:36AM -0700, John Fastabend wrote:
> Doing some refactoring resulted in a kernel splat when reading sock_ops
> fields.
> 
> Patch 1, has the details and proposed fix for sock_ops sk field access.
> 
> Patch 2, has the details and proposed fix for reading sock_ops->sk field
> 
> Patch 3, Gives a reproducer and test to verify the fix. I used the netcnt
> program to test this because I wanted a splat to be generated which can
> only be done if we have real traffic exercising the code.
> 
> Patch 4, Is an optional patch. While doing above I wanted to also verify
> loads were OK. The code looked good, but I wanted some xlated code to
> review as well. It seems like a good idea to add it here or at least
> shouldn't hurt. I could push it into bpf-next if folks want.
> 
> Patch 5, Add reproducers for reading sock_ops->sk field.
> 
> I split Patch1 and Patch2 into two patches because they have different
> fixes tags. Seems like this will help with backporting. They could be
> squashed though if folks want.
> 
> For selftests I was fairly verbose creating three patches each with the
> associated xlated code to handle each of the three cases. My hope is this
> helps the reader understand issues and review fixes. It's more or less
> how I debugged the issue and created reproducers so it at least helped
> me to have them logically different patches.
LGTM also.  Thanks for the fixes and the tests!

Acked-by: Martin KaFai Lau <kafai@fb.com>


* Re: [bpf PATCH v2 1/5] bpf: sock_ops ctx access may stomp registers in corner case
  2020-07-29 16:22 ` [bpf PATCH v2 1/5] bpf: sock_ops ctx access may stomp registers in corner case John Fastabend
  2020-07-29 21:29   ` Song Liu
@ 2020-07-31 12:25   ` Daniel Borkmann
  2020-07-31 22:46     ` John Fastabend
  1 sibling, 1 reply; 14+ messages in thread
From: Daniel Borkmann @ 2020-07-31 12:25 UTC (permalink / raw)
  To: John Fastabend, kafai, ast; +Cc: netdev, bpf

On 7/29/20 6:22 PM, John Fastabend wrote:
> I had a sockmap program that after doing some refactoring started spewing
> this splat at me:
> 
> [18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> [...]
> [18610.807359] Call Trace:
> [18610.807370]  ? 0xffffffffc114d0d5
> [18610.807382]  __cgroup_bpf_run_filter_sock_ops+0x7d/0xb0
> [18610.807391]  tcp_connect+0x895/0xd50
> [18610.807400]  tcp_v4_connect+0x465/0x4e0
> [18610.807407]  __inet_stream_connect+0xd6/0x3a0
> [18610.807412]  ? __inet_stream_connect+0x5/0x3a0
> [18610.807417]  inet_stream_connect+0x3b/0x60
> [18610.807425]  __sys_connect+0xed/0x120
> 
> After some debugging I was able to build this simple reproducer,
> 
>   __section("sockops/reproducer_bad")
>   int bpf_reproducer_bad(struct bpf_sock_ops *skops)
>   {
>          volatile __maybe_unused __u32 i = skops->snd_ssthresh;
>          return 0;
>   }
> 
> And along the way noticed that below program ran without splat,
> 
> __section("sockops/reproducer_good")
> int bpf_reproducer_good(struct bpf_sock_ops *skops)
> {
>          volatile __maybe_unused __u32 i = skops->snd_ssthresh;
>          volatile __maybe_unused __u32 family;
> 
>          compiler_barrier();
> 
>          family = skops->family;
>          return 0;
> }
> 
> So I decided to check out the code we generate for the above two
> programs and noticed each generates the BPF code you would expect,
> 
> 0000000000000000 <bpf_reproducer_bad>:
> ;       volatile __maybe_unused __u32 i = skops->snd_ssthresh;
>         0:       r1 = *(u32 *)(r1 + 96)
>         1:       *(u32 *)(r10 - 4) = r1
> ;       return 0;
>         2:       r0 = 0
>         3:       exit
> 
> 0000000000000000 <bpf_reproducer_good>:
> ;       volatile __maybe_unused __u32 i = skops->snd_ssthresh;
>         0:       r2 = *(u32 *)(r1 + 96)
>         1:       *(u32 *)(r10 - 4) = r2
> ;       family = skops->family;
>         2:       r1 = *(u32 *)(r1 + 20)
>         3:       *(u32 *)(r10 - 8) = r1
> ;       return 0;
>         4:       r0 = 0
>         5:       exit
> 
> So we get reasonable assembly, but still something was causing the null
> pointer dereference. So, we load the programs and dump the xlated version
> observing that line 0 above 'r* = *(u32 *)(r1 +96)' is going to be
> translated by the skops access helpers.
> 
> int bpf_reproducer_bad(struct bpf_sock_ops * skops):
> ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
>     0: (61) r1 = *(u32 *)(r1 +28)
>     1: (15) if r1 == 0x0 goto pc+2
>     2: (79) r1 = *(u64 *)(r1 +0)
>     3: (61) r1 = *(u32 *)(r1 +2340)
> ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
>     4: (63) *(u32 *)(r10 -4) = r1
> ; return 0;
>     5: (b7) r0 = 0
>     6: (95) exit
> 
> int bpf_reproducer_good(struct bpf_sock_ops * skops):
> ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
>     0: (61) r2 = *(u32 *)(r1 +28)
>     1: (15) if r2 == 0x0 goto pc+2
>     2: (79) r2 = *(u64 *)(r1 +0)
>     3: (61) r2 = *(u32 *)(r2 +2340)
> ; volatile __maybe_unused __u32 i = skops->snd_ssthresh;
>     4: (63) *(u32 *)(r10 -4) = r2
> ; family = skops->family;
>     5: (79) r1 = *(u64 *)(r1 +0)
>     6: (69) r1 = *(u16 *)(r1 +16)
> ; family = skops->family;
>     7: (63) *(u32 *)(r10 -8) = r1
> ; return 0;
>     8: (b7) r0 = 0
>     9: (95) exit
> 
> Then we look at lines 0 and 2 above. In the good case we do the zero
> check in r2 and then load 'r1 + 0' at line 2. A quick cross-check
> against struct bpf_sock_ops_kern confirms that this is the 'struct
> sock *sk' pointer field. But, in the bad case,
> 
>     0: (61) r1 = *(u32 *)(r1 +28)
>     1: (15) if r1 == 0x0 goto pc+2
>     2: (79) r1 = *(u64 *)(r1 +0)
> 
> Oh no, we read 'r1 +28' into r1, this is skops->is_fullsock, and then in
> line 2 we read 'r1 +0' as a pointer. Now jumping back to our splat,
> 
> [18610.807284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
> 
> The 0x01 makes sense because that is exactly the fullsock value. And
> it's not a valid dereference so we splat.
> 
> To fix this we need to guard the case when a program is doing a sock_ops
> field access with src_reg == dst_reg. This is already handled in the load
> case where the ctx_access handler uses a tmp register, being careful to
> store the old value and restore it. To fix the get case, test if
> src_reg == dst_reg and in this case do the is_fullsock test in the
> temporary register, remembering to restore the temporary register before
> writing to either dst_reg or src_reg to avoid smashing the pointer into
> the struct holding the tmp variable.
> 
> With this inline code added to test_tcpbpf_kern, the access is now
> translated correctly from,
> 
>    9: r2 = *(u32 *)(r2 + 96)
> 
> to xlated code,
> 
>    13: (61) r9 = *(u32 *)(r2 +28)
>    14: (15) if r9 == 0x0 goto pc+4
>    15: (79) r9 = *(u64 *)(r2 +32)
>    16: (79) r2 = *(u64 *)(r2 +0)
>    17: (61) r2 = *(u32 *)(r2 +2348)
>    18: (05) goto pc+1
>    19: (79) r9 = *(u64 *)(r2 +32)

The diff below looks good to me, but I'm confused by this one above. I'm probably
missing something, but given this is the dst == src case with the r2 register, where
in the dump do we first save the content of r9 into the scratch tmp store?
Line 19 seems to restore it, but the save is missing, no?

Please double check whether this was just omitted, but I would really like to have
the commit message 100% correct as it otherwise causes confusion when we stare at it
again a month later wrt what was the original intention.

> And in the normal case we keep the original code, because really this
> is an edge case. From this,
> 
>    9: r2 = *(u32 *)(r6 + 96)
> 
> to xlated code,
> 
>    22: (61) r2 = *(u32 *)(r6 +28)
>    23: (15) if r2 == 0x0 goto pc+2
>    24: (79) r2 = *(u64 *)(r6 +0)
>    25: (61) r2 = *(u32 *)(r2 +2348)
> 
> So three additional instructions if dst == src register, but I scanned
> my current code base and did not see this pattern anywhere so should
> not be a big deal. Further, it seems no one else has hit this or at
> least reported it so it must be a fairly rare pattern.
> 
> Fixes: 9b1f3d6e5af29 ("bpf: Refactor sock_ops_convert_ctx_access")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ---
>   net/core/filter.c |   26 ++++++++++++++++++++++++--
>   1 file changed, 24 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 29e34551..15a0842 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -8314,15 +8314,31 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
>   /* Helper macro for adding read access to tcp_sock or sock fields. */
>   #define SOCK_OPS_GET_FIELD(BPF_FIELD, OBJ_FIELD, OBJ)			      \
>   	do {								      \
> +		int fullsock_reg = si->dst_reg, reg = BPF_REG_9, jmp = 2;     \
>   		BUILD_BUG_ON(sizeof_field(OBJ, OBJ_FIELD) >		      \
>   			     sizeof_field(struct bpf_sock_ops, BPF_FIELD));   \
> +		if (si->dst_reg == reg || si->src_reg == reg)		      \
> +			reg--;						      \
> +		if (si->dst_reg == reg || si->src_reg == reg)		      \
> +			reg--;						      \
> +		if (si->dst_reg == si->src_reg) {			      \
> +			*insn++ = BPF_STX_MEM(BPF_DW, si->src_reg, reg,	      \
> +					  offsetof(struct bpf_sock_ops_kern,  \
> +					  temp));			      \
> +			fullsock_reg = reg;				      \
> +			jmp += 2;					      \
> +		}							      \
>   		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(			      \
>   						struct bpf_sock_ops_kern,     \
>   						is_fullsock),		      \
> -				      si->dst_reg, si->src_reg,		      \
> +				      fullsock_reg, si->src_reg,	      \
>   				      offsetof(struct bpf_sock_ops_kern,      \
>   					       is_fullsock));		      \
> -		*insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 2);	      \
> +		*insn++ = BPF_JMP_IMM(BPF_JEQ, fullsock_reg, 0, jmp);	      \
> +		if (si->dst_reg == si->src_reg)				      \
> +			*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,	      \
> +				      offsetof(struct bpf_sock_ops_kern,      \
> +				      temp));				      \
>   		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(			      \
>   						struct bpf_sock_ops_kern, sk),\
>   				      si->dst_reg, si->src_reg,		      \
> @@ -8331,6 +8347,12 @@ static u32 sock_ops_convert_ctx_access(enum bpf_access_type type,
>   						       OBJ_FIELD),	      \
>   				      si->dst_reg, si->dst_reg,		      \
>   				      offsetof(OBJ, OBJ_FIELD));	      \
> +		if (si->dst_reg == si->src_reg)	{			      \
> +			*insn++ = BPF_JMP_A(1);				      \
> +			*insn++ = BPF_LDX_MEM(BPF_DW, reg, si->src_reg,	      \
> +				      offsetof(struct bpf_sock_ops_kern,      \
> +				      temp));				      \
> +		}							      \
>   	} while (0)
>   
>   #define SOCK_OPS_GET_TCP_SOCK_FIELD(FIELD) \
> 
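
As a reading aid for the quoted diff, the scratch register choice and the
jump distance can be pulled out into two small plain C functions. This is
only a sketch that mirrors the `reg--` checks and the `jmp += 2` from the
macro; the names pick_scratch_reg() and fullsock_jmp_off() and the REG_*
constants are made up for illustration and do not exist in the kernel.

```c
#include <assert.h>

/* Hypothetical stand-ins for BPF_REG_7..BPF_REG_9. */
enum { REG_7 = 7, REG_8 = 8, REG_9 = 9 };

/*
 * Pick a scratch register that is neither dst nor src: start at r9 and
 * step down once per clash, as the two "reg--" checks in the diff do.
 */
static int pick_scratch_reg(int dst_reg, int src_reg)
{
	int reg = REG_9;

	if (dst_reg == reg || src_reg == reg)
		reg--;
	if (dst_reg == reg || src_reg == reg)
		reg--;
	return reg;
}

/*
 * Jump distance of the is_fullsock check: 2 in the normal case, 4 when
 * dst == src, because the restore and the goto add two instructions
 * that the branch has to skip.
 */
static int fullsock_jmp_off(int dst_reg, int src_reg)
{
	int jmp = 2;

	if (dst_reg == src_reg)
		jmp += 2;
	return jmp;
}
```

For the r2/r2 access in the commit message this gives r9 as scratch and a
jump of pc+4, which matches lines 13 and 14 of the xlated dump. Calling
both functions from a small main() is enough to try other combinations.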



* Re: [bpf PATCH v2 1/5] bpf: sock_ops ctx access may stomp registers in corner case
  2020-07-31 12:25   ` Daniel Borkmann
@ 2020-07-31 22:46     ` John Fastabend
  0 siblings, 0 replies; 14+ messages in thread
From: John Fastabend @ 2020-07-31 22:46 UTC (permalink / raw)
  To: Daniel Borkmann, John Fastabend, kafai, ast; +Cc: netdev, bpf

Daniel Borkmann wrote:
> On 7/29/20 6:22 PM, John Fastabend wrote:
> > [...]
> > 
> > With this inline code added to test_tcpbpf_kern, the access is now
> > translated correctly from,
> > 
> >    9: r2 = *(u32 *)(r2 + 96)
> > 
> > to xlated code,

I have this in my logs at line 12,

                *(u64 *)(r2 +32) = r9
> >    13: (61) r9 = *(u32 *)(r2 +28)
> >    14: (15) if r9 == 0x0 goto pc+4
> >    15: (79) r9 = *(u64 *)(r2 +32)
> >    16: (79) r2 = *(u64 *)(r2 +0)
> >    17: (61) r2 = *(u32 *)(r2 +2348)
> >    18: (05) goto pc+1
> >    19: (79) r9 = *(u64 *)(r2 +32)
> 
> The diff below looks good to me, but I'm confused by this one above. I'm probably
> missing something, but given this is the dst == src case with the r2 register, where
> in the dump do we first save the content of r9 into the scratch tmp store?
> Line 19 seems to restore it, but the save is missing, no?
> 
> Please double check whether this was just omitted, but I would really like to have
> the commit message 100% correct as it otherwise causes confusion when we stare at it
> again a month later wrt what was the original intention.

That was an off-by-one on the cut'n'paste into the commit message. Let me send a v3
with a corrected commit message. I do want this to be correct.
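
To make the save/restore order easier to follow, here is a rough user-space
model of the two sequences in plain C. It is a sketch, not kernel code:
struct ctx_model, struct sk_model, xlate_buggy() and xlate_fixed() are
invented names, the struct layout is simplified, and the "fault" is reported
through a return value rather than by dereferencing a bad pointer. The
comments map each step to the instruction in the dumps, with the save at
line 12 included.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-ins for the kernel structs; names and layout are illustrative. */
struct sk_model {
	uint32_t snd_ssthresh;
};

struct ctx_model {
	struct sk_model *sk;	/* plays the role of bpf_sock_ops_kern.sk */
	uint32_t is_fullsock;
	uint64_t temp;		/* scratch slot for the borrowed register */
};

enum { XLATE_OK = 0, XLATE_FAULT = -1 };

/* Pre-fix sequence with dst == src: a single register does all the work. */
static int xlate_buggy(uintptr_t *reg)
{
	/* 0: r1 = *(u32 *)(r1 +28), the ctx pointer is overwritten here */
	*reg = ((struct ctx_model *)*reg)->is_fullsock;
	/* 1: if r1 == 0x0 goto pc+2 */
	if (*reg == 0)
		return XLATE_OK;
	/*
	 * 2: r1 = *(u64 *)(r1 +0) would now dereference the is_fullsock
	 * value (0x1). The model reports that instead of crashing.
	 */
	return XLATE_FAULT;
}

/* Post-fix sequence with dst == src, using a scratch register (r9). */
static int xlate_fixed(uintptr_t *reg, uintptr_t *tmp)
{
	struct ctx_model *ctx = (struct ctx_model *)*reg;

	ctx->temp = *tmp;			/* 12: *(u64 *)(r2 +32) = r9 */
	*tmp = ctx->is_fullsock;		/* 13: r9 = *(u32 *)(r2 +28) */
	if (*tmp == 0) {			/* 14: if r9 == 0x0 goto pc+4 */
		*tmp = (uintptr_t)ctx->temp;	/* 19: r9 = *(u64 *)(r2 +32) */
		return XLATE_OK;		/* dst keeps its old value */
	}
	*tmp = (uintptr_t)ctx->temp;		/* 15: r9 = *(u64 *)(r2 +32) */
	*reg = (uintptr_t)ctx->sk;		/* 16: r2 = *(u64 *)(r2 +0) */
	*reg = ((struct sk_model *)*reg)->snd_ssthresh;	/* 17: field load */
	return XLATE_OK;			/* 18: goto pc+1 skips line 19 */
}
```

The point of the model is the ordering: the scratch register has to be
restored while the register still holding the ctx pointer is intact, which
is why one restore sits before line 16 and the other on the branch target.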

