netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v2 20/20] ppc: propagate the calling conventions change down to csum_partial_copy_generic()
       [not found]   ` <20200724012546.302155-20-viro@ZenIV.linux.org.uk>
@ 2020-10-14 22:26     ` Jason A. Donenfeld
  2020-10-14 22:51       ` Linus Torvalds
  2020-10-14 23:02       ` [PATCH] powerpc32: don't adjust unmoved stack pointer in csum_partial_copy_generic() epilogue Jason A. Donenfeld
  0 siblings, 2 replies; 8+ messages in thread
From: Jason A. Donenfeld @ 2020-10-14 22:26 UTC (permalink / raw)
  To: Al Viro; +Cc: Linus Torvalds, linux-kernel, linux-arch, netdev

Hi Al,

On Fri, Jul 24, 2020 at 02:25:46AM +0100, Al Viro wrote:
> From: Al Viro <viro@zeniv.linux.org.uk>
> 
> ... and get rid of the pointless fallback in the wrappers.  On error it used
> to zero the unwritten area and calculate the csum of the entire thing.  Not
> wanting to do it in assembler part had been very reasonable; doing that in
> the first place, OTOH...  In case of an error the caller discards the data
> we'd copied, along with whatever checksum it might've had.

This patch is causing crashes in WireGuard's CI over at
https://www.wireguard.com/build-status/ . Apparently sending a simple
network packet winds up triggering refcount_t's warn-on-saturate code. I
don't know if the new assembly failed to reset some flag or if something
else is up. I can start digging into it if you want, but I thought I
should let you know first about the issue. The splat follows below.

Thanks,
Jason

$ ping -c 10 -f -W 1 192.168.241.1
PING 192.168.241.1 (192.168.241.1) 56(84) bytes of data.
[    1.432922] ------------[ cut here ]------------
[    1.433069] refcount_t: saturated; leaking memory.
[    1.433344] WARNING: CPU: 3 PID: 90 at refcount_warn_saturate+0x100/0x1bc
[    1.433646] CPU: 3 PID: 90 Comm: ping Not tainted 5.9.0+ #3
[    1.433797] NIP:  c01a6fa0 LR: c01a6fa0 CTR: c01ccbec
[    1.433964] REGS: cfacfb80 TRAP: 0700   Not tainted  (5.9.0+)
[    1.434102] MSR:  00029000 <CE,EE,ME>  CR: 28022404  XER: 00000000
[    1.434345]
[    1.434345] GPR00: c01a6fa0 cfacfc38 cf8eeae0 00000026 3fffefff cfacfa90 cfacfaa0 00021000
[    1.434345] GPR08: 0f4a1000 00000000 c08b4674 c0918704 42022404 00000000 cfa34180 00000000
[    1.434345] GPR16: 00000000 cf8ef004 00000000 00000000 00000040 00000000 00000000 cfbac230
[    1.434345] GPR24: cfacfce8 c02a802c 00000000 cfa34180 cfacfc58 c02aa53c 55c0a4ff 00000000
[    1.435471] NIP [c01a6fa0] refcount_warn_saturate+0x100/0x1bc
[    1.435615] LR [c01a6fa0] refcount_warn_saturate+0x100/0x1bc
[    1.435825] Call Trace:
[    1.435922] [cfacfc38] [c01a6fa0] refcount_warn_saturate+0x100/0x1bc (unreliable)
[    1.436149] [cfacfc48] [c02a7f14] __ip_append_data.isra.0+0x8a8/0xde0
[    1.436302] [cfacfce8] [c02a84e0] ip_append_data.part.0+0x94/0xf0
[    1.436438] [cfacfd18] [c02dffe0] raw_sendmsg+0x298/0xa84
[    1.436544] [cfacfe48] [c020b9ec] __sys_sendto+0xdc/0x13c
[    1.436641] [cfacff38] [c000f1dc] ret_from_syscall+0x0/0x38
[    1.436824] --- interrupt: c01 at 0xb7e44f00
[    1.436824]     LR = 0xb7e21ba0
[    1.437038] Instruction dump:
[    1.437239] 3d20c092 39291bc1 89490001 2c0a0000 4082ff64 3c60c040 7c0802a6 39400001
[    1.437439] 38633b74 90010014 99490001 4be9b6e1 <0fe00000> 80010014 7c0803a6 4bffff38
[    1.437753] ---[ end trace aaa4b4788958d0a6 ]---
[    1.440214] ------------[ cut here ]------------
[    1.440301] refcount_t: underflow; use-after-free.
[    1.440397] WARNING: CPU: 3 PID: 90 at refcount_warn_saturate+0x1ac/0x1bc
[    1.440587] CPU: 3 PID: 90 Comm: ping Tainted: G        W         5.9.0+ #3
[    1.440741] NIP:  c01a704c LR: c01a704c CTR: c01ccbec
[    1.440857] REGS: cfacfaa0 TRAP: 0700   Tainted: G        W          (5.9.0+)
[    1.441016] MSR:  00029000 <CE,EE,ME>  CR: 48022404  XER: 00000000
[    1.441176]
[    1.441176] GPR00: c01a704c cfacfb58 cf8eeae0 00000026 3fffefff cfacf9b0 cfacf9c0 00021000
[    1.441176] GPR08: 0f4a1000 00000400 c08b4674 c0918704 42022404 00000000 10020464 00000003
[    1.441176] GPR16: 7ff00000 10020000 00000080 cfb27000 cfb2704c c0930000 cfacfc54 c092d260
[    1.441176] GPR24: 0000058c cfa82120 cfa8212c cfa8212c 00000000 cfa82000 cfacfd44 cfacfc58
[    1.441995] NIP [c01a704c] refcount_warn_saturate+0x1ac/0x1bc
[    1.442125] LR [c01a704c] refcount_warn_saturate+0x1ac/0x1bc
[    1.442252] Call Trace:
[    1.442320] [cfacfb58] [c01a704c] refcount_warn_saturate+0x1ac/0x1bc (unreliable)
[    1.442726] [cfacfb68] [c020e7dc] sock_wfree+0x130/0x134
[    1.442877] [cfacfb78] [c01f1388] wg_packet_send_staged_packets+0x234/0x6b4
[    1.443061] [cfacfbb8] [c01eecf8] wg_xmit+0x2a0/0x46c
[    1.443204] [cfacfbf8] [c0232134] dev_hard_start_xmit+0x190/0x1c0
[    1.443369] [cfacfc38] [c0232f2c] __dev_queue_xmit+0x4d0/0x844
[    1.443527] [cfacfc88] [c02a7134] ip_finish_output2+0x180/0x6b8
[    1.443686] [cfacfcb8] [c02aa3e8] ip_output+0xf0/0x1c0
[    1.443829] [cfacfd08] [c02ab14c] ip_send_skb+0x24/0xe8
[    1.443975] [cfacfd18] [c02e04bc] raw_sendmsg+0x774/0xa84
[    1.444124] [cfacfe48] [c020b9ec] __sys_sendto+0xdc/0x13c
[    1.444274] [cfacff38] [c000f1dc] ret_from_syscall+0x0/0x38
[    1.444437] --- interrupt: c01 at 0xb7e44f00
[    1.444437]     LR = 0xb7e21ba0
[    1.444644] Instruction dump:
[    1.444736] 4be9b661 0fe00000 80010014 7c0803a6 4bfffeb8 3c60c040 7c0802a6 39400001
[    1.444989] 38633bd8 90010014 99490003 4be9b635 <0fe00000> 80010014 7c0803a6 4bfffe8c
[    1.445252] ---[ end trace aaa4b4788958d0a7 ]---
[    1.445583] BUG: Unable to handle kernel instruction fetch (NULL pointer?)
[    1.445767] Faulting instruction address: 0x00000000
[    1.446051] Oops: Kernel access of bad area, sig: 11 [#1]
[    1.446210] BE PAGE_SIZE=4K PREEMPT SMP NR_CPUS=4 QEMU e500
[    1.446379] CPU: 3 PID: 90 Comm: ping Tainted: G        W         5.9.0+ #3
[    1.446678] NIP:  00000000 LR: c020e758 CTR: 00000000
[    1.446812] REGS: cfacfab0 TRAP: 0400   Tainted: G        W          (5.9.0+)
[    1.446989] MSR:  00029000 <CE,EE,ME>  CR: 48022404  XER: 00000000
[    1.447183]
[    1.447183] GPR00: c020e7dc cfacfb68 cf8eeae0 cfacfc58 3fffefff cfacf9b0 cfacf9c0 00021000
[    1.447183] GPR08: 0f4a1000 00000000 c08b4674 c0918704 42022404 00000000 10020464 00000003
[    1.447183] GPR16: 7ff00000 10020000 00000080 cfb27000 cfb2704c c0930000 cfacfc54 c092d260
[    1.447183] GPR24: 0000058c cfa82120 cfa8212c cfa8212c 00000000 cfa82000 cfacfd44 cfacfc58
[    1.448144] NIP [00000000] 0x0
[    1.448236] LR [c020e758] sock_wfree+0xac/0x134
[    1.448351] Call Trace:
[    1.448425] [cfacfb68] [c020e7dc] sock_wfree+0x130/0x134 (unreliable)
[    1.448603] [cfacfb78] [c01f1388] wg_packet_send_staged_packets+0x234/0x6b4
[    1.448820] [cfacfbb8] [c01eecf8] wg_xmit+0x2a0/0x46c
[    1.448964] [cfacfbf8] [c0232134] dev_hard_start_xmit+0x190/0x1c0
[    1.449139] [cfacfc38] [c0232f2c] __dev_queue_xmit+0x4d0/0x844
[    1.449304] [cfacfc88] [c02a7134] ip_finish_output2+0x180/0x6b8
[    1.449475] [cfacfcb8] [c02aa3e8] ip_output+0xf0/0x1c0
[    1.449628] [cfacfd08] [c02ab14c] ip_send_skb+0x24/0xe8
[    1.449815] [cfacfd18] [c02e04bc] raw_sendmsg+0x774/0xa84
[    1.449983] [cfacfe48] [c020b9ec] __sys_sendto+0xdc/0x13c
[    1.450150] [cfacff38] [c000f1dc] ret_from_syscall+0x0/0x38
[    1.450320] --- interrupt: c01 at 0xb7e44f00
[    1.450320]     LR = 0xb7e21ba0
[    1.450794] Instruction dump:
[    1.450963] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
[    1.451209] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
[    1.451637] ---[ end trace aaa4b4788958d0a8 ]---
[    1.451785]
[    2.555288] Kernel panic - not syncing: Aiee, killing interrupt handler!

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 20/20] ppc: propagate the calling conventions change down to csum_partial_copy_generic()
  2020-10-14 22:26     ` [PATCH v2 20/20] ppc: propagate the calling conventions change down to csum_partial_copy_generic() Jason A. Donenfeld
@ 2020-10-14 22:51       ` Linus Torvalds
  2020-10-14 22:53         ` Linus Torvalds
                           ` (2 more replies)
  2020-10-14 23:02       ` [PATCH] powerpc32: don't adjust unmoved stack pointer in csum_partial_copy_generic() epilogue Jason A. Donenfeld
  1 sibling, 3 replies; 8+ messages in thread
From: Linus Torvalds @ 2020-10-14 22:51 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Al Viro, Linux Kernel Mailing List, linux-arch, Netdev

On Wed, Oct 14, 2020 at 3:27 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> This patch is causing crashes in WireGuard's CI over at
> https://www.wireguard.com/build-status/ . Apparently sending a simple
> network packet winds up triggering refcount_t's warn-on-saturate code. I

Ouch.

The C parts look fairly straightforward, and I don't see how they
could cause that odd refcount issue.

So I assume it's the low-level asm code conversion that is buggy. And
it's apparently the 32-bit conversion, since your ppc64 status looks
fine.

I think it's this instruction:

        addi    r1,r1,16

that should be removed from the function exit, because Al removed the

-       stwu    r1,-16(r1)

on function entry.

So I think you end up with a corrupt stack pointer and basically
random behavior.

Mind trying that? (This is obviously all in
arch/powerpc/lib/checksum_32.S, the csum_partial_copy_generic()
function).

               Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 20/20] ppc: propagate the calling conventions change down to csum_partial_copy_generic()
  2020-10-14 22:51       ` Linus Torvalds
@ 2020-10-14 22:53         ` Linus Torvalds
  2020-10-14 22:54           ` Jason A. Donenfeld
  2020-10-14 22:53         ` Jason A. Donenfeld
  2020-10-14 23:12         ` Al Viro
  2 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2020-10-14 22:53 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Al Viro, Linux Kernel Mailing List, linux-arch, Netdev

[-- Attachment #1: Type: text/plain, Size: 704 bytes --]

On Wed, Oct 14, 2020 at 3:51 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I think it's this instruction:
>
>         addi    r1,r1,16
>
> that should be removed from the function exit, because Al removed the
>
> -       stwu    r1,-16(r1)
>
> on function entry.
>
> So I think you end up with a corrupt stack pointer and basically
> random behavior.
>
> Mind trying that? (This is obviously all in
> arch/powerpc/lib/checksum_32.S, the csum_partial_copy_generic()
> function).

Patch attached to make it easier to test.

NOTE! This is ENTIRELY untested. My ppc asm is so rusty that I might
be barking entirely up the wrong tree, and I just made things much
worse.

                 Linus

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 471 bytes --]

 arch/powerpc/lib/checksum_32.S | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
index ec5cd2dede35..27d9070617df 100644
--- a/arch/powerpc/lib/checksum_32.S
+++ b/arch/powerpc/lib/checksum_32.S
@@ -236,7 +236,6 @@ _GLOBAL(csum_partial_copy_generic)
 	slwi	r0,r0,8
 	adde	r12,r12,r0
 66:	addze	r3,r12
-	addi	r1,r1,16
 	beqlr+	cr7
 	rlwinm	r3,r3,8,0,31	/* odd destination address: rotate one byte */
 	blr

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 20/20] ppc: propagate the calling conventions change down to csum_partial_copy_generic()
  2020-10-14 22:51       ` Linus Torvalds
  2020-10-14 22:53         ` Linus Torvalds
@ 2020-10-14 22:53         ` Jason A. Donenfeld
  2020-10-14 23:12         ` Al Viro
  2 siblings, 0 replies; 8+ messages in thread
From: Jason A. Donenfeld @ 2020-10-14 22:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Al Viro, Linux Kernel Mailing List, linux-arch, Netdev

On Thu, Oct 15, 2020 at 12:51 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, Oct 14, 2020 at 3:27 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >
> > This patch is causing crashes in WireGuard's CI over at
> > https://www.wireguard.com/build-status/ . Apparently sending a simple
> > network packet winds up triggering refcount_t's warn-on-saturate code. I
>
> Ouch.
>
> The C parts look fairly straightforward, and I don't see how they
> could cause that odd refcount issue.
>
> So I assume it's the low-level asm code conversion that is buggy. And
> it's apparently the 32-bit conversion, since your ppc64 status looks
> fine.
>
> I think it's this instruction:
>
>         addi    r1,r1,16
>
> that should be removed from the function exit, because Al removed the
>
> -       stwu    r1,-16(r1)

I just tried that about a minute ago, and indeed that seems to be
what's up. Problem goes away without it. I'll send a patch shortly.

Jason

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 20/20] ppc: propagate the calling conventions change down to csum_partial_copy_generic()
  2020-10-14 22:53         ` Linus Torvalds
@ 2020-10-14 22:54           ` Jason A. Donenfeld
  0 siblings, 0 replies; 8+ messages in thread
From: Jason A. Donenfeld @ 2020-10-14 22:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Al Viro, Linux Kernel Mailing List, linux-arch, Netdev

On Thu, Oct 15, 2020 at 12:53 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, Oct 14, 2020 at 3:51 PM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > I think it's this instruction:
> >
> >         addi    r1,r1,16
> >
> > that should be removed from the function exit, because Al removed the
> >
> > -       stwu    r1,-16(r1)
> >
> > on function entry.
> >
> > So I think you end up with a corrupt stack pointer and basically
> > random behavior.
> >
> > Mind trying that? (This is obviously all in
> > arch/powerpc/lib/checksum_32.S, the csum_partial_copy_generic()
> > function).
>
> Patch attached to make it easier to test.
>
> NOTE! This is ENTIRELY untested. My ppc asm is so rusty that I might
> be barking entirely up the wrong tree, and I just made things much
> worse.

Indeed - exactly the same thing that's in my tree. Writing a commit
message for it now.

Jason

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH] powerpc32: don't adjust unmoved stack pointer in csum_partial_copy_generic() epilogue
  2020-10-14 22:26     ` [PATCH v2 20/20] ppc: propagate the calling conventions change down to csum_partial_copy_generic() Jason A. Donenfeld
  2020-10-14 22:51       ` Linus Torvalds
@ 2020-10-14 23:02       ` Jason A. Donenfeld
  2020-10-14 23:05         ` Linus Torvalds
  1 sibling, 1 reply; 8+ messages in thread
From: Jason A. Donenfeld @ 2020-10-14 23:02 UTC (permalink / raw)
  To: Al Viro, Linus Torvalds, linux-kernel, linux-arch, netdev
  Cc: Jason A. Donenfeld

A recent change to the checksum code removed usage of some extra
arguments, alongside with storage on the stack for those, and the stack
pointer no longer needed to be adjusted in the function prologue. But, a
left over subtraction wasn't removed in the function epilogue, causing
the function to return with the stack pointer moved 16 bytes away from
where it should have. This corrupted local state and lead to weird
crashes. This commit simply removes the leftover instruction from the
epilogue.

Fixes: 70d65cd555c5 ("ppc: propagate the calling conventions change down to csum_partial_copy_generic()")
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
 arch/powerpc/lib/checksum_32.S | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/lib/checksum_32.S b/arch/powerpc/lib/checksum_32.S
index ec5cd2dede35..27d9070617df 100644
--- a/arch/powerpc/lib/checksum_32.S
+++ b/arch/powerpc/lib/checksum_32.S
@@ -236,7 +236,6 @@ _GLOBAL(csum_partial_copy_generic)
 	slwi	r0,r0,8
 	adde	r12,r12,r0
 66:	addze	r3,r12
-	addi	r1,r1,16
 	beqlr+	cr7
 	rlwinm	r3,r3,8,0,31	/* odd destination address: rotate one byte */
 	blr
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] powerpc32: don't adjust unmoved stack pointer in csum_partial_copy_generic() epilogue
  2020-10-14 23:02       ` [PATCH] powerpc32: don't adjust unmoved stack pointer in csum_partial_copy_generic() epilogue Jason A. Donenfeld
@ 2020-10-14 23:05         ` Linus Torvalds
  0 siblings, 0 replies; 8+ messages in thread
From: Linus Torvalds @ 2020-10-14 23:05 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Al Viro, Linux Kernel Mailing List, linux-arch, Netdev

Thanks - applied and pushed out.

             Linus

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH v2 20/20] ppc: propagate the calling conventions change down to csum_partial_copy_generic()
  2020-10-14 22:51       ` Linus Torvalds
  2020-10-14 22:53         ` Linus Torvalds
  2020-10-14 22:53         ` Jason A. Donenfeld
@ 2020-10-14 23:12         ` Al Viro
  2 siblings, 0 replies; 8+ messages in thread
From: Al Viro @ 2020-10-14 23:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jason A. Donenfeld, Linux Kernel Mailing List, linux-arch, Netdev

On Wed, Oct 14, 2020 at 03:51:00PM -0700, Linus Torvalds wrote:
> On Wed, Oct 14, 2020 at 3:27 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >
> > This patch is causing crashes in WireGuard's CI over at
> > https://www.wireguard.com/build-status/ . Apparently sending a simple
> > network packet winds up triggering refcount_t's warn-on-saturate code. I
> 
> Ouch.
> 
> The C parts look fairly straightforward, and I don't see how they
> could cause that odd refcount issue.
> 
> So I assume it's the low-level asm code conversion that is buggy. And
> it's apparently the 32-bit conversion, since your ppc64 status looks
> fine.
> 
> I think it's this instruction:
> 
>         addi    r1,r1,16
> 
> that should be removed from the function exit, because Al removed the
> 
> -       stwu    r1,-16(r1)
> 
> on function entry.
> 
> So I think you end up with a corrupt stack pointer and basically
> random behavior.

Gyahh...  ACK, and I really wonder how the hell has it managed to avoid
crashing on testing.

Mea culpa, folks.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-10-15  1:40 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20200724012512.GK2786714@ZenIV.linux.org.uk>
     [not found] ` <20200724012546.302155-1-viro@ZenIV.linux.org.uk>
     [not found]   ` <20200724012546.302155-20-viro@ZenIV.linux.org.uk>
2020-10-14 22:26     ` [PATCH v2 20/20] ppc: propagate the calling conventions change down to csum_partial_copy_generic() Jason A. Donenfeld
2020-10-14 22:51       ` Linus Torvalds
2020-10-14 22:53         ` Linus Torvalds
2020-10-14 22:54           ` Jason A. Donenfeld
2020-10-14 22:53         ` Jason A. Donenfeld
2020-10-14 23:12         ` Al Viro
2020-10-14 23:02       ` [PATCH] powerpc32: don't adjust unmoved stack pointer in csum_partial_copy_generic() epilogue Jason A. Donenfeld
2020-10-14 23:05         ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).