linuxppc-dev.lists.ozlabs.org archive mirror
* [Bug 205183] New: PPC64: Signal delivery fails with SIGSEGV if between about 1KB and 4KB bytes of stack remain
From: bugzilla-daemon @ 2019-10-13 15:56 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=205183

            Bug ID: 205183
           Summary: PPC64: Signal delivery fails with SIGSEGV if between
                    about 1KB and 4KB bytes of stack remain
           Product: Platform Specific/Hardware
           Version: 2.5
    Kernel Version: 4.19.15 and others
          Hardware: PPC-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: PPC-64
          Assignee: platform_ppc-64@kernel-bugs.osdl.org
          Reporter: tgl@sss.pgh.pa.us
        Regression: No

Created attachment 285487
  --> https://bugzilla.kernel.org/attachment.cgi?id=285487&action=edit
stacktest.c

If there are between about 1K and 4K bytes remaining in a process' existing
stack segment, an attempt to deliver a signal that the process has a signal
handler for will result in SIGSEGV instead.  This situation should result in
extending the process' stack to allow handling the signal, but it does not.

The attached test program illustrates this.  It requires a parameter specifying
the amount of stack to consume before sleeping.  Wake the process with a
manual kill -USR1.  An example of a successful case is:

[tgl@postgresql-fedora ~]$ gcc -g -Wall -O stacktest.c
[tgl@postgresql-fedora ~]$ ./a.out 1240000 &
[1] 7922
[tgl@postgresql-fedora ~]$ cat /proc/7922/maps | grep stack
7fffc9970000-7fffc9aa0000 rw-p 00000000 00:00 0                          [stack]
[tgl@postgresql-fedora ~]$ kill -USR1 7922
[tgl@postgresql-fedora ~]$ signal delivered, stack base 0x7fffc9aa0000 top 0x7fffc9971420 (1240032 used)

[1]+  Done                    ./a.out 1240000

The above example shows that 0x7fffc9971420 - 0x7fffc9970000 = 5152 bytes
are enough to deliver the signal.  But with a slightly larger parameter,

[tgl@postgresql-fedora ~]$ ./a.out 1241000 &
[1] 7941
[tgl@postgresql-fedora ~]$ kill -USR1 7941
[tgl@postgresql-fedora ~]$ 
[1]+  Segmentation fault      (core dumped) ./a.out 1241000

With a still larger parameter, corresponding to just a few hundred bytes left,
it works again, showing that the kernel does know how to enlarge the stack in
such cases --- it's just got a boundary condition wrong somewhere.

On the particular userland toolchain I'm using here, parameters between about
1241000 and 1244000 (free space between about 1200 and 4200 bytes) will show
the error, but you might need to tweak the values a bit on a different system.
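
The arithmetic behind these figures can be checked with a small C sketch. The helper names (span_bytes, estimated_free_bytes) are hypothetical, and the estimate assumes that each extra byte in the parameter consumes one extra byte of stack while the mapping stays the same size, calibrated on the successful run shown earlier in this report.

```c
#include <stdint.h>

/* Distance in bytes between two addresses, low <= high. */
uint64_t span_bytes(uint64_t low, uint64_t high)
{
    return high - low;
}

/*
 * ASSUMPTION: stack use grows one-for-one with the parameter and the
 * [stack] mapping does not change size.  Calibrated on the successful
 * run, where parameter 1240000 left 5152 bytes free.
 */
int64_t estimated_free_bytes(int64_t parameter)
{
    const int64_t ref_parameter = 1240000;
    const int64_t ref_free = 5152;

    return ref_free - (parameter - ref_parameter);
}
```

Under that assumption, parameters 1241000 and 1244000 correspond to roughly 4152 and 1152 free bytes, which agrees with the "about 1200 and 4200 bytes" range quoted above.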

The Postgres project has been chasing errors caused by this bug for months, and
we've seen it happen on a range of PPC64 kernels from 4.4.0 up to 4.19.15, but
not on other architectures, nor on non-Linux PPC64.  My colleague Thomas Munro
found a possible explanation in

https://github.com/torvalds/linux/blob/master/arch/powerpc/mm/fault.c#L251

which claims that

         * The kernel signal delivery code writes up to about 1.5kB
         * below the stack pointer (r1) before decrementing it.

and that seems to be the justification for the "2048" magic number at line 276.
Perhaps that number applies only to PPC32, and PPC64 requires more space?  At
the very least, this function's other magic number of 0x100000 seems highly
suspicious in view of the fact that we don't see the bug until the process has
consumed at least 1MB of stack space.  (Hence, please use values > 1MB with the
test program.)
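
To make the suspicion concrete, here is a simplified user-space model in C of the check as the quoted comment and the two magic numbers suggest it works. It is not the kernel source: the function name, parameter names and macro names are hypothetical, and any other conditions the real function applies are left out. The slack is a parameter so that other values can be tried, for example the 4224 named in the title of the eventual fix commit; treating that figure as a drop-in replacement for 2048 is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define UNCHECKED_GROWTH 0x100000ULL /* the "0x100000" magic number (1MB) */
#define OLD_SLACK        2048ULL     /* the "2048" magic number */

/*
 * Simplified model, NOT the kernel code.
 *   fault_addr: address whose access faulted, below the mapped stack
 *   sp:         user stack pointer (r1) at the time of the fault
 *   stack_end:  high end of the [stack] mapping
 *   slack:      how far below sp an access may land and still expand
 */
bool expansion_allowed(uint64_t fault_addr, uint64_t sp,
                       uint64_t stack_end, uint64_t slack)
{
    /* Within 1MB of the high end of the stack: no further checks. */
    if (fault_addr + UNCHECKED_GROWTH >= stack_end)
        return true;

    /* Deeper than 1MB: the access must be at most `slack` bytes below sp. */
    return fault_addr + slack >= sp;
}
```

With more than 1MB of stack in use and a faulting address 4224 bytes below a hypothetical stack pointer, this model rejects the access for a slack of OLD_SLACK and accepts it for a slack of 4224. With less than 1MB in use it accepts the access either way, which is consistent with the bug appearing only past 1MB.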

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


* [Bug 205183] PPC64: Signal delivery fails with SIGSEGV if between about 1KB and 4KB bytes of stack remain
From: bugzilla-daemon @ 2019-11-18  4:28 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=205183

Daniel Black (daniel@linux.ibm.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |daniel@linux.ibm.com

--- Comment #1 from Daniel Black (daniel@linux.ibm.com) ---
Tom,

Thanks for the bug report. Appreciate it. Feel free to use the
linuxppc-dev@lists.ozlabs.org list.


Reproduced on 5.4.0-rc8:

danielgb@talos2:~$ uname -a
Linux talos2 5.4.0-rc8 #5 SMP Mon Nov 18 13:27:11 AEDT 2019 ppc64le ppc64le ppc64le GNU/Linux
danielgb@talos2:~$ gcc -g -Wall -O stacktest.c
danielgb@talos2:~$ ./a.out 1240000 &
[1] 2944
danielgb@talos2:~$  cat /proc/$(pidof a.out)/maps | grep stack
7fffc62f0000-7fffc6420000 rw-p 00000000 00:00 0                          [stack]
danielgb@talos2:~$ kill -USR1 %1
danielgb@talos2:~$ signal delivered, stack base 0x7fffc6420000 top 0x7fffc62f1427 (1240025 used)

[1]+  Done                    ./a.out 1240000
danielgb@talos2:~$  ./a.out 1241000 &
[1] 2948
danielgb@talos2:~$ kill -USR1 %1
danielgb@talos2:~$ 
[1]+  Segmentation fault      ./a.out 1241000


[ 6415.077590] a.out[2948]: bad frame in setup_rt_frame: 00007fffe4fb0010 nip 000006a185d909fc lr 000077ecda3c04e8


I'll get someone to look at this soon.


* [Bug 205183] PPC64: Signal delivery fails with SIGSEGV if between about 1KB and 4KB bytes of stack remain
From: bugzilla-daemon @ 2019-12-10 13:25 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=205183

Daniel Axtens (dja@axtens.net) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dja@axtens.net

--- Comment #2 from Daniel Axtens (dja@axtens.net) ---
Hi, I'm starting to have a look at this for Daniel B.

Looking at the fault that fails, I see that its NIP is in the _kernel_ rather
than in userspace. Dumping the stack, we see:

[  118.917679] Call Trace:
[  118.917715] [c00000007b457820] [c000000000b71538] dump_stack+0xbc/0x104 (unreliable)
[  118.917719] [c00000007b457860] [c00000000006e8f0] __do_page_fault+0x860/0xf90
[  118.917721] [c00000007b457940] [c00000000000af68] handle_page_fault+0x10/0x30
[  118.917725] --- interrupt: 301 at handle_rt_signal64+0x180/0x13a0
                   LR = handle_rt_signal64+0x148/0x13a0
[  118.917726] [c00000007b457d30] [c000000000023d30] do_notify_resume+0x2e0/0x410
[  118.917728] [c00000007b457e20] [c00000000000e4c4] ret_from_except_lite+0x70/0x74

I'm still debugging, but it looks like handle_rt_signal64 attempts to reserve a
stack frame for the signal, but computes a stack address that sits outside
valid stack space. The write to that address then page-faults, and because the
faulting NIP is not in userland, the fault handler refuses to expand the stack.
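
As an illustration only, the frame placement described here can be sketched in C. The helper names are hypothetical, the 16-byte alignment is an assumption, and the 4224-byte frame size is borrowed from the title of the fix commit mentioned later in this thread rather than measured.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Lowest address of a frame of frame_size bytes placed below sp,
 * rounded down to a 16-byte boundary (the alignment is an assumption).
 */
uint64_t frame_base(uint64_t sp, uint64_t frame_size)
{
    return (sp - frame_size) & ~(uint64_t)15;
}

/* True when the frame would start below the low end of the mapping. */
bool frame_needs_expansion(uint64_t base, uint64_t map_low)
{
    return base < map_low;
}
```

Taking the low end 0x7fffc62f0000 from the maps output in comment #1 and a hypothetical stack pointer of 0x7fffc62f1000, a 4224-byte frame would start at 0x7fffc62eff80, which is below the mapping, so writing it requires the stack to be expanded first.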

I'll keep you up to date.

Regards,
Daniel A


* [Bug 205183] PPC64: Signal delivery fails with SIGSEGV if between about 1KB and 4KB bytes of stack remain
From: bugzilla-daemon @ 2019-12-11  1:51 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=205183

--- Comment #3 from Daniel Axtens (dja@axtens.net) ---
I have a proposed patch at
https://lore.kernel.org/linuxppc-dev/20191211014337.28128-1-dja@axtens.net/T/#u


* [Bug 205183] PPC64: Signal delivery fails with SIGSEGV if between about 1KB and 4KB bytes of stack remain
From: bugzilla-daemon @ 2020-06-11  6:43 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=205183

--- Comment #4 from Daniel Black (daniel@linux.ibm.com) ---
Still broken.

danielgb@talos2:~$ gcc -g -Wall -O stacktest.c
danielgb@talos2:~$  ./a.out 1240000 &
[1] 494618
danielgb@talos2:~$ cat /proc/$(pidof a.out)/maps | grep stack
7fffcde80000-7fffcdfb0000 rw-p 00000000 00:00 0                          [stack]
danielgb@talos2:~$ kill -USR1 %1
danielgb@talos2:~$ signal delivered, stack base 0x7fffcdfb0000 top 0x7fffcde81427 (1240025 used)

[1]+  Done                    ./a.out 1240000
danielgb@talos2:~$ ./a.out 1241000 &
[1] 494677
danielgb@talos2:~$ kill -USR1 %1
danielgb@talos2:~$ 
[1]+  Segmentation fault      ./a.out 1241000
danielgb@talos2:~$ 
danielgb@talos2:~$ dmesg | grep a.out
[10617.616145] a.out[494587]: bad frame in setup_rt_frame: 00007fffdea30010 nip 000000011a0a09fc lr 00007fffa1c404c8
[10865.752876] a.out[494677]: bad frame in setup_rt_frame: 00007fffcc420030 nip 0000000135a70a3c lr 00007fff952604c8
danielgb@talos2:~$ uname -a
Linux talos2 5.7.0-rc5-77151-gfea086b627a0 #1 SMP Mon May 11 16:00:00 AEST 2020 ppc64le ppc64le ppc64le GNU/Linux


* [Bug 205183] PPC64: Signal delivery fails with SIGSEGV if between about 1KB and 4KB bytes of stack remain
From: bugzilla-daemon @ 2020-07-28  0:45 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=205183

Michael Ellerman (michael@ellerman.id.au) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |michael@ellerman.id.au

--- Comment #5 from Michael Ellerman (michael@ellerman.id.au) ---
Patches posted:

https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=192046


* [Bug 205183] PPC64: Signal delivery fails with SIGSEGV if between about 1KB and 4KB bytes of stack remain
From: bugzilla-daemon @ 2020-07-28  0:46 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=205183

Michael Ellerman (michael@ellerman.id.au) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |CODE_FIX


* [Bug 205183] PPC64: Signal delivery fails with SIGSEGV if between about 1KB and 4KB bytes of stack remain
From: bugzilla-daemon @ 2020-08-11  3:47 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=205183

--- Comment #6 from Michael Ellerman (michael@ellerman.id.au) ---
Fixed in 63dee5df43a3 ("powerpc: Allow 4224 bytes of stack expansion for the
signal frame")


* [Bug 205183] PPC64: Signal delivery fails with SIGSEGV if between about 1KB and 4KB bytes of stack remain
From: bugzilla-daemon @ 2020-08-31 13:16 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=205183

Michael Ellerman (michael@ellerman.id.au) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED

