linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Shuah Khan <skhan@linuxfoundation.org>
To: Christian Brauner <brauner@kernel.org>,
	Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <christian@brauner.io>,
	Shuah Khan <shuah@kernel.org>, Zach O'Keefe <zokeefe@google.com>,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	Shuah Khan <skhan@linuxfoundation.org>
Subject: Re: [PATCH] pidfd: fix test failure due to stack overflow on some arches
Date: Wed, 2 Feb 2022 08:52:47 -0700	[thread overview]
Message-ID: <924d8fd8-7069-41d2-9887-f36160d9f598@linuxfoundation.org> (raw)
In-Reply-To: <20220128085616.tnsowlg5iff6ofm4@wittgenstein>

On 1/28/22 1:56 AM, Christian Brauner wrote:
> On Thu, Jan 27, 2022 at 01:29:51PM -0800, Axel Rasmussen wrote:
>> When running the pidfd_fdinfo_test on arm64, it fails for me. After some
>> digging, the reason is that the child exits due to SIGBUS, because it
>> overflows the 1024 byte stack we've reserved for it.
>>
>> To fix the issue, increase the stack size to 8192 bytes (this number is
>> somewhat arbitrary, and was arrived at through experimentation -- I kept
>> doubling until the failure no longer occurred).
>>
>> Also, let's make the issue easier to debug. wait_for_pid() returns an
>> ambiguous value: it may return -1 in all of these cases:
>>
>> 1. waitpid() itself returned -1
>> 2. waitpid() returned success, but we found !WIFEXITED(status).
>> 3. The child process exited, but it did so with a -1 exit code.
>>
>> There's no way for the caller to tell the difference. So, at least log
>> which occurred, so the test runner can debug things.
>>
>> While debugging this, I found that we had !WIFEXITED(), because the
>> child exited due to a signal. This seems like a reasonably common case,
>> so also print out whether or not we have WIFSIGNALED(), and the
>> associated WTERMSIG() (if any). This lets us see the SIGBUS I'm fixing
>> clearly when it occurs.
>>
>> Finally, I'm suspicious of allocating the child's stack on our stack.
>> man clone(2) suggests that the correct way to do this is with mmap(),
>> and in particular by setting MAP_STACK. So, switch to doing it that way
>> instead.
> 
> Heh, yes. :)
> 
> commit 99c3a000279919cc4875c9dfa9c3ebb41ed8773e
> Author: Michael Kerrisk <mtk.manpages@gmail.com>
> Date:   Thu Nov 14 12:19:21 2019 +0100
> 
>      clone.2: Allocate child's stack using mmap(2) rather than malloc(3)
> 
>      Christian Brauner suggested mmap(MAP_STACKED), rather than
>      malloc(), as the canonical way of allocating a stack for the
>      child of clone(), and Jann Horn noted some reasons why:
> 
>          Not on Linux, but on OpenBSD, they do use MAP_STACK now
>          AFAIK; this was announced here:
>          <http://openbsd-archive.7691.n7.nabble.com/stack-register-checking-td338238.html>.
>          Basically they periodically check whether the userspace
>          stack pointer points into a MAP_STACK region, and if not,
>          they kill the process. So even if it's a no-op on Linux, it
>          might make sense to advise people to use the flag to improve
>          portability? I'm not sure if that's something that belongs
>          in Linux manpages.
> 
>          Another reason against malloc() is that when setting up
>          thread stacks in proper, reliable software, you'll probably
>          want to place a guard page (in other words, a 4K PROT_NONE
>          VMA) at the bottom of the stack to reliably catch stack
>          overflows; and you probably don't want to do that with
>          malloc, in particular with non-page-aligned allocations.
> 
>      And the OpenBSD 6.5 manual pages says:
> 
>          MAP_STACK
>              Indicate that the mapping is used as a stack. This
>              flag must be used in combination with MAP_ANON and
>              MAP_PRIVATE.
> 
>      And I then noticed that MAP_STACK seems already to be on
>      FreeBSD for a long time:
> 
>          MAP_STACK
>              Map the area as a stack.  MAP_ANON is implied.
>              Offset should be 0, fd must be -1, and prot should
>              include at least PROT_READ and PROT_WRITE.  This
>              option creates a memory region that grows to at
>              most len bytes in size, starting from the stack
>              top and growing down.  The stack top is the start‐
>              ing address returned by the call, plus len bytes.
>              The bottom of the stack at maximum growth is the
>              starting address returned by the call.
> 
>              The entire area is reserved from the point of view
>              of other mmap() calls, even if not faulted in yet.
> 
>      Reported-by: Jann Horn <jannh@google.com>
>      Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
>      Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
> 
> 
>>
>> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
>> ---
> 
> Yeah, stack handling - especially with legacy clone() - is yucky on the
> best of days. Thank you for the fix.
> 
> Acked-by: Christian Brauner <brauner@kernel.org>
> 

Thank you both. Will apply for 5.17-rc4 or so.

thanks,
-- Shuah

      reply	other threads:[~2022-02-02 15:52 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-27 21:29 [PATCH] pidfd: fix test failure due to stack overflow on some arches Axel Rasmussen
2022-01-28  8:56 ` Christian Brauner
2022-02-02 15:52   ` Shuah Khan [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=924d8fd8-7069-41d2-9887-f36160d9f598@linuxfoundation.org \
    --to=skhan@linuxfoundation.org \
    --cc=axelrasmussen@google.com \
    --cc=brauner@kernel.org \
    --cc=christian@brauner.io \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=shuah@kernel.org \
    --cc=zokeefe@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).