From: Shuah Khan <skhan@linuxfoundation.org>
To: Christian Brauner <brauner@kernel.org>,
Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <christian@brauner.io>,
Shuah Khan <shuah@kernel.org>, Zach O'Keefe <zokeefe@google.com>,
linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
Shuah Khan <skhan@linuxfoundation.org>
Subject: Re: [PATCH] pidfd: fix test failure due to stack overflow on some arches
Date: Wed, 2 Feb 2022 08:52:47 -0700 [thread overview]
Message-ID: <924d8fd8-7069-41d2-9887-f36160d9f598@linuxfoundation.org> (raw)
In-Reply-To: <20220128085616.tnsowlg5iff6ofm4@wittgenstein>
On 1/28/22 1:56 AM, Christian Brauner wrote:
> On Thu, Jan 27, 2022 at 01:29:51PM -0800, Axel Rasmussen wrote:
>> When running the pidfd_fdinfo_test on arm64, it fails for me. After some
>> digging, the reason is that the child exits due to SIGBUS, because it
>> overflows the 1024 byte stack we've reserved for it.
>>
>> To fix the issue, increase the stack size to 8192 bytes (this number is
>> somewhat arbitrary, and was arrived at through experimentation -- I kept
>> doubling until the failure no longer occurred).
>>
>> Also, let's make the issue easier to debug. wait_for_pid() returns an
>> ambiguous value: it may return -1 in all of these cases:
>>
>> 1. waitpid() itself returned -1
>> 2. waitpid() returned success, but we found !WIFEXITED(status).
>> 3. The child process exited, but it did so with a -1 exit code.
>>
>> There's no way for the caller to tell the difference. So, at least log
>> which occurred, so the test runner can debug things.
>>
>> While debugging this, I found that we had !WIFEXITED(), because the
>> child exited due to a signal. This seems like a reasonably common case,
>> so also print out whether or not we have WIFSIGNALED(), and the
>> associated WTERMSIG() (if any). This lets us see the SIGBUS I'm fixing
>> clearly when it occurs.
>>
>> Finally, I'm suspicious of allocating the child's stack on our stack.
>> man clone(2) suggests that the correct way to do this is with mmap(),
>> and in particular by setting MAP_STACK. So, switch to doing it that way
>> instead.
>
> Heh, yes. :)
>
> commit 99c3a000279919cc4875c9dfa9c3ebb41ed8773e
> Author: Michael Kerrisk <mtk.manpages@gmail.com>
> Date: Thu Nov 14 12:19:21 2019 +0100
>
> clone.2: Allocate child's stack using mmap(2) rather than malloc(3)
>
> Christian Brauner suggested mmap(MAP_STACKED), rather than
> malloc(), as the canonical way of allocating a stack for the
> child of clone(), and Jann Horn noted some reasons why:
>
> Not on Linux, but on OpenBSD, they do use MAP_STACK now
> AFAIK; this was announced here:
> <http://openbsd-archive.7691.n7.nabble.com/stack-register-checking-td338238.html>.
> Basically they periodically check whether the userspace
> stack pointer points into a MAP_STACK region, and if not,
> they kill the process. So even if it's a no-op on Linux, it
> might make sense to advise people to use the flag to improve
> portability? I'm not sure if that's something that belongs
> in Linux manpages.
>
> Another reason against malloc() is that when setting up
> thread stacks in proper, reliable software, you'll probably
> want to place a guard page (in other words, a 4K PROT_NONE
> VMA) at the bottom of the stack to reliably catch stack
> overflows; and you probably don't want to do that with
> malloc, in particular with non-page-aligned allocations.
>
> And the OpenBSD 6.5 manual pages says:
>
> MAP_STACK
> Indicate that the mapping is used as a stack. This
> flag must be used in combination with MAP_ANON and
> MAP_PRIVATE.
>
> And I then noticed that MAP_STACK seems already to be on
> FreeBSD for a long time:
>
> MAP_STACK
> Map the area as a stack. MAP_ANON is implied.
> Offset should be 0, fd must be -1, and prot should
> include at least PROT_READ and PROT_WRITE. This
> option creates a memory region that grows to at
> most len bytes in size, starting from the stack
> top and growing down. The stack top is the start‐
> ing address returned by the call, plus len bytes.
> The bottom of the stack at maximum growth is the
> starting address returned by the call.
>
> The entire area is reserved from the point of view
> of other mmap() calls, even if not faulted in yet.
>
> Reported-by: Jann Horn <jannh@google.com>
> Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
> Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
>
>
>>
>> Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
>> ---
>
> Yeah, stack handling - especially with legacy clone() - is yucky on the
> best of days. Thank you for the fix.
>
> Acked-by: Christian Brauner <brauner@kernel.org>
>
Thank you both. Will apply for 5.17-rc4 or so.
thanks,
-- Shuah
prev parent reply other threads:[~2022-02-02 15:52 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-01-27 21:29 [PATCH] pidfd: fix test failure due to stack overflow on some arches Axel Rasmussen
2022-01-28 8:56 ` Christian Brauner
2022-02-02 15:52 ` Shuah Khan [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=924d8fd8-7069-41d2-9887-f36160d9f598@linuxfoundation.org \
--to=skhan@linuxfoundation.org \
--cc=axelrasmussen@google.com \
--cc=brauner@kernel.org \
--cc=christian@brauner.io \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=shuah@kernel.org \
--cc=zokeefe@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).