Linux-arch Archive on lore.kernel.org
 help / color / Atom feed
From: Dave Martin <Dave.Martin@arm.com>
To: "H.J. Lu" <hjl.tools@gmail.com>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>, Borislav Petkov <bp@suse.de>,
	Andy Lutomirski <luto@kernel.org>,
	the arch/x86 maintainers <x86@kernel.org>,
	Len Brown <len.brown@intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Tony Luck <tony.luck@intel.com>,
	"Ravi V. Shankar" <ravi.v.shankar@intel.com>,
	GNU C Library <libc-alpha@sourceware.org>,
	linux-arch <linux-arch@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size
Date: Tue, 6 Oct 2020 16:25:54 +0100
Message-ID: <20201006152553.GY6642@arm.com> (raw)
In-Reply-To: <CAMe9rOq_nKa6xjHju3kVZephTiO+jEW3PqxgAhU9+RdLTo-jgg@mail.gmail.com>

On Tue, Oct 06, 2020 at 05:12:29AM -0700, H.J. Lu wrote:
> On Tue, Oct 6, 2020 at 2:25 AM Dave Martin <Dave.Martin@arm.com> wrote:
> >
> > On Mon, Oct 05, 2020 at 10:17:06PM +0100, H.J. Lu wrote:
> > > On Mon, Oct 5, 2020 at 6:45 AM Dave Martin <Dave.Martin@arm.com> wrote:
> > > >
> > > > On Tue, Sep 29, 2020 at 01:57:42PM -0700, Chang S. Bae wrote:
> > > > > During signal entry, the kernel pushes data onto the normal userspace
> > > > > stack. On x86, the data pushed onto the user stack includes XSAVE state,
> > > > > which has grown over time as new features and larger registers have been
> > > > > added to the architecture.
> > > > >
> > > > > MINSIGSTKSZ is a constant provided in the kernel signal.h headers and
> > > > > typically distributed in lib-dev(el) packages, e.g. [1]. Its value is
> > > > > compiled into programs and is part of the user/kernel ABI. The MINSIGSTKSZ
> > > > > constant indicates to userspace how much data the kernel expects to push on
> > > > > the user stack, [2][3].
> > > > >
> > > > > However, this constant is much too small and does not reflect recent
> > > > > additions to the architecture. For instance, when AVX-512 states are in
> > > > > use, the signal frame size can be 3.5KB while MINSIGSTKSZ remains 2KB.
> > > > >
> > > > > The bug report [4] explains this as an ABI issue. The small MINSIGSTKSZ can
> > > > > cause user stack overflow when delivering a signal.
> > > > >
> > > > > In this series, we suggest a couple of things:
> > > > > 1. Provide a variable minimum stack size to userspace, as a similar
> > > > >    approach to [5]
> > > > > 2. Avoid using a too-small alternate stack
> > > >
> > > > I can't comment on the x86 specifics, but the approach followed in this
> > > > series does seem consistent with the way arm64 populates
> > > > AT_MINSIGSTKSZ.
> > > >
> > > > I need to dig up my glibc hacks for providing a sysconf interface to
> > > > this...
> > >
> > > Here is my proposal for glibc:
> > >
> > > https://sourceware.org/pipermail/libc-alpha/2020-September/118098.html
> >
> > Thanks for the link.
> >
> > Are there patches yet?  I already had some hacks in the works, but I can
> > drop them if there's something already out there.
> 
> I am working on it.

OK.  I may post something for discussion, but I'm happy for it to be
superseded by someone (i.e., other than me) who actually knows what
they're doing...

> >
> > > 1. Define SIGSTKSZ and MINSIGSTKSZ to 64KB.
> >
> > Can we do this?  IIUC, this is an ABI break and carries the risk of
> > buffer overruns.
> >
> > The reason for not simply increasing the kernel's MINSIGSTKSZ #define
> > (apart from the fact that it is rarely used, due to glibc's shadowing
> > definitions) was that userspace binaries will have baked in the old
> > value of the constant and may be making assumptions about it.
> >
> > For example, the type (char [MINSIGSTKSZ]) changes if this #define
> > changes.  This could be a problem if an newly built library tries to
> > memcpy() or dump such an object defined by and old binary.
> > Bounds-checking and the stack sizes passed to things like sigaltstack()
> > and makecontext() could similarly go wrong.
> 
> With my original proposal:
> 
> https://sourceware.org/pipermail/libc-alpha/2020-September/118028.html
> 
> char [MINSIGSTKSZ] won't compile.  The feedback is to increase the
> constants:
> 
> https://sourceware.org/pipermail/libc-alpha/2020-September/118092.html

Ah, I see.  But both still API and ABI breaks; moreover, declaraing an
array with size based on (MIN)SIGSTKSZ is not just reasonable, but the
obvious thing to do with this constant in many simple cases.  Such usage
is widespread, see:

 * https://codesearch.debian.net/search?q=%5BSIGSTKSZ%5D&literal=1


Your two approaches seem to trade off two different sources of buffer
overruns: undersized stacks versus ABI breaks across library boundaries.

Since undersized stack is by far the more familiar problem and we at
least have guard regions to help detect overruns, I'd vote to keep
MINSIGSTKSZ and SIGSTKSZ as-is, at least for now.

Or are people reporting real stack overruns on x86 today?


For arm64, we made large vectors on SVE opt-in, so that oversized signal
frames are not seen by default.  Would somethine similar be feasible on
x86?


> > > 2. Add _SC_RSVD_SIG_STACK_SIZE for signal stack size reserved by the kernel.
> >
> > How about "_SC_MINSIGSTKSZ"?  This was my initial choice since only the
> > discovery method is changing.  The meaning of the value is exactly the
> > same as before.
> >
> > If we are going to rename it though, it could make sense to go for
> > something more directly descriptive, say, "_SC_SIGNAL_FRAME_SIZE".
> >
> > The trouble with including "STKSZ" is that is sounds like a
> > recommendation for your stack size.  While the signal frame size is
> > relevant to picking a stack size, it's not the only thing to
> > consider.
> 
> The problem is that AT_MINSIGSTKSZ is the signal frame size used by
> kernel.   The minimum stack size for a signal handler is more likely
> AT_MINSIGSTKSZ + 1.5KB unless AT_MINSIGSTKSZ returns the signal
> frame size used by kernel + 6KB for user application.

Ack; to be correct, you also need to take into account which signals may
be unmasked while running on this stack, and the stack requirements of
all their handlers.  Unfortunately, that's hard :(

What's your view on my naming suggesions?


> > Also, do we need a _SC_SIGSTKSZ constant, or should the entire concept
> > of a "recommended stack size" be abandoned?  glibc can at least make a
> > slightly more informed guess about suitable stack sizes than the kernel
> > (and glibc already has to guess anyway, in order to determine the
> > default thread stack size).
> 
> Glibc should try to deduct signal frame size if AT_MINSIGSTKSZ isn't
> available.

In my code, I generate _SC_SIGSTKSZ as the equivalent of

	max(sysconf(_SC_MINSIGSTKSZ) * 4, SIGSTKSZ)

which is >= the legacy value, and broadly reperesentative of the
relationship between MINSIGSTKSZ and SIGSTKSZ on most arches.


What do you think?


> > > 3. Deprecate SIGSTKSZ and MINSIGSTKSZ if _SC_RSVD_SIG_STACK_SIZE
> > > is in use.
> >
> > Great if we can do it.  I was concerned that this might be
> > controversial.
> >
> > Would this just be a recommendation, or can we enforce it somehow?
> 
> It is just an idea.  We need to move away from constant SIGSTKSZ and
> MINSIGSTKSZ.

Totally agree with that.

Cheers
---Dave

  parent reply index

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-29 20:57 Chang S. Bae
2020-09-29 20:57 ` [RFC PATCH 1/4] x86/signal: Introduce helpers to get the maximum signal frame size Chang S. Bae
2020-10-05 13:42   ` Dave Martin
2020-10-06 17:45     ` Bae, Chang Seok
2020-10-07 10:05       ` Dave Martin
2020-10-08 22:43         ` Bae, Chang Seok
2020-10-12 13:26           ` Dave Martin
2020-09-29 20:57 ` [RFC PATCH 2/4] x86/elf: Support a new ELF aux vector AT_MINSIGSTKSZ Chang S. Bae
2020-09-29 20:57 ` [RFC PATCH 3/4] x86/signal: Prevent an alternate stack overflow before a signal delivery Chang S. Bae
2020-09-29 20:57 ` [RFC PATCH 4/4] selftest/x86/signal: Include test cases for validating sigaltstack Chang S. Bae
2020-10-05 13:45 ` [RFC PATCH 0/4] x86: Improve Minimum Alternate Stack Size Dave Martin
2020-10-05 21:17   ` H.J. Lu
2020-10-06  9:25     ` Dave Martin
2020-10-06 12:12       ` H.J. Lu
2020-10-06 15:18         ` H.J. Lu
2020-10-06 15:43           ` Dave Martin
2020-10-06 16:52             ` H.J. Lu
2020-10-06 15:25         ` Dave Martin [this message]
2020-10-06 15:33           ` Dave Hansen
2020-10-06 17:00             ` Dave Martin
2020-10-06 18:21               ` Florian Weimer
2020-10-07 10:19                 ` Dave Martin
2020-10-06 18:30               ` Dave Hansen
2020-10-07 10:20                 ` Dave Martin
2020-10-06 15:34           ` H.J. Lu
2020-10-06 16:55             ` Dave Martin
2020-10-06 17:44               ` H.J. Lu
2020-10-07 10:47                 ` Dave Martin
2020-10-07 13:30                   ` H.J. Lu
2020-10-07 15:45                     ` Dave Martin
2020-10-07 17:43                       ` H.J. Lu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201006152553.GY6642@arm.com \
    --to=dave.martin@arm.com \
    --cc=bp@suse.de \
    --cc=chang.seok.bae@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=hjl.tools@gmail.com \
    --cc=len.brown@intel.com \
    --cc=libc-alpha@sourceware.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=ravi.v.shankar@intel.com \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-arch Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-arch/0 linux-arch/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-arch linux-arch/ https://lore.kernel.org/linux-arch \
		linux-arch@vger.kernel.org
	public-inbox-index linux-arch

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-arch


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git