linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto@amacapital.net>
To: "H. Peter Anvin" <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>, Borislav Petkov <bp@alien8.de>,
	Brian Gerst <brgerst@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Christoph Hellwig <hch@lst.de>
Subject: Re: Rethinking sigcontext's xfeatures slightly for PKRU's benefit?
Date: Fri, 18 Dec 2015 08:04:10 -0800	[thread overview]
Message-ID: <CALCETrU28FoOwR+n8U0Qtt20=ZW6rgc3=M8X4G0QMt1_ZVHY7Q@mail.gmail.com> (raw)
In-Reply-To: <FEA53E3B-6AE2-4305-82B1-C57451B6E33E@zytor.com>

On Thu, Dec 17, 2015 at 10:43 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On December 17, 2015 9:29:21 PM PST, Andy Lutomirski <luto@amacapital.net> wrote:
>>On Dec 17, 2015 6:53 PM, "Dave Hansen" <dave.hansen@linux.intel.com>
>>wrote:
>>>
>>> On 12/17/2015 06:32 PM, Andy Lutomirski wrote:
>>> > On Thu, Dec 17, 2015 at 6:13 PM, Dave Hansen
>><dave.hansen@linux.intel.com> wrote:
>>> >> But what about the register state when delivering a signal?  Don't
>>we
>>> >> set the registers to the init state?  Do we need to preserve PKRU
>>state
>>> >> instead of init'ing it?  The init state _is_ nice here because it
>>is
>>> >> permissive allows us to do something useful no matter what PKRU
>>gets set to.
>>> >
>>> > I think we leave the extended regs alone.  Don't we?
>>> >
>>> > I think that, for most signals, we want to leave PKRU as is,
>>> > especially for things that aren't SIGSEGV.  For SIGSEGV, maybe we
>>want
>>> > an option to reset PKRU on delivery (and then set the flag to
>>restore
>>> > on return?).
>>>
>>> Is there some precedent for doing the state differently for different
>>> signals?
>>
>>Yes, to a very limited extent: SA_ONSTACK.
>>
>>>
>>> >> Well, the signal handler isn't necessarily going to clobber it,
>>but the
>>> >> delivery code already clobbers it when going to the init state.
>>> >
>>> > Can you point to that code?
>>>
>>> handle_signal() -> fpu__clear()
>>>
>>> The comment around it says:
>>>
>>> "Ensure the signal handler starts with the new fpu state."
>>>
>>
>>You win this round :)
>>
>>So maybe we should have a mask of xfeatures that aren't cleared on
>>signal delivery (e.g. PKRU, perhaps) and that are, by default,
>>preserved across signal returns.
>>

[...]

>
> I find the notion of PKRU not being initialized at the entry to a signal handler a bit odd.  Saving/restoring it seems like the right thing to me.
>
> What am I missing?

On Fri, Dec 18, 2015 at 12:32 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> So the principle is this: signals are conceptually like creating a new thread of
> execution, and that's a very powerful programming concept, like fork() or
> pthread_create() are powerful concepts. So we definitely want to keep that default
> behavior, and I don't think we want to deviate from that for typical new extended
> CPU context features, even if signal delivery slows down as a result.

I think that PKRU is in almost no respect a typical no extended CPU
context feature.  It's a control register, not a data register.

Let's try a concrete example.  One big use case for PKRU will be in
conjunction with DAX-style pmem.  Imagine some program (database
server, perhaps) that mmaps 1 TB of pmem MAP_SHARED, PROT_READ |
PROT_WRITE with protection key 1.  It sets PKRU so that key 1 can't be
written.  All of its accesses to the pmem store are in tight code
regions that do wrpkru; mov; wrpkru (just like what the kernel does
with stac/clac on SMAP systems).

>From the app's perspective, it's imperative that stray writes that
coincidentally hit the pmem region fail.  (1 TB is big enough that a
random write to canonical user memory hits it almost 1% of the time.)
This means:

1. If a signal is delivered, the app wants PKRU to be set to some safe
value as early as possible.  The app can do it by itself if it doesn't
use libraries that interfere, but it would be IMO much, much nicer if
the kernel made this happen automatically.

1b. If the app malfunctions such that RSP points to pmem, the kernel
MUST NOT clobber the pmem space.  I think that this basically mandates
that PKRU needs to have some safe state (i.e. definitely not the init
state) on signal delivery: the kernel is going to write a signal frame
at the address identified by RSP, and that address is in pmem, so
those writes need to fail.

2. clone with shared VM should preserve PKRU.

So in this regard, I think signals are kind of like new threads after all.

Now that I think about it more, there are at least two possibilities
that work for this use case.

Option A: signal delivery preserves PKRU.  If we go this route, I
think we should decide whether PKRU is preserved on exit.  Given that
we're talking about adding syscalls to adjust PKRU, it seems a bit odd
to me that sigreturn would, by default, undo whatever those syscalls
do.

Option B: signal delivery resets PKRU to user-specified values.  We're
talking about having syscalls to write PKRU.  We could have clone and
signal entry use the values set by syscall instead of the values in
the actual PKRU register.  That way:

set_protection_key(1, no writes);
...
wrpkru(allow writes to key 1)
mov something    <<<<<<<<<<< fault or async signal here
wrpkru(disallow writes to key 1)

leaves key 1 protected in the signal handler.

If we go that route, I think we must restore PKRU just like any other
xstate on sigreturn, so in some sense it's the simplest of all to
implement.  We'll need to change the signal delivery stuff to do the
fpu__clear and the automatic PKRU write before trying to set up the
signal frame so that the frame is written with the syscall-specified
protections instead of the wrpkru-overridden values, but that should
be easy.


Which reminds me: __get_user, etc all respect PKRU because the SDM
says that PKRU affects all user *and* kernel accesses to user memory
(i.e. anything with _PAGE_USER).  Should get_user_pages also respect
PKRU?

>
> But we've been arguing about 'lightweight signals' for up to two decades that I
> can remember. (The first such suggestion was to not save the FPU state, back when
> FPU saves were ridiculously slow compared to other parts of saving/restoring a
> context.)

I'm not making any claims about lightweightness here.  While
performance is an issue in some cases, I'd like to hash out the
baseline functionality first.

>
> So having a well-enumerated, extensible opt-in mask (which defaults to 'all state
> saved') that allows smart signal handlers to skip the save/restore of certain CPU
> context components would be acceptable I think.

I have a patch that helps considerably with a probably-unnoticeable
ABI change by letting signal delivery use sysret.   I'll dust it off.

--Andy

  reply	other threads:[~2015-12-18 16:04 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-12-18  1:48 Rethinking sigcontext's xfeatures slightly for PKRU's benefit? Andy Lutomirski
2015-12-18  2:13 ` Dave Hansen
2015-12-18  2:32   ` Andy Lutomirski
2015-12-18  2:52     ` Dave Hansen
2015-12-18  5:29       ` Andy Lutomirski
2015-12-18  6:43         ` H. Peter Anvin
2015-12-18 16:04           ` Andy Lutomirski [this message]
2015-12-18 16:56             ` Dave Hansen
2015-12-18 18:42             ` Dave Hansen
2015-12-18 19:21               ` Andy Lutomirski
2015-12-18 20:07                 ` Dave Hansen
2015-12-18 20:28                   ` Andy Lutomirski
2015-12-18 20:37                   ` Linus Torvalds
2015-12-18 20:49                     ` Andy Lutomirski
2015-12-18 20:58                       ` H. Peter Anvin
2015-12-18 21:02                         ` Andy Lutomirski
2015-12-18 21:08                           ` Dave Hansen
2015-12-18 21:04                       ` Linus Torvalds
2015-12-18 21:09                         ` Linus Torvalds
2015-12-18 21:12                         ` Dave Hansen
2015-12-18 21:45                           ` Linus Torvalds
2015-12-18 22:28                             ` Andy Lutomirski
2015-12-18 23:08                               ` Linus Torvalds
2015-12-18 23:16                                 ` Andy Lutomirski
2015-12-18 23:20                                   ` Linus Torvalds
2015-12-21 17:04                                   ` Dave Hansen
2015-12-21 22:52                                     ` Andy Lutomirski
2015-12-21 23:00                                       ` Dave Hansen
2015-12-21 23:02                                         ` Andy Lutomirski
2015-12-21 23:05                                           ` Dave Hansen
2015-12-21 23:04                               ` Dave Hansen
2015-12-21 23:07                                 ` Andy Lutomirski
2016-06-30 17:36                                   ` Andy Lutomirski
2016-06-30 21:25                                     ` Dave Hansen
2016-07-01 16:30                                       ` Andy Lutomirski
2015-12-29 23:48                             ` Dave Hansen
2015-12-18  8:32         ` Ingo Molnar
2015-12-18  8:59 ` Christoph Hellwig
2015-12-18 12:57   ` Borislav Petkov
2016-01-12 13:38     ` Ingo Molnar
2016-01-12 13:42       ` Christoph Hellwig
2016-01-13 10:48         ` Ingo Molnar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALCETrU28FoOwR+n8U0Qtt20=ZW6rgc3=M8X4G0QMt1_ZVHY7Q@mail.gmail.com' \
    --to=luto@amacapital.net \
    --cc=bp@alien8.de \
    --cc=brgerst@gmail.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=hch@lst.de \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=oleg@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).