From: Andy Lutomirski <luto@amacapital.net>
To: David Laight <David.Laight@aculab.com>
Cc: Brian Gerst <brgerst@gmail.com>,
	Andy Lutomirski <luto@kernel.org>, X86 ML <x86@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@kernel.org>, "H . Peter Anvin" <hpa@zytor.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Oleg Nesterov <oleg@redhat.com>
Subject: Re: [PATCH] x86: Remove force_iret()
Date: Fri, 20 Dec 2019 18:30:19 +0800
Message-ID: <D890A7DA-542B-42A7-8F82-CDBA6EBCA958@amacapital.net>
In-Reply-To: <431a146f6461402da61d09fff155f35b@AcuMS.aculab.com>



> On Dec 20, 2019, at 6:10 PM, David Laight <David.Laight@aculab.com> wrote:
> 
> From: Brian Gerst
>> Sent: 20 December 2019 03:48
>>> On Thu, Dec 19, 2019 at 8:50 PM Andy Lutomirski <luto@kernel.org> wrote:
>>> 
>>> On Thu, Dec 19, 2019 at 3:58 AM Brian Gerst <brgerst@gmail.com> wrote:
>>>> 
>>>> force_iret() was originally intended to prevent the return to user mode with
>>>> the SYSRET or SYSEXIT instructions, in cases where the register state could
>>>> have been changed to be incompatible with those instructions.
>>> 
>>> It's more than that.  Before the big syscall rework, we didn't restore
>>> the caller-saved regs.  See:
>>> 
>>> commit 21d375b6b34ff511a507de27bf316b3dde6938d9
>>> Author: Andy Lutomirski <luto@kernel.org>
>>> Date:   Sun Jan 28 10:38:49 2018 -0800
>>> 
>>>    x86/entry/64: Remove the SYSCALL64 fast path
>>> 
>>> So if you changed r12, for example, the change would get lost.
>> 
>> force_iret() specifically dealt with changes to CS, SS and EFLAGS.
>> Saving and restoring the extra registers was a different problem
>> although it affected the same functions like ptrace, signals, and
>> exec.
> 
> Is it ever possible for any of the segment registers to refer to the LDT
> and for another thread to invalidate the entries 'very late' ?

Not in newer kernels, because the actual LDT is never modified.  Instead, LDT changes create a whole new LDT and propagate it with an IPI.
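
To make the "never modify, always replace" idea concrete, here is a minimal user-space sketch of copy-and-publish. It is not the kernel's code: `struct ldt_table`, `current_ldt`, `ldt_write_entry()` and `SKETCH_LDT_MAX` are hypothetical names, a single writer is assumed, and the IPI that tells other CPUs to reload is reduced to a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_LDT_MAX 8192u  /* assumed upper bound on slots, for illustration */

/* Hypothetical, simplified descriptor table (not the kernel's structure). */
struct ldt_table {
    size_t nr_entries;
    uint64_t entries[];       /* one 8-byte descriptor per slot */
};

/* The currently published table. Readers only ever load this pointer. */
_Atomic(struct ldt_table *) current_ldt;

/*
 * Set one descriptor by building a fresh table and publishing it.
 * The old table is never written to, so anything still using it keeps
 * seeing a consistent snapshot. The old table is handed back through
 * old_out; the caller frees it once nothing can reference it any more.
 * Assumes a single writer (a real implementation would hold a lock).
 * Returns 0 on success, -1 on failure.
 */
int ldt_write_entry(size_t index, uint64_t desc, struct ldt_table **old_out)
{
    assert(old_out != NULL);
    if (index >= SKETCH_LDT_MAX)
        return -1;

    struct ldt_table *old = atomic_load(&current_ldt);
    size_t old_nr = old ? old->nr_entries : 0;
    size_t new_nr = (index + 1 > old_nr) ? index + 1 : old_nr;

    struct ldt_table *fresh =
        calloc(1, sizeof(*fresh) + new_nr * sizeof(uint64_t));
    if (!fresh)
        return -1;

    fresh->nr_entries = new_nr;
    if (old)
        memcpy(fresh->entries, old->entries, old_nr * sizeof(uint64_t));
    fresh->entries[index] = desc;

    /* Publish the new table. A kernel would notify other CPUs (IPI) here. */
    atomic_store(&current_ldt, fresh);
    *old_out = old;
    return 0;
}
```

Because the old table is returned untouched instead of being edited in place, a selector that was valid when it was loaded cannot have its descriptor changed underneath it by a concurrent writer.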

But the IRET path can still fail if sigreturn or ptrace changes the selectors while the task is in the kernel.  We have delightful selftests for this.

> 
> So even though the values were valid when changed, they are
> invalid during the 'return to user' sequence.
> 
> I remember writing a signal handler that 'corrupted' all the
> segment registers (etc) and fixing the NetBSD kernel to handle
> all the faults restoring the segment registers and IRET faulting
> in kernel (IIRC invalid user %SS or %CS).
> (IRET can also fault in user space, but that is a normal fault.)

Did you remember to test the #NP case?  Many kernels forgot that this was possible :)

> 
> Is it actually cheaper to properly validate the segment registers,
> or take the 'hit' of the slightly slower IRET path and get the cpu
> to do it for you?
> 
> 

The validation we’re talking about is for SYSRET, not IRET.  It has its own set of nasty conditions involving EFLAGS, R11, RIP, and RCX.  Fortunately no segments are involved. The algorithm is, roughly:

if (okay for SYSRET) {
  SYSRET (and assume it can’t fail)
} else {
  if (need ESPFIX)
    Horrible hacks;
  IRET;
}

And we handle #GP, #SS, #NP and #DF from IRET. And we have selftests for all of this. And no one runs the bloody selftests on 32-bit kernels, resulting in truly awful bugs.

We can’t handle #GP from SYSRET. Thanks, Intel.

(AMD gets this more right. SYSRET is still a turd, but it can’t fault. Intel handles RIP canonical checks differently from AMD, and SYSRET will #GP if RCX is noncanonical.  The result was privilege escalation on basically every OS when this was noticed.)
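
For readers unfamiliar with the term, a minimal sketch of what "canonical" means for an address: all bits from the top implemented virtual-address bit up to bit 63 must be identical. `is_canonical()` is a hypothetical helper, and the number of implemented bits (48 on many CPUs, 57 with 5-level paging) is passed in as an assumption rather than detected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true if addr is canonical for a CPU implementing va_bits bits
 * of virtual address, i.e. bits 63 down to (va_bits - 1) are all zero
 * or all one.
 */
bool is_canonical(uint64_t addr, unsigned va_bits)
{
    assert(va_bits >= 1 && va_bits <= 64);

    /* Keep only bits 63..(va_bits - 1), moved down to bit 0. */
    uint64_t upper = addr >> (va_bits - 1);

    /* The same bit range with every bit set. */
    uint64_t all_ones = UINT64_MAX >> (va_bits - 1);

    return upper == 0 || upper == all_ones;
}
```

With 48 implemented bits, 0x00007fffffffffff is the last canonical address in the lower half and 0x0000800000000000 is the first noncanonical one, which is the kind of value RCX must never hold when SYSRET executes on Intel.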

