LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Andy Lutomirski <luto@kernel.org>
To: "Bae, Chang Seok" <chang.seok.bae@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>, Andy Lutomirski <luto@kernel.org>,
	"H . Peter Anvin" <hpa@zytor.com>,
	Andi Kleen <ak@linux.intel.com>,
	"Shankar, Ravi V" <ravi.v.shankar@intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>
Subject: Re: [RESEND PATCH v6 08/12] x86/fsgsbase/64: Use the per-CPU base as GSBASE at the paranoid_entry
Date: Wed, 1 May 2019 10:40:50 -0700
Message-ID: <CALCETrV4zACb9L_FaU12ZF1O6_vjVyGrcyWwk-mfSUhyxGMXJA@mail.gmail.com> (raw)
In-Reply-To: <C79FA889-BD9B-4427-902F-52EE33A3E6EF@intel.com>

On Wed, May 1, 2019 at 6:52 AM Bae, Chang Seok <chang.seok.bae@intel.com> wrote:
>
>
> > On Apr 5, 2019, at 06:50, Andy Lutomirski <luto@amacapital.net> wrote:
> >
> >
> >
> >> On Apr 5, 2019, at 2:35 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
> >>
> >>> On Mon, 25 Mar 2019, Thomas Gleixner wrote:
> >>>> On Fri, 15 Mar 2019, Chang S. Bae wrote:
> >>>> ENTRY(paranoid_exit)
> >>>>   UNWIND_HINT_REGS
> >>>>   DISABLE_INTERRUPTS(CLBR_ANY)
> >>>>   TRACE_IRQS_OFF_DEBUG
> >>>> +    ALTERNATIVE "jmp .Lparanoid_exit_no_fsgsbase",    "nop",\
> >>>> +        X86_FEATURE_FSGSBASE
> >>>> +    wrgsbase    %rbx
> >>>> +    jmp    .Lparanoid_exit_no_swapgs;
> >>>
> >>> Again. A few newlines would make it more readable.
> >>>
> >>> This modifies the semantics of paranoid_entry and paranoid_exit. Looking at
> >>> the usage sites there is the following code in the nmi maze:
> >>>
> >>>   /*
> >>>    * Use paranoid_entry to handle SWAPGS, but no need to use paranoid_exit
> >>>    * as we should not be calling schedule in NMI context.
> >>>    * Even with normal interrupts enabled. An NMI should not be
> >>>    * setting NEED_RESCHED or anything that normal interrupts and
> >>>    * exceptions might do.
> >>>    */
> >>>   call    paranoid_entry
> >>>   UNWIND_HINT_REGS
> >>>
> >>>   /* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
> >>>   movq    %rsp, %rdi
> >>>   movq    $-1, %rsi
> >>>   call    do_nmi
> >>>
> >>>   /* Always restore stashed CR3 value (see paranoid_entry) */
> >>>   RESTORE_CR3 scratch_reg=%r15 save_reg=%r14
> >>>
> >>>   testl    %ebx, %ebx            /* swapgs needed? */
> >>>   jnz    nmi_restore
> >>> nmi_swapgs:
> >>>   SWAPGS_UNSAFE_STACK
> >>> nmi_restore:
> >>>   POP_REGS
> >>>
> >>> I might be missing something, but how is that supposed to work when
> >>> paranoid_entry uses FSGSBASE? I think it's broken, but if it's not then
> >>> there is a big fat comment missing explaining why.
> >>
> >> So this _is_ broken.
> >>
> >>  On entry:
> >>
> >>     rbx = rdgsbase()
> >>     wrgsbase(KERNEL_GS)
> >>
> >>  On exit:
> >>
> >>     if (ebx == 0)
> >>          swapgs
> >>
> >> The resulting matrix:
> >>
> >>  |  ENTRY GS    | RBX        | EXIT        | GS on IRET    | RESULT
> >>  |        |        |        |        |
> >> 1 |  KERNEL_GS    | KERNEL_GS    | EBX == 0    | USER_GS    | FAIL
> >>  |        |        |        |        |
> >> 2 |  KERNEL_GS    | KERNEL_GS    | EBX != 0    | KERNEL_GS    | ok
> >>  |        |        |        |        |
> >> 3 |  USER_GS    | USER_GS    | EBX == 0    | USER_GS    | ok
> >>  |        |        |        |        |
> >> 4 |  USER_GS    | USER_GS    | EBX != 0    | KERNEL_GS    | FAIL
> >>
> >>
> >> #1 Just works by chance because it's unlikely that the lower 32bits of a
> >>  per CPU kernel GS are all 0.
> >>
> >>  But it's just a question of probability that this turns into a
> >>  non-debuggable once per year crash (think KASLR).
> >>
> >> #4 This can happen when the NMI hits the kernel in some other entry code
> >>  _BEFORE_ or _AFTER_ swapgs.
> >>
> >>  User space using GS addressing with GS[31:0] != 0 will crash and burn.
> >>
> >>
> >
> > Hi all-
> >
> > In a previous incarnation of these patches, I complained about the use of SWAPGS in the paranoid path. Now I’m putting my maintainer foot down.  On a non-FSGSBASE system, the paranoid path known, definitively, which GS is where, so SWAPGS is annoying. With FSGSBASE, unless you start looking at the RIP that you interrupted, you cannot know whether you have user or kernel GSBASE live, since they can have literally the same value.  One of the numerous versions of this patch compared the values and just said “well, it’s harmless to SWAPGS if user code happens to use the same value as the kernel”.  I complained that it was far too fragile.
> >
> > So I’m putting my foot down. If you all want my ack, you’re going to save the old GS, load the new one with WRGSBASE, and, on return, you’re going to restore the old one with WRGSBASE. You will not use SWAPGS in the paranoid path.
> >
> > Obviously, for the non-paranoid path, it all keeps working exactly like it does now.
>
> Although I can see some other concerns with this, looks like it is still worth pursuing.
>
> >
> > Furthermore, if you folks even want me to review this series, the ptrace tests need to be in place.  On inspection of the current code (after the debacle a few releases back), it appears the SETREGSET’s effect depends on the current values in the registers — it does not actually seem to reliably load the whole state. So my confidence will be greatly increased if your series first adds a test that detects that bug (and fails!), then fixes the bug in a tiny little patch, then adds FSGSBASE, and keeps the test working.
> >
>
> I think I need to understand the issue. Appreciate if you can elaborate a little bit.
>

This patch series gives a particular behavior to PTRACE_SETREGS and
PTRACE_POKEUSER.  There should be a test case that validates that
behavior, including testing the weird cases where gs != 0 and gsbase
contains unusual values.  Some existing tests might be pretty close to
doing what's needed.

Beyond that, the current putreg() code does this:

    case offsetof(struct user_regs_struct,gs_base):
        /*
         * Exactly the same here as the %fs handling above.
         */
        if (value >= TASK_SIZE_MAX)
            return -EIO;
        if (child->thread.gsbase != value)
            return do_arch_prctl_64(child, ARCH_SET_GS, value);
        return 0;

and do_arch_prctl_64(), in turn, does this:

    case ARCH_SET_GS: {
        if (unlikely(arg2 >= TASK_SIZE_MAX))
            return -EPERM;

        preempt_disable();
        /*
         * ARCH_SET_GS has always overwritten the index
         * and the base. Zero is the most sensible value
         * to put in the index, and is the only value that
         * makes any sense if FSGSBASE is unavailable.
         */
        if (task == current) {
         [not used for ptrace]
        } else {
            task->thread.gsindex = 0;
            x86_gsbase_write_task(task, arg2);
        }

        ...

So writing the value that was already there to gsbase via putreg()
does nothing, but writing a *different* value implicitly clears gs,
but writing a different value will clear gs.

This behavior is, AFAICT, complete nonsense.  It happens to work
because usually gdb writes the same value back, and, in any case, gs
comes *after* gsbase in user_regs_struct, so gs gets replaced anyway.
But I think that this behavior should be fixed up and probably tested.
Certainly the behavior should *not* be the same on a fsgsbase kernel,
and and the fsgsbase behavior definitely needs a selftest.

  reply index

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-15 20:06 [RESEND PATCH v6 00/12] x86: Enable FSGSBASE instructions Chang S. Bae
2019-03-15 20:06 ` [RESEND PATCH v6 01/12] x86/fsgsbase/64: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE Chang S. Bae
2019-03-15 20:06 ` [RESEND PATCH v6 02/12] kbuild: Raise the minimum required binutils version to 2.21 Chang S. Bae
2019-03-15 20:06 ` [RESEND PATCH v6 03/12] x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions Chang S. Bae
2019-03-15 20:06 ` [RESEND PATCH v6 04/12] x86/fsgsbase/64: Enable FSGSBASE instructions in the helper functions Chang S. Bae
2019-03-25 11:38   ` Thomas Gleixner
2019-03-25 12:46     ` Thomas Gleixner
2019-03-25 13:05       ` Thomas Gleixner
2019-03-26  0:38     ` Andi Kleen
2019-03-26 15:01       ` New feature/ABI review process [was Re: [RESEND PATCH v6 04/12] x86/fsgsbase/64:..] Thomas Gleixner
2019-03-26 22:56         ` Andi Kleen
2019-03-27 21:15           ` Thomas Gleixner
2019-03-15 20:06 ` [RESEND PATCH v6 05/12] x86/fsgsbase/64: Preserve FS/GS state in __switch_to() if FSGSBASE is on Chang S. Bae
2019-03-15 20:06 ` [RESEND PATCH v6 06/12] x86/fsgsbase/64: When copying a thread, use the FSGSBASE instructions if available Chang S. Bae
2019-03-15 20:06 ` [RESEND PATCH v6 07/12] x86/fsgsbase/64: Introduce the FIND_PERCPU_BASE macro Chang S. Bae
2019-03-25  9:02   ` Thomas Gleixner
2019-05-01 13:52     ` Bae, Chang Seok
2019-03-15 20:06 ` [RESEND PATCH v6 08/12] x86/fsgsbase/64: Use the per-CPU base as GSBASE at the paranoid_entry Chang S. Bae
2019-03-25  9:44   ` Thomas Gleixner
2019-04-05  8:35     ` Thomas Gleixner
2019-04-05 13:50       ` Andy Lutomirski
2019-05-01 13:52         ` Bae, Chang Seok
2019-05-01 17:40           ` Andy Lutomirski [this message]
2019-05-01 18:01             ` Bae, Chang Seok
     [not found]               ` <7029A32B-958E-4C1E-8B5F-D49BA68E4755@intel.com>
2019-05-01 20:25                 ` Andy Lutomirski
2019-05-01 21:04                   ` Bae, Chang Seok
2019-05-02  0:29                     ` Andy Lutomirski
2019-05-06 22:56     ` Bae, Chang Seok
2019-03-15 20:06 ` [RESEND PATCH v6 09/12] selftests/x86/fsgsbase: Test WRGSBASE Chang S. Bae
2019-03-15 20:06 ` [RESEND PATCH v6 10/12] x86/fsgsbase/64: Enable FSGSBASE by default and add a chicken bit Chang S. Bae
2019-03-15 20:06 ` [RESEND PATCH v6 11/12] x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2 Chang S. Bae
2019-03-15 20:06 ` [RESEND PATCH v6 12/12] x86/fsgsbase/64: Add documentation for FSGSBASE Chang S. Bae
2019-03-30 16:15   ` Randy Dunlap
2019-03-26  0:43 ` [RESEND PATCH v6 00/12] x86: Enable FSGSBASE instructions Andy Lutomirski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALCETrV4zACb9L_FaU12ZF1O6_vjVyGrcyWwk-mfSUhyxGMXJA@mail.gmail.com \
    --to=luto@kernel.org \
    --cc=ak@linux.intel.com \
    --cc=chang.seok.bae@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=ravi.v.shankar@intel.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git
	git clone --mirror https://lore.kernel.org/lkml/10 lkml/git/10.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git