From: Joerg Roedel <joro@8bytes.org>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>, "H . Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andy Lutomirski <luto@kernel.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Juergen Gross <jgross@suse.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Borislav Petkov <bp@alien8.de>, Jiri Kosina <jkosina@suse.cz>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Brian Gerst <brgerst@gmail.com>,
	David Laight <David.Laight@aculab.com>,
	Denys Vlasenko <dvlasenk@redhat.com>,
	Eduardo Valentin <eduval@amazon.com>,
	Greg KH <gregkh@linuxfoundation.org>,
	Will Deacon <will.deacon@arm.com>,
	aliguori@amazon.com, daniel.gruss@iaik.tugraz.at,
	hughd@google.com, keescook@google.com,
	Andrea Arcangeli <aarcange@redhat.com>,
	Waiman Long <llong@redhat.com>, Pavel Machek <pavel@ucw.cz>,
	"David H . Gutteridge" <dhgutteridge@sympatico.ca>,
	jroedel@suse.de
Subject: Re: [PATCH 07/39] x86/entry/32: Enter the kernel via trampoline stack
Date: Fri, 13 Jul 2018 12:56:20 +0200	[thread overview]
Message-ID: <20180713105620.z6bjhqzfez2hll6r@8bytes.org> (raw)
In-Reply-To: <A66D58A6-3DC6-4CF3-B2A5-433C6E974060@amacapital.net>

Hi Andy,

thanks for your valuable feedback.

On Thu, Jul 12, 2018 at 02:09:45PM -0700, Andy Lutomirski wrote:
> > On Jul 11, 2018, at 4:29 AM, Joerg Roedel <joro@8bytes.org> wrote:
> > -.macro SAVE_ALL pt_regs_ax=%eax
> > +.macro SAVE_ALL pt_regs_ax=%eax switch_stacks=0
> >    cld
> > +    /* Push segment registers and %eax */
> >    PUSH_GS
> >    pushl    %fs
> >    pushl    %es
> >    pushl    %ds
> >    pushl    \pt_regs_ax
> > +
> > +    /* Load kernel segments */
> > +    movl    $(__USER_DS), %eax
> 
> If \pt_regs_ax != %eax, then this will behave oddly. Maybe it’s okay.
> But I don’t see why this change was needed at all.

This is a leftover from a previous approach that I later abandoned.
You are right, it is not needed.

> > +/*
> > + * Called with pt_regs fully populated and kernel segments loaded,
> > + * so we can access PER_CPU and use the integer registers.
> > + *
> > + * We need to be very careful here with the %esp switch, because an NMI
> > + * can happen everywhere. If the NMI handler finds itself on the
> > + * entry-stack, it will overwrite the task-stack and everything we
> > + * copied there. So allocate the stack-frame on the task-stack and
> > + * switch to it before we do any copying.
> 
> Ick, right. Same with machine check, though. You could alternatively
> fix it by running NMIs on an irq stack if the irq count is zero.  How
> confident are you that you got #MC right?

Pretty confident: #MC uses the exception entry path, which also handles
the entry stack and user-cr3 correctly. It might go through the slow
paranoid exit path, but that's okay for #MC, I guess.

And when an #MC happens while we switch to the task stack and do the
copying, the same precautions as for NMI apply.
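
To make the ordering concrete, the idea is roughly the following;
register use and the constants below are only illustrative, not the
exact code from the patch:

	/*
	 * Illustrative sketch only: reserve the pt_regs frame on the
	 * task stack and point %esp at it *before* copying, so an
	 * NMI/#MC that hits in the middle pushes below the reserved
	 * frame instead of clobbering the entry-stack contents.
	 */
	movl	%esp, %esi				# source: regs on entry stack
	movl	PER_CPU_VAR(cpu_tss_rw + TSS_sp1), %edi	# top of task stack
	movl	$PTREGS_SIZE, %ecx
	subl	%ecx, %edi				# reserve the frame first
	movl	%edi, %esp				# switch stacks, then copy
	shrl	$2, %ecx
	cld
	rep	movsl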

> > + */
> > +.macro SWITCH_TO_KERNEL_STACK
> > +
> > +    ALTERNATIVE     "", "jmp .Lend_\@", X86_FEATURE_XENPV
> > +
> > +    /* Are we on the entry stack? Bail out if not! */
> > +    movl    PER_CPU_VAR(cpu_entry_area), %edi
> > +    addl    $CPU_ENTRY_AREA_entry_stack, %edi
> > +    cmpl    %esp, %edi
> > +    jae    .Lend_\@
> 
> That’s an alarming assumption about the address space layout. How
> about an xor and an and instead of cmpl?  As it stands, if the address
> layout ever changes, the failure may be rather subtle.

Right, I'll implement a more restrictive check.
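
Something along these lines, perhaps -- only a sketch, assuming a
SIZEOF_entry_stack constant from asm-offsets and reusing the macro's
.Lend_\@ label:

	/* Range check instead of a single compare */
	movl	PER_CPU_VAR(cpu_entry_area), %ecx
	addl	$CPU_ENTRY_AREA_entry_stack + SIZEOF_entry_stack, %ecx
	subl	%esp, %ecx		/* ecx = entry-stack end - esp */
	cmpl	$SIZEOF_entry_stack, %ecx
	jae	.Lend_\@		/* %esp not within the entry stack */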

> Anyway, wouldn’t it be easier to solve this by just not switching
> stacks on entries from kernel mode and making the entry stack bigger?
> Stick an assertion in the scheduling code that we’re not on an entry
> stack, perhaps.

That would replace the check whether we are on the entry stack with a
check whether we are coming from user/vm86 mode. I don't think this
will simplify things much, and I am a bit afraid that it'll break
unwritten assumptions elsewhere. It is probably something we can look
into later, separately from the basic PTI enablement for x86-32.
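
For reference, such a mode check would boil down to the EFLAGS/CS test
we already use on the 32-bit exit path, roughly like this (the target
label is just a placeholder):

	movl	PT_EFLAGS(%esp), %eax		# mix saved EFLAGS and CS
	movb	PT_CS(%esp), %al
	andl	$(X86_EFLAGS_VM | SEGMENT_RPL_MASK), %eax
	cmpl	$USER_RPL, %eax
	jb	.Lfrom_kernel			# CPL < 3 and not vm86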


Thanks,

	Joerg


Thread overview: 87+ messages
2018-07-11 11:29 [PATCH 00/39 v7] PTI support for x86-32 Joerg Roedel
2018-07-11 11:29 ` [PATCH 01/39] x86/asm-offsets: Move TSS_sp0 and TSS_sp1 to asm-offsets.c Joerg Roedel
2018-07-12 20:44   ` Andy Lutomirski
2018-07-11 11:29 ` [PATCH 02/39] x86/entry/32: Rename TSS_sysenter_sp0 to TSS_entry_stack Joerg Roedel
2018-07-12 20:44   ` Andy Lutomirski
2018-07-11 11:29 ` [PATCH 03/39] x86/entry/32: Load task stack from x86_tss.sp1 in SYSENTER handler Joerg Roedel
2018-07-12 20:49   ` Andy Lutomirski
2018-07-13  9:48     ` Joerg Roedel
2018-07-13  9:48       ` Joerg Roedel
2018-07-13 17:19       ` Andy Lutomirski
2018-07-13 23:17         ` Andy Lutomirski
2018-07-17  7:05           ` Joerg Roedel
2018-07-17 20:04             ` Andy Lutomirski
2018-07-11 11:29 ` [PATCH 04/39] x86/entry/32: Put ESPFIX code into a macro Joerg Roedel
2018-07-11 11:29 ` [PATCH 05/39] x86/entry/32: Unshare NMI return path Joerg Roedel
2018-07-12 20:53   ` Andy Lutomirski
2018-07-13 10:05     ` Joerg Roedel
2018-07-13 17:26       ` Andy Lutomirski
2018-07-11 11:29 ` [PATCH 06/39] x86/entry/32: Split off return-to-kernel path Joerg Roedel
2018-07-11 11:29 ` [PATCH 07/39] x86/entry/32: Enter the kernel via trampoline stack Joerg Roedel
2018-07-12 21:09   ` Andy Lutomirski
2018-07-13 10:56     ` Joerg Roedel [this message]
2018-07-13 10:56       ` Joerg Roedel
2018-07-13 17:21       ` Andy Lutomirski
2018-07-17  7:07         ` Joerg Roedel
2018-07-11 11:29 ` [PATCH 08/39] x86/entry/32: Leave " Joerg Roedel
2018-07-11 11:29 ` [PATCH 09/39] x86/entry/32: Introduce SAVE_ALL_NMI and RESTORE_ALL_NMI Joerg Roedel
2018-07-11 11:29 ` [PATCH 10/39] x86/entry/32: Handle Entry from Kernel-Mode on Entry-Stack Joerg Roedel
2018-07-13 23:31   ` Andy Lutomirski
2018-07-14  5:21     ` Joerg Roedel
2018-07-14  6:26       ` Andy Lutomirski
2018-07-14  8:01         ` Joerg Roedel
2018-07-14  8:01           ` Joerg Roedel
2018-07-14 14:36           ` Andy Lutomirski
2018-07-17  7:15             ` Joerg Roedel
2018-07-17  7:15               ` Joerg Roedel
2018-07-17 20:06               ` Andy Lutomirski
2018-07-18 11:59                 ` Joerg Roedel
2018-07-11 11:29 ` [PATCH 11/39] x86/entry/32: Simplify debug entry point Joerg Roedel
2018-07-11 11:29 ` [PATCH 12/39] x86/32: Use tss.sp1 as cpu_current_top_of_stack Joerg Roedel
2018-07-11 11:29 ` [PATCH 13/39] x86/entry/32: Add PTI cr3 switch to non-NMI entry/exit points Joerg Roedel
2018-07-11 11:29 ` [PATCH 14/39] x86/entry/32: Add PTI cr3 switches to NMI handler code Joerg Roedel
2018-07-11 11:29 ` [PATCH 15/39] x86/pgtable: Rename pti_set_user_pgd to pti_set_user_pgtbl Joerg Roedel
2018-07-11 11:29 ` [PATCH 16/39] x86/pgtable/pae: Unshare kernel PMDs when PTI is enabled Joerg Roedel
2018-07-11 11:29 ` [PATCH 17/39] x86/pgtable/32: Allocate 8k page-tables " Joerg Roedel
2018-07-11 11:29 ` [PATCH 18/39] x86/pgtable: Move pgdp kernel/user conversion functions to pgtable.h Joerg Roedel
2018-07-11 11:29 ` [PATCH 19/39] x86/pgtable: Move pti_set_user_pgtbl() " Joerg Roedel
2018-07-11 11:29 ` [PATCH 20/39] x86/pgtable: Move two more functions from pgtable_64.h " Joerg Roedel
2018-07-11 11:29 ` [PATCH 21/39] x86/mm/pae: Populate valid user PGD entries Joerg Roedel
2018-07-11 11:29 ` [PATCH 22/39] x86/mm/pae: Populate the user page-table with user pgd's Joerg Roedel
2018-07-11 11:29 ` [PATCH 23/39] x86/mm/legacy: " Joerg Roedel
2018-07-11 11:29 ` [PATCH 24/39] x86/mm/pti: Add an overflow check to pti_clone_pmds() Joerg Roedel
2018-07-11 11:29 ` [PATCH 25/39] x86/mm/pti: Define X86_CR3_PTI_PCID_USER_BIT on x86_32 Joerg Roedel
2018-07-11 11:29 ` [PATCH 26/39] x86/mm/pti: Clone CPU_ENTRY_AREA on PMD level " Joerg Roedel
2018-07-11 11:29 ` [PATCH 27/39] x86/mm/pti: Make pti_clone_kernel_text() compile on 32 bit Joerg Roedel
2018-07-11 11:29 ` [PATCH 28/39] x86/mm/pti: Keep permissions when cloning kernel text in pti_clone_kernel_text() Joerg Roedel
2018-07-13 23:25   ` Andy Lutomirski
2018-07-11 11:29 ` [PATCH 29/39] x86/mm/pti: Introduce pti_finalize() Joerg Roedel
2018-07-11 11:29 ` [PATCH 30/39] x86/mm/pti: Clone entry-text again in pti_finalize() Joerg Roedel
2018-07-13 23:21   ` Andy Lutomirski
2018-07-14  5:04     ` Joerg Roedel
2018-07-11 11:29 ` [PATCH 31/39] x86/mm/dump_pagetables: Define INIT_PGD Joerg Roedel
2018-07-11 11:29 ` [PATCH 32/39] x86/pgtable/pae: Use separate kernel PMDs for user page-table Joerg Roedel
2018-07-11 11:29 ` [PATCH 33/39] x86/ldt: Reserve address-space range on 32 bit for the LDT Joerg Roedel
2018-07-11 11:29 ` [PATCH 34/39] x86/ldt: Define LDT_END_ADDR Joerg Roedel
2018-07-13 17:29   ` Andy Lutomirski
2018-07-11 11:29 ` [PATCH 35/39] x86/ldt: Split out sanity check in map_ldt_struct() Joerg Roedel
2018-07-13 23:18   ` Andy Lutomirski
2018-07-11 11:29 ` [PATCH 36/39] x86/ldt: Enable LDT user-mapping for PAE Joerg Roedel
2018-07-11 11:29 ` [PATCH 37/39] x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32 Joerg Roedel
2018-07-11 11:29 ` [PATCH 38/39] x86/mm/pti: Add Warning when booting on a PCID capable CPU Joerg Roedel
2018-07-13 18:59   ` Andy Lutomirski
2018-07-14  5:08     ` Joerg Roedel
2018-07-11 11:29 ` [PATCH 39/39] x86/entry/32: Add debug code to check entry/exit cr3 Joerg Roedel
2018-07-13 17:28   ` Andy Lutomirski
2018-07-14  5:09     ` Joerg Roedel
2018-07-11 16:28 ` [PATCH 00/39 v7] PTI support for x86-32 Linus Torvalds
2018-07-11 17:28   ` Jiri Kosina
2018-07-11 19:57     ` Thomas Backlund
2018-07-12 13:59       ` Boris Ostrovsky
2018-07-11 21:07   ` Pavel Machek
2018-07-16  7:51 ` Pavel Machek
2018-07-17  2:07 ` David H. Gutteridge
2018-07-17  6:16   ` Joerg Roedel
2018-07-18  9:40 [PATCH 00/39 v8] " Joerg Roedel
2018-07-18  9:40 ` [PATCH 07/39] x86/entry/32: Enter the kernel via trampoline stack Joerg Roedel
2018-07-18 18:09   ` Brian Gerst
2018-07-19 20:52     ` Thomas Gleixner
