From: Thomas Gleixner <tglx@linutronix.de>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	the arch/x86 maintainers <x86@kernel.org>,
	Andy Lutomirsky <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Borislav Petkov <bpetkov@suse.de>,
	Greg KH <gregkh@linuxfoundation.org>,
	Kees Cook <keescook@google.com>, Hugh Dickins <hughd@google.com>,
	Brian Gerst <brgerst@gmail.com>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Denys Vlasenko <dvlasenk@redhat.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Juergen Gross <jgross@suse.com>,
	David Laight <David.Laight@aculab.com>,
	Eduardo Valentin <eduval@amazon.com>,
	"Liguori, Anthony" <aliguori@amazon.com>,
	Will Deacon <will.deacon@arm.com>, linux-mm <linux-mm@kvack.org>
Subject: Re: [patch 13/16] x86/ldt: Introduce LDT write fault handler
Date: Tue, 12 Dec 2017 22:41:03 +0100 (CET)
Message-ID: <alpine.DEB.2.20.1712122219580.2289@nanos>
In-Reply-To: <alpine.DEB.2.20.1712122124320.2289@nanos>

On Tue, 12 Dec 2017, Thomas Gleixner wrote:
> On Tue, 12 Dec 2017, Dave Hansen wrote:
> 
> > On 12/12/2017 11:21 AM, Thomas Gleixner wrote:
> > > The only critical interaction is the return to user path (user CS/SS) and
> > > we made sure with the LAR touching that these are precached in the CPU
> > > before we go into fragile exit code.
> > 
> > How do we make sure that it _stays_ cached?
> > 
> > Surely there is weird stuff like WBINVD or SMI's that can come at very
> > inconvenient times and wipe it out of the cache.
> 
> This does not look like cache in the sense of memory cache. It seems to be
> CPU internal state and I just stuffed WBINVD and alternatively CLFLUSH'ed
> the entries after the 'touch' via LAR. Still works.

Dave pointed me once more to the following paragraph in the SDM, which
Peter and I looked at before and we tried that w/o success:

    If the segment descriptors in the GDT or an LDT are placed in ROM, the
    processor can enter an indefinite loop if software or the processor
    attempts to update (write to) the ROM-based segment descriptors. To
    prevent this problem, set the accessed bits for all segment descriptors
    placed in a ROM. Also, remove operating-system or executive code that
    attempts to modify segment descriptors located in ROM.
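
For illustration only, here is a minimal sketch of what "set the accessed
bits" means at the descriptor level. It is not code from the patch series.
It assumes the legacy 8-byte code/data descriptor layout, where the accessed
bit is bit 0 of the type field (bit 40 of the descriptor) and the S bit
(bit 44) distinguishes code/data segments from system segments. The helper
names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions within the 8-byte legacy segment descriptor. */
#define DESC_ACCESSED_BIT (UINT64_C(1) << 40) /* type field, bit 0 */
#define DESC_S_BIT        (UINT64_C(1) << 44) /* 1 = code/data segment */

/* Hypothetical helper: true for code/data (non-system) descriptors. */
static bool desc_is_code_or_data(uint64_t desc)
{
	return (desc & DESC_S_BIT) != 0;
}

/*
 * Hypothetical helper: true if loading this descriptor into a segment
 * register would make the CPU try to set the accessed bit, i.e. write
 * to the descriptor table.
 */
static bool desc_load_would_write(uint64_t desc)
{
	return desc_is_code_or_data(desc) && !(desc & DESC_ACCESSED_BIT);
}

/*
 * Hypothetical helper: pre-set the accessed bit so that the CPU has no
 * reason to write to the table. System descriptors are left untouched.
 */
static uint64_t desc_force_accessed(uint64_t desc)
{
	return desc_is_code_or_data(desc) ? (desc | DESC_ACCESSED_BIT) : desc;
}
```

A descriptor that went through desc_force_accessed() before being installed
in a read-only table no longer reports desc_load_would_write(), which is the
property the SDM paragraph asks for.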

Now that made me go back to the state of the patch series which made us
make that magic 'touch' and write fault handler. The difference to the code
today is that it did not prepopulate the user visible mapping.

We added that later because we were worried about not being able to
populate it in the #PF due to memory pressure without ripping out the magic
cure again.

But I did now and actually removing both the user exit magic 'touch' code
and the write fault handler keeps it working.

Removing the prepopulate code makes it break again with a #GP in
IRET/SYSRET.

What happens there is that the IRET pops SS (with a minimal testcase) which
causes the #PF. That populates the PTE and returns happily. Right after
that the #GP comes in with IP pointing to the user space instruction right
after the syscall.

That simplifies and descaryfies that code massively.

Darn, I should have gone back and checked every part again as I usually do,
but my fried brain failed.

Thanks,

	tglx
