From: George Dunlap <george.dunlap@citrix.com>
To: Wei Liu <wei.liu2@citrix.com>, Andrew Cooper <andrew.cooper3@citrix.com>
Cc: "Martin Pohlack" <mpohlack@amazon.de>,
	"Julien Grall" <julien.grall@arm.com>,
	"Jan Beulich" <JBeulich@suse.com>,
	"Joao Martins" <joao.m.martins@oracle.com>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Daniel Kiper" <daniel.kiper@oracle.com>,
	"Marek Marczykowski" <marmarek@invisiblethingslab.com>,
	"Anthony Liguori" <aliguori@amazon.com>,
	"Dannowski, Uwe" <uwed@amazon.de>,
	"Lars Kurth" <lars.kurth@citrix.com>,
	"Konrad Wilk" <konrad.wilk@oracle.com>,
	"Ross Philipson" <ross.philipson@oracle.com>,
	"Dario Faggioli" <dfaggioli@suse.com>,
	"Matt Wilson" <msw@amazon.com>,
	"Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
	"Juergen Gross" <JGross@suse.com>,
	"Sergey Dyasli" <sergey.dyasli@citrix.com>,
	"George Dunlap" <george.dunlap@eu.citrix.com>,
	"Xen-devel List" <xen-devel@lists.xen.org>,
	"Mihai Donțu" <mdontu@bitdefender.com>,
	"Woodhouse, David" <dwmw@amazon.co.uk>,
	"Roger Pau Monne" <roger.pau@citri>
Subject: Re: Ongoing/future speculative mitigation work
Date: Mon, 10 Dec 2018 12:12:34 +0000	[thread overview]
Message-ID: <3e7b96cf-fe5f-604f-75f8-4919737d7e63@citrix.com> (raw)
In-Reply-To: <20181207184051.l6owpsjvecog6zhx@zion.uk.xensource.com>

On 12/7/18 6:40 PM, Wei Liu wrote:
> On Thu, Oct 18, 2018 at 06:46:22PM +0100, Andrew Cooper wrote:
>> Hello,
>>
>> This is an accumulation and summary of various tasks which have been
>> discussed since the revelation of the speculative security issues in
>> January, and also an invitation to discuss alternative ideas.  They are
>> x86 specific, but a lot of the principles are architecture-agnostic.
>>
>> 1) A secrets-free hypervisor.
>>
>> Basically every hypercall can be (ab)used by a guest, and used as an
>> arbitrary cache-load gadget.  Logically, this is the first half of a
>> Spectre SP1 gadget, and is usually the first stepping stone to
>> exploiting one of the speculative sidechannels.
>>
>> Short of compiling Xen with LLVM's Speculative Load Hardening (which is
>> still experimental, and comes with a ~30% perf hit in the common case),
>> this is unavoidable.  Furthermore, throwing a few array_index_nospec()
>> into the code isn't a viable solution to the problem.
>>
>> An alternative option is to have less data mapped into Xen's virtual
>> address space - if a piece of memory isn't mapped, it can't be loaded
>> into the cache.
>>
>> An easy first step here is to remove Xen's directmap, which will mean
>> that guests' general RAM isn't mapped by default into Xen's address
>> space.  This will come with some performance hit, as the
>> map_domain_page() infrastructure will now have to actually
>> create/destroy mappings, but removing the directmap will cause an
>> improvement for non-speculative security as well (No possibility of
>> ret2dir as an exploit technique).
>>
>> Beyond the directmap, there are plenty of other interesting secrets in
>> the Xen heap and other mappings, such as the stacks of the other pcpus. 
>> Fixing this requires moving Xen to having a non-uniform memory layout,
>> and this is much harder to change.  I already experimented with this as
>> a meltdown mitigation around about a year ago, and posted the resulting
>> series on Jan 4th,
>> https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg00274.html,
>> some trivial bits of which have already found their way upstream.
>>
>> To have a non-uniform memory layout, Xen may not share L4 pagetables. 
>> i.e. Xen must never have two pcpus which reference the same pagetable in
>> %cr3.
>>
>> This property already holds for 32bit PV guests, and all HVM guests, but
>> 64bit PV guests are the sticking point.  Because Linux has a flat memory
>> layout, when a 64bit PV guest schedules two threads from the same
>> process on separate vcpus, those two vcpus have the same virtual %cr3,
>> and currently, Xen programs the same real %cr3 into hardware.
>>
>> If we want Xen to have a non-uniform layout, our two options are:
>> * Fix Linux to have the same non-uniform layout that Xen wants
>> (Backwards compatibility for older 64bit PV guests can be achieved with
>> xen-shim).
>> * Make use of the XPTI algorithm (specifically, the pagetable sync/copy
>> part) permanently, for all future hardware.
>>
>> Option 2 isn't great (especially for perf on fixed hardware), but does
>> keep all the necessary changes in Xen.  Option 1 looks to be the better
>> option longterm.
>>
>> As an interesting point to note.  The 32bit PV ABI prohibits sharing of
>> L3 pagetables, because back in the 32bit hypervisor days, we used to
>> have linear mappings in the Xen virtual range.  This check is stale
>> (from a functionality point of view), but still present in Xen.  A
>> consequence of this is that 32bit PV guests definitely don't share
>> top-level pagetables across vcpus.
> 
> Correction: the 32bit PV ABI prohibits sharing of L2 pagetables, but L3
> pagetables can be shared. So guests will schedule the same top-level
> pagetables across vcpus.
>
> But, 64bit Xen creates a monitor table for 32bit PAE guests and puts the
> CR3 provided by the guest into the first slot, so pcpus don't share the same
> L4 pagetables. The property we want still holds.

Ah, right -- but Xen can get away with this because in PAE mode, "L3" is
just 4 entries that are loaded on CR3-switch and not automatically kept
in sync by the hardware; i.e., the OS already needs to do its own
"manual syncing" if it updates any of the L3 entries; so it's the same
for Xen.

>> Juergen/Boris: Do you have any idea if/how easy this infrastructure
>> would be to implement for 64bit PV guests as well?  If a PV guest can
>> advertise via Elfnote that it won't share top-level pagetables, then we
>> can audit this trivially in Xen.
>>
> 
> After reading the Linux kernel code, I think it is not going to be trivial,
> as threads in Linux currently share one pagetable (as they should).
> 
> In order to make each thread have its own pagetable while still maintaining
> the illusion of one address space, there needs to be synchronisation
> under the hood.
> 
> There is code in Linux to synchronise vmalloc, but that's only for the
> kernel portion. The infrastructure to synchronise userspace portion is
> missing.
> 
> One idea is to follow the same model as vmalloc -- maintain a reference
> pagetable in struct mm and a list of pagetables for threads, then
> synchronise the pagetables in the page fault handler. But this is
> probably a bit hard to sell to Linux maintainers because it will touch a
> lot of the non-Xen code, increase complexity and decrease performance.

Sorry -- what do you mean "synchronize vmalloc"?  If every thread has a
different view of the kernel's vmalloc area, then every thread must have
a different L4 table, right?  And if every thread has a different L4
table, then we've already got the main thing we need from Linux, don't we?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Thread overview: 63+ messages
2018-10-18 17:46 Ongoing/future speculative mitigation work Andrew Cooper
2018-10-19  8:09 ` Dario Faggioli
2018-10-19 12:17   ` Andrew Cooper
2018-10-22  9:32     ` Mihai Donțu
2018-10-22 14:55 ` Wei Liu
2018-10-22 15:09   ` Woodhouse, David
2018-10-22 15:14     ` Andrew Cooper
2018-10-25 14:50   ` Jan Beulich
2018-10-25 14:56     ` George Dunlap
2018-10-25 15:02       ` Jan Beulich
2018-10-25 16:29         ` Andrew Cooper
2018-10-25 16:43           ` George Dunlap
2018-10-25 16:50             ` Andrew Cooper
2018-10-25 17:07               ` George Dunlap
2018-10-26  9:16           ` Jan Beulich
2018-10-26  9:28             ` Wei Liu
2018-10-26  9:56               ` Jan Beulich
2018-10-26 10:51                 ` George Dunlap
2018-10-26 11:20                   ` Jan Beulich
2018-10-26 11:24                     ` George Dunlap
2018-10-26 11:33                       ` Jan Beulich
2018-10-26 11:43                         ` George Dunlap
2018-10-26 11:45                           ` Jan Beulich
2018-12-11 18:05                     ` Wei Liu
     [not found]                       ` <FB70ABC00200007CA293CED3@prv1-mh.provo.novell.com>
2018-12-12  8:32                         ` Jan Beulich
2018-10-24 15:24 ` Tamas K Lengyel
2018-10-25 16:01   ` Dario Faggioli
2018-10-25 16:25     ` Tamas K Lengyel
2018-10-25 17:23       ` Dario Faggioli
2018-10-25 17:29         ` Tamas K Lengyel
2018-10-26  7:31           ` Dario Faggioli
2018-10-25 16:55   ` Andrew Cooper
2018-10-25 17:01     ` George Dunlap
2018-10-25 17:35       ` Tamas K Lengyel
2018-10-25 17:43         ` Andrew Cooper
2018-10-25 17:58           ` Tamas K Lengyel
2018-10-25 18:13             ` Andrew Cooper
2018-10-25 18:35               ` Tamas K Lengyel
2018-10-25 18:39                 ` Andrew Cooper
2018-10-26  7:49                 ` Dario Faggioli
2018-10-26 12:01                   ` Tamas K Lengyel
2018-10-26 14:17                     ` Dario Faggioli
2018-10-26 10:11               ` George Dunlap
2018-12-07 18:40 ` Wei Liu
2018-12-10 12:12   ` George Dunlap [this message]
2018-12-10 12:19     ` George Dunlap
2019-01-24 11:44 ` Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work) Wei Liu
2019-01-24 16:00   ` George Dunlap
2019-02-07 16:50   ` Wei Liu
2019-02-20 12:29   ` Wei Liu
2019-02-20 13:00     ` Roger Pau Monné
2019-02-20 13:09       ` Wei Liu
2019-02-20 17:08         ` Wei Liu
2019-02-21  9:59           ` Roger Pau Monné
2019-02-21 17:51             ` Wei Liu
2019-02-22 11:48           ` Jan Beulich
2019-02-22 11:50             ` Wei Liu
2019-02-22 12:06               ` Jan Beulich
2019-02-22 12:11                 ` Wei Liu
2019-02-22 12:47                   ` Jan Beulich
2019-02-22 13:19                     ` Wei Liu
     [not found]                       ` <158783E402000088A293CED3@prv1-mh.provo.novell.com>
2019-02-22 13:24                         ` Jan Beulich
2019-02-22 13:27                           ` Jan Beulich
