From: Dario Faggioli <dfaggioli@suse.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>,
	Xen-devel List <xen-devel@lists.xen.org>
Cc: "Juergen Gross" <JGross@suse.com>,
	"Lars Kurth" <lars.kurth@citrix.com>,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Wei Liu" <wei.liu2@citrix.com>,
	"Anthony Liguori" <aliguori@amazon.com>,
	"Sergey Dyasli" <sergey.dyasli@citrix.com>,
	"George Dunlap" <george.dunlap@eu.citrix.com>,
	"Ross Philipson" <ross.philipson@oracle.com>,
	"Daniel Kiper" <daniel.kiper@oracle.com>,
	"Konrad Wilk" <konrad.wilk@oracle.com>,
	"Marek Marczykowski" <marmarek@invisiblethingslab.com>,
	"Martin Pohlack" <mpohlack@amazon.de>,
	"Julien Grall" <julien.grall@arm.com>,
	"Dannowski, Uwe" <uwed@amazon.de>,
	"Jan Beulich" <JBeulich@suse.com>,
	"Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
	"Mihai Donțu" <mdontu@bitdefender.com>,
	"Matt Wilson" <msw@amazon.com>,
	"Joao Martins" <joao.m.martins@oracle.com>,
	"Woodhouse, David" <dwmw@amazon.co.uk>,
	"Roger Pau Monne" <roger.pau@citrix.com>
Subject: Re: Ongoing/future speculative mitigation work
Date: Fri, 19 Oct 2018 10:09:30 +0200
Message-ID: <0508ae79c8d74f6ebb7d1b239b2c3f0e428aca6b.camel@suse.com>
In-Reply-To: <e3219697-0759-39fc-2486-715cdec1ca9e@citrix.com>



On Thu, 2018-10-18 at 18:46 +0100, Andrew Cooper wrote:
> Hello,
> 
Hey,

This is very accurate and useful... thanks for it. :-)

> 1) A secrets-free hypervisor.
> 
> Basically every hypercall can be (ab)used by a guest, and used as an
> arbitrary cache-load gadget.  Logically, this is the first half of a
> Spectre v1 gadget, and is usually the first stepping stone to
> exploiting one of the speculative sidechannels.
> 
> Short of compiling Xen with LLVM's Speculative Load Hardening (which is
> still experimental, and comes with a ~30% perf hit in the common case),
> this is unavoidable.  Furthermore, throwing a few array_index_nospec()
> calls into the code isn't a viable solution to the problem.
> 
> An alternative option is to have less data mapped into Xen's virtual
> address space - if a piece of memory isn't mapped, it can't be loaded
> into the cache.
> 
> [...]
> 
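
(Quick aside, just to make the "cache-load gadget" point concrete for
anyone following along: below is the kind of pattern we're talking
about. Toy code of mine, not actual Xen code; the branch-free mask is
modelled on Linux's generic array_index_mask_nospec().)

  #include <stdio.h>

  #define BITS_PER_LONG (sizeof(long) * 8)
  #define TABLE_SIZE    16

  static int table[TABLE_SIZE];

  /* Branch-free clamp: evaluates to ~0UL if idx < size, 0 otherwise,
   * so it bounds idx even while the CPU speculates past the check. */
  static unsigned long index_mask_nospec(unsigned long idx,
                                         unsigned long size)
  {
      return ~(long)(idx | (size - 1UL - idx)) >> (BITS_PER_LONG - 1);
  }

  /* The classic v1 shape: the bounds check can be speculatively
   * bypassed, turning the load into an arbitrary cache-load gadget. */
  static int lookup_unsafe(unsigned long idx)
  {
      if (idx < TABLE_SIZE)
          return table[idx];  /* may run speculatively with idx OOB */
      return -1;
  }

  /* Hardened: even under misspeculation, idx is forced in-bounds. */
  static int lookup_hardened(unsigned long idx)
  {
      if (idx < TABLE_SIZE) {
          idx &= index_mask_nospec(idx, TABLE_SIZE);
          return table[idx];
      }
      return -1;
  }

  int main(void)
  {
      printf("%d %d\n", lookup_unsafe(3), lookup_hardened(3));
      return 0;
  }

And one can indeed see why doing this by hand over the whole
hypercall surface doesn't scale.
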
> 2) Scheduler improvements.
> 
> (I'm afraid this is rather more sparse because I'm less familiar with
> the scheduler details.)
> 
> At the moment, all of Xen's schedulers will happily put two vcpus from
> different domains on sibling hyperthreads.  There has been a lot of
> sidechannel research over the past decade demonstrating ways for one
> thread to infer what is going on in the other, but L1TF is the first
> vulnerability I'm aware of which allows one thread to directly read
> data out of the other.
> 
> Either way, it is now definitely a bad thing to run different guests
> concurrently on siblings.
>
Well, yes. But, as you say, L1TF (and I'd say TLBleed as well) is the
first really serious such issue discovered so far and, for instance,
even on x86, only some Intel CPUs are affected; none of the AMD ones
are, AFAIK.

Therefore, although I certainly think we _must_ have the proper
scheduler enhancements in place (and in fact I'm working on that :-D),
it should IMO still be possible for the user to decide whether or not
to use them (either by opting in or opting out; I don't care much at
this stage).

> Fixing this by simply not scheduling vcpus from a different guest on
> siblings does result in lower resource utilisation, most notably when
> there is an odd number of runnable vcpus in a domain, as the other
> thread is forced to idle.
> 
Right.

> A step beyond this is core-aware scheduling, where we schedule in
> units of a virtual core rather than a virtual thread.  This has much
> better behaviour from the guest's point of view, as the
> actually-scheduled topology remains consistent, but does potentially
> come with even lower utilisation if every other thread in the guest
> is idle.
> 
Yes; basically, what you describe as 'core-aware scheduling' here can
be built on top of what you described above as 'not scheduling vcpus
from different guests'.

I mean, we can/should put ourselves in a position where the user can
choose between:
- just 'plain scheduling', as we have now;
- "just" ensuring that only vcpus of the same domain are scheduled on
sibling hyperthreads (a toy sketch of this check is below);
- full 'core-aware scheduling', i.e., only vcpus that the guest
actually sees as virtual hyperthread siblings are scheduled on
hardware hyperthread siblings.
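
Roughly, the check for that second mode boils down to something like
this (a toy model of mine, with made-up structures; nothing like the
actual scheduler code):

  #include <stdbool.h>
  #include <stddef.h>

  struct vcpu { int domain_id; };

  struct pcpu {
      struct vcpu *current;   /* vcpu running here, NULL if idle */
      struct pcpu *sibling;   /* the other hyperthread of the core */
  };

  /* 'v' may run on 'p' only if p's sibling thread is idle, or is
   * running a vcpu of the same domain as v. */
  static bool sibling_ok(const struct pcpu *p, const struct vcpu *v)
  {
      const struct vcpu *s = p->sibling ? p->sibling->current : NULL;

      return s == NULL || s->domain_id == v->domain_id;
  }

The third mode additionally requires that the two vcpus be virtual
siblings from the guest's point of view, which is where the topology
work you mention below comes in.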

About the performance impact: indeed, it's even higher with core-aware
scheduling. Something we can look into is acting on the guest's
scheduler, e.g., telling it to try to "pack the load" and keep
siblings busy, instead of trying to avoid doing that (which is what
happens by default in most cases).

In Linux, this can be done by playing with the sched-flags (see, e.g.,
https://elixir.bootlin.com/linux/v4.18/source/include/linux/sched/topology.h#L20 ,
and /proc/sys/kernel/sched_domain/cpu*/domain*/flags ).

The idea would be to avoid, as much as possible, the case where "every
other thread is idle in the guest". I'm not sure we can do anything by
default, but we can certainly document things (like "if you enable
core-scheduling, also do `echo 1234 > /proc/sys/.../flags' in your
Linux guests").

I haven't checked whether other OSs' schedulers have something similar.

> A side requirement for core-aware scheduling is for Xen to have an
> accurate idea of the topology presented to the guest.  I need to dust
> off my Toolstack CPUID/MSR improvement series and get that upstream.
> 
Indeed. Without knowing which of the guest's vcpus are to be
considered virtual hyperthread siblings, I can only get you as far as
"only scheduling vcpus of the same domain on sibling hyperthreads". :-)

> One of the most insidious problems with L1TF is that, with
> hyperthreading enabled, a malicious guest kernel can engineer
> arbitrary data leakage by having one thread scan the expected
> physical address, and the other thread use an arbitrary cache-load
> gadget in hypervisor context.  This occurs because the L1 data cache
> is shared by threads.
>
Right. So, sorry if this is a stupid question, but how does this
relate to the "secret-free hypervisor", and to the "if a piece of
memory isn't mapped, it can't be loaded into the cache" idea?

Basically, I'm asking whether I'm understanding correctly that
secret-free Xen + core-aware scheduling would *not* be enough to
mitigate L1TF properly (and, if they wouldn't, why... but only if you
have 5 mins to explain it to me :-P).

In fact, ISTR that core-scheduling, plus something that looked to me
similar enough to "secret-free Xen", is how Microsoft claims to be
mitigating L1TF on Hyper-V...

> A solution to this issue was proposed, whereby Xen synchronises
> siblings on vmexit/entry, so we are never executing code in two
> different privilege levels.  Getting this working would make it safe
> to continue using hyperthreading even in the presence of L1TF.
>
Err... ok, but we still want core-aware scheduling, or at least we want
to avoid having vcpus from different domains on siblings, don't we? In
order to avoid leaks between guests, I mean.
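
FWIW, as I understand the proposal, the heart of such a scheme is a
rendezvous between the two siblings around every vmexit/vmentry,
something like the toy sketch below (mine, in C11 atomics; a real
implementation would of course need IPIs to kick the sibling out of
the guest, interrupt handling while spinning, and so on):

  #include <stdatomic.h>

  /* Total arrivals at the rendezvous point; two arrivals complete
   * one "generation".  Works for exactly two participants. */
  static atomic_uint arrivals;

  /* Each sibling calls this on vmexit, and again right before
   * vmentry, so the two threads are only ever executing at the
   * same privilege level. */
  static void sibling_rendezvous(void)
  {
      unsigned int gen = atomic_load(&arrivals) / 2;

      atomic_fetch_add(&arrivals, 1);
      while (atomic_load(&arrivals) / 2 == gen &&
             atomic_load(&arrivals) % 2 != 0)
          ;   /* spin until the sibling arrives too */
  }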

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/
