All of lore.kernel.org
 help / color / mirror / Atom feed
* Ongoing/future speculative mitigation work
@ 2018-10-18 17:46 Andrew Cooper
  2018-10-19  8:09 ` Dario Faggioli
                   ` (4 more replies)
  0 siblings, 5 replies; 63+ messages in thread
From: Andrew Cooper @ 2018-10-18 17:46 UTC (permalink / raw)
  To: Xen-devel List
  Cc: Martin Pohlack, Julien Grall, Jan Beulich, Joao Martins,
	Stefano Stabellini, Daniel Kiper, Marek Marczykowski,
	Anthony Liguori, Dannowski, Uwe, Lars Kurth, Konrad Wilk,
	Ross Philipson, Dario Faggioli, Matt Wilson, Boris Ostrovsky,
	Juergen Gross, Sergey Dyasli, Wei Liu, George Dunlap,
	Mihai Donțu, Woodhouse, David, Roger Pau Monne

Hello,

This is an accumulation and summary of various tasks which have been
discussed since the revelation of the speculative security issues in
January, and also an invitation to discuss alternative ideas.  They are
x86 specific, but a lot of the principles are architecture-agnostic.

1) A secrets-free hypervisor.

Basically every hypercall can be (ab)used by a guest, and used as an
arbitrary cache-load gadget.  Logically, this is the first half of a
Spectre SP1 gadget, and is usually the first stepping stone to
exploiting one of the speculative sidechannels.

Short of compiling Xen with LLVM's Speculative Load Hardening (which is
still experimental, and comes with a ~30% perf hit in the common case),
this is unavoidable.  Furthermore, throwing a few array_index_nospec()
into the code isn't a viable solution to the problem.

An alternative option is to have less data mapped into Xen's virtual
address space - if a piece of memory isn't mapped, it can't be loaded
into the cache.

An easy first step here is to remove Xen's directmap, which will mean
that guests general RAM isn't mapped by default into Xen's address
space.  This will come with some performance hit, as the
map_domain_page() infrastructure will now have to actually
create/destroy mappings, but removing the directmap will cause an
improvement for non-speculative security as well (No possibility of
ret2dir as an exploit technique).

Beyond the directmap, there are plenty of other interesting secrets in
the Xen heap and other mappings, such as the stacks of the other pcpus. 
Fixing this requires moving Xen to having a non-uniform memory layout,
and this is much harder to change.  I already experimented with this as
a meltdown mitigation around about a year ago, and posted the resulting
series on Jan 4th,
https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg00274.html,
some trivial bits of which have already found their way upstream.

To have a non-uniform memory layout, Xen may not share L4 pagetables. 
i.e. Xen must never have two pcpus which reference the same pagetable in
%cr3.

This property already holds for 32bit PV guests, and all HVM guests, but
64bit PV guests are the sticking point.  Because Linux has a flat memory
layout, when a 64bit PV guest schedules two threads from the same
process on separate vcpus, those two vcpus have the same virtual %cr3,
and currently, Xen programs the same real %cr3 into hardware.

If we want Xen to have a non-uniform layout, are two options are:
* Fix Linux to have the same non-uniform layout that Xen wants
(Backwards compatibility for older 64bit PV guests can be achieved with
xen-shim).
* Make use XPTI algorithm (specifically, the pagetable sync/copy part)
forever more in the future.

Option 2 isn't great (especially for perf on fixed hardware), but does
keep all the necessary changes in Xen.  Option 1 looks to be the better
option longterm.

As an interesting point to note.  The 32bit PV ABI prohibits sharing of
L3 pagetables, because back in the 32bit hypervisor days, we used to
have linear mappings in the Xen virtual range.  This check is stale
(from a functionality point of view), but still present in Xen.  A
consequence of this is that 32bit PV guests definitely don't share
top-level pagetables across vcpus.

Juergen/Boris: Do you have any idea if/how easy this infrastructure
would be to implement for 64bit PV guests as well?  If a PV guest can
advertise via Elfnote that it won't share top-level pagetables, then we
can audit this trivially in Xen.


2) Scheduler improvements.

(I'm afraid this is rather more sparse because I'm less familiar with
the scheduler details.)

At the moment, all of Xen's schedulers will happily put two vcpus from
different domains on sibling hyperthreads.  There has been a lot of
sidechannel research over the past decade demonstrating ways for one
thread to infer what is going on the other, but L1TF is the first
vulnerability I'm aware of which allows one thread to directly read data
out of the other.

Either way, it is now definitely a bad thing to run different guests
concurrently on siblings.  Fixing this by simply not scheduling vcpus
from a different guest on siblings does result in a lower resource
utilisation, most notably when there are an odd number runable vcpus in
a domain, as the other thread is forced to idle.

A step beyond this is core-aware scheduling, where we schedule in units
of a virtual core rather than a virtual thread.  This has much better
behaviour from the guests point of view, as the actually-scheduled
topology remains consistent, but does potentially come with even lower
utilisation if every other thread in the guest is idle.

A side requirement for core-aware scheduling is for Xen to have an
accurate idea of the topology presented to the guest.  I need to dust
off my Toolstack CPUID/MSR improvement series and get that upstream.

One of the most insidious problems with L1TF is that, with
hyperthreading enabled, a malicious guest kernel can engineer arbitrary
data leakage by having one thread scanning the expected physical
address, and the other thread using an arbitrary cache-load gadget in
hypervisor context.  This occurs because the L1 data cache is shared by
threads.

A solution to this issue was proposed, whereby Xen synchronises siblings
on vmexit/entry, so we are never executing code in two different
privilege levels.  Getting this working would make it safe to continue
using hyperthreading even in the presence of L1TF.  Obviously, its going
to come in perf hit, but compared to disabling hyperthreading, all its
got to do is beat a 60% perf hit to make it the preferable option for
making your system L1TF-proof.

Anyway - enough of my rambling for now.  Thoughts?

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-18 17:46 Ongoing/future speculative mitigation work Andrew Cooper
@ 2018-10-19  8:09 ` Dario Faggioli
  2018-10-19 12:17   ` Andrew Cooper
  2018-10-22 14:55 ` Wei Liu
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 63+ messages in thread
From: Dario Faggioli @ 2018-10-19  8:09 UTC (permalink / raw)
  To: Andrew Cooper, Xen-devel List
  Cc: Juergen Gross, Lars Kurth, Stefano Stabellini, Wei Liu,
	Anthony Liguori, Sergey Dyasli, George Dunlap, Ross Philipson,
	Daniel Kiper, Konrad Wilk, Marek Marczykowski, Martin Pohlack,
	Julien Grall, Dannowski, Uwe, Jan Beulich, Boris Ostrovsky,
	Mihai Donțu, Matt Wilson, Joao Martins, Woodhouse, David,
	Roger Pau Monne


[-- Attachment #1.1: Type: text/plain, Size: 6318 bytes --]

On Thu, 2018-10-18 at 18:46 +0100, Andrew Cooper wrote:
> Hello,
> 
Hey,

This is very accurate and useful... thanks for it. :-)

> 1) A secrets-free hypervisor.
> 
> Basically every hypercall can be (ab)used by a guest, and used as an
> arbitrary cache-load gadget.  Logically, this is the first half of a
> Spectre SP1 gadget, and is usually the first stepping stone to
> exploiting one of the speculative sidechannels.
> 
> Short of compiling Xen with LLVM's Speculative Load Hardening (which
> is
> still experimental, and comes with a ~30% perf hit in the common
> case),
> this is unavoidable.  Furthermore, throwing a few
> array_index_nospec()
> into the code isn't a viable solution to the problem.
> 
> An alternative option is to have less data mapped into Xen's virtual
> address space - if a piece of memory isn't mapped, it can't be loaded
> into the cache.
> 
> [...]
> 
> 2) Scheduler improvements.
> 
> (I'm afraid this is rather more sparse because I'm less familiar with
> the scheduler details.)
> 
> At the moment, all of Xen's schedulers will happily put two vcpus
> from
> different domains on sibling hyperthreads.  There has been a lot of
> sidechannel research over the past decade demonstrating ways for one
> thread to infer what is going on the other, but L1TF is the first
> vulnerability I'm aware of which allows one thread to directly read
> data
> out of the other.
> 
> Either way, it is now definitely a bad thing to run different guests
> concurrently on siblings.  
>
Well, yes. But, as you say, L1TF, and I'd say TLBLeed as well, are the
first serious issues discovered so far and, for instance, even on x86,
not all Intel CPUs and none of the AMD ones, AFAIK, are affected.

Therefore, although I certainly think we _must_ have the proper
scheduler enhancements in place (and in fact I'm working on that :-D)
it should IMO still be possible for the user to decide whether or not
to use them (either by opting-in or opting-out, I don't care much at
this stage).

> Fixing this by simply not scheduling vcpus
> from a different guest on siblings does result in a lower resource
> utilisation, most notably when there are an odd number runable vcpus
> in
> a domain, as the other thread is forced to idle.
> 
Right.

> A step beyond this is core-aware scheduling, where we schedule in
> units
> of a virtual core rather than a virtual thread.  This has much better
> behaviour from the guests point of view, as the actually-scheduled
> topology remains consistent, but does potentially come with even
> lower
> utilisation if every other thread in the guest is idle.
> 
Yes, basically, what you describe as 'core-aware scheduling' here can
be build on top of what you had described above as 'not scheduling
vcpus from different guests'.

I mean, we can/should put ourselves in a position where the user can
choose if he/she wants:
- just 'plain scheduling', as we have now,
- "just" that only vcpus of the same domains are scheduled on siblings
hyperthread,
- full 'core-aware scheduling', i.e., only vcpus that the guest
actually sees as virtual hyperthread siblings, are scheduled on
hardware hyperthread siblings.

About the performance impact, indeed it's even higher with core-aware
scheduling. Something that we can see about doing, is acting on the
guest scheduler, e.g., telling it to try to "pack the load", and keep
siblings busy, instead of trying to avoid doing that (which is what
happens by default in most cases).

In Linux, this can be done by playing with the sched-flags (see, e.g.,
https://elixir.bootlin.com/linux/v4.18/source/include/linux/sched/topology.h#L20 ,
and /proc/sys/kernel/sched_domain/cpu*/domain*/flags ).

The idea would be to avoid, as much as possible, the case when "every
other thread is idle in the guest". I'm not sure about being able to do
something by default, but we can certainly document things (like "if
you enable core-scheduling, also do `echo 1234 > /proc/sys/.../flags'
in your Linux guests").

I haven't checked whether other OSs' schedulers have something similar.

> A side requirement for core-aware scheduling is for Xen to have an
> accurate idea of the topology presented to the guest.  I need to dust
> off my Toolstack CPUID/MSR improvement series and get that upstream.
> 
Indeed. Without knowing which one of the guest's vcpus are to be
considered virtual hyperthread siblings, I can only get you as far as
"only scheduling vcpus of the same domain on siblings hyperthread". :-)

> One of the most insidious problems with L1TF is that, with
> hyperthreading enabled, a malicious guest kernel can engineer
> arbitrary
> data leakage by having one thread scanning the expected physical
> address, and the other thread using an arbitrary cache-load gadget in
> hypervisor context.  This occurs because the L1 data cache is shared
> by
> threads.
>
Right. So, sorry if this is a stupid question, but how does this relate
to the "secret-free hypervisor", and with the "if a piece of memory
isn't mapped, it can't be loaded into the cache".

So, basically, I'm asking whether I am understanding it correctly that
secret-free Xen + core-aware scheduling would *not* be enough for
mitigating L1TF properly (and if the answer is no, why... but only if
you have 5 mins to explain it to me :-P).

In fact, ISTR that core-scheduling plus something that looked to me
similar enough to "secret-free Xen", is how Microsoft claims to be
mitigating L1TF on hyper-v...

> A solution to this issue was proposed, whereby Xen synchronises
> siblings
> on vmexit/entry, so we are never executing code in two different
> privilege levels.  Getting this working would make it safe to
> continue
> using hyperthreading even in the presence of L1TF.  
>
Err... ok, but we still want core-aware scheduling, or at least we want
to avoid having vcpus from different domains on siblings, don't we? In
order to avoid leaks between guests, I mean.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-19  8:09 ` Dario Faggioli
@ 2018-10-19 12:17   ` Andrew Cooper
  2018-10-22  9:32     ` Mihai Donțu
  0 siblings, 1 reply; 63+ messages in thread
From: Andrew Cooper @ 2018-10-19 12:17 UTC (permalink / raw)
  To: Dario Faggioli, Xen-devel List
  Cc: Juergen Gross, Lars Kurth, Stefano Stabellini, Wei Liu,
	Anthony Liguori, Sergey Dyasli, George Dunlap, Ross Philipson,
	Daniel Kiper, Konrad Wilk, Marek Marczykowski, Martin Pohlack,
	Julien Grall, Dannowski, Uwe, Jan Beulich, Boris Ostrovsky,
	Mihai Donțu, Matt Wilson, Joao Martins, Woodhouse, David,
	Roger Pau Monne

On 19/10/18 09:09, Dario Faggioli wrote:
> On Thu, 2018-10-18 at 18:46 +0100, Andrew Cooper wrote:
>> Hello,
>>
> Hey,
>
> This is very accurate and useful... thanks for it. :-)
>
>> 1) A secrets-free hypervisor.
>>
>> Basically every hypercall can be (ab)used by a guest, and used as an
>> arbitrary cache-load gadget.  Logically, this is the first half of a
>> Spectre SP1 gadget, and is usually the first stepping stone to
>> exploiting one of the speculative sidechannels.
>>
>> Short of compiling Xen with LLVM's Speculative Load Hardening (which
>> is
>> still experimental, and comes with a ~30% perf hit in the common
>> case),
>> this is unavoidable.  Furthermore, throwing a few
>> array_index_nospec()
>> into the code isn't a viable solution to the problem.
>>
>> An alternative option is to have less data mapped into Xen's virtual
>> address space - if a piece of memory isn't mapped, it can't be loaded
>> into the cache.
>>
>> [...]
>>
>> 2) Scheduler improvements.
>>
>> (I'm afraid this is rather more sparse because I'm less familiar with
>> the scheduler details.)
>>
>> At the moment, all of Xen's schedulers will happily put two vcpus
>> from
>> different domains on sibling hyperthreads.  There has been a lot of
>> sidechannel research over the past decade demonstrating ways for one
>> thread to infer what is going on the other, but L1TF is the first
>> vulnerability I'm aware of which allows one thread to directly read
>> data
>> out of the other.
>>
>> Either way, it is now definitely a bad thing to run different guests
>> concurrently on siblings.  
>>
> Well, yes. But, as you say, L1TF, and I'd say TLBLeed as well, are the
> first serious issues discovered so far and, for instance, even on x86,
> not all Intel CPUs and none of the AMD ones, AFAIK, are affected.

TLBleed is an excellent paper and associated research, but is still just
inference - a vast quantity of post-processing is required to extract
the key.

There are plenty of other sidechannels which affect all SMT
implementations, such as the effects of executing an mfence instruction,
execution unit

> Therefore, although I certainly think we _must_ have the proper
> scheduler enhancements in place (and in fact I'm working on that :-D)
> it should IMO still be possible for the user to decide whether or not
> to use them (either by opting-in or opting-out, I don't care much at
> this stage).

I'm not suggesting that we leave people without a choice, but given an
option which doesn't share siblings between different guests, it should
be the default.

>
>> Fixing this by simply not scheduling vcpus
>> from a different guest on siblings does result in a lower resource
>> utilisation, most notably when there are an odd number runable vcpus
>> in
>> a domain, as the other thread is forced to idle.
>>
> Right.
>
>> A step beyond this is core-aware scheduling, where we schedule in
>> units
>> of a virtual core rather than a virtual thread.  This has much better
>> behaviour from the guests point of view, as the actually-scheduled
>> topology remains consistent, but does potentially come with even
>> lower
>> utilisation if every other thread in the guest is idle.
>>
> Yes, basically, what you describe as 'core-aware scheduling' here can
> be build on top of what you had described above as 'not scheduling
> vcpus from different guests'.
>
> I mean, we can/should put ourselves in a position where the user can
> choose if he/she wants:
> - just 'plain scheduling', as we have now,
> - "just" that only vcpus of the same domains are scheduled on siblings
> hyperthread,
> - full 'core-aware scheduling', i.e., only vcpus that the guest
> actually sees as virtual hyperthread siblings, are scheduled on
> hardware hyperthread siblings.
>
> About the performance impact, indeed it's even higher with core-aware
> scheduling. Something that we can see about doing, is acting on the
> guest scheduler, e.g., telling it to try to "pack the load", and keep
> siblings busy, instead of trying to avoid doing that (which is what
> happens by default in most cases).
>
> In Linux, this can be done by playing with the sched-flags (see, e.g.,
> https://elixir.bootlin.com/linux/v4.18/source/include/linux/sched/topology.h#L20 ,
> and /proc/sys/kernel/sched_domain/cpu*/domain*/flags ).
>
> The idea would be to avoid, as much as possible, the case when "every
> other thread is idle in the guest". I'm not sure about being able to do
> something by default, but we can certainly document things (like "if
> you enable core-scheduling, also do `echo 1234 > /proc/sys/.../flags'
> in your Linux guests").
>
> I haven't checked whether other OSs' schedulers have something similar.
>
>> A side requirement for core-aware scheduling is for Xen to have an
>> accurate idea of the topology presented to the guest.  I need to dust
>> off my Toolstack CPUID/MSR improvement series and get that upstream.
>>
> Indeed. Without knowing which one of the guest's vcpus are to be
> considered virtual hyperthread siblings, I can only get you as far as
> "only scheduling vcpus of the same domain on siblings hyperthread". :-)
>
>> One of the most insidious problems with L1TF is that, with
>> hyperthreading enabled, a malicious guest kernel can engineer
>> arbitrary
>> data leakage by having one thread scanning the expected physical
>> address, and the other thread using an arbitrary cache-load gadget in
>> hypervisor context.  This occurs because the L1 data cache is shared
>> by
>> threads.
>>
> Right. So, sorry if this is a stupid question, but how does this relate
> to the "secret-free hypervisor", and with the "if a piece of memory
> isn't mapped, it can't be loaded into the cache".
>
> So, basically, I'm asking whether I am understanding it correctly that
> secret-free Xen + core-aware scheduling would *not* be enough for
> mitigating L1TF properly (and if the answer is no, why... but only if
> you have 5 mins to explain it to me :-P).
>
> In fact, ISTR that core-scheduling plus something that looked to me
> similar enough to "secret-free Xen", is how Microsoft claims to be
> mitigating L1TF on hyper-v...

Correct - that is what HyperV appears to be doing.

Its best to consider the secret-free Xen and scheduler improvements as
orthogonal.  In particular, the secret-free Xen is defence in depth
against SP1, and the risk of future issues, but does have
non-speculative benefits as well.

That said, the only way to use HT and definitely be safe to L1TF without
a secret-free Xen is to have the synchronised entry/exit logic working.

>> A solution to this issue was proposed, whereby Xen synchronises
>> siblings
>> on vmexit/entry, so we are never executing code in two different
>> privilege levels.  Getting this working would make it safe to
>> continue
>> using hyperthreading even in the presence of L1TF.  
>>
> Err... ok, but we still want core-aware scheduling, or at least we want
> to avoid having vcpus from different domains on siblings, don't we? In
> order to avoid leaks between guests, I mean.

Ideally, we'd want all of these.  I expect the only reasonable way to
develop them is one on top of another.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-19 12:17   ` Andrew Cooper
@ 2018-10-22  9:32     ` Mihai Donțu
  0 siblings, 0 replies; 63+ messages in thread
From: Mihai Donțu @ 2018-10-22  9:32 UTC (permalink / raw)
  To: Andrew Cooper, Dario Faggioli, Xen-devel List
  Cc: Juergen Gross, Lars Kurth, Stefano Stabellini, Wei Liu,
	Anthony Liguori, Sergey Dyasli, George Dunlap, Ross Philipson,
	Daniel Kiper, Konrad Wilk, Marek Marczykowski, Martin Pohlack,
	Julien Grall, Dannowski, Uwe, Jan Beulich, Boris Ostrovsky,
	Matt Wilson, Joao Martins, Woodhouse, David, Roger Pau Monne

On Fri, 2018-10-19 at 13:17 +0100, Andrew Cooper wrote:
> [...]
> 
> > Therefore, although I certainly think we _must_ have the proper
> > scheduler enhancements in place (and in fact I'm working on that :-D)
> > it should IMO still be possible for the user to decide whether or not
> > to use them (either by opting-in or opting-out, I don't care much at
> > this stage).
> 
> I'm not suggesting that we leave people without a choice, but given an
> option which doesn't share siblings between different guests, it should
> be the default.

+1

> [...]
> 
> Its best to consider the secret-free Xen and scheduler improvements as
> orthogonal.  In particular, the secret-free Xen is defence in depth
> against SP1, and the risk of future issues, but does have
> non-speculative benefits as well.
> 
> That said, the only way to use HT and definitely be safe to L1TF without
> a secret-free Xen is to have the synchronised entry/exit logic working.
> 
> > > A solution to this issue was proposed, whereby Xen synchronises
> > > siblings on vmexit/entry, so we are never executing code in two different
> > > privilege levels.  Getting this working would make it safe to
> > > continue using hyperthreading even in the presence of L1TF.  
> > 
> > Err... ok, but we still want core-aware scheduling, or at least we want
> > to avoid having vcpus from different domains on siblings, don't we? In
> > order to avoid leaks between guests, I mean.
> 
> Ideally, we'd want all of these.  I expect the only reasonable way to
> develop them is one on top of another.

If there was a vote, I'd place the scheduler changes at the top.

-- 
Mihai Donțu


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-18 17:46 Ongoing/future speculative mitigation work Andrew Cooper
  2018-10-19  8:09 ` Dario Faggioli
@ 2018-10-22 14:55 ` Wei Liu
  2018-10-22 15:09   ` Woodhouse, David
  2018-10-25 14:50   ` Jan Beulich
  2018-10-24 15:24 ` Tamas K Lengyel
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 63+ messages in thread
From: Wei Liu @ 2018-10-22 14:55 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Martin Pohlack, Julien Grall, Jan Beulich, Joao Martins,
	Stefano Stabellini, Daniel Kiper, Marek Marczykowski,
	Anthony Liguori, Dannowski, Uwe, Lars Kurth, Konrad Wilk,
	Ross Philipson, Dario Faggioli, Matt Wilson, Boris Ostrovsky,
	Juergen Gross, Sergey Dyasli, Wei Liu, George Dunlap,
	Xen-devel List, Mihai Donțu, Woodhouse, David

On Thu, Oct 18, 2018 at 06:46:22PM +0100, Andrew Cooper wrote:
> Hello,
> 
> This is an accumulation and summary of various tasks which have been
> discussed since the revelation of the speculative security issues in
> January, and also an invitation to discuss alternative ideas.  They are
> x86 specific, but a lot of the principles are architecture-agnostic.
> 
> 1) A secrets-free hypervisor.
> 
> Basically every hypercall can be (ab)used by a guest, and used as an
> arbitrary cache-load gadget.  Logically, this is the first half of a
> Spectre SP1 gadget, and is usually the first stepping stone to
> exploiting one of the speculative sidechannels.
> 
> Short of compiling Xen with LLVM's Speculative Load Hardening (which is
> still experimental, and comes with a ~30% perf hit in the common case),
> this is unavoidable.  Furthermore, throwing a few array_index_nospec()
> into the code isn't a viable solution to the problem.
> 
> An alternative option is to have less data mapped into Xen's virtual
> address space - if a piece of memory isn't mapped, it can't be loaded
> into the cache.
> 
> An easy first step here is to remove Xen's directmap, which will mean
> that guests general RAM isn't mapped by default into Xen's address
> space.  This will come with some performance hit, as the
> map_domain_page() infrastructure will now have to actually
> create/destroy mappings, but removing the directmap will cause an
> improvement for non-speculative security as well (No possibility of
> ret2dir as an exploit technique).

I have looked into making the "separate xenheap domheap with partial
direct map" mode (see common/page_alloc.c) work but found it not as
straight forward as it should've been.

Before I spend more time on this, I would like some opinions on if there
is other approach which might be more useful than that mode.

> 
> Beyond the directmap, there are plenty of other interesting secrets in
> the Xen heap and other mappings, such as the stacks of the other pcpus. 
> Fixing this requires moving Xen to having a non-uniform memory layout,
> and this is much harder to change.  I already experimented with this as
> a meltdown mitigation around about a year ago, and posted the resulting
> series on Jan 4th,
> https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg00274.html,
> some trivial bits of which have already found their way upstream.
> 
> To have a non-uniform memory layout, Xen may not share L4 pagetables. 
> i.e. Xen must never have two pcpus which reference the same pagetable in
> %cr3.
> 
> This property already holds for 32bit PV guests, and all HVM guests, but
> 64bit PV guests are the sticking point.  Because Linux has a flat memory
> layout, when a 64bit PV guest schedules two threads from the same
> process on separate vcpus, those two vcpus have the same virtual %cr3,
> and currently, Xen programs the same real %cr3 into hardware.

Which bit of Linux code are you referring to? If you remember it off the
top of your head, it would save me some time digging around. If not,
never mind, I can look it up myself.

> 
> If we want Xen to have a non-uniform layout, are two options are:
> * Fix Linux to have the same non-uniform layout that Xen wants
> (Backwards compatibility for older 64bit PV guests can be achieved with
> xen-shim).
> * Make use XPTI algorithm (specifically, the pagetable sync/copy part)
> forever more in the future.
> 
> Option 2 isn't great (especially for perf on fixed hardware), but does
> keep all the necessary changes in Xen.  Option 1 looks to be the better
> option longterm.

What is the problem with 1+2 at the same time? I think XPTI can be
enabled / disabled on a per-guest basis?

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-22 14:55 ` Wei Liu
@ 2018-10-22 15:09   ` Woodhouse, David
  2018-10-22 15:14     ` Andrew Cooper
  2018-10-25 14:50   ` Jan Beulich
  1 sibling, 1 reply; 63+ messages in thread
From: Woodhouse, David @ 2018-10-22 15:09 UTC (permalink / raw)
  To: Nuernberger, Stefan, andrew.cooper3, wei.liu2
  Cc: JGross, sergey.dyasli, lars.kurth, sstabellini, dfaggioli,
	Wilson, Matt, konrad.wilk, george.dunlap, ross.philipson, mdontu,
	marmarek, xen-devel, Pohlack, Martin, JBeulich, Liguori, Anthony,
	joao.m.martins, roger.pau, julien.grall@arm.com


[-- Attachment #1.1: Type: text/plain, Size: 4077 bytes --]

Adding Stefan to Cc.

Should we take this to the spexen or another mailing list?


On Mon, 2018-10-22 at 15:55 +0100, Wei Liu wrote:
> On Thu, Oct 18, 2018 at 06:46:22PM +0100, Andrew Cooper wrote:
> > Hello,
> > 
> > This is an accumulation and summary of various tasks which have been
> > discussed since the revelation of the speculative security issues in
> > January, and also an invitation to discuss alternative ideas.  They are
> > x86 specific, but a lot of the principles are architecture-agnostic.
> > 
> > 1) A secrets-free hypervisor.
> > 
> > Basically every hypercall can be (ab)used by a guest, and used as an
> > arbitrary cache-load gadget.  Logically, this is the first half of a
> > Spectre SP1 gadget, and is usually the first stepping stone to
> > exploiting one of the speculative sidechannels.
> > 
> > Short of compiling Xen with LLVM's Speculative Load Hardening (which is
> > still experimental, and comes with a ~30% perf hit in the common case),
> > this is unavoidable.  Furthermore, throwing a few array_index_nospec()
> > into the code isn't a viable solution to the problem.
> > 
> > An alternative option is to have less data mapped into Xen's virtual
> > address space - if a piece of memory isn't mapped, it can't be loaded
> > into the cache.
> > 
> > An easy first step here is to remove Xen's directmap, which will mean
> > that guests general RAM isn't mapped by default into Xen's address
> > space.  This will come with some performance hit, as the
> > map_domain_page() infrastructure will now have to actually
> > create/destroy mappings, but removing the directmap will cause an
> > improvement for non-speculative security as well (No possibility of
> > ret2dir as an exploit technique).
> 
> I have looked into making the "separate xenheap domheap with partial
> direct map" mode (see common/page_alloc.c) work but found it not as
> straight forward as it should've been.
> 
> Before I spend more time on this, I would like some opinions on if there
> is other approach which might be more useful than that mode.
> 
> > 
> > Beyond the directmap, there are plenty of other interesting secrets in
> > the Xen heap and other mappings, such as the stacks of the other pcpus. 
> > Fixing this requires moving Xen to having a non-uniform memory layout,
> > and this is much harder to change.  I already experimented with this as
> > a meltdown mitigation around about a year ago, and posted the resulting
> > series on Jan 4th,
> > https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg00274.html,
> > some trivial bits of which have already found their way upstream.
> > 
> > To have a non-uniform memory layout, Xen may not share L4 pagetables. 
> > i.e. Xen must never have two pcpus which reference the same pagetable in
> > %cr3.
> > 
> > This property already holds for 32bit PV guests, and all HVM guests, but
> > 64bit PV guests are the sticking point.  Because Linux has a flat memory
> > layout, when a 64bit PV guest schedules two threads from the same
> > process on separate vcpus, those two vcpus have the same virtual %cr3,
> > and currently, Xen programs the same real %cr3 into hardware.
> 
> Which bit of Linux code are you referring to? If you remember it off the
> top of your head, it would save me some time digging around. If not,
> never mind, I can look it up myself.
> 
> > 
> > If we want Xen to have a non-uniform layout, are two options are:
> > * Fix Linux to have the same non-uniform layout that Xen wants
> > (Backwards compatibility for older 64bit PV guests can be achieved with
> > xen-shim).
> > * Make use XPTI algorithm (specifically, the pagetable sync/copy part)
> > forever more in the future.
> > 
> > Option 2 isn't great (especially for perf on fixed hardware), but does
> > keep all the necessary changes in Xen.  Option 1 looks to be the better
> > option longterm.
> 
> What is the problem with 1+2 at the same time? I think XPTI can be
> enabled / disabled on a per-guest basis?
> 
> Wei.


[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5210 bytes --]

[-- Attachment #2.1: Type: text/plain, Size: 215 bytes --]




Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom.



[-- Attachment #2.2: Type: text/html, Size: 228 bytes --]

[-- Attachment #3: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-22 15:09   ` Woodhouse, David
@ 2018-10-22 15:14     ` Andrew Cooper
  0 siblings, 0 replies; 63+ messages in thread
From: Andrew Cooper @ 2018-10-22 15:14 UTC (permalink / raw)
  To: Woodhouse, David, Nuernberger, Stefan, wei.liu2
  Cc: JGross, sergey.dyasli, lars.kurth, sstabellini, dfaggioli,
	Wilson, Matt, konrad.wilk, george.dunlap, ross.philipson, mdontu,
	marmarek, xen-devel, Pohlack, Martin, JBeulich, Liguori, Anthony,
	joao.m.martins, roger.pau, julien.grall@arm.com

On 22/10/18 16:09, Woodhouse, David wrote:
> Adding Stefan to Cc.
>
> Should we take this to the spexen or another mailing list?

Now that L1TF is public, so is all of this.  I see no reason to continue
it in private.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-18 17:46 Ongoing/future speculative mitigation work Andrew Cooper
  2018-10-19  8:09 ` Dario Faggioli
  2018-10-22 14:55 ` Wei Liu
@ 2018-10-24 15:24 ` Tamas K Lengyel
  2018-10-25 16:01   ` Dario Faggioli
  2018-10-25 16:55   ` Andrew Cooper
  2018-12-07 18:40 ` Wei Liu
  2019-01-24 11:44 ` Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work) Wei Liu
  4 siblings, 2 replies; 63+ messages in thread
From: Tamas K Lengyel @ 2018-10-24 15:24 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: mpohlack, Julien Grall, Jan Beulich, joao.m.martins,
	Stefano Stabellini, Daniel Kiper,
	Marek Marczykowski-Górecki, aliguori, uwed, Lars Kurth,
	Konrad Rzeszutek Wilk, ross.philipson, Dario Faggioli, msw,
	Boris Ostrovsky, JGross, sergey.dyasli, Wei Liu, George Dunlap,
	Xen-devel, mdontu, dwmw, Roger Pau Monné

> A solution to this issue was proposed, whereby Xen synchronises siblings
> on vmexit/entry, so we are never executing code in two different
> privilege levels.  Getting this working would make it safe to continue
> using hyperthreading even in the presence of L1TF.  Obviously, its going
> to come in perf hit, but compared to disabling hyperthreading, all its
> got to do is beat a 60% perf hit to make it the preferable option for
> making your system L1TF-proof.

Could you shed some light what tests were done where that 60%
performance hit was observed? We have performed intensive stress-tests
to confirm this but according to our findings turning off
hyper-threading is actually improving performance on all machines we
tested thus far.

Thanks,
Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-22 14:55 ` Wei Liu
  2018-10-22 15:09   ` Woodhouse, David
@ 2018-10-25 14:50   ` Jan Beulich
  2018-10-25 14:56     ` George Dunlap
  1 sibling, 1 reply; 63+ messages in thread
From: Jan Beulich @ 2018-10-25 14:50 UTC (permalink / raw)
  To: Wei Liu
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, uwed,
	Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, George Dunlap, Andrew Cooper, Xen-devel List,
	Daniel Kiper, David Woodhouse, Roger Pau Monne

>>> On 22.10.18 at 16:55, <wei.liu2@citrix.com> wrote:
> On Thu, Oct 18, 2018 at 06:46:22PM +0100, Andrew Cooper wrote:
>> An easy first step here is to remove Xen's directmap, which will mean
>> that guests general RAM isn't mapped by default into Xen's address
>> space.  This will come with some performance hit, as the
>> map_domain_page() infrastructure will now have to actually
>> create/destroy mappings, but removing the directmap will cause an
>> improvement for non-speculative security as well (No possibility of
>> ret2dir as an exploit technique).
> 
> I have looked into making the "separate xenheap domheap with partial
> direct map" mode (see common/page_alloc.c) work but found it not as
> straight forward as it should've been.
> 
> Before I spend more time on this, I would like some opinions on if there
> is other approach which might be more useful than that mode.

How would such a split heap model help with L1TF, where the
guest specifies host physical addresses in its vulnerable page
table entries (and hence could spy at xenheap but - due to not
being mapped - not domheap)?

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-25 14:50   ` Jan Beulich
@ 2018-10-25 14:56     ` George Dunlap
  2018-10-25 15:02       ` Jan Beulich
  0 siblings, 1 reply; 63+ messages in thread
From: George Dunlap @ 2018-10-25 14:56 UTC (permalink / raw)
  To: Jan Beulich, Wei Liu
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, uwed,
	Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, George Dunlap, Andrew Cooper, Xen-devel List,
	Daniel Kiper, David Woodhouse, Roger Pau Monne

On 10/25/2018 03:50 PM, Jan Beulich wrote:
>>>> On 22.10.18 at 16:55, <wei.liu2@citrix.com> wrote:
>> On Thu, Oct 18, 2018 at 06:46:22PM +0100, Andrew Cooper wrote:
>>> An easy first step here is to remove Xen's directmap, which will mean
>>> that guests general RAM isn't mapped by default into Xen's address
>>> space.  This will come with some performance hit, as the
>>> map_domain_page() infrastructure will now have to actually
>>> create/destroy mappings, but removing the directmap will cause an
>>> improvement for non-speculative security as well (No possibility of
>>> ret2dir as an exploit technique).
>>
>> I have looked into making the "separate xenheap domheap with partial
>> direct map" mode (see common/page_alloc.c) work but found it not as
>> straight forward as it should've been.
>>
>> Before I spend more time on this, I would like some opinions on if there
>> is other approach which might be more useful than that mode.
> 
> How would such a split heap model help with L1TF, where the
> guest specifies host physical addresses in its vulnerable page
> table entries

I don't think it would.

> (and hence could spy at xenheap but - due to not
> being mapped - not domheap)?

Er, didn't follow this bit -- if L1TF is related to host physical
addresses, how does having a virtual mapping in Xen affect things in any
way?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-25 14:56     ` George Dunlap
@ 2018-10-25 15:02       ` Jan Beulich
  2018-10-25 16:29         ` Andrew Cooper
  0 siblings, 1 reply; 63+ messages in thread
From: Jan Beulich @ 2018-10-25 15:02 UTC (permalink / raw)
  To: george.dunlap
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, uwed,
	Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, Wei Liu, George Dunlap, Andrew Cooper,
	Xen-devel List, Daniel Kiper, David Woodhouse

>>> On 25.10.18 at 16:56, <george.dunlap@citrix.com> wrote:
> On 10/25/2018 03:50 PM, Jan Beulich wrote:
>>>>> On 22.10.18 at 16:55, <wei.liu2@citrix.com> wrote:
>>> On Thu, Oct 18, 2018 at 06:46:22PM +0100, Andrew Cooper wrote:
>>>> An easy first step here is to remove Xen's directmap, which will mean
>>>> that guests general RAM isn't mapped by default into Xen's address
>>>> space.  This will come with some performance hit, as the
>>>> map_domain_page() infrastructure will now have to actually
>>>> create/destroy mappings, but removing the directmap will cause an
>>>> improvement for non-speculative security as well (No possibility of
>>>> ret2dir as an exploit technique).
>>>
>>> I have looked into making the "separate xenheap domheap with partial
>>> direct map" mode (see common/page_alloc.c) work but found it not as
>>> straight forward as it should've been.
>>>
>>> Before I spend more time on this, I would like some opinions on if there
>>> is other approach which might be more useful than that mode.
>> 
>> How would such a split heap model help with L1TF, where the
>> guest specifies host physical addresses in its vulnerable page
>> table entries
> 
> I don't think it would.
> 
>> (and hence could spy at xenheap but - due to not
>> being mapped - not domheap)?
> 
> Er, didn't follow this bit -- if L1TF is related to host physical
> addresses, how does having a virtual mapping in Xen affect things in any
> way?

Hmm, indeed. Scratch that part.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-24 15:24 ` Tamas K Lengyel
@ 2018-10-25 16:01   ` Dario Faggioli
  2018-10-25 16:25     ` Tamas K Lengyel
  2018-10-25 16:55   ` Andrew Cooper
  1 sibling, 1 reply; 63+ messages in thread
From: Dario Faggioli @ 2018-10-25 16:01 UTC (permalink / raw)
  To: Tamas K Lengyel, Andrew Cooper
  Cc: mpohlack, Julien Grall, Jan Beulich, joao.m.martins,
	Stefano Stabellini, Daniel Kiper,
	Marek Marczykowski-Górecki, aliguori, uwed, Lars Kurth,
	Konrad Rzeszutek Wilk, ross.philipson, msw, Boris Ostrovsky,
	JGross, sergey.dyasli, Wei Liu, George Dunlap, Xen-devel, mdontu,
	dwmw, Roger Pau Monné


[-- Attachment #1.1: Type: text/plain, Size: 1761 bytes --]

On Wed, 2018-10-24 at 09:24 -0600, Tamas K Lengyel wrote:
> > A solution to this issue was proposed, whereby Xen synchronises
> > siblings
> > on vmexit/entry, so we are never executing code in two different
> > privilege levels.  Getting this working would make it safe to
> > continue
> > using hyperthreading even in the presence of L1TF.  Obviously, its
> > going
> > to come in perf hit, but compared to disabling hyperthreading, all
> > its
> > got to do is beat a 60% perf hit to make it the preferable option
> > for
> > making your system L1TF-proof.
> 
> Could you shed some light what tests were done where that 60%
> performance hit was observed? 
>
I don't have any data handy right now, but I have certainly seen
hyperthreading being beneficial for performance in more than a few
benchmarks and workloads. How much so, this indeed varies *a lot* both
with the platform and with the workload itself.

That being said, I agree it would be good to have as much data as
possible. I'll try to do something about that.

> We have performed intensive stress-tests
> to confirm this but according to our findings turning off
> hyper-threading is actually improving performance on all machines we
> tested thus far.
> 
Which is indeed very interesting. But, as we're discussing in the other
thread, I would, in your case, do some more measurements, varying the
configuration of the system, in order to be absolutely sure you are not
hitting some bug or anomaly.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-25 16:01   ` Dario Faggioli
@ 2018-10-25 16:25     ` Tamas K Lengyel
  2018-10-25 17:23       ` Dario Faggioli
  0 siblings, 1 reply; 63+ messages in thread
From: Tamas K Lengyel @ 2018-10-25 16:25 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: mpohlack, Julien Grall, Jan Beulich, joao.m.martins,
	Stefano Stabellini, Daniel Kiper,
	Marek Marczykowski-Górecki, aliguori, uwed, Lars Kurth,
	Konrad Rzeszutek Wilk, ross.philipson, Matt Wilson,
	Boris Ostrovsky, JGross, sergey.dyasli, Wei Liu, George Dunlap,
	Andrew Cooper, Xen-devel, mdontu, dwmw, Roger Pau Monné

On Thu, Oct 25, 2018 at 10:01 AM Dario Faggioli <dfaggioli@suse.com> wrote:
>
> On Wed, 2018-10-24 at 09:24 -0600, Tamas K Lengyel wrote:
> > > A solution to this issue was proposed, whereby Xen synchronises
> > > siblings
> > > on vmexit/entry, so we are never executing code in two different
> > > privilege levels.  Getting this working would make it safe to
> > > continue
> > > using hyperthreading even in the presence of L1TF.  Obviously, its
> > > going
> > > to come in perf hit, but compared to disabling hyperthreading, all
> > > its
> > > got to do is beat a 60% perf hit to make it the preferable option
> > > for
> > > making your system L1TF-proof.
> >
> > Could you shed some light what tests were done where that 60%
> > performance hit was observed?
> >
> I don't have any data handy right now, but I have certainly seen
> hyperthreading being beneficial for performance in more than a few
> benchmarks and workloads. How much so, this indeed varies *a lot* both
> with the platform and with the workload itself.
>
> That being said, I agree it would be good to have as much data as
> possible. I'll try to do something about that.
>
> > We have performed intensive stress-tests
> > to confirm this but according to our findings turning off
> > hyper-threading is actually improving performance on all machines we
> > tested thus far.
> >
> Which is indeed very interesting. But, as we're discussing in the other
> thread, I would, in your case, do some more measurements, varying the
> configuration of the system, in order to be absolutely sure you are not
> hitting some bug or anomaly.

Sure, I would be happy to repeat tests that were done in the past to
see whether they are still holding. We have run this test with Xen
4.10, 4.11 and 4.12-unstable on laptops and desktops, using credit1
and credit2, and it is consistent that hyperthreading yields the worst
performance. It varies between platforms but it's around 10-40%
performance hit with hyperthread on. This test we do is a very CPU
intensive test where we heavily oversubscribe the system. But I don't
think it would be all that unusual to run into such a setup in the
real world from time-to-time.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-25 15:02       ` Jan Beulich
@ 2018-10-25 16:29         ` Andrew Cooper
  2018-10-25 16:43           ` George Dunlap
  2018-10-26  9:16           ` Jan Beulich
  0 siblings, 2 replies; 63+ messages in thread
From: Andrew Cooper @ 2018-10-25 16:29 UTC (permalink / raw)
  To: Jan Beulich, george.dunlap
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, uwed,
	Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, Wei Liu, George Dunlap, Xen-devel List,
	Daniel Kiper, David Woodhouse, Roger Pau Monne

On 25/10/18 16:02, Jan Beulich wrote:
>>>> On 25.10.18 at 16:56, <george.dunlap@citrix.com> wrote:
>> On 10/25/2018 03:50 PM, Jan Beulich wrote:
>>>>>> On 22.10.18 at 16:55, <wei.liu2@citrix.com> wrote:
>>>> On Thu, Oct 18, 2018 at 06:46:22PM +0100, Andrew Cooper wrote:
>>>>> An easy first step here is to remove Xen's directmap, which will mean
>>>>> that guests general RAM isn't mapped by default into Xen's address
>>>>> space.  This will come with some performance hit, as the
>>>>> map_domain_page() infrastructure will now have to actually
>>>>> create/destroy mappings, but removing the directmap will cause an
>>>>> improvement for non-speculative security as well (No possibility of
>>>>> ret2dir as an exploit technique).
>>>> I have looked into making the "separate xenheap domheap with partial
>>>> direct map" mode (see common/page_alloc.c) work but found it not as
>>>> straight forward as it should've been.
>>>>
>>>> Before I spend more time on this, I would like some opinions on if there
>>>> is other approach which might be more useful than that mode.
>>> How would such a split heap model help with L1TF, where the
>>> guest specifies host physical addresses in its vulnerable page
>>> table entries
>> I don't think it would.
>>
>>> (and hence could spy at xenheap but - due to not
>>> being mapped - not domheap)?
>> Er, didn't follow this bit -- if L1TF is related to host physical
>> addresses, how does having a virtual mapping in Xen affect things in any
>> way?
> Hmm, indeed. Scratch that part.

There seems to be quite a bit of confusion in these replies.

To exploit L1TF, the data in question has to be present in the L1 cache
when the attack is performed.

In practice, an attacker has to arrange for target data to be resident
in the L1 cache.  One way it can do this when HT is enabled is via a
cache-load gadget such as the first half of an SP1 attack on the other
hyperthread.  A different way mechanism is to try and cause Xen to
speculatively access a piece of data, and have the hardware prefetch
bring it into the cache.

Everything which is virtually mapped in Xen is potentially vulnerable,
and the goal of the "secret-free Xen" is to make sure that in the
context of one vcpu pulling off an attack like this, there is no
interesting data which can be exfiltrated.

A single xenheap model means that everything allocated with
alloc_xenheap_page() (e.g. struct domain, struct vcpu, pcpu stacks) are
potentially exposed to all domains.

A split xenheap model means that data pertaining to other guests isn't
mapped in the context of this vcpu, so cannot be brought into the cache.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-25 16:29         ` Andrew Cooper
@ 2018-10-25 16:43           ` George Dunlap
  2018-10-25 16:50             ` Andrew Cooper
  2018-10-26  9:16           ` Jan Beulich
  1 sibling, 1 reply; 63+ messages in thread
From: George Dunlap @ 2018-10-25 16:43 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, uwed,
	Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, Wei Liu, George Dunlap, Xen-devel List,
	Daniel Kiper, David Woodhouse, Roger Pau Monne

On 10/25/2018 05:29 PM, Andrew Cooper wrote:
> On 25/10/18 16:02, Jan Beulich wrote:
>>>>> On 25.10.18 at 16:56, <george.dunlap@citrix.com> wrote:
>>> On 10/25/2018 03:50 PM, Jan Beulich wrote:
>>>>>>> On 22.10.18 at 16:55, <wei.liu2@citrix.com> wrote:
>>>>> On Thu, Oct 18, 2018 at 06:46:22PM +0100, Andrew Cooper wrote:
>>>>>> An easy first step here is to remove Xen's directmap, which will mean
>>>>>> that guests general RAM isn't mapped by default into Xen's address
>>>>>> space.  This will come with some performance hit, as the
>>>>>> map_domain_page() infrastructure will now have to actually
>>>>>> create/destroy mappings, but removing the directmap will cause an
>>>>>> improvement for non-speculative security as well (No possibility of
>>>>>> ret2dir as an exploit technique).
>>>>> I have looked into making the "separate xenheap domheap with partial
>>>>> direct map" mode (see common/page_alloc.c) work but found it not as
>>>>> straight forward as it should've been.
>>>>>
>>>>> Before I spend more time on this, I would like some opinions on if there
>>>>> is other approach which might be more useful than that mode.
>>>> How would such a split heap model help with L1TF, where the
>>>> guest specifies host physical addresses in its vulnerable page
>>>> table entries
>>> I don't think it would.
>>>
>>>> (and hence could spy at xenheap but - due to not
>>>> being mapped - not domheap)?
>>> Er, didn't follow this bit -- if L1TF is related to host physical
>>> addresses, how does having a virtual mapping in Xen affect things in any
>>> way?
>> Hmm, indeed. Scratch that part.
> 
> There seems to be quite a bit of confusion in these replies.
> 
> To exploit L1TF, the data in question has to be present in the L1 cache
> when the attack is performed.
> 
> In practice, an attacker has to arrange for target data to be resident
> in the L1 cache.  One way it can do this when HT is enabled is via a
> cache-load gadget such as the first half of an SP1 attack on the other
> hyperthread.  A different way mechanism is to try and cause Xen to
> speculatively access a piece of data, and have the hardware prefetch
> bring it into the cache.

Right -- so a split xen/domheap model doesn't prevent L1TF attacks, but
it does make L1TF much harder to pull off, because it now only works if
you can manage to get onto the same core as the victim, after the victim
has accessed the data you want.

So it would reduce the risk of L1TF significantly, but not enough (I
think) that we could recommend disabling other mitigations.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-25 16:43           ` George Dunlap
@ 2018-10-25 16:50             ` Andrew Cooper
  2018-10-25 17:07               ` George Dunlap
  0 siblings, 1 reply; 63+ messages in thread
From: Andrew Cooper @ 2018-10-25 16:50 UTC (permalink / raw)
  To: George Dunlap, Jan Beulich
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, uwed,
	Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, Wei Liu, George Dunlap, Xen-devel List,
	Daniel Kiper, David Woodhouse, Roger Pau Monne

On 25/10/18 17:43, George Dunlap wrote:
> On 10/25/2018 05:29 PM, Andrew Cooper wrote:
>> On 25/10/18 16:02, Jan Beulich wrote:
>>>>>> On 25.10.18 at 16:56, <george.dunlap@citrix.com> wrote:
>>>> On 10/25/2018 03:50 PM, Jan Beulich wrote:
>>>>>>>> On 22.10.18 at 16:55, <wei.liu2@citrix.com> wrote:
>>>>>> On Thu, Oct 18, 2018 at 06:46:22PM +0100, Andrew Cooper wrote:
>>>>>>> An easy first step here is to remove Xen's directmap, which will mean
>>>>>>> that guests general RAM isn't mapped by default into Xen's address
>>>>>>> space.  This will come with some performance hit, as the
>>>>>>> map_domain_page() infrastructure will now have to actually
>>>>>>> create/destroy mappings, but removing the directmap will cause an
>>>>>>> improvement for non-speculative security as well (No possibility of
>>>>>>> ret2dir as an exploit technique).
>>>>>> I have looked into making the "separate xenheap domheap with partial
>>>>>> direct map" mode (see common/page_alloc.c) work but found it not as
>>>>>> straight forward as it should've been.
>>>>>>
>>>>>> Before I spend more time on this, I would like some opinions on if there
>>>>>> is other approach which might be more useful than that mode.
>>>>> How would such a split heap model help with L1TF, where the
>>>>> guest specifies host physical addresses in its vulnerable page
>>>>> table entries
>>>> I don't think it would.
>>>>
>>>>> (and hence could spy at xenheap but - due to not
>>>>> being mapped - not domheap)?
>>>> Er, didn't follow this bit -- if L1TF is related to host physical
>>>> addresses, how does having a virtual mapping in Xen affect things in any
>>>> way?
>>> Hmm, indeed. Scratch that part.
>> There seems to be quite a bit of confusion in these replies.
>>
>> To exploit L1TF, the data in question has to be present in the L1 cache
>> when the attack is performed.
>>
>> In practice, an attacker has to arrange for target data to be resident
>> in the L1 cache.  One way it can do this when HT is enabled is via a
>> cache-load gadget such as the first half of an SP1 attack on the other
>> hyperthread.  A different way mechanism is to try and cause Xen to
>> speculatively access a piece of data, and have the hardware prefetch
>> bring it into the cache.
> Right -- so a split xen/domheap model doesn't prevent L1TF attacks, but
> it does make L1TF much harder to pull off, because it now only works if
> you can manage to get onto the same core as the victim, after the victim
> has accessed the data you want.
>
> So it would reduce the risk of L1TF significantly, but not enough (I
> think) that we could recommend disabling other mitigations.

Correct.  All of these suggestions are for increased defence in depth. 
They are not replacements for the existing mitigations.

From a practical point of view, until people work out how to
comprehensively solve SP1, reducing the quantity of mapped data is the
only practical defence that an OS/Hypervisor has.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-24 15:24 ` Tamas K Lengyel
  2018-10-25 16:01   ` Dario Faggioli
@ 2018-10-25 16:55   ` Andrew Cooper
  2018-10-25 17:01     ` George Dunlap
  1 sibling, 1 reply; 63+ messages in thread
From: Andrew Cooper @ 2018-10-25 16:55 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: mpohlack, Julien Grall, Jan Beulich, joao.m.martins,
	Stefano Stabellini, Daniel Kiper,
	Marek Marczykowski-Górecki, aliguori, uwed, Lars Kurth,
	Konrad Rzeszutek Wilk, ross.philipson, Dario Faggioli, msw,
	Boris Ostrovsky, JGross, sergey.dyasli, Wei Liu, George Dunlap,
	Xen-devel, mdontu, dwmw, Roger Pau Monné

On 24/10/18 16:24, Tamas K Lengyel wrote:
>> A solution to this issue was proposed, whereby Xen synchronises siblings
>> on vmexit/entry, so we are never executing code in two different
>> privilege levels.  Getting this working would make it safe to continue
>> using hyperthreading even in the presence of L1TF.  Obviously, its going
>> to come in perf hit, but compared to disabling hyperthreading, all its
>> got to do is beat a 60% perf hit to make it the preferable option for
>> making your system L1TF-proof.
> Could you shed some light what tests were done where that 60%
> performance hit was observed? We have performed intensive stress-tests
> to confirm this but according to our findings turning off
> hyper-threading is actually improving performance on all machines we
> tested thus far.

Aggregate inter and intra host disk and network throughput, which is a
reasonable approximation of a load of webserver VM's on a single
physical server.  Small packet IO was hit worst, as it has a very high
vcpu context switch rate between dom0 and domU.  Disabling HT means you
have half the number of logical cores to schedule on, which doubles the
mean time to next timeslice.

In principle, for a fully optimised workload, HT gets you ~30% extra due
to increased utilisation of the pipeline functional units.  Some
resources are statically partitioned, while some are competitively
shared, and its now been well proven that actions on one thread can have
a large effect on others.

Two arbitrary vcpus are not an optimised workload.  If the perf
improvement you get from not competing in the pipeline is greater than
the perf loss from Xen's reduced capability to schedule, then disabling
HT would be an improvement.  I can certainly believe that this might be
the case for Qubes style workloads where you are probably not very
overprovisioned, and you probably don't have long running IO and CPU
bound tasks in the VMs.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-25 16:55   ` Andrew Cooper
@ 2018-10-25 17:01     ` George Dunlap
  2018-10-25 17:35       ` Tamas K Lengyel
  0 siblings, 1 reply; 63+ messages in thread
From: George Dunlap @ 2018-10-25 17:01 UTC (permalink / raw)
  To: Andrew Cooper, Tamas K Lengyel
  Cc: mpohlack, Julien Grall, Jan Beulich, joao.m.martins,
	Stefano Stabellini, Daniel Kiper,
	Marek Marczykowski-Górecki, aliguori, uwed, Lars Kurth,
	Konrad Rzeszutek Wilk, ross.philipson, Dario Faggioli, msw,
	Boris Ostrovsky, JGross, sergey.dyasli, Wei Liu, George Dunlap,
	Xen-devel, mdontu, dwmw, Roger Pau Monné

On 10/25/2018 05:55 PM, Andrew Cooper wrote:
> On 24/10/18 16:24, Tamas K Lengyel wrote:
>>> A solution to this issue was proposed, whereby Xen synchronises siblings
>>> on vmexit/entry, so we are never executing code in two different
>>> privilege levels.  Getting this working would make it safe to continue
>>> using hyperthreading even in the presence of L1TF.  Obviously, its going
>>> to come in perf hit, but compared to disabling hyperthreading, all its
>>> got to do is beat a 60% perf hit to make it the preferable option for
>>> making your system L1TF-proof.
>> Could you shed some light what tests were done where that 60%
>> performance hit was observed? We have performed intensive stress-tests
>> to confirm this but according to our findings turning off
>> hyper-threading is actually improving performance on all machines we
>> tested thus far.
> 
> Aggregate inter and intra host disk and network throughput, which is a
> reasonable approximation of a load of webserver VM's on a single
> physical server.  Small packet IO was hit worst, as it has a very high
> vcpu context switch rate between dom0 and domU.  Disabling HT means you
> have half the number of logical cores to schedule on, which doubles the
> mean time to next timeslice.
> 
> In principle, for a fully optimised workload, HT gets you ~30% extra due
> to increased utilisation of the pipeline functional units.  Some
> resources are statically partitioned, while some are competitively
> shared, and its now been well proven that actions on one thread can have
> a large effect on others.
> 
> Two arbitrary vcpus are not an optimised workload.  If the perf
> improvement you get from not competing in the pipeline is greater than
> the perf loss from Xen's reduced capability to schedule, then disabling
> HT would be an improvement.  I can certainly believe that this might be
> the case for Qubes style workloads where you are probably not very
> overprovisioned, and you probably don't have long running IO and CPU
> bound tasks in the VMs.

As another data point, I think it was MSCI who said they always disabled
hyperthreading, because they also found that their workloads ran slower
with HT than without.  Presumably they were doing massive number
crunching, such that each thread was waiting on the ALU a significant
portion of the time anyway; at which point the superscalar scheduling
and/or reduction in cache efficiency would have brought performance from
"no benefit" down to "negative benefit".

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-25 16:50             ` Andrew Cooper
@ 2018-10-25 17:07               ` George Dunlap
  0 siblings, 0 replies; 63+ messages in thread
From: George Dunlap @ 2018-10-25 17:07 UTC (permalink / raw)
  To: Andrew Cooper, Jan Beulich
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, uwed,
	Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, Wei Liu, George Dunlap, Xen-devel List,
	Daniel Kiper, David Woodhouse, Roger Pau Monne

On 10/25/2018 05:50 PM, Andrew Cooper wrote:
> On 25/10/18 17:43, George Dunlap wrote:
>> On 10/25/2018 05:29 PM, Andrew Cooper wrote:
>>> On 25/10/18 16:02, Jan Beulich wrote:
>>>>>>> On 25.10.18 at 16:56, <george.dunlap@citrix.com> wrote:
>>>>> On 10/25/2018 03:50 PM, Jan Beulich wrote:
>>>>>>>>> On 22.10.18 at 16:55, <wei.liu2@citrix.com> wrote:
>>>>>>> On Thu, Oct 18, 2018 at 06:46:22PM +0100, Andrew Cooper wrote:
>>>>>>>> An easy first step here is to remove Xen's directmap, which will mean
>>>>>>>> that guests general RAM isn't mapped by default into Xen's address
>>>>>>>> space.  This will come with some performance hit, as the
>>>>>>>> map_domain_page() infrastructure will now have to actually
>>>>>>>> create/destroy mappings, but removing the directmap will cause an
>>>>>>>> improvement for non-speculative security as well (No possibility of
>>>>>>>> ret2dir as an exploit technique).
>>>>>>> I have looked into making the "separate xenheap domheap with partial
>>>>>>> direct map" mode (see common/page_alloc.c) work but found it not as
>>>>>>> straight forward as it should've been.
>>>>>>>
>>>>>>> Before I spend more time on this, I would like some opinions on if there
>>>>>>> is other approach which might be more useful than that mode.
>>>>>> How would such a split heap model help with L1TF, where the
>>>>>> guest specifies host physical addresses in its vulnerable page
>>>>>> table entries
>>>>> I don't think it would.
>>>>>
>>>>>> (and hence could spy at xenheap but - due to not
>>>>>> being mapped - not domheap)?
>>>>> Er, didn't follow this bit -- if L1TF is related to host physical
>>>>> addresses, how does having a virtual mapping in Xen affect things in any
>>>>> way?
>>>> Hmm, indeed. Scratch that part.
>>> There seems to be quite a bit of confusion in these replies.
>>>
>>> To exploit L1TF, the data in question has to be present in the L1 cache
>>> when the attack is performed.
>>>
>>> In practice, an attacker has to arrange for target data to be resident
>>> in the L1 cache.  One way it can do this when HT is enabled is via a
>>> cache-load gadget such as the first half of an SP1 attack on the other
>>> hyperthread.  A different way mechanism is to try and cause Xen to
>>> speculatively access a piece of data, and have the hardware prefetch
>>> bring it into the cache.
>> Right -- so a split xen/domheap model doesn't prevent L1TF attacks, but
>> it does make L1TF much harder to pull off, because it now only works if
>> you can manage to get onto the same core as the victim, after the victim
>> has accessed the data you want.
>>
>> So it would reduce the risk of L1TF significantly, but not enough (I
>> think) that we could recommend disabling other mitigations.
> 
> Correct.  All of these suggestions are for increased defence in depth. 
> They are not replacements for the existing mitigations.

But it could be a mitigation for, say, Meltdown, yes?  I'm trying to
remember the details; but wouldn't a "secret-free Xen" mean that
disabling XPTI entirely for 64-bit PV guests would be a reasonable
decision (even if many people left it enabled 'just in case')?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-25 16:25     ` Tamas K Lengyel
@ 2018-10-25 17:23       ` Dario Faggioli
  2018-10-25 17:29         ` Tamas K Lengyel
  0 siblings, 1 reply; 63+ messages in thread
From: Dario Faggioli @ 2018-10-25 17:23 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: mpohlack, Julien Grall, Jan Beulich, joao.m.martins,
	Stefano Stabellini, Daniel Kiper,
	Marek Marczykowski-Górecki, aliguori, uwed, Lars Kurth,
	Konrad Rzeszutek Wilk, ross.philipson, Matt Wilson,
	Boris Ostrovsky, JGross, sergey.dyasli, Wei Liu, George Dunlap,
	Andrew Cooper, Xen-devel, mdontu, dwmw, Roger Pau Monné


[-- Attachment #1.1: Type: text/plain, Size: 2591 bytes --]

On Thu, 2018-10-25 at 10:25 -0600, Tamas K Lengyel wrote:
> On Thu, Oct 25, 2018 at 10:01 AM Dario Faggioli <dfaggioli@suse.com>
> wrote:
> > 
> > Which is indeed very interesting. But, as we're discussing in the
> > other
> > thread, I would, in your case, do some more measurements, varying
> > the
> > configuration of the system, in order to be absolutely sure you are
> > not
> > hitting some bug or anomaly.
> 
> Sure, I would be happy to repeat tests that were done in the past to
> see whether they are still holding. We have run this test with Xen
> 4.10, 4.11 and 4.12-unstable on laptops and desktops, using credit1
> and credit2, and it is consistent that hyperthreading yields the
> worst
> performance. 
>
So, just to be clear, I'm not saying it's impossible to find a workload
for which HT is detrimental. Quite the opposite. And these benchmarks
you're running might well fall into that category.

I'm just suggesting to double check that. :-)

> It varies between platforms but it's around 10-40%
> performance hit with hyperthread on. This test we do is a very CPU
> intensive test where we heavily oversubscribe the system. But I don't
> think it would be all that unusual to run into such a setup in the
> real world from time-to-time.
> 
Ah, ok, so you're _heavily_ oversubscribing...

So, I don't think that an heavily oversubscribed host, where all vCPUs
would want to run 100% CPU intensive activities --and this not being
some transient situation-- is that common. And for the ones for which
it is, there is not much we can do, hyperthreading or not.

In any case, hyperthreading works best when the workload is mixed,
where it helps making sure that IO-bound tasks have enough chances to
file a lot of IO requests, without conflicting too much with the CPU-
bound tasks doing their number/logic crunching.

Having _everyone_ wanting to do actual stuff on the CPUs is, IMO, one
of the worst workloads for hyperthreading, and it is in fact a workload
where I've always seen it having the least beneficial effect on
performance. I guess it's possible that, in your case, it's actually
really doing more harm than good.

It's an interesting data point, but I wouldn't use a workload like that
to measure the benefit, or the impact, of an SMT related change.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-25 17:23       ` Dario Faggioli
@ 2018-10-25 17:29         ` Tamas K Lengyel
  2018-10-26  7:31           ` Dario Faggioli
  0 siblings, 1 reply; 63+ messages in thread
From: Tamas K Lengyel @ 2018-10-25 17:29 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: mpohlack, Julien Grall, Jan Beulich, joao.m.martins,
	Stefano Stabellini, Daniel Kiper,
	Marek Marczykowski-Górecki, aliguori, uwed, Lars Kurth,
	Konrad Rzeszutek Wilk, ross.philipson, Matt Wilson,
	Boris Ostrovsky, JGross, sergey.dyasli, Wei Liu, George Dunlap,
	Andrew Cooper, Xen-devel, mdontu, dwmw, Roger Pau Monné

On Thu, Oct 25, 2018 at 11:23 AM Dario Faggioli <dfaggioli@suse.com> wrote:
>
> On Thu, 2018-10-25 at 10:25 -0600, Tamas K Lengyel wrote:
> > On Thu, Oct 25, 2018 at 10:01 AM Dario Faggioli <dfaggioli@suse.com>
> > wrote:
> > >
> > > Which is indeed very interesting. But, as we're discussing in the
> > > other
> > > thread, I would, in your case, do some more measurements, varying
> > > the
> > > configuration of the system, in order to be absolutely sure you are
> > > not
> > > hitting some bug or anomaly.
> >
> > Sure, I would be happy to repeat tests that were done in the past to
> > see whether they are still holding. We have run this test with Xen
> > 4.10, 4.11 and 4.12-unstable on laptops and desktops, using credit1
> > and credit2, and it is consistent that hyperthreading yields the
> > worst
> > performance.
> >
> So, just to be clear, I'm not saying it's impossible to find a workload
> for which HT is detrimental. Quite the opposite. And these benchmarks
> you're running might well fall into that category.
>
> I'm just suggesting to double check that. :-)
>
> > It varies between platforms but it's around 10-40%
> > performance hit with hyperthread on. This test we do is a very CPU
> > intensive test where we heavily oversubscribe the system. But I don't
> > think it would be all that unusual to run into such a setup in the
> > real world from time-to-time.
> >
> Ah, ok, so you're _heavily_ oversubscribing...
>
> So, I don't think that an heavily oversubscribed host, where all vCPUs
> would want to run 100% CPU intensive activities --and this not being
> some transient situation-- is that common. And for the ones for which
> it is, there is not much we can do, hyperthreading or not.
>
> In any case, hyperthreading works best when the workload is mixed,
> where it helps making sure that IO-bound tasks have enough chances to
> file a lot of IO requests, without conflicting too much with the CPU-
> bound tasks doing their number/logic crunching.
>
> Having _everyone_ wanting to do actual stuff on the CPUs is, IMO, one
> of the worst workloads for hyperthreading, and it is in fact a workload
> where I've always seen it having the least beneficial effect on
> performance. I guess it's possible that, in your case, it's actually
> really doing more harm than good.
>
> It's an interesting data point, but I wouldn't use a workload like that
> to measure the benefit, or the impact, of an SMT related change.

Thanks, and indeed this test is the worst-case scenario for
hyperthreading, that's was our goal. While a typical work-load may not
be similar, it is a possible one for the system we are concerned
about. So if at any given time the benefit of hyperthreading ranges
between say +30% and -30% and we can't predict the workload or
optimize it, it is looking like a safe bet to just disable
hyperthreading. Would you agree?

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-25 17:01     ` George Dunlap
@ 2018-10-25 17:35       ` Tamas K Lengyel
  2018-10-25 17:43         ` Andrew Cooper
  0 siblings, 1 reply; 63+ messages in thread
From: Tamas K Lengyel @ 2018-10-25 17:35 UTC (permalink / raw)
  To: George Dunlap
  Cc: mpohlack, Julien Grall, Jan Beulich, joao.m.martins,
	Stefano Stabellini, Daniel Kiper,
	Marek Marczykowski-Górecki, aliguori, uwed, Lars Kurth,
	Konrad Rzeszutek Wilk, ross.philipson, Dario Faggioli,
	Matt Wilson, Boris Ostrovsky, JGross, sergey.dyasli, Wei Liu,
	George Dunlap, Andrew Cooper, Xen-devel, mdontu, dwmw,
	Roger Pau Monné

On Thu, Oct 25, 2018 at 11:02 AM George Dunlap <george.dunlap@citrix.com> wrote:
>
> On 10/25/2018 05:55 PM, Andrew Cooper wrote:
> > On 24/10/18 16:24, Tamas K Lengyel wrote:
> >>> A solution to this issue was proposed, whereby Xen synchronises siblings
> >>> on vmexit/entry, so we are never executing code in two different
> >>> privilege levels.  Getting this working would make it safe to continue
> >>> using hyperthreading even in the presence of L1TF.  Obviously, its going
> >>> to come in perf hit, but compared to disabling hyperthreading, all its
> >>> got to do is beat a 60% perf hit to make it the preferable option for
> >>> making your system L1TF-proof.
> >> Could you shed some light what tests were done where that 60%
> >> performance hit was observed? We have performed intensive stress-tests
> >> to confirm this but according to our findings turning off
> >> hyper-threading is actually improving performance on all machines we
> >> tested thus far.
> >
> > Aggregate inter and intra host disk and network throughput, which is a
> > reasonable approximation of a load of webserver VM's on a single
> > physical server.  Small packet IO was hit worst, as it has a very high
> > vcpu context switch rate between dom0 and domU.  Disabling HT means you
> > have half the number of logical cores to schedule on, which doubles the
> > mean time to next timeslice.
> >
> > In principle, for a fully optimised workload, HT gets you ~30% extra due
> > to increased utilisation of the pipeline functional units.  Some
> > resources are statically partitioned, while some are competitively
> > shared, and its now been well proven that actions on one thread can have
> > a large effect on others.
> >
> > Two arbitrary vcpus are not an optimised workload.  If the perf
> > improvement you get from not competing in the pipeline is greater than
> > the perf loss from Xen's reduced capability to schedule, then disabling
> > HT would be an improvement.  I can certainly believe that this might be
> > the case for Qubes style workloads where you are probably not very
> > overprovisioned, and you probably don't have long running IO and CPU
> > bound tasks in the VMs.
>
> As another data point, I think it was MSCI who said they always disabled
> hyperthreading, because they also found that their workloads ran slower
> with HT than without.  Presumably they were doing massive number
> crunching, such that each thread was waiting on the ALU a significant
> portion of the time anyway; at which point the superscalar scheduling
> and/or reduction in cache efficiency would have brought performance from
> "no benefit" down to "negative benefit".
>

Thanks for the insights. Indeed, we are primarily concerned with
performance of Qubes-style workloads which may range from
no-oversubscription to heavily oversubscribed. It's not a workload we
can predict or optimize before-hand, so we are looking for a default
that would be 1) safe and 2) performant in the most general case
possible.

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-25 17:35       ` Tamas K Lengyel
@ 2018-10-25 17:43         ` Andrew Cooper
  2018-10-25 17:58           ` Tamas K Lengyel
  0 siblings, 1 reply; 63+ messages in thread
From: Andrew Cooper @ 2018-10-25 17:43 UTC (permalink / raw)
  To: Tamas K Lengyel, George Dunlap
  Cc: mpohlack, Julien Grall, Jan Beulich, joao.m.martins,
	Stefano Stabellini, Daniel Kiper,
	Marek Marczykowski-Górecki, aliguori, uwed, Lars Kurth,
	Konrad Rzeszutek Wilk, ross.philipson, Dario Faggioli,
	Matt Wilson, Boris Ostrovsky, JGross, sergey.dyasli, Wei Liu,
	George Dunlap, Xen-devel, mdontu, dwmw, Roger Pau Monné

On 25/10/18 18:35, Tamas K Lengyel wrote:
> On Thu, Oct 25, 2018 at 11:02 AM George Dunlap <george.dunlap@citrix.com> wrote:
>> On 10/25/2018 05:55 PM, Andrew Cooper wrote:
>>> On 24/10/18 16:24, Tamas K Lengyel wrote:
>>>>> A solution to this issue was proposed, whereby Xen synchronises siblings
>>>>> on vmexit/entry, so we are never executing code in two different
>>>>> privilege levels.  Getting this working would make it safe to continue
>>>>> using hyperthreading even in the presence of L1TF.  Obviously, its going
>>>>> to come in perf hit, but compared to disabling hyperthreading, all its
>>>>> got to do is beat a 60% perf hit to make it the preferable option for
>>>>> making your system L1TF-proof.
>>>> Could you shed some light what tests were done where that 60%
>>>> performance hit was observed? We have performed intensive stress-tests
>>>> to confirm this but according to our findings turning off
>>>> hyper-threading is actually improving performance on all machines we
>>>> tested thus far.
>>> Aggregate inter and intra host disk and network throughput, which is a
>>> reasonable approximation of a load of webserver VM's on a single
>>> physical server.  Small packet IO was hit worst, as it has a very high
>>> vcpu context switch rate between dom0 and domU.  Disabling HT means you
>>> have half the number of logical cores to schedule on, which doubles the
>>> mean time to next timeslice.
>>>
>>> In principle, for a fully optimised workload, HT gets you ~30% extra due
>>> to increased utilisation of the pipeline functional units.  Some
>>> resources are statically partitioned, while some are competitively
>>> shared, and its now been well proven that actions on one thread can have
>>> a large effect on others.
>>>
>>> Two arbitrary vcpus are not an optimised workload.  If the perf
>>> improvement you get from not competing in the pipeline is greater than
>>> the perf loss from Xen's reduced capability to schedule, then disabling
>>> HT would be an improvement.  I can certainly believe that this might be
>>> the case for Qubes style workloads where you are probably not very
>>> overprovisioned, and you probably don't have long running IO and CPU
>>> bound tasks in the VMs.
>> As another data point, I think it was MSCI who said they always disabled
>> hyperthreading, because they also found that their workloads ran slower
>> with HT than without.  Presumably they were doing massive number
>> crunching, such that each thread was waiting on the ALU a significant
>> portion of the time anyway; at which point the superscalar scheduling
>> and/or reduction in cache efficiency would have brought performance from
>> "no benefit" down to "negative benefit".
>>
> Thanks for the insights. Indeed, we are primarily concerned with
> performance of Qubes-style workloads which may range from
> no-oversubscription to heavily oversubscribed. It's not a workload we
> can predict or optimize before-hand, so we are looking for a default
> that would be 1) safe and 2) performant in the most general case
> possible.

So long as you've got the XSA-273 patches, you should be able to park
and re-reactivate hyperthreads using `xen-hptool cpu-{online,offline} $CPU`.

You should be able to effectively change hyperthreading configuration at
runtime.  It's not quite the same as changing it in the BIOS, but from a
competition of pipeline resources, it should be good enough.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-25 17:43         ` Andrew Cooper
@ 2018-10-25 17:58           ` Tamas K Lengyel
  2018-10-25 18:13             ` Andrew Cooper
  0 siblings, 1 reply; 63+ messages in thread
From: Tamas K Lengyel @ 2018-10-25 17:58 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: mpohlack, Julien Grall, Jan Beulich, joao.m.martins,
	Stefano Stabellini, Daniel Kiper,
	Marek Marczykowski-Górecki, aliguori, uwed, Lars Kurth,
	Konrad Rzeszutek Wilk, ross.philipson, George Dunlap,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, JGross,
	sergey.dyasli, Wei Liu, George Dunlap, Xen-devel, mdontu, dwmw,
	Roger Pau Monné

On Thu, Oct 25, 2018 at 11:43 AM Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>
> On 25/10/18 18:35, Tamas K Lengyel wrote:
> > On Thu, Oct 25, 2018 at 11:02 AM George Dunlap <george.dunlap@citrix.com> wrote:
> >> On 10/25/2018 05:55 PM, Andrew Cooper wrote:
> >>> On 24/10/18 16:24, Tamas K Lengyel wrote:
> >>>>> A solution to this issue was proposed, whereby Xen synchronises siblings
> >>>>> on vmexit/entry, so we are never executing code in two different
> >>>>> privilege levels.  Getting this working would make it safe to continue
> >>>>> using hyperthreading even in the presence of L1TF.  Obviously, its going
> >>>>> to come in perf hit, but compared to disabling hyperthreading, all its
> >>>>> got to do is beat a 60% perf hit to make it the preferable option for
> >>>>> making your system L1TF-proof.
> >>>> Could you shed some light what tests were done where that 60%
> >>>> performance hit was observed? We have performed intensive stress-tests
> >>>> to confirm this but according to our findings turning off
> >>>> hyper-threading is actually improving performance on all machines we
> >>>> tested thus far.
> >>> Aggregate inter and intra host disk and network throughput, which is a
> >>> reasonable approximation of a load of webserver VM's on a single
> >>> physical server.  Small packet IO was hit worst, as it has a very high
> >>> vcpu context switch rate between dom0 and domU.  Disabling HT means you
> >>> have half the number of logical cores to schedule on, which doubles the
> >>> mean time to next timeslice.
> >>>
> >>> In principle, for a fully optimised workload, HT gets you ~30% extra due
> >>> to increased utilisation of the pipeline functional units.  Some
> >>> resources are statically partitioned, while some are competitively
> >>> shared, and its now been well proven that actions on one thread can have
> >>> a large effect on others.
> >>>
> >>> Two arbitrary vcpus are not an optimised workload.  If the perf
> >>> improvement you get from not competing in the pipeline is greater than
> >>> the perf loss from Xen's reduced capability to schedule, then disabling
> >>> HT would be an improvement.  I can certainly believe that this might be
> >>> the case for Qubes style workloads where you are probably not very
> >>> overprovisioned, and you probably don't have long running IO and CPU
> >>> bound tasks in the VMs.
> >> As another data point, I think it was MSCI who said they always disabled
> >> hyperthreading, because they also found that their workloads ran slower
> >> with HT than without.  Presumably they were doing massive number
> >> crunching, such that each thread was waiting on the ALU a significant
> >> portion of the time anyway; at which point the superscalar scheduling
> >> and/or reduction in cache efficiency would have brought performance from
> >> "no benefit" down to "negative benefit".
> >>
> > Thanks for the insights. Indeed, we are primarily concerned with
> > performance of Qubes-style workloads which may range from
> > no-oversubscription to heavily oversubscribed. It's not a workload we
> > can predict or optimize before-hand, so we are looking for a default
> > that would be 1) safe and 2) performant in the most general case
> > possible.
>
> So long as you've got the XSA-273 patches, you should be able to park
> and re-reactivate hyperthreads using `xen-hptool cpu-{online,offline} $CPU`.
>
> You should be able to effectively change hyperthreading configuration at
> runtime.  It's not quite the same as changing it in the BIOS, but from a
> competition of pipeline resources, it should be good enough.
>

Thanks, indeed that is a handy tool to have. We often can't disable
hyperthreading in the BIOS anyway because most BIOS' don't allow you
to do that when TXT is used. That said, with this tool we still
require some way to determine when to do parking/reactivation of
hyperthreads. We could certainly park hyperthreads when we see the
system is being oversubscribed in terms of number of vCPUs being
active, but for real optimization we would have to understand the
workloads running within the VMs if I understand correctly?

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-25 17:58           ` Tamas K Lengyel
@ 2018-10-25 18:13             ` Andrew Cooper
  2018-10-25 18:35               ` Tamas K Lengyel
  2018-10-26 10:11               ` George Dunlap
  0 siblings, 2 replies; 63+ messages in thread
From: Andrew Cooper @ 2018-10-25 18:13 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: mpohlack, Julien Grall, Jan Beulich, joao.m.martins,
	Stefano Stabellini, Daniel Kiper,
	Marek Marczykowski-Górecki, aliguori, uwed, Lars Kurth,
	Konrad Rzeszutek Wilk, ross.philipson, George Dunlap,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, JGross,
	sergey.dyasli, Wei Liu, George Dunlap, Xen-devel, mdontu, dwmw,
	Roger Pau Monné

On 25/10/18 18:58, Tamas K Lengyel wrote:
> On Thu, Oct 25, 2018 at 11:43 AM Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>> On 25/10/18 18:35, Tamas K Lengyel wrote:
>>> On Thu, Oct 25, 2018 at 11:02 AM George Dunlap <george.dunlap@citrix.com> wrote:
>>>> On 10/25/2018 05:55 PM, Andrew Cooper wrote:
>>>>> On 24/10/18 16:24, Tamas K Lengyel wrote:
>>>>>>> A solution to this issue was proposed, whereby Xen synchronises siblings
>>>>>>> on vmexit/entry, so we are never executing code in two different
>>>>>>> privilege levels.  Getting this working would make it safe to continue
>>>>>>> using hyperthreading even in the presence of L1TF.  Obviously, its going
>>>>>>> to come in perf hit, but compared to disabling hyperthreading, all its
>>>>>>> got to do is beat a 60% perf hit to make it the preferable option for
>>>>>>> making your system L1TF-proof.
>>>>>> Could you shed some light what tests were done where that 60%
>>>>>> performance hit was observed? We have performed intensive stress-tests
>>>>>> to confirm this but according to our findings turning off
>>>>>> hyper-threading is actually improving performance on all machines we
>>>>>> tested thus far.
>>>>> Aggregate inter and intra host disk and network throughput, which is a
>>>>> reasonable approximation of a load of webserver VM's on a single
>>>>> physical server.  Small packet IO was hit worst, as it has a very high
>>>>> vcpu context switch rate between dom0 and domU.  Disabling HT means you
>>>>> have half the number of logical cores to schedule on, which doubles the
>>>>> mean time to next timeslice.
>>>>>
>>>>> In principle, for a fully optimised workload, HT gets you ~30% extra due
>>>>> to increased utilisation of the pipeline functional units.  Some
>>>>> resources are statically partitioned, while some are competitively
>>>>> shared, and its now been well proven that actions on one thread can have
>>>>> a large effect on others.
>>>>>
>>>>> Two arbitrary vcpus are not an optimised workload.  If the perf
>>>>> improvement you get from not competing in the pipeline is greater than
>>>>> the perf loss from Xen's reduced capability to schedule, then disabling
>>>>> HT would be an improvement.  I can certainly believe that this might be
>>>>> the case for Qubes style workloads where you are probably not very
>>>>> overprovisioned, and you probably don't have long running IO and CPU
>>>>> bound tasks in the VMs.
>>>> As another data point, I think it was MSCI who said they always disabled
>>>> hyperthreading, because they also found that their workloads ran slower
>>>> with HT than without.  Presumably they were doing massive number
>>>> crunching, such that each thread was waiting on the ALU a significant
>>>> portion of the time anyway; at which point the superscalar scheduling
>>>> and/or reduction in cache efficiency would have brought performance from
>>>> "no benefit" down to "negative benefit".
>>>>
>>> Thanks for the insights. Indeed, we are primarily concerned with
>>> performance of Qubes-style workloads which may range from
>>> no-oversubscription to heavily oversubscribed. It's not a workload we
>>> can predict or optimize before-hand, so we are looking for a default
>>> that would be 1) safe and 2) performant in the most general case
>>> possible.
>> So long as you've got the XSA-273 patches, you should be able to park
>> and re-reactivate hyperthreads using `xen-hptool cpu-{online,offline} $CPU`.
>>
>> You should be able to effectively change hyperthreading configuration at
>> runtime.  It's not quite the same as changing it in the BIOS, but from a
>> competition of pipeline resources, it should be good enough.
>>
> Thanks, indeed that is a handy tool to have. We often can't disable
> hyperthreading in the BIOS anyway because most BIOS' don't allow you
> to do that when TXT is used.

Hmm - that's an odd restriction.  I don't immediately see why such a
restriction would be necessary.

> That said, with this tool we still
> require some way to determine when to do parking/reactivation of
> hyperthreads. We could certainly park hyperthreads when we see the
> system is being oversubscribed in terms of number of vCPUs being
> active, but for real optimization we would have to understand the
> workloads running within the VMs if I understand correctly?

TBH, I'd perhaps start with an admin control which lets them switch
between the two modes, and some instructions on how/why they might want
to try switching.

Trying to second-guess the best HT setting automatically is most likely
going to be a lost cause.  It will be system specific as to whether the
same workload is better with or without HT.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-25 18:13             ` Andrew Cooper
@ 2018-10-25 18:35               ` Tamas K Lengyel
  2018-10-25 18:39                 ` Andrew Cooper
  2018-10-26  7:49                 ` Dario Faggioli
  2018-10-26 10:11               ` George Dunlap
  1 sibling, 2 replies; 63+ messages in thread
From: Tamas K Lengyel @ 2018-10-25 18:35 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: mpohlack, Julien Grall, Jan Beulich, joao.m.martins,
	Stefano Stabellini, Daniel Kiper,
	Marek Marczykowski-Górecki, aliguori, uwed, Lars Kurth,
	Konrad Rzeszutek Wilk, ross.philipson, George Dunlap,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, JGross,
	sergey.dyasli, Wei Liu, George Dunlap, Xen-devel, mdontu, dwmw,
	Roger Pau Monné

On Thu, Oct 25, 2018 at 12:13 PM Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>
> On 25/10/18 18:58, Tamas K Lengyel wrote:
> > On Thu, Oct 25, 2018 at 11:43 AM Andrew Cooper
> > <andrew.cooper3@citrix.com> wrote:
> >> On 25/10/18 18:35, Tamas K Lengyel wrote:
> >>> On Thu, Oct 25, 2018 at 11:02 AM George Dunlap <george.dunlap@citrix.com> wrote:
> >>>> On 10/25/2018 05:55 PM, Andrew Cooper wrote:
> >>>>> On 24/10/18 16:24, Tamas K Lengyel wrote:
> >>>>>>> A solution to this issue was proposed, whereby Xen synchronises siblings
> >>>>>>> on vmexit/entry, so we are never executing code in two different
> >>>>>>> privilege levels.  Getting this working would make it safe to continue
> >>>>>>> using hyperthreading even in the presence of L1TF.  Obviously, its going
> >>>>>>> to come in perf hit, but compared to disabling hyperthreading, all its
> >>>>>>> got to do is beat a 60% perf hit to make it the preferable option for
> >>>>>>> making your system L1TF-proof.
> >>>>>> Could you shed some light what tests were done where that 60%
> >>>>>> performance hit was observed? We have performed intensive stress-tests
> >>>>>> to confirm this but according to our findings turning off
> >>>>>> hyper-threading is actually improving performance on all machines we
> >>>>>> tested thus far.
> >>>>> Aggregate inter and intra host disk and network throughput, which is a
> >>>>> reasonable approximation of a load of webserver VM's on a single
> >>>>> physical server.  Small packet IO was hit worst, as it has a very high
> >>>>> vcpu context switch rate between dom0 and domU.  Disabling HT means you
> >>>>> have half the number of logical cores to schedule on, which doubles the
> >>>>> mean time to next timeslice.
> >>>>>
> >>>>> In principle, for a fully optimised workload, HT gets you ~30% extra due
> >>>>> to increased utilisation of the pipeline functional units.  Some
> >>>>> resources are statically partitioned, while some are competitively
> >>>>> shared, and its now been well proven that actions on one thread can have
> >>>>> a large effect on others.
> >>>>>
> >>>>> Two arbitrary vcpus are not an optimised workload.  If the perf
> >>>>> improvement you get from not competing in the pipeline is greater than
> >>>>> the perf loss from Xen's reduced capability to schedule, then disabling
> >>>>> HT would be an improvement.  I can certainly believe that this might be
> >>>>> the case for Qubes style workloads where you are probably not very
> >>>>> overprovisioned, and you probably don't have long running IO and CPU
> >>>>> bound tasks in the VMs.
> >>>> As another data point, I think it was MSCI who said they always disabled
> >>>> hyperthreading, because they also found that their workloads ran slower
> >>>> with HT than without.  Presumably they were doing massive number
> >>>> crunching, such that each thread was waiting on the ALU a significant
> >>>> portion of the time anyway; at which point the superscalar scheduling
> >>>> and/or reduction in cache efficiency would have brought performance from
> >>>> "no benefit" down to "negative benefit".
> >>>>
> >>> Thanks for the insights. Indeed, we are primarily concerned with
> >>> performance of Qubes-style workloads which may range from
> >>> no-oversubscription to heavily oversubscribed. It's not a workload we
> >>> can predict or optimize before-hand, so we are looking for a default
> >>> that would be 1) safe and 2) performant in the most general case
> >>> possible.
> >> So long as you've got the XSA-273 patches, you should be able to park
> >> and re-reactivate hyperthreads using `xen-hptool cpu-{online,offline} $CPU`.
> >>
> >> You should be able to effectively change hyperthreading configuration at
> >> runtime.  It's not quite the same as changing it in the BIOS, but from a
> >> competition of pipeline resources, it should be good enough.
> >>
> > Thanks, indeed that is a handy tool to have. We often can't disable
> > hyperthreading in the BIOS anyway because most BIOS' don't allow you
> > to do that when TXT is used.
>
> Hmm - that's an odd restriction.  I don't immediately see why such a
> restriction would be necessary.
>
> > That said, with this tool we still
> > require some way to determine when to do parking/reactivation of
> > hyperthreads. We could certainly park hyperthreads when we see the
> > system is being oversubscribed in terms of number of vCPUs being
> > active, but for real optimization we would have to understand the
> > workloads running within the VMs if I understand correctly?
>
> TBH, I'd perhaps start with an admin control which lets them switch
> between the two modes, and some instructions on how/why they might want
> to try switching.
>
> Trying to second-guess the best HT setting automatically is most likely
> going to be a lost cause.  It will be system specific as to whether the
> same workload is better with or without HT.

This may just not be practically possible at the end as the system
administrator may have no idea what workload will be running on any
given system. It may also vary between one user to the next on the
same system, without the users being allowed to tune such details of
the system. If we can show that with core-scheduling deployed for most
workloads performance is improved by x % it may be a safe option. But
if every system needs to be tuned and evaluated in terms of its
eventual workload, that task becomes problematic. I appreciate the
insights though!

Tamas

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-25 18:35               ` Tamas K Lengyel
@ 2018-10-25 18:39                 ` Andrew Cooper
  2018-10-26  7:49                 ` Dario Faggioli
  1 sibling, 0 replies; 63+ messages in thread
From: Andrew Cooper @ 2018-10-25 18:39 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: mpohlack, Julien Grall, Jan Beulich, joao.m.martins,
	Stefano Stabellini, Daniel Kiper,
	Marek Marczykowski-Górecki, aliguori, uwed, Lars Kurth,
	Konrad Rzeszutek Wilk, ross.philipson, George Dunlap,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, JGross,
	sergey.dyasli, Wei Liu, George Dunlap, Xen-devel, mdontu, dwmw,
	Roger Pau Monné

On 25/10/18 19:35, Tamas K Lengyel wrote:
> On Thu, Oct 25, 2018 at 12:13 PM Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
>> On 25/10/18 18:58, Tamas K Lengyel wrote:
>>> On Thu, Oct 25, 2018 at 11:43 AM Andrew Cooper
>>> <andrew.cooper3@citrix.com> wrote:
>>>> On 25/10/18 18:35, Tamas K Lengyel wrote:
>>>>> On Thu, Oct 25, 2018 at 11:02 AM George Dunlap <george.dunlap@citrix.com> wrote:
>>>>>> On 10/25/2018 05:55 PM, Andrew Cooper wrote:
>>>>>>> On 24/10/18 16:24, Tamas K Lengyel wrote:
>>>>>>>>> A solution to this issue was proposed, whereby Xen synchronises siblings
>>>>>>>>> on vmexit/entry, so we are never executing code in two different
>>>>>>>>> privilege levels.  Getting this working would make it safe to continue
>>>>>>>>> using hyperthreading even in the presence of L1TF.  Obviously, its going
>>>>>>>>> to come in perf hit, but compared to disabling hyperthreading, all its
>>>>>>>>> got to do is beat a 60% perf hit to make it the preferable option for
>>>>>>>>> making your system L1TF-proof.
>>>>>>>> Could you shed some light what tests were done where that 60%
>>>>>>>> performance hit was observed? We have performed intensive stress-tests
>>>>>>>> to confirm this but according to our findings turning off
>>>>>>>> hyper-threading is actually improving performance on all machines we
>>>>>>>> tested thus far.
>>>>>>> Aggregate inter and intra host disk and network throughput, which is a
>>>>>>> reasonable approximation of a load of webserver VM's on a single
>>>>>>> physical server.  Small packet IO was hit worst, as it has a very high
>>>>>>> vcpu context switch rate between dom0 and domU.  Disabling HT means you
>>>>>>> have half the number of logical cores to schedule on, which doubles the
>>>>>>> mean time to next timeslice.
>>>>>>>
>>>>>>> In principle, for a fully optimised workload, HT gets you ~30% extra due
>>>>>>> to increased utilisation of the pipeline functional units.  Some
>>>>>>> resources are statically partitioned, while some are competitively
>>>>>>> shared, and its now been well proven that actions on one thread can have
>>>>>>> a large effect on others.
>>>>>>>
>>>>>>> Two arbitrary vcpus are not an optimised workload.  If the perf
>>>>>>> improvement you get from not competing in the pipeline is greater than
>>>>>>> the perf loss from Xen's reduced capability to schedule, then disabling
>>>>>>> HT would be an improvement.  I can certainly believe that this might be
>>>>>>> the case for Qubes style workloads where you are probably not very
>>>>>>> overprovisioned, and you probably don't have long running IO and CPU
>>>>>>> bound tasks in the VMs.
>>>>>> As another data point, I think it was MSCI who said they always disabled
>>>>>> hyperthreading, because they also found that their workloads ran slower
>>>>>> with HT than without.  Presumably they were doing massive number
>>>>>> crunching, such that each thread was waiting on the ALU a significant
>>>>>> portion of the time anyway; at which point the superscalar scheduling
>>>>>> and/or reduction in cache efficiency would have brought performance from
>>>>>> "no benefit" down to "negative benefit".
>>>>>>
>>>>> Thanks for the insights. Indeed, we are primarily concerned with
>>>>> performance of Qubes-style workloads which may range from
>>>>> no-oversubscription to heavily oversubscribed. It's not a workload we
>>>>> can predict or optimize before-hand, so we are looking for a default
>>>>> that would be 1) safe and 2) performant in the most general case
>>>>> possible.
>>>> So long as you've got the XSA-273 patches, you should be able to park
>>>> and re-reactivate hyperthreads using `xen-hptool cpu-{online,offline} $CPU`.
>>>>
>>>> You should be able to effectively change hyperthreading configuration at
>>>> runtime.  It's not quite the same as changing it in the BIOS, but from a
>>>> competition of pipeline resources, it should be good enough.
>>>>
>>> Thanks, indeed that is a handy tool to have. We often can't disable
>>> hyperthreading in the BIOS anyway because most BIOS' don't allow you
>>> to do that when TXT is used.
>> Hmm - that's an odd restriction.  I don't immediately see why such a
>> restriction would be necessary.
>>
>>> That said, with this tool we still
>>> require some way to determine when to do parking/reactivation of
>>> hyperthreads. We could certainly park hyperthreads when we see the
>>> system is being oversubscribed in terms of number of vCPUs being
>>> active, but for real optimization we would have to understand the
>>> workloads running within the VMs if I understand correctly?
>> TBH, I'd perhaps start with an admin control which lets them switch
>> between the two modes, and some instructions on how/why they might want
>> to try switching.
>>
>> Trying to second-guess the best HT setting automatically is most likely
>> going to be a lost cause.  It will be system specific as to whether the
>> same workload is better with or without HT.
> This may just not be practically possible at the end as the system
> administrator may have no idea what workload will be running on any
> given system. It may also vary between one user to the next on the
> same system, without the users being allowed to tune such details of
> the system. If we can show that with core-scheduling deployed for most
> workloads performance is improved by x % it may be a safe option. But
> if every system needs to be tuned and evaluated in terms of its
> eventual workload, that task becomes problematic. I appreciate the
> insights though!

To a first approximation, a superuser knob of "switch between single and
dual threaded mode" can be used by people to experiment as to which is
faster overall.

If it really is the case that disabling HT makes things faster, then
you've suddenly gained (almost-)core scheduling "for free" alongside
that perf improvement.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-25 17:29         ` Tamas K Lengyel
@ 2018-10-26  7:31           ` Dario Faggioli
  0 siblings, 0 replies; 63+ messages in thread
From: Dario Faggioli @ 2018-10-26  7:31 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: mpohlack, Julien Grall, Jan Beulich, joao.m.martins,
	Stefano Stabellini, Daniel Kiper,
	Marek Marczykowski-Górecki, aliguori, uwed, Lars Kurth,
	Konrad Rzeszutek Wilk, ross.philipson, Matt Wilson,
	Boris Ostrovsky, JGross, sergey.dyasli, Wei Liu, George Dunlap,
	Andrew Cooper, Xen-devel, mdontu, dwmw, Roger Pau Monné


[-- Attachment #1.1: Type: text/plain, Size: 2312 bytes --]

On Thu, 2018-10-25 at 11:29 -0600, Tamas K Lengyel wrote:
> On Thu, Oct 25, 2018 at 11:23 AM Dario Faggioli <dfaggioli@suse.com>
> wrote:
> > 
> > Having _everyone_ wanting to do actual stuff on the CPUs is, IMO,
> > one
> > of the worst workloads for hyperthreading, and it is in fact a
> > workload
> > where I've always seen it having the least beneficial effect on
> > performance. I guess it's possible that, in your case, it's
> > actually
> > really doing more harm than good.
> > 
> > It's an interesting data point, but I wouldn't use a workload like
> > that
> > to measure the benefit, or the impact, of an SMT related change.
> 
> Thanks, and indeed this test is the worst-case scenario for
> hyperthreading, that's was our goal. While a typical work-load may
> not
> be similar, it is a possible one for the system we are concerned
> about. 
>
Sure, and that is fine. But at the same time, it is not much, if at
all, related with speculative execution, L1TF and coscheduling. It's
just that, with this workload, hyperthreading is bad, and not much more
to say.

> So if at any given time the benefit of hyperthreading ranges
> between say +30% and -30% and we can't predict the workload or
> optimize it, it is looking like a safe bet to just disable
> hyperthreading. Would you agree?
> 
That's, AFAICR, the OpenBSD's take, back at the time of when TLBLeed
came out. But, no, I don't really agree. Not entirely, at least.

The way I see it, is that there are special workloads where  SMT gives,
say, -30%, and those should just disable it, and be done.

For others, it's perfectly fine to keep it on, and we should, ideally,
find a solution to the security issues it introduces, without
nullifying the performance benefit it introduces.

And when it comes to judge how good, or bad, such solutions are, we
should consider both the best and the worst case scenarios, and I'd say
that the best case scenario is more important, as for the worst case,
one could just disable SMT, as said above.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-25 18:35               ` Tamas K Lengyel
  2018-10-25 18:39                 ` Andrew Cooper
@ 2018-10-26  7:49                 ` Dario Faggioli
  2018-10-26 12:01                   ` Tamas K Lengyel
  1 sibling, 1 reply; 63+ messages in thread
From: Dario Faggioli @ 2018-10-26  7:49 UTC (permalink / raw)
  To: Tamas K Lengyel, Andrew Cooper
  Cc: mpohlack, Julien Grall, Jan Beulich, joao.m.martins,
	Stefano Stabellini, Daniel Kiper,
	Marek Marczykowski-Górecki, aliguori, uwed, Lars Kurth,
	Konrad Rzeszutek Wilk, ross.philipson, George Dunlap,
	Matt Wilson, Boris Ostrovsky, JGross, sergey.dyasli, Wei Liu,
	George Dunlap, Xen-devel, mdontu, dwmw, Roger Pau Monné


[-- Attachment #1.1: Type: text/plain, Size: 2629 bytes --]

On Thu, 2018-10-25 at 12:35 -0600, Tamas K Lengyel wrote:
> On Thu, Oct 25, 2018 at 12:13 PM Andrew Cooper
> <andrew.cooper3@citrix.com> wrote:
> > 
> > TBH, I'd perhaps start with an admin control which lets them switch
> > between the two modes, and some instructions on how/why they might
> > want
> > to try switching.
> > 
> > Trying to second-guess the best HT setting automatically is most
> > likely
> > going to be a lost cause.  It will be system specific as to whether
> > the
> > same workload is better with or without HT.
> 
> This may just not be practically possible at the end as the system
> administrator may have no idea what workload will be running on any
> given system. It may also vary between one user to the next on the
> same system, without the users being allowed to tune such details of
> the system. If we can show that with core-scheduling deployed for
> most
> workloads performance is improved by x % it may be a safe option. 
>
I haven't done this kind of benchmark yet, but I'd say that, if every
vCPU of every domain is doing 100% CPU intensive work, core-scheduling
isn't going to make much difference, or help you much, as compared to
regular scheduling with hyperthreading enabled.

Actual numbers may vary depending on whether VMs have odd or even
number of vCPUs but, e.g., on hardware with 2 threads per core, and
using VMs with at least 2 vCPUs each, the _perfect_ implementation of
core-scheduling would still manage to keep all the *threads* busy,
which is --as far as our speculations currently go-- what is causing
the performance degradation you're seeing.

So, again, if it is confirmed that this workload of yours is a
particularly bad one for SMT, then you are just better off disabling
hyperthreading. And, no, I don't think such a situation is common
enough to say "let's disable for everyone by default".

> But
> if every system needs to be tuned and evaluated in terms of its
> eventual workload, that task becomes problematic.
>
So, the scheduler has the notion of the system load (at least, Credit2
does), and it is in theory possible to put together some heuristics for
basically stopping using hyperthreading, upon certain conditions.

This, however, I see it as something completely orthogonal from
security related consideration and from core-scheduling.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-25 16:29         ` Andrew Cooper
  2018-10-25 16:43           ` George Dunlap
@ 2018-10-26  9:16           ` Jan Beulich
  2018-10-26  9:28             ` Wei Liu
  1 sibling, 1 reply; 63+ messages in thread
From: Jan Beulich @ 2018-10-26  9:16 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, uwed,
	Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson, george.dunlap,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, Wei Liu, George Dunlap, Xen-devel List,
	Daniel Kiper, David Woodhouse, Roger Pau Monne

>>> On 25.10.18 at 18:29, <andrew.cooper3@citrix.com> wrote:
> A split xenheap model means that data pertaining to other guests isn't
> mapped in the context of this vcpu, so cannot be brought into the cache.

It was not clear to me from Wei's original mail that talk here is
about "split" in a sense of "per-domain"; I was assuming the
CONFIG_SEPARATE_XENHEAP mode instead.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-26  9:16           ` Jan Beulich
@ 2018-10-26  9:28             ` Wei Liu
  2018-10-26  9:56               ` Jan Beulich
  0 siblings, 1 reply; 63+ messages in thread
From: Wei Liu @ 2018-10-26  9:28 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, uwed,
	Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson, george.dunlap,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, Wei Liu, George Dunlap, Andrew Cooper,
	Xen-devel List, Daniel Kiper, David Woodhouse

On Fri, Oct 26, 2018 at 03:16:15AM -0600, Jan Beulich wrote:
> >>> On 25.10.18 at 18:29, <andrew.cooper3@citrix.com> wrote:
> > A split xenheap model means that data pertaining to other guests isn't
> > mapped in the context of this vcpu, so cannot be brought into the cache.
> 
> It was not clear to me from Wei's original mail that talk here is
> about "split" in a sense of "per-domain"; I was assuming the
> CONFIG_SEPARATE_XENHEAP mode instead.

The split heap was indeed referring to CONFIG_SEPARATE_XENHEAP mode, yet
I what I wanted most is the partial direct map which reduces the amount
of data mapped inside Xen context -- the original idea was removing
direct map discussed during one of the calls IIRC. I thought making the
partial direct map mode work and make it as small as possible will get
us 90% there.

The "per-domain" heap is a different work item.

Wei.

> 
> Jan
> 
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-26  9:28             ` Wei Liu
@ 2018-10-26  9:56               ` Jan Beulich
  2018-10-26 10:51                 ` George Dunlap
  0 siblings, 1 reply; 63+ messages in thread
From: Jan Beulich @ 2018-10-26  9:56 UTC (permalink / raw)
  To: Wei Liu
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, uwed,
	Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson, george.dunlap,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, George Dunlap, Andrew Cooper, Xen-devel List,
	Daniel Kiper, David Woodhouse, Roger Pau Monne

>>> On 26.10.18 at 11:28, <wei.liu2@citrix.com> wrote:
> On Fri, Oct 26, 2018 at 03:16:15AM -0600, Jan Beulich wrote:
>> >>> On 25.10.18 at 18:29, <andrew.cooper3@citrix.com> wrote:
>> > A split xenheap model means that data pertaining to other guests isn't
>> > mapped in the context of this vcpu, so cannot be brought into the cache.
>> 
>> It was not clear to me from Wei's original mail that talk here is
>> about "split" in a sense of "per-domain"; I was assuming the
>> CONFIG_SEPARATE_XENHEAP mode instead.
> 
> The split heap was indeed referring to CONFIG_SEPARATE_XENHEAP mode, yet
> I what I wanted most is the partial direct map which reduces the amount
> of data mapped inside Xen context -- the original idea was removing
> direct map discussed during one of the calls IIRC. I thought making the
> partial direct map mode work and make it as small as possible will get
> us 90% there.
> 
> The "per-domain" heap is a different work item.

But if we mean to go that route, going (back) to the separate
Xen heap model seems just like an extra complication to me.
Yet I agree that this would remove the need for a fair chunk of
the direct map. Otoh a statically partitioned Xen heap would
bring back scalability issues which we had specifically meant to
get rid of by moving away from that model.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-25 18:13             ` Andrew Cooper
  2018-10-25 18:35               ` Tamas K Lengyel
@ 2018-10-26 10:11               ` George Dunlap
  1 sibling, 0 replies; 63+ messages in thread
From: George Dunlap @ 2018-10-26 10:11 UTC (permalink / raw)
  To: Andrew Cooper, Tamas K Lengyel
  Cc: mpohlack, Julien Grall, Jan Beulich, joao.m.martins,
	Stefano Stabellini, Daniel Kiper,
	Marek Marczykowski-Górecki, aliguori, uwed, Lars Kurth,
	Konrad Rzeszutek Wilk, ross.philipson, Dario Faggioli,
	Matt Wilson, Boris Ostrovsky, JGross, sergey.dyasli, Wei Liu,
	George Dunlap, Xen-devel, mdontu, dwmw, Roger Pau Monné

On 10/25/2018 07:13 PM, Andrew Cooper wrote:
> On 25/10/18 18:58, Tamas K Lengyel wrote:
>> On Thu, Oct 25, 2018 at 11:43 AM Andrew Cooper
>> <andrew.cooper3@citrix.com> wrote:
>>> On 25/10/18 18:35, Tamas K Lengyel wrote:
>>>> On Thu, Oct 25, 2018 at 11:02 AM George Dunlap <george.dunlap@citrix.com> wrote:
>>>>> On 10/25/2018 05:55 PM, Andrew Cooper wrote:
>>>>>> On 24/10/18 16:24, Tamas K Lengyel wrote:
>>>>>>>> A solution to this issue was proposed, whereby Xen synchronises siblings
>>>>>>>> on vmexit/entry, so we are never executing code in two different
>>>>>>>> privilege levels.  Getting this working would make it safe to continue
>>>>>>>> using hyperthreading even in the presence of L1TF.  Obviously, its going
>>>>>>>> to come in perf hit, but compared to disabling hyperthreading, all its
>>>>>>>> got to do is beat a 60% perf hit to make it the preferable option for
>>>>>>>> making your system L1TF-proof.
>>>>>>> Could you shed some light what tests were done where that 60%
>>>>>>> performance hit was observed? We have performed intensive stress-tests
>>>>>>> to confirm this but according to our findings turning off
>>>>>>> hyper-threading is actually improving performance on all machines we
>>>>>>> tested thus far.
>>>>>> Aggregate inter and intra host disk and network throughput, which is a
>>>>>> reasonable approximation of a load of webserver VM's on a single
>>>>>> physical server.  Small packet IO was hit worst, as it has a very high
>>>>>> vcpu context switch rate between dom0 and domU.  Disabling HT means you
>>>>>> have half the number of logical cores to schedule on, which doubles the
>>>>>> mean time to next timeslice.
>>>>>>
>>>>>> In principle, for a fully optimised workload, HT gets you ~30% extra due
>>>>>> to increased utilisation of the pipeline functional units.  Some
>>>>>> resources are statically partitioned, while some are competitively
>>>>>> shared, and its now been well proven that actions on one thread can have
>>>>>> a large effect on others.
>>>>>>
>>>>>> Two arbitrary vcpus are not an optimised workload.  If the perf
>>>>>> improvement you get from not competing in the pipeline is greater than
>>>>>> the perf loss from Xen's reduced capability to schedule, then disabling
>>>>>> HT would be an improvement.  I can certainly believe that this might be
>>>>>> the case for Qubes style workloads where you are probably not very
>>>>>> overprovisioned, and you probably don't have long running IO and CPU
>>>>>> bound tasks in the VMs.
>>>>> As another data point, I think it was MSCI who said they always disabled
>>>>> hyperthreading, because they also found that their workloads ran slower
>>>>> with HT than without.  Presumably they were doing massive number
>>>>> crunching, such that each thread was waiting on the ALU a significant
>>>>> portion of the time anyway; at which point the superscalar scheduling
>>>>> and/or reduction in cache efficiency would have brought performance from
>>>>> "no benefit" down to "negative benefit".
>>>>>
>>>> Thanks for the insights. Indeed, we are primarily concerned with
>>>> performance of Qubes-style workloads which may range from
>>>> no-oversubscription to heavily oversubscribed. It's not a workload we
>>>> can predict or optimize before-hand, so we are looking for a default
>>>> that would be 1) safe and 2) performant in the most general case
>>>> possible.
>>> So long as you've got the XSA-273 patches, you should be able to park
>>> and re-reactivate hyperthreads using `xen-hptool cpu-{online,offline} $CPU`.
>>>
>>> You should be able to effectively change hyperthreading configuration at
>>> runtime.  It's not quite the same as changing it in the BIOS, but from a
>>> competition of pipeline resources, it should be good enough.
>>>
>> Thanks, indeed that is a handy tool to have. We often can't disable
>> hyperthreading in the BIOS anyway because most BIOS' don't allow you
>> to do that when TXT is used.
> 
> Hmm - that's an odd restriction.  I don't immediately see why such a
> restriction would be necessary.
> 
>> That said, with this tool we still
>> require some way to determine when to do parking/reactivation of
>> hyperthreads. We could certainly park hyperthreads when we see the
>> system is being oversubscribed in terms of number of vCPUs being
>> active, but for real optimization we would have to understand the
>> workloads running within the VMs if I understand correctly?
> 
> TBH, I'd perhaps start with an admin control which lets them switch
> between the two modes, and some instructions on how/why they might want
> to try switching.
> 
> Trying to second-guess the best HT setting automatically is most likely
> going to be a lost cause.  It will be system specific as to whether the
> same workload is better with or without HT.

There may be hardware-specific performance counters that could be used
to detect when pathological cases are happening.  But that would need to
be implemented and/or re-verified on basically every new piece of hardware.

 -George




_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-26  9:56               ` Jan Beulich
@ 2018-10-26 10:51                 ` George Dunlap
  2018-10-26 11:20                   ` Jan Beulich
  0 siblings, 1 reply; 63+ messages in thread
From: George Dunlap @ 2018-10-26 10:51 UTC (permalink / raw)
  To: Jan Beulich, Wei Liu
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, uwed,
	Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, George Dunlap, Andrew Cooper, Xen-devel List,
	Daniel Kiper, David Woodhouse, Roger Pau Monne

On 10/26/2018 10:56 AM, Jan Beulich wrote:
>>>> On 26.10.18 at 11:28, <wei.liu2@citrix.com> wrote:
>> On Fri, Oct 26, 2018 at 03:16:15AM -0600, Jan Beulich wrote:
>>>>>> On 25.10.18 at 18:29, <andrew.cooper3@citrix.com> wrote:
>>>> A split xenheap model means that data pertaining to other guests isn't
>>>> mapped in the context of this vcpu, so cannot be brought into the cache.
>>>
>>> It was not clear to me from Wei's original mail that talk here is
>>> about "split" in a sense of "per-domain"; I was assuming the
>>> CONFIG_SEPARATE_XENHEAP mode instead.
>>
>> The split heap was indeed referring to CONFIG_SEPARATE_XENHEAP mode, yet
>> I what I wanted most is the partial direct map which reduces the amount
>> of data mapped inside Xen context -- the original idea was removing
>> direct map discussed during one of the calls IIRC. I thought making the
>> partial direct map mode work and make it as small as possible will get
>> us 90% there.
>>
>> The "per-domain" heap is a different work item.
> 
> But if we mean to go that route, going (back) to the separate
> Xen heap model seems just like an extra complication to me.
> Yet I agree that this would remove the need for a fair chunk of
> the direct map. Otoh a statically partitioned Xen heap would
> bring back scalability issues which we had specifically meant to
> get rid of by moving away from that model.

I think turning SEPARATE_XENHEAP back on would just be the first step.
We definitely would then need to sort things out so that it's scalable
again.

After system set-up, the key difference between xenheap and domheap
pages is that xenheap pages are assumed to be always mapped (i.e., you
can keep a pointer to them and it will be valid), whereas domheap pages
cannot assumed to be mapped, and need to be wrapped with
[un]map_domain_page().

The basic solution involves having a xenheap virtual address mapping
area not tied to the physical layout of the memory.  domheap and xenheap
memory would have to come from the same pool, but xenheap would need to
be mapped into the xenheap virtual memory region before being returned.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-26 10:51                 ` George Dunlap
@ 2018-10-26 11:20                   ` Jan Beulich
  2018-10-26 11:24                     ` George Dunlap
  2018-12-11 18:05                     ` Wei Liu
  0 siblings, 2 replies; 63+ messages in thread
From: Jan Beulich @ 2018-10-26 11:20 UTC (permalink / raw)
  To: george.dunlap
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, uwed,
	Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, Wei Liu, George Dunlap, Andrew Cooper,
	Xen-devel List, Daniel Kiper, David Woodhouse

>>> On 26.10.18 at 12:51, <george.dunlap@citrix.com> wrote:
> On 10/26/2018 10:56 AM, Jan Beulich wrote:
>>>>> On 26.10.18 at 11:28, <wei.liu2@citrix.com> wrote:
>>> On Fri, Oct 26, 2018 at 03:16:15AM -0600, Jan Beulich wrote:
>>>>>>> On 25.10.18 at 18:29, <andrew.cooper3@citrix.com> wrote:
>>>>> A split xenheap model means that data pertaining to other guests isn't
>>>>> mapped in the context of this vcpu, so cannot be brought into the cache.
>>>>
>>>> It was not clear to me from Wei's original mail that talk here is
>>>> about "split" in a sense of "per-domain"; I was assuming the
>>>> CONFIG_SEPARATE_XENHEAP mode instead.
>>>
>>> The split heap was indeed referring to CONFIG_SEPARATE_XENHEAP mode, yet
>>> I what I wanted most is the partial direct map which reduces the amount
>>> of data mapped inside Xen context -- the original idea was removing
>>> direct map discussed during one of the calls IIRC. I thought making the
>>> partial direct map mode work and make it as small as possible will get
>>> us 90% there.
>>>
>>> The "per-domain" heap is a different work item.
>> 
>> But if we mean to go that route, going (back) to the separate
>> Xen heap model seems just like an extra complication to me.
>> Yet I agree that this would remove the need for a fair chunk of
>> the direct map. Otoh a statically partitioned Xen heap would
>> bring back scalability issues which we had specifically meant to
>> get rid of by moving away from that model.
> 
> I think turning SEPARATE_XENHEAP back on would just be the first step.
> We definitely would then need to sort things out so that it's scalable
> again.
> 
> After system set-up, the key difference between xenheap and domheap
> pages is that xenheap pages are assumed to be always mapped (i.e., you
> can keep a pointer to them and it will be valid), whereas domheap pages
> cannot assumed to be mapped, and need to be wrapped with
> [un]map_domain_page().
> 
> The basic solution involves having a xenheap virtual address mapping
> area not tied to the physical layout of the memory.  domheap and xenheap
> memory would have to come from the same pool, but xenheap would need to
> be mapped into the xenheap virtual memory region before being returned.

Wouldn't this most easily be done by making alloc_xenheap_pages()
call alloc_domheap_pages() and then vmap() the result? Of course
we may need to grow the vmap area in that case.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-26 11:20                   ` Jan Beulich
@ 2018-10-26 11:24                     ` George Dunlap
  2018-10-26 11:33                       ` Jan Beulich
  2018-12-11 18:05                     ` Wei Liu
  1 sibling, 1 reply; 63+ messages in thread
From: George Dunlap @ 2018-10-26 11:24 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, uwed,
	Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, Wei Liu, George Dunlap, Andrew Cooper,
	Xen-devel List, Daniel Kiper, David Woodhouse

On 10/26/2018 12:20 PM, Jan Beulich wrote:
>>>> On 26.10.18 at 12:51, <george.dunlap@citrix.com> wrote:
>> On 10/26/2018 10:56 AM, Jan Beulich wrote:
>>>>>> On 26.10.18 at 11:28, <wei.liu2@citrix.com> wrote:
>>>> On Fri, Oct 26, 2018 at 03:16:15AM -0600, Jan Beulich wrote:
>>>>>>>> On 25.10.18 at 18:29, <andrew.cooper3@citrix.com> wrote:
>>>>>> A split xenheap model means that data pertaining to other guests isn't
>>>>>> mapped in the context of this vcpu, so cannot be brought into the cache.
>>>>>
>>>>> It was not clear to me from Wei's original mail that talk here is
>>>>> about "split" in a sense of "per-domain"; I was assuming the
>>>>> CONFIG_SEPARATE_XENHEAP mode instead.
>>>>
>>>> The split heap was indeed referring to CONFIG_SEPARATE_XENHEAP mode, yet
>>>> I what I wanted most is the partial direct map which reduces the amount
>>>> of data mapped inside Xen context -- the original idea was removing
>>>> direct map discussed during one of the calls IIRC. I thought making the
>>>> partial direct map mode work and make it as small as possible will get
>>>> us 90% there.
>>>>
>>>> The "per-domain" heap is a different work item.
>>>
>>> But if we mean to go that route, going (back) to the separate
>>> Xen heap model seems just like an extra complication to me.
>>> Yet I agree that this would remove the need for a fair chunk of
>>> the direct map. Otoh a statically partitioned Xen heap would
>>> bring back scalability issues which we had specifically meant to
>>> get rid of by moving away from that model.
>>
>> I think turning SEPARATE_XENHEAP back on would just be the first step.
>> We definitely would then need to sort things out so that it's scalable
>> again.
>>
>> After system set-up, the key difference between xenheap and domheap
>> pages is that xenheap pages are assumed to be always mapped (i.e., you
>> can keep a pointer to them and it will be valid), whereas domheap pages
>> cannot assumed to be mapped, and need to be wrapped with
>> [un]map_domain_page().
>>
>> The basic solution involves having a xenheap virtual address mapping
>> area not tied to the physical layout of the memory.  domheap and xenheap
>> memory would have to come from the same pool, but xenheap would need to
>> be mapped into the xenheap virtual memory region before being returned.
> 
> Wouldn't this most easily be done by making alloc_xenheap_pages()
> call alloc_domheap_pages() and then vmap() the result? Of course
> we may need to grow the vmap area in that case.

I couldn't answer that question without a lot more digging. :-)  I'd
always assumed that the reason for the original reason for having the
xenheap direct-mapped on 32-bit was something to do with early-boot
allocation; if there is something tricky there, we'd need to
special-case the early-boot allocation somehow.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-26 11:24                     ` George Dunlap
@ 2018-10-26 11:33                       ` Jan Beulich
  2018-10-26 11:43                         ` George Dunlap
  0 siblings, 1 reply; 63+ messages in thread
From: Jan Beulich @ 2018-10-26 11:33 UTC (permalink / raw)
  To: george.dunlap
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, uwed,
	Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, Wei Liu, George Dunlap, Andrew Cooper,
	Xen-devel List, Daniel Kiper, David Woodhouse

>>> On 26.10.18 at 13:24, <george.dunlap@citrix.com> wrote:
> On 10/26/2018 12:20 PM, Jan Beulich wrote:
>>>>> On 26.10.18 at 12:51, <george.dunlap@citrix.com> wrote:
>>> The basic solution involves having a xenheap virtual address mapping
>>> area not tied to the physical layout of the memory.  domheap and xenheap
>>> memory would have to come from the same pool, but xenheap would need to
>>> be mapped into the xenheap virtual memory region before being returned.
>> 
>> Wouldn't this most easily be done by making alloc_xenheap_pages()
>> call alloc_domheap_pages() and then vmap() the result? Of course
>> we may need to grow the vmap area in that case.
> 
> I couldn't answer that question without a lot more digging. :-)  I'd
> always assumed that the reason for the original reason for having the
> xenheap direct-mapped on 32-bit was something to do with early-boot
> allocation; if there is something tricky there, we'd need to
> special-case the early-boot allocation somehow.

The reason for the split on 32-bit was simply the lack of sufficient
VA space.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-26 11:33                       ` Jan Beulich
@ 2018-10-26 11:43                         ` George Dunlap
  2018-10-26 11:45                           ` Jan Beulich
  0 siblings, 1 reply; 63+ messages in thread
From: George Dunlap @ 2018-10-26 11:43 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, uwed,
	Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, Wei Liu, George Dunlap, Andrew Cooper,
	Xen-devel List, Daniel Kiper, David Woodhouse

On 10/26/2018 12:33 PM, Jan Beulich wrote:
>>>> On 26.10.18 at 13:24, <george.dunlap@citrix.com> wrote:
>> On 10/26/2018 12:20 PM, Jan Beulich wrote:
>>>>>> On 26.10.18 at 12:51, <george.dunlap@citrix.com> wrote:
>>>> The basic solution involves having a xenheap virtual address mapping
>>>> area not tied to the physical layout of the memory.  domheap and xenheap
>>>> memory would have to come from the same pool, but xenheap would need to
>>>> be mapped into the xenheap virtual memory region before being returned.
>>>
>>> Wouldn't this most easily be done by making alloc_xenheap_pages()
>>> call alloc_domheap_pages() and then vmap() the result? Of course
>>> we may need to grow the vmap area in that case.
>>
>> I couldn't answer that question without a lot more digging. :-)  I'd
>> always assumed that the reason for the original reason for having the
>> xenheap direct-mapped on 32-bit was something to do with early-boot
>> allocation; if there is something tricky there, we'd need to
>> special-case the early-boot allocation somehow.
> 
> The reason for the split on 32-bit was simply the lack of sufficient
> VA space.

That tells me why the domheap was *not* direct-mapped; but it doesn't
tell me why the xenheap *was*.  Was it perhaps just something that
evolved from what we inherited from Linux?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-26 11:43                         ` George Dunlap
@ 2018-10-26 11:45                           ` Jan Beulich
  0 siblings, 0 replies; 63+ messages in thread
From: Jan Beulich @ 2018-10-26 11:45 UTC (permalink / raw)
  To: george.dunlap
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, uwed,
	Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, Wei Liu, George Dunlap, Andrew Cooper,
	Xen-devel List, Daniel Kiper, David Woodhouse

>>> On 26.10.18 at 13:43, <george.dunlap@citrix.com> wrote:
> On 10/26/2018 12:33 PM, Jan Beulich wrote:
>>>>> On 26.10.18 at 13:24, <george.dunlap@citrix.com> wrote:
>>> On 10/26/2018 12:20 PM, Jan Beulich wrote:
>>>>>>> On 26.10.18 at 12:51, <george.dunlap@citrix.com> wrote:
>>>>> The basic solution involves having a xenheap virtual address mapping
>>>>> area not tied to the physical layout of the memory.  domheap and xenheap
>>>>> memory would have to come from the same pool, but xenheap would need to
>>>>> be mapped into the xenheap virtual memory region before being returned.
>>>>
>>>> Wouldn't this most easily be done by making alloc_xenheap_pages()
>>>> call alloc_domheap_pages() and then vmap() the result? Of course
>>>> we may need to grow the vmap area in that case.
>>>
>>> I couldn't answer that question without a lot more digging. :-)  I'd
>>> always assumed that the reason for the original reason for having the
>>> xenheap direct-mapped on 32-bit was something to do with early-boot
>>> allocation; if there is something tricky there, we'd need to
>>> special-case the early-boot allocation somehow.
>> 
>> The reason for the split on 32-bit was simply the lack of sufficient
>> VA space.
> 
> That tells me why the domheap was *not* direct-mapped; but it doesn't
> tell me why the xenheap *was*.  Was it perhaps just something that
> evolved from what we inherited from Linux?

Presumably, but there I'm really the wrong one to ask. When I joined,
things had long been that way.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-26  7:49                 ` Dario Faggioli
@ 2018-10-26 12:01                   ` Tamas K Lengyel
  2018-10-26 14:17                     ` Dario Faggioli
  0 siblings, 1 reply; 63+ messages in thread
From: Tamas K Lengyel @ 2018-10-26 12:01 UTC (permalink / raw)
  To: Dario Faggioli
  Cc: mpohlack, Julien Grall, Jan Beulich, joao.m.martins,
	Stefano Stabellini, Daniel Kiper,
	Marek Marczykowski-Górecki, aliguori, uwed, Lars Kurth,
	Konrad Rzeszutek Wilk, ross.philipson, George Dunlap,
	Matt Wilson, Boris Ostrovsky, JGross, sergey.dyasli, Wei Liu,
	George Dunlap, Andrew Cooper, Xen-devel, mdontu, dwmw,
	Roger Pau Monné


[-- Attachment #1.1: Type: text/plain, Size: 2794 bytes --]

On Fri, Oct 26, 2018, 1:49 AM Dario Faggioli <dfaggioli@suse.com> wrote:

> On Thu, 2018-10-25 at 12:35 -0600, Tamas K Lengyel wrote:
> > On Thu, Oct 25, 2018 at 12:13 PM Andrew Cooper
> > <andrew.cooper3@citrix.com> wrote:
> > >
> > > TBH, I'd perhaps start with an admin control which lets them switch
> > > between the two modes, and some instructions on how/why they might
> > > want
> > > to try switching.
> > >
> > > Trying to second-guess the best HT setting automatically is most
> > > likely
> > > going to be a lost cause.  It will be system specific as to whether
> > > the
> > > same workload is better with or without HT.
> >
> > This may just not be practically possible at the end as the system
> > administrator may have no idea what workload will be running on any
> > given system. It may also vary between one user to the next on the
> > same system, without the users being allowed to tune such details of
> > the system. If we can show that with core-scheduling deployed for
> > most
> > workloads performance is improved by x % it may be a safe option.
> >
> I haven't done this kind of benchmark yet, but I'd say that, if every
> vCPU of every domain is doing 100% CPU intensive work, core-scheduling
> isn't going to make much difference, or help you much, as compared to
> regular scheduling with hyperthreading enabled.
>

Understood, we actually went into the this with the assumption that in such
cases core-scheduling would underperform plain credit1. The idea was to
measure the worst case with plain scheduling and with core-scheduling to be
able to see the difference clearly between the two.


> Actual numbers may vary depending on whether VMs have odd or even
> number of vCPUs but, e.g., on hardware with 2 threads per core, and
> using VMs with at least 2 vCPUs each, the _perfect_ implementation of
> core-scheduling would still manage to keep all the *threads* busy,
> which is --as far as our speculations currently go-- what is causing
> the performance degradation you're seeing.
>
> So, again, if it is confirmed that this workload of yours is a
> particularly bad one for SMT, then you are just better off disabling
> hyperthreading. And, no, I don't think such a situation is common
> enough to say "let's disable for everyone by default".
>

I wasn't asking to make it the default in Xen but if we make it the default
for our deployment where such workloads are entirely possible, would that
be reasonable. Again, we don't know the workload and we can't predict it.
We were hoping to use core-scheduling eventually but it was not expected
that hyperthreading can cause such drops in performance. If there are tests
that I can run which are the "best case" for hyperthreading, I would like
to repeat those tests to see where we are.

Thanks,
Tamas

[-- Attachment #1.2: Type: text/html, Size: 3768 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-26 12:01                   ` Tamas K Lengyel
@ 2018-10-26 14:17                     ` Dario Faggioli
  0 siblings, 0 replies; 63+ messages in thread
From: Dario Faggioli @ 2018-10-26 14:17 UTC (permalink / raw)
  To: Tamas K Lengyel
  Cc: mpohlack, Julien Grall, Jan Beulich, joao.m.martins,
	Stefano Stabellini, Daniel Kiper,
	Marek Marczykowski-Górecki, aliguori, uwed, Lars Kurth,
	Konrad Rzeszutek Wilk, ross.philipson, George Dunlap,
	Matt Wilson, Boris Ostrovsky, JGross, sergey.dyasli, Wei Liu,
	George Dunlap, Andrew Cooper, Xen-devel, mdontu, dwmw,
	Roger Pau Monné


[-- Attachment #1.1: Type: text/plain, Size: 3479 bytes --]

On Fri, 2018-10-26 at 06:01 -0600, Tamas K Lengyel wrote:
> On Fri, Oct 26, 2018, 1:49 AM Dario Faggioli <dfaggioli@suse.com>
> wrote:
> > 
> > I haven't done this kind of benchmark yet, but I'd say that, if
> > every
> > vCPU of every domain is doing 100% CPU intensive work, core-
> > scheduling
> > isn't going to make much difference, or help you much, as compared
> > to
> > regular scheduling with hyperthreading enabled.
> 
> Understood, we actually went into the this with the assumption that
> in such cases core-scheduling would underperform plain credit1. 
>
Which may actually happen. Or it might improve things a little, because
there are higher chances that a core only has 1 thread busy. But then
we're not really benchmarking core-scheduling vs. plain-scheduling,
we're benchmarking a side-effect of core-scheduling, which is not
equally interesting.

> The idea was to measure the worst case with plain scheduling and with
> core-scheduling to be able to see the difference clearly between the
> two.
> 
For the sake of benchmarking core-scheduling solutions, we should put
ourself in a position where what we measure is actually its own impact,
and I don't think this very workload put us there.

Then, of course, if this workload is relevant to you, you indeed have
the right and should benchmark and evaluate it, and we're always
interested in hearing what you find out. :-)

> > Actual numbers may vary depending on whether VMs have odd or even
> > number of vCPUs but, e.g., on hardware with 2 threads per core, and
> > using VMs with at least 2 vCPUs each, the _perfect_ implementation
> > of
> > core-scheduling would still manage to keep all the *threads* busy,
> > which is --as far as our speculations currently go-- what is
> > causing
> > the performance degradation you're seeing.
> > 
> > So, again, if it is confirmed that this workload of yours is a
> > particularly bad one for SMT, then you are just better off
> > disabling
> > hyperthreading. And, no, I don't think such a situation is common
> > enough to say "let's disable for everyone by default".
> 
> I wasn't asking to make it the default in Xen but if we make it the
> default for our deployment where such workloads are entirely
> possible, would that be reasonable. 
>
It all comes to how common a situation where you have a massively
oversubscribed system, with a fully CPU-bound workload, for significant
chunks of time.

As said in a previous email, I think that, if this is common enough,
and it is not something just transient, you'll are in trouble anyway.
And if it's not causing you/your customers troubles already, it might
not be that common, and hence it wouldn't be necessary/wise to disable
SMT.

But of course, you know your workload, and your requirements, much more
than me. If this kind of load really is what you experience, or what
you want to target, then yes, apparently disabling SMT is your best way
to go.

> If there are
> tests that I can run which are the "best case" for hyperthreading, I
> would like to repeat those tests to see where we are.
> 
If we come up with a good enough synthetic benchmark, I'll let you
know.

Regards,
Dario
-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/

[-- Attachment #1.2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 157 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-18 17:46 Ongoing/future speculative mitigation work Andrew Cooper
                   ` (2 preceding siblings ...)
  2018-10-24 15:24 ` Tamas K Lengyel
@ 2018-12-07 18:40 ` Wei Liu
  2018-12-10 12:12   ` George Dunlap
  2019-01-24 11:44 ` Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work) Wei Liu
  4 siblings, 1 reply; 63+ messages in thread
From: Wei Liu @ 2018-12-07 18:40 UTC (permalink / raw)
  To: Andrew Cooper
  Cc: Martin Pohlack, Julien Grall, Jan Beulich, Joao Martins,
	Stefano Stabellini, Daniel Kiper, Marek Marczykowski,
	Anthony Liguori, Dannowski, Uwe, Lars Kurth, Konrad Wilk,
	Ross Philipson, Dario Faggioli, Matt Wilson, Boris Ostrovsky,
	Juergen Gross, Sergey Dyasli, Wei Liu, George Dunlap,
	Xen-devel List, Mihai Donțu, Woodhouse, David

On Thu, Oct 18, 2018 at 06:46:22PM +0100, Andrew Cooper wrote:
> Hello,
> 
> This is an accumulation and summary of various tasks which have been
> discussed since the revelation of the speculative security issues in
> January, and also an invitation to discuss alternative ideas.  They are
> x86 specific, but a lot of the principles are architecture-agnostic.
> 
> 1) A secrets-free hypervisor.
> 
> Basically every hypercall can be (ab)used by a guest, and used as an
> arbitrary cache-load gadget.  Logically, this is the first half of a
> Spectre SP1 gadget, and is usually the first stepping stone to
> exploiting one of the speculative sidechannels.
> 
> Short of compiling Xen with LLVM's Speculative Load Hardening (which is
> still experimental, and comes with a ~30% perf hit in the common case),
> this is unavoidable.  Furthermore, throwing a few array_index_nospec()
> into the code isn't a viable solution to the problem.
> 
> An alternative option is to have less data mapped into Xen's virtual
> address space - if a piece of memory isn't mapped, it can't be loaded
> into the cache.
> 
> An easy first step here is to remove Xen's directmap, which will mean
> that guests general RAM isn't mapped by default into Xen's address
> space.  This will come with some performance hit, as the
> map_domain_page() infrastructure will now have to actually
> create/destroy mappings, but removing the directmap will cause an
> improvement for non-speculative security as well (No possibility of
> ret2dir as an exploit technique).
> 
> Beyond the directmap, there are plenty of other interesting secrets in
> the Xen heap and other mappings, such as the stacks of the other pcpus. 
> Fixing this requires moving Xen to having a non-uniform memory layout,
> and this is much harder to change.  I already experimented with this as
> a meltdown mitigation around about a year ago, and posted the resulting
> series on Jan 4th,
> https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg00274.html,
> some trivial bits of which have already found their way upstream.
> 
> To have a non-uniform memory layout, Xen may not share L4 pagetables. 
> i.e. Xen must never have two pcpus which reference the same pagetable in
> %cr3.
> 
> This property already holds for 32bit PV guests, and all HVM guests, but
> 64bit PV guests are the sticking point.  Because Linux has a flat memory
> layout, when a 64bit PV guest schedules two threads from the same
> process on separate vcpus, those two vcpus have the same virtual %cr3,
> and currently, Xen programs the same real %cr3 into hardware.
> 
> If we want Xen to have a non-uniform layout, are two options are:
> * Fix Linux to have the same non-uniform layout that Xen wants
> (Backwards compatibility for older 64bit PV guests can be achieved with
> xen-shim).
> * Make use XPTI algorithm (specifically, the pagetable sync/copy part)
> forever more in the future.
> 
> Option 2 isn't great (especially for perf on fixed hardware), but does
> keep all the necessary changes in Xen.  Option 1 looks to be the better
> option longterm.
> 
> As an interesting point to note.  The 32bit PV ABI prohibits sharing of
> L3 pagetables, because back in the 32bit hypervisor days, we used to
> have linear mappings in the Xen virtual range.  This check is stale
> (from a functionality point of view), but still present in Xen.  A
> consequence of this is that 32bit PV guests definitely don't share
> top-level pagetables across vcpus.

Correction: 32bit PV ABI prohibits sharing of L2 pagetables, but L3
pagetables can be shared. So guests will schedule the same top-level
pagetables across vcpus. 

But, 64bit Xen creates a monitor table for 32bit PAE guest and put the
CR3 provided by guest to the first slot, so pcpus don't share the same
L4 pagetables. The property we want still holds.

> 
> Juergen/Boris: Do you have any idea if/how easy this infrastructure
> would be to implement for 64bit PV guests as well?  If a PV guest can
> advertise via Elfnote that it won't share top-level pagetables, then we
> can audit this trivially in Xen.
> 

After reading Linux kernel code, I think it is not going to be trivial.
As now threads in Linux share one pagetable (as it should be).

In order to make each thread has its own pagetable while still maintain
the illusion of one address space, there needs to be synchronisation
under the hood.

There is code in Linux to synchronise vmalloc, but that's only for the
kernel portion. The infrastructure to synchronise userspace portion is
missing.

One idea is to follow the same model as vmalloc -- maintain a reference
pagetable in struct mm and a list of pagetables for threads, then
synchronise the pagetables in the page fault handler. But this is
probably a bit hard to sell to Linux maintainers because it will touch a
lot of the non-Xen code, increase complexity and decrease performance.

Thoughts?

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-12-07 18:40 ` Wei Liu
@ 2018-12-10 12:12   ` George Dunlap
  2018-12-10 12:19     ` George Dunlap
  0 siblings, 1 reply; 63+ messages in thread
From: George Dunlap @ 2018-12-10 12:12 UTC (permalink / raw)
  To: Wei Liu, Andrew Cooper
  Cc: Martin Pohlack, Julien Grall, Jan Beulich, Joao Martins,
	Stefano Stabellini, Daniel Kiper, Marek Marczykowski,
	Anthony Liguori, Dannowski, Uwe, Lars Kurth, Konrad Wilk,
	Ross Philipson, Dario Faggioli, Matt Wilson, Boris Ostrovsky,
	Juergen Gross, Sergey Dyasli, George Dunlap, Xen-devel List,
	Mihai Donțu, Woodhouse, David, Roger Pau Monne

On 12/7/18 6:40 PM, Wei Liu wrote:
> On Thu, Oct 18, 2018 at 06:46:22PM +0100, Andrew Cooper wrote:
>> Hello,
>>
>> This is an accumulation and summary of various tasks which have been
>> discussed since the revelation of the speculative security issues in
>> January, and also an invitation to discuss alternative ideas.  They are
>> x86 specific, but a lot of the principles are architecture-agnostic.
>>
>> 1) A secrets-free hypervisor.
>>
>> Basically every hypercall can be (ab)used by a guest, and used as an
>> arbitrary cache-load gadget.  Logically, this is the first half of a
>> Spectre SP1 gadget, and is usually the first stepping stone to
>> exploiting one of the speculative sidechannels.
>>
>> Short of compiling Xen with LLVM's Speculative Load Hardening (which is
>> still experimental, and comes with a ~30% perf hit in the common case),
>> this is unavoidable.  Furthermore, throwing a few array_index_nospec()
>> into the code isn't a viable solution to the problem.
>>
>> An alternative option is to have less data mapped into Xen's virtual
>> address space - if a piece of memory isn't mapped, it can't be loaded
>> into the cache.
>>
>> An easy first step here is to remove Xen's directmap, which will mean
>> that guests general RAM isn't mapped by default into Xen's address
>> space.  This will come with some performance hit, as the
>> map_domain_page() infrastructure will now have to actually
>> create/destroy mappings, but removing the directmap will cause an
>> improvement for non-speculative security as well (No possibility of
>> ret2dir as an exploit technique).
>>
>> Beyond the directmap, there are plenty of other interesting secrets in
>> the Xen heap and other mappings, such as the stacks of the other pcpus. 
>> Fixing this requires moving Xen to having a non-uniform memory layout,
>> and this is much harder to change.  I already experimented with this as
>> a meltdown mitigation around about a year ago, and posted the resulting
>> series on Jan 4th,
>> https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg00274.html,
>> some trivial bits of which have already found their way upstream.
>>
>> To have a non-uniform memory layout, Xen may not share L4 pagetables. 
>> i.e. Xen must never have two pcpus which reference the same pagetable in
>> %cr3.
>>
>> This property already holds for 32bit PV guests, and all HVM guests, but
>> 64bit PV guests are the sticking point.  Because Linux has a flat memory
>> layout, when a 64bit PV guest schedules two threads from the same
>> process on separate vcpus, those two vcpus have the same virtual %cr3,
>> and currently, Xen programs the same real %cr3 into hardware.
>>
>> If we want Xen to have a non-uniform layout, are two options are:
>> * Fix Linux to have the same non-uniform layout that Xen wants
>> (Backwards compatibility for older 64bit PV guests can be achieved with
>> xen-shim).
>> * Make use XPTI algorithm (specifically, the pagetable sync/copy part)
>> forever more in the future.
>>
>> Option 2 isn't great (especially for perf on fixed hardware), but does
>> keep all the necessary changes in Xen.  Option 1 looks to be the better
>> option longterm.
>>
>> As an interesting point to note.  The 32bit PV ABI prohibits sharing of
>> L3 pagetables, because back in the 32bit hypervisor days, we used to
>> have linear mappings in the Xen virtual range.  This check is stale
>> (from a functionality point of view), but still present in Xen.  A
>> consequence of this is that 32bit PV guests definitely don't share
>> top-level pagetables across vcpus.
> 
> Correction: 32bit PV ABI prohibits sharing of L2 pagetables, but L3
> pagetables can be shared. So guests will schedule the same top-level
> pagetables across vcpus. >
> But, 64bit Xen creates a monitor table for 32bit PAE guest and put the
> CR3 provided by guest to the first slot, so pcpus don't share the same
> L4 pagetables. The property we want still holds.

Ah, right -- but Xen can get away with this because in PAE mode, "L3" is
just 4 entries that are loaded on CR3-switch and not automatically kept
in sync by the hardware; i.e., the OS already needs to do its own
"manual syncing" if it updates any of the L3 entires; so it's the same
for Xen.

>> Juergen/Boris: Do you have any idea if/how easy this infrastructure
>> would be to implement for 64bit PV guests as well?  If a PV guest can
>> advertise via Elfnote that it won't share top-level pagetables, then we
>> can audit this trivially in Xen.
>>
> 
> After reading Linux kernel code, I think it is not going to be trivial.
> As now threads in Linux share one pagetable (as it should be).
> 
> In order to make each thread has its own pagetable while still maintain
> the illusion of one address space, there needs to be synchronisation
> under the hood.
> 
> There is code in Linux to synchronise vmalloc, but that's only for the
> kernel portion. The infrastructure to synchronise userspace portion is
> missing.
> 
> One idea is to follow the same model as vmalloc -- maintain a reference
> pagetable in struct mm and a list of pagetables for threads, then
> synchronise the pagetables in the page fault handler. But this is
> probably a bit hard to sell to Linux maintainers because it will touch a
> lot of the non-Xen code, increase complexity and decrease performance.

Sorry -- what do you mean "synchronize vmalloc"?  If every thread has a
different view of the kernel's vmalloc area, then every thread must have
a different L4 table, right?  And if every thread has a different L4
table, then we've already got the main thing we need from Linux, don't we?

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-12-10 12:12   ` George Dunlap
@ 2018-12-10 12:19     ` George Dunlap
  0 siblings, 0 replies; 63+ messages in thread
From: George Dunlap @ 2018-12-10 12:19 UTC (permalink / raw)
  To: Wei Liu, Andrew Cooper
  Cc: Martin Pohlack, Julien Grall, Jan Beulich, Joao Martins,
	Stefano Stabellini, Daniel Kiper, Marek Marczykowski,
	Anthony Liguori, Dannowski, Uwe, Lars Kurth, Konrad Wilk,
	Ross Philipson, Dario Faggioli, Matt Wilson, Boris Ostrovsky,
	Juergen Gross, Sergey Dyasli, George Dunlap, Xen-devel List,
	Mihai Donțu, Woodhouse, David, Roger Pau Monne

On 12/10/18 12:12 PM, George Dunlap wrote:
> On 12/7/18 6:40 PM, Wei Liu wrote:
>> On Thu, Oct 18, 2018 at 06:46:22PM +0100, Andrew Cooper wrote:
>>> Hello,
>>>
>>> This is an accumulation and summary of various tasks which have been
>>> discussed since the revelation of the speculative security issues in
>>> January, and also an invitation to discuss alternative ideas.  They are
>>> x86 specific, but a lot of the principles are architecture-agnostic.
>>>
>>> 1) A secrets-free hypervisor.
>>>
>>> Basically every hypercall can be (ab)used by a guest, and used as an
>>> arbitrary cache-load gadget.  Logically, this is the first half of a
>>> Spectre SP1 gadget, and is usually the first stepping stone to
>>> exploiting one of the speculative sidechannels.
>>>
>>> Short of compiling Xen with LLVM's Speculative Load Hardening (which is
>>> still experimental, and comes with a ~30% perf hit in the common case),
>>> this is unavoidable.  Furthermore, throwing a few array_index_nospec()
>>> into the code isn't a viable solution to the problem.
>>>
>>> An alternative option is to have less data mapped into Xen's virtual
>>> address space - if a piece of memory isn't mapped, it can't be loaded
>>> into the cache.
>>>
>>> An easy first step here is to remove Xen's directmap, which will mean
>>> that guests general RAM isn't mapped by default into Xen's address
>>> space.  This will come with some performance hit, as the
>>> map_domain_page() infrastructure will now have to actually
>>> create/destroy mappings, but removing the directmap will cause an
>>> improvement for non-speculative security as well (No possibility of
>>> ret2dir as an exploit technique).
>>>
>>> Beyond the directmap, there are plenty of other interesting secrets in
>>> the Xen heap and other mappings, such as the stacks of the other pcpus. 
>>> Fixing this requires moving Xen to having a non-uniform memory layout,
>>> and this is much harder to change.  I already experimented with this as
>>> a meltdown mitigation around about a year ago, and posted the resulting
>>> series on Jan 4th,
>>> https://lists.xenproject.org/archives/html/xen-devel/2018-01/msg00274.html,
>>> some trivial bits of which have already found their way upstream.
>>>
>>> To have a non-uniform memory layout, Xen may not share L4 pagetables. 
>>> i.e. Xen must never have two pcpus which reference the same pagetable in
>>> %cr3.
>>>
>>> This property already holds for 32bit PV guests, and all HVM guests, but
>>> 64bit PV guests are the sticking point.  Because Linux has a flat memory
>>> layout, when a 64bit PV guest schedules two threads from the same
>>> process on separate vcpus, those two vcpus have the same virtual %cr3,
>>> and currently, Xen programs the same real %cr3 into hardware.
>>>
>>> If we want Xen to have a non-uniform layout, are two options are:
>>> * Fix Linux to have the same non-uniform layout that Xen wants
>>> (Backwards compatibility for older 64bit PV guests can be achieved with
>>> xen-shim).
>>> * Make use XPTI algorithm (specifically, the pagetable sync/copy part)
>>> forever more in the future.
>>>
>>> Option 2 isn't great (especially for perf on fixed hardware), but does
>>> keep all the necessary changes in Xen.  Option 1 looks to be the better
>>> option longterm.
>>>
>>> As an interesting point to note.  The 32bit PV ABI prohibits sharing of
>>> L3 pagetables, because back in the 32bit hypervisor days, we used to
>>> have linear mappings in the Xen virtual range.  This check is stale
>>> (from a functionality point of view), but still present in Xen.  A
>>> consequence of this is that 32bit PV guests definitely don't share
>>> top-level pagetables across vcpus.
>>
>> Correction: 32bit PV ABI prohibits sharing of L2 pagetables, but L3
>> pagetables can be shared. So guests will schedule the same top-level
>> pagetables across vcpus. >
>> But, 64bit Xen creates a monitor table for 32bit PAE guest and put the
>> CR3 provided by guest to the first slot, so pcpus don't share the same
>> L4 pagetables. The property we want still holds.
> 
> Ah, right -- but Xen can get away with this because in PAE mode, "L3" is
> just 4 entries that are loaded on CR3-switch and not automatically kept
> in sync by the hardware; i.e., the OS already needs to do its own
> "manual syncing" if it updates any of the L3 entires; so it's the same
> for Xen.
> 
>>> Juergen/Boris: Do you have any idea if/how easy this infrastructure
>>> would be to implement for 64bit PV guests as well?  If a PV guest can
>>> advertise via Elfnote that it won't share top-level pagetables, then we
>>> can audit this trivially in Xen.
>>>
>>
>> After reading Linux kernel code, I think it is not going to be trivial.
>> As now threads in Linux share one pagetable (as it should be).
>>
>> In order to make each thread has its own pagetable while still maintain
>> the illusion of one address space, there needs to be synchronisation
>> under the hood.
>>
>> There is code in Linux to synchronise vmalloc, but that's only for the
>> kernel portion. The infrastructure to synchronise userspace portion is
>> missing.
>>
>> One idea is to follow the same model as vmalloc -- maintain a reference
>> pagetable in struct mm and a list of pagetables for threads, then
>> synchronise the pagetables in the page fault handler. But this is
>> probably a bit hard to sell to Linux maintainers because it will touch a
>> lot of the non-Xen code, increase complexity and decrease performance.
> 
> Sorry -- what do you mean "synchronize vmalloc"?  If every thread has a
> different view of the kernel's vmalloc area, then every thread must have
> a different L4 table, right?  And if every thread has a different L4
> table, then we've already got the main thing we need from Linux, don't we?

Just had an IRL chat with Wei:  The syncronization he was talking about
was a syncronization *of the kernel space* *between procesess*.  What we
would need in Linux is a synchronization *of userspace* *between
threads*.  So the same basic idea is there, but it would require a
reasomable amount of extra extension work.

Since the work that would need to be done in Linux is exactly the same
work that we'd need to do in Xen, I think the Linux maintainers would be
pretty annoyed if we asked them to do it instead of doing it ourselves.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
  2018-10-26 11:20                   ` Jan Beulich
  2018-10-26 11:24                     ` George Dunlap
@ 2018-12-11 18:05                     ` Wei Liu
       [not found]                       ` <FB70ABC00200007CA293CED3@prv1-mh.provo.novell.com>
  1 sibling, 1 reply; 63+ messages in thread
From: Wei Liu @ 2018-12-11 18:05 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, uwed,
	Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson, george.dunlap,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, Wei Liu, George Dunlap, Andrew Cooper,
	Xen-devel List, Daniel Kiper, David Woodhouse

On Fri, Oct 26, 2018 at 05:20:47AM -0600, Jan Beulich wrote:
> >>> On 26.10.18 at 12:51, <george.dunlap@citrix.com> wrote:
> > On 10/26/2018 10:56 AM, Jan Beulich wrote:
> >>>>> On 26.10.18 at 11:28, <wei.liu2@citrix.com> wrote:
> >>> On Fri, Oct 26, 2018 at 03:16:15AM -0600, Jan Beulich wrote:
> >>>>>>> On 25.10.18 at 18:29, <andrew.cooper3@citrix.com> wrote:
> >>>>> A split xenheap model means that data pertaining to other guests isn't
> >>>>> mapped in the context of this vcpu, so cannot be brought into the cache.
> >>>>
> >>>> It was not clear to me from Wei's original mail that talk here is
> >>>> about "split" in a sense of "per-domain"; I was assuming the
> >>>> CONFIG_SEPARATE_XENHEAP mode instead.
> >>>
> >>> The split heap was indeed referring to CONFIG_SEPARATE_XENHEAP mode, yet
> >>> I what I wanted most is the partial direct map which reduces the amount
> >>> of data mapped inside Xen context -- the original idea was removing
> >>> direct map discussed during one of the calls IIRC. I thought making the
> >>> partial direct map mode work and make it as small as possible will get
> >>> us 90% there.
> >>>
> >>> The "per-domain" heap is a different work item.
> >> 
> >> But if we mean to go that route, going (back) to the separate
> >> Xen heap model seems just like an extra complication to me.
> >> Yet I agree that this would remove the need for a fair chunk of
> >> the direct map. Otoh a statically partitioned Xen heap would
> >> bring back scalability issues which we had specifically meant to
> >> get rid of by moving away from that model.
> > 
> > I think turning SEPARATE_XENHEAP back on would just be the first step.
> > We definitely would then need to sort things out so that it's scalable
> > again.
> > 
> > After system set-up, the key difference between xenheap and domheap
> > pages is that xenheap pages are assumed to be always mapped (i.e., you
> > can keep a pointer to them and it will be valid), whereas domheap pages
> > cannot assumed to be mapped, and need to be wrapped with
> > [un]map_domain_page().
> > 
> > The basic solution involves having a xenheap virtual address mapping
> > area not tied to the physical layout of the memory.  domheap and xenheap
> > memory would have to come from the same pool, but xenheap would need to
> > be mapped into the xenheap virtual memory region before being returned.
> 
> Wouldn't this most easily be done by making alloc_xenheap_pages()
> call alloc_domheap_pages() and then vmap() the result? Of course
> we may need to grow the vmap area in that case.

The existing vmap area is 64GB, but that should be big enough for Xen?

If that's not big enough, we need to move that area to a different
location, because it can't expand to either side of the address space.

Wei.

> 
> Jan
> 
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Ongoing/future speculative mitigation work
       [not found]                       ` <FB70ABC00200007CA293CED3@prv1-mh.provo.novell.com>
@ 2018-12-12  8:32                         ` Jan Beulich
  0 siblings, 0 replies; 63+ messages in thread
From: Jan Beulich @ 2018-12-12  8:32 UTC (permalink / raw)
  To: Wei Liu
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, uwed,
	Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson, george.dunlap,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, George Dunlap, Andrew Cooper, Xen-devel List,
	Daniel Kiper, David Woodhouse, Roger Pau Monne

>>> On 11.12.18 at 19:05, <wei.liu2@citrix.com> wrote:
> On Fri, Oct 26, 2018 at 05:20:47AM -0600, Jan Beulich wrote:
>> >>> On 26.10.18 at 12:51, <george.dunlap@citrix.com> wrote:
>> > The basic solution involves having a xenheap virtual address mapping
>> > area not tied to the physical layout of the memory.  domheap and xenheap
>> > memory would have to come from the same pool, but xenheap would need to
>> > be mapped into the xenheap virtual memory region before being returned.
>> 
>> Wouldn't this most easily be done by making alloc_xenheap_pages()
>> call alloc_domheap_pages() and then vmap() the result? Of course
>> we may need to grow the vmap area in that case.
> 
> The existing vmap area is 64GB, but that should be big enough for Xen?

In the common case perhaps. But what about extreme cases, like
very many VMs on multi-Tb hosts?

> If that's not big enough, we need to move that area to a different
> location, because it can't expand to either side of the address space.

When the directmap goes away, ample address space gets freed
up.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work)
  2018-10-18 17:46 Ongoing/future speculative mitigation work Andrew Cooper
                   ` (3 preceding siblings ...)
  2018-12-07 18:40 ` Wei Liu
@ 2019-01-24 11:44 ` Wei Liu
  2019-01-24 16:00   ` George Dunlap
                     ` (2 more replies)
  4 siblings, 3 replies; 63+ messages in thread
From: Wei Liu @ 2019-01-24 11:44 UTC (permalink / raw)
  To: Xen-devel
  Cc: Martin Pohlack, Julien Grall, Jan Beulich, Joao Martins,
	Stefano Stabellini, Daniel Kiper, Marek Marczykowski,
	Anthony Liguori, Dannowski, Uwe, Lars Kurth, Konrad Wilk,
	Ross Philipson, Dario Faggioli, Matt Wilson, Boris Ostrovsky,
	Juergen Gross, Sergey Dyasli, Wei Liu, George Dunlap,
	Andrew Cooper, Mihai Donțu, Woodhouse, David

Below is a summary for a discussion on this topic between Jan and me.

End goal: reduce the size of direct map or remove it completely

Constraints:

1. We want unified xenheap and domheap.
2. We want to preserve xenheap's semantics -- always globally mapped
   so that pointers can be stashed safely.
3. Performance shouldn't be heavily impacted -- to be tested.

Things need to be done:

1. Remove map domain page infra's dependency on direct map

Currently map_domain_page uses either direct map or per-domain slot.
Direct map is used when Xen runs in EFI context or before any domain is
constructed, or when in non-debug build as a fast path.

In the new world when there isn't a direct map, we have at least two
options to set aside address space to map_domain_page:

  1.1. Implement percpu infrastructure
  1.2. Statically allocate address space to each CPU

1.1 requires:

  1.1.1 Carve out some address space
  1.1.2 Adjust early setup code
  1.1.3 Use fixmap to bootstrap percpu region
  1.1.4 Change context-switching code

1.2 is a bit simpler given that it doesn't require changing
context-switching code. Yet it is also less flexible once we decide we
want to move more stuff into the percpu region.

2. Remove map_domain_page_global's dependency on direct map

This is easy.

3. Implement xenheap using vmap infrastructure

This helps preserve xenheap's "always mapped" property. At the moment,
vmap relies on xenheap, we want to turn this relationship around.

There is a loop what needs breaking in the new world:

  alloc_xenheap_pages -> vmap -> __vmap -> map_pages_to_xen ->
    virt_to_xen_l1e -> alloc_xen_pagetable -> alloc_xenheap_page -> vmap ...

Two options were proposed to break this loop:

  3.1 Pre-populate all page tables for vmap region
  3.2 Switch page table allocation to use domheap page

3.1 is wasteful since we expect vmap to grow in the future. 3.2 requires
a lot of code churn -- the assumption up until now is xen's page tables
are always mapped, and a lot of code and APIs is designed based on that.

We think that 3.2 is a worthwhile thing to do anyway. This work just
gives us a good excuse to do it.

The other work item is to track page<->virt relationship so that
conversion functions (_to_virt etc) continue to work. For PoC purpose,
putting a void * into page_info is good enough. But in the future we
want to have a separate array for tracking so that page_info stays power
of two in size.

(The other option for the conversion functions is to purge them all.
That's a lot of code churn and, more importantly, touches common code.
So that idea was discarded.)

4. Remove or reduce direct map

This is a matter of changing some constants. It is conceptually easy but
I expect quite a bit of bug fixing is needed.

Tl;dr

I have broken down this project into several sub-projects and recorded
their relationship starting from XEN-119.

  https://xenproject.atlassian.net/browse/XEN-119

Comments are welcome.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work)
  2019-01-24 11:44 ` Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work) Wei Liu
@ 2019-01-24 16:00   ` George Dunlap
  2019-02-07 16:50   ` Wei Liu
  2019-02-20 12:29   ` Wei Liu
  2 siblings, 0 replies; 63+ messages in thread
From: George Dunlap @ 2019-01-24 16:00 UTC (permalink / raw)
  To: Wei Liu, Xen-devel
  Cc: Martin Pohlack, Julien Grall, Jan Beulich, Joao Martins,
	Stefano Stabellini, Daniel Kiper, Marek Marczykowski,
	Anthony Liguori, Dannowski, Uwe, Lars Kurth, Konrad Wilk,
	Ross Philipson, Dario Faggioli, Matt Wilson, Boris Ostrovsky,
	Juergen Gross, Sergey Dyasli, George Dunlap, Andrew Cooper,
	Mihai Donțu, Woodhouse, David, Roger Pau Monne

On 1/24/19 11:44 AM, Wei Liu wrote:
> Below is a summary for a discussion on this topic between Jan and me.

I've skimmed this over and it looks reasonable.  Thanks for doing this.

 -George

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work)
  2019-01-24 11:44 ` Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work) Wei Liu
  2019-01-24 16:00   ` George Dunlap
@ 2019-02-07 16:50   ` Wei Liu
  2019-02-20 12:29   ` Wei Liu
  2 siblings, 0 replies; 63+ messages in thread
From: Wei Liu @ 2019-02-07 16:50 UTC (permalink / raw)
  To: Xen-devel
  Cc: Martin Pohlack, Julien Grall, Jan Beulich, Joao Martins,
	Stefano Stabellini, Daniel Kiper, Marek Marczykowski,
	Anthony Liguori, Dannowski, Uwe, Lars Kurth, Konrad Wilk,
	Ross Philipson, Dario Faggioli, Matt Wilson, Boris Ostrovsky,
	Juergen Gross, Sergey Dyasli, Wei Liu, George Dunlap,
	Andrew Cooper, Mihai Donțu, Woodhouse, David

On Thu, Jan 24, 2019 at 11:44:55AM +0000, Wei Liu wrote:
[...]
>   3.2 Switch page table allocation to use domheap page
> 
> We think that 3.2 is a worthwhile thing to do anyway. This work just
> gives us a good excuse to do it.

I just posted a patch series for this work item.

See [PATCH RFC 00/55] x86: use domheap page for xen page tables

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work)
  2019-01-24 11:44 ` Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work) Wei Liu
  2019-01-24 16:00   ` George Dunlap
  2019-02-07 16:50   ` Wei Liu
@ 2019-02-20 12:29   ` Wei Liu
  2019-02-20 13:00     ` Roger Pau Monné
  2 siblings, 1 reply; 63+ messages in thread
From: Wei Liu @ 2019-02-20 12:29 UTC (permalink / raw)
  To: Xen-devel
  Cc: Martin Pohlack, Julien Grall, Jan Beulich, Joao Martins,
	Stefano Stabellini, Daniel Kiper, Marek Marczykowski,
	Anthony Liguori, Dannowski, Uwe, Lars Kurth, Konrad Wilk,
	Ross Philipson, Dario Faggioli, Matt Wilson, Boris Ostrovsky,
	Juergen Gross, Sergey Dyasli, Wei Liu, George Dunlap,
	Andrew Cooper, Mihai Donțu, Woodhouse, David

On Thu, Jan 24, 2019 at 11:44:55AM +0000, Wei Liu wrote:
> 3. Implement xenheap using vmap infrastructure
> 
> This helps preserve xenheap's "always mapped" property. At the moment,
> vmap relies on xenheap, we want to turn this relationship around.
> 
> There is a loop what needs breaking in the new world:
> 
>   alloc_xenheap_pages -> vmap -> __vmap -> map_pages_to_xen ->
>     virt_to_xen_l1e -> alloc_xen_pagetable -> alloc_xenheap_page -> vmap ...
> 
> Two options were proposed to break this loop:
> 
>   3.1 Pre-populate all page tables for vmap region

Now that we have this ...

>   3.2 Switch page table allocation to use domheap page
> 
> 
> The other work item is to track page<->virt relationship so that
> conversion functions (_to_virt etc) continue to work. For PoC purpose,
> putting a void * into page_info is good enough. But in the future we
> want to have a separate array for tracking so that page_info stays power
> of two in size.
> 

I started working on some prototyping code for the rest of this major
work item. Conversion functions are a bit messy to deal with (I have no
idea whether my modifications are totally correct at this point), but
the most major issue I see is an optimisation done by xmalloc which
isn't compatible with vmap.

So xmalloc has this optimisation: it will allocate a high-order page
from xenheap when necessary and then attempt to break that up and return
the unused portion.  Vmap uses bitmap to track address space usage, and
it mandates a guard page before every address space allocation. What
xmalloc does is to free a portion of the address space, which isn't
really supported by vmap.

I came up with two options yesterday:

1. Remove the optimisation in xmalloc
2. Make vmap able to break up allocation

Neither looks great to me. The first is simple but potentially wasteful
(how much is wasted?). The second requires non-trivial modification to
vmap, essentially removing the mandatory guard page. In comparison the
first is easier and safer.

I would like to hear people's thought on this. Comments are welcome.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work)
  2019-02-20 12:29   ` Wei Liu
@ 2019-02-20 13:00     ` Roger Pau Monné
  2019-02-20 13:09       ` Wei Liu
  0 siblings, 1 reply; 63+ messages in thread
From: Roger Pau Monné @ 2019-02-20 13:00 UTC (permalink / raw)
  To: Wei Liu
  Cc: Martin Pohlack, Julien Grall, Jan Beulich, Joao Martins,
	Stefano Stabellini, Daniel Kiper, Marek Marczykowski,
	Anthony Liguori, Xen-devel, Dannowski, Uwe, Lars Kurth,
	Konrad Wilk, Ross Philipson, Dario Faggioli, Matt Wilson,
	Boris Ostrovsky, Juergen Gross, Sergey Dyasli, George Dunlap,
	Andrew Cooper, Mihai Donțu, Woodhouse, David

On Wed, Feb 20, 2019 at 12:29:01PM +0000, Wei Liu wrote:
> On Thu, Jan 24, 2019 at 11:44:55AM +0000, Wei Liu wrote:
> > 3. Implement xenheap using vmap infrastructure
> > 
> > This helps preserve xenheap's "always mapped" property. At the moment,
> > vmap relies on xenheap, we want to turn this relationship around.
> > 
> > There is a loop what needs breaking in the new world:
> > 
> >   alloc_xenheap_pages -> vmap -> __vmap -> map_pages_to_xen ->
> >     virt_to_xen_l1e -> alloc_xen_pagetable -> alloc_xenheap_page -> vmap ...
> > 
> > Two options were proposed to break this loop:
> > 
> >   3.1 Pre-populate all page tables for vmap region
> 
> Now that we have this ...
> 
> >   3.2 Switch page table allocation to use domheap page
> > 
> > 
> > The other work item is to track page<->virt relationship so that
> > conversion functions (_to_virt etc) continue to work. For PoC purpose,
> > putting a void * into page_info is good enough. But in the future we
> > want to have a separate array for tracking so that page_info stays power
> > of two in size.
> > 
> 
> I started working on some prototyping code for the rest of this major
> work item. Conversion functions are a bit messy to deal with (I have no
> idea whether my modifications are totally correct at this point), but
> the most major issue I see is an optimisation done by xmalloc which
> isn't compatible with vmap.
> 
> So xmalloc has this optimisation: it will allocate a high-order page
> from xenheap when necessary and then attempt to break that up and return
> the unused portion.  Vmap uses bitmap to track address space usage, and
> it mandates a guard page before every address space allocation. What
> xmalloc does is to free a portion of the address space, which isn't
> really supported by vmap.
> 
> I came up with two options yesterday:
> 
> 1. Remove the optimisation in xmalloc
> 2. Make vmap able to break up allocation
> 
> Neither looks great to me. The first is simple but potentially wasteful
> (how much is wasted?). The second requires non-trivial modification to
> vmap, essentially removing the mandatory guard page. In comparison the
> first is easier and safer.
> 
> I would like to hear people's thought on this. Comments are welcome.

The PV dom0 builder does something similar to this, it tries to
allocate a page that has an order equal or higher than the order of
the request size, and then frees up the unused part.

I've used another approach for the PVH dom0 builder, which is to never
allocate more than what's required, and instead always under-allocate.
This has the benefit of not splitting high order pages, but requires
multiple calls to the allocation function. See
pvh_populate_memory_range in hvm/dom0_build.c and it's usage of
get_order_from_pages. I think a similar approach could be implemented
in xmalloc?

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work)
  2019-02-20 13:00     ` Roger Pau Monné
@ 2019-02-20 13:09       ` Wei Liu
  2019-02-20 17:08         ` Wei Liu
  0 siblings, 1 reply; 63+ messages in thread
From: Wei Liu @ 2019-02-20 13:09 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Martin Pohlack, Julien Grall, Jan Beulich, Joao Martins,
	Stefano Stabellini, Daniel Kiper, Marek Marczykowski,
	Anthony Liguori, Xen-devel, Dannowski, Uwe, Lars Kurth,
	Konrad Wilk, Ross Philipson, Dario Faggioli, Matt Wilson,
	Boris Ostrovsky, Juergen Gross, Sergey Dyasli, Wei Liu,
	George Dunlap, Andrew Cooper, Mihai Donțu

On Wed, Feb 20, 2019 at 02:00:52PM +0100, Roger Pau Monné wrote:
> On Wed, Feb 20, 2019 at 12:29:01PM +0000, Wei Liu wrote:
> > On Thu, Jan 24, 2019 at 11:44:55AM +0000, Wei Liu wrote:
> > > 3. Implement xenheap using vmap infrastructure
> > > 
> > > This helps preserve xenheap's "always mapped" property. At the moment,
> > > vmap relies on xenheap, we want to turn this relationship around.
> > > 
> > > There is a loop what needs breaking in the new world:
> > > 
> > >   alloc_xenheap_pages -> vmap -> __vmap -> map_pages_to_xen ->
> > >     virt_to_xen_l1e -> alloc_xen_pagetable -> alloc_xenheap_page -> vmap ...
> > > 
> > > Two options were proposed to break this loop:
> > > 
> > >   3.1 Pre-populate all page tables for vmap region
> > 
> > Now that we have this ...
> > 
> > >   3.2 Switch page table allocation to use domheap page
> > > 
> > > 
> > > The other work item is to track page<->virt relationship so that
> > > conversion functions (_to_virt etc) continue to work. For PoC purpose,
> > > putting a void * into page_info is good enough. But in the future we
> > > want to have a separate array for tracking so that page_info stays power
> > > of two in size.
> > > 
> > 
> > I started working on some prototyping code for the rest of this major
> > work item. Conversion functions are a bit messy to deal with (I have no
> > idea whether my modifications are totally correct at this point), but
> > the most major issue I see is an optimisation done by xmalloc which
> > isn't compatible with vmap.
> > 
> > So xmalloc has this optimisation: it will allocate a high-order page
> > from xenheap when necessary and then attempt to break that up and return
> > the unused portion.  Vmap uses bitmap to track address space usage, and
> > it mandates a guard page before every address space allocation. What
> > xmalloc does is to free a portion of the address space, which isn't
> > really supported by vmap.
> > 
> > I came up with two options yesterday:
> > 
> > 1. Remove the optimisation in xmalloc
> > 2. Make vmap able to break up allocation
> > 
> > Neither looks great to me. The first is simple but potentially wasteful
> > (how much is wasted?). The second requires non-trivial modification to
> > vmap, essentially removing the mandatory guard page. In comparison the
> > first is easier and safer.
> > 
> > I would like to hear people's thought on this. Comments are welcome.
> 
> The PV dom0 builder does something similar to this, it tries to
> allocate a page that has an order equal or higher than the order of
> the request size, and then frees up the unused part.
> 
> I've used another approach for the PVH dom0 builder, which is to never
> allocate more than what's required, and instead always under-allocate.
> This has the benefit of not splitting high order pages, but requires
> multiple calls to the allocation function. See
> pvh_populate_memory_range in hvm/dom0_build.c and it's usage of
> get_order_from_pages. I think a similar approach could be implemented
> in xmalloc?
> 

The usage in PV dom0 build is not an issue because those pages are
domheap pages. On a related topic, I have to fix that instance since it
treats domheap pages like xenheap pages, which will be very wrong in the
future.

Your example of PVH dom0 build uses domheap pages too, so that's not an
issue.

I think under-allocate-then-map looks plausible. xmalloc will need
to allocate pages, put them into an array and call __vmap on that array
directly.

Wei.

> Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work)
  2019-02-20 13:09       ` Wei Liu
@ 2019-02-20 17:08         ` Wei Liu
  2019-02-21  9:59           ` Roger Pau Monné
  2019-02-22 11:48           ` Jan Beulich
  0 siblings, 2 replies; 63+ messages in thread
From: Wei Liu @ 2019-02-20 17:08 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Martin Pohlack, Julien Grall, Jan Beulich, Joao Martins,
	Stefano Stabellini, Daniel Kiper, Marek Marczykowski,
	Anthony Liguori, Xen-devel, Dannowski, Uwe, Lars Kurth,
	Konrad Wilk, Ross Philipson, Dario Faggioli, Matt Wilson,
	Boris Ostrovsky, Juergen Gross, Sergey Dyasli, Wei Liu,
	George Dunlap, Andrew Cooper, Mihai Donțu

On Wed, Feb 20, 2019 at 01:09:56PM +0000, Wei Liu wrote:
[...]
> I think under-allocate-then-map looks plausible. xmalloc will need
> to allocate pages, put them into an array and call __vmap on that array
> directly.

The biggest issue with this approach is that we now need an array of
1UL<<MAX_ORDER to accommodate mfns. Back of envelope calculation: on x86
this is going to be (1UL<<20)*8 bytes long. This is not feasible.

Wei.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work)
  2019-02-20 17:08         ` Wei Liu
@ 2019-02-21  9:59           ` Roger Pau Monné
  2019-02-21 17:51             ` Wei Liu
  2019-02-22 11:48           ` Jan Beulich
  1 sibling, 1 reply; 63+ messages in thread
From: Roger Pau Monné @ 2019-02-21  9:59 UTC (permalink / raw)
  To: Wei Liu
  Cc: Martin Pohlack, Julien Grall, Jan Beulich, Joao Martins,
	Stefano Stabellini, Daniel Kiper, Marek Marczykowski,
	Anthony Liguori, Xen-devel, Dannowski, Uwe, Lars Kurth,
	Konrad Wilk, Ross Philipson, Dario Faggioli, Matt Wilson,
	Boris Ostrovsky, Juergen Gross, Sergey Dyasli, George Dunlap,
	Andrew Cooper, Mihai Donțu, Woodhouse, David

On Wed, Feb 20, 2019 at 05:08:09PM +0000, Wei Liu wrote:
> On Wed, Feb 20, 2019 at 01:09:56PM +0000, Wei Liu wrote:
> [...]
> > I think under-allocate-then-map looks plausible. xmalloc will need
> > to allocate pages, put them into an array and call __vmap on that array
> > directly.
> 
> The biggest issue with this approach is that we now need an array of
> 1UL<<MAX_ORDER to accommodate mfns. Back of envelope calculation: on x86
> this is going to be (1UL<<20)*8 bytes long. This is not feasible.

Right. I guess the only remaining option is to allocate a virtual
address space and populate it using multiple pages?

That would likely require to split some functions into smaller helpers
so you can call them and provide the virtual address where a page
should be mapped?

Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work)
  2019-02-21  9:59           ` Roger Pau Monné
@ 2019-02-21 17:51             ` Wei Liu
  0 siblings, 0 replies; 63+ messages in thread
From: Wei Liu @ 2019-02-21 17:51 UTC (permalink / raw)
  To: Roger Pau Monné
  Cc: Martin Pohlack, Julien Grall, Jan Beulich, Joao Martins,
	Stefano Stabellini, Daniel Kiper, Marek Marczykowski,
	Anthony Liguori, Xen-devel, Dannowski, Uwe, Lars Kurth,
	Konrad Wilk, Ross Philipson, Dario Faggioli, Matt Wilson,
	Boris Ostrovsky, Juergen Gross, Sergey Dyasli, Wei Liu,
	George Dunlap, Andrew Cooper, Mihai Donțu

On Thu, Feb 21, 2019 at 10:59:41AM +0100, Roger Pau Monné wrote:
> On Wed, Feb 20, 2019 at 05:08:09PM +0000, Wei Liu wrote:
> > On Wed, Feb 20, 2019 at 01:09:56PM +0000, Wei Liu wrote:
> > [...]
> > > I think under-allocate-then-map looks plausible. xmalloc will need
> > > to allocate pages, put them into an array and call __vmap on that array
> > > directly.
> > 
> > The biggest issue with this approach is that we now need an array of
> > 1UL<<MAX_ORDER to accommodate mfns. Back of envelope calculation: on x86
> > this is going to be (1UL<<20)*8 bytes long. This is not feasible.
> 
> Right. I guess the only remaining option is to allocate a virtual
> address space and populate it using multiple pages?

I probably was not clear enough -- the aforementioned array was needed
for this method, so I don't think this is feasible.

> 
> That would likely require to split some functions into smaller helpers
> so you can call them and provide the virtual address where a page
> should be mapped?
> 

Not feasible as of now. The fundamental issue is that vmap is managed by
a bitmap which has 1:1 mapping for page and address space, and it
mandates a guard page for each allocation.

To make this work we need to remove the mandatory guard page. Inventing
more complex tracking structure is probably not worth it.

Wei.

> Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work)
  2019-02-20 17:08         ` Wei Liu
  2019-02-21  9:59           ` Roger Pau Monné
@ 2019-02-22 11:48           ` Jan Beulich
  2019-02-22 11:50             ` Wei Liu
  1 sibling, 1 reply; 63+ messages in thread
From: Jan Beulich @ 2019-02-22 11:48 UTC (permalink / raw)
  To: Wei Liu
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, xen-devel,
	uwed, Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, George Dunlap, Andrew Cooper, Daniel Kiper,
	David Woodhouse, Roger Pau Monne

>>> On 20.02.19 at 18:08, <wei.liu2@citrix.com> wrote:
> On Wed, Feb 20, 2019 at 01:09:56PM +0000, Wei Liu wrote:
> [...]
>> I think under-allocate-then-map looks plausible. xmalloc will need
>> to allocate pages, put them into an array and call __vmap on that array
>> directly.
> 
> The biggest issue with this approach is that we now need an array of
> 1UL<<MAX_ORDER to accommodate mfns. Back of envelope calculation: on x86
> this is going to be (1UL<<20)*8 bytes long. This is not feasible.

Are we really calling xmalloc() with any number nearly this big?

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work)
  2019-02-22 11:48           ` Jan Beulich
@ 2019-02-22 11:50             ` Wei Liu
  2019-02-22 12:06               ` Jan Beulich
  0 siblings, 1 reply; 63+ messages in thread
From: Wei Liu @ 2019-02-22 11:50 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, xen-devel,
	uwed, Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, Wei Liu, George Dunlap, Andrew Cooper,
	Daniel Kiper, David Woodhouse, Roger

On Fri, Feb 22, 2019 at 04:48:09AM -0700, Jan Beulich wrote:
> >>> On 20.02.19 at 18:08, <wei.liu2@citrix.com> wrote:
> > On Wed, Feb 20, 2019 at 01:09:56PM +0000, Wei Liu wrote:
> > [...]
> >> I think under-allocate-then-map looks plausible. xmalloc will need
> >> to allocate pages, put them into an array and call __vmap on that array
> >> directly.
> > 
> > The biggest issue with this approach is that we now need an array of
> > 1UL<<MAX_ORDER to accommodate mfns. Back of envelope calculation: on x86
> > this is going to be (1UL<<20)*8 bytes long. This is not feasible.
> 
> Are we really calling xmalloc() with any number nearly this big?

In practice, I don't think so. What do you think is a sensible limit?

Wei.

> 
> Jan
> 
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work)
  2019-02-22 11:50             ` Wei Liu
@ 2019-02-22 12:06               ` Jan Beulich
  2019-02-22 12:11                 ` Wei Liu
  0 siblings, 1 reply; 63+ messages in thread
From: Jan Beulich @ 2019-02-22 12:06 UTC (permalink / raw)
  To: Wei Liu
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, xen-devel,
	uwed, Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, George Dunlap, Andrew Cooper, Daniel Kiper,
	David Woodhouse, Roger Pau Monne

>>> On 22.02.19 at 12:50, <wei.liu2@citrix.com> wrote:
> On Fri, Feb 22, 2019 at 04:48:09AM -0700, Jan Beulich wrote:
>> >>> On 20.02.19 at 18:08, <wei.liu2@citrix.com> wrote:
>> > On Wed, Feb 20, 2019 at 01:09:56PM +0000, Wei Liu wrote:
>> > [...]
>> >> I think under-allocate-then-map looks plausible. xmalloc will need
>> >> to allocate pages, put them into an array and call __vmap on that array
>> >> directly.
>> > 
>> > The biggest issue with this approach is that we now need an array of
>> > 1UL<<MAX_ORDER to accommodate mfns. Back of envelope calculation: on x86
>> > this is going to be (1UL<<20)*8 bytes long. This is not feasible.
>> 
>> Are we really calling xmalloc() with any number nearly this big?
> 
> In practice, I don't think so. What do you think is a sensible limit?

I'm afraid you won't like the answer: Whatever the biggest chunk is
we currently allocate anywhere. Perhaps, e.g. if there's a single big
"violator", changing some code to reduce the upper bound might be
desirable.

In general there shouldn't be any going beyond one page once we've
completed booting. Several years back I think I had managed to
replace most (all?) higher order xmalloc()-s. So another option might
be to allow up to MAX_ORDER by way of some init-only mechanism,
and later allow only up to single page chunks.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work)
  2019-02-22 12:06               ` Jan Beulich
@ 2019-02-22 12:11                 ` Wei Liu
  2019-02-22 12:47                   ` Jan Beulich
  0 siblings, 1 reply; 63+ messages in thread
From: Wei Liu @ 2019-02-22 12:11 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, xen-devel,
	uwed, Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, Wei Liu, George Dunlap, Andrew Cooper,
	Daniel Kiper, David Woodhouse, Roger

On Fri, Feb 22, 2019 at 05:06:03AM -0700, Jan Beulich wrote:
> >>> On 22.02.19 at 12:50, <wei.liu2@citrix.com> wrote:
> > On Fri, Feb 22, 2019 at 04:48:09AM -0700, Jan Beulich wrote:
> >> >>> On 20.02.19 at 18:08, <wei.liu2@citrix.com> wrote:
> >> > On Wed, Feb 20, 2019 at 01:09:56PM +0000, Wei Liu wrote:
> >> > [...]
> >> >> I think under-allocate-then-map looks plausible. xmalloc will need
> >> >> to allocate pages, put them into an array and call __vmap on that array
> >> >> directly.
> >> > 
> >> > The biggest issue with this approach is that we now need an array of
> >> > 1UL<<MAX_ORDER to accommodate mfns. Back of envelope calculation: on x86
> >> > this is going to be (1UL<<20)*8 bytes long. This is not feasible.
> >> 
> >> Are we really calling xmalloc() with any number nearly this big?
> > 
> > In practice, I don't think so. What do you think is a sensible limit?
> 
> I'm afraid you won't like the answer: Whatever the biggest chunk is
> we currently allocate anywhere. Perhaps, e.g. if there's a single big
> "violator", changing some code to reduce the upper bound might be
> desirable.
> 
> In general there shouldn't be any going beyond one page once we've
> completed booting. Several years back I think I had managed to
> replace most (all?) higher order xmalloc()-s. So another option might
> be to allow up to MAX_ORDER by way of some init-only mechanism,
> and later allow only up to single page chunks.
> 

Think about it, if you have done the work to remove high order
allocations, removing this optimisation is the easiest thing to do and
wouldn't make things worse, isn't it?

Wei.

> Jan
> 
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work)
  2019-02-22 12:11                 ` Wei Liu
@ 2019-02-22 12:47                   ` Jan Beulich
  2019-02-22 13:19                     ` Wei Liu
  0 siblings, 1 reply; 63+ messages in thread
From: Jan Beulich @ 2019-02-22 12:47 UTC (permalink / raw)
  To: Wei Liu
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, xen-devel,
	uwed, Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, George Dunlap, Andrew Cooper, Daniel Kiper,
	David Woodhouse, Roger Pau Monne

>>> On 22.02.19 at 13:11, <wei.liu2@citrix.com> wrote:
> On Fri, Feb 22, 2019 at 05:06:03AM -0700, Jan Beulich wrote:
>> >>> On 22.02.19 at 12:50, <wei.liu2@citrix.com> wrote:
>> > On Fri, Feb 22, 2019 at 04:48:09AM -0700, Jan Beulich wrote:
>> >> >>> On 20.02.19 at 18:08, <wei.liu2@citrix.com> wrote:
>> >> > On Wed, Feb 20, 2019 at 01:09:56PM +0000, Wei Liu wrote:
>> >> > [...]
>> >> >> I think under-allocate-then-map looks plausible. xmalloc will need
>> >> >> to allocate pages, put them into an array and call __vmap on that array
>> >> >> directly.
>> >> > 
>> >> > The biggest issue with this approach is that we now need an array of
>> >> > 1UL<<MAX_ORDER to accommodate mfns. Back of envelope calculation: on x86
>> >> > this is going to be (1UL<<20)*8 bytes long. This is not feasible.
>> >> 
>> >> Are we really calling xmalloc() with any number nearly this big?
>> > 
>> > In practice, I don't think so. What do you think is a sensible limit?
>> 
>> I'm afraid you won't like the answer: Whatever the biggest chunk is
>> we currently allocate anywhere. Perhaps, e.g. if there's a single big
>> "violator", changing some code to reduce the upper bound might be
>> desirable.
>> 
>> In general there shouldn't be any going beyond one page once we've
>> completed booting. Several years back I think I had managed to
>> replace most (all?) higher order xmalloc()-s. So another option might
>> be to allow up to MAX_ORDER by way of some init-only mechanism,
>> and later allow only up to single page chunks.
>> 
> 
> Think about it, if you have done the work to remove high order
> allocations, removing this optimisation is the easiest thing to do and
> wouldn't make things worse, isn't it?

Not sure I understand what exactly you want to remove. Allocating
32 pages when you need 17 is wasteful, and hence I'd prefer if we
could continue to make actual use of the remaining 15. That's
independent of whether the allocation occurs at boot or run time.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work)
  2019-02-22 12:47                   ` Jan Beulich
@ 2019-02-22 13:19                     ` Wei Liu
       [not found]                       ` <158783E402000088A293CED3@prv1-mh.provo.novell.com>
  0 siblings, 1 reply; 63+ messages in thread
From: Wei Liu @ 2019-02-22 13:19 UTC (permalink / raw)
  To: Jan Beulich
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, xen-devel,
	uwed, Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, Wei Liu, George Dunlap, Andrew Cooper,
	Daniel Kiper, David Woodhouse, Roger

On Fri, Feb 22, 2019 at 05:47:13AM -0700, Jan Beulich wrote:
> >>> On 22.02.19 at 13:11, <wei.liu2@citrix.com> wrote:
> > On Fri, Feb 22, 2019 at 05:06:03AM -0700, Jan Beulich wrote:
> >> >>> On 22.02.19 at 12:50, <wei.liu2@citrix.com> wrote:
> >> > On Fri, Feb 22, 2019 at 04:48:09AM -0700, Jan Beulich wrote:
> >> >> >>> On 20.02.19 at 18:08, <wei.liu2@citrix.com> wrote:
> >> >> > On Wed, Feb 20, 2019 at 01:09:56PM +0000, Wei Liu wrote:
> >> >> > [...]
> >> >> >> I think under-allocate-then-map looks plausible. xmalloc will need
> >> >> >> to allocate pages, put them into an array and call __vmap on that array
> >> >> >> directly.
> >> >> > 
> >> >> > The biggest issue with this approach is that we now need an array of
> >> >> > 1UL<<MAX_ORDER to accommodate mfns. Back of envelope calculation: on x86
> >> >> > this is going to be (1UL<<20)*8 bytes long. This is not feasible.
> >> >> 
> >> >> Are we really calling xmalloc() with any number nearly this big?
> >> > 
> >> > In practice, I don't think so. What do you think is a sensible limit?
> >> 
> >> I'm afraid you won't like the answer: Whatever the biggest chunk is
> >> we currently allocate anywhere. Perhaps, e.g. if there's a single big
> >> "violator", changing some code to reduce the upper bound might be
> >> desirable.
> >> 
> >> In general there shouldn't be any going beyond one page once we've
> >> completed booting. Several years back I think I had managed to
> >> replace most (all?) higher order xmalloc()-s. So another option might
> >> be to allow up to MAX_ORDER by way of some init-only mechanism,
> >> and later allow only up to single page chunks.
> >> 
> > 
> > Think about it, if you have done the work to remove high order
> > allocations, removing this optimisation is the easiest thing to do and
> > wouldn't make things worse, isn't it?
> 
> Not sure I understand what exactly you want to remove. Allocating

Remove the code that returns those 17 pages.

> 32 pages when you need 17 is wasteful, and hence I'd prefer if we
> could continue to make actual use of the remaining 15. That's
> independent of whether the allocation occurs at boot or run time.
> 

Sure. But in my opinion there will only be one such wastage in the life
time of system then opting for simpler code is a far better approach
(with appropriate checks in place). On the other hand, if not returning
pages results in wasting almost half of each allocation, we will need to
think of a clever way. Your reply made me think of the former.

I have only realised this today: essentially we will end up implementing
xmalloc with vmalloc, which at the moment depends on xmalloc to allocate
the array of mfns.

Wei.

> Jan
> 
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work)
       [not found]                       ` <158783E402000088A293CED3@prv1-mh.provo.novell.com>
@ 2019-02-22 13:24                         ` Jan Beulich
  2019-02-22 13:27                           ` Jan Beulich
  0 siblings, 1 reply; 63+ messages in thread
From: Jan Beulich @ 2019-02-22 13:24 UTC (permalink / raw)
  To: Wei Liu
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, xen-devel,
	uwed, Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, George Dunlap, Andrew Cooper, Daniel Kiper,
	David Woodhouse, Roger Pau Monne

>>> On 22.02.19 at 14:19, <wei.liu2@citrix.com> wrote:
> I have only realised this today: essentially we will end up implementing
> xmalloc with vmalloc, which at the moment depends on xmalloc to allocate
> the array of mfns.

Which (potential locking issues aside) is not a problem, as the size of
the MFN array will reduce logarithmically, until it eventually is no larger
than a page anymore.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

* Re: Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work)
  2019-02-22 13:24                         ` Jan Beulich
@ 2019-02-22 13:27                           ` Jan Beulich
  0 siblings, 0 replies; 63+ messages in thread
From: Jan Beulich @ 2019-02-22 13:27 UTC (permalink / raw)
  To: Wei Liu
  Cc: Martin Pohlack, Julien Grall, Joao Martins, Stefano Stabellini,
	Mihai Dontu, Marek Marczykowski, Anthony Liguori, xen-devel,
	uwed, Lars Kurth, Konrad Rzeszutek Wilk, Ross Philipson,
	Dario Faggioli, Matt Wilson, Boris Ostrovsky, Juergen Gross,
	Sergey Dyasli, George Dunlap, Andrew Cooper, Daniel Kiper,
	David Woodhouse, Roger Pau Monne

>>> On 22.02.19 at 14:24,  wrote:
>>>> On 22.02.19 at 14:19, <wei.liu2@citrix.com> wrote:
> > I have only realised this today: essentially we will end up implementing
> > xmalloc with vmalloc, which at the moment depends on xmalloc to allocate
> > the array of mfns.
> 
> Which (potential locking issues aside) is not a problem, as the size of
> the MFN array will reduce logarithmically, until it eventually is no larger
> than a page anymore.

Err, not logarithmically, but you get the point.

Jan



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 63+ messages in thread

end of thread, other threads:[~2019-02-22 13:27 UTC | newest]

Thread overview: 63+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-18 17:46 Ongoing/future speculative mitigation work Andrew Cooper
2018-10-19  8:09 ` Dario Faggioli
2018-10-19 12:17   ` Andrew Cooper
2018-10-22  9:32     ` Mihai Donțu
2018-10-22 14:55 ` Wei Liu
2018-10-22 15:09   ` Woodhouse, David
2018-10-22 15:14     ` Andrew Cooper
2018-10-25 14:50   ` Jan Beulich
2018-10-25 14:56     ` George Dunlap
2018-10-25 15:02       ` Jan Beulich
2018-10-25 16:29         ` Andrew Cooper
2018-10-25 16:43           ` George Dunlap
2018-10-25 16:50             ` Andrew Cooper
2018-10-25 17:07               ` George Dunlap
2018-10-26  9:16           ` Jan Beulich
2018-10-26  9:28             ` Wei Liu
2018-10-26  9:56               ` Jan Beulich
2018-10-26 10:51                 ` George Dunlap
2018-10-26 11:20                   ` Jan Beulich
2018-10-26 11:24                     ` George Dunlap
2018-10-26 11:33                       ` Jan Beulich
2018-10-26 11:43                         ` George Dunlap
2018-10-26 11:45                           ` Jan Beulich
2018-12-11 18:05                     ` Wei Liu
     [not found]                       ` <FB70ABC00200007CA293CED3@prv1-mh.provo.novell.com>
2018-12-12  8:32                         ` Jan Beulich
2018-10-24 15:24 ` Tamas K Lengyel
2018-10-25 16:01   ` Dario Faggioli
2018-10-25 16:25     ` Tamas K Lengyel
2018-10-25 17:23       ` Dario Faggioli
2018-10-25 17:29         ` Tamas K Lengyel
2018-10-26  7:31           ` Dario Faggioli
2018-10-25 16:55   ` Andrew Cooper
2018-10-25 17:01     ` George Dunlap
2018-10-25 17:35       ` Tamas K Lengyel
2018-10-25 17:43         ` Andrew Cooper
2018-10-25 17:58           ` Tamas K Lengyel
2018-10-25 18:13             ` Andrew Cooper
2018-10-25 18:35               ` Tamas K Lengyel
2018-10-25 18:39                 ` Andrew Cooper
2018-10-26  7:49                 ` Dario Faggioli
2018-10-26 12:01                   ` Tamas K Lengyel
2018-10-26 14:17                     ` Dario Faggioli
2018-10-26 10:11               ` George Dunlap
2018-12-07 18:40 ` Wei Liu
2018-12-10 12:12   ` George Dunlap
2018-12-10 12:19     ` George Dunlap
2019-01-24 11:44 ` Reducing or removing direct map from xen (was Re: Ongoing/future speculative mitigation work) Wei Liu
2019-01-24 16:00   ` George Dunlap
2019-02-07 16:50   ` Wei Liu
2019-02-20 12:29   ` Wei Liu
2019-02-20 13:00     ` Roger Pau Monné
2019-02-20 13:09       ` Wei Liu
2019-02-20 17:08         ` Wei Liu
2019-02-21  9:59           ` Roger Pau Monné
2019-02-21 17:51             ` Wei Liu
2019-02-22 11:48           ` Jan Beulich
2019-02-22 11:50             ` Wei Liu
2019-02-22 12:06               ` Jan Beulich
2019-02-22 12:11                 ` Wei Liu
2019-02-22 12:47                   ` Jan Beulich
2019-02-22 13:19                     ` Wei Liu
     [not found]                       ` <158783E402000088A293CED3@prv1-mh.provo.novell.com>
2019-02-22 13:24                         ` Jan Beulich
2019-02-22 13:27                           ` Jan Beulich

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.