Re: Ongoing/future speculative mitigation work

From: George Dunlap <george.dunlap@citrix.com>
To: Andrew Cooper <andrew.cooper3@citrix.com>,
	Tamas K Lengyel <tamas.k.lengyel@gmail.com>
Cc: mpohlack@amazon.de, "Julien Grall" <julien.grall@arm.com>,
	"Jan Beulich" <JBeulich@suse.com>,
	joao.m.martins@oracle.com,
	"Stefano Stabellini" <sstabellini@kernel.org>,
	"Daniel Kiper" <daniel.kiper@oracle.com>,
	"Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>,
	aliguori@amazon.com, uwed@amazon.de,
	"Lars Kurth" <lars.kurth@citrix.com>,
	"Konrad Rzeszutek Wilk" <konrad.wilk@oracle.com>,
	ross.philipson@oracle.com, "Dario Faggioli" <dfaggioli@suse.com>,
	"Matt Wilson" <msw@amazon.com>,
	"Boris Ostrovsky" <boris.ostrovsky@oracle.com>,
	JGross@suse.com, sergey.dyasli@citrix.com,
	"Wei Liu" <wei.liu2@citrix.com>,
	"George Dunlap" <george.dunlap@eu.citrix.com>,
	Xen-devel <xen-devel@lists.xen.org>,
	mdontu <mdontu@bitdefender.com>,
	dwmw@amazon.co.uk, "Roger Pau Monné" <roger.pau@citrix.com>
Subject: Re: Ongoing/future speculative mitigation work
Date: Fri, 26 Oct 2018 11:11:18 +0100
Message-ID: <59431d5d-0a11-20e5-740a-8566aa846f47@citrix.com>
In-Reply-To: <5c251189-cfb7-a36a-5a33-fd661771c9c0@citrix.com>

On 10/25/2018 07:13 PM, Andrew Cooper wrote:
> On 25/10/18 18:58, Tamas K Lengyel wrote:
>> On Thu, Oct 25, 2018 at 11:43 AM Andrew Cooper
>> <andrew.cooper3@citrix.com> wrote:
>>> On 25/10/18 18:35, Tamas K Lengyel wrote:
>>>> On Thu, Oct 25, 2018 at 11:02 AM George Dunlap <george.dunlap@citrix.com> wrote:
>>>>> On 10/25/2018 05:55 PM, Andrew Cooper wrote:
>>>>>> On 24/10/18 16:24, Tamas K Lengyel wrote:
>>>>>>>> A solution to this issue was proposed, whereby Xen synchronises siblings
>>>>>>>> on vmexit/entry, so we are never executing code in two different
>>>>>>>> privilege levels.  Getting this working would make it safe to continue
>>>>>>>> using hyperthreading even in the presence of L1TF.  Obviously, it's going
>>>>>>>> to come with a perf hit, but compared to disabling hyperthreading, all it's
>>>>>>>> got to do is beat a 60% perf hit to make it the preferable option for
>>>>>>>> making your system L1TF-proof.
>>>>>>> Could you shed some light on what tests were done where that 60%
>>>>>>> performance hit was observed? We have performed intensive stress-tests
>>>>>>> to confirm this but according to our findings turning off
>>>>>>> hyper-threading is actually improving performance on all machines we
>>>>>>> tested thus far.
>>>>>> Aggregate inter and intra host disk and network throughput, which is a
>>>>>> reasonable approximation of a load of webserver VMs on a single
>>>>>> physical server.  Small packet IO was hit worst, as it has a very high
>>>>>> vcpu context switch rate between dom0 and domU.  Disabling HT means you
>>>>>> have half the number of logical cores to schedule on, which doubles the
>>>>>> mean time to next timeslice.
>>>>>>
>>>>>> In principle, for a fully optimised workload, HT gets you ~30% extra due
>>>>>> to increased utilisation of the pipeline functional units.  Some
>>>>>> resources are statically partitioned, while some are competitively
>>>>>> shared, and it's now been well proven that actions on one thread can have
>>>>>> a large effect on others.
>>>>>>
>>>>>> Two arbitrary vcpus are not an optimised workload.  If the perf
>>>>>> improvement you get from not competing in the pipeline is greater than
>>>>>> the perf loss from Xen's reduced capability to schedule, then disabling
>>>>>> HT would be an improvement.  I can certainly believe that this might be
>>>>>> the case for Qubes style workloads where you are probably not very
>>>>>> overprovisioned, and you probably don't have long running IO and CPU
>>>>>> bound tasks in the VMs.
>>>>> As another data point, I think it was MSCI who said they always disabled
>>>>> hyperthreading, because they also found that their workloads ran slower
>>>>> with HT than without.  Presumably they were doing massive number
>>>>> crunching, such that each thread was waiting on the ALU a significant
>>>>> portion of the time anyway; at which point the superscalar scheduling
>>>>> and/or reduction in cache efficiency would have brought performance from
>>>>> "no benefit" down to "negative benefit".
>>>>>
>>>> Thanks for the insights. Indeed, we are primarily concerned with
>>>> performance of Qubes-style workloads which may range from
>>>> no-oversubscription to heavily oversubscribed. It's not a workload we
>>>> can predict or optimize before-hand, so we are looking for a default
>>>> that would be 1) safe and 2) performant in the most general case
>>>> possible.
>>> So long as you've got the XSA-273 patches, you should be able to park
>>> and reactivate hyperthreads using `xen-hptool cpu-{online,offline} $CPU`.
>>>
>>> You should be able to effectively change hyperthreading configuration at
>>> runtime.  It's not quite the same as changing it in the BIOS, but from a
>>> competition of pipeline resources, it should be good enough.
>>>
>> Thanks, indeed that is a handy tool to have. We often can't disable
>> hyperthreading in the BIOS anyway because most BIOSes don't allow you
>> to do that when TXT is used.
> 
> Hmm - that's an odd restriction.  I don't immediately see why such a
> restriction would be necessary.
> 
>> That said, with this tool we still
>> require some way to determine when to do parking/reactivation of
>> hyperthreads. We could certainly park hyperthreads when we see the
>> system is being oversubscribed in terms of number of vCPUs being
>> active, but for real optimization we would have to understand the
>> workloads running within the VMs if I understand correctly?
> 
> TBH, I'd perhaps start with an admin control which lets them switch
> between the two modes, and some instructions on how/why they might want
> to try switching.
> 
> Trying to second-guess the best HT setting automatically is most likely
> going to be a lost cause.  It will be system specific as to whether the
> same workload is better with or without HT.

There may be hardware-specific performance counters that could be used
to detect when pathological cases are happening.  But that would need to
be implemented and/or re-verified on basically every new piece of hardware.
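
For the simpler admin-control approach, here is a rough sketch of how a
tool might decide which threads to park.  It assumes the topology comes
from something like `xenpm get-cpu-topology`, printed as
"CPUn core socket node" rows; that column layout and the SAMPLE text are
assumptions for illustration, and the helper names are made up.  It only
builds the `xen-hptool cpu-offline` command lines and does not run them.

```python
from collections import defaultdict

# Hypothetical topology text.  The "CPUn core socket node" layout is an
# assumption about what a topology listing looks like, not verified output.
SAMPLE = """\
CPU     core    socket  node
CPU0     0       0       0
CPU1     0       0       0
CPU2     1       0       0
CPU3     1       0       0
"""


def siblings_to_park(topology_text):
    """Return every CPU number except the lowest-numbered thread of each core."""
    threads = defaultdict(list)
    for line in topology_text.splitlines():
        fields = line.split()
        # Keep only rows such as "CPU3 1 0 0"; skip the header and blank lines.
        if len(fields) < 3 or not fields[0].startswith("CPU"):
            continue
        if not fields[0][3:].isdigit():
            continue
        cpu = int(fields[0][3:])
        core, socket = int(fields[1]), int(fields[2])
        threads[(socket, core)].append(cpu)

    parked = []
    for cpus in threads.values():
        # Leave one thread per core online and park the rest.
        parked.extend(sorted(cpus)[1:])
    return sorted(parked)


def hptool_commands(cpus, action="offline"):
    """Build xen-hptool command lines for the given CPUs (nothing is executed)."""
    if action not in ("online", "offline"):
        raise ValueError("action must be 'online' or 'offline'")
    return [f"xen-hptool cpu-{action} {cpu}" for cpu in cpus]


if __name__ == "__main__":
    for cmd in hptool_commands(siblings_to_park(SAMPLE)):
        print(cmd)
```

Passing action="online" for the same CPU list gives the commands to bring
the siblings back, which is all an admin toggle between the two modes
would need.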

 -George
