Re: [RFC][PATCH 00/16] sched: Core scheduling

From: Tim Chen <tim.c.chen@linux.intel.com>
To: Peter Zijlstra <peterz@infradead.org>,
	Paolo Bonzini <pbonzini@redhat.com>
Cc: "Linus Torvalds" <torvalds@linux-foundation.org>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Paul Turner" <pjt@google.com>,
	"Linux List Kernel Mailing" <linux-kernel@vger.kernel.org>,
	subhra.mazumdar@oracle.com,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Kees Cook" <keescook@chromium.org>,
	kerrnel@google.com
Subject: Re: [RFC][PATCH 00/16] sched: Core scheduling
Date: Fri, 22 Feb 2019 11:26:56 -0800	[thread overview]
Message-ID: <786668c1-fb52-508c-e916-f86707a1d791@linux.intel.com> (raw)
In-Reply-To: <20190222142030.GA32494@hirez.programming.kicks-ass.net>

On 2/22/19 6:20 AM, Peter Zijlstra wrote:
> On Fri, Feb 22, 2019 at 01:17:01PM +0100, Paolo Bonzini wrote:
>> On 18/02/19 21:40, Peter Zijlstra wrote:
>>> On Mon, Feb 18, 2019 at 09:49:10AM -0800, Linus Torvalds wrote:
>>>> On Mon, Feb 18, 2019 at 9:40 AM Peter Zijlstra <peterz@infradead.org> wrote:
>>>>>
>>>>> However; whichever way around you turn this cookie; it is expensive and nasty.
>>>>
>>>> Do you (or anybody else) have numbers for real loads?
>>>>
>>>> Because performance is all that matters. If performance is bad, then
>>>> it's pointless, since just turning off SMT is the answer.
>>>
>>> Not for these patches; they stopped crashing only yesterday and I
>>> cleaned them up and send them out.
>>>
>>> The previous version; which was more horrible; but L1TF complete, was
>>> between OK-ish and horrible depending on the number of VMEXITs a
>>> workload had.
>>>
>>> If there were close to no VMEXITs, it beat smt=off, if there were lots
>>> of VMEXITs it was far far worse. Supposedly hosting people try their
>>> very bestest to have no VMEXITs so it mostly works for them (with the
>>> obvious exception of single VCPU guests).
>>
>> If you are giving access to dedicated cores to guests, you also let them
>> do PAUSE/HLT/MWAIT without vmexits and the host just thinks it's a CPU
>> bound workload.
>>
>> In any case, IIUC what you are looking for is:
>>
>> 1) take a benchmark that *is* helped by SMT, this will be something CPU
>> bound.
>>
>> 2) compare two runs, one without SMT and without core scheduler, and one
>> with SMT+core scheduler.
>>
>> 3) find out whether performance is helped by SMT despite the increased
>> overhead of the core scheduler
>>
>> Do you want some other load in the host, so that the scheduler actually
>> does do something?  Or is the point just that you show that the
>> performance isn't affected when the scheduler does not have anything to
>> do (which should be obvious, but having numbers is always better)?
> 
> Well, what _I_ want is for all this to just go away :-)
> 
> Tim did much of testing last time around; and I don't think he did
> core-pinning of VMs much (although I'm sure he did some of that). I'm

Yes. The last time around I tested basic scenarios like:
1. single VM pinned on a core
2. 2 VMs pinned on a core
3. system oversubscription (no pinning)

In general, CPU bound benchmarks and even things without too much I/O
causing lots of VMexits perform better with HT than without for Peter's
last patchset.

> still a complete virt noob; I can barely boot a VM to save my life.
> 
> (you should be glad to not have heard my cursing at qemu cmdline when
> trying to reproduce some of Tim's results -- lets just say that I can
> deal with gpg)
> 
> I'm sure he tried some oversubscribed scenarios without pinning. 

We did try some oversubscribed scenarios like SPECVirt, that tried to
squeeze tons of VMs on a single system in over subscription mode.

There're two main problems in the last go around:

1. Workload with high rate of Vmexits (SpecVirt is one) 
were a major source of pain when we tried Peter's previous patchset.
The switch from vcpus to qemu and back in previous version of Peter's patch
requires some coordination between the hyperthread siblings via IPI.  And for
workload that does this a lot, the overhead quickly added up.

For Peter's new patch, this overhead hopefully would be reduced and give
better performance.

2. Load balancing is quite tricky.  Peter's last patchset did not have
load balancing for consolidating compatible running threads.
I did some non-sophisticated load balancing
to pair vcpus up.  But the constant vcpu migrations overhead probably ate up
any improvements from better load pairing.  So I didn't get much
improvement in the over-subscription case when turning on load balancing
to consolidate the VCPUs of the same VM. We'll probably have to try
out this incarnation of Peter's patch and see how well the load balancing
works.

I'll try to line up some benchmarking folks to do some tests.

Tim