Re: [PATCH] cpuidle-haltpoll: Enable kvm guest polling when dedicated physical CPUs are available

From: Wanpeng Li <kernellwp@gmail.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: "Paolo Bonzini" <pbonzini@redhat.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	LKML <linux-kernel@vger.kernel.org>, kvm <kvm@vger.kernel.org>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"Linux PM" <linux-pm@vger.kernel.org>,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [PATCH] cpuidle-haltpoll: Enable kvm guest polling when dedicated physical CPUs are available
Date: Tue, 27 Aug 2019 08:43:13 +0800	[thread overview]
Message-ID: <CANRm+Cx0+V67Ek7FhSs61ZqZL3MgV88Wdy17Q6UA369RH7=dgQ@mail.gmail.com> (raw)
In-Reply-To: <20190826204045.GA24697@amt.cnet>

Cc Michael S. Tsirkin,
On Tue, 27 Aug 2019 at 04:42, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>
> On Tue, Aug 13, 2019 at 08:55:29AM +0800, Wanpeng Li wrote:
> > On Sun, 4 Aug 2019 at 04:21, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > >
> > > On Thu, Aug 01, 2019 at 06:54:49PM +0200, Paolo Bonzini wrote:
> > > > On 01/08/19 18:51, Rafael J. Wysocki wrote:
> > > > > On 8/1/2019 9:06 AM, Wanpeng Li wrote:
> > > > >> From: Wanpeng Li <wanpengli@tencent.com>
> > > > >>
> > > > >> The downside of guest side polling is that polling is performed even
> > > > >> with other runnable tasks in the host. However, even if poll in kvm
> > > > >> can aware whether or not other runnable tasks in the same pCPU, it
> > > > >> can still incur extra overhead in over-subscribe scenario. Now we can
> > > > >> just enable guest polling when dedicated pCPUs are available.
> > > > >>
> > > > >> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > >> Cc: Paolo Bonzini <pbonzini@redhat.com>
> > > > >> Cc: Radim Krčmář <rkrcmar@redhat.com>
> > > > >> Cc: Marcelo Tosatti <mtosatti@redhat.com>
> > > > >> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
> > > > >
> > > > > Paolo, Marcelo, any comments?
> > > >
> > > > Yes, it's a good idea.
> > > >
> > > > Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> > > >
> > > > Paolo
> > >
> >
> > Hi Marcelo,
> >
> > Sorry for the late response.
> >
> > > I think KVM_HINTS_REALTIME is being abused somewhat.
> > > It has no clear meaning and used in different locations
> > > for different purposes.
> >
> > ================== ============ =================================
> > KVM_HINTS_REALTIME 0                      guest checks this feature bit to
> >
> > determine that vCPUs are never
> >
> > preempted for an unlimited time
>
> Unlimited time means infinite time, or unlimited time means
> 10s ? 1s ?

The former one I think. There is a discussion here
https://lkml.org/lkml/2018/5/17/612

>
> The previous definition was much better IMO: HINTS_DEDICATED.
>
>
> > allowing optimizations
> > ================== ============ =================================
> >
> > Now it disables pv queued spinlock,
>
> OK.
>
> > pv tlb shootdown,
>
> OK.
>
> > pv sched yield
>
> "The idea is from Xen, when sending a call-function IPI-many to vCPUs,
> yield if any of the IPI target vCPUs was preempted. 17% performance
> increasement of ebizzy benchmark can be observed in an over-subscribe
> environment. (w/ kvm-pv-tlb disabled, testing TLB flush call-function
> IPI-many since call-function is not easy to be trigged by userspace
> workload)."
>
> This can probably hurt if vcpus are rarely preempted.

That's why I add the KVM_HINTS_REALTIME checking here.

>
> > which are not expected present in vCPUs are never preempted for an
> > unlimited time scenario.
> >
> > >
> > > For example, i think that using pv queued spinlocks and
> > > haltpoll is a desired scenario, which the patch below disallows.
> >
> > So even if dedicated pCPU is available, pv queued spinlocks should
> > still be chose if something like vhost-kthreads are used instead of
> > DPDK/vhost-user.
>
> Can't you enable the individual features you need for optimizing
> the overcommitted case? This is how things have been done historically:
> If a new feature is available, you enable it to get the desired
> performance. x2apic, invariant-tsc, cpuidle haltpoll...
>
> So in your case: enable pv schedyield, enable pv tlb shootdown.

Both of them are used to optimize function-call IPIs. pv sched yield
for call function interrupts, and pv tlb shootdown for tlb
invalidation. So still different here. Our latest testing against an
80 pCPUs host, and three 80 vCPUs VMs, the number is more better than
64 pCPUs host which I used when posting patches:

ebizzy -M
              vanilla    optimized     boost
1VM            31234       34489        10%
2VM             5380       26664       396%
3VM             2967       23140       679%

>
> > kvm adaptive halt-polling will compete with
> > vhost-kthreads, however, poll in guest unaware other runnable tasks in
> > the host which will defeat vhost-kthreads.
>
> It depends on how much work vhost-kthreads needs to do, how successful
> halt-poll in the guest is, and what improvement halt-polling brings.
> The amount of polling will be reduced to zero if polling
> is not successful.

We observe vhost-kthreads compete with vCPUs adaptive halt-polling in
kvm, it hurt performance in over-subscribe product environment,
polling in guest can make it worse.

Regards,
Wanpeng Li