linux-kernel.vger.kernel.org archive mirror
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Wanpeng Li <kernellwp@gmail.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	kvm-devel <kvm@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Bandan Das <bsd@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH] sched: introduce configurable delay before entering idle
Date: Wed, 15 May 2019 17:26:20 -0300
Message-ID: <20190515202618.GA31128@amt.cnet>
In-Reply-To: <CANRm+CytV7PfS++RnYU0P3HT_QBufrO=bzd6Fx-7dC2=sotvmA@mail.gmail.com>

On Wed, May 15, 2019 at 09:42:48AM +0800, Wanpeng Li wrote:
> On Wed, 15 May 2019 at 02:20, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> >
> > On Tue, May 14, 2019 at 11:20:15AM -0400, Konrad Rzeszutek Wilk wrote:
> > > On Tue, May 14, 2019 at 10:50:23AM -0300, Marcelo Tosatti wrote:
> > > > On Mon, May 13, 2019 at 05:20:37PM +0800, Wanpeng Li wrote:
> > > > > On Wed, 8 May 2019 at 02:57, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > > > > >
> > > > > >
> > > > > > Certain workloads perform poorly on KVM compared to baremetal
> > > > > > due to baremetal's ability to perform mwait on NEED_RESCHED
> > > > > > bit of task flags (therefore skipping the IPI).
> > > > >
> > > > > KVM supports expose mwait to the guest, if it can solve this?
> > > > >
> > > > > Regards,
> > > > > Wanpeng Li
> > > >
> > > > Unfortunately mwait in guest is not feasible (uncompatible with multiple
> > > > guests). Checking whether a paravirt solution is possible.
> > >
> > > There is the obvious problem with that the guest can be malicious and
> > > provide via the paravirt solution bogus data. That is it expose 0% CPU
> > > usage but in reality be mining and using 100%.
> >
> > The idea is to have a hypercall for the guest to perform the
> > need_resched=1 bit set. It can only hurt itself.
> 
> This lets me recall the patchset from aliyun
> https://lkml.org/lkml/2017/6/22/296 

Thanks for the pointer.

"The background is that we(Alibaba Cloud) do get more and more
complaints from our customers in both KVM and Xen compared to bare-metal.
After investigations, the root cause is known to us: big cost in message
passing workload (David showed it at KVM Forum 2015)

A typical message workload like below: 
vcpu 0                             vcpu 1 
1. send ipi                     2.  doing hlt 
3. go into idle                 4.  receive ipi and wake up from hlt 
5. write APIC time twice        6.  write APIC time twice to 
    to stop sched timer              reprogram sched timer 
7. doing hlt                    8.  handle task and send ipi to 
                                     vcpu 0 
9. same to 4.                   10. same to 3"

This is very similar to the client/server example pair 
included in the first message.

 
> They poll after
> __current_set_polling() in do_idle() so avoid this hypercall I think.

Yes, I was thinking about a variant without poll.
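
For reference, the polling idea discussed here (spin on need_resched
for a bounded time before halting, so that a wakeup arriving inside the
window does not need an IPI) can be sketched in user space. This is
only an illustration in Python, not the kernel code: spin_before_idle()
and its idle_spin_ns parameter are hypothetical names, and a
threading.Event stands in for the NEED_RESCHED flag.

```python
import threading
import time


def spin_before_idle(need_resched: threading.Event, idle_spin_ns: int) -> bool:
    """Poll need_resched for up to idle_spin_ns nanoseconds.

    Returns True if the flag was seen set (the caller can skip the halt),
    False if the budget expired and the caller should go idle.
    Hypothetical user-space stand-in, not the kernel's idle loop.
    """
    deadline = time.monotonic_ns() + idle_spin_ns
    while time.monotonic_ns() < deadline:
        if need_resched.is_set():
            return True
    # Budget exhausted: one last check before giving up and halting.
    return need_resched.is_set()
```

A caller would halt only when this returns False. With idle_spin_ns
set to 0 it degenerates to a single check of the flag.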

> Btw, do you get SAP HANA by 5-10% bonus even if adaptive halt-polling
> is enabled?

host                       = 31.18
halt_poll_ns set to 200000 = 38.55	(80%)
halt_poll_ns set to 300000 = 33.28	(93%)
idle_spin set to 220000    = 32.22	(96%)

So avoiding the IPI VM-exits is faster. 

300000 is the optimal halt_poll_ns value for this workload. Haven't
checked adaptive halt-polling.
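
The percentages above appear to be the host result divided by each
guest result, truncated to an integer. That reading is an assumption
(the unit is not stated, and it only works out if lower numbers are
better), but it reproduces the three figures. relative_to_host() is a
hypothetical helper:

```python
def relative_to_host(host: float, guest: float) -> int:
    """Host result as a percentage of the guest result, truncated."""
    return int(host / guest * 100)


host = 31.18
results = {
    "halt_poll_ns=200000": 38.55,
    "halt_poll_ns=300000": 33.28,
    "idle_spin=220000": 32.22,
}

for name, value in results.items():
    print(f"{name}: {value} ({relative_to_host(host, value)}%)")
```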

