All of lore.kernel.org
 help / color / mirror / Atom feed
From: Wanpeng Li <kernellwp@gmail.com>
To: Ankur Arora <ankur.a.arora@oracle.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	kvm-devel <kvm@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Bandan Das <bsd@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH] sched: introduce configurable delay before entering idle
Date: Thu, 16 May 2019 09:07:32 +0800	[thread overview]
Message-ID: <CANRm+CyrLneGkOXzEmGyB-Sr+DOqqDAF4eNB1YBpbhm3Edo3Gw@mail.gmail.com> (raw)
In-Reply-To: <7e390fef-e0df-963f-4e18-e44ac2766be3@oracle.com>

On Thu, 16 May 2019 at 02:42, Ankur Arora <ankur.a.arora@oracle.com> wrote:
>
> On 5/14/19 6:50 AM, Marcelo Tosatti wrote:
> > On Mon, May 13, 2019 at 05:20:37PM +0800, Wanpeng Li wrote:
> >> On Wed, 8 May 2019 at 02:57, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> >>>
> >>>
> >>> Certain workloads perform poorly on KVM compared to baremetal
> >>> due to baremetal's ability to perform mwait on NEED_RESCHED
> >>> bit of task flags (therefore skipping the IPI).
> >>
> >> KVM supports expose mwait to the guest, if it can solve this?
> >>
> >> Regards,
> >> Wanpeng Li
> >
> > Unfortunately mwait in guest is not feasible (uncompatible with multiple
> > guests). Checking whether a paravirt solution is possible.
>
> Hi Marcelo,
>
> I was also looking at making MWAIT available to guests in a safe manner:
> whether through emulation or a PV-MWAIT. My (unsolicited) thoughts

MWAIT emulation is not simple, here is a research
https://www.contrib.andrew.cmu.edu/~somlo/OSXKVM/mwait.html

Regards,
Wanpeng Li

> follow.
>
> We basically want to handle this sequence:
>
>      monitor(monitor_address);
>      if (*monitor_address == base_value)
>           mwaitx(max_delay);
>
> Emulation seems problematic because, AFAICS this would happen:
>
>      guest                                   hypervisor
>      =====                                   ====
>
>      monitor(monitor_address);
>          vmexit  ===>                        monitor(monitor_address)
>      if (*monitor_address == base_value)
>           mwait();
>                vmexit    ====>               mwait()
>
> There's a context switch back to the guest in this sequence which seems
> problematic. Both the AMD and Intel specs list system calls and
> far calls as events which would lead to the MWAIT being woken up:
> "Voluntary transitions due to fast system call and far calls (occurring
> prior to issuing MWAIT but after setting the monitor)".
>
>
> We could do this instead:
>
>      guest                                   hypervisor
>      =====                                   ====
>
>      monitor(monitor_address);
>          vmexit  ===>                        cache monitor_address
>      if (*monitor_address == base_value)
>           mwait();
>                vmexit    ====>              monitor(monitor_address)
>                                             mwait()
>
> But, this would miss the "if (*monitor_address == base_value)" check in
> the host which is problematic if *monitor_address changed simultaneously
> when monitor was executed.
> (Similar problem if we cache both the monitor_address and
> *monitor_address.)
>
>
> So, AFAICS, the only thing that would work is the guest offloading the
> whole PV-MWAIT operation.
>
> AFAICS, that could be a paravirt operation which needs three parameters:
> (monitor_address, base_value, max_delay.)
>
> This would allow the guest to offload this whole operation to
> the host:
>      monitor(monitor_address);
>      if (*monitor_address == base_value)
>           mwaitx(max_delay);
>
> I'm guessing you are thinking on similar lines?
>
>
> High level semantics: If the CPU doesn't have any runnable threads, then
> we actually do this version of PV-MWAIT -- arming a timer if necessary
> so we only sleep until the time-slice expires or the MWAIT max_delay does.
>
> If the CPU has any runnable threads then this could still finish its
> time-quanta or we could just do a schedule-out.
>
>
> So the semantics guaranteed to the host would be that PV-MWAIT returns
> after >= max_delay OR with the *monitor_address changed.
>
>
>
> Ankur

  parent reply	other threads:[~2019-05-16  1:47 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-07 18:56 [PATCH] sched: introduce configurable delay before entering idle Marcelo Tosatti
2019-05-07 22:15 ` Peter Zijlstra
2019-05-07 23:44   ` Marcelo Tosatti
2019-05-13  9:20 ` Wanpeng Li
2019-05-13 11:31   ` Konrad Rzeszutek Wilk
2019-05-13 11:51     ` Raslan, KarimAllah
2019-05-13 12:30       ` Boris Ostrovsky
2019-05-15  1:45       ` Wanpeng Li
2019-05-14 13:50   ` Marcelo Tosatti
2019-05-14 15:20     ` Konrad Rzeszutek Wilk
2019-05-14 17:42       ` Marcelo Tosatti
2019-05-15  1:42         ` Wanpeng Li
2019-05-15 20:26           ` Marcelo Tosatti
2019-05-15 18:42     ` Ankur Arora
2019-05-15 20:43       ` Marcelo Tosatti
2019-05-17  4:32         ` Ankur Arora
2019-05-17 17:49           ` Marcelo Tosatti
2019-05-16  1:07       ` Wanpeng Li [this message]
2019-05-17  2:06         ` Ankur Arora

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CANRm+CyrLneGkOXzEmGyB-Sr+DOqqDAF4eNB1YBpbhm3Edo3Gw@mail.gmail.com \
    --to=kernellwp@gmail.com \
    --cc=aarcange@redhat.com \
    --cc=ankur.a.arora@oracle.com \
    --cc=bsd@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.