linux-kernel.vger.kernel.org archive mirror
From: Ankur Arora <ankur.a.arora@oracle.com>
To: Wanpeng Li <kernellwp@gmail.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	kvm-devel <kvm@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Bandan Das <bsd@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH] sched: introduce configurable delay before entering idle
Date: Thu, 16 May 2019 19:06:30 -0700
Message-ID: <265675b1-07e2-f5dd-6de8-5e47fa91be32@oracle.com>
In-Reply-To: <CANRm+CyrLneGkOXzEmGyB-Sr+DOqqDAF4eNB1YBpbhm3Edo3Gw@mail.gmail.com>

On 2019-05-15 6:07 p.m., Wanpeng Li wrote:
> On Thu, 16 May 2019 at 02:42, Ankur Arora <ankur.a.arora@oracle.com> wrote:
>>
>> On 5/14/19 6:50 AM, Marcelo Tosatti wrote:
>>> On Mon, May 13, 2019 at 05:20:37PM +0800, Wanpeng Li wrote:
>>>> On Wed, 8 May 2019 at 02:57, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>>>>>
>>>>>
>>>>> Certain workloads perform poorly on KVM compared to baremetal
>>>>> due to baremetal's ability to perform mwait on NEED_RESCHED
>>>>> bit of task flags (therefore skipping the IPI).
>>>>
>>>> KVM supports exposing mwait to the guest; could that solve this?
>>>>
>>>> Regards,
>>>> Wanpeng Li
>>>
>>> Unfortunately mwait in guest is not feasible (incompatible with multiple
>>> guests). Checking whether a paravirt solution is possible.
>>
>> Hi Marcelo,
>>
>> I was also looking at making MWAIT available to guests in a safe manner:
>> whether through emulation or a PV-MWAIT. My (unsolicited) thoughts
> 
> MWAIT emulation is not simple; here is some research on it:
> https://www.contrib.andrew.cmu.edu/~somlo/OSXKVM/mwait.html
Agreed. I outlined my attempt at that below and came to the
conclusion that we would need a PV-MWAIT.

Ankur

> 
> Regards,
> Wanpeng Li
> 
>> follow.
>>
>> We basically want to handle this sequence:
>>
>>       monitor(monitor_address);
>>       if (*monitor_address == base_value)
>>            mwaitx(max_delay);
>>
>> Emulation seems problematic because, AFAICS, this would happen:
>>
>>       guest                                   hypervisor
>>       =====                                   ====
>>
>>       monitor(monitor_address);
>>           vmexit  ===>                        monitor(monitor_address)
>>       if (*monitor_address == base_value)
>>            mwait();
>>                 vmexit    ====>               mwait()
>>
>> There's a context switch back to the guest in this sequence which seems
>> problematic. Both the AMD and Intel specs list system calls and
>> far calls as events which would lead to the MWAIT being woken up:
>> "Voluntary transitions due to fast system call and far calls (occurring
>> prior to issuing MWAIT but after setting the monitor)".
>>
>>
>> We could do this instead:
>>
>>       guest                                   hypervisor
>>       =====                                   ====
>>
>>       monitor(monitor_address);
>>           vmexit  ===>                        cache monitor_address
>>       if (*monitor_address == base_value)
>>            mwait();
>>                 vmexit    ====>              monitor(monitor_address)
>>                                              mwait()
>>
>> But this would miss the "if (*monitor_address == base_value)" check in
>> the host, which is problematic if *monitor_address changed at the same
>> time as monitor was executed.
>> (Similar problem if we cache both the monitor_address and
>> *monitor_address.)
>>
>>
>> So, AFAICS, the only thing that would work is the guest offloading the
>> whole PV-MWAIT operation.
>>
>> AFAICS, that could be a paravirt operation which needs three parameters:
>> (monitor_address, base_value, max_delay.)
>>
>> This would allow the guest to offload this whole operation to
>> the host:
>>       monitor(monitor_address);
>>       if (*monitor_address == base_value)
>>            mwaitx(max_delay);
>>
>> I'm guessing you are thinking along similar lines?
>>
>>
>> High level semantics: If the CPU doesn't have any runnable threads, then
>> we actually do this version of PV-MWAIT -- arming a timer if necessary
>> so we only sleep until the time-slice expires or the MWAIT max_delay does.
>>
>> If the CPU has any runnable threads, then the vCPU could still finish its
>> time quantum or we could just schedule it out.
>>
>>
>> So the semantics guaranteed to the host would be that PV-MWAIT returns
>> after >= max_delay OR with the *monitor_address changed.
>>
>>
>>
>> Ankur
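
The three-parameter operation proposed above (monitor_address, base_value,
max_delay) could be modelled roughly as follows. This is only a userspace
sketch of the intended semantics, not an existing KVM interface: the names
pv_mwait, pv_mwait_result and now_ns are hypothetical, max_delay is assumed
to be in nanoseconds, and a polling loop stands in for the real
MONITOR/MWAITX pair. The point it illustrates is that the value check and
the wait happen on the same side, so a store cannot slip in between them.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Why pv_mwait() returned. Hypothetical names, not an existing ABI. */
enum pv_mwait_result {
    PV_MWAIT_CHANGED,  /* *monitor_address no longer equals base_value */
    PV_MWAIT_TIMEOUT   /* max_delay_ns elapsed without a change */
};

/* Monotonic clock in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Model of the offloaded operation:
 *
 *      monitor(monitor_address);
 *      if (*monitor_address == base_value)
 *           mwaitx(max_delay);
 *
 * Polling is an assumption made for this sketch; a host implementation
 * would arm the monitor and wait instead of spinning.
 */
enum pv_mwait_result pv_mwait(_Atomic uint64_t *monitor_address,
                              uint64_t base_value, uint64_t max_delay_ns)
{
    uint64_t start;

    assert(monitor_address != NULL);
    start = now_ns();

    for (;;) {
        /* Re-check the value on every pass so a change is never missed. */
        if (atomic_load_explicit(monitor_address,
                                 memory_order_acquire) != base_value)
            return PV_MWAIT_CHANGED;

        /* Give up once the caller's delay budget is used up. */
        if (now_ns() - start >= max_delay_ns)
            return PV_MWAIT_TIMEOUT;
    }
}
```

A caller would pass the address it is waiting on together with the value it
last observed there; if the location already holds something else, the call
returns immediately with PV_MWAIT_CHANGED instead of going to sleep.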

