stable.vger.kernel.org archive mirror
From: Waiman Long <longman@redhat.com>
To: Wanpeng Li <kernellwp@gmail.com>
Cc: LKML <linux-kernel@vger.kernel.org>, kvm <kvm@vger.kernel.org>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Radim Krčmář" <rkrcmar@redhat.com>,
	"Sean Christopherson" <sean.j.christopherson@intel.com>,
	"Vitaly Kuznetsov" <vkuznets@redhat.com>,
	"Wanpeng Li" <wanpengli@tencent.com>,
	"Jim Mattson" <jmattson@google.com>,
	"Joerg Roedel" <joro@8bytes.org>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Ingo Molnar" <mingo@kernel.org>,
	loobinliu@tencent.com, "# v3 . 10+" <stable@vger.kernel.org>
Subject: Re: [PATCH] Revert "locking/pvqspinlock: Don't wait if vCPU is preempted"
Date: Wed, 11 Sep 2019 05:25:53 +0100
Message-ID: <2dda32db-5662-f7a6-f52d-b835df1f45f1@redhat.com>
In-Reply-To: <CANRm+CxVXsQCmEpxNJSifmQJk5cqoSifFq+huHJE1s7a-=0iXw@mail.gmail.com>

On 9/10/19 6:56 AM, Wanpeng Li wrote:
> On Mon, 9 Sep 2019 at 18:56, Waiman Long <longman@redhat.com> wrote:
>> On 9/9/19 2:40 AM, Wanpeng Li wrote:
>>> From: Wanpeng Li <wanpengli@tencent.com>
>>>
>>> This patch reverts commit 75437bb304b20 (locking/pvqspinlock: Don't wait if
>>> vCPU is preempted). We found a large regression caused by this commit.
>>>
>>> Xeon Skylake box, 2 sockets, 40 cores, 80 threads, three VMs with 80 vCPUs
>>> each. The score of ebizzy -M drops from 13000-14000 records/s to 1700-1800
>>> records/s with this commit.
>>>
>>>           Host                       Guest                score
>>>
>>> vanilla + w/o kvm optimizes     vanilla               1700-1800 records/s
>>> vanilla + w/o kvm optimizes     vanilla + revert      13000-14000 records/s
>>> vanilla + w/ kvm optimizes      vanilla               4500-5000 records/s
>>> vanilla + w/ kvm optimizes      vanilla + revert      14000-15500 records/s
>>>
>>> The aggressive wait-early exit can result in premature yielding and
>>> incur extra scheduling latency in over-subscribed scenarios.
>>>
>>> kvm optimizes:
>>> [1] commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts)
>>> [2] commit 266e85a5ec9 (KVM: X86: Boost queue head vCPU to mitigate lock waiter preemption)
>>>
>>> Tested-by: loobinliu@tencent.com
>>> Cc: Peter Zijlstra <peterz@infradead.org>
>>> Cc: Thomas Gleixner <tglx@linutronix.de>
>>> Cc: Ingo Molnar <mingo@kernel.org>
>>> Cc: Waiman Long <longman@redhat.com>
>>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>>> Cc: loobinliu@tencent.com
>>> Cc: stable@vger.kernel.org
>>> Fixes: 75437bb304b20 (locking/pvqspinlock: Don't wait if vCPU is preempted)
>>> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
>>> ---
>>>  kernel/locking/qspinlock_paravirt.h | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
>>> index 89bab07..e84d21a 100644
>>> --- a/kernel/locking/qspinlock_paravirt.h
>>> +++ b/kernel/locking/qspinlock_paravirt.h
>>> @@ -269,7 +269,7 @@ pv_wait_early(struct pv_node *prev, int loop)
>>>       if ((loop & PV_PREV_CHECK_MASK) != 0)
>>>               return false;
>>>
>>> -     return READ_ONCE(prev->state) != vcpu_running || vcpu_is_preempted(prev->cpu);
>>> +     return READ_ONCE(prev->state) != vcpu_running;
>>>  }
>>>
>>>  /*
>> There are several possibilities for this performance regression:
>>
>> 1) Multiple vcpus calling vcpu_is_preempted() repeatedly may cause some
>> cacheline contention issue depending on how that callback is implemented.
>>
>> 2) KVM may set the preempt flag for a short period whenever a vmexit
>> happens even if a vmenter is executed shortly after. In this case, we
>> may want to use a more durable vcpu suspend flag that indicates the vcpu
>> won't get a real CPU back for a longer period of time.
>>
>> Perhaps you can add a lock event counter to count the number of
>> wait_early events caused by vcpu_is_preempted() being true to see if it
>> really causes a lot more wait_early than without the vcpu_is_preempted()
>> call.
> pv_wait_again:1:179
> pv_wait_early:1:189429
> pv_wait_head:1:263
> pv_wait_node:1:189429
> pv_vcpu_is_preempted:1:45588
> =========sleep 5============
> pv_wait_again:1:181
> pv_wait_early:1:202574
> pv_wait_head:1:267
> pv_wait_node:1:202590
> pv_vcpu_is_preempted:1:46336
>
> The sampling period is 5s; about 6% of wait_early events were caused by
> vcpu_is_preempted() being true.

6% isn't that high. However, when one vCPU voluntarily releases its
CPU, all the subsequent waiters in the queue will do the same. It is
a cascading effect. Perhaps we wait early too aggressively with the
original patch.

I also looked up the email chain of the original commit. The patch
submitter did not provide any performance data to support this change.
The patch just looked reasonable at that time, so there was no
objection. Given that we now have hard evidence that this was not a good
idea, I think we should revert it.

Reviewed-by: Waiman Long <longman@redhat.com>

Thanks,
Longman



Thread overview: 8+ messages
2019-09-09  1:40 [PATCH] Revert "locking/pvqspinlock: Don't wait if vCPU is preempted" Wanpeng Li
2019-09-09 10:56 ` Waiman Long
2019-09-09 11:06   ` Paolo Bonzini
2019-09-09 12:16     ` Wanpeng Li
2019-09-10  5:56   ` Wanpeng Li
2019-09-11  4:25     ` Waiman Long [this message]
2019-09-11 13:04       ` Paolo Bonzini
2019-09-25  3:15         ` Wanpeng Li
