From: Wanpeng Li <kernellwp@gmail.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>,
	LKML <linux-kernel@vger.kernel.org>, kvm <kvm@vger.kernel.org>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>
Subject: Re: [PATCH] KVM: Boost vCPU candidate in user mode which is delivering interrupt
Date: Tue, 20 Apr 2021 16:48:34 +0800
Message-ID: <CANRm+Czysw6z1u+fbsRF3JUyiJc0jErVATusar_Vj8CcSBy5LQ@mail.gmail.com>
In-Reply-To: <b2fca9a5-9b2b-b8f2-0d1e-fc8b9d9b5659@redhat.com>

On Tue, 20 Apr 2021 at 15:23, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 20/04/21 08:08, Wanpeng Li wrote:
> > On Tue, 20 Apr 2021 at 14:02, Wanpeng Li <kernellwp@gmail.com> wrote:
> >>
> >> On Tue, 20 Apr 2021 at 00:59, Paolo Bonzini <pbonzini@redhat.com> wrote:
> >>>
> >>> On 19/04/21 18:32, Sean Christopherson wrote:
> >>>> If false positives are a big concern, what about adding another pass to the loop
> >>>> and only yielding to usermode vCPUs with interrupts in the second full pass?
> >>>> I.e. give vCPUs that are already in kernel mode priority, and only yield to
> >>>> handle an interrupt if there are no vCPUs in kernel mode.
> >>>>
> >>>> kvm_arch_dy_runnable() pulls in pv_unhalted, which seems like a good thing.
> >>>
> >>> pv_unhalted won't help if you're waiting for a kernel spinlock though,
> >>> would it?  Doing two passes (or looking for a "best" candidate that
> >>> prefers kernel mode vCPUs to user mode vCPUs waiting for an interrupt)
> >>> seems like the best choice overall.
> >>
> >> How about something like this:
>
> I was thinking of something simpler:
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 9b8e30dd5b9b..455c648f9adc 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3198,10 +3198,9 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
>   {
>         struct kvm *kvm = me->kvm;
>         struct kvm_vcpu *vcpu;
> -       int last_boosted_vcpu = me->kvm->last_boosted_vcpu;
>         int yielded = 0;
>         int try = 3;
> -       int pass;
> +       int pass, num_passes = 1;
>         int i;
>
>         kvm_vcpu_set_in_spin_loop(me, true);
> @@ -3212,13 +3211,14 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
>          * VCPU is holding the lock that we need and will release it.
>          * We approximate round-robin by starting at the last boosted VCPU.
>          */
> -       for (pass = 0; pass < 2 && !yielded && try; pass++) {
> -               kvm_for_each_vcpu(i, vcpu, kvm) {
> -                       if (!pass && i <= last_boosted_vcpu) {
> -                               i = last_boosted_vcpu;
> -                               continue;
> -                       } else if (pass && i > last_boosted_vcpu)
> -                               break;
> +       for (pass = 0; pass < num_passes; pass++) {
> +               int idx = me->kvm->last_boosted_vcpu;
> +               int n = atomic_read(&kvm->online_vcpus);
> +               for (i = 0; i < n; i++, idx++) {
> +                       if (idx == n)
> +                               idx = 0;
> +
> +                       vcpu = kvm_get_vcpu(kvm, idx);
>                         if (!READ_ONCE(vcpu->ready))
>                                 continue;
>                         if (vcpu == me)
> @@ -3226,23 +3226,36 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
>                         if (rcuwait_active(&vcpu->wait) &&
>                             !vcpu_dy_runnable(vcpu))
>                                 continue;
> -                       if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
> -                               !kvm_arch_vcpu_in_kernel(vcpu))
> -                               continue;
>                         if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
>                                 continue;
>
> +                       if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
> +                           !kvm_arch_vcpu_in_kernel(vcpu)) {
> +                           /*
> +                            * A vCPU running in userspace can get to kernel mode via
> +                            * an interrupt.  That's a worse choice than a CPU already
> +                            * in kernel mode so only do it on a second pass.
> +                            */
> +                           if (!vcpu_dy_runnable(vcpu))
> +                                   continue;
> +                           if (pass == 0) {
> +                                   num_passes = 2;
> +                                   continue;
> +                           }
> +                       }
> +
>                         yielded = kvm_vcpu_yield_to(vcpu);
>                         if (yielded > 0) {
>                                 kvm->last_boosted_vcpu = i;
> -                               break;
> +                               goto done;
>                         } else if (yielded < 0) {
>                                 try--;
>                                 if (!try)
> -                                       break;
> +                                       goto done;
>                         }
>                 }
>         }
> +done:

We just tested the above patch against a 96-vCPU VM in an over-subscribed
scenario, and the pbzip2 score fluctuated drastically. Sometimes it is
worse than vanilla, but the average improvement is around 2.2%. The new
version of my patch gets around 9.3%, and the originally posted patch
around 10%, which is entirely as expected, since the above patch makes
both user-mode IPI receivers and lock waiters second-class citizens. A
big VM increases the probability that multiple vCPUs enter the PLE
handler at the same time: the vCPU that starts searching earlier can mark
a user-mode IPI receiver as dy_eligible, and a vCPU that starts searching
a little later can then select it directly. With the above patch, however,
each PLE-exiting vCPU has to do the second full pass by itself.
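
For reference, the single-pass variant being compared here keeps the
existing loop and only widens the candidate check so that a preempted
user-mode vCPU with a pending interrupt is not skipped. Roughly (a sketch
for illustration; the helper name kvm_arch_dy_has_pending_interrupt() is
assumed here rather than quoted from the posted patch):

    /*
     * Prefer vCPUs already in kernel mode, but keep a preempted
     * user-mode vCPU as a candidate when it has an interrupt to
     * deliver, so a single pass over the vCPUs is enough.
     */
    if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
        !kvm_arch_dy_has_pending_interrupt(vcpu) &&
        !kvm_arch_vcpu_in_kernel(vcpu))
            continue;
    if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
            continue;

    yielded = kvm_vcpu_yield_to(vcpu);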

    Wanpeng
