linux-kernel.vger.kernel.org archive mirror
* The vcpu won't be wakened for a long time
@ 2021-12-14 13:55 Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
  2021-12-14 17:36 ` Sean Christopherson
  0 siblings, 1 reply; 11+ messages in thread
From: Longpeng (Mike, Cloud Infrastructure Service Product Dept.) @ 2021-12-14 13:55 UTC (permalink / raw)
  To: pbonzini, kvm
  Cc: Gonglei (Arei),
	Huangzhichao, seanjc, Wanpeng Li, Vitaly Kuznetsov, Jim Mattson,
	Joerg Roedel, linux-kernel

Hi guys,

We found a problem in kvm_vcpu_block().

The testcase is:
 - VM configured with 1 vcpu and 1 VF (using vfio-pci passthrough)
 - the vfio interrupt and the vcpu are bound to the same pcpu
 - using remapped mode IRTE, NOT posted mode

The bug was triggered when the vcpu executed the HLT instruction:

kvm_vcpu_block:
    prepare_to_rcuwait(&vcpu->wait);
    for (;;) {
        set_current_state(TASK_INTERRUPTIBLE);

        if (kvm_vcpu_check_block(vcpu) < 0)
            break;
					<------------ (*)
        waited = true;
        schedule();
    }
    finish_rcuwait(&vcpu->wait);

If an interrupt from the VF fires at (*), the PIR and ON bits are set (in
vmx_deliver_posted_interrupt()), but the vcpu still goes to sleep.  Because ON
is already set, subsequent interrupts do not wake the vcpu either.
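
To make the sequence concrete, here is a rough user-space model in C of the
window at (*).  It is only a sketch: every toy_* name is made up for
illustration and none of this is KVM code.  As an assumption, the reason the
first interrupt does not wake the vcpu is modelled by the "target is the
currently running vcpu" check that vmx_deliver_posted_interrupt() has in the
kernel under test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy types for illustration; not real KVM structures. */
enum toy_task_state { TOY_TASK_RUNNING, TOY_TASK_INTERRUPTIBLE };

struct toy_vcpu {
    bool on;                    /* stands in for the ON bit */
    enum toy_task_state state;  /* stands in for the vcpu task state */
    bool asleep;                /* true once schedule() really slept */
};

/* A wake-up puts the task back into the running state. */
static void toy_wake(struct toy_vcpu *v)
{
    v->state = TOY_TASK_RUNNING;
    v->asleep = false;
}

/* Stands in for the delivery path: bail out early if ON was already set. */
static void toy_deliver(struct toy_vcpu *v, bool target_is_running_vcpu)
{
    if (v->on)
        return;                 /* like pi_test_and_set_on() returning true */
    v->on = true;
    if (!target_is_running_vcpu)
        toy_wake(v);            /* assumed: no wake-up for the running vcpu */
}

/* schedule() only sleeps if nobody set the task back to running. */
static void toy_schedule(struct toy_vcpu *v)
{
    if (v->state == TOY_TASK_INTERRUPTIBLE)
        v->asleep = true;
}

/* Replays the reported sequence; returns true if the vcpu is left asleep. */
bool toy_replay_report(int later_irqs)
{
    struct toy_vcpu v = { .on = false, .state = TOY_TASK_RUNNING, .asleep = false };

    v.state = TOY_TASK_INTERRUPTIBLE;   /* set_current_state(TASK_INTERRUPTIBLE) */
    assert(!v.on);                      /* check_block: nothing pending yet */
    toy_deliver(&v, true);              /* VF interrupt fires at (*), same pcpu */
    toy_schedule(&v);                   /* vcpu goes to sleep anyway */

    for (int i = 0; i < later_irqs; i++)
        toy_deliver(&v, false);         /* ON already set: returns early */

    return v.asleep;
}
```

In this model toy_replay_report() returns true no matter how many later
interrupts are replayed, which matches what we see: once ON is set without a
wake-up, nothing else wakes the vcpu.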

Any suggestions? Thanks.


* Re: The vcpu won't be wakened for a long time
  2021-12-14 13:55 The vcpu won't be wakened for a long time Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
@ 2021-12-14 17:36 ` Sean Christopherson
  2021-12-16 14:03   ` Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
  0 siblings, 1 reply; 11+ messages in thread
From: Sean Christopherson @ 2021-12-14 17:36 UTC (permalink / raw)
  To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
  Cc: pbonzini, kvm, Gonglei (Arei),
	Huangzhichao, Wanpeng Li, Vitaly Kuznetsov, Jim Mattson,
	Joerg Roedel, linux-kernel

On Tue, Dec 14, 2021, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
> Hi guys,
> 
> We find a problem in kvm_vcpu_block().
> 
> The testcase is:
>  - VM configured with 1 vcpu and 1 VF (using vfio-pci passthrough)
>  - the vfio interrupt and the vcpu are bound to the same pcpu
>  - using remapped mode IRTE, NOT posted mode

What exactly is configured to force remapped mode?

> The bug was triggered when the vcpu executed HLT instruction:
> 
> kvm_vcpu_block:
>     prepare_to_rcuwait(&vcpu->wait);
>     for (;;) {
>         set_current_state(TASK_INTERRUPTIBLE);
> 
>         if (kvm_vcpu_check_block(vcpu) < 0)
>             break;
> 					<------------ (*)
>         waited = true;
>         schedule();
>     }
>     finish_rcuwait(&vcpu->wait);
> 
> The vcpu will go to sleep even if an interrupt from the VF is fired at (*) and
> the PIR and ON bit will be set ( in vmx_deliver_posted_interrupt ), so the vcpu
> won't be wakened by subsequent interrupts.
> 
> Any suggestions ? Thanks.

What kernel version?  There have been a variety of fixes/changes in the area in
recent kernels.


* RE: The vcpu won't be wakened for a long time
  2021-12-14 17:36 ` Sean Christopherson
@ 2021-12-16 14:03   ` Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
  2021-12-16 15:42     ` Sean Christopherson
  0 siblings, 1 reply; 11+ messages in thread
From: Longpeng (Mike, Cloud Infrastructure Service Product Dept.) @ 2021-12-16 14:03 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, Gonglei (Arei),
	Huangzhichao, Wanpeng Li, Vitaly Kuznetsov, Jim Mattson,
	Joerg Roedel, linux-kernel

Hi Sean,

> -----Original Message-----
> From: Sean Christopherson [mailto:seanjc@google.com]
> Sent: Wednesday, December 15, 2021 1:36 AM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> <longpeng2@huawei.com>
> Cc: pbonzini@redhat.com; kvm@vger.kernel.org; Gonglei (Arei)
> <arei.gonglei@huawei.com>; Huangzhichao <huangzhichao@huawei.com>; Wanpeng Li
> <wanpengli@tencent.com>; Vitaly Kuznetsov <vkuznets@redhat.com>; Jim Mattson
> <jmattson@google.com>; Joerg Roedel <joro@8bytes.org>; linux-kernel
> <linux-kernel@vger.kernel.org>
> Subject: Re: The vcpu won't be wakened for a long time
> 
> On Tue, Dec 14, 2021, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
> > Hi guys,
> >
> > We find a problem in kvm_vcpu_block().
> >
> > The testcase is:
> >  - VM configured with 1 vcpu and 1 VF (using vfio-pci passthrough)
> >  - the vfio interrupt and the vcpu are bound to the same pcpu
> >  - using remapped mode IRTE, NOT posted mode
> 
> What exactly is configured to force remapped mode?
> 

It's a misconfiguration on one of our test machines.

> > The bug was triggered when the vcpu executed HLT instruction:
> >
> > kvm_vcpu_block:
> >     prepare_to_rcuwait(&vcpu->wait);
> >     for (;;) {
> >         set_current_state(TASK_INTERRUPTIBLE);
> >
> >         if (kvm_vcpu_check_block(vcpu) < 0)
> >             break;
> > 					<------------ (*)
> >         waited = true;
> >         schedule();
> >     }
> >     finish_rcuwait(&vcpu->wait);
> >
> > The vcpu will go to sleep even if an interrupt from the VF is fired at (*) and
> > the PIR and ON bit will be set ( in vmx_deliver_posted_interrupt ), so the vcpu
> > won't be wakened by subsequent interrupts.
> >
> > Any suggestions ? Thanks.
> 
> What kernel version?  There have been a variety of fixes/changes in the area in
> recent kernels.

The kernel version is 4.18, and it seems the latest kernel also has this problem.

The following code fixes this bug; I've tested it on 4.18.

(4.18)

@@ -3944,6 +3944,11 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
        if (pi_test_and_set_on(&vmx->pi_desc))
                return;
 
+       if (swq_has_sleeper(kvm_arch_vcpu_wq(vcpu))) {
+               kvm_vcpu_kick(vcpu);
+               return;
+       }
+
        if (vcpu != kvm_get_running_vcpu() &&
                !kvm_vcpu_trigger_posted_interrupt(vcpu, false))
                kvm_vcpu_kick(vcpu);


(latest)

@@ -3959,6 +3959,11 @@ static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
        if (pi_test_and_set_on(&vmx->pi_desc))
                return 0;
 
+       if (rcuwait_active(&vcpu->wait)) {
+               kvm_vcpu_kick(vcpu);
+               return 0;
+       }
+
        if (vcpu != kvm_get_running_vcpu() &&
            !kvm_vcpu_trigger_posted_interrupt(vcpu, false))
                kvm_vcpu_kick(vcpu);

Do you have any suggestions?
Thanks.


* Re: The vcpu won't be wakened for a long time
  2021-12-16 14:03   ` Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
@ 2021-12-16 15:42     ` Sean Christopherson
  2021-12-17  2:11       ` Wanpeng Li
  2021-12-18  9:08       ` Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
  0 siblings, 2 replies; 11+ messages in thread
From: Sean Christopherson @ 2021-12-16 15:42 UTC (permalink / raw)
  To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
  Cc: pbonzini, kvm, Gonglei (Arei),
	Huangzhichao, Wanpeng Li, Vitaly Kuznetsov, Jim Mattson,
	Joerg Roedel, linux-kernel

On Thu, Dec 16, 2021, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
> > What kernel version?  There have been a variety of fixes/changes in the
> > area in recent kernels.
> 
> The kernel version is 4.18, and it seems the latest kernel also has this problem.
> 
> The following code can fixes this bug, I've tested it on 4.18.
> 
> (4.18)
> 
> @@ -3944,6 +3944,11 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
>         if (pi_test_and_set_on(&vmx->pi_desc))
>                 return;
>  
> +       if (swq_has_sleeper(kvm_arch_vcpu_wq(vcpu))) {
> +               kvm_vcpu_kick(vcpu);
> +               return;
> +       }
> +
>         if (vcpu != kvm_get_running_vcpu() &&
>                 !kvm_vcpu_trigger_posted_interrupt(vcpu, false))
>                 kvm_vcpu_kick(vcpu);
> 
> 
> (latest)
> 
> @@ -3959,6 +3959,11 @@ static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
>         if (pi_test_and_set_on(&vmx->pi_desc))
>                 return 0;
>  
> +       if (rcuwait_active(&vcpu->wait)) {
> +               kvm_vcpu_kick(vcpu);
> +               return 0;
> +       }
> +
>         if (vcpu != kvm_get_running_vcpu() &&
>             !kvm_vcpu_trigger_posted_interrupt(vcpu, false))
>                 kvm_vcpu_kick(vcpu);
> 
> Do you have any suggestions ?

Hmm, that strongly suggests the "vcpu != kvm_get_running_vcpu()" check is at
fault.  Can you try running with the below commit?  It's currently sitting in
kvm/queue, but not marked for stable because I didn't think it was possible for
the check to cause a missed wake event in KVM's current code base.

commit 6a8110fea2c1b19711ac1ef718680dfd940363c6
Author: Sean Christopherson <seanjc@google.com>
Date:   Wed Dec 8 01:52:27 2021 +0000

    KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU

    Drop a check that guards triggering a posted interrupt on the currently
    running vCPU, and more importantly guards waking the target vCPU if
    triggering a posted interrupt fails because the vCPU isn't IN_GUEST_MODE.
    The "do nothing" logic when "vcpu == running_vcpu" works only because KVM
    doesn't have a path to ->deliver_posted_interrupt() from asynchronous
    context, e.g. if apic_timer_expired() were changed to always go down the
    posted interrupt path for APICv, or if the IN_GUEST_MODE check in
    kvm_use_posted_timer_interrupt() were dropped, and the hrtimer fired in
    kvm_vcpu_block() after the final kvm_vcpu_check_block() check, the vCPU
    would be scheduled() out without being awakened, i.e. would "miss" the
    timer interrupt.

    One could argue that invoking kvm_apic_local_deliver() from (soft) IRQ
    context for the current running vCPU should be illegal, but nothing in
    KVM actually enforces that rule.  There's also no strong obvious benefit
    to making such behavior illegal, e.g. checking IN_GUEST_MODE and calling
    kvm_vcpu_wake_up() is at worst marginally more costly than querying the
    current running vCPU.

    Lastly, this aligns the non-nested and nested usage of triggering posted
    interrupts, and will allow for additional cleanups.

    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
    Message-Id: <20211208015236.1616697-18-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 38749063da0e..f61a6348cffd 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -3995,8 +3995,7 @@ static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
         * guaranteed to see PID.ON=1 and sync the PIR to IRR if triggering a
         * posted interrupt "fails" because vcpu->mode != IN_GUEST_MODE.
         */
-       if (vcpu != kvm_get_running_vcpu() &&
-           !kvm_vcpu_trigger_posted_interrupt(vcpu, false))
+       if (!kvm_vcpu_trigger_posted_interrupt(vcpu, false))
                kvm_vcpu_wake_up(vcpu);

        return 0;


* Re: The vcpu won't be wakened for a long time
  2021-12-16 15:42     ` Sean Christopherson
@ 2021-12-17  2:11       ` Wanpeng Li
  2021-12-17  5:51         ` Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
  2021-12-18  9:08       ` Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
  1 sibling, 1 reply; 11+ messages in thread
From: Wanpeng Li @ 2021-12-17  2:11 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Longpeng (Mike, Cloud Infrastructure Service Product Dept.),
	pbonzini, kvm, Gonglei (Arei),
	Huangzhichao, Wanpeng Li, Vitaly Kuznetsov, Jim Mattson,
	Joerg Roedel, linux-kernel

On Fri, 17 Dec 2021 at 07:48, Sean Christopherson <seanjc@google.com> wrote:
>
> On Thu, Dec 16, 2021, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
> > > What kernel version?  There have been a variety of fixes/changes in the
> > > area in recent kernels.
> >
> > The kernel version is 4.18, and it seems the latest kernel also has this problem.
> >
> > The following code can fixes this bug, I've tested it on 4.18.
> >
> > (4.18)
> >
> > @@ -3944,6 +3944,11 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
> >         if (pi_test_and_set_on(&vmx->pi_desc))
> >                 return;
> >
> > +       if (swq_has_sleeper(kvm_arch_vcpu_wq(vcpu))) {
> > +               kvm_vcpu_kick(vcpu);
> > +               return;
> > +       }
> > +
> >         if (vcpu != kvm_get_running_vcpu() &&
> >                 !kvm_vcpu_trigger_posted_interrupt(vcpu, false))
> >                 kvm_vcpu_kick(vcpu);
> >
> >
> > (latest)
> >
> > @@ -3959,6 +3959,11 @@ static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
> >         if (pi_test_and_set_on(&vmx->pi_desc))
> >                 return 0;
> >
> > +       if (rcuwait_active(&vcpu->wait)) {
> > +               kvm_vcpu_kick(vcpu);
> > +               return 0;
> > +       }
> > +
> >         if (vcpu != kvm_get_running_vcpu() &&
> >             !kvm_vcpu_trigger_posted_interrupt(vcpu, false))
> >                 kvm_vcpu_kick(vcpu);
> >
> > Do you have any suggestions ?
>
> Hmm, that strongly suggests the "vcpu != kvm_get_running_vcpu()" is at fault.

This check was introduced in 5.8-rc1; however, his kernel version is 4.18.

> Can you try running with the below commit?  It's currently sitting in kvm/queue,
> but not marked for stable because I didn't think it was possible for the check
> to a cause a missed wake event in KVM's current code base.
>
> commit 6a8110fea2c1b19711ac1ef718680dfd940363c6
> Author: Sean Christopherson <seanjc@google.com>
> Date:   Wed Dec 8 01:52:27 2021 +0000
>
>     KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU
>
>     Drop a check that guards triggering a posted interrupt on the currently
>     running vCPU, and more importantly guards waking the target vCPU if
>     triggering a posted interrupt fails because the vCPU isn't IN_GUEST_MODE.
>     The "do nothing" logic when "vcpu == running_vcpu" works only because KVM
>     doesn't have a path to ->deliver_posted_interrupt() from asynchronous
>     context, e.g. if apic_timer_expired() were changed to always go down the
>     posted interrupt path for APICv, or if the IN_GUEST_MODE check in
>     kvm_use_posted_timer_interrupt() were dropped, and the hrtimer fired in
>     kvm_vcpu_block() after the final kvm_vcpu_check_block() check, the vCPU
>     would be scheduled() out without being awakened, i.e. would "miss" the
>     timer interrupt.
>
>     One could argue that invoking kvm_apic_local_deliver() from (soft) IRQ
>     context for the current running vCPU should be illegal, but nothing in
>     KVM actually enforces that rules.  There's also no strong obvious benefit
>     to making such behavior illegal, e.g. checking IN_GUEST_MODE and calling
>     kvm_vcpu_wake_up() is at worst marginally more costly than querying the
>     current running vCPU.
>
>     Lastly, this aligns the non-nested and nested usage of triggering posted
>     interrupts, and will allow for additional cleanups.
>
>     Signed-off-by: Sean Christopherson <seanjc@google.com>
>     Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
>     Message-Id: <20211208015236.1616697-18-seanjc@google.com>
>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 38749063da0e..f61a6348cffd 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -3995,8 +3995,7 @@ static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
>          * guaranteed to see PID.ON=1 and sync the PIR to IRR if triggering a
>          * posted interrupt "fails" because vcpu->mode != IN_GUEST_MODE.
>          */
> -       if (vcpu != kvm_get_running_vcpu() &&
> -           !kvm_vcpu_trigger_posted_interrupt(vcpu, false))
> +       if (!kvm_vcpu_trigger_posted_interrupt(vcpu, false))
>                 kvm_vcpu_wake_up(vcpu);
>
>         return 0;


* RE: The vcpu won't be wakened for a long time
  2021-12-17  2:11       ` Wanpeng Li
@ 2021-12-17  5:51         ` Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
  0 siblings, 0 replies; 11+ messages in thread
From: Longpeng (Mike, Cloud Infrastructure Service Product Dept.) @ 2021-12-17  5:51 UTC (permalink / raw)
  To: Wanpeng Li, Sean Christopherson
  Cc: pbonzini, kvm, Gonglei (Arei),
	Huangzhichao, Wanpeng Li, Vitaly Kuznetsov, Jim Mattson,
	Joerg Roedel, linux-kernel



> -----Original Message-----
> From: Wanpeng Li [mailto:kernellwp@gmail.com]
> Sent: Friday, December 17, 2021 10:12 AM
> To: Sean Christopherson <seanjc@google.com>
> Cc: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> <longpeng2@huawei.com>; pbonzini@redhat.com; kvm@vger.kernel.org; Gonglei
> (Arei) <arei.gonglei@huawei.com>; Huangzhichao <huangzhichao@huawei.com>;
> Wanpeng Li <wanpengli@tencent.com>; Vitaly Kuznetsov <vkuznets@redhat.com>;
> Jim Mattson <jmattson@google.com>; Joerg Roedel <joro@8bytes.org>;
> linux-kernel <linux-kernel@vger.kernel.org>
> Subject: Re: The vcpu won't be wakened for a long time
> 
> On Fri, 17 Dec 2021 at 07:48, Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Thu, Dec 16, 2021, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
> > > > What kernel version?  There have been a variety of fixes/changes in the
> > > > area in recent kernels.
> > >
> > > The kernel version is 4.18, and it seems the latest kernel also has this problem.
> > >
> > > The following code can fixes this bug, I've tested it on 4.18.
> > >
> > > (4.18)
> > >
> > > @@ -3944,6 +3944,11 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
> > >         if (pi_test_and_set_on(&vmx->pi_desc))
> > >                 return;
> > >
> > > +       if (swq_has_sleeper(kvm_arch_vcpu_wq(vcpu))) {
> > > +               kvm_vcpu_kick(vcpu);
> > > +               return;
> > > +       }
> > > +
> > >         if (vcpu != kvm_get_running_vcpu() &&
> > >                 !kvm_vcpu_trigger_posted_interrupt(vcpu, false))
> > >                 kvm_vcpu_kick(vcpu);
> > >
> > >
> > > (latest)
> > >
> > > @@ -3959,6 +3959,11 @@ static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
> > >         if (pi_test_and_set_on(&vmx->pi_desc))
> > >                 return 0;
> > >
> > > +       if (rcuwait_active(&vcpu->wait)) {
> > > +               kvm_vcpu_kick(vcpu);
> > > +               return 0;
> > > +       }
> > > +
> > >         if (vcpu != kvm_get_running_vcpu() &&
> > >             !kvm_vcpu_trigger_posted_interrupt(vcpu, false))
> > >                 kvm_vcpu_kick(vcpu);
> > >
> > > Do you have any suggestions ?
> >
> > Hmm, that strongly suggests the "vcpu != kvm_get_running_vcpu()" is at fault.
> 
> This was introduced in 5.8-rc1, however, his kernel version is 4.18.
> 

Do you mean the following commit?

```
While optimizing posted-interrupt delivery especially for the timer
fastpath scenario, I measured kvm_x86_ops.deliver_posted_interrupt()
to introduce substantial latency because the processor has to perform
all vmentry tasks, ack the posted interrupt notification vector,
read the posted-interrupt descriptor etc.

This is not only slow, it is also unnecessary when delivering an
interrupt to the current CPU (as is the case for the LAPIC timer) because
PIR->IRR and IRR->RVI synchronization is already performed on vmentry.
Therefore skip kvm_vcpu_trigger_posted_interrupt in this case, and
instead do vmx_sync_pir_to_irr() on the EXIT_FASTPATH_REENTER_GUEST
fastpath as well.

Tested-by: Haiwei Li <lihaiwei@tencent.com>
Cc: Haiwei Li <lihaiwei@tencent.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1588055009-12677-6-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
```

It was backported to our codebase when we synchronized patches from upstream.

> > Can you try running with the below commit?  It's currently sitting in kvm/queue,
> > but not marked for stable because I didn't think it was possible for the check
> > to a cause a missed wake event in KVM's current code base.
> >
> > commit 6a8110fea2c1b19711ac1ef718680dfd940363c6
> > Author: Sean Christopherson <seanjc@google.com>
> > Date:   Wed Dec 8 01:52:27 2021 +0000
> >
> >     KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU
> >
> >     Drop a check that guards triggering a posted interrupt on the currently
> >     running vCPU, and more importantly guards waking the target vCPU if
> >     triggering a posted interrupt fails because the vCPU isn't IN_GUEST_MODE.
> >     The "do nothing" logic when "vcpu == running_vcpu" works only because KVM
> >     doesn't have a path to ->deliver_posted_interrupt() from asynchronous
> >     context, e.g. if apic_timer_expired() were changed to always go down the
> >     posted interrupt path for APICv, or if the IN_GUEST_MODE check in
> >     kvm_use_posted_timer_interrupt() were dropped, and the hrtimer fired in
> >     kvm_vcpu_block() after the final kvm_vcpu_check_block() check, the vCPU
> >     would be scheduled() out without being awakened, i.e. would "miss" the
> >     timer interrupt.
> >
> >     One could argue that invoking kvm_apic_local_deliver() from (soft) IRQ
> >     context for the current running vCPU should be illegal, but nothing in
> >     KVM actually enforces that rules.  There's also no strong obvious benefit
> >     to making such behavior illegal, e.g. checking IN_GUEST_MODE and calling
> >     kvm_vcpu_wake_up() is at worst marginally more costly than querying the
> >     current running vCPU.
> >
> >     Lastly, this aligns the non-nested and nested usage of triggering posted
> >     interrupts, and will allow for additional cleanups.
> >
> >     Signed-off-by: Sean Christopherson <seanjc@google.com>
> >     Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> >     Message-Id: <20211208015236.1616697-18-seanjc@google.com>
> >     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> >
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 38749063da0e..f61a6348cffd 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -3995,8 +3995,7 @@ static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
> >          * guaranteed to see PID.ON=1 and sync the PIR to IRR if triggering a
> >          * posted interrupt "fails" because vcpu->mode != IN_GUEST_MODE.
> >          */
> > -       if (vcpu != kvm_get_running_vcpu() &&
> > -           !kvm_vcpu_trigger_posted_interrupt(vcpu, false))
> > +       if (!kvm_vcpu_trigger_posted_interrupt(vcpu, false))
> >                 kvm_vcpu_wake_up(vcpu);
> >
> >         return 0;


* RE: The vcpu won't be wakened for a long time
  2021-12-16 15:42     ` Sean Christopherson
  2021-12-17  2:11       ` Wanpeng Li
@ 2021-12-18  9:08       ` Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
  2021-12-21 15:27         ` Sean Christopherson
  1 sibling, 1 reply; 11+ messages in thread
From: Longpeng (Mike, Cloud Infrastructure Service Product Dept.) @ 2021-12-18  9:08 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: pbonzini, kvm, Gonglei (Arei),
	Huangzhichao, Wanpeng Li, Vitaly Kuznetsov, Jim Mattson,
	Joerg Roedel, linux-kernel



> -----Original Message-----
> From: Sean Christopherson [mailto:seanjc@google.com]
> Sent: Thursday, December 16, 2021 11:43 PM
> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
> <longpeng2@huawei.com>
> Cc: pbonzini@redhat.com; kvm@vger.kernel.org; Gonglei (Arei)
> <arei.gonglei@huawei.com>; Huangzhichao <huangzhichao@huawei.com>; Wanpeng Li
> <wanpengli@tencent.com>; Vitaly Kuznetsov <vkuznets@redhat.com>; Jim Mattson
> <jmattson@google.com>; Joerg Roedel <joro@8bytes.org>; linux-kernel
> <linux-kernel@vger.kernel.org>
> Subject: Re: The vcpu won't be wakened for a long time
> 
> On Thu, Dec 16, 2021, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
> > > What kernel version?  There have been a variety of fixes/changes in the
> > > area in recent kernels.
> >
> > The kernel version is 4.18, and it seems the latest kernel also has this problem.
> >
> > The following code can fixes this bug, I've tested it on 4.18.
> >
> > (4.18)
> >
> > @@ -3944,6 +3944,11 @@ static void vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
> >         if (pi_test_and_set_on(&vmx->pi_desc))
> >                 return;
> >
> > +       if (swq_has_sleeper(kvm_arch_vcpu_wq(vcpu))) {
> > +               kvm_vcpu_kick(vcpu);
> > +               return;
> > +       }
> > +
> >         if (vcpu != kvm_get_running_vcpu() &&
> >                 !kvm_vcpu_trigger_posted_interrupt(vcpu, false))
> >                 kvm_vcpu_kick(vcpu);
> >
> >
> > (latest)
> >
> > @@ -3959,6 +3959,11 @@ static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
> >         if (pi_test_and_set_on(&vmx->pi_desc))
> >                 return 0;
> >
> > +       if (rcuwait_active(&vcpu->wait)) {
> > +               kvm_vcpu_kick(vcpu);
> > +               return 0;
> > +       }
> > +
> >         if (vcpu != kvm_get_running_vcpu() &&
> >             !kvm_vcpu_trigger_posted_interrupt(vcpu, false))
> >                 kvm_vcpu_kick(vcpu);
> >
> > Do you have any suggestions ?
> 
> Hmm, that strongly suggests the "vcpu != kvm_get_running_vcpu()" is at fault.
> Can you try running with the below commit?  It's currently sitting in kvm/queue,
> but not marked for stable because I didn't think it was possible for the check
> to a cause a missed wake event in KVM's current code base.
> 

The commit below fixes the bug; we have just completed the tests.
Thanks.

> commit 6a8110fea2c1b19711ac1ef718680dfd940363c6
> Author: Sean Christopherson <seanjc@google.com>
> Date:   Wed Dec 8 01:52:27 2021 +0000
> 
>     KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU
> 
>     Drop a check that guards triggering a posted interrupt on the currently
>     running vCPU, and more importantly guards waking the target vCPU if
>     triggering a posted interrupt fails because the vCPU isn't IN_GUEST_MODE.
>     The "do nothing" logic when "vcpu == running_vcpu" works only because KVM
>     doesn't have a path to ->deliver_posted_interrupt() from asynchronous
>     context, e.g. if apic_timer_expired() were changed to always go down the
>     posted interrupt path for APICv, or if the IN_GUEST_MODE check in
>     kvm_use_posted_timer_interrupt() were dropped, and the hrtimer fired in
>     kvm_vcpu_block() after the final kvm_vcpu_check_block() check, the vCPU
>     would be scheduled() out without being awakened, i.e. would "miss" the
>     timer interrupt.
> 
>     One could argue that invoking kvm_apic_local_deliver() from (soft) IRQ
>     context for the current running vCPU should be illegal, but nothing in
>     KVM actually enforces that rules.  There's also no strong obvious benefit
>     to making such behavior illegal, e.g. checking IN_GUEST_MODE and calling
>     kvm_vcpu_wake_up() is at worst marginally more costly than querying the
>     current running vCPU.
> 
>     Lastly, this aligns the non-nested and nested usage of triggering posted
>     interrupts, and will allow for additional cleanups.
> 
>     Signed-off-by: Sean Christopherson <seanjc@google.com>
>     Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
>     Message-Id: <20211208015236.1616697-18-seanjc@google.com>
>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 38749063da0e..f61a6348cffd 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -3995,8 +3995,7 @@ static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
>          * guaranteed to see PID.ON=1 and sync the PIR to IRR if triggering a
>          * posted interrupt "fails" because vcpu->mode != IN_GUEST_MODE.
>          */
> -       if (vcpu != kvm_get_running_vcpu() &&
> -           !kvm_vcpu_trigger_posted_interrupt(vcpu, false))
> +       if (!kvm_vcpu_trigger_posted_interrupt(vcpu, false))
>                 kvm_vcpu_wake_up(vcpu);
> 
>         return 0;


* Re: The vcpu won't be wakened for a long time
  2021-12-18  9:08       ` Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
@ 2021-12-21 15:27         ` Sean Christopherson
  2021-12-21 15:34           ` Paolo Bonzini
  2021-12-22  6:07           ` Chao Gao
  0 siblings, 2 replies; 11+ messages in thread
From: Sean Christopherson @ 2021-12-21 15:27 UTC (permalink / raw)
  To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
  Cc: pbonzini, kvm, Gonglei (Arei),
	Huangzhichao, Wanpeng Li, Vitaly Kuznetsov, Jim Mattson,
	Joerg Roedel, linux-kernel

On Sat, Dec 18, 2021, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
> > Hmm, that strongly suggests the "vcpu != kvm_get_running_vcpu()" is at fault.
> > Can you try running with the below commit?  It's currently sitting in kvm/queue,
> > but not marked for stable because I didn't think it was possible for the check
> > to a cause a missed wake event in KVM's current code base.
> > 
> 
> The below commit can fix the bug, we have just completed  the tests.
> Thanks.

Aha!  Somehow I missed this call chain when analyzing the change.

  irqfd_wakeup()
  |
  |-> kvm_arch_set_irq_inatomic()
      |
      |-> kvm_irq_delivery_to_apic_fast()
          |
          |-> kvm_apic_set_irq()


Paolo, can the changelog be amended to the below, and maybe even pull the commit
into 5.16?


KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU

Drop a check that guards triggering a posted interrupt on the currently
running vCPU, and more importantly guards waking the target vCPU if
triggering a posted interrupt fails because the vCPU isn't IN_GUEST_MODE.
If a vIRQ is delivered from asynchronous context, the target vCPU can be
the currently running vCPU and can also be blocking, in which case
skipping kvm_vcpu_wake_up() is effectively dropping what is supposed to
be a wake event for the vCPU.

The "do nothing" logic when "vcpu == running_vcpu" mostly works only
because the majority of calls to ->deliver_posted_interrupt(), especially
when using posted interrupts, come from synchronous KVM context.  But if
a device is exposed to the guest using vfio-pci passthrough, the VFIO IRQ
and vCPU are bound to the same pCPU, and the IRQ is _not_ configured to
use posted interrupts, wake events from the device will be delivered to
KVM from IRQ context, e.g.

  vfio_msihandler()
  |
  |-> eventfd_signal()
      |
      |-> ...
          |
          |-> irqfd_wakeup()
              |
              |-> kvm_arch_set_irq_inatomic()
                  |
                  |-> kvm_irq_delivery_to_apic_fast()
                      |
                      |-> kvm_apic_set_irq()

This also aligns the non-nested and nested usage of triggering posted
interrupts, and will allow for additional cleanups.

Fixes: 379a3c8ee444 ("KVM: VMX: Optimize posted-interrupt delivery for timer fastpath")
Cc: stable@vger.kernel.org
Reported-by: Longpeng (Mike) <longpeng2@huawei.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20211208015236.1616697-18-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>




> > commit 6a8110fea2c1b19711ac1ef718680dfd940363c6
> > Author: Sean Christopherson <seanjc@google.com>
> > Date:   Wed Dec 8 01:52:27 2021 +0000
> > 
> >     KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU
> > 
> >     Drop a check that guards triggering a posted interrupt on the currently
> >     running vCPU, and more importantly guards waking the target vCPU if
> >     triggering a posted interrupt fails because the vCPU isn't IN_GUEST_MODE.
> >     The "do nothing" logic when "vcpu == running_vcpu" works only because KVM
> >     doesn't have a path to ->deliver_posted_interrupt() from asynchronous
> >     context, e.g. if apic_timer_expired() were changed to always go down the
> >     posted interrupt path for APICv, or if the IN_GUEST_MODE check in
> >     kvm_use_posted_timer_interrupt() were dropped, and the hrtimer fired in
> >     kvm_vcpu_block() after the final kvm_vcpu_check_block() check, the vCPU
> >     would be scheduled() out without being awakened, i.e. would "miss" the
> >     timer interrupt.
> > 
> >     One could argue that invoking kvm_apic_local_deliver() from (soft) IRQ
> >     context for the current running vCPU should be illegal, but nothing in
> >     KVM actually enforces that rule.  There's also no strong obvious benefit
> >     to making such behavior illegal, e.g. checking IN_GUEST_MODE and calling
> >     kvm_vcpu_wake_up() is at worst marginally more costly than querying the
> >     current running vCPU.
> > 
> >     Lastly, this aligns the non-nested and nested usage of triggering posted
> >     interrupts, and will allow for additional cleanups.
> > 
> >     Signed-off-by: Sean Christopherson <seanjc@google.com>
> >     Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
> >     Message-Id: <20211208015236.1616697-18-seanjc@google.com>
> >     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > 
> > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> > index 38749063da0e..f61a6348cffd 100644
> > --- a/arch/x86/kvm/vmx/vmx.c
> > +++ b/arch/x86/kvm/vmx/vmx.c
> > @@ -3995,8 +3995,7 @@ static int vmx_deliver_posted_interrupt(struct kvm_vcpu *vcpu, int vector)
> >          * guaranteed to see PID.ON=1 and sync the PIR to IRR if triggering a
> >          * posted interrupt "fails" because vcpu->mode != IN_GUEST_MODE.
> >          */
> > -       if (vcpu != kvm_get_running_vcpu() &&
> > -           !kvm_vcpu_trigger_posted_interrupt(vcpu, false))
> > +       if (!kvm_vcpu_trigger_posted_interrupt(vcpu, false))
> >                 kvm_vcpu_wake_up(vcpu);
> > 
> >         return 0;
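
For anyone following along, the effect of dropping the
"vcpu != kvm_get_running_vcpu()" check in the diff above can be modelled in
user space.  This is only a rough sketch under stated assumptions, not KVM
code: struct toy_vcpu, toy_running_vcpu, deliver_old(), deliver_new() and
irq_while_blocking() are made-up stand-ins, and the "woken" flag merely
records whether the equivalent of kvm_vcpu_wake_up() would have been called.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in KVM. */
enum toy_mode { TOY_OUTSIDE_GUEST_MODE, TOY_IN_GUEST_MODE };

struct toy_vcpu {
	enum toy_mode mode;
	bool pir_on;	/* stands in for PIR + PID.ON being set */
	bool woken;	/* stands in for kvm_vcpu_wake_up() having run */
	int ipis;	/* posted-interrupt notification IPIs sent */
};

/* Stands in for kvm_get_running_vcpu() on the current pCPU. */
static struct toy_vcpu *toy_running_vcpu;

/* Simplified kvm_vcpu_trigger_posted_interrupt(): IPI only if in guest. */
static bool toy_trigger_posted_interrupt(struct toy_vcpu *vcpu)
{
	assert(vcpu != NULL);
	if (vcpu->mode == TOY_IN_GUEST_MODE) {
		vcpu->ipis++;
		return true;
	}
	return false;
}

/* Delivery with the "vcpu != running vcpu" guard (before the fix). */
static void deliver_old(struct toy_vcpu *vcpu)
{
	vcpu->pir_on = true;
	if (vcpu != toy_running_vcpu && !toy_trigger_posted_interrupt(vcpu))
		vcpu->woken = true;
}

/* Delivery without the guard (after the fix). */
static void deliver_new(struct toy_vcpu *vcpu)
{
	vcpu->pir_on = true;
	if (!toy_trigger_posted_interrupt(vcpu))
		vcpu->woken = true;
}

/*
 * The reported scenario: the vCPU is the running vCPU on this pCPU, is
 * about to block (so not IN_GUEST_MODE), and the VFIO IRQ arrives in IRQ
 * context.  Returns true if the interrupt is pending and the vCPU was woken.
 */
static bool irq_while_blocking(void (*deliver)(struct toy_vcpu *))
{
	struct toy_vcpu vcpu = { .mode = TOY_OUTSIDE_GUEST_MODE };

	toy_running_vcpu = &vcpu;
	deliver(&vcpu);
	toy_running_vcpu = NULL;
	return vcpu.pir_on && vcpu.woken;
}
```

In this model irq_while_blocking(deliver_old) reports the wake event as lost,
while irq_while_blocking(deliver_new) reports the vCPU as woken, which matches
the behaviour reported in this thread.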


* Re: The vcpu won't be wakened for a long time
  2021-12-21 15:27         ` Sean Christopherson
@ 2021-12-21 15:34           ` Paolo Bonzini
  2021-12-22  6:07           ` Chao Gao
  1 sibling, 0 replies; 11+ messages in thread
From: Paolo Bonzini @ 2021-12-21 15:34 UTC
  To: Sean Christopherson, Longpeng (Mike,
	Cloud Infrastructure Service Product Dept.)
  Cc: kvm, Gonglei (Arei),
	Huangzhichao, Wanpeng Li, Vitaly Kuznetsov, Jim Mattson,
	Joerg Roedel, linux-kernel

On 12/21/21 16:27, Sean Christopherson wrote:
> 
> Paolo, can the changelog be amended to the below, and maybe even pull the commit
> into 5.16?

Yes, of course.

Paolo



* Re: The vcpu won't be wakened for a long time
  2021-12-21 15:27         ` Sean Christopherson
  2021-12-21 15:34           ` Paolo Bonzini
@ 2021-12-22  6:07           ` Chao Gao
  2021-12-22 15:44             ` Sean Christopherson
  1 sibling, 1 reply; 11+ messages in thread
From: Chao Gao @ 2021-12-22  6:07 UTC
  To: Sean Christopherson
  Cc: Longpeng (Mike, Cloud Infrastructure Service Product Dept.),
	pbonzini, kvm, Gonglei (Arei),
	Huangzhichao, Wanpeng Li, Vitaly Kuznetsov, Jim Mattson,
	Joerg Roedel, linux-kernel

On Tue, Dec 21, 2021 at 03:27:01PM +0000, Sean Christopherson wrote:
>On Sat, Dec 18, 2021, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
>> > Hmm, that strongly suggests the "vcpu != kvm_get_running_vcpu()" is at fault.
>> > Can you try running with the below commit?  It's currently sitting in kvm/queue,
>> > but not marked for stable because I didn't think it was possible for the check
>> > to cause a missed wake event in KVM's current code base.
>> > 
>> 
>> The commit below fixes the bug; we have just completed the tests.
>> Thanks.
>
>Aha!  Somehow I missed this call chain when analyzing the change.
>
>  irqfd_wakeup()
>  |
>  |->kvm_arch_set_irq_inatomic()
>     |
>     |-> kvm_irq_delivery_to_apic_fast()
>         |
>         |-> kvm_apic_set_irq()
>
>
>Paolo, can the changelog be amended to the below, and maybe even pull the commit
>into 5.16?
>
>
>KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU
>
>Drop a check that guards triggering a posted interrupt on the currently
>running vCPU,

Can we move (add) this check to kvm_vcpu_trigger_posted_interrupt()?

	if (vcpu->mode == IN_GUEST_MODE) {
[...]
-		apic->send_IPI_mask(get_cpu_mask(vcpu->cpu), pi_vec);
+		if (vcpu != kvm_get_running_vcpu())
+			apic->send_IPI_mask(get_cpu_mask(vcpu->cpu), pi_vec);
 		return true;

It can achieve the purpose of the original patch without (re-)introducing
this bug.


* Re: The vcpu won't be wakened for a long time
  2021-12-22  6:07           ` Chao Gao
@ 2021-12-22 15:44             ` Sean Christopherson
  0 siblings, 0 replies; 11+ messages in thread
From: Sean Christopherson @ 2021-12-22 15:44 UTC
  To: Chao Gao
  Cc: Longpeng (Mike, Cloud Infrastructure Service Product Dept.),
	pbonzini, kvm, Gonglei (Arei),
	Huangzhichao, Wanpeng Li, Vitaly Kuznetsov, Jim Mattson,
	Joerg Roedel, linux-kernel

On Wed, Dec 22, 2021, Chao Gao wrote:
> On Tue, Dec 21, 2021 at 03:27:01PM +0000, Sean Christopherson wrote:
> >On Sat, Dec 18, 2021, Longpeng (Mike, Cloud Infrastructure Service Product Dept.) wrote:
> >> > Hmm, that strongly suggests the "vcpu != kvm_get_running_vcpu()" is at fault.
> >> > Can you try running with the below commit?  It's currently sitting in kvm/queue,
> >> > but not marked for stable because I didn't think it was possible for the check
> >> > to cause a missed wake event in KVM's current code base.
> >> > 
> >> 
> >> The commit below fixes the bug; we have just completed the tests.
> >> Thanks.
> >
> >Aha!  Somehow I missed this call chain when analyzing the change.
> >
> >  irqfd_wakeup()
> >  |
> >  |->kvm_arch_set_irq_inatomic()
> >     |
> >     |-> kvm_irq_delivery_to_apic_fast()
> >         |
> >         |-> kvm_apic_set_irq()
> >
> >
> >Paolo, can the changelog be amended to the below, and maybe even pull the commit
> >into 5.16?
> >
> >
> >KVM: VMX: Wake vCPU when delivering posted IRQ even if vCPU == this vCPU
> >
> >Drop a check that guards triggering a posted interrupt on the currently
> >running vCPU,
> 
> Can we move (add) this check to kvm_vcpu_trigger_posted_interrupt()?
>
> 	if (vcpu->mode == IN_GUEST_MODE) {
> [...]
> -		apic->send_IPI_mask(get_cpu_mask(vcpu->cpu), pi_vec);
> +		if (vcpu != kvm_get_running_vcpu())
> +			apic->send_IPI_mask(get_cpu_mask(vcpu->cpu), pi_vec);
>  		return true;
> 
> It can achieve the purpose of the original patch without (re-)introducing
> this bug.

Hmm, yes, I think that would be safe and would optimize delivery of TSC deadline
timer interrupts when they're emulated via the VMX preemption timer.  The original
patch confused me because the optimization went in before the code it was optimizing.
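
To make the suggestion concrete, here is a small user-space model of it.
Treat it as a sketch under assumptions rather than a proposed patch: the
toy_* names are invented for illustration, the IPI is reduced to a counter,
and "woken" stands in for kvm_vcpu_wake_up().  The point it tries to show is
that the running-vCPU check only suppresses the self-IPI and no longer gates
the wake-up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names exist in KVM. */
enum toy_mode { TOY_OUTSIDE_GUEST_MODE, TOY_IN_GUEST_MODE };

struct toy_vcpu {
	enum toy_mode mode;
	bool woken;	/* stands in for kvm_vcpu_wake_up() having run */
	int ipis;	/* posted-interrupt notification IPIs sent */
};

/* Stands in for kvm_get_running_vcpu() on the current pCPU. */
static struct toy_vcpu *toy_running_vcpu;

/* Suggested variant: skip only the self-IPI, keep the return value. */
static bool toy_trigger_posted_interrupt(struct toy_vcpu *vcpu)
{
	assert(vcpu != NULL);
	if (vcpu->mode == TOY_IN_GUEST_MODE) {
		if (vcpu != toy_running_vcpu)
			vcpu->ipis++;
		return true;
	}
	return false;
}

/* Caller side, with no running-vCPU check of its own. */
static void toy_deliver(struct toy_vcpu *vcpu)
{
	if (!toy_trigger_posted_interrupt(vcpu))
		vcpu->woken = true;
}

/* Runs one delivery; is_running says whether vcpu is the running vCPU. */
static struct toy_vcpu toy_run(enum toy_mode mode, bool is_running)
{
	struct toy_vcpu vcpu = { .mode = mode };

	toy_running_vcpu = is_running ? &vcpu : NULL;
	toy_deliver(&vcpu);
	toy_running_vcpu = NULL;
	return vcpu;
}
```

In the model, toy_run(TOY_IN_GUEST_MODE, true) sends no IPI and does no
wake-up, toy_run(TOY_IN_GUEST_MODE, false) sends one IPI, and
toy_run(TOY_OUTSIDE_GUEST_MODE, true) still wakes the vCPU, i.e. the blocking
case from this thread is not lost.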


Thread overview: 11+ messages
2021-12-14 13:55 The vcpu won't be wakened for a long time Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
2021-12-14 17:36 ` Sean Christopherson
2021-12-16 14:03   ` Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
2021-12-16 15:42     ` Sean Christopherson
2021-12-17  2:11       ` Wanpeng Li
2021-12-17  5:51         ` Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
2021-12-18  9:08       ` Longpeng (Mike, Cloud Infrastructure Service Product Dept.)
2021-12-21 15:27         ` Sean Christopherson
2021-12-21 15:34           ` Paolo Bonzini
2021-12-22  6:07           ` Chao Gao
2021-12-22 15:44             ` Sean Christopherson
