All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Lai Jiangshan <jiangshanlai+lkml@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>,
	kvm@vger.kernel.org, Filippo Sironi <sironi@amazon.de>,
	David Woodhouse <dwmw@amazon.co.uk>,
	"v4.7+" <stable@vger.kernel.org>,
	Wanpeng Li <wanpengli@tencent.com>
Subject: Re: [PATCH 2/2] KVM: x86: Fix split-irqchip vs interrupt injection window request
Date: Mon, 12 Apr 2021 21:43:35 +0000	[thread overview]
Message-ID: <YHS/BxMiO6I1VOEY@google.com> (raw)
In-Reply-To: <CAJhGHyCdqgtvK98_KieG-8MUfg1Jghd+H99q+FkgL0ZuqnvuAw@mail.gmail.com>

On Fri, Apr 09, 2021, Lai Jiangshan wrote:
> On Fri, Nov 27, 2020 at 7:26 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> >
> > kvm_cpu_accept_dm_intr and kvm_vcpu_ready_for_interrupt_injection are
> > a hodge-podge of conditions, hacked together to get something that
> > more or less works.  But what is actually needed is much simpler;
> > in both cases the fundamental question is, do we have a place to stash
> > an interrupt if userspace does KVM_INTERRUPT?
> >
> > In userspace irqchip mode, that is !vcpu->arch.interrupt.injected.
> > Currently kvm_event_needs_reinjection(vcpu) covers it, but it is
> > unnecessarily restrictive.
> >
> > In split irqchip mode it's a bit more complicated, we need to check
> > kvm_apic_accept_pic_intr(vcpu) (the IRQ window exit is basically an INTACK
> > cycle and thus requires ExtINTs not to be masked) as well as
> > !pending_userspace_extint(vcpu).  However, there is no need to
> > check kvm_event_needs_reinjection(vcpu), since split irqchip keeps
> > pending ExtINT state separate from event injection state, and checking
> > kvm_cpu_has_interrupt(vcpu) is wrong too since ExtINT has higher
> > priority than APIC interrupts.  In fact the latter fixes a bug:
> > when userspace requests an IRQ window vmexit, an interrupt in the
> > local APIC can cause kvm_cpu_has_interrupt() to be true and thus
> > kvm_vcpu_ready_for_interrupt_injection() to return false.  When this
> > happens, vcpu_run does not exit to userspace but the interrupt window
> > vmexits keep occurring.  The VM loops without any hope of making progress.
> >
> > Once we try to fix these with something like
> >
> >      return kvm_arch_interrupt_allowed(vcpu) &&
> > -        !kvm_cpu_has_interrupt(vcpu) &&
> > -        !kvm_event_needs_reinjection(vcpu) &&
> > -        kvm_cpu_accept_dm_intr(vcpu);
> > +        (!lapic_in_kernel(vcpu)
> > +         ? !vcpu->arch.interrupt.injected
> > +         : (kvm_apic_accept_pic_intr(vcpu)
> > +            && !pending_userspace_extint(v)));
> >
> > we realize two things.  First, thanks to the previous patch the complex
> > conditional can reuse !kvm_cpu_has_extint(vcpu).  Second, the interrupt
> > window request in vcpu_enter_guest()
> >
> >         bool req_int_win =
> >                 dm_request_for_irq_injection(vcpu) &&
> >                 kvm_cpu_accept_dm_intr(vcpu);
> >
> > should be kept in sync with kvm_vcpu_ready_for_interrupt_injection():
> > it is unnecessary to ask the processor for an interrupt window
> > if we would not be able to return to userspace.  Therefore, the
> > complex conditional is really the correct implementation of
> > kvm_cpu_accept_dm_intr(vcpu).  It all makes sense:
> >
> > - we can accept an interrupt from userspace if there is a place
> >   to stash it (and, for irqchip split, ExtINTs are not masked).
> >   Interrupts from userspace _can_ be accepted even if right now
> >   EFLAGS.IF=0.
> 
> Hello, Paolo
> 
> If userspace does KVM_INTERRUPT, vcpu->arch.interrupt.injected is
> set immediately, and in inject_pending_event(), we have
> 
>         else if (!vcpu->arch.exception.pending) {
>                 if (vcpu->arch.nmi_injected) {
>                         kvm_x86_ops.set_nmi(vcpu);
>                         can_inject = false;
>                 } else if (vcpu->arch.interrupt.injected) {
>                         kvm_x86_ops.set_irq(vcpu);
>                         can_inject = false;
>                 }
>         }
> 
> I'm curious about that can the kvm_x86_ops.set_irq() here be possible
> to queue the irq with EFLAGS.IF=0? If not, which code prevents it?

The interrupt is only directly injected if the local APIC is _not_ in-kernel.
If userspace is managing the local APIC, my understanding is that userspace is
also responsible for honoring EFLAGS.IF, though KVM aids userspace by updating
vcpu->run->ready_for_interrupt_injection when exiting to userspace.  When
userspace is modeling the local APIC, that resolves to
kvm_vcpu_ready_for_interrupt_injection():

	return kvm_arch_interrupt_allowed(vcpu) &&
		kvm_cpu_accept_dm_intr(vcpu);

where kvm_arch_interrupt_allowed() checks EFLAGS.IF (and an edge case related to
nested virtualization).  KVM also captures EFLAGS.IF in vcpu->run->if_flag.
For whatever reason, QEMU checks both vcpu->run flags before injecting an IRQ,
maybe to handle a case where QEMU itself clears EFLAGS.IF?
 
> I'm asking about this because I just noticed that interrupt can
> be queued when exception pending, and this patch relaxed it even
> more.
> 
> Note: interrupt can NOT be queued when exception pending
> until 664f8e26b00c7 ("KVM: X86: Fix loss of exception which
> has not yet been injected") which I think is dangerous.

  reply	other threads:[~2021-04-12 21:44 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-27 11:21 [PATCH 0/2] KVM: x86: cleanup and fix userspace interrupt window Paolo Bonzini
2020-11-27 11:21 ` [PATCH 1/2] KVM: x86: handle !lapic_in_kernel case in kvm_cpu_*_extint Paolo Bonzini
2020-11-27 11:56   ` David Woodhouse
2020-11-27 12:52   ` Filippo Sironi
2020-11-27 11:21 ` [PATCH 2/2] KVM: x86: Fix split-irqchip vs interrupt injection window request Paolo Bonzini
2020-11-27 12:53   ` Filippo Sironi
2021-04-09  7:14   ` Lai Jiangshan
2021-04-12 21:43     ` Sean Christopherson [this message]
2021-04-13 11:03       ` Lai Jiangshan
2021-04-13 12:15         ` Paolo Bonzini
2021-04-14  2:28           ` Lai Jiangshan
2021-04-14 16:58             ` Paolo Bonzini
2021-04-15  0:59               ` Lai Jiangshan
2021-04-15  6:07                 ` Paolo Bonzini
2021-04-15  8:06                   ` Lai Jiangshan
2021-04-13 12:10       ` Paolo Bonzini
2020-11-27 12:49 ` [PATCH 0/2] KVM: x86: cleanup and fix userspace interrupt window David Woodhouse

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YHS/BxMiO6I1VOEY@google.com \
    --to=seanjc@google.com \
    --cc=dwmw@amazon.co.uk \
    --cc=jiangshanlai+lkml@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=sironi@amazon.de \
    --cc=stable@vger.kernel.org \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.