From: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
To: Sean Christopherson <seanjc@google.com>
Cc: Maxim Levitsky <mlevitsk@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 5/8] KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction
Date: Wed, 6 Apr 2022 21:08:41 +0200
Message-ID: <5135b502-ce2e-babb-7812-4d4c431a5252@maciej.szmigiero.name>
In-Reply-To: <Yk3Jd6xAfgVoFgLc@google.com>

On 6.04.2022 19:10, Sean Christopherson wrote:
> On Wed, Apr 06, 2022, Maciej S. Szmigiero wrote:
>> On 6.04.2022 03:48, Sean Christopherson wrote:
>>> On Mon, Apr 04, 2022, Maciej S. Szmigiero wrote:
>> (..)
>>>> Also, I'm not sure that even the proposed updated code above will
>>>> actually restore the L1-requested next_rip correctly on L1 -> L2
>>>> re-injection (will review once the full version is available).
>>>
>>> Spoiler alert, it doesn't.  Save yourself the review time.  :-)
>>>
>>> The missing piece is stashing away the injected event on nested VMRUN.  Those
>>> events don't get routed through the normal interrupt/exception injection code and
>>> so the next_rip info is lost on the subsequent #NPF.
>>>
>>> Treating soft interrupts/exceptions like they were injected by KVM (which they
>>> are, technically) works and doesn't seem too gross.  E.g. when prepping vmcb02
>>>
>>> 	if (svm->nrips_enabled)
>>> 		vmcb02->control.next_rip    = svm->nested.ctl.next_rip;
>>> 	else if (boot_cpu_has(X86_FEATURE_NRIPS))
>>> 		vmcb02->control.next_rip    = vmcb12_rip;
>>>
>>> 	if (is_evtinj_soft(vmcb02->control.event_inj)) {
>>> 		svm->soft_int_injected = true;
>>> 		svm->soft_int_csbase = svm->vmcb->save.cs.base;
>>> 		svm->soft_int_old_rip = vmcb12_rip;
>>> 		if (svm->nrips_enabled)
>>> 			svm->soft_int_next_rip = svm->nested.ctl.next_rip;
>>> 		else
>>> 			svm->soft_int_next_rip = vmcb12_rip;
>>> 	}
>>>
>>> And then the VMRUN error path just needs to clear soft_int_injected.
>>
>> I am also a fan of parsing EVENTINJ from VMCB12 into relevant KVM
>> injection structures (much like EXITINTINFO is parsed), as I said to
>> Maxim two days ago [1].
>> Not only for software {interrupts,exceptions} but for all incoming
>> events (again, just like EXITINTINFO).
> 
> Ahh, I saw that fly by, but somehow I managed to misread what you intended.
> 
> I like the idea of populating vcpu->arch.interrupt/exception as "injected" events.
> KVM prioritizes "injected" over other nested events, so in theory it should work
> without too much fuss.  I've run through a variety of edge cases in my head and
> haven't found anything that would be fundamentally broken.  I think even live
> migration would work.
> 
> I think I'd prefer to do that in a follow-up series so that nVMX can be converted
> at the same time?  It's a bit absurd to add the above soft int code knowing that,
> at least in theory, simply populating the right software structs would automagically
> fix the bug.  But manually handling the soft int case first would be safer in the
> sense that we'd still have a fix for the soft int case if it turns out that populating
> vcpu->arch.interrupt/exception isn't as straightforward as it seems.

I don't see any problem with having two patch series, the second series
depending on the first.

I planned to at least fix the bugs that I described in my previous message
(the NMI one actually breaks a real Windows guest), but that requires
knowing how the event injection code will finally look after the
first series of fixes.

>> However, there is another issue related to L1 -> L2 event re-injection
>> using standard KVM event injection mechanism: it mixes the L1 injection
>> state with the L2 one.
>>
>> Specifically for SVM:
>> * When re-injecting an NMI into L2, NMI blocking is enabled in
>> vcpu->arch.hflags (shared between L1 and L2) and the IRET intercept is
>> enabled.
>>
>> This is incorrect, since it is L1 that is responsible for enforcing NMI
>> blocking for NMIs that it injects into its L2.
> 
> Ah, I see what you're saying.  I think :-)  IIUC, we can fix this bug without any
> new flags, just skip the side effects if the NMI is being injected into L2.
> 
> @@ -3420,6 +3424,10 @@ static void svm_inject_nmi(struct kvm_vcpu *vcpu)
>          struct vcpu_svm *svm = to_svm(vcpu);
> 
>          svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;
> +
> +       if (is_guest_mode(vcpu))
> +               return;
> +
>          vcpu->arch.hflags |= HF_NMI_MASK;
>          if (!sev_es_guest(vcpu->kvm))
>                  svm_set_intercept(svm, INTERCEPT_IRET);
> 
> and for nVMX:
> 
> @@ -4598,6 +4598,9 @@ static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
>   {
>          struct vcpu_vmx *vmx = to_vmx(vcpu);
> 
> +       if (is_guest_mode(vcpu))
> +               goto inject_nmi;
> +
>          if (!enable_vnmi) {
>                  /*
>                   * Tracking the NMI-blocked state in software is built upon
> @@ -4619,6 +4622,7 @@ static void vmx_inject_nmi(struct kvm_vcpu *vcpu)
>                  return;
>          }
> 
> +inject_nmi:
>          vmcs_write32(VM_ENTRY_INTR_INFO_FIELD,
>                          INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR);
> 

And what if it's L0 that is trying to inject an NMI into L2?
In this case is_guest_mode() is true, but the full NMI injection machinery
should be used.
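
Something along these lines would be needed instead - just a rough sketch,
with a made-up "nmi_l1_to_l2" flag (set while parsing VMCB12 EVENTINJ)
standing in for however we end up marking L1 -> L2 events:

static void svm_inject_nmi(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;

	/*
	 * Skip the L0-side effects only when this NMI was injected by L1
	 * into L2 (parsed from VMCB12 EVENTINJ); an NMI that L0 itself
	 * injects into L2 still needs the full machinery.
	 */
	if (svm->nmi_l1_to_l2)
		return;

	vcpu->arch.hflags |= HF_NMI_MASK;
	if (!sev_es_guest(vcpu->kvm))
		svm_set_intercept(svm, INTERCEPT_IRET);
}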

>> Also, *L2* being the target of such injection definitely should not block
>> further NMIs for *L1*.
> 
> Actually, it should block NMIs for L1.  From L1's perspective, the injection is
> part of VM-Entry.  That's a single gigantic instruction, thus there is no NMI window
> until VM-Entry completes from L1's perspective.  Any exit that occurs on vectoring
> an injected event and is handled by L0 should not be visible to L1, because from
> L1's perspective it's all part of VMRUN/VMLAUNCH/VMRESUME.  So blocking new events
> because an NMI (or any event) needs to be reinjected for L2 is correct.

I think this kind of NMI blocking will already be handled by keeping
the pending new NMI in vcpu->arch.nmi_pending and the one that needs
re-injecting in vcpu->arch.nmi_injected.

The pending new NMI in vcpu->arch.nmi_pending won't be handled until
vcpu->arch.nmi_injected gets cleared (that is, until re-injection is
successful).
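
That is, roughly this ordering in the generic injection path (paraphrased
from memory of inject_pending_event() in arch/x86/kvm/x86.c, not the exact
code):

	/* Re-inject an event that was already delivered to the guest but
	 * got cut short (e.g. by an #NPF on vectoring) before considering
	 * anything new... */
	if (vcpu->arch.nmi_injected)
		static_call(kvm_x86_inject_nmi)(vcpu);

	/* ...and only queue a *new* NMI once nothing is awaiting
	 * re-injection and an NMI window is open. */
	if (vcpu->arch.nmi_pending && !vcpu->arch.nmi_injected &&
	    static_call(kvm_x86_nmi_allowed)(vcpu, true) > 0) {
		--vcpu->arch.nmi_pending;
		vcpu->arch.nmi_injected = true;
		static_call(kvm_x86_inject_nmi)(vcpu);
	}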

It is incorrect, however, to wait for L2 to execute IRET to unblock
L0 -> L1 NMIs or L1 -> L2 NMIs; in these two cases we (L0) just need the CPU
to vector that L2 NMI so it no longer shows up in EXITINTINFO.

It is also incorrect to block L1 -> L2 NMI injection because either L1
or L2 is currently under NMI blocking: the first case is obvious,
and in the second it's L1 that is supposed to take care of proper NMI
blocking for L2 when injecting an NMI there.

>> * When re-injecting a *hardware* IRQ into L2, GIF is checked (previously
>> even at the BUG_ON() level), while L1 should be able to inject even when
>> L2 GIF is off,
> 
> Isn't that just a matter of tweaking the assertion to ignore GIF if L2 is
> active?  Hmm, or deleting the assertion altogether, it's likely doing more harm
> than good at this point.

I assume this assertion is meant to catch the case when KVM itself (L0) is
trying to erroneously inject a hardware interrupt into L1 or L2, so it will
need to be skipped only for L1 -> L2 event injection.

Whether this assertion's benefits outweigh its costs is debatable - I don't
have a strong opinion here (BUG_ON() is for sure too strong, but WARN_ON_ONCE()
might make sense to catch latent bugs).
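
A rough sketch of what svm_inject_irq() could then look like (the
"irq_l1_to_l2" flag is again hypothetical, set when the event is parsed
out of VMCB12 EVENTINJ):

static void svm_inject_irq(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	/*
	 * Complain if L0 itself tries to inject a hardware IRQ while GIF
	 * is clear, but not about L1 -> L2 injection, since L1 may inject
	 * regardless of L2's GIF.
	 */
	WARN_ON_ONCE(!gif_set(svm) && !svm->irq_l1_to_l2);

	trace_kvm_inj_virq(vcpu->arch.interrupt.nr);
	++vcpu->stat.irq_injections;

	svm->vmcb->control.event_inj = vcpu->arch.interrupt.nr |
				       SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_INTR;
}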

>> With the code in my previous patch set I planned to use
>> exit_during_event_injection() to detect such a case, but if we implement
>> VMCB12 EVENTINJ parsing we can simply add a flag indicating that the relevant
>> event comes from L1, so its normal injection side effects should be skipped.
> 
> Do we still need a flag based on the above?  Honest question... I've been staring
> at all this for the better part of an hour and may have lost track of things.

If checking just is_guest_mode() is not enough, due to the reasons I described
above, then we need to somehow determine in the NMI / IRQ injection handler
whether the event to be injected into L2 comes from L0 or L1.
For this (assuming we do VMCB12 EVENTINJ parsing) I think we need an extra flag.
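
Roughly something like this on the nested VMRUN path - all the names below
are made up for illustration, and the real thing would also have to handle
exceptions, error codes and soft events:

static void nested_svm_parse_event_inj(struct kvm_vcpu *vcpu, u32 event_inj)
{
	struct vcpu_svm *svm = to_svm(vcpu);
	u8 vector = event_inj & SVM_EVTINJ_VEC_MASK;

	if (!(event_inj & SVM_EVTINJ_VALID))
		return;

	switch (event_inj & SVM_EVTINJ_TYPE_MASK) {
	case SVM_EVTINJ_TYPE_NMI:
		vcpu->arch.nmi_injected = true;
		svm->nmi_l1_to_l2 = true;	/* hypothetical flag */
		break;
	case SVM_EVTINJ_TYPE_INTR:
		kvm_queue_interrupt(vcpu, vector, false);
		svm->irq_l1_to_l2 = true;	/* hypothetical flag */
		break;
	/* SVM_EVTINJ_TYPE_EXEPT / SVM_EVTINJ_TYPE_SOFT handling elided */
	}
}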

>> By the way, the relevant VMX code also looks rather suspicious,
>> especially for the !enable_vnmi case.
> 
> I think it's safe to say we can ignore edge cases for !enable_vnmi.  It might even
> be worth trying to remove that support again (Paolo tried years ago), IIRC the
> only Intel CPUs that don't support virtual NMIs are some funky Yonah SKUs.

Ack, we could at least disable nested on !enable_vnmi.
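
Something as trivial as this in vmx.c's hardware_setup() would probably do
(just a sketch):

	if (!enable_vnmi && nested) {
		pr_warn("kvm: nested virtualization requires virtual NMIs, disabling it\n");
		nested = false;
	}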

BTW, I think that besides Yonah cores, very early Core 2 CPUs also lacked
vNMI support; that's why !enable_vnmi support was reinstated.
But that's hardware even older than the !nrips AMD parts.

Thanks,
Maciej
