Re: [PATCH 2/9] KVM: x86: Emulate triple fault shutdown if RSM emulation fails

From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Sean Christopherson <seanjc@google.com>
Cc: Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	syzbot+fb0b6a7e8713aeb0319c@syzkaller.appspotmail.com,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH 2/9] KVM: x86: Emulate triple fault shutdown if RSM emulation fails
Date: Thu, 10 Jun 2021 10:26:16 +0200	[thread overview]
Message-ID: <87eedayvkn.fsf@vitty.brq.redhat.com> (raw)
In-Reply-To: <20210609185619.992058-3-seanjc@google.com>

Sean Christopherson <seanjc@google.com> writes:

> Use the recently introduced KVM_REQ_TRIPLE_FAULT to properly emulate
> shutdown if RSM from SMM fails.
>
> Note, entering shutdown after clearing the SMM flag and restoring NMI
> blocking is architecturally correct with respect to AMD's APM, which KVM
> also uses for SMRAM layout and RSM NMI blocking behavior.  The APM says:
>
>   An RSM causes a processor shutdown if an invalid-state condition is
>   found in the SMRAM state-save area. Only an external reset, external
>   processor-initialization, or non-maskable external interrupt (NMI) can
>   cause the processor to leave the shutdown state.
>
> Of note is processor-initialization (INIT) as a valid shutdown wake
> event, as INIT is blocked by SMM, implying that entering shutdown also
> forces the CPU out of SMM.
>
> For recent Intel CPUs, restoring NMI blocking is technically wrong, but
> so is restoring NMI blocking in the first place, and Intel's RSM
> "architecture" is such a mess that just about anything is allowed and can
> be justified as micro-architectural behavior.
>
> Per the SDM:
>
>   On Pentium 4 and later processors, shutdown will inhibit INTR and A20M
>   but will not change any of the other inhibits. On these processors,
>   NMIs will be inhibited if no action is taken in the SMI handler to
>   uninhibit them (see Section 34.8).
>
> where Section 34.8 says:
>
>   When the processor enters SMM while executing an NMI handler, the
>   processor saves the SMRAM state save map but does not save the
>   attribute to keep NMI interrupts disabled. Potentially, an NMI could be
>   latched (while in SMM or upon exit) and serviced upon exit of SMM even
>   though the previous NMI handler has still not completed.
>
> I.e. RSM unconditionally unblocks NMI, but shutdown on RSM does not,
> which is in direct contradiction of KVM's behavior.  But, as mentioned
> above, KVM follows AMD architecture and restores NMI blocking on RSM, so
> that micro-architectural detail is already lost.
>
> And for Pentium era CPUs, SMI# can break shutdown, meaning that at least
> some Intel CPUs fully leave SMM when entering shutdown:
>
>   In the shutdown state, Intel processors stop executing instructions
>   until a RESET#, INIT# or NMI# is asserted.  While Pentium family
>   processors recognize the SMI# signal in shutdown state, P6 family and
>   Intel486 processors do not.
>
> In other words, the fact that Intel CPUs have implemented the two
> extremes gives KVM carte blanche when it comes to honoring Intel's
> architecture for handling shutdown during RSM.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/emulate.c     | 12 +++++++-----
>  arch/x86/kvm/kvm_emulate.h |  1 +
>  arch/x86/kvm/x86.c         |  6 ++++++
>  3 files changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 5e5de05a8fbf..0603a2c79093 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -2683,7 +2683,7 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
>  	 * state-save area.
>  	 */
>  	if (ctxt->ops->pre_leave_smm(ctxt, buf))
> -		return X86EMUL_UNHANDLEABLE;
> +		goto emulate_shutdown;
>  
>  #ifdef CONFIG_X86_64
>  	if (emulator_has_longmode(ctxt))
> @@ -2692,14 +2692,16 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
>  #endif
>  		ret = rsm_load_state_32(ctxt, buf);
>  
> -	if (ret != X86EMUL_CONTINUE) {
> -		/* FIXME: should triple fault */
> -		return X86EMUL_UNHANDLEABLE;
> -	}
> +	if (ret != X86EMUL_CONTINUE)
> +		goto emulate_shutdown;
>  
>  	ctxt->ops->post_leave_smm(ctxt);
>  
>  	return X86EMUL_CONTINUE;
> +
> +emulate_shutdown:
> +	ctxt->ops->triple_fault(ctxt);
> +	return X86EMUL_UNHANDLEABLE;

I'm probably missing something, but what's the desired effect of both
raising KVM_REQ_TRIPLE_FAULT and returning X86EMUL_UNHANDLEABLE here?

I've modified smm selftest to see what's happening:

diff --git a/tools/testing/selftests/kvm/x86_64/smm_test.c b/tools/testing/selftests/kvm/x86_64/smm_test.c
index 613c42c5a9b8..cf215cd2c6e2 100644
--- a/tools/testing/selftests/kvm/x86_64/smm_test.c
+++ b/tools/testing/selftests/kvm/x86_64/smm_test.c
@@ -147,6 +147,11 @@ int main(int argc, char *argv[])
                            "Unexpected stage: #%x, got %x",
                            stage, stage_reported);
 
+               if (stage_reported == SMRAM_STAGE) {
+                       /* corrupt smram */
+                       memset(addr_gpa2hva(vm, SMRAM_GPA) + 0xfe00, 0xff, 512);
+               }
+
                state = vcpu_save_state(vm, VCPU_ID);
                kvm_vm_release(vm);
                kvm_vm_restart(vm, O_RDWR);

What I see is:

        smm_test-7600  [002]  4497.073918: kvm_exit:             reason EXIT_RSM rip 0x8004 info 0 0
        smm_test-7600  [002]  4497.073921: kvm_emulate_insn:     1000000:8004: 0f aa
        smm_test-7600  [002]  4497.073924: kvm_smm_transition:   vcpu 1: leaving SMM, smbase 0x1000000
        smm_test-7600  [002]  4497.073928: kvm_emulate_insn:     0:8004: 0f aa FAIL
        smm_test-7600  [002]  4497.073929: kvm_fpu:              unload
        smm_test-7600  [002]  4497.073930: kvm_userspace_exit:   reason KVM_EXIT_INTERNAL_ERROR (17)

If I change X86EMUL_UNHANDLEABLE to X86EMUL_CONTINUE tripple fault is
happening indeed (why don't we have triple fault printed in trace by
default BTW???):

        smm_test-16810 [006]  5117.007220: kvm_exit:             reason EXIT_RSM rip 0x8004 info 0 0
        smm_test-16810 [006]  5117.007222: kvm_emulate_insn:     1000000:8004: 0f aa
        smm_test-16810 [006]  5117.007225: kvm_smm_transition:   vcpu 1: leaving SMM, smbase 0x1000000
        smm_test-16810 [006]  5117.007229: bputs:                vcpu_enter_guest: KVM_REQ_TRIPLE_FAULT
        smm_test-16810 [006]  5117.007230: kvm_fpu:              unload
        smm_test-16810 [006]  5117.007230: kvm_userspace_exit:   reason KVM_EXIT_SHUTDOWN (8)

So should we actually have X86EMUL_CONTINUE when we queue
KVM_REQ_TRIPLE_FAULT here?

(Initially, my comment was supposed to be 'why don't you add
TRIPLE_FAULT to smm selftest?' but the above overshadows it)

>  }
>  
>  static void
> diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
> index 3e870bf9ca4d..9c34aa60e45f 100644
> --- a/arch/x86/kvm/kvm_emulate.h
> +++ b/arch/x86/kvm/kvm_emulate.h
> @@ -233,6 +233,7 @@ struct x86_emulate_ops {
>  	int (*pre_leave_smm)(struct x86_emulate_ctxt *ctxt,
>  			     const char *smstate);
>  	void (*post_leave_smm)(struct x86_emulate_ctxt *ctxt);
> +	void (*triple_fault)(struct x86_emulate_ctxt *ctxt);
>  	int (*set_xcr)(struct x86_emulate_ctxt *ctxt, u32 index, u64 xcr);
>  };
>  
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 54d212fe9b15..cda148cf06fa 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7123,6 +7123,11 @@ static void emulator_post_leave_smm(struct x86_emulate_ctxt *ctxt)
>  	kvm_smm_changed(emul_to_vcpu(ctxt));
>  }
>  
> +static void emulator_triple_fault(struct x86_emulate_ctxt *ctxt)
> +{
> +	kvm_make_request(KVM_REQ_TRIPLE_FAULT, emul_to_vcpu(ctxt));
> +}
> +
>  static int emulator_set_xcr(struct x86_emulate_ctxt *ctxt, u32 index, u64 xcr)
>  {
>  	return __kvm_set_xcr(emul_to_vcpu(ctxt), index, xcr);
> @@ -7172,6 +7177,7 @@ static const struct x86_emulate_ops emulate_ops = {
>  	.set_hflags          = emulator_set_hflags,
>  	.pre_leave_smm       = emulator_pre_leave_smm,
>  	.post_leave_smm      = emulator_post_leave_smm,
> +	.triple_fault        = emulator_triple_fault,
>  	.set_xcr             = emulator_set_xcr,
>  };

-- 
Vitaly