Re: Query about calling kvm_vcpu_gfn_to_memslot() with a GVA (Re: [PATCH 1/2] KVM: SVM: avoid infinite loop on NPF from bad address

From: Liam Merwick <liam.merwick@oracle.com>
To: Sean Christopherson <seanjc@google.com>
Cc: pbonzini@redhat.com, kvm@vger.kernel.org,
	"Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>,
	Boris Ostrovsky <boris.ostrovsky@oracle.com>,
	Liam Merwick <liam.merwick@oracle.com>
Subject: Re: Query about calling kvm_vcpu_gfn_to_memslot() with a GVA (Re: [PATCH 1/2] KVM: SVM: avoid infinite loop on NPF from bad address
Date: Mon, 17 Jan 2022 17:09:02 +0000	[thread overview]
Message-ID: <fcf4c5c8-aa13-11bf-ec6d-1775b3bd9cd2@oracle.com> (raw)
In-Reply-To: <YeBZ+QcXUIQ7/fD2@google.com>

On 13/01/2022 16:57, Sean Christopherson wrote:
> On Thu, Jan 13, 2022, Liam Merwick wrote:
>> On Fri, Apr 17, 2020 at 12:38:42PM -0400, Paolo Bonzini wrote:
>>> When a nested page fault is taken from an address that does not have
>>> a memslot associated to it, kvm_mmu_do_page_fault returns RET_PF_EMULATE
>>> (via mmu_set_spte) and kvm_mmu_page_fault then invokes
>> svm_need_emulation_on_page_fault.
>>>
>>> The default answer there is to return false, but in this case this just
>>> causes the page fault to be retried ad libitum.  Since this is not a
>>> fast path, and the only other case where it is taken is an erratum,
>>> just stick a kvm_vcpu_gfn_to_memslot check in there to detect the
>>> common case where the erratum is not happening.
>>>
>>> This fixes an infinite loop in the new set_memory_region_test.
>>>
>>> Fixes: 05d5a4863525 ("KVM: SVM: Workaround errata#1096 (insn_len maybe
>> zero on SMAP violation)")
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>>   arch/x86/kvm/svm/svm.c | 7 +++++++
>>>   virt/kvm/kvm_main.c    | 1 +
>>>   2 files changed, 8 insertions(+)
>>>
>>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>>> index a91e397d6750..c86f7278509b 100644
>>> --- a/arch/x86/kvm/svm/svm.c
>>> +++ b/arch/x86/kvm/svm/svm.c
>>> @@ -3837,6 +3837,13 @@ static bool svm_need_emulation_on_page_fault(struct
>> kvm_vcpu *vcpu)
>>>   	bool smap = cr4 & X86_CR4_SMAP;
>>>   	bool is_user = svm_get_cpl(vcpu) == 3;
>>>
>>> +	/*
>>> +	 * If RIP is invalid, go ahead with emulation which will cause an
>>> +	 * internal error exit.
>>> +	 */
>>> +	if (!kvm_vcpu_gfn_to_memslot(vcpu, kvm_rip_read(vcpu) >> PAGE_SHIFT))
>>
>> When looking into an SEV issue it was noted that the second arg to
>> kvm_vcpu_gfn_to_memslot() is a gfn_t but kvm_rip_read() will return guest
>> RIP which is a guest virtual address and memslots hold guest physical
>> addresses. How is KVM supposed to translate it to a memslot
>> and indicate if the guest RIP is valid?
> 
> Ugh, magic?  That code is complete garbage.  It worked to fix the selftest issue
> because the selftest identity maps the relevant guest code.
> 
> The entire idea is a hack.  If KVM gets into an infinite loop because the guest
> is attempting to fetch from MMIO, then the #NPF/#PF should have the FETCH bit set
> in the error code.  I.e. I believe the below change should fix the original issue,
> at which point we can revert the above.  I'll test today and hopefully get a patch
> sent out.

Thanks Sean.

I have been running with this patch along with reverting commit
e72436bc3a52 ("KVM: SVM: avoid infinite loop on NPF from bad address")
with over 150 hours runtime on multiple machines and it resolves an SEV
guest crash I was encountering where if there were no decode assist 
bytes available, it then continued on and hit the invalid RIP check.

Reviewed-by: Liam Merwick <liam.merwick@oracle.com>
Tested-by: Liam Merwick <liam.merwick@oracle.com>

> 
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index c3d9006478a4..e1d2a46e06bf 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -1995,6 +1995,17 @@ static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value)
>          vmcb_mark_dirty(svm->vmcb, VMCB_DR);
>   }
> 
> +static char *svm_get_pf_insn_bytes(struct vcpu_svm *svm)
> +{
> +       if (!static_cpu_has(X86_FEATURE_DECODEASSISTS))
> +               return NULL;
> +
> +       if (svm->vmcb->control.exit_info_1 & PFERR_FETCH_MASK)
> +               return NULL;
> +
> +       return svm->vmcb->control.insn_bytes;
> +}
> +
>   static int pf_interception(struct kvm_vcpu *vcpu)
>   {
>          struct vcpu_svm *svm = to_svm(vcpu);
> @@ -2003,9 +2014,8 @@ static int pf_interception(struct kvm_vcpu *vcpu)
>          u64 error_code = svm->vmcb->control.exit_info_1;
> 
>          return kvm_handle_page_fault(vcpu, error_code, fault_address,
> -                       static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
> -                       svm->vmcb->control.insn_bytes : NULL,
> -                       svm->vmcb->control.insn_len);
> +                                    svm_get_pf_insn_bytes(svm),
> +                                    svm->vmcb->control.insn_len);
>   }
> 
>   static int npf_interception(struct kvm_vcpu *vcpu)
> @@ -2017,9 +2027,8 @@ static int npf_interception(struct kvm_vcpu *vcpu)
> 
>          trace_kvm_page_fault(fault_address, error_code);
>          return kvm_mmu_page_fault(vcpu, fault_address, error_code,
> -                       static_cpu_has(X86_FEATURE_DECODEASSISTS) ?
> -                       svm->vmcb->control.insn_bytes : NULL,
> -                       svm->vmcb->control.insn_len);
> +                                 svm_get_pf_insn_bytes(svm),
> +                                 svm->vmcb->control.insn_len);
>   }
> 
>   static int db_interception(struct kvm_vcpu *vcpu)