From: Shaoqin Huang <shahuang@redhat.com>
To: Gavin Shan <gshan@redhat.com>, kvmarm@lists.linux.dev
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
pbonzini@redhat.com, seanjc@google.com, mtosatti@redhat.com,
maz@kernel.org, will@kernel.org, c.dall@virtualopensystems.com,
peterx@redhat.com, david@redhat.com, aarcange@redhat.com,
hshuai@redhat.com, zhenyzha@redhat.com, shan.gavin@gmail.com
Subject: Re: [PATCH v3] KVM: Avoid illegal stage2 mapping on invalid memory slot
Date: Fri, 16 Jun 2023 16:35:18 +0800
Message-ID: <ea6777df-7bb6-eeb9-645e-548bcbd6c2f6@redhat.com>
In-Reply-To: <20230615054259.14911-1-gshan@redhat.com>

On 6/15/23 13:42, Gavin Shan wrote:
> We run into a guest hang in the edk2 firmware when KSM is running on
> the host. The edk2 firmware waits for status 0x80 from QEMU's pflash
> device (TYPE_PFLASH_CFI01) during a sector erase or buffered write.
> The status is returned by reading the memory region of the pflash
> device, so the read request should be forwarded to QEMU and emulated
> by it. Unfortunately, when the hang occurs, the read request is covered
> by an illegal stage2 mapping. The read then completes without QEMU's
> involvement, a wrong status is fetched, and the edk2 firmware spins in
> an infinite loop waiting for the expected status.
>
> The illegal stage2 mapping is populated by KSM's same page sharing at
> (C), even though the associated memory slot has already been marked as
> invalid at (B) because it is being deleted. Note that the active and
> inactive memory slots can't be swapped while we're in the middle of
> kvm_mmu_notifier_change_pte(), because kvm->mn_active_invalidate_count
> is elevated and kvm_swap_active_memslots() waits in a loop until it
> drops back to zero. In addition, the swap from the active to the
> inactive memory slots is held off by holding &kvm->srcu in
> __kvm_handle_hva_range(), which pairs with synchronize_srcu_expedited()
> in kvm_swap_active_memslots().
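To make the ordering argument above concrete, here is a minimal user-space sketch of how an elevated invalidate count holds off the memslot swap. The toy_* types and functions are hypothetical stand-ins, not KVM code; the real kvm_swap_active_memslots() waits in a loop until the count drops to zero, whereas this single-threaded model only makes one attempt and reports the result.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two pieces of struct kvm that matter here. */
struct toy_kvm {
	atomic_int mn_active_invalidate_count; /* models kvm->mn_active_invalidate_count */
	int active_set;                        /* 0 or 1: which memslot set is active */
};

/* Models kvm_mmu_notifier_invalidate_range_start() elevating the count. */
static void toy_invalidate_range_start(struct toy_kvm *kvm)
{
	atomic_fetch_add(&kvm->mn_active_invalidate_count, 1);
}

/* Models kvm_mmu_notifier_invalidate_range_end() dropping the count. */
static void toy_invalidate_range_end(struct toy_kvm *kvm)
{
	atomic_fetch_sub(&kvm->mn_active_invalidate_count, 1);
}

/*
 * Models kvm_swap_active_memslots(): the swap may only happen once no
 * invalidation is in flight. This is a single-threaded toy, so the
 * check-then-swap below is not itself race-free (the kernel does it
 * under kvm->mn_invalidate_lock).
 */
static bool toy_try_swap_active_memslots(struct toy_kvm *kvm)
{
	if (atomic_load(&kvm->mn_active_invalidate_count) != 0)
		return false; /* an invalidation is in flight: no swap yet */
	kvm->active_set ^= 1;
	return true;
}
```

While a change_pte() notification sits between range_start and range_end, the toy swap is refused, and it succeeds only after range_end.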
>
>     CPU-A                            CPU-B
>     -----                            -----
>                                      ioctl(kvm_fd, KVM_SET_USER_MEMORY_REGION)
>                                      kvm_vm_ioctl_set_memory_region
>                                      kvm_set_memory_region
>                                      __kvm_set_memory_region
>                                      kvm_set_memslot(kvm, old, NULL, KVM_MR_DELETE)
>                                        kvm_invalidate_memslot
>                                          kvm_copy_memslot
>                                          kvm_replace_memslot
>                                          kvm_swap_active_memslots        (A)
>                                          kvm_arch_flush_shadow_memslot   (B)
>     same page sharing by KSM
>     kvm_mmu_notifier_invalidate_range_start
>           :
>     kvm_mmu_notifier_change_pte
>       kvm_handle_hva_range
>       __kvm_handle_hva_range
>       kvm_set_spte_gfn            (C)
>           :
>     kvm_mmu_notifier_invalidate_range_end
>
> Fix the issue by skipping the invalid memory slot at (C) to avoid the
> illegal stage2 mapping, so that the read request for the pflash status
> is forwarded to QEMU and emulated by it. The correct pflash status is
> then returned from QEMU, breaking the infinite loop in the edk2
> firmware.
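The fix at (C) boils down to a flag check before the stage2 update. Below is a toy model of that check, assuming simplified types: toy_memslot, TOY_MEMSLOT_INVALID and toy_set_spte() are hypothetical, while the real code tests KVM_MEMSLOT_INVALID in range->slot->flags and then calls kvm_set_spte_gfn().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit standing in for KVM_MEMSLOT_INVALID. */
#define TOY_MEMSLOT_INVALID (1u << 16)

struct toy_memslot {
	uint32_t flags;
	uint64_t mapped_pfn; /* toy convention: 0 means "no stage2 mapping", so accesses exit to the VMM */
};

/* Hypothetical stand-in for kvm_set_spte_gfn(): installs a mapping to new_pfn. */
static bool toy_set_spte(struct toy_memslot *slot, uint64_t new_pfn)
{
	slot->mapped_pfn = new_pfn;
	return true;
}

/*
 * Same shape as kvm_change_spte_gfn() in the patch: a slot that is being
 * deleted is skipped, so no new mapping is created behind QEMU's back.
 * In KVM the bool result is a "TLB flush needed" hint; in this toy it
 * only reports whether a mapping was installed.
 */
static bool toy_change_spte(struct toy_memslot *slot, uint64_t new_pfn)
{
	if (slot->flags & TOY_MEMSLOT_INVALID)
		return false;
	return toy_set_spte(slot, new_pfn);
}
```

A valid slot gets the new PFN, while a slot marked invalid at (B) stays unmapped, which is what lets the pflash status read reach QEMU.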
>
> A git-bisect identified cd4c71835228 ("KVM: arm64: Convert to the
> gfn-based MMU notifier callbacks") as the first problematic commit.
> Since that commit, clean_dcache_guest_page() is called after the memory
> slots are iterated in kvm_mmu_notifier_change_pte(); before it, the call
> happened before the iteration. This change enlarges the race window
> between kvm_mmu_notifier_change_pte() and memory slot removal enough
> that the issue can be reproduced in a practical test case. However, the
> issue has existed since commit d5d8184d35c9 ("KVM: ARM: Memory
> virtualization setup").
>
> Cc: stable@vger.kernel.org # v3.9+
> Fixes: d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
> Reported-by: Shuai Hu <hshuai@redhat.com>
> Reported-by: Zhenyu Zhang <zhenyzha@redhat.com>
> Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
> ---
> v3: Skip invalid memory slots only in the change_pte() MMU notifier,
>     as suggested by Sean. Improved the changelog to explain how the
>     Fixes tag was chosen.
> ---
> virt/kvm/kvm_main.c | 20 +++++++++++++++++++-
> 1 file changed, 19 insertions(+), 1 deletion(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 479802a892d4..65f94f592ff8 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -686,6 +686,24 @@ static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn
>
>  	return __kvm_handle_hva_range(kvm, &range);
>  }
> +
> +static bool kvm_change_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
> +{
> +	/*
> +	 * Skipping invalid memslots is correct if and only if change_pte() is
> +	 * surrounded by invalidate_range_{start,end}(), which is currently
> +	 * guaranteed by the primary MMU. If that ever changes, KVM needs to
> +	 * unmap the memslot instead of skipping the memslot to ensure that KVM
> +	 * doesn't hold references to the old PFN.
> +	 */
> +	WARN_ON_ONCE(!READ_ONCE(kvm->mn_active_invalidate_count));
> +
> +	if (range->slot->flags & KVM_MEMSLOT_INVALID)
> +		return false;
> +
> +	return kvm_set_spte_gfn(kvm, range);
> +}
> +
>  static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
>  					struct mm_struct *mm,
>  					unsigned long address,
> @@ -707,7 +725,7 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
>  	if (!READ_ONCE(kvm->mmu_invalidate_in_progress))
>  		return;
>
> -	kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn);
> +	kvm_handle_hva_range(mn, address, address + 1, pte, kvm_change_spte_gfn);
>  }
>
>  void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
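For context on the kvm_handle_hva_range() call changed above: change_pte() passes the one-byte range [address, address + 1), and __kvm_handle_hva_range() turns it into a gfn range for every overlapping memslot before calling the handler. Below is a simplified sketch of that translation. It assumes hypothetical toy_* types and a 4 KiB page size; the real code also deals with multiple address spaces, locking and an efficient slot lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE (1UL << TOY_PAGE_SHIFT)

/* Hypothetical, trimmed-down memslot: guest frames backed by a host VA range. */
struct toy_slot {
	uint64_t base_gfn;
	uint64_t npages;
	uint64_t userspace_addr;
};

struct toy_gfn_range {
	uint64_t start, end; /* [start, end) in guest frame numbers */
	const struct toy_slot *slot;
};

typedef bool (*toy_handler_t)(const struct toy_gfn_range *r, void *ctx);

/* Same arithmetic as hva_to_gfn_memslot(). */
static uint64_t toy_hva_to_gfn(uint64_t hva, const struct toy_slot *s)
{
	return s->base_gfn + ((hva - s->userspace_addr) >> TOY_PAGE_SHIFT);
}

/*
 * For every slot overlapping [hva_start, hva_end), clamp the range to the
 * slot, convert it to a gfn range (rounding the end up to a whole page)
 * and call the handler. Returns the OR of the handler results.
 */
static bool toy_handle_hva_range(const struct toy_slot *slots, size_t n,
				 uint64_t hva_start, uint64_t hva_end,
				 toy_handler_t handler, void *ctx)
{
	bool ret = false;

	for (size_t i = 0; i < n; i++) {
		const struct toy_slot *s = &slots[i];
		uint64_t slot_end = s->userspace_addr + (s->npages << TOY_PAGE_SHIFT);
		uint64_t start = hva_start > s->userspace_addr ? hva_start : s->userspace_addr;
		uint64_t end = hva_end < slot_end ? hva_end : slot_end;

		if (start >= end)
			continue; /* no overlap with this slot */

		struct toy_gfn_range r = {
			.start = toy_hva_to_gfn(start, s),
			.end = toy_hva_to_gfn(end + TOY_PAGE_SIZE - 1, s),
			.slot = s,
		};
		ret |= handler(&r, ctx);
	}
	return ret;
}

/* Example handler: counts the gfns it was asked to handle. */
static bool toy_count_gfns(const struct toy_gfn_range *r, void *ctx)
{
	*(uint64_t *)ctx += r->end - r->start;
	return true;
}
```

With a one-byte hva range inside a slot, the handler sees exactly one gfn. This is why the memslot flag check in kvm_change_spte_gfn() only affects the single page KSM is remapping.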
--
Shaoqin