linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 6.1] KVM: x86/mmu: Fix an sign-extension bug with mmu_seq that hangs vCPUs
@ 2023-08-24  1:01 Sean Christopherson
  2023-08-24  6:53 ` Greg Kroah-Hartman
  0 siblings, 1 reply; 5+ messages in thread
From: Sean Christopherson @ 2023-08-24  1:01 UTC (permalink / raw)
  To: stable, Greg Kroah-Hartman; +Cc: Paolo Bonzini, linux-kernel

Take the vCPU's mmu_seq snapshot as an "unsigned long" instead of an "int"
when checking to see if a page fault is stale, as the sequence count is
stored as an "unsigned long" everywhere else in KVM.  This fixes a bug
where KVM will effectively hang vCPUs due to always thinking page faults
are stale, which results in KVM refusing to "fix" faults.

mmu_invalidate_seq (née mmu_notifier_seq) is a sequence counter used when
KVM is handling page faults to detect if userspace mappings relevant to
the guest were invalidated between snapshotting the counter and acquiring
mmu_lock, i.e. to ensure that the userspace mapping KVM is using to
resolve the page fault is fresh.  If KVM sees that the counter has
changed, KVM simply resumes the guest without fixing the fault.

What _should_ happen is that the source of the mmu_notifier invalidations
eventually goes away, mmu_invalidate_seq becomes stable, and KVM can once
again fix guest page fault(s).

But for a long-lived VM and/or a VM that the host just doesn't particularly
like, it's possible for a VM to be on the receiving end of 2 billion (with
a B) mmu_notifier invalidations.  When that happens, bit 31 will be set in
mmu_invalidate_seq.  This causes the value to be turned into a 32-bit
negative value when implicitly cast to an "int" by is_page_fault_stale(),
and then sign-extended into a 64-bit unsigned when the signed "int" is
implicitly cast back to an "unsigned long" on the call to
mmu_invalidate_retry_hva().

As a result of the casting and sign-extension, given a sequence counter of
e.g. 0x8002dc25, mmu_invalidate_retry_hva() ends up doing

	if (0x8002dc25 != 0xffffffff8002dc25)

and signals that the page fault is stale and needs to be retried even
though the sequence counter is stable, and KVM effectively hangs any vCPU
that takes a page fault (EPT violation or #NPF when TDP is enabled).

Note, upstream commit ba6e3fe25543 ("KVM: x86/mmu: Grab mmu_invalidate_seq
in kvm_faultin_pfn()") unknowingly fixed the bug in v6.3 when refactoring
how KVM tracks the sequence counter snapshot.

Reported-by: Brian Rak <brak@vultr.com>
Reported-by: Amaan Cheval <amaan.cheval@gmail.com>
Reported-by: Eric Wheeler <kvm@lists.ewheeler.net>
Closes: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameservers.com
Fixes: a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 230108a90cf3..beca03556379 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4212,7 +4212,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
  * root was invalidated by a memslot update or a relevant mmu_notifier fired.
  */
 static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
-				struct kvm_page_fault *fault, int mmu_seq)
+				struct kvm_page_fault *fault,
+				unsigned long mmu_seq)
 {
 	struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root.hpa);
 

base-commit: 802aacbbffe2512dce9f8f33ad99d01cfec435de
-- 
2.42.0.rc2.253.gd59a3bf2b4-goog


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 6.1] KVM: x86/mmu: Fix an sign-extension bug with mmu_seq that hangs vCPUs
  2023-08-24  1:01 [PATCH 6.1] KVM: x86/mmu: Fix an sign-extension bug with mmu_seq that hangs vCPUs Sean Christopherson
@ 2023-08-24  6:53 ` Greg Kroah-Hartman
  2023-08-24 13:46   ` Sean Christopherson
  0 siblings, 1 reply; 5+ messages in thread
From: Greg Kroah-Hartman @ 2023-08-24  6:53 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: stable, Paolo Bonzini, linux-kernel

On Wed, Aug 23, 2023 at 06:01:04PM -0700, Sean Christopherson wrote:
> Take the vCPU's mmu_seq snapshot as an "unsigned long" instead of an "int"
> when checking to see if a page fault is stale, as the sequence count is
> stored as an "unsigned long" everywhere else in KVM.  This fixes a bug
> where KVM will effectively hang vCPUs due to always thinking page faults
> are stale, which results in KVM refusing to "fix" faults.
> 
> mmu_invalidate_seq (née mmu_notifier_seq) is a sequence counter used when
> KVM is handling page faults to detect if userspace mappings relevant to
> the guest were invalidated between snapshotting the counter and acquiring
> mmu_lock, i.e. to ensure that the userspace mapping KVM is using to
> resolve the page fault is fresh.  If KVM sees that the counter has
> changed, KVM simply resumes the guest without fixing the fault.
> 
> What _should_ happen is that the source of the mmu_notifier invalidations
> eventually goes away, mmu_invalidate_seq becomes stable, and KVM can once
> again fix guest page fault(s).
> 
> But for a long-lived VM and/or a VM that the host just doesn't particularly
> like, it's possible for a VM to be on the receiving end of 2 billion (with
> a B) mmu_notifier invalidations.  When that happens, bit 31 will be set in
> mmu_invalidate_seq.  This causes the value to be turned into a 32-bit
> negative value when implicitly cast to an "int" by is_page_fault_stale(),
> and then sign-extended into a 64-bit unsigned when the signed "int" is
> implicitly cast back to an "unsigned long" on the call to
> mmu_invalidate_retry_hva().
> 
> As a result of the casting and sign-extension, given a sequence counter of
> e.g. 0x8002dc25, mmu_invalidate_retry_hva() ends up doing
> 
> 	if (0x8002dc25 != 0xffffffff8002dc25)
> 
> and signals that the page fault is stale and needs to be retried even
> though the sequence counter is stable, and KVM effectively hangs any vCPU
> that takes a page fault (EPT violation or #NPF when TDP is enabled).
> 
> Note, upstream commit ba6e3fe25543 ("KVM: x86/mmu: Grab mmu_invalidate_seq
> in kvm_faultin_pfn()") unknowingly fixed the bug in v6.3 when refactoring
> how KVM tracks the sequence counter snapshot.
> 
> Reported-by: Brian Rak <brak@vultr.com>
> Reported-by: Amaan Cheval <amaan.cheval@gmail.com>
> Reported-by: Eric Wheeler <kvm@lists.ewheeler.net>
> Closes: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameservers.com
> Fixes: a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
> Signed-off-by: Sean Christopherson <seanjc@google.com>

What is the git commit id of this change in Linus's tree?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 6.1] KVM: x86/mmu: Fix an sign-extension bug with mmu_seq that hangs vCPUs
  2023-08-24  6:53 ` Greg Kroah-Hartman
@ 2023-08-24 13:46   ` Sean Christopherson
  2023-08-24 14:46     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 5+ messages in thread
From: Sean Christopherson @ 2023-08-24 13:46 UTC (permalink / raw)
  To: Greg Kroah-Hartman; +Cc: stable, Paolo Bonzini, linux-kernel

On Thu, Aug 24, 2023, Greg Kroah-Hartman wrote:
> On Wed, Aug 23, 2023 at 06:01:04PM -0700, Sean Christopherson wrote:
> > Note, upstream commit ba6e3fe25543 ("KVM: x86/mmu: Grab mmu_invalidate_seq
> > in kvm_faultin_pfn()") unknowingly fixed the bug in v6.3 when refactoring
> > how KVM tracks the sequence counter snapshot.
> > 
> > Reported-by: Brian Rak <brak@vultr.com>
> > Reported-by: Amaan Cheval <amaan.cheval@gmail.com>
> > Reported-by: Eric Wheeler <kvm@lists.ewheeler.net>
> > Closes: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameservers.com
> > Fixes: a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> 
> What is the git commit id of this change in Linus's tree?

There is none.  Commit ba6e3fe25543 (landed in v6.3) unknowingly fixed the bug as
part of a completely unrelated refactoring.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 6.1] KVM: x86/mmu: Fix an sign-extension bug with mmu_seq that hangs vCPUs
  2023-08-24 13:46   ` Sean Christopherson
@ 2023-08-24 14:46     ` Greg Kroah-Hartman
  2023-08-26 16:46       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 5+ messages in thread
From: Greg Kroah-Hartman @ 2023-08-24 14:46 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: stable, Paolo Bonzini, linux-kernel

On Thu, Aug 24, 2023 at 06:46:59AM -0700, Sean Christopherson wrote:
> On Thu, Aug 24, 2023, Greg Kroah-Hartman wrote:
> > On Wed, Aug 23, 2023 at 06:01:04PM -0700, Sean Christopherson wrote:
> > > Note, upstream commit ba6e3fe25543 ("KVM: x86/mmu: Grab mmu_invalidate_seq
> > > in kvm_faultin_pfn()") unknowingly fixed the bug in v6.3 when refactoring
> > > how KVM tracks the sequence counter snapshot.
> > > 
> > > Reported-by: Brian Rak <brak@vultr.com>
> > > Reported-by: Amaan Cheval <amaan.cheval@gmail.com>
> > > Reported-by: Eric Wheeler <kvm@lists.ewheeler.net>
> > > Closes: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameservers.com
> > > Fixes: a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > 
> > What is the git commit id of this change in Linus's tree?
> 
> There is none.  Commit ba6e3fe25543 (landed in v6.3) unknowingly fixed the bug as
> part of a completely unrelated refactoring.

Ah, missed that in the text here, thanks!

greg k-h

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 6.1] KVM: x86/mmu: Fix an sign-extension bug with mmu_seq that hangs vCPUs
  2023-08-24 14:46     ` Greg Kroah-Hartman
@ 2023-08-26 16:46       ` Greg Kroah-Hartman
  0 siblings, 0 replies; 5+ messages in thread
From: Greg Kroah-Hartman @ 2023-08-26 16:46 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: stable, Paolo Bonzini, linux-kernel

On Thu, Aug 24, 2023 at 04:46:44PM +0200, Greg Kroah-Hartman wrote:
> On Thu, Aug 24, 2023 at 06:46:59AM -0700, Sean Christopherson wrote:
> > On Thu, Aug 24, 2023, Greg Kroah-Hartman wrote:
> > > On Wed, Aug 23, 2023 at 06:01:04PM -0700, Sean Christopherson wrote:
> > > > Note, upstream commit ba6e3fe25543 ("KVM: x86/mmu: Grab mmu_invalidate_seq
> > > > in kvm_faultin_pfn()") unknowingly fixed the bug in v6.3 when refactoring
> > > > how KVM tracks the sequence counter snapshot.
> > > > 
> > > > Reported-by: Brian Rak <brak@vultr.com>
> > > > Reported-by: Amaan Cheval <amaan.cheval@gmail.com>
> > > > Reported-by: Eric Wheeler <kvm@lists.ewheeler.net>
> > > > Closes: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameservers.com
> > > > Fixes: a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
> > > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > > 
> > > What is the git commit id of this change in Linus's tree?
> > 
> > There is none.  Commit ba6e3fe25543 (landed in v6.3) unknowingly fixed the bug as
> > part of a completely unrelated refactoring.
> 
> Ah, missed that in the text here, thanks!

Now queued up, thanks.

greg k-h

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-08-26 16:47 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-24  1:01 [PATCH 6.1] KVM: x86/mmu: Fix an sign-extension bug with mmu_seq that hangs vCPUs Sean Christopherson
2023-08-24  6:53 ` Greg Kroah-Hartman
2023-08-24 13:46   ` Sean Christopherson
2023-08-24 14:46     ` Greg Kroah-Hartman
2023-08-26 16:46       ` Greg Kroah-Hartman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).