linux-kernel.vger.kernel.org archive mirror
* [PATCH] KVM: x86/mmu: Fix per-cpu counter corruption on 32-bit builds
@ 2021-08-04 21:46 Sean Christopherson
From: Sean Christopherson @ 2021-08-04 21:46 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
	Joerg Roedel, kvm, linux-kernel, Ben Gardon

Take a signed 'long' instead of an 'unsigned long' for the number of
pages to add to or subtract from the total number of pages used by the
MMU.  This fixes a zero-extension bug on 32-bit kernels that effectively
corrupts the per-cpu counter used by the shrinker.

Per-cpu counters take a signed 64-bit value on both 32-bit and 64-bit
kernels, whereas kvm_mod_used_mmu_pages() takes an unsigned long and thus
an unsigned 32-bit value on 32-bit kernels.  As a result, the value used
to adjust the per-cpu counter is zero-extended (unsigned -> signed), not
sign-extended (signed -> signed), and so KVM's intended -1 gets morphed to
4294967295 and effectively corrupts the counter.
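
The widening described above can be reproduced in userspace.  The sketch
below is illustrative only and is not kernel code: uint32_t and int32_t
stand in for 'unsigned long' and 'long' on a 32-bit kernel, a plain
int64_t stands in for the per-cpu counter, and both helper names are
hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical model of the old signature: on a 32-bit kernel an
 * 'unsigned long' delta is 32 bits wide, so widening it to the signed
 * 64-bit counter type zero-extends it.
 */
static int64_t add_pages_zero_extended(int64_t counter, uint32_t nr)
{
	/* nr is converted to int64_t by zero-extension before the add. */
	return counter + nr;
}

/*
 * Hypothetical model of the fixed signature: a signed 32-bit delta is
 * sign-extended, so a negative value stays negative after widening.
 */
static int64_t add_pages_sign_extended(int64_t counter, int32_t nr)
{
	/* nr is converted to int64_t by sign-extension before the add. */
	return counter + nr;
}
```

Under these assumptions, subtracting one page from a counter of 10 via
add_pages_zero_extended(10, (uint32_t)-1) adds 4294967295 rather than -1,
whereas add_pages_sign_extended(10, -1) yields the intended 9.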

This was found by a staggering amount of sheer dumb luck when running
kvm-unit-tests on a 32-bit KVM build.  The shrinker just happened to kick
in while running tests and do_shrink_slab() logged an error about trying
to free a negative number of objects.  The truly lucky part is that the
kernel just happened to be a slightly stale build, as the shrinker no
longer yells about negative objects as of commit 18bb473e5031 ("mm:
vmscan: shrink deferred objects proportional to priority").

 vmscan: shrink_slab: mmu_shrink_scan+0x0/0x210 [kvm] negative objects to delete nr=-858993460

Fixes: bc8a3d8925a8 ("kvm: mmu: Fix overflow on kvm mmu page limit calculation")
Cc: stable@vger.kernel.org
Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b4b65c21b2ca..082a0ba79edd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1700,7 +1700,7 @@ static int is_empty_shadow_page(u64 *spt)
  * aggregate version in order to make the slab shrinker
  * faster
  */
-static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, unsigned long nr)
+static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, long nr)
 {
 	kvm->arch.n_used_mmu_pages += nr;
 	percpu_counter_add(&kvm_total_used_mmu_pages, nr);
-- 
2.32.0.554.ge1b32706d8-goog



* Re: [PATCH] KVM: x86/mmu: Fix per-cpu counter corruption on 32-bit builds
From: Jim Mattson @ 2021-08-04 22:19 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Paolo Bonzini, Vitaly Kuznetsov, Wanpeng Li, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon

Ouch!

Reviewed-by: Jim Mattson <jmattson@google.com>


* Re: [PATCH] KVM: x86/mmu: Fix per-cpu counter corruption on 32-bit builds
From: Paolo Bonzini @ 2021-08-05  7:33 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon

Queued, thanks.

Paolo



* Re: [PATCH] KVM: x86/mmu: Fix per-cpu counter corruption on 32-bit builds
From: Maxim Levitsky @ 2021-08-05 11:26 UTC (permalink / raw)
  To: Sean Christopherson, Paolo Bonzini
  Cc: Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm,
	linux-kernel, Ben Gardon

I am almost sure that I have seen this bug as well (I do test 32-bit KVM hosts,
even with nested 32-bit guests, once in a while), but I didn't dare to investigate
it because 32-bit KVM hosts are very rare these days.

Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>

Best regards,
	Maxim Levitsky


