* [PATCH] KVM: replace large kvmalloc allocation with vmalloc
@ 2021-10-15 16:55 Paolo Bonzini
  2021-10-15 17:28 ` Sean Christopherson
  0 siblings, 1 reply; 3+ messages in thread
From: Paolo Bonzini @ 2021-10-15 16:55 UTC (permalink / raw)
  To: linux-kernel, kvm; +Cc: seanjc, syzbot+e0de2333cbf95ea473e8

KVM's paging data structures (especially the rmaps) can be made as
large as possible by userspace simply by creating large-enough memslots.
Since commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
these huge allocations trigger a warning, on the assumption that they
could be the result of an integer overflow or underflow.

There are configurations in the wild creating multi-TiB memslots, and
in fact it is more likely than not that these allocations end up not
using kmalloc-ed memory anyway.  For example, the dirty bitmap for a 64 GiB
memslot would cause a 4 MiB allocation, since each 32 KiB of guest
address space corresponds to 2 bytes in the dirty bitmap.  Therefore,
just use vmalloc directly.  Introduce a new helper, vcalloc(), that
checks for multiplication overflow as extra paranoia, even though
overflow should not be an issue here even on 32-bit systems.
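
As a quick sanity check of the numbers above (purely illustrative and
not part of the patch), the 4 MiB figure for a 64 GiB slot can be
reproduced with a few lines of userspace C:

	#include <stdio.h>

	int main(void)
	{
		unsigned long long slot  = 64ULL << 30;  /* 64 GiB memslot */
		unsigned long long pages = slot >> 12;   /* 4 KiB guest pages */
		/* one bit per page, and two bitmaps allocated back to back */
		unsigned long long bytes = 2 * (pages / 8);

		printf("%llu MiB\n", bytes >> 20);       /* prints 4 */
		return 0;
	}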

Reported-by: syzbot+e0de2333cbf95ea473e8@syzkaller.appspotmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 arch/x86/kvm/mmu/page_track.c |  3 +--
 arch/x86/kvm/x86.c            |  4 ++--
 include/linux/vmalloc.h       | 10 ++++++++++
 virt/kvm/kvm_main.c           |  4 ++--
 4 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
index 21427e84a82e..0d9842472288 100644
--- a/arch/x86/kvm/mmu/page_track.c
+++ b/arch/x86/kvm/mmu/page_track.c
@@ -36,8 +36,7 @@ int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
 
 	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
 		slot->arch.gfn_track[i] =
-			kvcalloc(npages, sizeof(*slot->arch.gfn_track[i]),
-				 GFP_KERNEL_ACCOUNT);
+			vcalloc(npages, sizeof(*slot->arch.gfn_track[i]));
 		if (!slot->arch.gfn_track[i])
 			goto track_free;
 	}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index aabd3a2ec1bc..07f5760ea30c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11394,7 +11394,7 @@ static int memslot_rmap_alloc(struct kvm_memory_slot *slot,
 
 		WARN_ON(slot->arch.rmap[i]);
 
-		slot->arch.rmap[i] = kvcalloc(lpages, sz, GFP_KERNEL_ACCOUNT);
+		slot->arch.rmap[i] = vcalloc(lpages, sz);
 		if (!slot->arch.rmap[i]) {
 			memslot_rmap_free(slot);
 			return -ENOMEM;
@@ -11475,7 +11475,7 @@ static int kvm_alloc_memslot_metadata(struct kvm *kvm,
 
 		lpages = __kvm_mmu_slot_lpages(slot, npages, level);
 
-		linfo = kvcalloc(lpages, sizeof(*linfo), GFP_KERNEL_ACCOUNT);
+		linfo = vcalloc(lpages, sizeof(*linfo));
 		if (!linfo)
 			goto out_free;
 
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 671d402c3778..6d51c83c2b0e 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -167,6 +167,16 @@ extern int remap_vmalloc_range_partial(struct vm_area_struct *vma,
 extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
 							unsigned long pgoff);
 
+static inline void *vcalloc(size_t n, size_t size)
+{
+	size_t bytes;
+
+	if (unlikely(check_mul_overflow(n, size, &bytes)))
+		return NULL;
+
+	return vzalloc(bytes);
+}
+
 /*
  * Architectures can set this mask to a combination of PGTBL_P?D_MODIFIED values
  * and let generic vmalloc and ioremap code know when arch_sync_kernel_mappings()
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7851f3a1b5f7..0295d89f5445 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1242,9 +1242,9 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
  */
 static int kvm_alloc_dirty_bitmap(struct kvm_memory_slot *memslot)
 {
-	unsigned long dirty_bytes = 2 * kvm_dirty_bitmap_bytes(memslot);
+	unsigned long dirty_bytes = kvm_dirty_bitmap_bytes(memslot);
 
-	memslot->dirty_bitmap = kvzalloc(dirty_bytes, GFP_KERNEL_ACCOUNT);
+	memslot->dirty_bitmap = vcalloc(2, dirty_bytes);
 	if (!memslot->dirty_bitmap)
 		return -ENOMEM;
 
-- 
2.27.0



* Re: [PATCH] KVM: replace large kvmalloc allocation with vmalloc
  2021-10-15 16:55 [PATCH] KVM: replace large kvmalloc allocation with vmalloc Paolo Bonzini
@ 2021-10-15 17:28 ` Sean Christopherson
  2021-10-15 17:48   ` Paolo Bonzini
  0 siblings, 1 reply; 3+ messages in thread
From: Sean Christopherson @ 2021-10-15 17:28 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: linux-kernel, kvm, syzbot+e0de2333cbf95ea473e8

On Fri, Oct 15, 2021, Paolo Bonzini wrote:
> diff --git a/arch/x86/kvm/mmu/page_track.c b/arch/x86/kvm/mmu/page_track.c
> index 21427e84a82e..0d9842472288 100644
> --- a/arch/x86/kvm/mmu/page_track.c
> +++ b/arch/x86/kvm/mmu/page_track.c
> @@ -36,8 +36,7 @@ int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
>  
>  	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
>  		slot->arch.gfn_track[i] =
> -			kvcalloc(npages, sizeof(*slot->arch.gfn_track[i]),
> -				 GFP_KERNEL_ACCOUNT);
> +			vcalloc(npages, sizeof(*slot->arch.gfn_track[i]));

This loses the memcg accounting, which is somewhat important for the theoretical
4MiB allocations :-)

Maybe split out the introduction of vcalloc() to a separate patch (or two) and
introduce additional helpers to allow passing in gfp_t to e.g. __vzalloc()?
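
For concreteness, a rough sketch of the kind of gfp-aware variant being
suggested (the name and placement are hypothetical; nothing like this
exists in the patch as posted):

	/*
	 * Keep vcalloc()'s overflow check but let callers pass a gfp mask,
	 * so GFP_KERNEL_ACCOUNT (and thus memcg accounting) is not lost.
	 * Would sit next to vzalloc() in include/linux/vmalloc.h and needs
	 * <linux/overflow.h> for check_mul_overflow().
	 */
	static inline void *__vcalloc(size_t n, size_t size, gfp_t flags)
	{
		size_t bytes;

		if (unlikely(check_mul_overflow(n, size, &bytes)))
			return NULL;

		return __vmalloc(bytes, flags | __GFP_ZERO);
	}

KVM could then call __vcalloc(npages, size, GFP_KERNEL_ACCOUNT) and keep
the accounting.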

>  		if (!slot->arch.gfn_track[i])
>  			goto track_free;
>  	}
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index aabd3a2ec1bc..07f5760ea30c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11394,7 +11394,7 @@ static int memslot_rmap_alloc(struct kvm_memory_slot *slot,
>  
>  		WARN_ON(slot->arch.rmap[i]);
>  
> -		slot->arch.rmap[i] = kvcalloc(lpages, sz, GFP_KERNEL_ACCOUNT);
> +		slot->arch.rmap[i] = vcalloc(lpages, sz);
>  		if (!slot->arch.rmap[i]) {
>  			memslot_rmap_free(slot);
>  			return -ENOMEM;
> @@ -11475,7 +11475,7 @@ static int kvm_alloc_memslot_metadata(struct kvm *kvm,
>  
>  		lpages = __kvm_mmu_slot_lpages(slot, npages, level);
>  
> -		linfo = kvcalloc(lpages, sizeof(*linfo), GFP_KERNEL_ACCOUNT);
> +		linfo = vcalloc(lpages, sizeof(*linfo));
>  		if (!linfo)
>  			goto out_free;

All of the associated free paths should be converted to vfree().
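
Concretely (an illustrative fragment, not a complete patch), the teardown
paths that currently kvfree() these arrays would become e.g.:

	/* kvfree() still works on vmalloc'ed memory, but once the array is
	 * always allocated with vcalloc(), vfree() documents the intent. */
	vfree(slot->arch.rmap[i]);
	slot->arch.rmap[i] = NULL;

and likewise for the gfn_track and linfo allocations.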


* Re: [PATCH] KVM: replace large kvmalloc allocation with vmalloc
  2021-10-15 17:28 ` Sean Christopherson
@ 2021-10-15 17:48   ` Paolo Bonzini
  0 siblings, 0 replies; 3+ messages in thread
From: Paolo Bonzini @ 2021-10-15 17:48 UTC (permalink / raw)
  To: Sean Christopherson; +Cc: linux-kernel, kvm, syzbot+e0de2333cbf95ea473e8

On 15/10/21 19:28, Sean Christopherson wrote:
>>   	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
>>   		slot->arch.gfn_track[i] =
>> -			kvcalloc(npages, sizeof(*slot->arch.gfn_track[i]),
>> -				 GFP_KERNEL_ACCOUNT);
>> +			vcalloc(npages, sizeof(*slot->arch.gfn_track[i]));
> This loses the memcg accounting, which is somewhat important for the theoretical
> 4MiB allocations :-)

True, and in fact 4 MiB is not so theoretical.

> Maybe split out the introduction of vcalloc() to a separate patch (or two) and
> introduce additional helpers to allow passing in gfp_t to e.g. __vzalloc()?

Yes, this is what actually slowed me down this week.  This is the bare
minimum that I can send to Linus right now to avoid the WARN.

I have patches to clean all of this up, but they will have to go through
Andrew Morton; he will decide whether to throw them into 5.15 or go
through stable, but either way 5.16.1 or .2 should have the accounting
back at the latest.

Paolo


