From: Sean Christopherson <seanjc@google.com>
To: "Maciej S. Szmigiero" <mail@maciej.szmigiero.name>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Vitaly Kuznetsov <vkuznets@redhat.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>,
	Igor Mammedov <imammedo@redhat.com>,
	Marc Zyngier <maz@kernel.org>, James Morse <james.morse@arm.com>,
	Julien Thierry <julien.thierry.kdev@gmail.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Huacai Chen <chenhuacai@kernel.org>,
	Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>,
	Paul Mackerras <paulus@ozlabs.org>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Janosch Frank <frankja@linux.ibm.com>,
	David Hildenbrand <david@redhat.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Claudio Imbrenda <imbrenda@linux.ibm.com>,
	Joerg Roedel <joro@8bytes.org>,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 1/8] KVM: x86: Cache total page count to avoid traversing the memslot array
Date: Wed, 19 May 2021 21:00:52 +0000
Message-ID: <YKV8hHDS489g9JBS@google.com>
In-Reply-To: <eb1c881ce814705c83813f02a1a13ced96f1b1d1.1621191551.git.maciej.szmigiero@oracle.com>

On Sun, May 16, 2021, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
> 
> There is no point in recalculating from scratch the total number of pages
> in all memslots each time a memslot is created or deleted.
> 
> Just cache the value and update it accordingly on each such operation so
> the code doesn't need to traverse the whole memslot array each time.
> 
> Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
> ---
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5bd550eaf683..8c7738b75393 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11112,9 +11112,21 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>  				const struct kvm_memory_slot *new,
>  				enum kvm_mr_change change)
>  {
> -	if (!kvm->arch.n_requested_mmu_pages)
> -		kvm_mmu_change_mmu_pages(kvm,
> -				kvm_mmu_calculate_default_mmu_pages(kvm));
> +	if (change == KVM_MR_CREATE)
> +		kvm->arch.n_memslots_pages += new->npages;
> +	else if (change == KVM_MR_DELETE) {
> +		WARN_ON(kvm->arch.n_memslots_pages < old->npages);

Heh, so I think this WARN can be triggered at will by userspace on 32-bit KVM by
causing the running count to wrap.  KVM artificially caps the size of a single
memslot at ((1UL << 31) - 1), but userspace could create multiple gigantic slots
to overflow arch.n_memslots_pages.

I _think_ changing it to a u64 would fix the problem since KVM forbids overlapping
memslots in the GPA space.
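
To make the wrap concrete, here's a quick userspace illustration (not kernel
code, just modeling the accumulator; the per-slot cap is the ((1UL << 31) - 1)
value mentioned above, and the variable names are made up for the example):

	/*
	 * Illustration only: on 32-bit, "unsigned long" is 32 bits, so a
	 * few max-sized slots wrap the running count, while a 64-bit
	 * accumulator keeps the true total.
	 */
	#include <stdint.h>
	#include <stdio.h>

	int main(void)
	{
		const uint64_t slot_npages = (1ULL << 31) - 1;	/* per-slot cap */
		uint32_t count32 = 0;	/* models a 32-bit unsigned long */
		uint64_t count64 = 0;	/* models the proposed u64 */

		for (int i = 0; i < 3; i++) {	/* three "gigantic" memslots */
			count32 += slot_npages;
			count64 += slot_npages;
		}

		printf("32-bit count: %u (wrapped)\n", count32);	/* 2147483645 */
		printf("64-bit count: %llu\n",
		       (unsigned long long)count64);			/* 6442450941 */
		return 0;
	}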

Also, what about moving the check-and-WARN to prepare_memory_region() so that
KVM can error out if the check fails?  Doesn't really matter, but an explicit
error for userspace is preferable to underflowing the number of pages and getting
weird MMU errors/behavior down the line.
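
E.g. something like this in kvm_arch_prepare_memory_region() (sketch only,
completely untested; "old_npages" is a stand-in for however the deleted slot's
page count ends up getting plumbed into the prepare hook):

	/*
	 * Sketch: reject the inconsistent state up front instead of
	 * underflowing the count and only WARNing in the commit path.
	 */
	if (change == KVM_MR_DELETE &&
	    WARN_ON(kvm->arch.n_memslots_pages < old_npages))
		return -EIO;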

> +		kvm->arch.n_memslots_pages -= old->npages;
> +	}
> +
> +	if (!kvm->arch.n_requested_mmu_pages) {

If we're going to bother caching the number of pages, then we should also skip
the update when the number of pages isn't changing, e.g.

	if (change == KVM_MR_CREATE || change == KVM_MR_DELETE) {
		if (change == KVM_MR_CREATE)
			kvm->arch.n_memslots_pages += new->npages;
		else
			kvm->arch.n_memslots_pages -= old->npages;

		if (!kvm->arch.n_requested_mmu_pages) {
			unsigned long nr_mmu_pages;

			nr_mmu_pages = kvm->arch.n_memslots_pages *
				       KVM_PERMILLE_MMU_PAGES / 1000;
			nr_mmu_pages = max(nr_mmu_pages, KVM_MIN_ALLOC_MMU_PAGES);
			kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
		}
	}

> +		unsigned long nr_mmu_pages;
> +
> +		nr_mmu_pages = kvm->arch.n_memslots_pages *
> +			       KVM_PERMILLE_MMU_PAGES / 1000;
> +		nr_mmu_pages = max(nr_mmu_pages, KVM_MIN_ALLOC_MMU_PAGES);
> +		kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
> +	}
>  
>  	/*
>  	 * FIXME: const-ify all uses of struct kvm_memory_slot.
