From: Ricardo Koller <ricarkol@google.com>
To: Marc Zyngier <maz@kernel.org>
Cc: pbonzini@redhat.com, oupton@google.com, yuzenghui@huawei.com,
	dmatlack@google.com, kvm@vger.kernel.org, kvmarm@lists.linux.dev,
	qperret@google.com, catalin.marinas@arm.com,
	andrew.jones@linux.dev, seanjc@google.com,
	alexandru.elisei@arm.com, suzuki.poulose@arm.com,
	eric.auger@redhat.com, gshan@redhat.com, reijiw@google.com,
	rananta@google.com, bgardon@google.com, ricarkol@gmail.com,
	Shaoqin Huang <shahuang@redhat.com>
Subject: Re: [PATCH v6 09/12] KVM: arm64: Split huge pages when dirty logging is enabled
Date: Mon, 10 Apr 2023 11:32:03 -0700	[thread overview]
Message-ID: <ZDRWI74ERb1Cpgbe@google.com> (raw)
In-Reply-To: <875yb65dvq.wl-maz@kernel.org>

On Sun, Mar 12, 2023 at 12:54:17PM +0000, Marc Zyngier wrote:
> On Tue, 07 Mar 2023 03:45:52 +0000,
> Ricardo Koller <ricarkol@google.com> wrote:
> > 
> > Split huge pages eagerly when enabling dirty logging. The goal is to
> > avoid doing it while faulting on write-protected pages, which
> > negatively impacts guest performance.
> > 
> > A memslot marked for dirty logging is split in 1GB pieces at a time.
> > This is in order to release the mmu_lock and give other kernel threads
> > the opportunity to run, and also in order to allocate enough pages to
> > split a 1GB range worth of huge pages (or a single 1GB huge page).
> > Note that these page allocations can fail, so eager page splitting is
> > best-effort.  This is not a correctness issue though, as huge pages
> > can still be split on write-faults.
> > 
> > The benefits of eager page splitting are the same as in x86, added
> > with commit a3fe5dbda0a4 ("KVM: x86/mmu: Split huge pages mapped by
> > the TDP MMU when dirty logging is enabled"). For example, when running
> > dirty_log_perf_test with 64 virtual CPUs (Ampere Altra), 1GB per vCPU,
> > 50% reads, and 2MB HugeTLB memory, the time it takes vCPUs to access
> > all of their memory after dirty logging is enabled decreased by 44%
> > from 2.58s to 1.42s.
> > 
> > Signed-off-by: Ricardo Koller <ricarkol@google.com>
> > Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
> > ---
> >  arch/arm64/kvm/mmu.c | 118 ++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 116 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 898985b09321..b1b8da5f8b6c 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -31,14 +31,21 @@ static phys_addr_t __ro_after_init hyp_idmap_vector;
> >  
> >  static unsigned long __ro_after_init io_map_base;
> >  
> > -static phys_addr_t stage2_range_addr_end(phys_addr_t addr, phys_addr_t end)
> > +static phys_addr_t __stage2_range_addr_end(phys_addr_t addr, phys_addr_t end,
> > +					   phys_addr_t size)
> >  {
> > -	phys_addr_t size = kvm_granule_size(KVM_PGTABLE_MIN_BLOCK_LEVEL);
> >  	phys_addr_t boundary = ALIGN_DOWN(addr + size, size);
> >  
> >  	return (boundary - 1 < end - 1) ? boundary : end;
> >  }
> >  
> > +static phys_addr_t stage2_range_addr_end(phys_addr_t addr, phys_addr_t end)
> > +{
> > +	phys_addr_t size = kvm_granule_size(KVM_PGTABLE_MIN_BLOCK_LEVEL);
> > +
> > +	return __stage2_range_addr_end(addr, end, size);
> > +}
> > +
> >  /*
> >   * Release kvm_mmu_lock periodically if the memory region is large. Otherwise,
> >   * we may see kernel panics with CONFIG_DETECT_HUNG_TASK,
> > @@ -75,6 +82,77 @@ static int stage2_apply_range(struct kvm_s2_mmu *mmu, phys_addr_t addr,
> >  #define stage2_apply_range_resched(mmu, addr, end, fn)			\
> >  	stage2_apply_range(mmu, addr, end, fn, true)
> >  
> > +static bool need_topup_split_page_cache_or_resched(struct kvm *kvm, uint64_t min)
> 
> Please don't use the words "page cache", it triggers a painful
> Pavlovian reflex. Something like "need_split_memcache_topup_or_resched"
> makes me feel less anxious.
>

fixed

> > +{
> > +	struct kvm_mmu_memory_cache *cache;
> > +
> > +	if (need_resched() || rwlock_needbreak(&kvm->mmu_lock))
> > +		return true;
> > +
> > +	cache = &kvm->arch.mmu.split_page_cache;
> > +	return kvm_mmu_memory_cache_nr_free_objects(cache) < min;
> > +}
> > +
> > +/*
> > + * Get the maximum number of page-tables needed to split a range of
> 
> nit: page-table pages.
>

fixed

> > + * blocks into PAGE_SIZE PTEs. It assumes the range is already mapped
> > + * at the PMD level, or at the PUD level if allowed.
> > + */
> > +static int kvm_mmu_split_nr_page_tables(u64 range)
> > +{
> > +	int n = 0;
> > +
> > +	if (KVM_PGTABLE_MIN_BLOCK_LEVEL < 2)
> > +		n += DIV_ROUND_UP_ULL(range, PUD_SIZE);
> > +	n += DIV_ROUND_UP_ULL(range, PMD_SIZE);
> > +	return n;
> > +}
> > +
> > +static int kvm_mmu_split_huge_pages(struct kvm *kvm, phys_addr_t addr,
> > +				    phys_addr_t end)
> > +{
> > +	struct kvm_mmu_memory_cache *cache;
> > +	struct kvm_pgtable *pgt;
> > +	int ret;
> > +	u64 next;
> > +	u64 chunk_size = kvm->arch.mmu.split_page_chunk_size;
> > +	int cache_capacity = kvm_mmu_split_nr_page_tables(chunk_size);
> > +
> > +	if (chunk_size == 0)
> > +		return 0;
> > +
> > +	lockdep_assert_held_write(&kvm->mmu_lock);
> 
> Please check for the lock being held early, even in the 0-sized chunk
> condition.
> 

fixed
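
The lockdep assert is now done first, before the 0-sized chunk check,
roughly like this (sketch of the reworked prologue; the exact v7 code
may differ slightly):

	lockdep_assert_held_write(&kvm->mmu_lock);

	/* Eager splitting disabled by userspace: nothing to do. */
	if (chunk_size == 0)
		return 0;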

> > +
> > +	cache = &kvm->arch.mmu.split_page_cache;
> > +
> > +	do {
> > +		if (need_topup_split_page_cache_or_resched(kvm,
> > +							   cache_capacity)) {
> 
> Since cache_capacity is stored in the kvm struct, why not just pass
> it to the helper function and let it deal with it?
>

removed the cache_capacity arg.
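
With the earlier rename applied as well, the helper now only takes the
kvm struct and derives the capacity itself, something along these lines
(sketch, using the name you suggested):

	static bool need_split_memcache_topup_or_resched(struct kvm *kvm)
	{
		struct kvm_mmu_memory_cache *cache;
		u64 chunk_size, min;

		if (need_resched() || rwlock_needbreak(&kvm->mmu_lock))
			return true;

		/* Capacity is derived from the userspace-configured chunk size. */
		chunk_size = kvm->arch.mmu.split_page_chunk_size;
		min = kvm_mmu_split_nr_page_tables(chunk_size);
		cache = &kvm->arch.mmu.split_page_cache;
		return kvm_mmu_memory_cache_nr_free_objects(cache) < min;
	}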

> > +			write_unlock(&kvm->mmu_lock);
> > +			cond_resched();
> > +			/* Eager page splitting is best-effort. */
> > +			ret = __kvm_mmu_topup_memory_cache(cache,
> > +							   cache_capacity,
> > +							   cache_capacity);
> > +			write_lock(&kvm->mmu_lock);
> > +			if (ret)
> > +				break;
> > +		}
> > +
> > +		pgt = kvm->arch.mmu.pgt;
> > +		if (!pgt)
> > +			return -EINVAL;
> > +
> > +		next = __stage2_range_addr_end(addr, end, chunk_size);
> > +		ret = kvm_pgtable_stage2_split(pgt, addr, next - addr,
> > +					       cache, cache_capacity);
> > +		if (ret)
> > +			break;
> > +	} while (addr = next, addr != end);
> > +
> > +	return ret;
> > +}
> > +
> >  static bool memslot_is_logging(struct kvm_memory_slot *memslot)
> >  {
> >  	return memslot->dirty_bitmap && !(memslot->flags & KVM_MEM_READONLY);
> > @@ -773,6 +851,7 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
> >  void kvm_uninit_stage2_mmu(struct kvm *kvm)
> >  {
> >  	kvm_free_stage2_pgd(&kvm->arch.mmu);
> > +	kvm_mmu_free_memory_cache(&kvm->arch.mmu.split_page_cache);
> >  }
> >  
> >  static void stage2_unmap_memslot(struct kvm *kvm,
> > @@ -999,6 +1078,31 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> >  	stage2_wp_range(&kvm->arch.mmu, start, end);
> >  }
> >  
> > +/**
> > + * kvm_mmu_split_memory_region() - split the stage 2 blocks into PAGE_SIZE
> > + *				   pages for memory slot
> > + * @kvm:	The KVM pointer
> > + * @slot:	The memory slot to split
> > + *
> > + * Acquires kvm->mmu_lock. Called with kvm->slots_lock mutex acquired,
> > + * serializing operations for VM memory regions.
> > + */
> > +static void kvm_mmu_split_memory_region(struct kvm *kvm, int slot)
> > +{
> > +	struct kvm_memslots *slots = kvm_memslots(kvm);
> > +	struct kvm_memory_slot *memslot = id_to_memslot(slots, slot);
> > +	phys_addr_t start, end;
> > +
> > +	lockdep_assert_held(&kvm->slots_lock);
> 
> You have already accessed the memslots by the time you check for the
> lock. Not great.
> 

fixed
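
The assert is now the first thing in the function, before any memslot
lookup, along these lines (sketch):

	static void kvm_mmu_split_memory_region(struct kvm *kvm, int slot)
	{
		struct kvm_memslots *slots;
		struct kvm_memory_slot *memslot;
		phys_addr_t start, end;

		lockdep_assert_held(&kvm->slots_lock);

		slots = kvm_memslots(kvm);
		memslot = id_to_memslot(slots, slot);
		start = memslot->base_gfn << PAGE_SHIFT;
		end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;

		write_lock(&kvm->mmu_lock);
		kvm_mmu_split_huge_pages(kvm, start, end);
		write_unlock(&kvm->mmu_lock);
	}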

> > +
> > +	start = memslot->base_gfn << PAGE_SHIFT;
> > +	end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
> > +
> > +	write_lock(&kvm->mmu_lock);
> > +	kvm_mmu_split_huge_pages(kvm, start, end);
> > +	write_unlock(&kvm->mmu_lock);
> > +}
> > +
> >  /*
> >   * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected
> >   * dirty pages.
> > @@ -1790,6 +1894,16 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
> >  			return;
> >  
> >  		kvm_mmu_wp_memory_region(kvm, new->id);
> > +		kvm_mmu_split_memory_region(kvm, new->id);
> 
> Would there be an advantage in merging these two operations somehow?
>

I guess we could. The only issue is that it could be useful to
write-protect a memslot without splitting huge pages.

> > +	} else {
> > +		/*
> > +		 * Free any leftovers from the eager page splitting cache. Do
> > +		 * this when deleting, moving, disabling dirty logging, or
> > +		 * creating the memslot (a nop). Doing it for deletes makes
> > +		 * sure we don't leak memory, and there's no need to keep the
> > +		 * cache around for any of the other cases.
> > +		 */
> > +		kvm_mmu_free_memory_cache(&kvm->arch.mmu.split_page_cache);
> >  	}
> >  }
> >  
> 
> Thanks,
> 
> 	M.
> 
> -- 
> Without deviation from the norm, progress is not possible.
