From: Vipin Sharma <vipinsh@google.com>
To: maz@kernel.org, oliver.upton@linux.dev, james.morse@arm.com,
	suzuki.poulose@arm.com, yuzenghui@huawei.com,
	catalin.marinas@arm.com, will@kernel.org, chenhuacai@kernel.org,
	aleksandar.qemu.devel@gmail.com, tsbogend@alpha.franken.de,
	anup@brainfault.org, atishp@atishpatra.org,
	paul.walmsley@sifive.com, palmer@dabbelt.com,
	aou@eecs.berkeley.edu, seanjc@google.com, pbonzini@redhat.com,
	dmatlack@google.com, ricarkol@google.com
Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
	linux-mips@vger.kernel.org, kvm-riscv@lists.infradead.org,
	linux-riscv@lists.infradead.org, linux-kselftest@vger.kernel.org,
	kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Vipin Sharma <vipinsh@google.com>
Subject: [PATCH v2 16/16] KVM: arm64: Split huge pages during clear-dirty-log under MMU read lock
Date: Fri,  2 Jun 2023 09:09:14 -0700	[thread overview]
Message-ID: <20230602160914.4011728-17-vipinsh@google.com> (raw)
In-Reply-To: <20230602160914.4011728-1-vipinsh@google.com>

Split huge pages under the MMU read lock instead of the write lock when
clearing the dirty log.

Running the huge page split under the read lock avoids blocking vCPU
execution and allows the whole clear-dirty-log operation to run in
parallel with vCPUs.

Note that splitting huge pages involves two walkers. The first walker
invokes the stage2_split_walker() callback on each huge page. That
callback in turn runs a second walker, which creates an unlinked page
table. This commit makes the first walker a shared page walker, which
means -EAGAIN is now retried. Before this patch, -EAGAIN would have been
ignored and the walker would have moved on to the next huge page; in
practice this could not happen, because the first walker held the MMU
write lock. The inner walker is unchanged, as it operates on an unlinked
page table that no other thread can access.
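
A minimal sketch of the resulting retry semantics (illustrative only;
the actual retry is handled inside the shared-walk infrastructure
rather than at the call site, and the signature is simplified):

  /*
   * With KVM_PGTABLE_WALK_SHARED, a split walk that loses a race with
   * a vCPU fault returns -EAGAIN and is retried, instead of silently
   * moving on to the next huge page as a write-locked walk could.
   */
  do {
          ret = kvm_pgtable_stage2_split(pgt, addr, size, cache);
  } while (ret == -EAGAIN);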

Correctness was tested via dirty_log_test.

The performance improvement was measured via dirty_log_perf_test.

Setup:
------
Host: ARM Ampere Altra (64 CPUs, 256 GB memory, single NUMA node)

Test VM: 48 vCPUs, 192 GB total memory.

Ran dirty_log_perf_test for 4000 iterations:
 ./dirty_log_perf_test -k 192G -v 48 -b 4G -m 2 -i 4000 -s anonymous_hugetlb_2mb -j

Observation:
------------

+==================+=============================+=======================+
| Clear chunk size | Clear-dirty-log time change | vCPU perf improvement |
+==================+=============================+=======================+
| 192GB            | +56% (slower)               | +152%                 |
+------------------+-----------------------------+-----------------------+
| 1GB              | -81% (faster)               | +72%                  |
+------------------+-----------------------------+-----------------------+

With larger chunks, clear-dirty-log time increases due to the many
cmpxchg() operations, but vCPUs are able to execute in parallel with the
clearing, giving better overall guest performance.

With a small chunk size, clearing the dirty log under the read lock is
very fast, since it no longer waits for the MMU write lock, and vCPUs
are still able to run in parallel.
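
Each "clear chunk" above is the amount of guest memory covered by one
KVM_CLEAR_DIRTY_LOG call. A userspace sketch of chunked clearing
(variable names are hypothetical; KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
must be enabled on the VM):

  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* Clear the dirty log for one chunk of a memslot. */
  struct kvm_clear_dirty_log clear = {
          .slot = slot_id,
          .first_page = chunk_first_page,  /* must be 64-page aligned */
          .num_pages = pages_per_chunk,
          .dirty_bitmap = chunk_bitmap,    /* bit 0 == first_page */
  };
  ioctl(vm_fd, KVM_CLEAR_DIRTY_LOG, &clear);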

Signed-off-by: Vipin Sharma <vipinsh@google.com>
---
 arch/arm64/kvm/mmu.c | 21 ++++++++++++++-------
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 6dd964e3682c..aa278f5d27a2 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -126,7 +126,10 @@ static int kvm_mmu_split_huge_pages(struct kvm *kvm, phys_addr_t addr,
 	int ret, cache_capacity;
 	u64 next, chunk_size;
 
-	lockdep_assert_held_write(&kvm->mmu_lock);
+	if (flags & KVM_PGTABLE_WALK_SHARED)
+		lockdep_assert_held_read(&kvm->mmu_lock);
+	else
+		lockdep_assert_held_write(&kvm->mmu_lock);
 
 	chunk_size = kvm->arch.mmu.split_page_chunk_size;
 	cache_capacity = kvm_mmu_split_nr_page_tables(chunk_size);
@@ -138,13 +141,19 @@ static int kvm_mmu_split_huge_pages(struct kvm *kvm, phys_addr_t addr,
 
 	do {
 		if (need_split_memcache_topup_or_resched(kvm)) {
-			write_unlock(&kvm->mmu_lock);
+			if (flags & KVM_PGTABLE_WALK_SHARED)
+				read_unlock(&kvm->mmu_lock);
+			else
+				write_unlock(&kvm->mmu_lock);
 			cond_resched();
 			/* Eager page splitting is best-effort. */
 			ret = __kvm_mmu_topup_memory_cache(cache,
 							   cache_capacity,
 							   cache_capacity);
-			write_lock(&kvm->mmu_lock);
+			if (flags & KVM_PGTABLE_WALK_SHARED)
+				read_lock(&kvm->mmu_lock);
+			else
+				write_lock(&kvm->mmu_lock);
 			if (ret)
 				break;
 		}
@@ -1139,9 +1148,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 
 	read_lock(&kvm->mmu_lock);
 	stage2_wp_range(&kvm->arch.mmu, start, end, KVM_PGTABLE_WALK_SHARED);
-	read_unlock(&kvm->mmu_lock);
 
-	write_lock(&kvm->mmu_lock);
 	/*
 	 * Eager-splitting is done when manual-protect is set.  We
 	 * also check for initially-all-set because we can avoid
@@ -1151,8 +1158,8 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 	 * again.
 	 */
 	if (kvm_dirty_log_manual_protect_and_init_set(kvm))
-		kvm_mmu_split_huge_pages(kvm, start, end, 0);
-	write_unlock(&kvm->mmu_lock);
+		kvm_mmu_split_huge_pages(kvm, start, end, KVM_PGTABLE_WALK_SHARED);
+	read_unlock(&kvm->mmu_lock);
 }
 
 static void kvm_send_hwpoison_signal(unsigned long address, short lsb)
-- 
2.41.0.rc0.172.g3f132b7071-goog
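
The flag-conditional locking above appears three times in
kvm_mmu_split_huge_pages() after this patch. One possible follow-up
cleanup (a sketch only, not part of this patch; helper names are
hypothetical):

  /* Take/drop mmu_lock in whichever mode the walker flags imply. */
  static void stage2_split_lock(struct kvm *kvm,
                                enum kvm_pgtable_walk_flags flags)
  {
          if (flags & KVM_PGTABLE_WALK_SHARED)
                  read_lock(&kvm->mmu_lock);
          else
                  write_lock(&kvm->mmu_lock);
  }

  static void stage2_split_unlock(struct kvm *kvm,
                                  enum kvm_pgtable_walk_flags flags)
  {
          if (flags & KVM_PGTABLE_WALK_SHARED)
                  read_unlock(&kvm->mmu_lock);
          else
                  write_unlock(&kvm->mmu_lock);
  }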


Thread overview: 21+ messages
2023-06-02 16:08 [PATCH v2 00/16] Use MMU read lock for clear-dirty-log Vipin Sharma
2023-06-02 16:08 ` [PATCH v2 01/16] KVM: selftests: Clear dirty logs in user defined chunks sizes in dirty_log_perf_test Vipin Sharma
2023-06-02 16:09 ` [PATCH v2 02/16] KVM: selftests: Add optional delay between consecutive clear-dirty-log calls Vipin Sharma
2023-06-02 16:09 ` [PATCH v2 03/16] KVM: selftests: Pass the count of read and write accesses from guest to host Vipin Sharma
2023-06-02 16:09 ` [PATCH v2 04/16] KVM: selftests: Print read-write progress by vCPUs in dirty_log_perf_test Vipin Sharma
2023-06-02 16:09 ` [PATCH v2 05/16] KVM: selftests: Allow independent execution of " Vipin Sharma
2023-06-02 16:09 ` [PATCH v2 06/16] KVM: arm64: Correct the kvm_pgtable_stage2_flush() documentation Vipin Sharma
2023-06-02 16:09 ` [PATCH v2 07/16] KVM: mmu: Move mmu lock/unlock to arch code for clear dirty log Vipin Sharma
2023-06-02 16:09 ` [PATCH v2 08/16] KVM: arm64: Pass page table walker flags to stage2_apply_range_*() Vipin Sharma
2023-06-02 16:09 ` [PATCH v2 09/16] KVM: arm64: Document the page table walker actions based on the callback's return value Vipin Sharma
2023-06-05 14:35   ` Zhi Wang
2023-06-06 17:30     ` Vipin Sharma
2023-06-07 12:37       ` Zhi Wang
2023-06-08 20:17         ` Vipin Sharma
2023-06-02 16:09 ` [PATCH v2 10/16] KVM: arm64: Return -ENOENT if PTE is not valid in stage2_attr_walker Vipin Sharma
2023-06-02 16:09 ` [PATCH v2 11/16] KVM: arm64: Use KVM_PGTABLE_WALK_SHARED flag instead of KVM_PGTABLE_WALK_HANDLE_FAULT Vipin Sharma
2023-06-02 16:09 ` [PATCH v2 12/16] KVM: arm64: Retry shared page table walks outside of fault handler Vipin Sharma
2023-06-02 16:09 ` [PATCH v2 13/16] KVM: arm64: Run clear-dirty-log under MMU read lock Vipin Sharma
2023-06-02 16:09 ` [PATCH v2 14/16] KVM: arm64: Pass page walker flags from callers of stage 2 split walker Vipin Sharma
2023-06-02 16:09 ` [PATCH v2 15/16] KVM: arm64: Provide option to pass page walker flag for huge page splits Vipin Sharma
2023-06-02 16:09 ` Vipin Sharma [this message]
