From: Yanan Wang <wangyanan55@huawei.com>
To: Marc Zyngier <maz@kernel.org>, Will Deacon <will@kernel.org>,
"Catalin Marinas" <catalin.marinas@arm.com>,
James Morse <james.morse@arm.com>,
"Julien Thierry" <julien.thierry.kdev@gmail.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>,
Gavin Shan <gshan@redhat.com>,
Quentin Perret <qperret@google.com>,
<kvmarm@lists.cs.columbia.edu>,
<linux-arm-kernel@lists.infradead.org>, <kvm@vger.kernel.org>,
<linux-kernel@vger.kernel.org>
Cc: <wanghaibin.wang@huawei.com>, <zhukeqian1@huawei.com>,
<yuzenghui@huawei.com>, Yanan Wang <wangyanan55@huawei.com>
Subject: [RFC PATCH 0/4] KVM: arm64: Improve efficiency of stage2 page table
Date: Mon, 8 Feb 2021 19:22:46 +0800
Message-ID: <20210208112250.163568-1-wangyanan55@huawei.com>
Hi,

This series improves the efficiency of the stage2 page table code. Test
results are included below to show the performance changes; they were
collected with a kvm selftest [1] that I have posted:
[1] https://lore.kernel.org/lkml/20210208090841.333724-1-wangyanan55@huawei.com/
About patch 1:
We currently clean the dcache uniformly in user_mem_abort(), before
calling the fault handlers, whenever we take a translation fault and the
pfn is cacheable. But if there are concurrent translation faults on the
same page or block, only the first dcache clean is necessary; the rest
are redundant. By moving the dcache clean into the map handler, we can
easily identify the conditions under which CMOs are really needed and
avoid the unnecessary ones. Since performing CMOs is time consuming,
especially when flushing a block range, this reduces the load on KVM
considerably and improves the efficiency of creating mappings.
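To illustrate the idea (this is a minimal, self-contained sketch, not the
actual KVM code; the helper name and the cmpxchg-based race detection are
assumptions modeled on the description above): only the vCPU whose
compare-and-exchange actually installs the new leaf entry performs the
CMO, while concurrent faulters on the same page or block lose the race
and skip it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_VALID ((uint64_t)1 << 0)

/* Hypothetical helper modeling patch 1: the dcache clean (modeled here
 * by bumping cmo_count) is done only by the vCPU that installs the new
 * leaf entry. Concurrent faults on the same page or block lose the
 * compare-and-exchange and skip the redundant CMO. */
static bool install_leaf_and_maybe_clean(_Atomic uint64_t *ptep,
					 uint64_t expected_old,
					 uint64_t new, int *cmo_count)
{
	if (!atomic_compare_exchange_strong(ptep, &expected_old, new))
		return false;	/* entry already mapped by a racer: no CMO */
	(*cmo_count)++;		/* stand-in for the real dcache clean */
	return true;
}
```

With two simulated faults racing on the same entry, exactly one CMO is
performed instead of two.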
Test results:
(1) 20 vCPUs concurrently access 20G of RAM (all 1G hugepages):
KVM creating block mappings: 52.83s -> 3.70s
KVM recovering block mappings (after dirty-logging): 52.0s -> 2.87s
(2) 40 vCPUs concurrently access 20G of RAM (all 1G hugepages):
KVM creating block mappings: 104.56s -> 3.70s
KVM recovering block mappings (after dirty-logging): 103.93s -> 2.96s
About patches 2 and 3:
When KVM needs to coalesce normal page mappings into a block mapping, we
currently invalidate the old table entry, invalidate the TLB, unmap all
the page mappings, and only then install the block entry. Unmapping the
numerous page mappings takes a long time, so the table entry is left
invalid for a long time before the block entry is installed, which
causes many spurious translation faults.
So let's install the block entry first, to ensure uninterrupted memory
access for the other vCPUs, and unmap the page mappings afterwards. This
removes most of the window during which the table entry is invalid and
avoids most of the unnecessary translation faults.
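The two orderings can be contrasted with a toy model (again not KVM code;
the states and step counting are invented to visualize the window during
which the table entry is invalid):

```c
#include <assert.h>

enum entry_state { ENTRY_INVALID, ENTRY_TABLE, ENTRY_BLOCK };

/* Old order: break-before-make (invalidate entry + TLBI), then unmap
 * every page mapping, and only then install the block entry. The entry
 * stays invalid for the whole unmap loop, so concurrent accesses to
 * the range all take spurious translation faults. */
static int coalesce_old_order(enum entry_state *entry, int npages)
{
	int invalid_steps = 0;

	*entry = ENTRY_INVALID;		/* invalidate entry + TLBI */
	for (int i = 0; i < npages; i++)
		invalid_steps++;	/* unmap one page mapping */
	*entry = ENTRY_BLOCK;
	return invalid_steps;
}

/* New order (patches 2 and 3): install the block entry immediately
 * after the invalidate + TLBI, then unmap the stale page mappings
 * while other vCPUs can already translate through the block entry. */
static int coalesce_new_order(enum entry_state *entry, int npages)
{
	int invalid_steps = 0;

	*entry = ENTRY_INVALID;		/* invalidate entry + TLBI */
	*entry = ENTRY_BLOCK;		/* translation works again */
	for (int i = 0; i < npages; i++)
		(void)i;		/* unmap with a valid entry */
	return invalid_steps;
}
```

For a 2M block of 4K pages (512 entries), the old order leaves the entry
invalid for 512 unmap steps; the new order leaves it invalid for none.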
Test results based on patch 1:
(1) 20 vCPUs concurrently access 20G of RAM (all 1G hugepages):
KVM recovering block mappings (after dirty-logging): 2.87s -> 0.30s
(2) 40 vCPUs concurrently access 20G of RAM (all 1G hugepages):
KVM recovering block mappings (after dirty-logging): 2.96s -> 0.35s
So, combined with patch 1, this series makes a big difference to how
fast KVM creates mappings and recovers block mappings, with little code
change.
About patch 4:
A new method of distinguishing the cases of memcache allocations is
introduced. By comparing fault_granule and vma_pagesize, the cases that
require allocations from the memcache can be completely distinguished
from those that do not.
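A sketch of the comparison (hedged: the function name and the macros are
invented here, and the direction of the comparison is inferred from the
series description, so check the patch itself for the authoritative
logic): memcache pages are only needed when the fault handler will have
to create new page-table levels to install the mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_4K	0x1000ULL
#define SZ_2M	0x200000ULL
#define SZ_1G	0x40000000ULL

/* Hypothetical predicate modeling patch 4: allocations from the
 * memcache are required only when the granule of the lookup level
 * where the fault was taken exceeds the mapping size about to be
 * installed, i.e. when new page-table levels will be created.
 * Otherwise (e.g. a fault resolved at the same level) no table
 * pages are consumed and the memcache can be left alone. */
static bool need_memcache_pages(uint64_t fault_granule,
				uint64_t vma_pagesize)
{
	return fault_granule > vma_pagesize;
}
```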
---
Details of test results
platform: HiSilicon Kunpeng920 (FWB not supported)
host kernel: Linux mainline (v5.11-rc6)
(1) performance change of patch 1
cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 20
(20 vcpus, 20G memory, block mappings (granule 1G))
Before patch: KVM_CREATE_MAPPINGS: 52.8338s 52.8327s 52.8336s 52.8255s 52.8303s
After patch: KVM_CREATE_MAPPINGS: 3.7022s 3.7031s 3.7028s 3.7012s 3.7024s
Before patch: KVM_ADJUST_MAPPINGS: 52.0466s 52.0473s 52.0550s 52.0518s 52.0467s
After patch: KVM_ADJUST_MAPPINGS: 2.8787s 2.8781s 2.8785s 2.8742s 2.8759s
cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 40
(40 vcpus, 20G memory, block mappings (granule 1G))
Before patch: KVM_CREATE_MAPPINGS: 104.560s 104.556s 104.554s 104.556s 104.550s
After patch: KVM_CREATE_MAPPINGS: 3.7011s 3.7103s 3.7005s 3.7024s 3.7106s
Before patch: KVM_ADJUST_MAPPINGS: 103.931s 103.936s 103.927s 103.942s 103.927s
After patch: KVM_ADJUST_MAPPINGS: 2.9621s 2.9648s 2.9474s 2.9587s 2.9603s
(2) performance change of patches 2 and 3 (based on patch 1)
cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 1
(1 vcpu, 20G memory, block mappings (granule 1G))
Before patch: KVM_ADJUST_MAPPINGS: 2.8241s 2.8234s 2.8245s 2.8230s 2.8652s
After patch: KVM_ADJUST_MAPPINGS: 0.2444s 0.2442s 0.2423s 0.2441s 0.2429s
cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 20
(20 vcpus, 20G memory, block mappings (granule 1G))
Before patch: KVM_ADJUST_MAPPINGS: 2.8787s 2.8781s 2.8785s 2.8742s 2.8759s
After patch: KVM_ADJUST_MAPPINGS: 0.3008s 0.3004s 0.2974s 0.2917s 0.2900s
cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 40
(40 vcpus, 20G memory, block mappings (granule 1G))
Before patch: KVM_ADJUST_MAPPINGS: 2.9621s 2.9648s 2.9474s 2.9587s 2.9603s
After patch: KVM_ADJUST_MAPPINGS: 0.3541s 0.3694s 0.3656s 0.3693s 0.3687s
---
Yanan Wang (4):
KVM: arm64: Move the clean of dcache to the map handler
KVM: arm64: Add an independent API for coalescing tables
KVM: arm64: Install the block entry before unmapping the page mappings
KVM: arm64: Distinguish cases of memcache allocations completely
arch/arm64/include/asm/kvm_mmu.h | 16 -------
arch/arm64/kvm/hyp/pgtable.c | 82 +++++++++++++++++++++-----------
arch/arm64/kvm/mmu.c | 39 ++++++---------
3 files changed, 69 insertions(+), 68 deletions(-)
--
2.23.0
Thread overview: 26+ messages
2021-02-08 11:22 Yanan Wang [this message]
2021-02-08 11:22 ` [RFC PATCH 1/4] KVM: arm64: Move the clean of dcache to the map handler Yanan Wang
2021-02-24 17:21 ` Alexandru Elisei
2021-02-24 17:39 ` Marc Zyngier
2021-02-25 16:45 ` Alexandru Elisei
[not found] ` <871rd41ngf.wl-maz@kernel.org>
2021-02-25 17:39 ` Alexandru Elisei
2021-02-25 18:30 ` Marc Zyngier
2021-02-26 15:51 ` wangyanan (Y)
2021-02-26 15:58 ` wangyanan (Y)
2021-02-08 11:22 ` [RFC PATCH 2/4] KVM: arm64: Add an independent API for coalescing tables Yanan Wang
2021-02-08 11:22 ` [RFC PATCH 3/4] KVM: arm64: Install the block entry before unmapping the page mappings Yanan Wang
2021-02-28 11:11 ` wangyanan (Y)
2021-03-02 17:13 ` Alexandru Elisei
2021-03-03 11:04 ` wangyanan (Y)
2021-03-03 17:27 ` Alexandru Elisei
2021-03-04 7:07 ` wangyanan (Y)
2021-03-04 7:22 ` wangyanan (Y)
2021-03-19 15:07 ` Alexandru Elisei
2021-03-22 13:19 ` wangyanan (Y)
2021-02-08 11:22 ` [RFC PATCH 4/4] KVM: arm64: Distinguish cases of memcache allocations completely Yanan Wang
2021-03-25 17:26 ` Alexandru Elisei
2021-03-26 1:24 ` wangyanan (Y)
2021-02-23 15:55 ` [RFC PATCH 0/4] KVM: arm64: Improve efficiency of stage2 page table Alexandru Elisei
2021-02-24 2:35 ` wangyanan (Y)
2021-02-24 17:20 ` Alexandru Elisei
2021-02-25 6:13 ` wangyanan (Y)