From: Alexandru Elisei <alexandru.elisei@arm.com>
To: "wangyanan (Y)" <wangyanan55@huawei.com>,
	Marc Zyngier <maz@kernel.org>, Will Deacon <will@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	James Morse <james.morse@arm.com>,
	Julien Thierry <julien.thierry.kdev@gmail.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Gavin Shan <gshan@redhat.com>,
	Quentin Perret <qperret@google.com>,
	kvmarm@lists.cs.columbia.edu,
	linux-arm-kernel@lists.infradead.org, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/4] KVM: arm64: Improve efficiency of stage2 page table
Date: Wed, 24 Feb 2021 17:20:38 +0000
Message-ID: <0385a692-efed-9c1d-0e7f-a3e3af8258d5@arm.com>
In-Reply-To: <0dd3a764-0e11-af6a-2b46-84509bef7294@huawei.com>

Hi,

On 2/24/21 2:35 AM, wangyanan (Y) wrote:

> Hi Alex,
>
> On 2021/2/23 23:55, Alexandru Elisei wrote:
>> Hi Yanan,
>>
>> I wanted to review the patches, but unfortunately I get an error when trying to
>> apply the first patch in the series:
>>
>> Applying: KVM: arm64: Move the clean of dcache to the map handler
>> error: patch failed: arch/arm64/kvm/hyp/pgtable.c:464
>> error: arch/arm64/kvm/hyp/pgtable.c: patch does not apply
>> error: patch failed: arch/arm64/kvm/mmu.c:882
>> error: arch/arm64/kvm/mmu.c: patch does not apply
>> Patch failed at 0001 KVM: arm64: Move the clean of dcache to the map handler
>> hint: Use 'git am --show-current-patch=diff' to see the failed patch
>> When you have resolved this problem, run "git am --continue".
>> If you prefer to skip this patch, run "git am --skip" instead.
>> To restore the original branch and stop patching, run "git am --abort".
>>
>> I tried this with Linux tags v5.11-rc1 to v5.11-rc7. It looks like pgtable.c and
>> mmu.c from your patch are different from what is found on upstream master. Did you
>> use another branch as the base for your patches?
> Thanks for your attention.
> Indeed, this series was more or less based on the patches I posted before (Link:
> https://lore.kernel.org/r/20210114121350.123684-4-wangyanan55@huawei.com).
> They have already been merged into the up-to-date upstream master (commit:
> 509552e65ae8287178a5cdea2d734dcd2d6380ab), but not into tags v5.11-rc1 to
> v5.11-rc7.
> Could you please try the newest upstream master (since commit:
> 509552e65ae8287178a5cdea2d734dcd2d6380ab)? I have tested locally and no
> apply errors occur.

That worked for me, thank you for the quick reply.

Just to double check, when you run the benchmarks, the before results are for a
kernel built from commit 509552e65ae8 ("KVM: arm64: Mark the page dirty only if
the fault is handled successfully"), and the after results are with this series on
top, right?

Thanks,

Alex

>
> Thanks,
>
> Yanan.
>
>> Thanks,
>>
>> Alex
>>
>> On 2/8/21 11:22 AM, Yanan Wang wrote:
>>> Hi,
>>>
>>> This series makes some efficiency improvements to the stage2 page table code,
>>> and there are some test results to present the performance changes, which
>>> were measured with a kvm selftest [1] that I have posted:
>>> [1] https://lore.kernel.org/lkml/20210208090841.333724-1-wangyanan55@huawei.com/
>>>
>>> About patch 1:
>>> We currently uniformly clean dcache in user_mem_abort() before calling the
>>> fault handlers, if we take a translation fault and the pfn is cacheable.
>>> But if there are concurrent translation faults on the same page or block,
>>> cleaning the dcache is necessary for the first fault only, not for the others.
>>>
>>> By moving the dcache clean to the map handler, we can easily identify the
>>> conditions where CMOs are really needed and avoid the unnecessary ones.
>>> Performing CMOs is time consuming, especially when flushing a block range,
>>> so this solution greatly reduces the load on KVM and improves the
>>> efficiency of creating mappings.
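
A toy model of the argument above, not the kernel code: it only counts how many cache-maintenance operations (CMOs) are issued when several vCPUs fault on the same block. It assumes the faults are handled one after another, and the function names are hypothetical and do not correspond to anything in mmu.c or pgtable.c.

```python
# Toy model only: counts CMOs for vCPUs faulting on the SAME block.
# Assumption: faults are serialized; names are hypothetical.

def cmos_in_fault_path(num_faults):
    """Clean the dcache on every translation fault, before mapping."""
    mapped = False
    cmos = 0
    for _ in range(num_faults):
        cmos += 1              # unconditional clean for each fault
        if not mapped:
            mapped = True      # only the first fault installs the mapping
    return cmos


def cmos_in_map_handler(num_faults):
    """Clean the dcache only when the handler really installs the entry."""
    mapped = False
    cmos = 0
    for _ in range(num_faults):
        if not mapped:
            cmos += 1          # clean once, together with the install
            mapped = True
        # later faults find a valid mapping and skip the clean
    return cmos


if __name__ == "__main__":
    for vcpus in (20, 40):
        print(vcpus, cmos_in_fault_path(vcpus), cmos_in_map_handler(vcpus))
```

In this model the number of CMOs grows with the number of faulting vCPUs in the first scheme and stays at one in the second, which is the effect patch 1 is described as aiming for.
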
>>>
>>> Test results:
>>> (1) when 20 vCPUs concurrently access 20G ram (all 1G hugepages):
>>> KVM create block mappings time: 52.83s -> 3.70s
>>> KVM recover block mappings time (after dirty-logging): 52.0s -> 2.87s
>>>
>>> (2) when 40 vCPUs concurrently access 20G ram (all 1G hugepages):
>>> KVM create block mappings time: 104.56s -> 3.70s
>>> KVM recover block mappings time (after dirty-logging): 103.93s -> 2.96s
>>>
>>> About patch 2, 3:
>>> When KVM needs to coalesce the normal page mappings into a block mapping,
>>> we currently invalidate the old table entry first, followed by invalidation
>>> of the TLB, then unmap the page mappings, and finally install the block entry.
>>>
>>> Unmapping the numerous page mappings takes a long time, which means
>>> the table entry will be left invalid for a long time before installation of
>>> the block entry, and this will cause many spurious translation faults.
>>>
>>> So let's install the block entry first to ensure uninterrupted
>>> memory access for the other vCPUs, and then unmap the page mappings after
>>> installation. This greatly shortens the time during which the table entry
>>> is invalid, and avoids most of the unnecessary translation faults.
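
A rough sketch of the reordering described above, as a toy model rather than the actual pgtable.c change: it counts how many steps run while the table entry is invalid under each ordering. The step names are made up for illustration, and the model assumes that the new ordering still invalidates the old entry and the TLB before installing the block entry.

```python
# Toy model only: step names are hypothetical, not kernel functions.

def old_order(num_pages):
    # invalidate entry -> invalidate TLB -> unmap pages -> install block
    return (["invalidate_entry", "tlb_invalidate"]
            + ["unmap_page"] * num_pages
            + ["install_block"])


def new_order(num_pages):
    # invalidate entry -> invalidate TLB -> install block -> unmap pages
    return (["invalidate_entry", "tlb_invalidate", "install_block"]
            + ["unmap_page"] * num_pages)


def invalid_steps(sequence):
    """Count the steps executed while the table entry is invalid."""
    valid = True
    count = 0
    for step in sequence:
        if step == "invalidate_entry":
            valid = False
        elif step == "install_block":
            valid = True
        elif not valid:
            count += 1         # other vCPUs could fault during this step
    return count


if __name__ == "__main__":
    pages = 512                # illustrative value, not a measured one
    print(invalid_steps(old_order(pages)), invalid_steps(new_order(pages)))
```

In the model, the invalid window of the old ordering grows with the number of page mappings to unmap, while the new ordering keeps it constant.
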
>>>
>>> Test results based on patch 1:
>>> (1) when 20 vCPUs concurrently access 20G ram (all 1G hugepages):
>>> KVM recover block mappings time(after dirty-logging): 2.87s -> 0.30s
>>>
>>> (2) when 40 vCPUs concurrently access 20G ram (all 1G hugepages):
>>> KVM recover block mappings time(after dirty-logging): 2.96s -> 0.35s
>>>
>>> So combined with patch 1, it makes a big difference to KVM creating mappings
>>> and recovering block mappings, without much code change.
>>>
>>> About patch 4:
>>> A new method to distinguish cases of memcache allocations is introduced.
>>> By comparing fault_granule and vma_pagesize, cases that require allocations
>>> from memcache and cases that don't can be distinguished completely.
>>>
>>> ---
>>>
>>> Details of test results
>>> platform: HiSilicon Kunpeng920 (FWB not supported)
>>> host kernel: Linux mainline (v5.11-rc6)
>>>
>>> (1) performance change of patch 1
>>> cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 20
>>>        (20 vcpus, 20G memory, block mappings(granule 1G))
>>> Before patch: KVM_CREATE_MAPPINGS: 52.8338s 52.8327s 52.8336s 52.8255s 52.8303s
>>> After  patch: KVM_CREATE_MAPPINGS:  3.7022s  3.7031s  3.7028s  3.7012s  3.7024s
>>>
>>> Before patch: KVM_ADJUST_MAPPINGS: 52.0466s 52.0473s 52.0550s 52.0518s 52.0467s
>>> After  patch: KVM_ADJUST_MAPPINGS:  2.8787s  2.8781s  2.8785s  2.8742s  2.8759s
>>>
>>> cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 40
>>>        (40 vcpus, 20G memory, block mappings(granule 1G))
>>> Before patch: KVM_CREATE_MAPPINGS: 104.560s 104.556s 104.554s 104.556s 104.550s
>>> After  patch: KVM_CREATE_MAPPINGS:  3.7011s  3.7103s  3.7005s  3.7024s  3.7106s
>>>
>>> Before patch: KVM_ADJUST_MAPPINGS: 103.931s 103.936s 103.927s 103.942s 103.927s
>>> After  patch: KVM_ADJUST_MAPPINGS:  2.9621s  2.9648s  2.9474s  2.9587s  2.9603s
>>>
>>> (2) performance change of patch 2, 3(based on patch 1)
>>> cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 1
>>>        (1 vcpu, 20G memory, block mappings(granule 1G))
>>> Before patch: KVM_ADJUST_MAPPINGS: 2.8241s 2.8234s 2.8245s 2.8230s 2.8652s
>>> After  patch: KVM_ADJUST_MAPPINGS: 0.2444s 0.2442s 0.2423s 0.2441s 0.2429s
>>>
>>> cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 20
>>>        (20 vcpus, 20G memory, block mappings(granule 1G))
>>> Before patch: KVM_ADJUST_MAPPINGS: 2.8787s 2.8781s 2.8785s 2.8742s 2.8759s
>>> After  patch: KVM_ADJUST_MAPPINGS: 0.3008s 0.3004s 0.2974s 0.2917s 0.2900s
>>>
>>> cmdline: ./kvm_page_table_test -m 4 -t 2 -g 1G -s 20G -v 40
>>>        (40 vcpus, 20G memory, block mappings(granule 1G))
>>> Before patch: KVM_ADJUST_MAPPINGS: 2.9621s 2.9648s 2.9474s 2.9587s 2.9603s
>>> After  patch: KVM_ADJUST_MAPPINGS: 0.3541s 0.3694s 0.3656s 0.3693s 0.3687s
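
For reference, the per-run timings above can be reduced to mean speedups with a few lines of Python. This is only a convenience sketch over the numbers quoted in this message (the 20-vCPU runs); the dictionary keys are arbitrary labels chosen here.

```python
from statistics import mean

# Timings in seconds, copied from the 20-vCPU runs listed above.
runs = {
    "create_patch1_before": [52.8338, 52.8327, 52.8336, 52.8255, 52.8303],
    "create_patch1_after": [3.7022, 3.7031, 3.7028, 3.7012, 3.7024],
    "adjust_patch23_before": [2.8787, 2.8781, 2.8785, 2.8742, 2.8759],
    "adjust_patch23_after": [0.3008, 0.3004, 0.2974, 0.2917, 0.2900],
}


def speedup(before, after):
    """Ratio of the mean time before to the mean time after."""
    return mean(before) / mean(after)


create_speedup = speedup(runs["create_patch1_before"],
                         runs["create_patch1_after"])
adjust_speedup = speedup(runs["adjust_patch23_before"],
                         runs["adjust_patch23_after"])

if __name__ == "__main__":
    print(f"KVM_CREATE_MAPPINGS, patch 1: {create_speedup:.1f}x")
    print(f"KVM_ADJUST_MAPPINGS, patches 2-3: {adjust_speedup:.1f}x")
```
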
>>>
>>> ---
>>>
>>> Yanan Wang (4):
>>>    KVM: arm64: Move the clean of dcache to the map handler
>>>    KVM: arm64: Add an independent API for coalescing tables
>>>    KVM: arm64: Install the block entry before unmapping the page mappings
>>>    KVM: arm64: Distinguish cases of memcache allocations completely
>>>
>>>   arch/arm64/include/asm/kvm_mmu.h | 16 -------
>>>   arch/arm64/kvm/hyp/pgtable.c     | 82 +++++++++++++++++++++-----------
>>>   arch/arm64/kvm/mmu.c             | 39 ++++++---------
>>>   3 files changed, 69 insertions(+), 68 deletions(-)
>>>
>> .
