From: zhukeqian <zhukeqian1@huawei.com>
To: Marc Zyngier <maz@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>, kvmarm@lists.cs.columbia.edu
Subject: Re: [PATCH v2 0/8] KVM: arm64: Support HW dirty log based on DBM
Date: Mon, 13 Jul 2020 10:47:25 +0800
Message-ID: <4eee3e4c-db73-c4ce-ca3d-d665ee87d66a@huawei.com>
In-Reply-To: <015847afd67e8bd4f8a158b604854838@kernel.org>
Hi Marc,
Sorry for the delayed reply.
On 2020/7/6 15:54, Marc Zyngier wrote:
> Hi Keqian,
>
> On 2020-07-06 02:28, zhukeqian wrote:
>> Hi Catalin and Marc,
>>
>> On 2020/7/2 21:55, Keqian Zhu wrote:
>>> This patch series adds support for dirty log based on HW DBM.
>>>
>>> It works well under some migration test cases, including VMs with 4K
>>> pages or 2M THP. I checked the SHA256 hash digest of all memory and
>>> it is the same for the source VM and the destination VM, which means
>>> no dirty pages are missed under hardware DBM.
>>>
>>> Some key points:
>>>
>>> 1. Only support hardware updates of dirty status for PTEs. PMDs and PUDs
>>> are not involved for now.
>>>
>>> 2. About *performance*: In the RFC patch, I mentioned that for every 64GB
>>> of memory, KVM takes about 40ms to scan all PTEs to collect the dirty
>>> log. This series addresses the problem in two ways: HW/SW dynamic
>>> switch and multi-core offload.
>>>
>>> HW/SW dynamic switch: Give userspace the right to enable/disable hw
>>> dirty log. This adds a new KVM cap named KVM_CAP_ARM_HW_DIRTY_LOG. We
>>> achieve this by changing the kvm->arch.vtcr value and kicking vCPUs out
>>> to reload this value into VTCR_EL2. Userspace can then enable hw dirty
>>> log at the beginning and disable it when few dirty pages remain and the
>>> VM is about to be stopped, so VM downtime is not affected.
>>>
>>> Multi-core offload: Offloading the PT scanning workload to multiple
>>> cores can greatly reduce scanning time. To guarantee that we complete in
>>> time, I use smp_call_function to implement this policy, which uses IPIs
>>> to dispatch the workload to other CPUs. On a 128U Kunpeng 920 platform,
>>> it takes only about 5ms to scan PTs of 256 RAM (using mempress, with
>>> almost all PTs established). We dispatch the workload iteratively (every
>>> CPU scans the PTs of just 512M RAM in each iteration), so it won't
>>> affect physical CPUs seriously.
>>
>> What do you think of these two methods for solving the high cost of PT
>> scanning? Maybe you are waiting for a PML-like feature on ARM :-) , but in
>> my tests, DBM is usable once these two methods are applied.
>
> Useable, maybe. But leaving to userspace the decision to switch from one
> mode to another isn't an acceptable outcome. Userspace doesn't need nor
> want to know about this.
>
OK, maybe this is worth discussing. The switch logic can be encapsulated in QEMU
and hidden from VM users, so I think it may be acceptable. :)
> Another thing is that sending IPIs all over to trigger scanning may
> work well on a system that runs a limited number of guests (or some
> other userspace, actually), but I seriously doubt that it is impact
> free once you start doing this on an otherwise loaded system.
>
Yes, it is not suitable to send IPIs to all other physical CPUs. For now I just
want to show you my idea and prove that it is effective. In a real cloud product, we
have resource isolation mechanisms, so the result will be somewhat worse (compared to 5ms),
but we won't affect other VMs.
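To make the iterative dispatch from the cover letter concrete (every CPU scans the PTs of 512M RAM per iteration), here is a minimal userspace sketch of the partitioning arithmetic only. It does not use smp_call_function or any kernel API; the names scan_chunk, scan_iterations and scan_chunk_for are hypothetical and are not taken from the patches.

```c
#include <assert.h>
#include <stdint.h>

/* Per-CPU work unit per iteration, as stated in the cover letter: 512M. */
#define CHUNK_BYTES (UINT64_C(512) << 20)

/* Hypothetical type: a guest RAM range handed to one CPU. */
struct scan_chunk {
    uint64_t start; /* byte offset into guest RAM */
    uint64_t len;   /* bytes to scan, 0 if this CPU is idle */
};

/* Number of dispatch rounds needed to cover ram_bytes with ncpus CPUs. */
static uint64_t scan_iterations(uint64_t ram_bytes, unsigned ncpus)
{
    assert(ncpus > 0);
    uint64_t per_iter = CHUNK_BYTES * ncpus;
    return (ram_bytes + per_iter - 1) / per_iter;
}

/* Range that CPU 'cpu' scans in round 'iter'; len is 0 past the end of RAM. */
static struct scan_chunk scan_chunk_for(uint64_t ram_bytes, unsigned ncpus,
                                        uint64_t iter, unsigned cpu)
{
    struct scan_chunk c = { 0, 0 };
    assert(cpu < ncpus);
    uint64_t start = (iter * ncpus + cpu) * CHUNK_BYTES;
    if (start < ram_bytes) {
        uint64_t left = ram_bytes - start;
        c.start = start;
        c.len = left < CHUNK_BYTES ? left : CHUNK_BYTES;
    }
    return c;
}
```

With an illustrative 64GB guest and 8 scanning CPUs (both values are assumptions chosen for the example), each round covers 4GB, so 16 rounds are needed. Restricting ncpus to an isolated CPU set reduces interference at the cost of more rounds.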
> You may have better results by having an alternative mapping of your
> S2 page tables so that they are accessible linearly, which would
> sidestep the PT parsing altogether, probably saving some cycles. But
Yeah, this is a good idea. But as I understand it, to make them linear we have to reserve
enough physical memory at VM start (which may waste a lot of memory), and the benefit of this
optimization *may* not be significant.
> this is still a marginal gain compared to the overall overhead of
> scanning 4kB of memory per 2MB of guest RAM, as opposed to 64 *bytes*
> per 2MB (assuming strict 4kB mappings at S2, no block mappings).
>
I once tested scanning PTs by reading only one byte of each PTE, and the result was the same.
So when we scan PTs using just one core, the bottleneck is CPU speed rather than memory bandwidth.
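For reference, the 4kB versus 64 bytes comparison can be illustrated with a minimal userspace sketch: one last-level table with 4kB pages holds 512 8-byte descriptors (4kB, covering 2MB of guest RAM), while a one-bit-per-page dirty bitmap for the same range is 512 bits (64 bytes). The sketch assumes a PTE counts as dirty when it is valid, has DBM (bit 51) set and has the stage 2 write permission bit S2AP[1] (bit 7) set. The helper names are hypothetical and this is not the code from the series.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PTRS_PER_TABLE 512                   /* 4kB table / 8-byte PTEs */
#define BITMAP_BYTES   (PTRS_PER_TABLE / 8)  /* 64 bytes per 2MB of RAM */

#define S2_PTE_VALID   (UINT64_C(1) << 0)
#define S2_PTE_S2AP_W  (UINT64_C(1) << 7)    /* S2AP[1]: write permission */
#define S2_PTE_DBM     (UINT64_C(1) << 51)   /* Dirty Bit Modifier */

/* Assumed dirty rule: valid, DBM set, and write permission set by hardware. */
static int s2_pte_hw_dirty(uint64_t pte)
{
    const uint64_t mask = S2_PTE_VALID | S2_PTE_DBM | S2_PTE_S2AP_W;
    return (pte & mask) == mask;
}

/*
 * Read all 512 PTEs (4kB) of one table and fill a 64-byte bitmap,
 * one bit per 4kB page. Returns the number of dirty pages found.
 */
static unsigned scan_table(const uint64_t *table, uint8_t *bitmap)
{
    unsigned i, dirty = 0;

    assert(table != NULL && bitmap != NULL);
    memset(bitmap, 0, BITMAP_BYTES);
    for (i = 0; i < PTRS_PER_TABLE; i++) {
        if (s2_pte_hw_dirty(table[i])) {
            bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
            dirty++;
        }
    }
    return dirty;
}
```

Scaled to 64GB of guest RAM mapped strictly with 4kB pages, that is 32768 such tables, or 128MB of PTEs to read, against a 2MB dirty bitmap.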
> Finally, this doesn't work with pages dirtied from DMA, which is the
> biggest problem. If you cannot track pages that are dirtied behind your
> back, what is the purpose of scanning the dirty bits?
>
> As for a PML-like feature, this would only be useful if the SMMU
> architecture took part in it and provided consistent logging of
> the dirtied pages in the IPA space. Only having it at the CPU level
> would be making the exact same mistake.
Even if the SMMU is equipped with a PML-like feature, we still rely on device suspend to
avoid missing dirty pages, so I think the only advantage of PML is a shorter dirty
log sync time compared to multi-core offload. Maybe I am missing something?
Thanks,
Keqian