From: zhukeqian <zhukeqian1@huawei.com>
To: Marc Zyngier <maz@kernel.org>
Cc: <linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<kvmarm@lists.cs.columbia.edu>, <kvm@vger.kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	James Morse <james.morse@arm.com>, Will Deacon <will@kernel.org>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	"Sean Christopherson" <sean.j.christopherson@intel.com>,
	Julien Thierry <julien.thierry.kdev@gmail.com>,
	Mark Brown <broonie@kernel.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexios Zavras <alexios.zavras@intel.com>,
	<wanghaibin.wang@huawei.com>, <zhengxiang9@huawei.com>
Subject: Re: [RFC PATCH 0/7] kvm: arm64: Support stage2 hardware DBM
Date: Tue, 26 May 2020 10:08:52 +0800	[thread overview]
Message-ID: <66deb797-726f-242b-82fb-0ddee975ef15@huawei.com> (raw)
In-Reply-To: <4b8a939172395bf38e581634abecf925@kernel.org>

Hi Marc,

On 2020/5/25 23:44, Marc Zyngier wrote:
> On 2020-05-25 12:23, Keqian Zhu wrote:
>> This patch series adds support for stage2 hardware DBM, which is only
>> used for dirty logging for now.
>>
>> It works well in some migration test cases, including VMs with 4K
>> pages or 2M THP. I checked the SHA256 digest of all memory, and it is
>> the same for the source and destination VMs, which means no dirty
>> pages are missed under hardware DBM.
>>
>> However, there are some known unsolved issues.
>>
>> 1. Some mechanisms that rely on "write permission faults" no longer
>>    work, such as kvm_set_pfn_dirty and "mmap page sharing".
>>
>>    kvm_set_pfn_dirty is called in user_mem_abort when the guest takes a
>>    write fault. This guarantees the physical page will not be dropped
>>    directly when the host kernel reclaims memory. With hardware dirty
>>    management, we no longer get a chance to call kvm_set_pfn_dirty.
> 
> Then you will end-up with memory corruption under memory pressure.
> This also breaks things like CoW, which we depend on.
>
Yes, these problems look knotty. But I think x86 PML support faces the
same problems, so I believe there must be ways to solve them.
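To make the discussion concrete, here is a much-simplified sketch of the
write-fault path described above. kvm_set_pfn_dirty() is the real KVM
helper, but the *_sketch() functions and the overall flow are illustrative
assumptions, not the actual user_mem_abort() code:

	/*
	 * Much-simplified sketch of the write-fault path described above.
	 * kvm_set_pfn_dirty() is the real KVM helper; the *_sketch()
	 * helpers and the overall flow are illustrative only.
	 */
	static int write_fault_sketch(struct kvm *kvm, gfn_t gfn, bool write_fault)
	{
		kvm_pfn_t pfn = gfn_to_pfn_sketch(kvm, gfn);	/* pin the backing page */

		if (write_fault) {
			/*
			 * Record the dirty state with the host MM, so the page
			 * is written back instead of being dropped when memory
			 * is reclaimed.  With hardware DBM, writes no longer
			 * fault, so this hook is never reached.
			 */
			kvm_set_pfn_dirty(pfn);
			stage2_make_pte_writable_sketch(kvm, gfn);
		}
		return 0;
	}
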
>>
>>    For "mmap page sharing" mechanism, host kernel will allocate a new
>>    physical page when guest writes a page that is shared with other page
>>    table entries. After using hardware dirty management, we have no chance
>>    to do this too.
>>
>>    I need to survey how stage1 hardware DBM solves these problems.
>>    It would help if anyone can figure it out.
>>
>> 2. Page table modification races: Though I have found and fixed some data
>>    races that occur when the kernel changes page table entries, I suspect
>>    there are still races I am not aware of. It would be great if anyone
>>    can point them out.
>>
>> 3. Performance: On the Kunpeng 920 platform, KVM takes about 40ms per
>>    64GB of memory to traverse all PTEs and collect the dirty log. This
>>    causes unbearable migration downtime if the memory size is too big.
>>    I will try to solve this problem in patch v1.
> 
> This, in my opinion, is why Stage-2 DBM is fairly useless.
> From a performance perspective, this is the worst possible
> situation. You end up continuously scanning page tables, at
> an arbitrary rate, without a way to evaluate the fault rate.
> 
> One thing S2-DBM would be useful for is SVA, where a device
> write would mark the S2 PTs dirty as they are shared between
> CPU and SMMU. Another thing is SPE, which is essentially a DMA
> agent using the CPU's PTs.
> 
> But on its own, and just to log the dirty pages, S2-DBM is
> pretty rubbish. I wish arm64 had something like Intel's PML,
> which looks far more interesting for the purpose of tracking
> accesses.

Sure, PML is a better solution for hardware management of dirty state.
However, compared with changing the hardware, optimizing the software has
a much shorter turnaround time.

I have an optimization in mind to address this: the page tables can be
scanned in parallel, which greatly reduces the time spent. Since the
parallel CPUs need almost no communication with each other, we can achieve
a high speedup ratio.
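
A rough sketch of the idea follows. scan_dirty_range() below is a
hypothetical stand-in for the existing single-threaded PTE walk, and a
real implementation would still have to handle locking against concurrent
page table updates:

	/*
	 * Rough sketch only: split a memslot's IPA range into per-worker
	 * chunks and scan them concurrently.  scan_dirty_range() is a
	 * hypothetical stand-in for the existing single-threaded stage2
	 * PTE walk.
	 */
	struct scan_work {
		struct work_struct work;
		struct kvm *kvm;
		phys_addr_t start;
		phys_addr_t end;
	};

	static void scan_worker(struct work_struct *work)
	{
		struct scan_work *sw = container_of(work, struct scan_work, work);

		/*
		 * Walk the stage2 PTEs in [start, end); for every PTE whose
		 * hardware dirty state is set, clear it and set the matching
		 * bit in the dirty bitmap.  Workers cover disjoint IPA ranges
		 * and disjoint bitmap words, so they need no cross-CPU
		 * communication.
		 */
		scan_dirty_range(sw->kvm, sw->start, sw->end);
	}

Each chunk could then be queued on a separate CPU with queue_work_on(),
and the caller would flush all the works before reporting the dirty log.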


> 
> Thanks,
> 
>         M.
Thanks,
Keqian

Thread overview: 36+ messages

2020-05-25 11:23 [RFC PATCH 0/7] kvm: arm64: Support stage2 hardware DBM Keqian Zhu
2020-05-25 11:24 ` [RFC PATCH 1/7] KVM: arm64: Add some basic functions for hw DBM Keqian Zhu
2020-05-25 11:24 ` [RFC PATCH 2/7] KVM: arm64: Set DBM bit of PTEs if hw DBM enabled Keqian Zhu
2020-05-26 11:49   ` Catalin Marinas
2020-05-27  9:28     ` zhukeqian
2020-05-25 11:24 ` [RFC PATCH 3/7] KVM: arm64: Traverse page table entries when sync dirty log Keqian Zhu
2020-05-25 11:24 ` [RFC PATCH 4/7] KVM: arm64: Steply write protect page table by mask bit Keqian Zhu
2020-05-25 11:24 ` [RFC PATCH 5/7] kvm: arm64: Modify stage2 young mechanism to support hw DBM Keqian Zhu
2020-05-25 11:24 ` [RFC PATCH 6/7] kvm: arm64: Save stage2 PTE dirty info if it is coverred Keqian Zhu
2020-05-25 11:24 ` [RFC PATCH 7/7] KVM: arm64: Enable stage2 hardware DBM Keqian Zhu
2020-05-25 15:44 ` [RFC PATCH 0/7] kvm: arm64: Support stage2 hardware DBM Marc Zyngier
2020-05-26  2:08   ` zhukeqian [this message]