All of lore.kernel.org
 help / color / mirror / Atom feed
From: zhukeqian <zhukeqian1@huawei.com>
To: <linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<kvmarm@lists.cs.columbia.edu>, <kvm@vger.kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Marc Zyngier <maz@kernel.org>, James Morse <james.morse@arm.com>,
	Will Deacon <will@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>,
	Julien Thierry <julien.thierry.kdev@gmail.com>,
	Mark Brown <broonie@kernel.org>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexios Zavras <alexios.zavras@intel.com>,
	<liangpeng10@huawei.com>, <zhengxiang9@huawei.com>,
	<wanghaibin.wang@huawei.com>
Subject: Re: [PATCH 00/12] KVM: arm64: Support stage2 hardware DBM
Date: Thu, 18 Jun 2020 12:13:50 +0800	[thread overview]
Message-ID: <f480d0dc-aaf8-c915-409c-d0f56812a49f@huawei.com> (raw)
In-Reply-To: <20200616093553.27512-1-zhukeqian1@huawei.com>

Hi,

On 2020/6/16 17:35, Keqian Zhu wrote:
> This patch series add support for stage2 hardware DBM, and it is only
> used for dirty log for now.
> 
> It works well under some migration test cases, including VM with 4K
> pages or 2M THP. I checked the SHA256 hash digest of all memory and
> they keep same for source VM and destination VM, which means no dirty
> pages is missed under hardware DBM.
> 
> Some key points:
> 
> 1. Only support hardware updates of dirty status for PTEs. PMDs and PUDs
>    are not involved for now.
> 
> 2. About *performance*: In RFC patch, I have mentioned that for every 64GB
>    memory, KVM consumes about 40ms to scan all PTEs to collect dirty log.
>    
>    Initially, I plan to solve this problem using parallel CPUs. However
>    I faced two problems.
> 
>    The first is bottleneck of memory bandwith. Single thread will occupy
>    bandwidth about 500GB/s, we can support about 4 parallel threads at
>    most, so the ideal speedup ratio is low.
Aha, I make it wrong here. I test it again, and find that speedup ratio can
be about 23x when I use 32 CPUs to scan PTs (takes about 5ms when scanning PTs
of 200GB RAM).

> 
>    The second is huge impact on other CPUs. To scan PTs quickly, I use
>    smp_call_function_many, which is based on IPI, to dispatch workload
>    on other CPUs. Though it can complete work in time, the interrupt is
>    disabled during scaning PTs, which has huge impact on other CPUs.
I think we can divide scanning workload into smaller ones, which can disable
and enable interrupts periodly.

> 
>    Now, I make hardware dirty log can be dynamic enabled and disabled.
>    Userspace can enable it before VM migration and disable it when
>    remaining dirty pages is little. Thus VM downtime is not affected. 
BTW, we can reserve this interface for userspace if CPU computing resources
are not enough.

Thanks,
Keqian
> 
> 
> 3. About correctness: Only add DBM bit when PTE is already writable, so
>    we still have readonly PTE and some mechanisms which rely on readonly
>    PTs are not broken.
> 
> 4. About PTs modification races: There are two kinds of PTs modification.
>    
>    The first is adding or clearing specific bit, such as AF or RW. All
>    these operations have been converted to be atomic, avoid covering
>    dirty status set by hardware.
>    
>    The second is replacement, such as PTEs unmapping or changement. All
>    these operations will invoke kvm_set_pte finally. kvm_set_pte have
>    been converted to be atomic and we save the dirty status to underlying
>    bitmap if dirty status is coverred.
> 
> 
> Keqian Zhu (12):
>   KVM: arm64: Add some basic functions to support hw DBM
>   KVM: arm64: Modify stage2 young mechanism to support hw DBM
>   KVM: arm64: Report hardware dirty status of stage2 PTE if coverred
>   KVM: arm64: Support clear DBM bit for PTEs
>   KVM: arm64: Add KVM_CAP_ARM_HW_DIRTY_LOG capability
>   KVM: arm64: Set DBM bit of PTEs during write protecting
>   KVM: arm64: Scan PTEs to sync dirty log
>   KVM: Omit dirty log sync in log clear if initially all set
>   KVM: arm64: Steply write protect page table by mask bit
>   KVM: arm64: Save stage2 PTE dirty status if it is coverred
>   KVM: arm64: Support disable hw dirty log after enable
>   KVM: arm64: Enable stage2 hardware DBM
> 
>  arch/arm64/include/asm/kvm_host.h |  11 +
>  arch/arm64/include/asm/kvm_mmu.h  |  56 +++-
>  arch/arm64/include/asm/sysreg.h   |   2 +
>  arch/arm64/kvm/arm.c              |  22 +-
>  arch/arm64/kvm/mmu.c              | 411 ++++++++++++++++++++++++++++--
>  arch/arm64/kvm/reset.c            |  14 +-
>  include/uapi/linux/kvm.h          |   1 +
>  tools/include/uapi/linux/kvm.h    |   1 +
>  virt/kvm/kvm_main.c               |   7 +-
>  9 files changed, 499 insertions(+), 26 deletions(-)
> 

WARNING: multiple messages have this Message-ID (diff)
From: zhukeqian <zhukeqian1@huawei.com>
To: <linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<kvmarm@lists.cs.columbia.edu>, <kvm@vger.kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Marc Zyngier <maz@kernel.org>, James Morse <james.morse@arm.com>,
	Will Deacon <will@kernel.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>,
	liangpeng10@huawei.com, Alexios Zavras <alexios.zavras@intel.com>,
	Mark Brown <broonie@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [PATCH 00/12] KVM: arm64: Support stage2 hardware DBM
Date: Thu, 18 Jun 2020 12:13:50 +0800	[thread overview]
Message-ID: <f480d0dc-aaf8-c915-409c-d0f56812a49f@huawei.com> (raw)
In-Reply-To: <20200616093553.27512-1-zhukeqian1@huawei.com>

Hi,

On 2020/6/16 17:35, Keqian Zhu wrote:
> This patch series add support for stage2 hardware DBM, and it is only
> used for dirty log for now.
> 
> It works well under some migration test cases, including VM with 4K
> pages or 2M THP. I checked the SHA256 hash digest of all memory and
> they keep same for source VM and destination VM, which means no dirty
> pages is missed under hardware DBM.
> 
> Some key points:
> 
> 1. Only support hardware updates of dirty status for PTEs. PMDs and PUDs
>    are not involved for now.
> 
> 2. About *performance*: In RFC patch, I have mentioned that for every 64GB
>    memory, KVM consumes about 40ms to scan all PTEs to collect dirty log.
>    
>    Initially, I plan to solve this problem using parallel CPUs. However
>    I faced two problems.
> 
>    The first is bottleneck of memory bandwith. Single thread will occupy
>    bandwidth about 500GB/s, we can support about 4 parallel threads at
>    most, so the ideal speedup ratio is low.
Aha, I make it wrong here. I test it again, and find that speedup ratio can
be about 23x when I use 32 CPUs to scan PTs (takes about 5ms when scanning PTs
of 200GB RAM).

> 
>    The second is huge impact on other CPUs. To scan PTs quickly, I use
>    smp_call_function_many, which is based on IPI, to dispatch workload
>    on other CPUs. Though it can complete work in time, the interrupt is
>    disabled during scaning PTs, which has huge impact on other CPUs.
I think we can divide scanning workload into smaller ones, which can disable
and enable interrupts periodly.

> 
>    Now, I make hardware dirty log can be dynamic enabled and disabled.
>    Userspace can enable it before VM migration and disable it when
>    remaining dirty pages is little. Thus VM downtime is not affected. 
BTW, we can reserve this interface for userspace if CPU computing resources
are not enough.

Thanks,
Keqian
> 
> 
> 3. About correctness: Only add DBM bit when PTE is already writable, so
>    we still have readonly PTE and some mechanisms which rely on readonly
>    PTs are not broken.
> 
> 4. About PTs modification races: There are two kinds of PTs modification.
>    
>    The first is adding or clearing specific bit, such as AF or RW. All
>    these operations have been converted to be atomic, avoid covering
>    dirty status set by hardware.
>    
>    The second is replacement, such as PTEs unmapping or changement. All
>    these operations will invoke kvm_set_pte finally. kvm_set_pte have
>    been converted to be atomic and we save the dirty status to underlying
>    bitmap if dirty status is coverred.
> 
> 
> Keqian Zhu (12):
>   KVM: arm64: Add some basic functions to support hw DBM
>   KVM: arm64: Modify stage2 young mechanism to support hw DBM
>   KVM: arm64: Report hardware dirty status of stage2 PTE if coverred
>   KVM: arm64: Support clear DBM bit for PTEs
>   KVM: arm64: Add KVM_CAP_ARM_HW_DIRTY_LOG capability
>   KVM: arm64: Set DBM bit of PTEs during write protecting
>   KVM: arm64: Scan PTEs to sync dirty log
>   KVM: Omit dirty log sync in log clear if initially all set
>   KVM: arm64: Steply write protect page table by mask bit
>   KVM: arm64: Save stage2 PTE dirty status if it is coverred
>   KVM: arm64: Support disable hw dirty log after enable
>   KVM: arm64: Enable stage2 hardware DBM
> 
>  arch/arm64/include/asm/kvm_host.h |  11 +
>  arch/arm64/include/asm/kvm_mmu.h  |  56 +++-
>  arch/arm64/include/asm/sysreg.h   |   2 +
>  arch/arm64/kvm/arm.c              |  22 +-
>  arch/arm64/kvm/mmu.c              | 411 ++++++++++++++++++++++++++++--
>  arch/arm64/kvm/reset.c            |  14 +-
>  include/uapi/linux/kvm.h          |   1 +
>  tools/include/uapi/linux/kvm.h    |   1 +
>  virt/kvm/kvm_main.c               |   7 +-
>  9 files changed, 499 insertions(+), 26 deletions(-)
> 
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

WARNING: multiple messages have this Message-ID (diff)
From: zhukeqian <zhukeqian1@huawei.com>
To: <linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<kvmarm@lists.cs.columbia.edu>, <kvm@vger.kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Marc Zyngier <maz@kernel.org>, James Morse <james.morse@arm.com>,
	Will Deacon <will@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>,
	Sean Christopherson <sean.j.christopherson@intel.com>,
	liangpeng10@huawei.com, Alexios Zavras <alexios.zavras@intel.com>,
	zhengxiang9@huawei.com, Mark Brown <broonie@kernel.org>,
	wanghaibin.wang@huawei.com,
	Andrew Morton <akpm@linux-foundation.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Julien Thierry <julien.thierry.kdev@gmail.com>
Subject: Re: [PATCH 00/12] KVM: arm64: Support stage2 hardware DBM
Date: Thu, 18 Jun 2020 12:13:50 +0800	[thread overview]
Message-ID: <f480d0dc-aaf8-c915-409c-d0f56812a49f@huawei.com> (raw)
In-Reply-To: <20200616093553.27512-1-zhukeqian1@huawei.com>

Hi,

On 2020/6/16 17:35, Keqian Zhu wrote:
> This patch series add support for stage2 hardware DBM, and it is only
> used for dirty log for now.
> 
> It works well under some migration test cases, including VM with 4K
> pages or 2M THP. I checked the SHA256 hash digest of all memory and
> they keep same for source VM and destination VM, which means no dirty
> pages is missed under hardware DBM.
> 
> Some key points:
> 
> 1. Only support hardware updates of dirty status for PTEs. PMDs and PUDs
>    are not involved for now.
> 
> 2. About *performance*: In RFC patch, I have mentioned that for every 64GB
>    memory, KVM consumes about 40ms to scan all PTEs to collect dirty log.
>    
>    Initially, I plan to solve this problem using parallel CPUs. However
>    I faced two problems.
> 
>    The first is bottleneck of memory bandwith. Single thread will occupy
>    bandwidth about 500GB/s, we can support about 4 parallel threads at
>    most, so the ideal speedup ratio is low.
Aha, I make it wrong here. I test it again, and find that speedup ratio can
be about 23x when I use 32 CPUs to scan PTs (takes about 5ms when scanning PTs
of 200GB RAM).

> 
>    The second is huge impact on other CPUs. To scan PTs quickly, I use
>    smp_call_function_many, which is based on IPI, to dispatch workload
>    on other CPUs. Though it can complete work in time, the interrupt is
>    disabled during scaning PTs, which has huge impact on other CPUs.
I think we can divide scanning workload into smaller ones, which can disable
and enable interrupts periodly.

> 
>    Now, I make hardware dirty log can be dynamic enabled and disabled.
>    Userspace can enable it before VM migration and disable it when
>    remaining dirty pages is little. Thus VM downtime is not affected. 
BTW, we can reserve this interface for userspace if CPU computing resources
are not enough.

Thanks,
Keqian
> 
> 
> 3. About correctness: Only add DBM bit when PTE is already writable, so
>    we still have readonly PTE and some mechanisms which rely on readonly
>    PTs are not broken.
> 
> 4. About PTs modification races: There are two kinds of PTs modification.
>    
>    The first is adding or clearing specific bit, such as AF or RW. All
>    these operations have been converted to be atomic, avoid covering
>    dirty status set by hardware.
>    
>    The second is replacement, such as PTEs unmapping or changement. All
>    these operations will invoke kvm_set_pte finally. kvm_set_pte have
>    been converted to be atomic and we save the dirty status to underlying
>    bitmap if dirty status is coverred.
> 
> 
> Keqian Zhu (12):
>   KVM: arm64: Add some basic functions to support hw DBM
>   KVM: arm64: Modify stage2 young mechanism to support hw DBM
>   KVM: arm64: Report hardware dirty status of stage2 PTE if coverred
>   KVM: arm64: Support clear DBM bit for PTEs
>   KVM: arm64: Add KVM_CAP_ARM_HW_DIRTY_LOG capability
>   KVM: arm64: Set DBM bit of PTEs during write protecting
>   KVM: arm64: Scan PTEs to sync dirty log
>   KVM: Omit dirty log sync in log clear if initially all set
>   KVM: arm64: Steply write protect page table by mask bit
>   KVM: arm64: Save stage2 PTE dirty status if it is coverred
>   KVM: arm64: Support disable hw dirty log after enable
>   KVM: arm64: Enable stage2 hardware DBM
> 
>  arch/arm64/include/asm/kvm_host.h |  11 +
>  arch/arm64/include/asm/kvm_mmu.h  |  56 +++-
>  arch/arm64/include/asm/sysreg.h   |   2 +
>  arch/arm64/kvm/arm.c              |  22 +-
>  arch/arm64/kvm/mmu.c              | 411 ++++++++++++++++++++++++++++--
>  arch/arm64/kvm/reset.c            |  14 +-
>  include/uapi/linux/kvm.h          |   1 +
>  tools/include/uapi/linux/kvm.h    |   1 +
>  virt/kvm/kvm_main.c               |   7 +-
>  9 files changed, 499 insertions(+), 26 deletions(-)
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  parent reply	other threads:[~2020-06-18  4:14 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-16  9:35 [PATCH 00/12] KVM: arm64: Support stage2 hardware DBM Keqian Zhu
2020-06-16  9:35 ` Keqian Zhu
2020-06-16  9:35 ` Keqian Zhu
2020-06-16  9:35 ` [PATCH 01/12] KVM: arm64: Add some basic functions to support hw DBM Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-06-16  9:35 ` [PATCH 02/12] KVM: arm64: Modify stage2 young mechanism " Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-06-16  9:35 ` [PATCH 03/12] KVM: arm64: Report hardware dirty status of stage2 PTE if coverred Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-07-01 11:28   ` Steven Price
2020-07-01 11:28     ` Steven Price
2020-07-01 11:28     ` Steven Price
2020-07-02 11:28     ` zhukeqian
2020-07-02 11:28       ` zhukeqian
2020-07-02 11:28       ` zhukeqian
2020-06-16  9:35 ` [PATCH 04/12] KVM: arm64: Support clear DBM bit for PTEs Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-06-16  9:35 ` [PATCH 05/12] KVM: arm64: Add KVM_CAP_ARM_HW_DIRTY_LOG capability Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-06-16  9:35 ` [PATCH 06/12] KVM: arm64: Set DBM bit of PTEs during write protecting Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-06-16  9:35 ` [PATCH 07/12] KVM: arm64: Scan PTEs to sync dirty log Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-06-16  9:35 ` [PATCH 08/12] KVM: Omit dirty log sync in log clear if initially all set Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-06-16  9:35 ` [PATCH 09/12] KVM: arm64: Steply write protect page table by mask bit Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-06-16  9:35 ` [PATCH 10/12] KVM: arm64: Save stage2 PTE dirty status if it is coverred Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-06-16  9:35 ` [PATCH 11/12] KVM: arm64: Support disable hw dirty log after enable Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-06-16  9:35 ` [PATCH 12/12] KVM: arm64: Enable stage2 hardware DBM Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-06-16  9:35   ` Keqian Zhu
2020-06-18  4:13 ` zhukeqian [this message]
2020-06-18  4:13   ` [PATCH 00/12] KVM: arm64: Support " zhukeqian
2020-06-18  4:13   ` zhukeqian

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f480d0dc-aaf8-c915-409c-d0f56812a49f@huawei.com \
    --to=zhukeqian1@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexios.zavras@intel.com \
    --cc=broonie@kernel.org \
    --cc=catalin.marinas@arm.com \
    --cc=james.morse@arm.com \
    --cc=julien.thierry.kdev@gmail.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=liangpeng10@huawei.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=sean.j.christopherson@intel.com \
    --cc=suzuki.poulose@arm.com \
    --cc=tglx@linutronix.de \
    --cc=wanghaibin.wang@huawei.com \
    --cc=will@kernel.org \
    --cc=zhengxiang9@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.