kvm.vger.kernel.org archive mirror
* [PATCH v2 0/8] KVM: arm64: Support HW dirty log based on DBM
@ 2020-07-02 13:55 Keqian Zhu
  2020-07-02 13:55 ` [PATCH v2 1/8] KVM: arm64: Set DBM bit for writable PTEs Keqian Zhu
                   ` (7 more replies)
  0 siblings, 8 replies; 10+ messages in thread
From: Keqian Zhu @ 2020-07-02 13:55 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm, kvm
  Cc: Catalin Marinas, Marc Zyngier, James Morse, Will Deacon,
	Suzuki K Poulose, Steven Price, Sean Christopherson,
	Julien Thierry, Mark Brown, Thomas Gleixner, Andrew Morton,
	Alexios Zavras, liangpeng10, zhengxiang9, wanghaibin.wang,
	Keqian Zhu

This patch series adds support for dirty log tracking based on HW DBM.

It works well under several migration test cases, including a VM with 4K
pages or 2M THP. I checked the SHA256 hash digest of all memory, and it
is identical for the source VM and the destination VM, which means no
dirty pages are missed under hardware DBM.

Some key points:

1. Hardware updates of dirty status are only supported for PTEs. PMDs and
   PUDs are not involved for now.

2. About *performance*: In the RFC patch, I mentioned that for every 64GB
   of memory, KVM consumes about 40ms to scan all PTEs to collect the
   dirty log. This series addresses the problem in two ways: HW/SW dynamic
   switch and multi-core offload.

   HW/SW dynamic switch: Give userspace the right to enable/disable the hw
   dirty log. This adds a new KVM cap named KVM_CAP_ARM_HW_DIRTY_LOG. We
   achieve this by changing the kvm->arch.vtcr value and kicking vCPUs out
   of guest mode so that they reload this value into VTCR_EL2. Userspace
   can then enable the hw dirty log at the beginning of migration and
   disable it when few dirty pages remain and the VM is about to be
   stopped, so VM downtime is not affected (see the sketch after this
   list).

   Multi-core offload: Offloading the PT scanning workload to multiple
   cores can greatly reduce the scanning time. To ensure we can complete
   in time, I use smp_call_function() to implement this policy, which uses
   IPIs to dispatch the workload to other CPUs. On a 128-CPU Kunpeng 920
   platform, it takes only about 5ms to scan the PTs of 256GB of RAM
   (using mempress, with almost all PTs established). We dispatch the
   workload iteratively (each CPU scans the PTs of only 512MB of RAM per
   iteration), so physical CPUs are not seriously affected.

3. About correctness: The DBM bit is only added when a PTE is already
   writable, so we still have read-only PTEs, and mechanisms that rely on
   read-only PTs are not broken.

4. About PT modification races: There are two kinds of PT modification.

   The first is adding or clearing specific bits, such as AF or RW. All
   these operations have been converted to be atomic, to avoid covering
   the dirty status set by hardware.

   The second is replacement, such as PTE unmapping or changing. All these
   operations eventually invoke kvm_set_pte(). kvm_set_pte() has been
   converted to be atomic, and we save the dirty status to the underlying
   bitmap if the dirty status is covered.
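
As a rough illustration of the HW/SW dynamic switch in point 2, userspace
could toggle the hw dirty log as sketched below. This is only a sketch,
not part of the series: it assumes the KVM_CAP_ARM_HW_DIRTY_LOG name and
the args[0] bit 0 meaning proposed in patch 6, and that vm_fd is an
already created VM file descriptor.

    #include <linux/kvm.h>
    #include <stdbool.h>
    #include <string.h>
    #include <sys/ioctl.h>

    /* Toggle the (proposed) hw dirty log on a VM fd; returns the ioctl result. */
    static int set_hw_dirty_log(int vm_fd, bool enable)
    {
            struct kvm_enable_cap cap;

            memset(&cap, 0, sizeof(cap));
            cap.cap = KVM_CAP_ARM_HW_DIRTY_LOG;   /* added by patch 6 */
            cap.args[0] = enable ? 0x1 : 0x0;     /* bit 0: hw dirty log */

            return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
    }

A typical flow would enable the hw dirty log when dirty logging starts and
disable it again shortly before stopping the VM, so that the (possibly
long) final sync does not add to VM downtime.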

Change log:

v2:
 - Address Steven's comments.
 - Add support of parallel dirty log sync.
 - Simplify and merge patches of v1.

v1:
 - Address Catalin's comments.

Keqian Zhu (8):
  KVM: arm64: Set DBM bit for writable PTEs
  KVM: arm64: Scan PTEs to sync dirty log
  KVM: arm64: Modify stage2 young mechanism to support hw DBM
  KVM: arm64: Save stage2 PTE dirty status if it is covered
  KVM: arm64: Steply write protect page table by mask bit
  KVM: arm64: Add KVM_CAP_ARM_HW_DIRTY_LOG capability
  KVM: arm64: Sync dirty log parallel
  KVM: Omit dirty log sync in log clear if initially all set

 arch/arm64/include/asm/kvm_host.h |   5 +
 arch/arm64/include/asm/kvm_mmu.h  |  43 ++++-
 arch/arm64/kvm/arm.c              |  45 ++++-
 arch/arm64/kvm/mmu.c              | 307 ++++++++++++++++++++++++++++--
 arch/arm64/kvm/reset.c            |   5 +
 include/uapi/linux/kvm.h          |   1 +
 tools/include/uapi/linux/kvm.h    |   1 +
 virt/kvm/kvm_main.c               |   3 +-
 8 files changed, 389 insertions(+), 21 deletions(-)

-- 
2.19.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/8] KVM: arm64: Set DBM bit for writable PTEs
  2020-07-02 13:55 [PATCH v2 0/8] KVM: arm64: Support HW dirty log based on DBM Keqian Zhu
@ 2020-07-02 13:55 ` Keqian Zhu
  2020-07-02 13:55 ` [PATCH v2 2/8] KVM: arm64: Scan PTEs to sync dirty log Keqian Zhu
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Keqian Zhu @ 2020-07-02 13:55 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm, kvm
  Cc: Catalin Marinas, Marc Zyngier, James Morse, Will Deacon,
	Suzuki K Poulose, Steven Price, Sean Christopherson,
	Julien Thierry, Mark Brown, Thomas Gleixner, Andrew Morton,
	Alexios Zavras, liangpeng10, zhengxiang9, wanghaibin.wang,
	Keqian Zhu

The DBM bit is used by the MMU to differentiate a genuinely non-writable
page from a page that is only temporarily non-writable in order to track
dirty state.

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Peng Liang <liangpeng10@huawei.com>
---
 arch/arm64/include/asm/kvm_mmu.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index b12bfc1f051a..2700442b0f75 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -169,6 +169,10 @@ void kvm_clear_hyp_idmap(void);
 static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
 {
 	pte_val(pte) |= PTE_S2_RDWR;
+
+	if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM))
+		pte_val(pte) |= PTE_DBM;
+
 	return pte;
 }
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 2/8] KVM: arm64: Scan PTEs to sync dirty log
  2020-07-02 13:55 [PATCH v2 0/8] KVM: arm64: Support HW dirty log based on DBM Keqian Zhu
  2020-07-02 13:55 ` [PATCH v2 1/8] KVM: arm64: Set DBM bit for writable PTEs Keqian Zhu
@ 2020-07-02 13:55 ` Keqian Zhu
  2020-07-02 13:55 ` [PATCH v2 3/8] KVM: arm64: Modify stage2 young mechanism to support hw DBM Keqian Zhu
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Keqian Zhu @ 2020-07-02 13:55 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm, kvm
  Cc: Catalin Marinas, Marc Zyngier, James Morse, Will Deacon,
	Suzuki K Poulose, Steven Price, Sean Christopherson,
	Julien Thierry, Mark Brown, Thomas Gleixner, Andrew Morton,
	Alexios Zavras, liangpeng10, zhengxiang9, wanghaibin.wang,
	Keqian Zhu

With hardware management of dirty state, the dirty state is stored in
the PTEs. We have to scan all PTEs to sync the dirty log to the memslot
dirty bitmap.

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Peng Liang <liangpeng10@huawei.com>
---
 arch/arm64/include/asm/kvm_host.h |   1 +
 arch/arm64/include/asm/kvm_mmu.h  |  13 +++
 arch/arm64/kvm/arm.c              |   3 +-
 arch/arm64/kvm/mmu.c              | 142 ++++++++++++++++++++++++++++++
 4 files changed, 158 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index c3e6fcc664b1..86b9c210ba43 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -480,6 +480,7 @@ u64 __kvm_call_hyp(void *hypfn, ...);
 
 void force_vm_exit(const cpumask_t *mask);
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
+void kvm_mmu_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot);
 
 int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		int exception_index);
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 2700442b0f75..4c12b7ad8ae8 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -269,6 +269,19 @@ static inline bool kvm_s2pud_young(pud_t pud)
 	return pud_young(pud);
 }
 
+static inline bool arm_mmu_hw_dbm_supported(void)
+{
+	u8 hadbs = (read_sysreg(id_aa64mmfr1_el1) >>
+		ID_AA64MMFR1_HADBS_SHIFT) & 0xf;
+
+	return hadbs == 0x2;
+}
+
+static inline bool kvm_mmu_hw_dbm_enabled(struct kvm *kvm)
+{
+	return arm_mmu_hw_dbm_supported() && !!(kvm->arch.vtcr & VTCR_EL2_HD);
+}
+
 #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)
 
 #ifdef __PAGETABLE_PMD_FOLDED
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 90cb90561446..fefa5406e037 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1199,7 +1199,8 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 
 void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
 {
-
+	if (IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && kvm_mmu_hw_dbm_enabled(kvm))
+		kvm_mmu_sync_dirty_log(kvm, memslot);
 }
 
 void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8c0035cab6b6..b3cb8b6da4c2 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2411,6 +2411,148 @@ int kvm_mmu_init(void)
 	return err;
 }
 
+#ifdef CONFIG_ARM64_HW_AFDBM
+/**
+ * stage2_sync_dirty_log_ptes() - synchronize dirty log from PMD range
+ * @kvm:	The KVM pointer
+ * @pmd:	pointer to pmd entry
+ * @addr:	range start address
+ * @end:	range end address
+ */
+static void stage2_sync_dirty_log_ptes(struct kvm *kvm, pmd_t *pmd,
+				       phys_addr_t addr, phys_addr_t end)
+{
+	pte_t *pte;
+
+	pte = pte_offset_kernel(pmd, addr);
+	do {
+		if (!pte_none(*pte) && !kvm_s2pte_readonly(pte))
+			mark_page_dirty(kvm, addr >> PAGE_SHIFT);
+	} while (pte++, addr += PAGE_SIZE, addr != end);
+}
+
+/**
+ * stage2_sync_dirty_log_pmds() - synchronize dirty log from PUD range
+ * @kvm:	The KVM pointer
+ * @pud:	pointer to pud entry
+ * @addr:	range start address
+ * @end:	range end address
+ */
+static void stage2_sync_dirty_log_pmds(struct kvm *kvm, pud_t *pud,
+				       phys_addr_t addr, phys_addr_t end)
+{
+	pmd_t *pmd;
+	phys_addr_t next;
+
+	pmd = stage2_pmd_offset(kvm, pud, addr);
+	do {
+		next = stage2_pmd_addr_end(kvm, addr, end);
+		if (!pmd_none(*pmd) && !pmd_thp_or_huge(*pmd))
+			stage2_sync_dirty_log_ptes(kvm, pmd, addr, next);
+	} while (pmd++, addr = next, addr != end);
+}
+
+/**
+ * stage2_sync_dirty_log_puds() - synchronize dirty log from P4D range
+ * @kvm:	The KVM pointer
+ * @p4d:	pointer to p4d entry
+ * @addr:	range start address
+ * @end:	range end address
+ */
+static void stage2_sync_dirty_log_puds(struct kvm *kvm, p4d_t *p4d,
+				       phys_addr_t addr, phys_addr_t end)
+{
+	pud_t *pud;
+	phys_addr_t next;
+
+	pud = stage2_pud_offset(kvm, p4d, addr);
+	do {
+		next = stage2_pud_addr_end(kvm, addr, end);
+		if (!stage2_pud_none(kvm, *pud) && !stage2_pud_huge(kvm, *pud))
+			stage2_sync_dirty_log_pmds(kvm, pud, addr, next);
+	} while (pud++, addr = next, addr != end);
+}
+
+/**
+ * stage2_sync_dirty_log_p4ds() - synchronize dirty log from PGD range
+ * @kvm:	The KVM pointer
+ * @pgd:	pointer to pgd entry
+ * @addr:	range start address
+ * @end:	range end address
+ */
+static void stage2_sync_dirty_log_p4ds(struct kvm *kvm, pgd_t *pgd,
+				       phys_addr_t addr, phys_addr_t end)
+{
+	p4d_t *p4d;
+	phys_addr_t next;
+
+	p4d = stage2_p4d_offset(kvm, pgd, addr);
+	do {
+		next = stage2_p4d_addr_end(kvm, addr, end);
+		if (!stage2_p4d_none(kvm, *p4d))
+			stage2_sync_dirty_log_puds(kvm, p4d, addr, next);
+	} while (p4d++, addr = next, addr != end);
+}
+
+/**
+ * stage2_sync_dirty_log_range() - synchronize dirty log from stage2 memory
+ * region range
+ * @kvm:	The KVM pointer
+ * @addr:	Start address of range
+ * @end:	End address of range
+ */
+static void stage2_sync_dirty_log_range(struct kvm *kvm, phys_addr_t addr,
+					phys_addr_t end)
+{
+	pgd_t *pgd;
+	phys_addr_t next;
+
+	pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
+	do {
+		cond_resched_lock(&kvm->mmu_lock);
+		if (!READ_ONCE(kvm->arch.pgd))
+			break;
+		next = stage2_pgd_addr_end(kvm, addr, end);
+		if (stage2_pgd_present(kvm, *pgd))
+			stage2_sync_dirty_log_p4ds(kvm, pgd, addr, next);
+	} while (pgd++, addr = next, addr != end);
+}
+
+/**
+ * kvm_mmu_sync_dirty_log() - synchronize dirty log from stage2 PTEs for
+ * memory slot
+ * @kvm:	The KVM pointer
+ * @memslot:	The memory slot to synchronize dirty log
+ *
+ * Called to synchronize the dirty log (as marked by hw) after a
+ * KVM_GET_DIRTY_LOG operation on a memory region. After this function
+ * returns, all dirty log information is collected into the memslot
+ * dirty_bitmap (hw may modify page tables during this routine, so this
+ * is strictly true only when the guest is stopped, but no dirty log is
+ * missed in the end). Afterwards dirty_bitmap can be copied to userspace.
+ *
+ * Acquires kvm_mmu_lock. Called with kvm->slots_lock mutex acquired,
+ * serializing operations for VM memory regions.
+ */
+void kvm_mmu_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
+{
+	phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
+	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
+	int srcu_idx;
+
+	if (WARN_ON_ONCE(!memslot->dirty_bitmap))
+		return;
+
+	srcu_idx = srcu_read_lock(&kvm->srcu);
+	spin_lock(&kvm->mmu_lock);
+
+	stage2_sync_dirty_log_range(kvm, start, end);
+
+	spin_unlock(&kvm->mmu_lock);
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
+}
+#endif /* CONFIG_ARM64_HW_AFDBM */
+
 void kvm_arch_commit_memory_region(struct kvm *kvm,
 				   const struct kvm_userspace_memory_region *mem,
 				   struct kvm_memory_slot *old,
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 3/8] KVM: arm64: Modify stage2 young mechanism to support hw DBM
  2020-07-02 13:55 [PATCH v2 0/8] KVM: arm64: Support HW dirty log based on DBM Keqian Zhu
  2020-07-02 13:55 ` [PATCH v2 1/8] KVM: arm64: Set DBM bit for writable PTEs Keqian Zhu
  2020-07-02 13:55 ` [PATCH v2 2/8] KVM: arm64: Scan PTEs to sync dirty log Keqian Zhu
@ 2020-07-02 13:55 ` Keqian Zhu
  2020-07-02 13:55 ` [PATCH v2 4/8] KVM: arm64: Save stage2 PTE dirty status if it is covered Keqian Zhu
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Keqian Zhu @ 2020-07-02 13:55 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm, kvm
  Cc: Catalin Marinas, Marc Zyngier, James Morse, Will Deacon,
	Suzuki K Poulose, Steven Price, Sean Christopherson,
	Julien Thierry, Mark Brown, Thomas Gleixner, Andrew Morton,
	Alexios Zavras, liangpeng10, zhengxiang9, wanghaibin.wang,
	Keqian Zhu

Marking PTs young (setting the AF bit) should be atomic, to avoid
covering the dirty status set by hardware.

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Peng Liang <liangpeng10@huawei.com>
---
 arch/arm64/include/asm/kvm_mmu.h | 31 ++++++++++++++++++++++---------
 arch/arm64/kvm/mmu.c             | 15 ++++++++-------
 2 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 4c12b7ad8ae8..a1b6131d980c 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -219,6 +219,18 @@ static inline void kvm_set_s2pte_readonly(pte_t *ptep)
 	} while (pteval != old_pteval);
 }
 
+static inline void kvm_set_s2pte_young(pte_t *ptep)
+{
+	pteval_t old_pteval, pteval;
+
+	pteval = READ_ONCE(pte_val(*ptep));
+	do {
+		old_pteval = pteval;
+		pteval |= PTE_AF;
+		pteval = cmpxchg_relaxed(&pte_val(*ptep), old_pteval, pteval);
+	} while (pteval != old_pteval);
+}
+
 static inline bool kvm_s2pte_readonly(pte_t *ptep)
 {
 	return (READ_ONCE(pte_val(*ptep)) & PTE_S2_RDWR) == PTE_S2_RDONLY;
@@ -234,6 +246,11 @@ static inline void kvm_set_s2pmd_readonly(pmd_t *pmdp)
 	kvm_set_s2pte_readonly((pte_t *)pmdp);
 }
 
+static inline void kvm_set_s2pmd_young(pmd_t *pmdp)
+{
+	kvm_set_s2pte_young((pte_t *)pmdp);
+}
+
 static inline bool kvm_s2pmd_readonly(pmd_t *pmdp)
 {
 	return kvm_s2pte_readonly((pte_t *)pmdp);
@@ -249,6 +266,11 @@ static inline void kvm_set_s2pud_readonly(pud_t *pudp)
 	kvm_set_s2pte_readonly((pte_t *)pudp);
 }
 
+static inline void kvm_set_s2pud_young(pud_t *pudp)
+{
+	kvm_set_s2pte_young((pte_t *)pudp);
+}
+
 static inline bool kvm_s2pud_readonly(pud_t *pudp)
 {
 	return kvm_s2pte_readonly((pte_t *)pudp);
@@ -259,15 +281,6 @@ static inline bool kvm_s2pud_exec(pud_t *pudp)
 	return !(READ_ONCE(pud_val(*pudp)) & PUD_S2_XN);
 }
 
-static inline pud_t kvm_s2pud_mkyoung(pud_t pud)
-{
-	return pud_mkyoung(pud);
-}
-
-static inline bool kvm_s2pud_young(pud_t pud)
-{
-	return pud_young(pud);
-}
 
 static inline bool arm_mmu_hw_dbm_supported(void)
 {
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index b3cb8b6da4c2..ab8a6ceecbd8 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2008,8 +2008,9 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
  * Resolve the access fault by making the page young again.
  * Note that because the faulting entry is guaranteed not to be
  * cached in the TLB, we don't need to invalidate anything.
- * Only the HW Access Flag updates are supported for Stage 2 (no DBM),
- * so there is no need for atomic (pte|pmd)_mkyoung operations.
+ *
+ * Note: Both DBM and HW AF updates are supported for Stage2, so
+ * young operations should be atomic.
  */
 static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 {
@@ -2027,15 +2028,15 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
 		goto out;
 
 	if (pud) {		/* HugeTLB */
-		*pud = kvm_s2pud_mkyoung(*pud);
+		kvm_set_s2pud_young(pud);
 		pfn = kvm_pud_pfn(*pud);
 		pfn_valid = true;
 	} else	if (pmd) {	/* THP, HugeTLB */
-		*pmd = pmd_mkyoung(*pmd);
+		kvm_set_s2pmd_young(pmd);
 		pfn = pmd_pfn(*pmd);
 		pfn_valid = true;
-	} else {
-		*pte = pte_mkyoung(*pte);	/* Just a page... */
+	} else {		/* Just a page... */
+		kvm_set_s2pte_young(pte);
 		pfn = pte_pfn(*pte);
 		pfn_valid = true;
 	}
@@ -2280,7 +2281,7 @@ static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, u64 size, void *
 		return 0;
 
 	if (pud)
-		return kvm_s2pud_young(*pud);
+		return pud_young(*pud);
 	else if (pmd)
 		return pmd_young(*pmd);
 	else
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 4/8] KVM: arm64: Save stage2 PTE dirty status if it is covered
  2020-07-02 13:55 [PATCH v2 0/8] KVM: arm64: Support HW dirty log based on DBM Keqian Zhu
                   ` (2 preceding siblings ...)
  2020-07-02 13:55 ` [PATCH v2 3/8] KVM: arm64: Modify stage2 young mechanism to support hw DBM Keqian Zhu
@ 2020-07-02 13:55 ` Keqian Zhu
  2020-07-02 13:55 ` [PATCH v2 5/8] KVM: arm64: Steply write protect page table by mask bit Keqian Zhu
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Keqian Zhu @ 2020-07-02 13:55 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm, kvm
  Cc: Catalin Marinas, Marc Zyngier, James Morse, Will Deacon,
	Suzuki K Poulose, Steven Price, Sean Christopherson,
	Julien Thierry, Mark Brown, Thomas Gleixner, Andrew Morton,
	Alexios Zavras, liangpeng10, zhengxiang9, wanghaibin.wang,
	Keqian Zhu

kvm_set_pte() is called to replace a target PTE with a desired one. We
always do this without changing the desired one, but if the dirty status
set by hardware is covered, let the caller know it.

There are two types of operations that will change a PTE and may cover
the dirty status set by hardware.

1. Stage2 PTE unmapping: page table merging (the revert of huge page
table dissolving), kvm_unmap_hva_range() and so on.

2. Stage2 PTE changing: including user_mem_abort(),
kvm_mmu_notifier_change_pte() and so on.

All the operations above eventually invoke kvm_set_pte(). We should
save the dirty status into the memslot bitmap.

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Peng Liang <liangpeng10@huawei.com>
---
 arch/arm64/include/asm/kvm_mmu.h |  5 ++++
 arch/arm64/kvm/mmu.c             | 42 ++++++++++++++++++++++++++++----
 2 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index a1b6131d980c..adb5c8edb29e 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -295,6 +295,11 @@ static inline bool kvm_mmu_hw_dbm_enabled(struct kvm *kvm)
 	return arm_mmu_hw_dbm_supported() && !!(kvm->arch.vtcr & VTCR_EL2_HD);
 }
 
+static inline bool kvm_s2pte_dbm(pte_t *ptep)
+{
+	return !!(READ_ONCE(pte_val(*ptep)) & PTE_DBM);
+}
+
 #define hyp_pte_table_empty(ptep) kvm_page_empty(ptep)
 
 #ifdef __PAGETABLE_PMD_FOLDED
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index ab8a6ceecbd8..d0c34549ef3b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -194,10 +194,26 @@ static void clear_stage2_pmd_entry(struct kvm *kvm, pmd_t *pmd, phys_addr_t addr
 	put_page(virt_to_page(pmd));
 }
 
-static inline void kvm_set_pte(pte_t *ptep, pte_t new_pte)
+/**
+ * @ret: true if dirty status set by hardware is covered.
+ */
+static inline bool kvm_set_pte(pte_t *ptep, pte_t new_pte)
 {
-	WRITE_ONCE(*ptep, new_pte);
-	dsb(ishst);
+	pteval_t old_pteval;
+	bool old_logging, new_no_write;
+
+	old_logging = IS_ENABLED(CONFIG_ARM64_HW_AFDBM) &&
+		arm_mmu_hw_dbm_supported() && kvm_s2pte_dbm(ptep);
+	new_no_write = pte_none(new_pte) || kvm_s2pte_readonly(&new_pte);
+
+	if (!old_logging || !new_no_write) {
+		WRITE_ONCE(*ptep, new_pte);
+		dsb(ishst);
+		return false;
+	}
+
+	old_pteval = xchg(&pte_val(*ptep), pte_val(new_pte));
+	return !kvm_s2pte_readonly(&__pte(old_pteval));
 }
 
 static inline void kvm_set_pmd(pmd_t *pmdp, pmd_t new_pmd)
@@ -260,15 +276,23 @@ static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
 {
 	phys_addr_t start_addr = addr;
 	pte_t *pte, *start_pte;
+	bool dirty_covered;
+	int idx;
 
 	start_pte = pte = pte_offset_kernel(pmd, addr);
 	do {
 		if (!pte_none(*pte)) {
 			pte_t old_pte = *pte;
 
-			kvm_set_pte(pte, __pte(0));
+			dirty_covered = kvm_set_pte(pte, __pte(0));
 			kvm_tlb_flush_vmid_ipa(kvm, addr);
 
+			if (dirty_covered) {
+				idx = srcu_read_lock(&kvm->srcu);
+				mark_page_dirty(kvm, addr >> PAGE_SHIFT);
+				srcu_read_unlock(&kvm->srcu, idx);
+			}
+
 			/* No need to invalidate the cache for device mappings */
 			if (!kvm_is_device_pfn(pte_pfn(old_pte)))
 				kvm_flush_dcache_pte(old_pte);
@@ -1354,6 +1378,8 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 	pte_t *pte, old_pte;
 	bool iomap = flags & KVM_S2PTE_FLAG_IS_IOMAP;
 	bool logging_active = flags & KVM_S2_FLAG_LOGGING_ACTIVE;
+	bool dirty_covered;
+	int idx;
 
 	VM_BUG_ON(logging_active && !cache);
 
@@ -1419,8 +1445,14 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 		if (pte_val(old_pte) == pte_val(*new_pte))
 			return 0;
 
-		kvm_set_pte(pte, __pte(0));
+		dirty_covered = kvm_set_pte(pte, __pte(0));
 		kvm_tlb_flush_vmid_ipa(kvm, addr);
+
+		if (dirty_covered) {
+			idx = srcu_read_lock(&kvm->srcu);
+			mark_page_dirty(kvm, addr >> PAGE_SHIFT);
+			srcu_read_unlock(&kvm->srcu, idx);
+		}
 	} else {
 		get_page(virt_to_page(pte));
 	}
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 5/8] KVM: arm64: Steply write protect page table by mask bit
  2020-07-02 13:55 [PATCH v2 0/8] KVM: arm64: Support HW dirty log based on DBM Keqian Zhu
                   ` (3 preceding siblings ...)
  2020-07-02 13:55 ` [PATCH v2 4/8] KVM: arm64: Save stage2 PTE dirty status if it is covered Keqian Zhu
@ 2020-07-02 13:55 ` Keqian Zhu
  2020-07-02 13:55 ` [PATCH v2 6/8] KVM: arm64: Add KVM_CAP_ARM_HW_DIRTY_LOG capability Keqian Zhu
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 10+ messages in thread
From: Keqian Zhu @ 2020-07-02 13:55 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm, kvm
  Cc: Catalin Marinas, Marc Zyngier, James Morse, Will Deacon,
	Suzuki K Poulose, Steven Price, Sean Christopherson,
	Julien Thierry, Mark Brown, Thomas Gleixner, Andrew Morton,
	Alexios Zavras, liangpeng10, zhengxiang9, wanghaibin.wang,
	Keqian Zhu

During dirty log clear, page table entries are write protected according
to a mask. Previously we write protected all entries corresponding to the
mask, from ffs to fls. Although there may be zero bits within this range,
we are holding the kvm mmu lock, so we will not write protect entries
that we do not want to.

We are about to add support for hardware management of dirty state on
arm64, and holding the kvm mmu lock will no longer be enough. We should
write protect entries step by step according to the mask bits.

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Peng Liang <liangpeng10@huawei.com>
---
 arch/arm64/kvm/mmu.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index d0c34549ef3b..adfa62f1fced 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1703,10 +1703,16 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 		gfn_t gfn_offset, unsigned long mask)
 {
 	phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
-	phys_addr_t start = (base_gfn +  __ffs(mask)) << PAGE_SHIFT;
-	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;
+	phys_addr_t start, end;
+	u32 i;
 
-	stage2_wp_range(kvm, start, end);
+	for (i = __ffs(mask); i <= __fls(mask); i++) {
+		if (test_bit_le(i, &mask)) {
+			start = (base_gfn + i) << PAGE_SHIFT;
+			end = (base_gfn + i + 1) << PAGE_SHIFT;
+			stage2_wp_range(kvm, start, end);
+		}
+	}
 }
 
 /*
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 6/8] KVM: arm64: Add KVM_CAP_ARM_HW_DIRTY_LOG capability
  2020-07-02 13:55 [PATCH v2 0/8] KVM: arm64: Support HW dirty log based on DBM Keqian Zhu
                   ` (4 preceding siblings ...)
  2020-07-02 13:55 ` [PATCH v2 5/8] KVM: arm64: Steply write protect page table by mask bit Keqian Zhu
@ 2020-07-02 13:55 ` Keqian Zhu
  2020-07-06  1:08   ` zhukeqian
  2020-07-02 13:55 ` [PATCH v2 7/8] KVM: arm64: Sync dirty log parallel Keqian Zhu
  2020-07-02 13:55 ` [PATCH v2 8/8] KVM: Omit dirty log sync in log clear if initially all set Keqian Zhu
  7 siblings, 1 reply; 10+ messages in thread
From: Keqian Zhu @ 2020-07-02 13:55 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm, kvm
  Cc: Catalin Marinas, Marc Zyngier, James Morse, Will Deacon,
	Suzuki K Poulose, Steven Price, Sean Christopherson,
	Julien Thierry, Mark Brown, Thomas Gleixner, Andrew Morton,
	Alexios Zavras, liangpeng10, zhengxiang9, wanghaibin.wang,
	Keqian Zhu

Since using arm64 DBM to log dirty pages has the side effect of a long
dirty log sync time, we should give userspace the opportunity to enable
or disable this feature, so that it can implement its own policy.

This feature is disabled by default.
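
For illustration only (not part of this patch), userspace would typically
probe this capability before enabling it. A minimal sketch, assuming vm_fd
is an existing VM file descriptor:

    /* KVM_CHECK_EXTENSION reports the (proposed) cap; 0 means unsupported. */
    if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_HW_DIRTY_LOG) > 0) {
            struct kvm_enable_cap cap = {
                    .cap     = KVM_CAP_ARM_HW_DIRTY_LOG,
                    .args[0] = 0x1,         /* enable hw dirty log */
            };
            ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
    }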

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Peng Liang <liangpeng10@huawei.com>
---
 arch/arm64/include/asm/kvm_host.h |  1 +
 arch/arm64/kvm/arm.c              | 35 +++++++++++++++++++++++++++++++
 arch/arm64/kvm/mmu.c              | 22 +++++++++++++++++++
 arch/arm64/kvm/reset.c            |  5 +++++
 include/uapi/linux/kvm.h          |  1 +
 tools/include/uapi/linux/kvm.h    |  1 +
 6 files changed, 65 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 86b9c210ba43..69a5317c7049 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -481,6 +481,7 @@ u64 __kvm_call_hyp(void *hypfn, ...);
 void force_vm_exit(const cpumask_t *mask);
 void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 void kvm_mmu_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot);
+void kvm_mmu_sync_dirty_log_all(struct kvm *kvm);
 
 int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		int exception_index);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index fefa5406e037..9e3f765d5467 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -78,6 +78,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 			    struct kvm_enable_cap *cap)
 {
 	int r;
+#ifdef CONFIG_ARM64_HW_AFDBM
+	int i;
+	struct kvm_vcpu *vcpu;
+	bool enable_hw_dirty_log;
+#endif
 
 	if (cap->flags)
 		return -EINVAL;
@@ -87,6 +92,36 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		r = 0;
 		kvm->arch.return_nisv_io_abort_to_user = true;
 		break;
+#ifdef CONFIG_ARM64_HW_AFDBM
+	case KVM_CAP_ARM_HW_DIRTY_LOG:
+		if (!arm_mmu_hw_dbm_supported() || (cap->args[0] & ~0x1))
+			r = -EINVAL;
+
+		enable_hw_dirty_log = !!(cap->args[0] & 0x1);
+		if (!!(kvm->arch.vtcr & VTCR_EL2_HD) != enable_hw_dirty_log) {
+			if (enable_hw_dirty_log)
+				kvm->arch.vtcr |= VTCR_EL2_HD;
+			else
+				kvm->arch.vtcr &= ~VTCR_EL2_HD;
+
+			/*
+			 * We should kick vcpus out of guest mode here to
+			 * load new vtcr value to vtcr_el2 register when
+			 * re-enter guest mode.
+			 */
+			kvm_for_each_vcpu(i, vcpu, kvm)
+				kvm_vcpu_kick(vcpu);
+
+			if (!enable_hw_dirty_log) {
+				mutex_lock(&kvm->slots_lock);
+				kvm_mmu_sync_dirty_log_all(kvm);
+				mutex_unlock(&kvm->slots_lock);
+			}
+		}
+
+		r = 0;
+		break;
+#endif
 	default:
 		r = -EINVAL;
 		break;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index adfa62f1fced..1a48554accb0 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2590,6 +2590,28 @@ void kvm_mmu_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
 	spin_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, srcu_idx);
 }
+
+/**
+ * kvm_mmu_sync_dirty_log_all() - synchronize dirty log from PTEs for whole VM
+ * @kvm:	The KVM pointer
+ *
+ * Called with kvm->slots_lock mutex acquired
+ */
+void kvm_mmu_sync_dirty_log_all(struct kvm *kvm)
+{
+	struct kvm_memslots *slots = kvm_memslots(kvm);
+	struct kvm_memory_slot *memslots = slots->memslots;
+	struct kvm_memory_slot *memslot;
+	int slot;
+
+	if (unlikely(!slots->used_slots))
+		return;
+
+	for (slot = 0; slot < slots->used_slots; slot++) {
+		memslot = &memslots[slot];
+		kvm_mmu_sync_dirty_log(kvm, memslot);
+	}
+}
 #endif /* CONFIG_ARM64_HW_AFDBM */
 
 void kvm_arch_commit_memory_region(struct kvm *kvm,
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index d3b209023727..a3be703dd54b 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -83,6 +83,11 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = has_vhe() && system_supports_address_auth() &&
 				 system_supports_generic_auth();
 		break;
+#ifdef CONFIG_ARM64_HW_AFDBM
+	case KVM_CAP_ARM_HW_DIRTY_LOG:
+		r = arm_mmu_hw_dbm_supported();
+		break;
+#endif /* CONFIG_ARM64_HW_AFDBM */
 	default:
 		r = 0;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 4fdf30316582..e0b12c43397b 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1031,6 +1031,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_PPC_SECURE_GUEST 181
 #define KVM_CAP_HALT_POLL 182
 #define KVM_CAP_ASYNC_PF_INT 183
+#define KVM_CAP_ARM_HW_DIRTY_LOG 184
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index 4fdf30316582..e0b12c43397b 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -1031,6 +1031,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_PPC_SECURE_GUEST 181
 #define KVM_CAP_HALT_POLL 182
 #define KVM_CAP_ASYNC_PF_INT 183
+#define KVM_CAP_ARM_HW_DIRTY_LOG 184
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 7/8] KVM: arm64: Sync dirty log parallel
  2020-07-02 13:55 [PATCH v2 0/8] KVM: arm64: Support HW dirty log based on DBM Keqian Zhu
                   ` (5 preceding siblings ...)
  2020-07-02 13:55 ` [PATCH v2 6/8] KVM: arm64: Add KVM_CAP_ARM_HW_DIRTY_LOG capability Keqian Zhu
@ 2020-07-02 13:55 ` Keqian Zhu
  2020-07-02 13:55 ` [PATCH v2 8/8] KVM: Omit dirty log sync in log clear if initially all set Keqian Zhu
  7 siblings, 0 replies; 10+ messages in thread
From: Keqian Zhu @ 2020-07-02 13:55 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm, kvm
  Cc: Catalin Marinas, Marc Zyngier, James Morse, Will Deacon,
	Suzuki K Poulose, Steven Price, Sean Christopherson,
	Julien Thierry, Mark Brown, Thomas Gleixner, Andrew Morton,
	Alexios Zavras, liangpeng10, zhengxiang9, wanghaibin.wang,
	Keqian Zhu

Give userspace another option for dealing with the high cost of dirty
log sync, called multi-core offload. Userspace can enable this policy
through KVM_CAP_ARM_HW_DIRTY_LOG.
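
A minimal sketch of enabling both features, assuming the args[0] bit
layout used in this patch (bit 0: hw dirty log, bit 1: parallel sync) and
an existing VM file descriptor vm_fd:

    struct kvm_enable_cap cap = {
            .cap     = KVM_CAP_ARM_HW_DIRTY_LOG,
            .args[0] = 0x3,         /* hw dirty log + parallel sync */
    };
    ioctl(vm_fd, KVM_ENABLE_CAP, &cap);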

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
Signed-off-by: Peng Liang <liangpeng10@huawei.com>
---
 arch/arm64/include/asm/kvm_host.h |  3 ++
 arch/arm64/kvm/arm.c              |  9 +++-
 arch/arm64/kvm/mmu.c              | 82 +++++++++++++++++++++++++++++--
 arch/arm64/kvm/reset.c            |  2 +-
 4 files changed, 90 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 69a5317c7049..05da819f9adc 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -95,6 +95,9 @@ struct kvm_arch {
 	 * supported.
 	 */
 	bool return_nisv_io_abort_to_user;
+
+	/* Sync dirty log parallel when hw dirty log enabled */
+	bool sync_dirty_log_parallel;
 };
 
 #define KVM_NR_MEM_OBJS     40
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 9e3f765d5467..89614984831d 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -82,6 +82,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 	int i;
 	struct kvm_vcpu *vcpu;
 	bool enable_hw_dirty_log;
+	bool enable_sync_parallel;
 #endif
 
 	if (cap->flags)
@@ -94,10 +95,14 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		break;
 #ifdef CONFIG_ARM64_HW_AFDBM
 	case KVM_CAP_ARM_HW_DIRTY_LOG:
-		if (!arm_mmu_hw_dbm_supported() || (cap->args[0] & ~0x1))
+		if (!arm_mmu_hw_dbm_supported() || (cap->args[0] & ~0x3))
 			r = -EINVAL;
 
 		enable_hw_dirty_log = !!(cap->args[0] & 0x1);
+		enable_sync_parallel = !!(cap->args[0] & 0x2);
+		if (!enable_hw_dirty_log && enable_sync_parallel)
+			r = -EINVAL;
+
 		if (!!(kvm->arch.vtcr & VTCR_EL2_HD) != enable_hw_dirty_log) {
 			if (enable_hw_dirty_log)
 				kvm->arch.vtcr |= VTCR_EL2_HD;
@@ -119,6 +124,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 			}
 		}
 
+		kvm->arch.sync_dirty_log_parallel = enable_sync_parallel;
+
 		r = 0;
 		break;
 #endif
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 1a48554accb0..be360e0fd20b 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -2548,15 +2548,55 @@ static void stage2_sync_dirty_log_range(struct kvm *kvm, phys_addr_t addr,
 
 	pgd = kvm->arch.pgd + stage2_pgd_index(kvm, addr);
 	do {
-		cond_resched_lock(&kvm->mmu_lock);
-		if (!READ_ONCE(kvm->arch.pgd))
-			break;
+		if (!kvm->arch.sync_dirty_log_parallel) {
+			cond_resched_lock(&kvm->mmu_lock);
+			if (!READ_ONCE(kvm->arch.pgd))
+				break;
+		}
 		next = stage2_pgd_addr_end(kvm, addr, end);
 		if (stage2_pgd_present(kvm, *pgd))
 			stage2_sync_dirty_log_p4ds(kvm, pgd, addr, next);
 	} while (pgd++, addr = next, addr != end);
 }
 
+static struct dirty_sync_task {
+	struct kvm *kvm;
+	struct kvm_memory_slot *memslot;
+	u32 cpu_cnt;
+	u16 cpu_idx_map[NR_CPUS];
+	u32 ite_npages;
+	u32 ite;
+	bool finished;
+} sync_task;
+
+static void stage2_sync_dirty_log_smp(void *task)
+{
+	struct dirty_sync_task *t = task;
+	struct kvm_memory_slot *memslot = t->memslot;
+	unsigned long ite_idx, base_page, end_page;
+	gfn_t base_gfn;
+
+	ite_idx = t->cpu_cnt * t->ite + t->cpu_idx_map[smp_processor_id()];
+
+	base_page = ite_idx * t->ite_npages;
+	if (base_page >= memslot->npages) {
+		t->finished = true;
+		trace_printk("stage2_sync_dirty_log_smp finished 1.\n");
+		return;
+	}
+
+	end_page = min(memslot->npages, base_page + t->ite_npages);
+	if (end_page == memslot->npages) {
+		t->finished = true;
+		trace_printk("stage2_sync_dirty_log_smp finished 2.\n");
+	}
+
+	base_gfn = memslot->base_gfn;
+	trace_printk("base_page 0x%lx, end_page 0x%lx.\n", base_page, end_page);
+	stage2_sync_dirty_log_range(t->kvm, (base_gfn + base_page) << PAGE_SHIFT,
+				    (base_gfn + end_page) << PAGE_SHIFT);
+}
+
 /**
  * kvm_mmu_sync_dirty_log() - synchronize dirty log from stage2 PTEs for
  * memory slot
@@ -2577,18 +2617,52 @@ void kvm_mmu_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
 {
 	phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
 	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
+	u32 ite_npages, cpu_cnt, this_cpu, cpu;
+	u16 cpu_idx;
 	int srcu_idx;
 
 	if (WARN_ON_ONCE(!memslot->dirty_bitmap))
 		return;
 
+	get_online_cpus();
 	srcu_idx = srcu_read_lock(&kvm->srcu);
 	spin_lock(&kvm->mmu_lock);
 
-	stage2_sync_dirty_log_range(kvm, start, end);
+	ite_npages = (1 << 17); /* 512MB max for per iteration and CPU */
+	cpu_cnt = num_online_cpus();
+
+	/* Use single CPU */
+	if (!kvm->arch.sync_dirty_log_parallel ||
+	    memslot->npages <= ite_npages ||
+	    unlikely(cpu_cnt == 1)) {
+		stage2_sync_dirty_log_range(kvm, start, end);
+		goto out_unlock;
+	}
 
+	/* Use many CPUs through IPI */
+	cpu_idx = 0;
+	this_cpu = smp_processor_id();
+	for_each_online_cpu(cpu) {
+		if (cpu != this_cpu)
+			sync_task.cpu_idx_map[cpu] = cpu_idx++;
+	}
+
+	sync_task.kvm = kvm;
+	sync_task.memslot = memslot;
+	sync_task.cpu_cnt = cpu_cnt - 1; /* Not include this CPU */
+	sync_task.ite_npages = ite_npages;
+
+	sync_task.ite = 0;
+	sync_task.finished = false;
+	do {
+		smp_call_function(stage2_sync_dirty_log_smp, &sync_task, 1);
+		sync_task.ite++;
+	} while (!sync_task.finished);
+
+out_unlock:
 	spin_unlock(&kvm->mmu_lock);
 	srcu_read_unlock(&kvm->srcu, srcu_idx);
+	put_online_cpus();
 }
 
 /**
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index a3be703dd54b..4171d6c1d400 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -85,7 +85,7 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		break;
 #ifdef CONFIG_ARM64_HW_AFDBM
 	case KVM_CAP_ARM_HW_DIRTY_LOG:
-		r = arm_mmu_hw_dbm_supported();
+		r = arm_mmu_hw_dbm_supported() ? 0x3 : 0x0;
 		break;
 #endif /* CONFIG_ARM64_HW_AFDBM */
 	default:
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 8/8] KVM: Omit dirty log sync in log clear if initially all set
  2020-07-02 13:55 [PATCH v2 0/8] KVM: arm64: Support HW dirty log based on DBM Keqian Zhu
                   ` (6 preceding siblings ...)
  2020-07-02 13:55 ` [PATCH v2 7/8] KVM: arm64: Sync dirty log parallel Keqian Zhu
@ 2020-07-02 13:55 ` Keqian Zhu
  7 siblings, 0 replies; 10+ messages in thread
From: Keqian Zhu @ 2020-07-02 13:55 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm, kvm
  Cc: Catalin Marinas, Marc Zyngier, James Morse, Will Deacon,
	Suzuki K Poulose, Steven Price, Sean Christopherson,
	Julien Thierry, Mark Brown, Thomas Gleixner, Andrew Morton,
	Alexios Zavras, liangpeng10, zhengxiang9, wanghaibin.wang,
	Keqian Zhu

Synchronizing the dirty log during log clear is useful only when the
userspace dirty bitmap contains dirty bits that the memslot dirty bitmap
does not contain, because then we can sync the new dirty bits into the
memslot dirty bitmap, clear them along the way, and avoid reporting them
to userspace later.

With the dirty bitmap "initially all set" feature, the above situation
does not appear if the userspace logic is normal, so we can omit the
dirty log sync in log clear. This is valuable when dirty log sync is a
high-cost operation, such as arm64 DBM.
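
For context, a minimal sketch of how userspace opts in to the "initially
all set" behaviour via the existing KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2
capability (vm_fd is an assumed VM file descriptor); only with this
enabled does the change below skip the sync:

    struct kvm_enable_cap cap = {
            .cap     = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2,
            .args[0] = KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
                       KVM_DIRTY_LOG_INITIALLY_SET,
    };
    ioctl(vm_fd, KVM_ENABLE_CAP, &cap);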

Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
---
 virt/kvm/kvm_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a852af5c3214..3f9e51d52b7a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1554,7 +1554,8 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,
 	    (log->num_pages < memslot->npages - log->first_page && (log->num_pages & 63)))
 	    return -EINVAL;
 
-	kvm_arch_sync_dirty_log(kvm, memslot);
+	if (!kvm_dirty_log_manual_protect_and_init_set(kvm))
+		kvm_arch_sync_dirty_log(kvm, memslot);
 
 	flush = false;
 	dirty_bitmap_buffer = kvm_second_dirty_bitmap(memslot);
-- 
2.19.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 6/8] KVM: arm64: Add KVM_CAP_ARM_HW_DIRTY_LOG capability
  2020-07-02 13:55 ` [PATCH v2 6/8] KVM: arm64: Add KVM_CAP_ARM_HW_DIRTY_LOG capability Keqian Zhu
@ 2020-07-06  1:08   ` zhukeqian
  0 siblings, 0 replies; 10+ messages in thread
From: zhukeqian @ 2020-07-06  1:08 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm, kvm
  Cc: Catalin Marinas, Marc Zyngier, James Morse, Will Deacon,
	Suzuki K Poulose, Steven Price, Sean Christopherson,
	Julien Thierry, Mark Brown, Thomas Gleixner, Andrew Morton,
	Alexios Zavras, liangpeng10, zhengxiang9, wanghaibin.wang

Hi,

On 2020/7/2 21:55, Keqian Zhu wrote:
> For that using arm64 DBM to log dirty pages has the side effect
> of long time dirty log sync, we should give userspace opportunity
> to enable or disable this feature, to realize some policy.
> 
> This feature is disabled by default.
> 
> Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
> Signed-off-by: Peng Liang <liangpeng10@huawei.com>
> ---
>  arch/arm64/include/asm/kvm_host.h |  1 +
>  arch/arm64/kvm/arm.c              | 35 +++++++++++++++++++++++++++++++
>  arch/arm64/kvm/mmu.c              | 22 +++++++++++++++++++
>  arch/arm64/kvm/reset.c            |  5 +++++
>  include/uapi/linux/kvm.h          |  1 +
>  tools/include/uapi/linux/kvm.h    |  1 +
>  6 files changed, 65 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 86b9c210ba43..69a5317c7049 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -481,6 +481,7 @@ u64 __kvm_call_hyp(void *hypfn, ...);
>  void force_vm_exit(const cpumask_t *mask);
>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
>  void kvm_mmu_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot);
> +void kvm_mmu_sync_dirty_log_all(struct kvm *kvm);
>  
>  int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
>  		int exception_index);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index fefa5406e037..9e3f765d5467 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -78,6 +78,11 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  			    struct kvm_enable_cap *cap)
>  {
>  	int r;
> +#ifdef CONFIG_ARM64_HW_AFDBM
> +	int i;
> +	struct kvm_vcpu *vcpu;
> +	bool enable_hw_dirty_log;
> +#endif
>  
>  	if (cap->flags)
>  		return -EINVAL;
> @@ -87,6 +92,36 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
>  		r = 0;
>  		kvm->arch.return_nisv_io_abort_to_user = true;
>  		break;
> +#ifdef CONFIG_ARM64_HW_AFDBM
> +	case KVM_CAP_ARM_HW_DIRTY_LOG:
> +		if (!arm_mmu_hw_dbm_supported() || (cap->args[0] & ~0x1))
> +			r = -EINVAL;
Miss "break" here :-( . Code is converted from "return -EINVAL".
> +
> +		enable_hw_dirty_log = !!(cap->args[0] & 0x1);
> +		if (!!(kvm->arch.vtcr & VTCR_EL2_HD) != enable_hw_dirty_log) {
> +			if (enable_hw_dirty_log)
> +				kvm->arch.vtcr |= VTCR_EL2_HD;
> +			else
> +				kvm->arch.vtcr &= ~VTCR_EL2_HD;
> +
> +			/*
> +			 * We should kick vcpus out of guest mode here to
> +			 * load new vtcr value to vtcr_el2 register when
> +			 * re-enter guest mode.
> +			 */
> +			kvm_for_each_vcpu(i, vcpu, kvm)
> +				kvm_vcpu_kick(vcpu);
> +
> +			if (!enable_hw_dirty_log) {
> +				mutex_lock(&kvm->slots_lock);
> +				kvm_mmu_sync_dirty_log_all(kvm);
> +				mutex_unlock(&kvm->slots_lock);
> +			}
> +		}
> +
> +		r = 0;
> +		break;
> +#endif
>  	default:
>  		r = -EINVAL;
>  		break;
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index adfa62f1fced..1a48554accb0 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -2590,6 +2590,28 @@ void kvm_mmu_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
>  	spin_unlock(&kvm->mmu_lock);
>  	srcu_read_unlock(&kvm->srcu, srcu_idx);
>  }
> +
> +/**
> + * kvm_mmu_sync_dirty_log_all() - synchronize dirty log from PTEs for whole VM
> + * @kvm:	The KVM pointer
> + *
> + * Called with kvm->slots_lock mutex acquired
> + */
> +void kvm_mmu_sync_dirty_log_all(struct kvm *kvm)
> +{
> +	struct kvm_memslots *slots = kvm_memslots(kvm);
> +	struct kvm_memory_slot *memslots = slots->memslots;
> +	struct kvm_memory_slot *memslot;
> +	int slot;
> +
> +	if (unlikely(!slots->used_slots))
> +		return;
> +
> +	for (slot = 0; slot < slots->used_slots; slot++) {
> +		memslot = &memslots[slot];
> +		kvm_mmu_sync_dirty_log(kvm, memslot);
> +	}
> +}
>  #endif /* CONFIG_ARM64_HW_AFDBM */
>  
>  void kvm_arch_commit_memory_region(struct kvm *kvm,
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index d3b209023727..a3be703dd54b 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -83,6 +83,11 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  		r = has_vhe() && system_supports_address_auth() &&
>  				 system_supports_generic_auth();
>  		break;
> +#ifdef CONFIG_ARM64_HW_AFDBM
> +	case KVM_CAP_ARM_HW_DIRTY_LOG:
> +		r = arm_mmu_hw_dbm_supported();
> +		break;
> +#endif /* CONFIG_ARM64_HW_AFDBM */
>  	default:
>  		r = 0;
>  	}
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 4fdf30316582..e0b12c43397b 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1031,6 +1031,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_PPC_SECURE_GUEST 181
>  #define KVM_CAP_HALT_POLL 182
>  #define KVM_CAP_ASYNC_PF_INT 183
> +#define KVM_CAP_ARM_HW_DIRTY_LOG 184
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
> index 4fdf30316582..e0b12c43397b 100644
> --- a/tools/include/uapi/linux/kvm.h
> +++ b/tools/include/uapi/linux/kvm.h
> @@ -1031,6 +1031,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_PPC_SECURE_GUEST 181
>  #define KVM_CAP_HALT_POLL 182
>  #define KVM_CAP_ASYNC_PF_INT 183
> +#define KVM_CAP_ARM_HW_DIRTY_LOG 184
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

Thread overview: 10+ messages
2020-07-02 13:55 [PATCH v2 0/8] KVM: arm64: Support HW dirty log based on DBM Keqian Zhu
2020-07-02 13:55 ` [PATCH v2 1/8] KVM: arm64: Set DBM bit for writable PTEs Keqian Zhu
2020-07-02 13:55 ` [PATCH v2 2/8] KVM: arm64: Scan PTEs to sync dirty log Keqian Zhu
2020-07-02 13:55 ` [PATCH v2 3/8] KVM: arm64: Modify stage2 young mechanism to support hw DBM Keqian Zhu
2020-07-02 13:55 ` [PATCH v2 4/8] KVM: arm64: Save stage2 PTE dirty status if it is covered Keqian Zhu
2020-07-02 13:55 ` [PATCH v2 5/8] KVM: arm64: Steply write protect page table by mask bit Keqian Zhu
2020-07-02 13:55 ` [PATCH v2 6/8] KVM: arm64: Add KVM_CAP_ARM_HW_DIRTY_LOG capability Keqian Zhu
2020-07-06  1:08   ` zhukeqian
2020-07-02 13:55 ` [PATCH v2 7/8] KVM: arm64: Sync dirty log parallel Keqian Zhu
2020-07-02 13:55 ` [PATCH v2 8/8] KVM: Omit dirty log sync in log clear if initially all set Keqian Zhu
