kvmarm.lists.cs.columbia.edu archive mirror
* [RESEND RFC PATCH v1] arm64: kvm: flush tlbs by range in unmap_stage2_range function
@ 2020-07-24 13:43 Zhenyu Ye
  2020-07-25 17:40 ` Marc Zyngier
  0 siblings, 1 reply; 4+ messages in thread
From: Zhenyu Ye @ 2020-07-24 13:43 UTC (permalink / raw)
  To: maz, james.morse, julien.thierry.kdev, suzuki.poulose,
	catalin.marinas, will, steven.price, mark.rutland, ascull
  Cc: linux-arch, kvm, yezhenyu2, linux-kernel, linux-mm, arm, kvmarm,
	linux-arm-kernel

Currently, unmap_stage2_range() flushes the TLBs one by one, right after
the corresponding pages are cleared.  However, this may cause performance
problems when the unmap range is very large (for example, during VM
migration rollback, it can make the VM downtime too long).

This patch moves kvm_tlb_flush_vmid_ipa() out of the loop and flushes
the TLBs by range after the other operations have completed.  Because we
do not create new mappings for the pages here, this does not violate
the Break-Before-Make rules.

Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
---
 arch/arm64/include/asm/kvm_asm.h |  2 ++
 arch/arm64/kvm/hyp/tlb.c         | 36 ++++++++++++++++++++++++++++++++
 arch/arm64/kvm/mmu.c             | 11 +++++++---
 3 files changed, 46 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 352aaebf4198..ef8203d3ca45 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -61,6 +61,8 @@ extern char __kvm_hyp_vector[];
 
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
+extern void __kvm_tlb_flush_vmid_range(struct kvm *kvm, phys_addr_t start,
+				       phys_addr_t end);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
 
diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
index d063a576d511..4f4737a7e588 100644
--- a/arch/arm64/kvm/hyp/tlb.c
+++ b/arch/arm64/kvm/hyp/tlb.c
@@ -189,6 +189,42 @@ void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
 	__tlb_switch_to_host(kvm, &cxt);
 }
 
+void __hyp_text __kvm_tlb_flush_vmid_range(struct kvm *kvm, phys_addr_t start,
+					   phys_addr_t end)
+{
+	struct tlb_inv_context cxt;
+	unsigned long addr;
+
+	start = __TLBI_VADDR(start, 0);
+	end = __TLBI_VADDR(end, 0);
+
+	dsb(ishst);
+
+	/* Switch to requested VMID */
+	kvm = kern_hyp_va(kvm);
+	__tlb_switch_to_guest(kvm, &cxt);
+
+	if ((end - start) >= 512 << (PAGE_SHIFT - 12)) {
+		__tlbi(vmalls12e1is);
+		goto end;
+	}
+
+	for (addr = start; addr < end; addr += 1 << (PAGE_SHIFT - 12))
+		__tlbi(ipas2e1is, addr);
+
+	dsb(ish);
+	__tlbi(vmalle1is);
+
+end:
+	dsb(ish);
+	isb();
+
+	if (!has_vhe() && icache_is_vpipt())
+		__flush_icache_all();
+
+	__tlb_switch_to_host(kvm, &cxt);
+}
+
 void __hyp_text __kvm_tlb_flush_vmid(struct kvm *kvm)
 {
 	struct tlb_inv_context cxt;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 8c0035cab6b6..bcc719c32921 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -63,6 +63,12 @@ static void kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
 	kvm_call_hyp(__kvm_tlb_flush_vmid_ipa, kvm, ipa);
 }
 
+static void kvm_tlb_flush_vmid_range(struct kvm *kvm, phys_addr_t start,
+				     phys_addr_t end)
+{
+	kvm_call_hyp(__kvm_tlb_flush_vmid_range, kvm, start, end);
+}
+
 /*
  * D-Cache management functions. They take the page table entries by
  * value, as they are flushing the cache using the kernel mapping (or
@@ -267,7 +273,6 @@ static void unmap_stage2_ptes(struct kvm *kvm, pmd_t *pmd,
 			pte_t old_pte = *pte;
 
 			kvm_set_pte(pte, __pte(0));
-			kvm_tlb_flush_vmid_ipa(kvm, addr);
 
 			/* No need to invalidate the cache for device mappings */
 			if (!kvm_is_device_pfn(pte_pfn(old_pte)))
@@ -295,7 +300,6 @@ static void unmap_stage2_pmds(struct kvm *kvm, pud_t *pud,
 				pmd_t old_pmd = *pmd;
 
 				pmd_clear(pmd);
-				kvm_tlb_flush_vmid_ipa(kvm, addr);
 
 				kvm_flush_dcache_pmd(old_pmd);
 
@@ -324,7 +328,6 @@ static void unmap_stage2_puds(struct kvm *kvm, p4d_t *p4d,
 				pud_t old_pud = *pud;
 
 				stage2_pud_clear(kvm, pud);
-				kvm_tlb_flush_vmid_ipa(kvm, addr);
 				kvm_flush_dcache_pud(old_pud);
 				put_page(virt_to_page(pud));
 			} else {
@@ -352,6 +355,8 @@ static void unmap_stage2_p4ds(struct kvm *kvm, pgd_t *pgd,
 
 	if (stage2_p4d_table_empty(kvm, start_p4d))
 		clear_stage2_pgd_entry(kvm, pgd, start_addr);
+
+	kvm_tlb_flush_vmid_range(kvm, start_addr, end);
 }
 
 /**
-- 
2.19.1



* Re: [RESEND RFC PATCH v1] arm64: kvm: flush tlbs by range in unmap_stage2_range function
  2020-07-24 13:43 [RESEND RFC PATCH v1] arm64: kvm: flush tlbs by range in unmap_stage2_range function Zhenyu Ye
@ 2020-07-25 17:40 ` Marc Zyngier
  2020-07-27 14:51   ` Zhenyu Ye
  0 siblings, 1 reply; 4+ messages in thread
From: Marc Zyngier @ 2020-07-25 17:40 UTC (permalink / raw)
  To: Zhenyu Ye
  Cc: linux-arch, kvm, catalin.marinas, linux-kernel, steven.price,
	linux-mm, arm, will, kvmarm, linux-arm-kernel

On 2020-07-24 14:43, Zhenyu Ye wrote:
> Currently, unmap_stage2_range() flushes the TLBs one by one, right after
> the corresponding pages are cleared.  However, this may cause performance
> problems when the unmap range is very large (for example, during VM
> migration rollback, it can make the VM downtime too long).

You keep resending this patch, but you don't give any numbers
that would back your assertion.

> This patch moves kvm_tlb_flush_vmid_ipa() out of the loop and flushes
> the TLBs by range after the other operations have completed.  Because we
> do not create new mappings for the pages here, this does not violate
> the Break-Before-Make rules.
> 
> Signed-off-by: Zhenyu Ye <yezhenyu2@huawei.com>
> ---
>  arch/arm64/include/asm/kvm_asm.h |  2 ++
>  arch/arm64/kvm/hyp/tlb.c         | 36 ++++++++++++++++++++++++++++++++
>  arch/arm64/kvm/mmu.c             | 11 +++++++---
>  3 files changed, 46 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
> index 352aaebf4198..ef8203d3ca45 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -61,6 +61,8 @@ extern char __kvm_hyp_vector[];
> 
>  extern void __kvm_flush_vm_context(void);
>  extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
> +extern void __kvm_tlb_flush_vmid_range(struct kvm *kvm, phys_addr_t start,
> +				       phys_addr_t end);
>  extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
>  extern void __kvm_tlb_flush_local_vmid(struct kvm_vcpu *vcpu);
> 
> diff --git a/arch/arm64/kvm/hyp/tlb.c b/arch/arm64/kvm/hyp/tlb.c
> index d063a576d511..4f4737a7e588 100644
> --- a/arch/arm64/kvm/hyp/tlb.c
> +++ b/arch/arm64/kvm/hyp/tlb.c
> @@ -189,6 +189,42 @@ void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
>  	__tlb_switch_to_host(kvm, &cxt);
>  }
> 
> +void __hyp_text __kvm_tlb_flush_vmid_range(struct kvm *kvm, phys_addr_t start,
> +					   phys_addr_t end)
> +{
> +	struct tlb_inv_context cxt;
> +	unsigned long addr;
> +
> +	start = __TLBI_VADDR(start, 0);
> +	end = __TLBI_VADDR(end, 0);
> +
> +	dsb(ishst);
> +
> +	/* Switch to requested VMID */
> +	kvm = kern_hyp_va(kvm);
> +	__tlb_switch_to_guest(kvm, &cxt);
> +
> +	if ((end - start) >= 512 << (PAGE_SHIFT - 12)) {
> +		__tlbi(vmalls12e1is);

And what is this magic value based on? You don't even mention in the
commit log that you are taking this shortcut.

> +		goto end;
> +	}
> +
> +	for (addr = start; addr < end; addr += 1 << (PAGE_SHIFT - 12))
> +		__tlbi(ipas2e1is, addr);
> +
> +	dsb(ish);
> +	__tlbi(vmalle1is);
> +
> +end:
> +	dsb(ish);
> +	isb();
> +
> +	if (!has_vhe() && icache_is_vpipt())
> +		__flush_icache_all();
> +
> +	__tlb_switch_to_host(kvm, &cxt);
> +}
> +

I'm sorry, but without numbers backing this approach for a number
of workloads and a representative set of platforms, this is
going nowhere.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

* Re: [RESEND RFC PATCH v1] arm64: kvm: flush tlbs by range in unmap_stage2_range function
  2020-07-25 17:40 ` Marc Zyngier
@ 2020-07-27 14:51   ` Zhenyu Ye
  2020-07-27 17:12     ` Marc Zyngier
  0 siblings, 1 reply; 4+ messages in thread
From: Zhenyu Ye @ 2020-07-27 14:51 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arch, kvm, catalin.marinas, linux-kernel, steven.price,
	linux-mm, arm, will, kvmarm, linux-arm-kernel

Hi Marc,

On 2020/7/26 1:40, Marc Zyngier wrote:
> On 2020-07-24 14:43, Zhenyu Ye wrote:
>> Currently, unmap_stage2_range() flushes the TLBs one by one, right after
>> the corresponding pages are cleared.  However, this may cause performance
>> problems when the unmap range is very large (for example, during VM
>> migration rollback, it can make the VM downtime too long).
> 
> You keep resending this patch, but you don't give any numbers
> that would back your assertion.

I have tested the downtime of VM migration rollback on arm64 and found
that the downtime could take up to 7s.  I then traced the cost of
unmap_stage2_range() and found it could take a maximum of 1.2s.  The
VM configuration is as follows (under high memory pressure, with a dirty
rate of about 500MB/s):

  <memory unit='GiB'>192</memory>
  <vcpu placement='static'>48</vcpu>
  <memoryBacking>
    <hugepages>
      <page size='1' unit='GiB' nodeset='0'/>
    </hugepages>
  </memoryBacking>

With this patch applied, the cost of unmap_stage2_range() drops to
16ms, and the VM downtime is less than 1s.

The following table shows a clear comparison:

              | VM downtime | cost of unmap_stage2_range()
--------------+-------------+------------------------------
before change |     7s      |          1200 ms
after  change |     1s      |            16 ms
--------------+-------------+------------------------------

>> +
>> +    if ((end - start) >= 512 << (PAGE_SHIFT - 12)) {
>> +        __tlbi(vmalls12e1is);
> 
> And what is this magic value based on? You don't even mention in the
> commit log that you are taking this shortcut.
> 


If the number of pages is greater than 512, we flush all the TLBs of this
VM to avoid soft lockups on large TLB flushing ranges, just like
flush_tlb_range() does.
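
For illustration, here is a minimal sketch of that kind of threshold
heuristic.  It uses a hypothetical flush_one()/flush_all() pair standing
in for the real TLBI sequences, and assumes a cap of 512 pages as in the
patch; it is not the kernel code, only the shape of the decision:

	/* Sketch only: fall back to a full flush for very large ranges. */
	#define FLUSH_RANGE_PAGE_CAP	512	/* assumed cap, as in the patch */

	extern void flush_all(void);			/* hypothetical whole-VMID flush */
	extern void flush_one(unsigned long addr);	/* hypothetical single-entry flush */

	static void flush_range_or_all(unsigned long start, unsigned long end,
				       unsigned int page_shift)
	{
		unsigned long pages = (end - start) >> page_shift;
		unsigned long addr;

		if (pages >= FLUSH_RANGE_PAGE_CAP) {
			/*
			 * Too many entries: a single full flush is cheaper and
			 * avoids soft lockups from a long per-page TLBI loop.
			 */
			flush_all();
			return;
		}

		/* Small range: invalidate page by page. */
		for (addr = start; addr < end; addr += 1UL << page_shift)
			flush_one(addr);
	}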


Thanks,
Zhenyu


* Re: [RESEND RFC PATCH v1] arm64: kvm: flush tlbs by range in unmap_stage2_range function
  2020-07-27 14:51   ` Zhenyu Ye
@ 2020-07-27 17:12     ` Marc Zyngier
  0 siblings, 0 replies; 4+ messages in thread
From: Marc Zyngier @ 2020-07-27 17:12 UTC (permalink / raw)
  To: Zhenyu Ye
  Cc: linux-arch, kvm, catalin.marinas, linux-kernel, steven.price,
	linux-mm, arm, will, kvmarm, linux-arm-kernel

Zhenyu,

On 2020-07-27 15:51, Zhenyu Ye wrote:
> Hi Marc,
> 
> On 2020/7/26 1:40, Marc Zyngier wrote:
>> On 2020-07-24 14:43, Zhenyu Ye wrote:
>>> Currently, unmap_stage2_range() flushes the TLBs one by one, right after
>>> the corresponding pages are cleared.  However, this may cause performance
>>> problems when the unmap range is very large (for example, during VM
>>> migration rollback, it can make the VM downtime too long).
>> 
>> You keep resending this patch, but you don't give any numbers
>> that would back your assertion.
> 
> I have tested the downtime of VM migration rollback on arm64 and found
> that the downtime could take up to 7s.  I then traced the cost of
> unmap_stage2_range() and found it could take a maximum of 1.2s.  The
> VM configuration is as follows (under high memory pressure, with a dirty
> rate of about 500MB/s):
> 
>   <memory unit='GiB'>192</memory>
>   <vcpu placement='static'>48</vcpu>
>   <memoryBacking>
>     <hugepages>
>       <page size='1' unit='GiB' nodeset='0'/>
>     </hugepages>
>   </memoryBacking>

This means nothing to me, I'm afraid.

> 
> With this patch applied, the cost of unmap_stage2_range() drops to
> 16ms, and the VM downtime is less than 1s.
> 
> The following table shows a clear comparison:
> 
>               | VM downtime | cost of unmap_stage2_range()
> --------------+-------------+------------------------------
> before change |     7s      |          1200 ms
> after  change |     1s      |            16 ms
> --------------+-------------+------------------------------

I don't see how you turn a 1.184s reduction into a 6s gain.
Surely there is more to it than what you posted.

>>> +
>>> +    if ((end - start) >= 512 << (PAGE_SHIFT - 12)) {
>>> +        __tlbi(vmalls12e1is);
>> 
>> And what is this magic value based on? You don't even mention in the
>> commit log that you are taking this shortcut.
>> 
> 
> 
> If the number of pages is greater than 512, we flush all the TLBs of this
> VM to avoid soft lockups on large TLB flushing ranges, just like
> flush_tlb_range() does.

I'm not sure this is applicable here, and it doesn't mean
this is as good on other systems.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

end of thread, other threads:[~2020-07-27 17:12 UTC | newest]

Thread overview: 4+ messages
2020-07-24 13:43 [RESEND RFC PATCH v1] arm64: kvm: flush tlbs by range in unmap_stage2_range function Zhenyu Ye
2020-07-25 17:40 ` Marc Zyngier
2020-07-27 14:51   ` Zhenyu Ye
2020-07-27 17:12     ` Marc Zyngier
