linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] KVM: arm/arm64: Skip updating page table entry if no change
@ 2018-08-10 11:13 Punit Agrawal
  2018-08-10 11:27 ` Suzuki K Poulose
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Punit Agrawal @ 2018-08-10 11:13 UTC (permalink / raw)
  To: kvmarm
  Cc: linux-arm-kernel, linux-kernel, Punit Agrawal, Marc Zyngier,
	Christoffer Dall, Suzuki Poulose

Contention on updating a page table entry by a large number of vcpus
can lead to duplicate work when handling stage 2 page faults. As the
page table update follows the break-before-make requirement of the
architecture, it can lead to repeated refaults due to clearing the
entry and flushing the tlbs.

This problem is more likely when -

* there are large number of vcpus
* the mapping is large block mapping

such as when using PMD hugepages (512MB) with 64k pages.

Fix this by skipping the page table update if there is no change in
the entry being updated.

Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@arm.com>
Cc: Suzuki Poulose <suzuki.poulose@arm.com>
--
Hi,

This problem was reported by a user when testing PUD hugepages. During
VM restore when all threads are running cpu intensive workload, the
refauting was causing the VM to not make any forward progress.

This patch fixes the problem for PMD and PTE page fault handling.

Thanks,
Punit

Change-Id: I04c9aa8b9fbada47deb1a171c9959f400a0d2a21
---
 virt/kvm/arm/mmu.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 1d90d79706bd..a66a5441ca2f 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1027,6 +1027,18 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
 	VM_BUG_ON(pmd_present(*pmd) && pmd_pfn(*pmd) != pmd_pfn(*new_pmd));
 
 	old_pmd = *pmd;
+	/*
+	 * Multiple vcpus faulting on the same PMD entry, can lead to
+	 * them sequentially updating the PMD with the same
+	 * value. Following the break-before-make (pmd_clear()
+	 * followed by tlb_flush()) process can hinder forward
+	 * progress due to refaults generated on missing translations.
+	 *
+	 * Skip updating the page table if the entry is unchanged.
+	 */
+	if (pmd_val(old_pmd) == pmd_val(*new_pmd))
+		return 0;
+
 	if (pmd_present(old_pmd)) {
 		pmd_clear(pmd);
 		kvm_tlb_flush_vmid_ipa(kvm, addr);
@@ -1101,6 +1113,10 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
 
 	/* Create 2nd stage page table mapping - Level 3 */
 	old_pte = *pte;
+	/* Skip page table update if there is no change */
+	if (pte_val(old_pte) == pte_val(*new_pte))
+		return 0;
+
 	if (pte_present(old_pte)) {
 		kvm_set_pte(pte, __pte(0));
 		kvm_tlb_flush_vmid_ipa(kvm, addr);
-- 
2.18.0


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] KVM: arm/arm64: Skip updating page table entry if no change
  2018-08-10 11:13 [PATCH] KVM: arm/arm64: Skip updating page table entry if no change Punit Agrawal
@ 2018-08-10 11:27 ` Suzuki K Poulose
  2018-08-10 14:40 ` Marc Zyngier
  2018-08-13 14:27 ` Christoffer Dall
  2 siblings, 0 replies; 5+ messages in thread
From: Suzuki K Poulose @ 2018-08-10 11:27 UTC (permalink / raw)
  To: Punit Agrawal, kvmarm
  Cc: linux-arm-kernel, linux-kernel, Marc Zyngier, Christoffer Dall

On 08/10/2018 12:13 PM, Punit Agrawal wrote:
> Contention on updating a page table entry by a large number of vcpus
> can lead to duplicate work when handling stage 2 page faults. As the
> page table update follows the break-before-make requirement of the
> architecture, it can lead to repeated refaults due to clearing the
> entry and flushing the tlbs.
> 
> This problem is more likely when -
> 
> * there are large number of vcpus
> * the mapping is large block mapping
> 
> such as when using PMD hugepages (512MB) with 64k pages.
> 
> Fix this by skipping the page table update if there is no change in
> the entry being updated.
> 
> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <christoffer.dall@arm.com>
> Cc: Suzuki Poulose <suzuki.poulose@arm.com>
> --


>   virt/kvm/arm/mmu.c | 16 ++++++++++++++++
>   1 file changed, 16 insertions(+)
> 
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 1d90d79706bd..a66a5441ca2f 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1027,6 +1027,18 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>   	VM_BUG_ON(pmd_present(*pmd) && pmd_pfn(*pmd) != pmd_pfn(*new_pmd));
>   
>   	old_pmd = *pmd;
> +	/*
> +	 * Multiple vcpus faulting on the same PMD entry, can lead to
> +	 * them sequentially updating the PMD with the same
> +	 * value. Following the break-before-make (pmd_clear()
> +	 * followed by tlb_flush()) process can hinder forward
> +	 * progress due to refaults generated on missing translations.
> +	 *
> +	 * Skip updating the page table if the entry is unchanged.
> +	 */
> +	if (pmd_val(old_pmd) == pmd_val(*new_pmd))
> +		return 0;
> +
>   	if (pmd_present(old_pmd)) {
>   		pmd_clear(pmd);
>   		kvm_tlb_flush_vmid_ipa(kvm, addr);
> @@ -1101,6 +1113,10 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>   
>   	/* Create 2nd stage page table mapping - Level 3 */
>   	old_pte = *pte;
> +	/* Skip page table update if there is no change */
> +	if (pte_val(old_pte) == pte_val(*new_pte))
> +		return 0;
> +
>   	if (pte_present(old_pte)) {
>   		kvm_set_pte(pte, __pte(0));
>   		kvm_tlb_flush_vmid_ipa(kvm, addr);
> 

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] KVM: arm/arm64: Skip updating page table entry if no change
  2018-08-10 11:13 [PATCH] KVM: arm/arm64: Skip updating page table entry if no change Punit Agrawal
  2018-08-10 11:27 ` Suzuki K Poulose
@ 2018-08-10 14:40 ` Marc Zyngier
  2018-08-10 18:51   ` Punit Agrawal
  2018-08-13 14:27 ` Christoffer Dall
  2 siblings, 1 reply; 5+ messages in thread
From: Marc Zyngier @ 2018-08-10 14:40 UTC (permalink / raw)
  To: Punit Agrawal
  Cc: kvmarm, linux-arm-kernel, linux-kernel, Christoffer Dall, Suzuki Poulose

Hi Punit,

On Fri, 10 Aug 2018 12:13:00 +0100,
Punit Agrawal <punit.agrawal@arm.com> wrote:
> 
> Contention on updating a page table entry by a large number of vcpus
> can lead to duplicate work when handling stage 2 page faults. As the
> page table update follows the break-before-make requirement of the
> architecture, it can lead to repeated refaults due to clearing the
> entry and flushing the tlbs.
> 
> This problem is more likely when -
> 
> * there are large number of vcpus
> * the mapping is large block mapping
>  
> such as when using PMD hugepages (512MB) with 64k pages.
> 
> Fix this by skipping the page table update if there is no change in
> the entry being updated.
> 
> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <christoffer.dall@arm.com>
> Cc: Suzuki Poulose <suzuki.poulose@arm.com>

This definitely deserves a Cc to stable, and a Fixes: tag.

> --
> Hi,
> 
> This problem was reported by a user when testing PUD hugepages. During
> VM restore when all threads are running cpu intensive workload, the
> refauting was causing the VM to not make any forward progress.
> 
> This patch fixes the problem for PMD and PTE page fault handling.
> 
> Thanks,
> Punit
> 
> Change-Id: I04c9aa8b9fbada47deb1a171c9959f400a0d2a21
> ---
>  virt/kvm/arm/mmu.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 1d90d79706bd..a66a5441ca2f 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1027,6 +1027,18 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>  	VM_BUG_ON(pmd_present(*pmd) && pmd_pfn(*pmd) != pmd_pfn(*new_pmd));
>  
>  	old_pmd = *pmd;
> +	/*
> +	 * Multiple vcpus faulting on the same PMD entry, can lead to
> +	 * them sequentially updating the PMD with the same
> +	 * value. Following the break-before-make (pmd_clear()
> +	 * followed by tlb_flush()) process can hinder forward
> +	 * progress due to refaults generated on missing translations.
> +	 *
> +	 * Skip updating the page table if the entry is unchanged.
> +	 */
> +	if (pmd_val(old_pmd) == pmd_val(*new_pmd))
> +		return 0;

Shouldn't you take this opportunity to also refactor it with the above
VM_BUG_ON and the below pmd_present? At the moment, we end-up testing
pmd_present twice, and your patch is awfully similar to the VM_BUG_ON
one.

> +
>  	if (pmd_present(old_pmd)) {
>  		pmd_clear(pmd);
>  		kvm_tlb_flush_vmid_ipa(kvm, addr);
> @@ -1101,6 +1113,10 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>  
>  	/* Create 2nd stage page table mapping - Level 3 */
>  	old_pte = *pte;
> +	/* Skip page table update if there is no change */
> +	if (pte_val(old_pte) == pte_val(*new_pte))
> +		return 0;
> +
>  	if (pte_present(old_pte)) {
>  		kvm_set_pte(pte, __pte(0));
>  		kvm_tlb_flush_vmid_ipa(kvm, addr);

Thanks,

	M.

-- 
Jazz is not dead, it just smell funny.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] KVM: arm/arm64: Skip updating page table entry if no change
  2018-08-10 14:40 ` Marc Zyngier
@ 2018-08-10 18:51   ` Punit Agrawal
  0 siblings, 0 replies; 5+ messages in thread
From: Punit Agrawal @ 2018-08-10 18:51 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvmarm, linux-arm-kernel, linux-kernel

Marc Zyngier <marc.zyngier@arm.com> writes:

> Hi Punit,
>
> On Fri, 10 Aug 2018 12:13:00 +0100,
> Punit Agrawal <punit.agrawal@arm.com> wrote:
>> 
>> Contention on updating a page table entry by a large number of vcpus
>> can lead to duplicate work when handling stage 2 page faults. As the
>> page table update follows the break-before-make requirement of the
>> architecture, it can lead to repeated refaults due to clearing the
>> entry and flushing the tlbs.
>> 
>> This problem is more likely when -
>> 
>> * there are large number of vcpus
>> * the mapping is large block mapping
>>  
>> such as when using PMD hugepages (512MB) with 64k pages.
>> 
>> Fix this by skipping the page table update if there is no change in
>> the entry being updated.
>> 
>> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
>> Cc: Marc Zyngier <marc.zyngier@arm.com>
>> Cc: Christoffer Dall <christoffer.dall@arm.com>
>> Cc: Suzuki Poulose <suzuki.poulose@arm.com>
>
> This definitely deserves a Cc to stable, and a Fixes: tag.

Agreed.

For PMD the issue exists since commit ad361f093c1e ("KVM: ARM: Support
hugetlbfs backed huge pages") when the file lived in arch/arm/kvm. (v3.12+)

For PTE the issue exists since commit d5d8184d35c9 ("KVM: ARM: Memory
virtualization setup"). (v3.8+)

I'll split the fix into two patches and add a cc to stable.

>> --
>> Hi,
>> 
>> This problem was reported by a user when testing PUD hugepages. During
>> VM restore when all threads are running cpu intensive workload, the
>> refauting was causing the VM to not make any forward progress.
>> 
>> This patch fixes the problem for PMD and PTE page fault handling.
>> 
>> Thanks,
>> Punit
>> 
>> Change-Id: I04c9aa8b9fbada47deb1a171c9959f400a0d2a21

Just noticed this. Looks like the commit hook has escaped it's worktree.

>> ---
>>  virt/kvm/arm/mmu.c | 16 ++++++++++++++++
>>  1 file changed, 16 insertions(+)
>> 
>> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
>> index 1d90d79706bd..a66a5441ca2f 100644
>> --- a/virt/kvm/arm/mmu.c
>> +++ b/virt/kvm/arm/mmu.c
>> @@ -1027,6 +1027,18 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>>  	VM_BUG_ON(pmd_present(*pmd) && pmd_pfn(*pmd) != pmd_pfn(*new_pmd));
>>  
>>  	old_pmd = *pmd;
>> +	/*
>> +	 * Multiple vcpus faulting on the same PMD entry, can lead to
>> +	 * them sequentially updating the PMD with the same
>> +	 * value. Following the break-before-make (pmd_clear()
>> +	 * followed by tlb_flush()) process can hinder forward
>> +	 * progress due to refaults generated on missing translations.
>> +	 *
>> +	 * Skip updating the page table if the entry is unchanged.
>> +	 */
>> +	if (pmd_val(old_pmd) == pmd_val(*new_pmd))
>> +		return 0;
>
> Shouldn't you take this opportunity to also refactor it with the above
> VM_BUG_ON and the below pmd_present? At the moment, we end-up testing
> pmd_present twice, and your patch is awfully similar to the VM_BUG_ON
> one.

I went for the minimal change keeping a backport in mind.

The VM_BUG_ON() is enabled only when CONFIG_DEBUG_VM is selected and
checks for pfn ignoring the attributes, while this fix checks for the
entire entry. Maybe I'm missing something, but I can't see an obvious
way to combine the checks.

I've eliminated the duplicate pmd_present() check and will post the
updated patches.

Thanks,
Punit

>> +
>>  	if (pmd_present(old_pmd)) {
>>  		pmd_clear(pmd);
>>  		kvm_tlb_flush_vmid_ipa(kvm, addr);
>> @@ -1101,6 +1113,10 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>>  
>>  	/* Create 2nd stage page table mapping - Level 3 */
>>  	old_pte = *pte;
>> +	/* Skip page table update if there is no change */
>> +	if (pte_val(old_pte) == pte_val(*new_pte))
>> +		return 0;
>> +
>>  	if (pte_present(old_pte)) {
>>  		kvm_set_pte(pte, __pte(0));
>>  		kvm_tlb_flush_vmid_ipa(kvm, addr);
>
> Thanks,
>
> 	M.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] KVM: arm/arm64: Skip updating page table entry if no change
  2018-08-10 11:13 [PATCH] KVM: arm/arm64: Skip updating page table entry if no change Punit Agrawal
  2018-08-10 11:27 ` Suzuki K Poulose
  2018-08-10 14:40 ` Marc Zyngier
@ 2018-08-13 14:27 ` Christoffer Dall
  2 siblings, 0 replies; 5+ messages in thread
From: Christoffer Dall @ 2018-08-13 14:27 UTC (permalink / raw)
  To: Punit Agrawal
  Cc: kvmarm, linux-arm-kernel, linux-kernel, Marc Zyngier, Suzuki Poulose

On Fri, Aug 10, 2018 at 12:13:00PM +0100, Punit Agrawal wrote:
> Contention on updating a page table entry by a large number of vcpus
> can lead to duplicate work when handling stage 2 page faults. As the
> page table update follows the break-before-make requirement of the
> architecture, it can lead to repeated refaults due to clearing the
> entry and flushing the tlbs.
> 
> This problem is more likely when -
> 
> * there are large number of vcpus
> * the mapping is large block mapping
> 
> such as when using PMD hugepages (512MB) with 64k pages.
> 
> Fix this by skipping the page table update if there is no change in
> the entry being updated.
> 
> Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Cc: Christoffer Dall <christoffer.dall@arm.com>
> Cc: Suzuki Poulose <suzuki.poulose@arm.com>
> --
> Hi,
> 
> This problem was reported by a user when testing PUD hugepages. During
> VM restore when all threads are running cpu intensive workload, the
> refauting was causing the VM to not make any forward progress.
> 
> This patch fixes the problem for PMD and PTE page fault handling.
> 
> Thanks,
> Punit
> 
> Change-Id: I04c9aa8b9fbada47deb1a171c9959f400a0d2a21
> ---
>  virt/kvm/arm/mmu.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
> index 1d90d79706bd..a66a5441ca2f 100644
> --- a/virt/kvm/arm/mmu.c
> +++ b/virt/kvm/arm/mmu.c
> @@ -1027,6 +1027,18 @@ static int stage2_set_pmd_huge(struct kvm *kvm, struct kvm_mmu_memory_cache
>  	VM_BUG_ON(pmd_present(*pmd) && pmd_pfn(*pmd) != pmd_pfn(*new_pmd));
>  
>  	old_pmd = *pmd;
> +	/*
> +	 * Multiple vcpus faulting on the same PMD entry, can lead to
> +	 * them sequentially updating the PMD with the same
> +	 * value. Following the break-before-make (pmd_clear()
> +	 * followed by tlb_flush()) process can hinder forward
> +	 * progress due to refaults generated on missing translations.
> +	 *
> +	 * Skip updating the page table if the entry is unchanged.
> +	 */
> +	if (pmd_val(old_pmd) == pmd_val(*new_pmd))
> +		return 0;
> +
>  	if (pmd_present(old_pmd)) {
>  		pmd_clear(pmd);
>  		kvm_tlb_flush_vmid_ipa(kvm, addr);
> @@ -1101,6 +1113,10 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
>  
>  	/* Create 2nd stage page table mapping - Level 3 */
>  	old_pte = *pte;
> +	/* Skip page table update if there is no change */
> +	if (pte_val(old_pte) == pte_val(*new_pte))
> +		return 0;
> +
>  	if (pte_present(old_pte)) {
>  		kvm_set_pte(pte, __pte(0));
>  		kvm_tlb_flush_vmid_ipa(kvm, addr);
> -- 
> 2.18.0
> 

Acked-by: Christoffer Dall <christoffer.dall@arm.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-08-13 14:27 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-10 11:13 [PATCH] KVM: arm/arm64: Skip updating page table entry if no change Punit Agrawal
2018-08-10 11:27 ` Suzuki K Poulose
2018-08-10 14:40 ` Marc Zyngier
2018-08-10 18:51   ` Punit Agrawal
2018-08-13 14:27 ` Christoffer Dall

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).