* [PATCH 0/3] arm/arm64: KVM: Add support for page aging
@ 2015-01-21 18:42 Marc Zyngier
  2015-01-21 18:42 ` [PATCH 1/3] arm/arm64: KVM: Allow handle_hva_to_gpa to return a value Marc Zyngier
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Marc Zyngier @ 2015-01-21 18:42 UTC (permalink / raw)
  To: Christoffer Dall, kvm, kvmarm; +Cc: Steve Capper

So far, KVM/arm has not implemented any support for page aging, leading
to rather bad performance when the system is swapping. This short
series implements the required hooks and fault handling to deal with
pages being marked old/young.

The three patches are fairly straightforward:

- First patch changes the range iterator to be able to return a value

- Second patch implements the actual page aging (clearing the AF bit
  in the page tables, and relying on the normal faulting code to set
  the bit again).

- Last patch optimizes the access fault path by only doing the minimum
  to satisfy the fault.

The end result is a system that behaves visibly better under load, as
VM pages don't get evicted as easily.

Based on 3.19-rc5, tested on Seattle and X-Gene.

Also at git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git kvm-arm64/mm-fixes-3.19

Marc Zyngier (3):
  arm/arm64: KVM: Allow handle_hva_to_gpa to return a value
  arm/arm64: KVM: Implement Stage-2 page aging
  arm/arm64: KVM: Optimize handling of Access Flag faults

 arch/arm/include/asm/kvm_host.h   |  13 +---
 arch/arm/kvm/mmu.c                | 128 +++++++++++++++++++++++++++++++++++---
 arch/arm/kvm/trace.h              |  48 ++++++++++++++
 arch/arm64/include/asm/kvm_arm.h  |   1 +
 arch/arm64/include/asm/kvm_host.h |  13 +---
 5 files changed, 171 insertions(+), 32 deletions(-)

-- 
2.1.4


* [PATCH 1/3] arm/arm64: KVM: Allow handle_hva_to_gpa to return a value
  2015-01-21 18:42 [PATCH 0/3] arm/arm64: KVM: Add support for page aging Marc Zyngier
@ 2015-01-21 18:42 ` Marc Zyngier
  2015-03-12 11:40   ` Christoffer Dall
  2015-01-21 18:42 ` [PATCH 2/3] arm/arm64: KVM: Implement Stage-2 page aging Marc Zyngier
  2015-01-21 18:42 ` [PATCH 3/3] arm/arm64: KVM: Optimize handling of Access Flag faults Marc Zyngier
  2 siblings, 1 reply; 10+ messages in thread
From: Marc Zyngier @ 2015-01-21 18:42 UTC (permalink / raw)
  To: Christoffer Dall, kvm, kvmarm; +Cc: Steve Capper

So far, handle_hva_to_gpa was never required to return a value.
As we prepare to age pages at Stage-2, we need to be able to
return a value from the iterator (kvm_test_age_hva).

Adapt the code to handle this situation. No semantic change.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/kvm/mmu.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 1366625..e163a45 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1146,15 +1146,16 @@ out_unlock:
 	return ret;
 }
 
-static void handle_hva_to_gpa(struct kvm *kvm,
-			      unsigned long start,
-			      unsigned long end,
-			      void (*handler)(struct kvm *kvm,
-					      gpa_t gpa, void *data),
-			      void *data)
+static int handle_hva_to_gpa(struct kvm *kvm,
+			     unsigned long start,
+			     unsigned long end,
+			     int (*handler)(struct kvm *kvm,
+					    gpa_t gpa, void *data),
+			     void *data)
 {
 	struct kvm_memslots *slots;
 	struct kvm_memory_slot *memslot;
+	int ret = 0;
 
 	slots = kvm_memslots(kvm);
 
@@ -1178,14 +1179,17 @@ static void handle_hva_to_gpa(struct kvm *kvm,
 
 		for (; gfn < gfn_end; ++gfn) {
 			gpa_t gpa = gfn << PAGE_SHIFT;
-			handler(kvm, gpa, data);
+			ret |= handler(kvm, gpa, data);
 		}
 	}
+
+	return ret;
 }
 
-static void kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
+static int kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
 {
 	unmap_stage2_range(kvm, gpa, PAGE_SIZE);
+	return 0;
 }
 
 int kvm_unmap_hva(struct kvm *kvm, unsigned long hva)
@@ -1211,11 +1215,12 @@ int kvm_unmap_hva_range(struct kvm *kvm,
 	return 0;
 }
 
-static void kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
+static int kvm_set_spte_handler(struct kvm *kvm, gpa_t gpa, void *data)
 {
 	pte_t *pte = (pte_t *)data;
 
 	stage2_set_pte(kvm, NULL, gpa, pte, false);
+	return 0;
 }
 
 
-- 
2.1.4


* [PATCH 2/3] arm/arm64: KVM: Implement Stage-2 page aging
  2015-01-21 18:42 [PATCH 0/3] arm/arm64: KVM: Add support for page aging Marc Zyngier
  2015-01-21 18:42 ` [PATCH 1/3] arm/arm64: KVM: Allow handle_hva_to_gpa to return a value Marc Zyngier
@ 2015-01-21 18:42 ` Marc Zyngier
  2015-03-12 11:40   ` Christoffer Dall
  2015-01-21 18:42 ` [PATCH 3/3] arm/arm64: KVM: Optimize handling of Access Flag faults Marc Zyngier
  2 siblings, 1 reply; 10+ messages in thread
From: Marc Zyngier @ 2015-01-21 18:42 UTC (permalink / raw)
  To: Christoffer Dall, kvm, kvmarm; +Cc: Steve Capper

Until now, KVM/arm didn't care much for page aging (who was swapping
anyway?), and simply provided empty hooks to the core KVM code. With
server-type systems now being available, things are quite different.

This patch implements very simple support for page aging, by clearing
the Access flag in the Stage-2 page tables. On access fault, the current
fault handling will write the PTE or PMD again, putting the Access flag
back on.

It should be possible to implement a much faster handling for Access
faults, but that's left for a later patch.

With this in place, performance in VMs is degraded much more gracefully.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_host.h   | 13 ++-------
 arch/arm/kvm/mmu.c                | 59 ++++++++++++++++++++++++++++++++++++++-
 arch/arm/kvm/trace.h              | 33 ++++++++++++++++++++++
 arch/arm64/include/asm/kvm_arm.h  |  1 +
 arch/arm64/include/asm/kvm_host.h | 13 ++-------
 5 files changed, 96 insertions(+), 23 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 04b4ea0..d6b5b85 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -163,19 +163,10 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
 
 unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
 int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
+int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
+int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 
 /* We do not have shadow page tables, hence the empty hooks */
-static inline int kvm_age_hva(struct kvm *kvm, unsigned long start,
-			      unsigned long end)
-{
-	return 0;
-}
-
-static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
-{
-	return 0;
-}
-
 static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
 							 unsigned long address)
 {
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index e163a45..ffe89a0 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1068,6 +1068,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 
 out_unlock:
 	spin_unlock(&kvm->mmu_lock);
+	kvm_set_pfn_accessed(pfn);
 	kvm_release_pfn_clean(pfn);
 	return ret;
 }
@@ -1102,7 +1103,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 	/* Check the stage-2 fault is trans. fault or write fault */
 	fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
-	if (fault_status != FSC_FAULT && fault_status != FSC_PERM) {
+	if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
+	    fault_status != FSC_ACCESS) {
 		kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
 			kvm_vcpu_trap_get_class(vcpu),
 			(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
@@ -1237,6 +1239,61 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
 	handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &stage2_pte);
 }
 
+static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
+{
+	pmd_t *pmd;
+	pte_t *pte;
+
+	pmd = stage2_get_pmd(kvm, NULL, gpa);
+	if (!pmd || pmd_none(*pmd))	/* Nothing there */
+		return 0;
+
+	if (kvm_pmd_huge(*pmd)) {	/* THP, HugeTLB */
+		*pmd = pmd_mkold(*pmd);
+		goto tlbi;
+	}
+
+	pte = pte_offset_kernel(pmd, gpa);
+	if (pte_none(*pte))
+		return 0;
+
+	*pte = pte_mkold(*pte);		/* Just a page... */
+tlbi:
+	kvm_tlb_flush_vmid_ipa(kvm, gpa);
+	return 1;
+}
+
+static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
+{
+	pmd_t *pmd;
+	pte_t *pte;
+
+	pmd = stage2_get_pmd(kvm, NULL, gpa);
+	if (!pmd || pmd_none(*pmd))	/* Nothing there */
+		return 0;
+
+	if (kvm_pmd_huge(*pmd))		/* THP, HugeTLB */
+		return pmd_young(*pmd);
+
+	pte = pte_offset_kernel(pmd, gpa);
+	if (!pte_none(*pte))		/* Just a page... */
+		return pte_young(*pte);
+
+	return 0;
+}
+
+int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
+{
+	trace_kvm_age_hva(start, end);
+	return handle_hva_to_gpa(kvm, start, end, kvm_age_hva_handler, NULL);
+}
+
+int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
+{
+	trace_kvm_test_age_hva(hva);
+	return handle_hva_to_gpa(kvm, hva, hva, kvm_test_age_hva_handler, NULL);
+}
+
 void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu)
 {
 	mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index b6a6e71..364b5382 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -203,6 +203,39 @@ TRACE_EVENT(kvm_set_spte_hva,
 	TP_printk("mmu notifier set pte hva: %#08lx", __entry->hva)
 );
 
+TRACE_EVENT(kvm_age_hva,
+	TP_PROTO(unsigned long start, unsigned long end),
+	TP_ARGS(start, end),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	start		)
+		__field(	unsigned long,	end		)
+	),
+
+	TP_fast_assign(
+		__entry->start		= start;
+		__entry->end		= end;
+	),
+
+	TP_printk("mmu notifier age hva: %#08lx -- %#08lx",
+		  __entry->start, __entry->end)
+);
+
+TRACE_EVENT(kvm_test_age_hva,
+	TP_PROTO(unsigned long hva),
+	TP_ARGS(hva),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	hva		)
+	),
+
+	TP_fast_assign(
+		__entry->hva		= hva;
+	),
+
+	TP_printk("mmu notifier test age hva: %#08lx", __entry->hva)
+);
+
 TRACE_EVENT(kvm_hvc,
 	TP_PROTO(unsigned long vcpu_pc, unsigned long r0, unsigned long imm),
 	TP_ARGS(vcpu_pc, r0, imm),
diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
index 8afb863..0d738f2 100644
--- a/arch/arm64/include/asm/kvm_arm.h
+++ b/arch/arm64/include/asm/kvm_arm.h
@@ -212,6 +212,7 @@
 
 
 #define FSC_FAULT	(0x04)
+#define FSC_ACCESS	(0x08)
 #define FSC_PERM	(0x0c)
 
 /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index acd101a..b831710 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -173,19 +173,10 @@ int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
 int kvm_unmap_hva_range(struct kvm *kvm,
 			unsigned long start, unsigned long end);
 void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
+int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
 
 /* We do not have shadow page tables, hence the empty hooks */
-static inline int kvm_age_hva(struct kvm *kvm, unsigned long start,
-			      unsigned long end)
-{
-	return 0;
-}
-
-static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
-{
-	return 0;
-}
-
 static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
 							 unsigned long address)
 {
-- 
2.1.4


* [PATCH 3/3] arm/arm64: KVM: Optimize handling of Access Flag faults
  2015-01-21 18:42 [PATCH 0/3] arm/arm64: KVM: Add support for page aging Marc Zyngier
  2015-01-21 18:42 ` [PATCH 1/3] arm/arm64: KVM: Allow handle_hva_to_gpa to return a value Marc Zyngier
  2015-01-21 18:42 ` [PATCH 2/3] arm/arm64: KVM: Implement Stage-2 page aging Marc Zyngier
@ 2015-01-21 18:42 ` Marc Zyngier
  2015-03-12 11:40   ` Christoffer Dall
  2 siblings, 1 reply; 10+ messages in thread
From: Marc Zyngier @ 2015-01-21 18:42 UTC (permalink / raw)
  To: Christoffer Dall, kvm, kvmarm; +Cc: Steve Capper

Now that we have page aging in Stage-2, it becomes obvious that
we're doing way too much work handling the fault.

The page is not going anywhere (it is still mapped), the page
tables are already allocated, and all we want is to flip a bit
in the PMD or PTE. Also, we can avoid any form of TLB invalidation,
since an entry with the AF bit clear is not allowed to be cached
in the TLB.

An obvious solution is to have a separate handler for FSC_ACCESS,
where we pride ourselves on doing only the bare minimum amount of
work.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/kvm/mmu.c   | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 arch/arm/kvm/trace.h | 15 +++++++++++++++
 2 files changed, 61 insertions(+)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index ffe89a0..112bae1 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -1073,6 +1073,46 @@ out_unlock:
 	return ret;
 }
 
+/*
+ * Resolve the access fault by making the page young again.
+ * Note that because the faulting entry is guaranteed not to be
+ * cached in the TLB, we don't need to invalidate anything.
+ */
+static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
+{
+	pmd_t *pmd;
+	pte_t *pte;
+	pfn_t pfn;
+	bool pfn_valid = false;
+
+	trace_kvm_access_fault(fault_ipa);
+
+	spin_lock(&vcpu->kvm->mmu_lock);
+
+	pmd = stage2_get_pmd(vcpu->kvm, NULL, fault_ipa);
+	if (!pmd || pmd_none(*pmd))	/* Nothing there */
+		goto out;
+
+	if (kvm_pmd_huge(*pmd)) {	/* THP, HugeTLB */
+		*pmd = pmd_mkyoung(*pmd);
+		pfn = pmd_pfn(*pmd);
+		pfn_valid = true;
+		goto out;
+	}
+
+	pte = pte_offset_kernel(pmd, fault_ipa);
+	if (pte_none(*pte))		/* Nothing there either */
+		goto out;
+
+	*pte = pte_mkyoung(*pte);	/* Just a page... */
+	pfn = pte_pfn(*pte);
+	pfn_valid = true;
+out:
+	spin_unlock(&vcpu->kvm->mmu_lock);
+	if (pfn_valid)
+		kvm_set_pfn_accessed(pfn);
+}
+
 /**
  * kvm_handle_guest_abort - handles all 2nd stage aborts
  * @vcpu:	the VCPU pointer
@@ -1140,6 +1180,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	/* Userspace should not be able to register out-of-bounds IPAs */
 	VM_BUG_ON(fault_ipa >= KVM_PHYS_SIZE);
 
+	if (fault_status == FSC_ACCESS) {
+		handle_access_fault(vcpu, fault_ipa);
+		ret = 1;
+		goto out_unlock;
+	}
+
 	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
 	if (ret == 0)
 		ret = 1;
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index 364b5382..5665a16 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -64,6 +64,21 @@ TRACE_EVENT(kvm_guest_fault,
 		  __entry->hxfar, __entry->vcpu_pc)
 );
 
+TRACE_EVENT(kvm_access_fault,
+	TP_PROTO(unsigned long ipa),
+	TP_ARGS(ipa),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	ipa		)
+	),
+
+	TP_fast_assign(
+		__entry->ipa		= ipa;
+	),
+
+	TP_printk("IPA: %lx", __entry->ipa)
+);
+
 TRACE_EVENT(kvm_irq_line,
 	TP_PROTO(unsigned int type, int vcpu_idx, int irq_num, int level),
 	TP_ARGS(type, vcpu_idx, irq_num, level),
-- 
2.1.4


* Re: [PATCH 1/3] arm/arm64: KVM: Allow handle_hva_to_gpa to return a value
  2015-01-21 18:42 ` [PATCH 1/3] arm/arm64: KVM: Allow handle_hva_to_gpa to return a value Marc Zyngier
@ 2015-03-12 11:40   ` Christoffer Dall
  0 siblings, 0 replies; 10+ messages in thread
From: Christoffer Dall @ 2015-03-12 11:40 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, kvmarm, Steve Capper

On Wed, Jan 21, 2015 at 06:42:11PM +0000, Marc Zyngier wrote:
> So far, handle_hva_to_gpa was never required to return a value.
> As we prepare to age pages at Stage-2, we need to be able to
> return a value from the iterator (kvm_test_age_hva).
> 
> Adapt the code to handle this situation. No semantic change.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>

Acked-by: Christoffer Dall <christoffer.dall@linaro.org>

* Re: [PATCH 2/3] arm/arm64: KVM: Implement Stage-2 page aging
  2015-01-21 18:42 ` [PATCH 2/3] arm/arm64: KVM: Implement Stage-2 page aging Marc Zyngier
@ 2015-03-12 11:40   ` Christoffer Dall
  2015-03-12 14:50     ` Marc Zyngier
  0 siblings, 1 reply; 10+ messages in thread
From: Christoffer Dall @ 2015-03-12 11:40 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, kvmarm, Steve Capper

On Wed, Jan 21, 2015 at 06:42:12PM +0000, Marc Zyngier wrote:
> Until now, KVM/arm didn't care much for page aging (who was swapping
> anyway?), and simply provided empty hooks to the core KVM code. With
> server-type systems now being available, things are quite different.
> 
> This patch implements very simple support for page aging, by clearing
> the Access flag in the Stage-2 page tables. On access fault, the current
> fault handling will write the PTE or PMD again, putting the Access flag
> back on.
> 
> It should be possible to implement a much faster handling for Access
> faults, but that's left for a later patch.
> 
> With this in place, performance in VMs is degraded much more gracefully.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/include/asm/kvm_host.h   | 13 ++-------
>  arch/arm/kvm/mmu.c                | 59 ++++++++++++++++++++++++++++++++++++++-
>  arch/arm/kvm/trace.h              | 33 ++++++++++++++++++++++
>  arch/arm64/include/asm/kvm_arm.h  |  1 +
>  arch/arm64/include/asm/kvm_host.h | 13 ++-------
>  5 files changed, 96 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 04b4ea0..d6b5b85 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -163,19 +163,10 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
>  
>  unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
>  int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
> +int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
> +int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
>  
>  /* We do not have shadow page tables, hence the empty hooks */
> -static inline int kvm_age_hva(struct kvm *kvm, unsigned long start,
> -			      unsigned long end)
> -{
> -	return 0;
> -}
> -
> -static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
> -{
> -	return 0;
> -}
> -
>  static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
>  							 unsigned long address)
>  {
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index e163a45..ffe89a0 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -1068,6 +1068,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  
>  out_unlock:
>  	spin_unlock(&kvm->mmu_lock);
> +	kvm_set_pfn_accessed(pfn);
>  	kvm_release_pfn_clean(pfn);
>  	return ret;
>  }
> @@ -1102,7 +1103,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  	/* Check the stage-2 fault is trans. fault or write fault */
>  	fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
> -	if (fault_status != FSC_FAULT && fault_status != FSC_PERM) {
> +	if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
> +	    fault_status != FSC_ACCESS) {
>  		kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
>  			kvm_vcpu_trap_get_class(vcpu),
>  			(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
> @@ -1237,6 +1239,61 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
>  	handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &stage2_pte);
>  }
>  
> +static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
> +{
> +	pmd_t *pmd;
> +	pte_t *pte;
> +
> +	pmd = stage2_get_pmd(kvm, NULL, gpa);
> +	if (!pmd || pmd_none(*pmd))	/* Nothing there */
> +		return 0;
> +
> +	if (kvm_pmd_huge(*pmd)) {	/* THP, HugeTLB */
> +		*pmd = pmd_mkold(*pmd);
> +		goto tlbi;

so in this case we'll loop over a huge pmd on a page-by-page basis,
invalidating the tlb each time, right?

Would it be worth checking if the access flag is already clear
(!pmd_young()) and in that case exit without doing tlb invalidation?

In fact, shouldn't we only return 1 if the pmd is indeed young and then
the tlb invalidation will be done once by kvm_flush_remote_tlbs() in
kvm_mmu_notifier_clear_flush_young() ?

I got a little lost looking at how the core mm code uses the return
value, but if I read the x86 and powerpc code correctly, they use it the
way I suggest.  Did I get this all wrong?
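
For reference, this is roughly what the generic side does (quoting
virt/kvm/kvm_main.c from memory, so the details may be slightly off):

static int kvm_mmu_notifier_clear_flush_young(struct mmu_notifier *mn,
					      struct mm_struct *mm,
					      unsigned long start,
					      unsigned long end)
{
	struct kvm *kvm = mmu_notifier_to_kvm(mn);
	int young, idx;

	idx = srcu_read_lock(&kvm->srcu);
	spin_lock(&kvm->mmu_lock);

	/* the arch code only reports young-ness... */
	young = kvm_age_hva(kvm, start, end);
	/* ...and the invalidation is done once for the whole range */
	if (young)
		kvm_flush_remote_tlbs(kvm);

	spin_unlock(&kvm->mmu_lock);
	srcu_read_unlock(&kvm->srcu, idx);

	return young;
}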

> +	}
> +
> +	pte = pte_offset_kernel(pmd, gpa);
> +	if (pte_none(*pte))
> +		return 0;
> +
> +	*pte = pte_mkold(*pte);		/* Just a page... */

same with checking if it's young or not... ?

> +tlbi:
> +	kvm_tlb_flush_vmid_ipa(kvm, gpa);
> +	return 1;
> +}
> +
> +static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
> +{
> +	pmd_t *pmd;
> +	pte_t *pte;
> +
> +	pmd = stage2_get_pmd(kvm, NULL, gpa);
> +	if (!pmd || pmd_none(*pmd))	/* Nothing there */
> +		return 0;
> +
> +	if (kvm_pmd_huge(*pmd))		/* THP, HugeTLB */
> +		return pmd_young(*pmd);
> +
> +	pte = pte_offset_kernel(pmd, gpa);
> +	if (!pte_none(*pte))		/* Just a page... */
> +		return pte_young(*pte);
> +
> +	return 0;
> +}
> +
> +int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
> +{
> +	trace_kvm_age_hva(start, end);
> +	return handle_hva_to_gpa(kvm, start, end, kvm_age_hva_handler, NULL);
> +}
> +
> +int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
> +{
> +	trace_kvm_test_age_hva(hva);
> +	return handle_hva_to_gpa(kvm, hva, hva, kvm_test_age_hva_handler, NULL);
> +}
> +
>  void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu)
>  {
>  	mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
> diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
> index b6a6e71..364b5382 100644
> --- a/arch/arm/kvm/trace.h
> +++ b/arch/arm/kvm/trace.h
> @@ -203,6 +203,39 @@ TRACE_EVENT(kvm_set_spte_hva,
>  	TP_printk("mmu notifier set pte hva: %#08lx", __entry->hva)
>  );
>  
> +TRACE_EVENT(kvm_age_hva,
> +	TP_PROTO(unsigned long start, unsigned long end),
> +	TP_ARGS(start, end),
> +
> +	TP_STRUCT__entry(
> +		__field(	unsigned long,	start		)
> +		__field(	unsigned long,	end		)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->start		= start;
> +		__entry->end		= end;
> +	),
> +
> +	TP_printk("mmu notifier age hva: %#08lx -- %#08lx",
> +		  __entry->start, __entry->end)
> +);
> +
> +TRACE_EVENT(kvm_test_age_hva,
> +	TP_PROTO(unsigned long hva),
> +	TP_ARGS(hva),
> +
> +	TP_STRUCT__entry(
> +		__field(	unsigned long,	hva		)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->hva		= hva;
> +	),
> +
> +	TP_printk("mmu notifier test age hva: %#08lx", __entry->hva)
> +);
> +
>  TRACE_EVENT(kvm_hvc,
>  	TP_PROTO(unsigned long vcpu_pc, unsigned long r0, unsigned long imm),
>  	TP_ARGS(vcpu_pc, r0, imm),
> diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> index 8afb863..0d738f2 100644
> --- a/arch/arm64/include/asm/kvm_arm.h
> +++ b/arch/arm64/include/asm/kvm_arm.h
> @@ -212,6 +212,7 @@
>  
>  
>  #define FSC_FAULT	(0x04)
> +#define FSC_ACCESS	(0x08)
>  #define FSC_PERM	(0x0c)
>  
>  /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index acd101a..b831710 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -173,19 +173,10 @@ int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
>  int kvm_unmap_hva_range(struct kvm *kvm,
>  			unsigned long start, unsigned long end);
>  void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
> +int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
> +int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
>  
>  /* We do not have shadow page tables, hence the empty hooks */
> -static inline int kvm_age_hva(struct kvm *kvm, unsigned long start,
> -			      unsigned long end)
> -{
> -	return 0;
> -}
> -
> -static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
> -{
> -	return 0;
> -}
> -
>  static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
>  							 unsigned long address)
>  {
> -- 
> 2.1.4
> 
Otherwise, this looks good.

Thanks,
-Christoffer

* Re: [PATCH 3/3] arm/arm64: KVM: Optimize handling of Access Flag faults
  2015-01-21 18:42 ` [PATCH 3/3] arm/arm64: KVM: Optimize handling of Access Flag faults Marc Zyngier
@ 2015-03-12 11:40   ` Christoffer Dall
  2015-03-12 15:04     ` Marc Zyngier
  0 siblings, 1 reply; 10+ messages in thread
From: Christoffer Dall @ 2015-03-12 11:40 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvm, kvmarm, Steve Capper

On Wed, Jan 21, 2015 at 06:42:13PM +0000, Marc Zyngier wrote:
> Now that we have page aging in Stage-2, it becomes obvious that
> we're doing way too much work handling the fault.
> 
> The page is not going anywhere (it is still mapped), the page
> tables are already allocated, and all we want is to flip a bit
> in the PMD or PTE. Also, we can avoid any form of TLB invalidation,
> since an entry with the AF bit clear is not allowed to be cached
> in the TLB.
> 
> An obvious solution is to have a separate handler for FSC_ACCESS,
> where we pride ourselves on doing only the bare minimum amount of
> work.
> 
> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> ---
>  arch/arm/kvm/mmu.c   | 46 ++++++++++++++++++++++++++++++++++++++++++++++
>  arch/arm/kvm/trace.h | 15 +++++++++++++++
>  2 files changed, 61 insertions(+)
> 
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index ffe89a0..112bae1 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -1073,6 +1073,46 @@ out_unlock:
>  	return ret;
>  }
>  
> +/*
> + * Resolve the access fault by making the page young again.
> + * Note that because the faulting entry is guaranteed not to be
> + * cached in the TLB, we don't need to invalidate anything.
> + */
> +static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
> +{
> +	pmd_t *pmd;
> +	pte_t *pte;
> +	pfn_t pfn;
> +	bool pfn_valid = false;

I think you can just initialize pfn to KVM_PFN_ERR_FAULT and use
is_error_pfn() if you like, not sure if it's cleaner.

> +
> +	trace_kvm_access_fault(fault_ipa);
> +
> +	spin_lock(&vcpu->kvm->mmu_lock);
> +
> +	pmd = stage2_get_pmd(vcpu->kvm, NULL, fault_ipa);
> +	if (!pmd || pmd_none(*pmd))	/* Nothing there */
> +		goto out;
> +
> +	if (kvm_pmd_huge(*pmd)) {	/* THP, HugeTLB */
> +		*pmd = pmd_mkyoung(*pmd);
> +		pfn = pmd_pfn(*pmd);
> +		pfn_valid = true;
> +		goto out;
> +	}
> +
> +	pte = pte_offset_kernel(pmd, fault_ipa);
> +	if (pte_none(*pte))		/* Nothing there either */
> +		goto out;
> +
> +	*pte = pte_mkyoung(*pte);	/* Just a page... */
> +	pfn = pte_pfn(*pte);
> +	pfn_valid = true;
> +out:
> +	spin_unlock(&vcpu->kvm->mmu_lock);

do you have a race here if the page is swapped out before you go and set
pfn accessed?  Does that cause any harm?

> +	if (pfn_valid)
> +		kvm_set_pfn_accessed(pfn);
> +}
> +
>  /**
>   * kvm_handle_guest_abort - handles all 2nd stage aborts
>   * @vcpu:	the VCPU pointer
> @@ -1140,6 +1180,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  	/* Userspace should not be able to register out-of-bounds IPAs */
>  	VM_BUG_ON(fault_ipa >= KVM_PHYS_SIZE);
>  
> +	if (fault_status == FSC_ACCESS) {
> +		handle_access_fault(vcpu, fault_ipa);
> +		ret = 1;
> +		goto out_unlock;
> +	}
> +
>  	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
>  	if (ret == 0)
>  		ret = 1;
> diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
> index 364b5382..5665a16 100644
> --- a/arch/arm/kvm/trace.h
> +++ b/arch/arm/kvm/trace.h
> @@ -64,6 +64,21 @@ TRACE_EVENT(kvm_guest_fault,
>  		  __entry->hxfar, __entry->vcpu_pc)
>  );
>  
> +TRACE_EVENT(kvm_access_fault,
> +	TP_PROTO(unsigned long ipa),
> +	TP_ARGS(ipa),
> +
> +	TP_STRUCT__entry(
> +		__field(	unsigned long,	ipa		)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->ipa		= ipa;
> +	),
> +
> +	TP_printk("IPA: %lx", __entry->ipa)
> +);
> +
>  TRACE_EVENT(kvm_irq_line,
>  	TP_PROTO(unsigned int type, int vcpu_idx, int irq_num, int level),
>  	TP_ARGS(type, vcpu_idx, irq_num, level),
> -- 
> 2.1.4
> 
Thanks,
-Christoffer

* Re: [PATCH 2/3] arm/arm64: KVM: Implement Stage-2 page aging
  2015-03-12 11:40   ` Christoffer Dall
@ 2015-03-12 14:50     ` Marc Zyngier
  0 siblings, 0 replies; 10+ messages in thread
From: Marc Zyngier @ 2015-03-12 14:50 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm

On Thu, 12 Mar 2015 11:40:17 +0000
Christoffer Dall <christoffer.dall@linaro.org> wrote:

> On Wed, Jan 21, 2015 at 06:42:12PM +0000, Marc Zyngier wrote:
> > Until now, KVM/arm didn't care much for page aging (who was swapping
> > anyway?), and simply provided empty hooks to the core KVM code. With
> > server-type systems now being available, things are quite different.
> > 
> > This patch implements very simple support for page aging, by
> > clearing the Access flag in the Stage-2 page tables. On access
> > fault, the current fault handling will write the PTE or PMD again,
> > putting the Access flag back on.
> > 
> > It should be possible to implement a much faster handling for Access
> > faults, but that's left for a later patch.
> > 
> > With this in place, performance in VMs is degraded much more
> > gracefully.
> > 
> > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > ---
> >  arch/arm/include/asm/kvm_host.h   | 13 ++-------
> >  arch/arm/kvm/mmu.c                | 59 ++++++++++++++++++++++++++++++++++++++-
> >  arch/arm/kvm/trace.h              | 33 ++++++++++++++++++++++
> >  arch/arm64/include/asm/kvm_arm.h  |  1 +
> >  arch/arm64/include/asm/kvm_host.h | 13 ++-------
> >  5 files changed, 96 insertions(+), 23 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > index 04b4ea0..d6b5b85 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -163,19 +163,10 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
> >  
> >  unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu);
> >  int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices);
> > +int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
> > +int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
> >  
> >  /* We do not have shadow page tables, hence the empty hooks */
> > -static inline int kvm_age_hva(struct kvm *kvm, unsigned long start,
> > -			      unsigned long end)
> > -{
> > -	return 0;
> > -}
> > -
> > -static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
> > -{
> > -	return 0;
> > -}
> > -
> >  static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
> >  							 unsigned long address)
> >  {
> > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> > index e163a45..ffe89a0 100644
> > --- a/arch/arm/kvm/mmu.c
> > +++ b/arch/arm/kvm/mmu.c
> > @@ -1068,6 +1068,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >  
> >  out_unlock:
> >  	spin_unlock(&kvm->mmu_lock);
> > +	kvm_set_pfn_accessed(pfn);
> >  	kvm_release_pfn_clean(pfn);
> >  	return ret;
> >  }
> > @@ -1102,7 +1103,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  
> >  	/* Check the stage-2 fault is trans. fault or write fault */
> >  	fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
> > -	if (fault_status != FSC_FAULT && fault_status != FSC_PERM) {
> > +	if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
> > +	    fault_status != FSC_ACCESS) {
> >  		kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
> >  			kvm_vcpu_trap_get_class(vcpu),
> >  			(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
> > @@ -1237,6 +1239,61 @@ void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte)
> >  	handle_hva_to_gpa(kvm, hva, end, &kvm_set_spte_handler, &stage2_pte);
> >  }
> >  
> > +static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
> > +{
> > +	pmd_t *pmd;
> > +	pte_t *pte;
> > +
> > +	pmd = stage2_get_pmd(kvm, NULL, gpa);
> > +	if (!pmd || pmd_none(*pmd))	/* Nothing there */
> > +		return 0;
> > +
> > +	if (kvm_pmd_huge(*pmd)) {	/* THP, HugeTLB */
> > +		*pmd = pmd_mkold(*pmd);
> > +		goto tlbi;
> 
> so in this case we'll loop over a huge pmd on a page-by-page basis,
> invalidating the tlb each time, right?
> 
> Would it be worth checking if the access flag is already clear
> (!pmd_young()) and in that case exit without doing tlb invalidation?
> 
> In fact, shouldn't we only return 1 if the pmd is indeed young and
> then the tlb invalidation will be done once by
> kvm_flush_remote_tlbs() in kvm_mmu_notifier_clear_flush_young() ?
> 
> I got a little lost looking at how the core mm code uses the return
> value, but if I read the x86 and powerpc code correctly, they use it
> the way I suggest.  Did I get this all wrong?

That's a very good point. I wonder if we could optimize
handle_hva_to_gpa to actually respect the mapping boundaries instead of
stupidly iterating over single pages, but that would be a further
improvement.

In the meantime, testing the access flag and doing an early out if
clear seems like the right thing to do.
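
Something like this, perhaps (a completely untested sketch, testing
young-ness first on both the PMD and PTE paths, and leaving the
invalidation to kvm_flush_remote_tlbs() in the caller):

static int kvm_age_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
{
	pmd_t *pmd;
	pte_t *pte;

	pmd = stage2_get_pmd(kvm, NULL, gpa);
	if (!pmd || pmd_none(*pmd))	/* Nothing there */
		return 0;

	if (kvm_pmd_huge(*pmd)) {	/* THP, HugeTLB */
		if (pmd_young(*pmd)) {
			*pmd = pmd_mkold(*pmd);
			return 1;
		}

		return 0;
	}

	pte = pte_offset_kernel(pmd, gpa);
	if (pte_none(*pte))
		return 0;

	if (pte_young(*pte)) {		/* Just a page... */
		*pte = pte_mkold(*pte);
		return 1;
	}

	return 0;
}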

> > +	}
> > +
> > +	pte = pte_offset_kernel(pmd, gpa);
> > +	if (pte_none(*pte))
> > +		return 0;
> > +
> > +	*pte = pte_mkold(*pte);		/* Just a page... */
> 
> same with checking if it's young or not... ?

Yes, same logic.

> > +tlbi:
> > +	kvm_tlb_flush_vmid_ipa(kvm, gpa);
> > +	return 1;
> > +}
> > +
> > +static int kvm_test_age_hva_handler(struct kvm *kvm, gpa_t gpa, void *data)
> > +{
> > +	pmd_t *pmd;
> > +	pte_t *pte;
> > +
> > +	pmd = stage2_get_pmd(kvm, NULL, gpa);
> > +	if (!pmd || pmd_none(*pmd))	/* Nothing there */
> > +		return 0;
> > +
> > +	if (kvm_pmd_huge(*pmd))		/* THP, HugeTLB */
> > +		return pmd_young(*pmd);
> > +
> > +	pte = pte_offset_kernel(pmd, gpa);
> > +	if (!pte_none(*pte))		/* Just a page... */
> > +		return pte_young(*pte);
> > +
> > +	return 0;
> > +}
> > +
> > +int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end)
> > +{
> > +	trace_kvm_age_hva(start, end);
> > +	return handle_hva_to_gpa(kvm, start, end, kvm_age_hva_handler, NULL);
> > +}
> > +
> > +int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
> > +{
> > +	trace_kvm_test_age_hva(hva);
> > +	return handle_hva_to_gpa(kvm, hva, hva, kvm_test_age_hva_handler, NULL);
> > +}
> > +
> >  void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu)
> >  {
> >  	mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
> > diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
> > index b6a6e71..364b5382 100644
> > --- a/arch/arm/kvm/trace.h
> > +++ b/arch/arm/kvm/trace.h
> > @@ -203,6 +203,39 @@ TRACE_EVENT(kvm_set_spte_hva,
> >  	TP_printk("mmu notifier set pte hva: %#08lx", __entry->hva)
> >  );
> >  
> > +TRACE_EVENT(kvm_age_hva,
> > +	TP_PROTO(unsigned long start, unsigned long end),
> > +	TP_ARGS(start, end),
> > +
> > +	TP_STRUCT__entry(
> > +		__field(	unsigned long,	start		)
> > +		__field(	unsigned long,	end		)
> > +	),
> > +
> > +	TP_fast_assign(
> > +		__entry->start		= start;
> > +		__entry->end		= end;
> > +	),
> > +
> > +	TP_printk("mmu notifier age hva: %#08lx -- %#08lx",
> > +		  __entry->start, __entry->end)
> > +);
> > +
> > +TRACE_EVENT(kvm_test_age_hva,
> > +	TP_PROTO(unsigned long hva),
> > +	TP_ARGS(hva),
> > +
> > +	TP_STRUCT__entry(
> > +		__field(	unsigned long,	hva		)
> > +	),
> > +
> > +	TP_fast_assign(
> > +		__entry->hva		= hva;
> > +	),
> > +
> > +	TP_printk("mmu notifier test age hva: %#08lx", __entry->hva)
> > +);
> > +
> >  TRACE_EVENT(kvm_hvc,
> >  	TP_PROTO(unsigned long vcpu_pc, unsigned long r0, unsigned long imm),
> >  	TP_ARGS(vcpu_pc, r0, imm),
> > diff --git a/arch/arm64/include/asm/kvm_arm.h b/arch/arm64/include/asm/kvm_arm.h
> > index 8afb863..0d738f2 100644
> > --- a/arch/arm64/include/asm/kvm_arm.h
> > +++ b/arch/arm64/include/asm/kvm_arm.h
> > @@ -212,6 +212,7 @@
> >  
> >  
> >  #define FSC_FAULT	(0x04)
> > +#define FSC_ACCESS	(0x08)
> >  #define FSC_PERM	(0x0c)
> >  
> >  /* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index acd101a..b831710 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -173,19 +173,10 @@ int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
> >  int kvm_unmap_hva_range(struct kvm *kvm,
> >  			unsigned long start, unsigned long end);
> >  void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
> > +int kvm_age_hva(struct kvm *kvm, unsigned long start, unsigned long end);
> > +int kvm_test_age_hva(struct kvm *kvm, unsigned long hva);
> >  
> >  /* We do not have shadow page tables, hence the empty hooks */
> > -static inline int kvm_age_hva(struct kvm *kvm, unsigned long start,
> > -			      unsigned long end)
> > -{
> > -	return 0;
> > -}
> > -
> > -static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
> > -{
> > -	return 0;
> > -}
> > -
> >  static inline void kvm_arch_mmu_notifier_invalidate_page(struct kvm *kvm,
> >  							 unsigned long address)
> >  {
> > -- 
> > 2.1.4
> > 
> Otherwise, this looks good.

Thanks for having had a look!

	M.
-- 
Jazz is not dead. It just smells funny.

* Re: [PATCH 3/3] arm/arm64: KVM: Optimize handling of Access Flag faults
  2015-03-12 11:40   ` Christoffer Dall
@ 2015-03-12 15:04     ` Marc Zyngier
  0 siblings, 0 replies; 10+ messages in thread
From: Marc Zyngier @ 2015-03-12 15:04 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvm, kvmarm, Steve Capper

On Thu, 12 Mar 2015 11:40:24 +0000
Christoffer Dall <christoffer.dall@linaro.org> wrote:

> On Wed, Jan 21, 2015 at 06:42:13PM +0000, Marc Zyngier wrote:
> > Now that we have page aging in Stage-2, it becomes obvious that
> > we're doing way too much work handling the fault.
> > 
> > The page is not going anywhere (it is still mapped), the page
> > tables are already allocated, and all we want is to flip a bit
> > in the PMD or PTE. Also, we can avoid any form of TLB invalidation,
> > since an entry with the AF bit clear is not allowed to be cached.
> > in the TLB.
> > 
> > An obvious solution is to have a separate handler for FSC_ACCESS,
> > where we pride ourselves on doing only the bare minimum amount of
> > work.
> > 
> > Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
> > ---
> >  arch/arm/kvm/mmu.c   | 46 ++++++++++++++++++++++++++++++++++++++++++++++
> >  arch/arm/kvm/trace.h | 15 +++++++++++++++
> >  2 files changed, 61 insertions(+)
> > 
> > diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> > index ffe89a0..112bae1 100644
> > --- a/arch/arm/kvm/mmu.c
> > +++ b/arch/arm/kvm/mmu.c
> > @@ -1073,6 +1073,46 @@ out_unlock:
> >  	return ret;
> >  }
> >  
> > +/*
> > + * Resolve the access fault by making the page young again.
> > + * Note that because the faulting entry is guaranteed not to be
> > + * cached in the TLB, we don't need to invalidate anything.
> > + */
> > +static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
> > +{
> > +	pmd_t *pmd;
> > +	pte_t *pte;
> > +	pfn_t pfn;
> > +	bool pfn_valid = false;
> 
> I think you can just initialize pfn to KVM_PFN_ERR_FAULT and use
> is_error_pfn() if you like, not sure if it's cleaner.

I can have a look...
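
Presumably it would boil down to something like this (untested, just to
illustrate the suggestion):

	pfn_t pfn = KVM_PFN_ERR_FAULT;

	/* ... walk the Stage-2 tables as before, setting pfn on the
	 * PMD and PTE paths ... */

out:
	spin_unlock(&vcpu->kvm->mmu_lock);
	if (!is_error_pfn(pfn))
		kvm_set_pfn_accessed(pfn);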

> > +
> > +	trace_kvm_access_fault(fault_ipa);
> > +
> > +	spin_lock(&vcpu->kvm->mmu_lock);
> > +
> > +	pmd = stage2_get_pmd(vcpu->kvm, NULL, fault_ipa);
> > +	if (!pmd || pmd_none(*pmd))	/* Nothing there */
> > +		goto out;
> > +
> > +	if (kvm_pmd_huge(*pmd)) {	/* THP, HugeTLB */
> > +		*pmd = pmd_mkyoung(*pmd);
> > +		pfn = pmd_pfn(*pmd);
> > +		pfn_valid = true;
> > +		goto out;
> > +	}
> > +
> > +	pte = pte_offset_kernel(pmd, fault_ipa);
> > +	if (pte_none(*pte))		/* Nothing there either */
> > +		goto out;
> > +
> > +	*pte = pte_mkyoung(*pte);	/* Just a page... */
> > +	pfn = pte_pfn(*pte);
> > +	pfn_valid = true;
> > +out:
> > +	spin_unlock(&vcpu->kvm->mmu_lock);
> 
> do you have a race here if the page is swapped out before you go and
> set pfn accessed?  Does that cause any harm?

I don't think it really matters. The physical page is still there, and
is actually being accessed. The fact that the data is being evicted is
an interesting side effect... ;-)

> > +	if (pfn_valid)
> > +		kvm_set_pfn_accessed(pfn);
> > +}
> > +
> >  /**
> >   * kvm_handle_guest_abort - handles all 2nd stage aborts
> >   * @vcpu:	the VCPU pointer
> > @@ -1140,6 +1180,12 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  	/* Userspace should not be able to register out-of-bounds IPAs */
> >  	VM_BUG_ON(fault_ipa >= KVM_PHYS_SIZE);
> >  
> > +	if (fault_status == FSC_ACCESS) {
> > +		handle_access_fault(vcpu, fault_ipa);
> > +		ret = 1;
> > +		goto out_unlock;
> > +	}
> > +
> >  	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
> >  	if (ret == 0)
> >  		ret = 1;
> > diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
> > index 364b5382..5665a16 100644
> > --- a/arch/arm/kvm/trace.h
> > +++ b/arch/arm/kvm/trace.h
> > @@ -64,6 +64,21 @@ TRACE_EVENT(kvm_guest_fault,
> >  		  __entry->hxfar, __entry->vcpu_pc)
> >  );
> >  
> > +TRACE_EVENT(kvm_access_fault,
> > +	TP_PROTO(unsigned long ipa),
> > +	TP_ARGS(ipa),
> > +
> > +	TP_STRUCT__entry(
> > +		__field(	unsigned long,	ipa		)
> > +	),
> > +
> > +	TP_fast_assign(
> > +		__entry->ipa		= ipa;
> > +	),
> > +
> > +	TP_printk("IPA: %lx", __entry->ipa)
> > +);
> > +
> >  TRACE_EVENT(kvm_irq_line,
> >  	TP_PROTO(unsigned int type, int vcpu_idx, int irq_num, int level),
> >  	TP_ARGS(type, vcpu_idx, irq_num, level),

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny.

