linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH RFCv1 0/7] Support Async Page Fault
@ 2020-04-10  8:58 Gavin Shan
  2020-04-10  8:58 ` [PATCH RFCv1 1/7] kvm/arm64: Rename kvm_vcpu_get_hsr() to kvm_vcpu_get_esr() Gavin Shan
                   ` (7 more replies)
  0 siblings, 8 replies; 15+ messages in thread
From: Gavin Shan @ 2020-04-10  8:58 UTC (permalink / raw)
  To: kvmarm
  Cc: mark.rutland, drjones, suzuki.poulose, maz, sudeep.holla,
	eric.auger, james.morse, shan.gavin, catalin.marinas, will,
	linux-arm-kernel

Page faults are resolved in two stages: stage-1 faults are handled by
the guest itself, while a fault caused by the stage-2 page table (for
example, a missing translation) traps to the host. The guest is then
suspended until the requested page is populated. Populating the
requested page can be costly and may involve I/O if the page was
previously swapped out. In that case, the guest has to wait for at
least a few milliseconds, regardless of the overall system load.

This series adds support for asynchronous page faults to improve the
above situation. If populating the requested page is going to be
costly, a signal (PAGE_NOT_PRESENT) is sent to the guest so that the
faulting process can be rescheduled if possible; otherwise the vCPU is
put into a power-saving state. Another signal (PAGE_READY) is sent to
the guest once the requested page has been populated, so that the
faulting process can be woken up from either the waiting state or the
power-saving state.

To drive the control flow and convey signals between host and guest,
an IMPDEF system register (SYS_ASYNC_PF_EL1) is introduced. The
register accepts the control block's physical address, plus the
requested features. The signal itself is delivered as a data abort
with a specific IMPDEF Data Fault Status Code (DFSC); the signal value
is stored in the control block by the host, to be consumed by the
guest.
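
To make the interface concrete, the guest-side handshake could look
roughly like the sketch below. The control block layout, the feature
bit (ASYNC_PF_ENABLE) and the helper names are assumptions made for
illustration only; the real definitions are introduced by PATCH[06]
and PATCH[07].

/*
 * Illustrative sketch, not the patch code: struct layout, ASYNC_PF_ENABLE
 * and the reason encoding are assumed. Only SYS_ASYNC_PF_EL1,
 * PAGE_NOT_PRESENT and PAGE_READY come from the description above.
 */
struct kvm_apf_control_block {
    u32 reason;     /* PAGE_NOT_PRESENT or PAGE_READY, written by host */
    u32 token;      /* identifies the outstanding stage-2 fault */
};

static DEFINE_PER_CPU(struct kvm_apf_control_block, apf_cb);

static void guest_enable_async_pf(void)
{
    /* Tell the host where this CPU's control block lives ... */
    u64 val = __pa(this_cpu_ptr(&apf_cb));

    /* ... and which features the guest requests (assumed bit). */
    val |= ASYNC_PF_ENABLE;

    write_sysreg_s(val, SYS_ASYNC_PF_EL1);
}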

Todo
====
* CONFIG_KVM_ASYNC_PF_SYNC is disabled for now because exception
  injection doesn't work in nested mode. This might be improved in
  the future.
* KVM_ASYNC_PF_SEND_ALWAYS is disabled even with CONFIG_PREEMPTION
  because it simply doesn't work reliably yet.
* Tracepoints, which should be added in the short term.
* kvm-unit-test cases.
* More testing and debugging are needed. The guest sometimes gets
  stuck and the root cause still needs to be identified.

PATCH[01] renames kvm_vcpu_get_hsr() to kvm_vcpu_get_esr() since the
          aarch32 host isn't supported.
PATCH[02] allows the various helper functions to take the ESR value
          directly instead of reading it from the vCPU struct.
PATCH[03] replaces @hsr with @esr as the aarch32 host isn't supported.
PATCH[04] exports kvm_handle_user_mem_abort(), which is used by a
          subsequent patch.
PATCH[05] introduces an API to inject a data abort with an IMPDEF DFSC.
PATCH[06] adds host-side support for asynchronous page faults.
PATCH[07] adds guest-side support for asynchronous page faults.

Testing
=======

Start a VM and put its QEMU process into a dedicated memory cgroup. The
cgroup's memory limit is less than the total amount of memory assigned
to the VM. For example, the VM is assigned 4GB of memory while the
cgroup's limit is 2GB. After the VM boots up, a program is run to
allocate (and access) all free memory, as sketched below. No system
hang is observed.
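
The allocator can be as simple as the following sketch; it is an
illustrative stand-in rather than the exact program that was run:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK   (16UL << 20)    /* allocate and touch 16MB at a time */

int main(void)
{
    unsigned long total = 0;
    char *p;

    while ((p = malloc(CHUNK)) != NULL) {
        /* Touch every byte so the pages are actually populated. */
        memset(p, 0x5a, CHUNK);
        total += CHUNK;
        printf("allocated %lu MB\n", total >> 20);
    }

    /* Keep the memory resident until the test is stopped. */
    pause();
    return 0;
}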

Gavin Shan (7):
  kvm/arm64: Rename kvm_vcpu_get_hsr() to kvm_vcpu_get_esr()
  kvm/arm64: Detach ESR operator from vCPU struct
  kvm/arm64: Replace hsr with esr
  kvm/arm64: Export kvm_handle_user_mem_abort() with prefault mode
  kvm/arm64: Allow inject data abort with specified DFSC
  kvm/arm64: Support async page fault
  arm64: Support async page fault

 arch/arm64/Kconfig                       |  11 +
 arch/arm64/include/asm/exception.h       |   5 +
 arch/arm64/include/asm/kvm_emulate.h     |  87 +++----
 arch/arm64/include/asm/kvm_host.h        |  46 ++++
 arch/arm64/include/asm/kvm_para.h        |  55 +++++
 arch/arm64/include/asm/sysreg.h          |   3 +
 arch/arm64/include/uapi/asm/Kbuild       |   3 -
 arch/arm64/include/uapi/asm/kvm_para.h   |  22 ++
 arch/arm64/kernel/smp.c                  |  47 ++++
 arch/arm64/kvm/Kconfig                   |   1 +
 arch/arm64/kvm/Makefile                  |   2 +
 arch/arm64/kvm/handle_exit.c             |  48 ++--
 arch/arm64/kvm/hyp/switch.c              |  33 +--
 arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c |   7 +-
 arch/arm64/kvm/inject_fault.c            |  38 ++-
 arch/arm64/kvm/sys_regs.c                |  91 +++++--
 arch/arm64/mm/fault.c                    | 239 ++++++++++++++++++-
 virt/kvm/arm/aarch32.c                   |  27 ++-
 virt/kvm/arm/arm.c                       |  36 ++-
 virt/kvm/arm/async_pf.c                  | 290 +++++++++++++++++++++++
 virt/kvm/arm/hyp/aarch32.c               |   4 +-
 virt/kvm/arm/hyp/vgic-v3-sr.c            |   7 +-
 virt/kvm/arm/mmio.c                      |  27 ++-
 virt/kvm/arm/mmu.c                       |  69 ++++--
 24 files changed, 1040 insertions(+), 158 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_para.h
 delete mode 100644 arch/arm64/include/uapi/asm/Kbuild
 create mode 100644 arch/arm64/include/uapi/asm/kvm_para.h
 create mode 100644 virt/kvm/arm/async_pf.c

-- 
2.23.0




* [PATCH RFCv1 1/7] kvm/arm64: Rename kvm_vcpu_get_hsr() to kvm_vcpu_get_esr()
  2020-04-10  8:58 [PATCH RFCv1 0/7] Support Async Page Fault Gavin Shan
@ 2020-04-10  8:58 ` Gavin Shan
  2020-04-10  8:58 ` [PATCH RFCv1 2/7] kvm/arm64: Detach ESR operator from vCPU struct Gavin Shan
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Gavin Shan @ 2020-04-10  8:58 UTC (permalink / raw)
  To: kvmarm
  Cc: mark.rutland, drjones, suzuki.poulose, maz, sudeep.holla,
	eric.auger, james.morse, shan.gavin, catalin.marinas, will,
	linux-arm-kernel

Since kvm/arm32 was removed, this renames kvm_vcpu_get_hsr() to
kvm_vcpu_get_esr() to make it a bit more self-explanatory, because the
function returns the ESR instead of the HSR on aarch64. This shouldn't
cause any functional changes.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 arch/arm64/include/asm/kvm_emulate.h | 36 +++++++++++++++-------------
 arch/arm64/kvm/handle_exit.c         | 12 +++++-----
 arch/arm64/kvm/hyp/switch.c          |  2 +-
 arch/arm64/kvm/sys_regs.c            |  6 ++---
 virt/kvm/arm/hyp/aarch32.c           |  2 +-
 virt/kvm/arm/hyp/vgic-v3-sr.c        |  4 ++--
 virt/kvm/arm/mmu.c                   |  6 ++---
 7 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index a30b4eec7cb4..bd1a69e7c104 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -265,14 +265,14 @@ static inline bool vcpu_mode_priv(const struct kvm_vcpu *vcpu)
 	return mode != PSR_MODE_EL0t;
 }
 
-static __always_inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu *vcpu)
+static __always_inline u32 kvm_vcpu_get_esr(const struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.fault.esr_el2;
 }
 
 static __always_inline int kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu)
 {
-	u32 esr = kvm_vcpu_get_hsr(vcpu);
+	u32 esr = kvm_vcpu_get_esr(vcpu);
 
 	if (esr & ESR_ELx_CV)
 		return (esr & ESR_ELx_COND_MASK) >> ESR_ELx_COND_SHIFT;
@@ -297,64 +297,66 @@ static inline u64 kvm_vcpu_get_disr(const struct kvm_vcpu *vcpu)
 
 static inline u32 kvm_vcpu_hvc_get_imm(const struct kvm_vcpu *vcpu)
 {
-	return kvm_vcpu_get_hsr(vcpu) & ESR_ELx_xVC_IMM_MASK;
+	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_xVC_IMM_MASK;
 }
 
 static __always_inline bool kvm_vcpu_dabt_isvalid(const struct kvm_vcpu *vcpu)
 {
-	return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_ISV);
+	return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_ISV);
 }
 
 static inline unsigned long kvm_vcpu_dabt_iss_nisv_sanitized(const struct kvm_vcpu *vcpu)
 {
-	return kvm_vcpu_get_hsr(vcpu) & (ESR_ELx_CM | ESR_ELx_WNR | ESR_ELx_FSC);
+	return kvm_vcpu_get_esr(vcpu) &
+	       (ESR_ELx_CM | ESR_ELx_WNR | ESR_ELx_FSC);
 }
 
 static inline bool kvm_vcpu_dabt_issext(const struct kvm_vcpu *vcpu)
 {
-	return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_SSE);
+	return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_SSE);
 }
 
 static inline bool kvm_vcpu_dabt_issf(const struct kvm_vcpu *vcpu)
 {
-	return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_SF);
+	return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_SF);
 }
 
 static __always_inline int kvm_vcpu_dabt_get_rd(const struct kvm_vcpu *vcpu)
 {
-	return (kvm_vcpu_get_hsr(vcpu) & ESR_ELx_SRT_MASK) >> ESR_ELx_SRT_SHIFT;
+	return (kvm_vcpu_get_esr(vcpu) & ESR_ELx_SRT_MASK) >> ESR_ELx_SRT_SHIFT;
 }
 
 static __always_inline bool kvm_vcpu_dabt_iss1tw(const struct kvm_vcpu *vcpu)
 {
-	return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_S1PTW);
+	return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_S1PTW);
 }
 
 static __always_inline bool kvm_vcpu_dabt_iswrite(const struct kvm_vcpu *vcpu)
 {
-	return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_WNR) ||
+	return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_WNR) ||
 		kvm_vcpu_dabt_iss1tw(vcpu); /* AF/DBM update */
 }
 
 static inline bool kvm_vcpu_dabt_is_cm(const struct kvm_vcpu *vcpu)
 {
-	return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_CM);
+	return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_CM);
 }
 
 static __always_inline unsigned int kvm_vcpu_dabt_get_as(const struct kvm_vcpu *vcpu)
 {
-	return 1 << ((kvm_vcpu_get_hsr(vcpu) & ESR_ELx_SAS) >> ESR_ELx_SAS_SHIFT);
+	return 1 << ((kvm_vcpu_get_esr(vcpu) & ESR_ELx_SAS) >>
+		     ESR_ELx_SAS_SHIFT);
 }
 
 /* This one is not specific to Data Abort */
 static __always_inline bool kvm_vcpu_trap_il_is32bit(const struct kvm_vcpu *vcpu)
 {
-	return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_IL);
+	return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_IL);
 }
 
 static __always_inline u8 kvm_vcpu_trap_get_class(const struct kvm_vcpu *vcpu)
 {
-	return ESR_ELx_EC(kvm_vcpu_get_hsr(vcpu));
+	return ESR_ELx_EC(kvm_vcpu_get_esr(vcpu));
 }
 
 static inline bool kvm_vcpu_trap_is_iabt(const struct kvm_vcpu *vcpu)
@@ -364,12 +366,12 @@ static inline bool kvm_vcpu_trap_is_iabt(const struct kvm_vcpu *vcpu)
 
 static __always_inline u8 kvm_vcpu_trap_get_fault(const struct kvm_vcpu *vcpu)
 {
-	return kvm_vcpu_get_hsr(vcpu) & ESR_ELx_FSC;
+	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC;
 }
 
 static __always_inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vcpu)
 {
-	return kvm_vcpu_get_hsr(vcpu) & ESR_ELx_FSC_TYPE;
+	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_TYPE;
 }
 
 static __always_inline bool kvm_vcpu_dabt_isextabt(const struct kvm_vcpu *vcpu)
@@ -393,7 +395,7 @@ static __always_inline bool kvm_vcpu_dabt_isextabt(const struct kvm_vcpu *vcpu)
 
 static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
 {
-	u32 esr = kvm_vcpu_get_hsr(vcpu);
+	u32 esr = kvm_vcpu_get_esr(vcpu);
 	return ESR_ELx_SYS64_ISS_RT(esr);
 }
 
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index aacfc55de44c..c5b75a4d5eda 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -89,7 +89,7 @@ static int handle_no_fpsimd(struct kvm_vcpu *vcpu, struct kvm_run *run)
  */
 static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	if (kvm_vcpu_get_hsr(vcpu) & ESR_ELx_WFx_ISS_WFE) {
+	if (kvm_vcpu_get_esr(vcpu) & ESR_ELx_WFx_ISS_WFE) {
 		trace_kvm_wfx_arm64(*vcpu_pc(vcpu), true);
 		vcpu->stat.wfe_exit_stat++;
 		kvm_vcpu_on_spin(vcpu, vcpu_mode_priv(vcpu));
@@ -119,7 +119,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
  */
 static int kvm_handle_guest_debug(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	u32 hsr = kvm_vcpu_get_hsr(vcpu);
+	u32 hsr = kvm_vcpu_get_esr(vcpu);
 	int ret = 0;
 
 	run->exit_reason = KVM_EXIT_DEBUG;
@@ -146,7 +146,7 @@ static int kvm_handle_guest_debug(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 static int kvm_handle_unknown_ec(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	u32 hsr = kvm_vcpu_get_hsr(vcpu);
+	u32 hsr = kvm_vcpu_get_esr(vcpu);
 
 	kvm_pr_unimpl("Unknown exception class: hsr: %#08x -- %s\n",
 		      hsr, esr_get_class_string(hsr));
@@ -226,7 +226,7 @@ static exit_handle_fn arm_exit_handlers[] = {
 
 static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)
 {
-	u32 hsr = kvm_vcpu_get_hsr(vcpu);
+	u32 hsr = kvm_vcpu_get_esr(vcpu);
 	u8 hsr_ec = ESR_ELx_EC(hsr);
 
 	return arm_exit_handlers[hsr_ec];
@@ -267,7 +267,7 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		       int exception_index)
 {
 	if (ARM_SERROR_PENDING(exception_index)) {
-		u8 hsr_ec = ESR_ELx_EC(kvm_vcpu_get_hsr(vcpu));
+		u8 hsr_ec = ESR_ELx_EC(kvm_vcpu_get_esr(vcpu));
 
 		/*
 		 * HVC/SMC already have an adjusted PC, which we need
@@ -333,5 +333,5 @@ void handle_exit_early(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	exception_index = ARM_EXCEPTION_CODE(exception_index);
 
 	if (exception_index == ARM_EXCEPTION_EL1_SERROR)
-		kvm_handle_guest_serror(vcpu, kvm_vcpu_get_hsr(vcpu));
+		kvm_handle_guest_serror(vcpu, kvm_vcpu_get_esr(vcpu));
 }
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 8a1e81a400e0..2c3242bcfed2 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -437,7 +437,7 @@ static bool __hyp_text __hyp_handle_fpsimd(struct kvm_vcpu *vcpu)
 
 static bool __hyp_text handle_tx2_tvm(struct kvm_vcpu *vcpu)
 {
-	u32 sysreg = esr_sys64_to_sysreg(kvm_vcpu_get_hsr(vcpu));
+	u32 sysreg = esr_sys64_to_sysreg(kvm_vcpu_get_esr(vcpu));
 	int rt = kvm_vcpu_sys_get_rt(vcpu);
 	u64 val = vcpu_get_reg(vcpu, rt);
 
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 51db934702b6..5b61465927b7 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2214,7 +2214,7 @@ static int kvm_handle_cp_64(struct kvm_vcpu *vcpu,
 			    size_t nr_specific)
 {
 	struct sys_reg_params params;
-	u32 hsr = kvm_vcpu_get_hsr(vcpu);
+	u32 hsr = kvm_vcpu_get_esr(vcpu);
 	int Rt = kvm_vcpu_sys_get_rt(vcpu);
 	int Rt2 = (hsr >> 10) & 0x1f;
 
@@ -2271,7 +2271,7 @@ static int kvm_handle_cp_32(struct kvm_vcpu *vcpu,
 			    size_t nr_specific)
 {
 	struct sys_reg_params params;
-	u32 hsr = kvm_vcpu_get_hsr(vcpu);
+	u32 hsr = kvm_vcpu_get_esr(vcpu);
 	int Rt  = kvm_vcpu_sys_get_rt(vcpu);
 
 	params.is_aarch32 = true;
@@ -2387,7 +2387,7 @@ static void reset_sys_reg_descs(struct kvm_vcpu *vcpu,
 int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	struct sys_reg_params params;
-	unsigned long esr = kvm_vcpu_get_hsr(vcpu);
+	unsigned long esr = kvm_vcpu_get_esr(vcpu);
 	int Rt = kvm_vcpu_sys_get_rt(vcpu);
 	int ret;
 
diff --git a/virt/kvm/arm/hyp/aarch32.c b/virt/kvm/arm/hyp/aarch32.c
index d31f267961e7..864b477e660a 100644
--- a/virt/kvm/arm/hyp/aarch32.c
+++ b/virt/kvm/arm/hyp/aarch32.c
@@ -51,7 +51,7 @@ bool __hyp_text kvm_condition_valid32(const struct kvm_vcpu *vcpu)
 	int cond;
 
 	/* Top two bits non-zero?  Unconditional. */
-	if (kvm_vcpu_get_hsr(vcpu) >> 30)
+	if (kvm_vcpu_get_esr(vcpu) >> 30)
 		return true;
 
 	/* Is condition field valid? */
diff --git a/virt/kvm/arm/hyp/vgic-v3-sr.c b/virt/kvm/arm/hyp/vgic-v3-sr.c
index ccf1fde9836c..8a7a14ec9120 100644
--- a/virt/kvm/arm/hyp/vgic-v3-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v3-sr.c
@@ -441,7 +441,7 @@ static int __hyp_text __vgic_v3_bpr_min(void)
 
 static int __hyp_text __vgic_v3_get_group(struct kvm_vcpu *vcpu)
 {
-	u32 esr = kvm_vcpu_get_hsr(vcpu);
+	u32 esr = kvm_vcpu_get_esr(vcpu);
 	u8 crm = (esr & ESR_ELx_SYS64_ISS_CRM_MASK) >> ESR_ELx_SYS64_ISS_CRM_SHIFT;
 
 	return crm != 8;
@@ -1007,7 +1007,7 @@ int __hyp_text __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu)
 	bool is_read;
 	u32 sysreg;
 
-	esr = kvm_vcpu_get_hsr(vcpu);
+	esr = kvm_vcpu_get_esr(vcpu);
 	if (vcpu_mode_is_32bit(vcpu)) {
 		if (!kvm_condition_valid(vcpu)) {
 			__kvm_skip_instr(vcpu);
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index e3b9ee268823..5da0d0e7519b 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1922,7 +1922,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 * For RAS the host kernel may handle this abort.
 		 * There is no need to pass the error into the guest.
 		 */
-		if (!kvm_handle_guest_sea(fault_ipa, kvm_vcpu_get_hsr(vcpu)))
+		if (!kvm_handle_guest_sea(fault_ipa, kvm_vcpu_get_esr(vcpu)))
 			return 1;
 
 		if (unlikely(!is_iabt)) {
@@ -1931,7 +1931,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		}
 	}
 
-	trace_kvm_guest_fault(*vcpu_pc(vcpu), kvm_vcpu_get_hsr(vcpu),
+	trace_kvm_guest_fault(*vcpu_pc(vcpu), kvm_vcpu_get_esr(vcpu),
 			      kvm_vcpu_get_hfar(vcpu), fault_ipa);
 
 	/* Check the stage-2 fault is trans. fault or write fault */
@@ -1940,7 +1940,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
 			kvm_vcpu_trap_get_class(vcpu),
 			(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
-			(unsigned long)kvm_vcpu_get_hsr(vcpu));
+			(unsigned long)kvm_vcpu_get_esr(vcpu));
 		return -EFAULT;
 	}
 
-- 
2.23.0




* [PATCH RFCv1 2/7] kvm/arm64: Detach ESR operator from vCPU struct
  2020-04-10  8:58 [PATCH RFCv1 0/7] Support Async Page Fault Gavin Shan
  2020-04-10  8:58 ` [PATCH RFCv1 1/7] kvm/arm64: Rename kvm_vcpu_get_hsr() to kvm_vcpu_get_esr() Gavin Shan
@ 2020-04-10  8:58 ` Gavin Shan
  2020-04-10  8:58 ` [PATCH RFCv1 3/7] kvm/arm64: Replace hsr with esr Gavin Shan
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Gavin Shan @ 2020-04-10  8:58 UTC (permalink / raw)
  To: kvmarm
  Cc: mark.rutland, drjones, suzuki.poulose, maz, sudeep.holla,
	eric.auger, james.morse, shan.gavin, catalin.marinas, will,
	linux-arm-kernel

There is a set of inline functions defined in kvm_emulate.h. They read
the ESR from the vCPU fault information struct and then operate on it,
so they are tied to the vCPU fault information and the vCPU struct,
which limits their usage scope.

This detaches these functions from the vCPU struct by passing the ESR
value in directly. With this, the caller has flexibility over where the
ESR is read from. It shouldn't cause any functional changes.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 arch/arm64/include/asm/kvm_emulate.h     | 83 +++++++++++-------------
 arch/arm64/kvm/handle_exit.c             | 20 ++++--
 arch/arm64/kvm/hyp/switch.c              | 24 ++++---
 arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c |  7 +-
 arch/arm64/kvm/inject_fault.c            |  4 +-
 arch/arm64/kvm/sys_regs.c                | 12 ++--
 virt/kvm/arm/arm.c                       |  4 +-
 virt/kvm/arm/hyp/aarch32.c               |  2 +-
 virt/kvm/arm/hyp/vgic-v3-sr.c            |  5 +-
 virt/kvm/arm/mmio.c                      | 27 ++++----
 virt/kvm/arm/mmu.c                       | 22 ++++---
 11 files changed, 112 insertions(+), 98 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index bd1a69e7c104..2873bf6dc85e 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -270,10 +270,8 @@ static __always_inline u32 kvm_vcpu_get_esr(const struct kvm_vcpu *vcpu)
 	return vcpu->arch.fault.esr_el2;
 }
 
-static __always_inline int kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu)
+static __always_inline int kvm_vcpu_get_condition(u32 esr)
 {
-	u32 esr = kvm_vcpu_get_esr(vcpu);
-
 	if (esr & ESR_ELx_CV)
 		return (esr & ESR_ELx_COND_MASK) >> ESR_ELx_COND_SHIFT;
 
@@ -295,88 +293,86 @@ static inline u64 kvm_vcpu_get_disr(const struct kvm_vcpu *vcpu)
 	return vcpu->arch.fault.disr_el1;
 }
 
-static inline u32 kvm_vcpu_hvc_get_imm(const struct kvm_vcpu *vcpu)
+static __always_inline u32 kvm_vcpu_hvc_get_imm(u32 esr)
 {
-	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_xVC_IMM_MASK;
+	return esr & ESR_ELx_xVC_IMM_MASK;
 }
 
-static __always_inline bool kvm_vcpu_dabt_isvalid(const struct kvm_vcpu *vcpu)
+static __always_inline bool kvm_vcpu_dabt_isvalid(u32 esr)
 {
-	return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_ISV);
+	return !!(esr & ESR_ELx_ISV);
 }
 
-static inline unsigned long kvm_vcpu_dabt_iss_nisv_sanitized(const struct kvm_vcpu *vcpu)
+static __always_inline unsigned long kvm_vcpu_dabt_iss_nisv_sanitized(u32 esr)
 {
-	return kvm_vcpu_get_esr(vcpu) &
-	       (ESR_ELx_CM | ESR_ELx_WNR | ESR_ELx_FSC);
+	return esr & (ESR_ELx_CM | ESR_ELx_WNR | ESR_ELx_FSC);
 }
 
-static inline bool kvm_vcpu_dabt_issext(const struct kvm_vcpu *vcpu)
+static __always_inline bool kvm_vcpu_dabt_issext(u32 esr)
 {
-	return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_SSE);
+	return !!(esr & ESR_ELx_SSE);
 }
 
-static inline bool kvm_vcpu_dabt_issf(const struct kvm_vcpu *vcpu)
+static __always_inline bool kvm_vcpu_dabt_issf(u32 esr)
 {
-	return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_SF);
+	return !!(esr & ESR_ELx_SF);
 }
 
-static __always_inline int kvm_vcpu_dabt_get_rd(const struct kvm_vcpu *vcpu)
+static __always_inline int kvm_vcpu_dabt_get_rd(u32 esr)
 {
-	return (kvm_vcpu_get_esr(vcpu) & ESR_ELx_SRT_MASK) >> ESR_ELx_SRT_SHIFT;
+	return (esr & ESR_ELx_SRT_MASK) >> ESR_ELx_SRT_SHIFT;
 }
 
-static __always_inline bool kvm_vcpu_dabt_iss1tw(const struct kvm_vcpu *vcpu)
+static __always_inline bool kvm_vcpu_dabt_iss1tw(u32 esr)
 {
-	return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_S1PTW);
+	return !!(esr & ESR_ELx_S1PTW);
 }
 
-static __always_inline bool kvm_vcpu_dabt_iswrite(const struct kvm_vcpu *vcpu)
+static __always_inline bool kvm_vcpu_dabt_iswrite(u32 esr)
 {
-	return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_WNR) ||
-		kvm_vcpu_dabt_iss1tw(vcpu); /* AF/DBM update */
+	return !!(esr & ESR_ELx_WNR) ||
+		kvm_vcpu_dabt_iss1tw(esr); /* AF/DBM update */
 }
 
-static inline bool kvm_vcpu_dabt_is_cm(const struct kvm_vcpu *vcpu)
+static __always_inline bool kvm_vcpu_dabt_is_cm(u32 esr)
 {
-	return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_CM);
+	return !!(esr & ESR_ELx_CM);
 }
 
-static __always_inline unsigned int kvm_vcpu_dabt_get_as(const struct kvm_vcpu *vcpu)
+static __always_inline unsigned int kvm_vcpu_dabt_get_as(u32 esr)
 {
-	return 1 << ((kvm_vcpu_get_esr(vcpu) & ESR_ELx_SAS) >>
-		     ESR_ELx_SAS_SHIFT);
+	return 1 << ((esr & ESR_ELx_SAS) >> ESR_ELx_SAS_SHIFT);
 }
 
 /* This one is not specific to Data Abort */
-static __always_inline bool kvm_vcpu_trap_il_is32bit(const struct kvm_vcpu *vcpu)
+static __always_inline bool kvm_vcpu_trap_il_is32bit(u32 esr)
 {
-	return !!(kvm_vcpu_get_esr(vcpu) & ESR_ELx_IL);
+	return !!(esr & ESR_ELx_IL);
 }
 
-static __always_inline u8 kvm_vcpu_trap_get_class(const struct kvm_vcpu *vcpu)
+static __always_inline u8 kvm_vcpu_trap_get_class(u32 esr)
 {
-	return ESR_ELx_EC(kvm_vcpu_get_esr(vcpu));
+	return ESR_ELx_EC(esr);
 }
 
-static inline bool kvm_vcpu_trap_is_iabt(const struct kvm_vcpu *vcpu)
+static __always_inline bool kvm_vcpu_trap_is_iabt(u32 esr)
 {
-	return kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_IABT_LOW;
+	return kvm_vcpu_trap_get_class(esr) == ESR_ELx_EC_IABT_LOW;
 }
 
-static __always_inline u8 kvm_vcpu_trap_get_fault(const struct kvm_vcpu *vcpu)
+static __always_inline u8 kvm_vcpu_trap_get_fault(u32 esr)
 {
-	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC;
+	return esr & ESR_ELx_FSC;
 }
 
-static __always_inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vcpu)
+static __always_inline u8 kvm_vcpu_trap_get_fault_type(u32 esr)
 {
-	return kvm_vcpu_get_esr(vcpu) & ESR_ELx_FSC_TYPE;
+	return esr & ESR_ELx_FSC_TYPE;
 }
 
-static __always_inline bool kvm_vcpu_dabt_isextabt(const struct kvm_vcpu *vcpu)
+static __always_inline bool kvm_vcpu_dabt_isextabt(u32 esr)
 {
-	switch (kvm_vcpu_trap_get_fault(vcpu)) {
+	switch (kvm_vcpu_trap_get_fault(esr)) {
 	case FSC_SEA:
 	case FSC_SEA_TTW0:
 	case FSC_SEA_TTW1:
@@ -393,18 +389,17 @@ static __always_inline bool kvm_vcpu_dabt_isextabt(const struct kvm_vcpu *vcpu)
 	}
 }
 
-static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
+static __always_inline int kvm_vcpu_sys_get_rt(u32 esr)
 {
-	u32 esr = kvm_vcpu_get_esr(vcpu);
 	return ESR_ELx_SYS64_ISS_RT(esr);
 }
 
-static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
+static __always_inline bool kvm_is_write_fault(u32 esr)
 {
-	if (kvm_vcpu_trap_is_iabt(vcpu))
+	if (kvm_vcpu_trap_is_iabt(esr))
 		return false;
 
-	return kvm_vcpu_dabt_iswrite(vcpu);
+	return kvm_vcpu_dabt_iswrite(esr);
 }
 
 static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
@@ -527,7 +522,7 @@ static __always_inline void __hyp_text __kvm_skip_instr(struct kvm_vcpu *vcpu)
 	*vcpu_pc(vcpu) = read_sysreg_el2(SYS_ELR);
 	vcpu->arch.ctxt.gp_regs.regs.pstate = read_sysreg_el2(SYS_SPSR);
 
-	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(kvm_vcpu_get_esr(vcpu)));
 
 	write_sysreg_el2(vcpu->arch.ctxt.gp_regs.regs.pstate, SYS_SPSR);
 	write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR);
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index c5b75a4d5eda..00858db82a64 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -38,7 +38,7 @@ static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	int ret;
 
 	trace_kvm_hvc_arm64(*vcpu_pc(vcpu), vcpu_get_reg(vcpu, 0),
-			    kvm_vcpu_hvc_get_imm(vcpu));
+			    kvm_vcpu_hvc_get_imm(kvm_vcpu_get_esr(vcpu)));
 	vcpu->stat.hvc_exit_stat++;
 
 	ret = kvm_hvc_call_handler(vcpu);
@@ -52,6 +52,8 @@ static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
+	u32 esr = kvm_vcpu_get_esr(vcpu);
+
 	/*
 	 * "If an SMC instruction executed at Non-secure EL1 is
 	 * trapped to EL2 because HCR_EL2.TSC is 1, the exception is a
@@ -61,7 +63,7 @@ static int handle_smc(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	 * otherwise return to the same address...
 	 */
 	vcpu_set_reg(vcpu, 0, ~0UL);
-	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(esr));
 	return 1;
 }
 
@@ -89,7 +91,9 @@ static int handle_no_fpsimd(struct kvm_vcpu *vcpu, struct kvm_run *run)
  */
 static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	if (kvm_vcpu_get_esr(vcpu) & ESR_ELx_WFx_ISS_WFE) {
+	u32 esr = kvm_vcpu_get_esr(vcpu);
+
+	if (esr & ESR_ELx_WFx_ISS_WFE) {
 		trace_kvm_wfx_arm64(*vcpu_pc(vcpu), true);
 		vcpu->stat.wfe_exit_stat++;
 		kvm_vcpu_on_spin(vcpu, vcpu_mode_priv(vcpu));
@@ -100,7 +104,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 	}
 
-	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(esr));
 
 	return 1;
 }
@@ -240,6 +244,7 @@ static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)
  */
 static int handle_trap_exceptions(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
+	u32 esr = kvm_vcpu_get_esr(vcpu);
 	int handled;
 
 	/*
@@ -247,7 +252,7 @@ static int handle_trap_exceptions(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	 * that fail their condition code check"
 	 */
 	if (!kvm_condition_valid(vcpu)) {
-		kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+		kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(esr));
 		handled = 1;
 	} else {
 		exit_handle_fn exit_handler;
@@ -267,7 +272,8 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		       int exception_index)
 {
 	if (ARM_SERROR_PENDING(exception_index)) {
-		u8 hsr_ec = ESR_ELx_EC(kvm_vcpu_get_esr(vcpu));
+		u32 esr = kvm_vcpu_get_esr(vcpu);
+		u8 hsr_ec = ESR_ELx_EC(esr);
 
 		/*
 		 * HVC/SMC already have an adjusted PC, which we need
@@ -276,7 +282,7 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		 */
 		if (hsr_ec == ESR_ELx_EC_HVC32 || hsr_ec == ESR_ELx_EC_HVC64 ||
 		    hsr_ec == ESR_ELx_EC_SMC32 || hsr_ec == ESR_ELx_EC_SMC64) {
-			u32 adj =  kvm_vcpu_trap_il_is32bit(vcpu) ? 4 : 2;
+			u32 adj =  kvm_vcpu_trap_il_is32bit(esr) ? 4 : 2;
 			*vcpu_pc(vcpu) -= adj;
 		}
 
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 2c3242bcfed2..369f22f49f3d 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -355,6 +355,7 @@ static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
 /* Check for an FPSIMD/SVE trap and handle as appropriate */
 static bool __hyp_text __hyp_handle_fpsimd(struct kvm_vcpu *vcpu)
 {
+	u32 esr = kvm_vcpu_get_esr(vcpu);
 	bool vhe, sve_guest, sve_host;
 	u8 hsr_ec;
 
@@ -371,7 +372,7 @@ static bool __hyp_text __hyp_handle_fpsimd(struct kvm_vcpu *vcpu)
 		vhe = has_vhe();
 	}
 
-	hsr_ec = kvm_vcpu_trap_get_class(vcpu);
+	hsr_ec = kvm_vcpu_trap_get_class(esr);
 	if (hsr_ec != ESR_ELx_EC_FP_ASIMD &&
 	    hsr_ec != ESR_ELx_EC_SVE)
 		return false;
@@ -438,7 +439,8 @@ static bool __hyp_text __hyp_handle_fpsimd(struct kvm_vcpu *vcpu)
 static bool __hyp_text handle_tx2_tvm(struct kvm_vcpu *vcpu)
 {
 	u32 sysreg = esr_sys64_to_sysreg(kvm_vcpu_get_esr(vcpu));
-	int rt = kvm_vcpu_sys_get_rt(vcpu);
+	u32 esr = kvm_vcpu_get_esr(vcpu);
+	int rt = kvm_vcpu_sys_get_rt(esr);
 	u64 val = vcpu_get_reg(vcpu, rt);
 
 	/*
@@ -497,6 +499,8 @@ static bool __hyp_text handle_tx2_tvm(struct kvm_vcpu *vcpu)
  */
 static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 {
+	u32 esr = kvm_vcpu_get_esr(vcpu);
+
 	if (ARM_EXCEPTION_CODE(*exit_code) != ARM_EXCEPTION_IRQ)
 		vcpu->arch.fault.esr_el2 = read_sysreg_el2(SYS_ESR);
 
@@ -510,7 +514,7 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 		goto exit;
 
 	if (cpus_have_final_cap(ARM64_WORKAROUND_CAVIUM_TX2_219_TVM) &&
-	    kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_SYS64 &&
+	    kvm_vcpu_trap_get_class(esr) == ESR_ELx_EC_SYS64 &&
 	    handle_tx2_tvm(vcpu))
 		return true;
 
@@ -530,11 +534,11 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 	if (static_branch_unlikely(&vgic_v2_cpuif_trap)) {
 		bool valid;
 
-		valid = kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_DABT_LOW &&
-			kvm_vcpu_trap_get_fault_type(vcpu) == FSC_FAULT &&
-			kvm_vcpu_dabt_isvalid(vcpu) &&
-			!kvm_vcpu_dabt_isextabt(vcpu) &&
-			!kvm_vcpu_dabt_iss1tw(vcpu);
+		valid = kvm_vcpu_trap_get_class(esr) == ESR_ELx_EC_DABT_LOW &&
+			kvm_vcpu_trap_get_fault_type(esr) == FSC_FAULT &&
+			kvm_vcpu_dabt_isvalid(esr) &&
+			!kvm_vcpu_dabt_isextabt(esr) &&
+			!kvm_vcpu_dabt_iss1tw(esr);
 
 		if (valid) {
 			int ret = __vgic_v2_perform_cpuif_access(vcpu);
@@ -551,8 +555,8 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 	}
 
 	if (static_branch_unlikely(&vgic_v3_cpuif_trap) &&
-	    (kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_SYS64 ||
-	     kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_CP15_32)) {
+	    (kvm_vcpu_trap_get_class(esr) == ESR_ELx_EC_SYS64 ||
+	     kvm_vcpu_trap_get_class(esr) == ESR_ELx_EC_CP15_32)) {
 		int ret = __vgic_v3_perform_cpuif_access(vcpu);
 
 		if (ret == 1)
diff --git a/arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c b/arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c
index 4f3a087e36d5..bcf13a074b69 100644
--- a/arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c
+++ b/arch/arm64/kvm/hyp/vgic-v2-cpuif-proxy.c
@@ -36,6 +36,7 @@ int __hyp_text __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu)
 {
 	struct kvm *kvm = kern_hyp_va(vcpu->kvm);
 	struct vgic_dist *vgic = &kvm->arch.vgic;
+	u32 esr = kvm_vcpu_get_esr(vcpu);
 	phys_addr_t fault_ipa;
 	void __iomem *addr;
 	int rd;
@@ -50,7 +51,7 @@ int __hyp_text __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu)
 		return 0;
 
 	/* Reject anything but a 32bit access */
-	if (kvm_vcpu_dabt_get_as(vcpu) != sizeof(u32)) {
+	if (kvm_vcpu_dabt_get_as(esr) != sizeof(u32)) {
 		__kvm_skip_instr(vcpu);
 		return -1;
 	}
@@ -61,11 +62,11 @@ int __hyp_text __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu)
 		return -1;
 	}
 
-	rd = kvm_vcpu_dabt_get_rd(vcpu);
+	rd = kvm_vcpu_dabt_get_rd(esr);
 	addr  = hyp_symbol_addr(kvm_vgic_global_state)->vcpu_hyp_va;
 	addr += fault_ipa - vgic->vgic_cpu_base;
 
-	if (kvm_vcpu_dabt_iswrite(vcpu)) {
+	if (kvm_vcpu_dabt_iswrite(esr)) {
 		u32 data = vcpu_get_reg(vcpu, rd);
 		if (__is_be(vcpu)) {
 			/* guest pre-swabbed data, undo this for writel() */
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index 6aafc2825c1c..0ae7c2e40e02 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -128,7 +128,7 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
 	 * Build an {i,d}abort, depending on the level and the
 	 * instruction set. Report an external synchronous abort.
 	 */
-	if (kvm_vcpu_trap_il_is32bit(vcpu))
+	if (kvm_vcpu_trap_il_is32bit(kvm_vcpu_get_esr(vcpu)))
 		esr |= ESR_ELx_IL;
 
 	/*
@@ -161,7 +161,7 @@ static void inject_undef64(struct kvm_vcpu *vcpu)
 	 * Build an unknown exception, depending on the instruction
 	 * set.
 	 */
-	if (kvm_vcpu_trap_il_is32bit(vcpu))
+	if (kvm_vcpu_trap_il_is32bit(kvm_vcpu_get_esr(vcpu)))
 		esr |= ESR_ELx_IL;
 
 	vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 5b61465927b7..012fff834a4b 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2126,6 +2126,7 @@ static void perform_access(struct kvm_vcpu *vcpu,
 			   struct sys_reg_params *params,
 			   const struct sys_reg_desc *r)
 {
+	u32 esr = kvm_vcpu_get_esr(vcpu);
 	trace_kvm_sys_access(*vcpu_pc(vcpu), params, r);
 
 	/* Check for regs disabled by runtime config */
@@ -2143,7 +2144,7 @@ static void perform_access(struct kvm_vcpu *vcpu,
 
 	/* Skip instruction if instructed so */
 	if (likely(r->access(vcpu, params, r)))
-		kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+		kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(esr));
 }
 
 /*
@@ -2180,7 +2181,8 @@ static int emulate_cp(struct kvm_vcpu *vcpu,
 static void unhandled_cp_access(struct kvm_vcpu *vcpu,
 				struct sys_reg_params *params)
 {
-	u8 hsr_ec = kvm_vcpu_trap_get_class(vcpu);
+	u32 esr = kvm_vcpu_get_esr(vcpu);
+	u8 hsr_ec = kvm_vcpu_trap_get_class(esr);
 	int cp = -1;
 
 	switch(hsr_ec) {
@@ -2215,7 +2217,7 @@ static int kvm_handle_cp_64(struct kvm_vcpu *vcpu,
 {
 	struct sys_reg_params params;
 	u32 hsr = kvm_vcpu_get_esr(vcpu);
-	int Rt = kvm_vcpu_sys_get_rt(vcpu);
+	int Rt = kvm_vcpu_sys_get_rt(hsr);
 	int Rt2 = (hsr >> 10) & 0x1f;
 
 	params.is_aarch32 = true;
@@ -2272,7 +2274,7 @@ static int kvm_handle_cp_32(struct kvm_vcpu *vcpu,
 {
 	struct sys_reg_params params;
 	u32 hsr = kvm_vcpu_get_esr(vcpu);
-	int Rt  = kvm_vcpu_sys_get_rt(vcpu);
+	int Rt  = kvm_vcpu_sys_get_rt(hsr);
 
 	params.is_aarch32 = true;
 	params.is_32bit = true;
@@ -2388,7 +2390,7 @@ int kvm_handle_sys_reg(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	struct sys_reg_params params;
 	unsigned long esr = kvm_vcpu_get_esr(vcpu);
-	int Rt = kvm_vcpu_sys_get_rt(vcpu);
+	int Rt = kvm_vcpu_sys_get_rt(esr);
 	int ret;
 
 	trace_kvm_handle_sys_reg(esr);
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 48d0ec44ad77..2cbb57485760 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -808,7 +808,9 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 * guest time.
 		 */
 		guest_exit();
-		trace_kvm_exit(ret, kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
+		trace_kvm_exit(ret,
+			kvm_vcpu_trap_get_class(kvm_vcpu_get_esr(vcpu)),
+			*vcpu_pc(vcpu));
 
 		/* Exit types that need handling before we can be preempted */
 		handle_exit_early(vcpu, run, ret);
diff --git a/virt/kvm/arm/hyp/aarch32.c b/virt/kvm/arm/hyp/aarch32.c
index 864b477e660a..df3055ab3a42 100644
--- a/virt/kvm/arm/hyp/aarch32.c
+++ b/virt/kvm/arm/hyp/aarch32.c
@@ -55,7 +55,7 @@ bool __hyp_text kvm_condition_valid32(const struct kvm_vcpu *vcpu)
 		return true;
 
 	/* Is condition field valid? */
-	cond = kvm_vcpu_get_condition(vcpu);
+	cond = kvm_vcpu_get_condition(kvm_vcpu_get_esr(vcpu));
 	if (cond == 0xE)
 		return true;
 
diff --git a/virt/kvm/arm/hyp/vgic-v3-sr.c b/virt/kvm/arm/hyp/vgic-v3-sr.c
index 8a7a14ec9120..bb2174b8a086 100644
--- a/virt/kvm/arm/hyp/vgic-v3-sr.c
+++ b/virt/kvm/arm/hyp/vgic-v3-sr.c
@@ -1000,14 +1000,13 @@ static void __hyp_text __vgic_v3_write_ctlr(struct kvm_vcpu *vcpu,
 
 int __hyp_text __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu)
 {
+	u32 esr = kvm_vcpu_get_esr(vcpu);
 	int rt;
-	u32 esr;
 	u32 vmcr;
 	void (*fn)(struct kvm_vcpu *, u32, int);
 	bool is_read;
 	u32 sysreg;
 
-	esr = kvm_vcpu_get_esr(vcpu);
 	if (vcpu_mode_is_32bit(vcpu)) {
 		if (!kvm_condition_valid(vcpu)) {
 			__kvm_skip_instr(vcpu);
@@ -1119,7 +1118,7 @@ int __hyp_text __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu)
 	}
 
 	vmcr = __vgic_v3_read_vmcr();
-	rt = kvm_vcpu_sys_get_rt(vcpu);
+	rt = kvm_vcpu_sys_get_rt(esr);
 	fn(vcpu, vmcr, rt);
 
 	__kvm_skip_instr(vcpu);
diff --git a/virt/kvm/arm/mmio.c b/virt/kvm/arm/mmio.c
index aedfcff99ac5..d92bee8c75e3 100644
--- a/virt/kvm/arm/mmio.c
+++ b/virt/kvm/arm/mmio.c
@@ -81,6 +81,7 @@ unsigned long kvm_mmio_read_buf(const void *buf, unsigned int len)
  */
 int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
+	u32 esr = kvm_vcpu_get_esr(vcpu);
 	unsigned long data;
 	unsigned int len;
 	int mask;
@@ -91,30 +92,30 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 	vcpu->mmio_needed = 0;
 
-	if (!kvm_vcpu_dabt_iswrite(vcpu)) {
-		len = kvm_vcpu_dabt_get_as(vcpu);
+	if (!kvm_vcpu_dabt_iswrite(esr)) {
+		len = kvm_vcpu_dabt_get_as(esr);
 		data = kvm_mmio_read_buf(run->mmio.data, len);
 
-		if (kvm_vcpu_dabt_issext(vcpu) &&
+		if (kvm_vcpu_dabt_issext(esr) &&
 		    len < sizeof(unsigned long)) {
 			mask = 1U << ((len * 8) - 1);
 			data = (data ^ mask) - mask;
 		}
 
-		if (!kvm_vcpu_dabt_issf(vcpu))
+		if (!kvm_vcpu_dabt_issf(esr))
 			data = data & 0xffffffff;
 
 		trace_kvm_mmio(KVM_TRACE_MMIO_READ, len, run->mmio.phys_addr,
 			       &data);
 		data = vcpu_data_host_to_guest(vcpu, data, len);
-		vcpu_set_reg(vcpu, kvm_vcpu_dabt_get_rd(vcpu), data);
+		vcpu_set_reg(vcpu, kvm_vcpu_dabt_get_rd(esr), data);
 	}
 
 	/*
 	 * The MMIO instruction is emulated and should not be re-executed
 	 * in the guest.
 	 */
-	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(esr));
 
 	return 0;
 }
@@ -122,6 +123,7 @@ int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
 int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
 		 phys_addr_t fault_ipa)
 {
+	u32 esr = kvm_vcpu_get_esr(vcpu);
 	unsigned long data;
 	unsigned long rt;
 	int ret;
@@ -133,10 +135,11 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	 * No valid syndrome? Ask userspace for help if it has
 	 * voluntered to do so, and bail out otherwise.
 	 */
-	if (!kvm_vcpu_dabt_isvalid(vcpu)) {
+	if (!kvm_vcpu_dabt_isvalid(esr)) {
 		if (vcpu->kvm->arch.return_nisv_io_abort_to_user) {
 			run->exit_reason = KVM_EXIT_ARM_NISV;
-			run->arm_nisv.esr_iss = kvm_vcpu_dabt_iss_nisv_sanitized(vcpu);
+			run->arm_nisv.esr_iss =
+				kvm_vcpu_dabt_iss_nisv_sanitized(esr);
 			run->arm_nisv.fault_ipa = fault_ipa;
 			return 0;
 		}
@@ -146,7 +149,7 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	}
 
 	/* Page table accesses IO mem: tell guest to fix its TTBR */
-	if (kvm_vcpu_dabt_iss1tw(vcpu)) {
+	if (kvm_vcpu_dabt_iss1tw(esr)) {
 		kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
 		return 1;
 	}
@@ -156,9 +159,9 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	 * from the CPU. Then try if some in-kernel emulation feels
 	 * responsible, otherwise let user space do its magic.
 	 */
-	is_write = kvm_vcpu_dabt_iswrite(vcpu);
-	len = kvm_vcpu_dabt_get_as(vcpu);
-	rt = kvm_vcpu_dabt_get_rd(vcpu);
+	is_write = kvm_vcpu_dabt_iswrite(esr);
+	len = kvm_vcpu_dabt_get_as(esr);
+	rt = kvm_vcpu_dabt_get_rd(esr);
 
 	if (is_write) {
 		data = vcpu_data_guest_to_host(vcpu, vcpu_get_reg(vcpu, rt),
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 5da0d0e7519b..e462e0368fd9 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1661,6 +1661,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 			  unsigned long fault_status)
 {
 	int ret;
+	u32 esr = kvm_vcpu_get_esr(vcpu);
 	bool write_fault, writable, force_pte = false;
 	bool exec_fault, needs_exec;
 	unsigned long mmu_seq;
@@ -1674,8 +1675,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	bool logging_active = memslot_is_logging(memslot);
 	unsigned long vma_pagesize, flags = 0;
 
-	write_fault = kvm_is_write_fault(vcpu);
-	exec_fault = kvm_vcpu_trap_is_iabt(vcpu);
+	write_fault = kvm_is_write_fault(esr);
+	exec_fault = kvm_vcpu_trap_is_iabt(esr);
 	VM_BUG_ON(write_fault && exec_fault);
 
 	if (fault_status == FSC_PERM && !write_fault && !exec_fault) {
@@ -1903,6 +1904,7 @@ static void handle_access_fault(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa)
  */
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
+	u32 esr = kvm_vcpu_get_esr(vcpu);
 	unsigned long fault_status;
 	phys_addr_t fault_ipa;
 	struct kvm_memory_slot *memslot;
@@ -1911,13 +1913,13 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	gfn_t gfn;
 	int ret, idx;
 
-	fault_status = kvm_vcpu_trap_get_fault_type(vcpu);
+	fault_status = kvm_vcpu_trap_get_fault_type(esr);
 
 	fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
-	is_iabt = kvm_vcpu_trap_is_iabt(vcpu);
+	is_iabt = kvm_vcpu_trap_is_iabt(esr);
 
 	/* Synchronous External Abort? */
-	if (kvm_vcpu_dabt_isextabt(vcpu)) {
+	if (kvm_vcpu_dabt_isextabt(esr)) {
 		/*
 		 * For RAS the host kernel may handle this abort.
 		 * There is no need to pass the error into the guest.
@@ -1938,8 +1940,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	if (fault_status != FSC_FAULT && fault_status != FSC_PERM &&
 	    fault_status != FSC_ACCESS) {
 		kvm_err("Unsupported FSC: EC=%#x xFSC=%#lx ESR_EL2=%#lx\n",
-			kvm_vcpu_trap_get_class(vcpu),
-			(unsigned long)kvm_vcpu_trap_get_fault(vcpu),
+			kvm_vcpu_trap_get_class(esr),
+			(unsigned long)kvm_vcpu_trap_get_fault(esr),
 			(unsigned long)kvm_vcpu_get_esr(vcpu));
 		return -EFAULT;
 	}
@@ -1949,7 +1951,7 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	gfn = fault_ipa >> PAGE_SHIFT;
 	memslot = gfn_to_memslot(vcpu->kvm, gfn);
 	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
-	write_fault = kvm_is_write_fault(vcpu);
+	write_fault = kvm_is_write_fault(esr);
 	if (kvm_is_error_hva(hva) || (write_fault && !writable)) {
 		if (is_iabt) {
 			/* Prefetch Abort on I/O address */
@@ -1967,8 +1969,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 * So let's assume that the guest is just being
 		 * cautious, and skip the instruction.
 		 */
-		if (kvm_vcpu_dabt_is_cm(vcpu)) {
-			kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+		if (kvm_vcpu_dabt_is_cm(esr)) {
+			kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(esr));
 			ret = 1;
 			goto out_unlock;
 		}
-- 
2.23.0




* [PATCH RFCv1 3/7] kvm/arm64: Replace hsr with esr
  2020-04-10  8:58 [PATCH RFCv1 0/7] Support Async Page Fault Gavin Shan
  2020-04-10  8:58 ` [PATCH RFCv1 1/7] kvm/arm64: Rename kvm_vcpu_get_hsr() to kvm_vcpu_get_esr() Gavin Shan
  2020-04-10  8:58 ` [PATCH RFCv1 2/7] kvm/arm64: Detach ESR operator from vCPU struct Gavin Shan
@ 2020-04-10  8:58 ` Gavin Shan
  2020-04-10  8:58 ` [PATCH RFCv1 4/7] kvm/arm64: Export kvm_handle_user_mem_abort() with prefault mode Gavin Shan
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Gavin Shan @ 2020-04-10  8:58 UTC (permalink / raw)
  To: kvmarm
  Cc: mark.rutland, drjones, suzuki.poulose, maz, sudeep.holla,
	eric.auger, james.morse, shan.gavin, catalin.marinas, will,
	linux-arm-kernel

This replaces the variable names to make them self-explanatory. The
tracepoints aren't changed accordingly because they are part of the ABI:

   * @hsr to @esr
   * @hsr_ec to @ec
   * Use the kvm_vcpu_trap_get_class() helper where possible

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 arch/arm64/kvm/handle_exit.c | 28 ++++++++++++++--------------
 arch/arm64/kvm/hyp/switch.c  |  9 ++++-----
 arch/arm64/kvm/sys_regs.c    | 30 +++++++++++++++---------------
 3 files changed, 33 insertions(+), 34 deletions(-)

diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index 00858db82a64..e3b3dcd5b811 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -123,13 +123,13 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
  */
 static int kvm_handle_guest_debug(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	u32 hsr = kvm_vcpu_get_esr(vcpu);
+	u32 esr = kvm_vcpu_get_esr(vcpu);
 	int ret = 0;
 
 	run->exit_reason = KVM_EXIT_DEBUG;
-	run->debug.arch.hsr = hsr;
+	run->debug.arch.hsr = esr;
 
-	switch (ESR_ELx_EC(hsr)) {
+	switch (kvm_vcpu_trap_get_class(esr)) {
 	case ESR_ELx_EC_WATCHPT_LOW:
 		run->debug.arch.far = vcpu->arch.fault.far_el2;
 		/* fall through */
@@ -139,8 +139,8 @@ static int kvm_handle_guest_debug(struct kvm_vcpu *vcpu, struct kvm_run *run)
 	case ESR_ELx_EC_BRK64:
 		break;
 	default:
-		kvm_err("%s: un-handled case hsr: %#08x\n",
-			__func__, (unsigned int) hsr);
+		kvm_err("%s: un-handled case esr: %#08x\n",
+			__func__, (unsigned int)esr);
 		ret = -1;
 		break;
 	}
@@ -150,10 +150,10 @@ static int kvm_handle_guest_debug(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 static int kvm_handle_unknown_ec(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-	u32 hsr = kvm_vcpu_get_esr(vcpu);
+	u32 esr = kvm_vcpu_get_esr(vcpu);
 
-	kvm_pr_unimpl("Unknown exception class: hsr: %#08x -- %s\n",
-		      hsr, esr_get_class_string(hsr));
+	kvm_pr_unimpl("Unknown exception class: esr: %#08x -- %s\n",
+		      esr, esr_get_class_string(esr));
 
 	kvm_inject_undefined(vcpu);
 	return 1;
@@ -230,10 +230,10 @@ static exit_handle_fn arm_exit_handlers[] = {
 
 static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu)
 {
-	u32 hsr = kvm_vcpu_get_esr(vcpu);
-	u8 hsr_ec = ESR_ELx_EC(hsr);
+	u32 esr = kvm_vcpu_get_esr(vcpu);
+	u8 ec = kvm_vcpu_trap_get_class(esr);
 
-	return arm_exit_handlers[hsr_ec];
+	return arm_exit_handlers[ec];
 }
 
 /*
@@ -273,15 +273,15 @@ int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 {
 	if (ARM_SERROR_PENDING(exception_index)) {
 		u32 esr = kvm_vcpu_get_esr(vcpu);
-		u8 hsr_ec = ESR_ELx_EC(esr);
+		u8 ec = kvm_vcpu_trap_get_class(esr);
 
 		/*
 		 * HVC/SMC already have an adjusted PC, which we need
 		 * to correct in order to return to after having
 		 * injected the SError.
 		 */
-		if (hsr_ec == ESR_ELx_EC_HVC32 || hsr_ec == ESR_ELx_EC_HVC64 ||
-		    hsr_ec == ESR_ELx_EC_SMC32 || hsr_ec == ESR_ELx_EC_SMC64) {
+		if (ec == ESR_ELx_EC_HVC32 || ec == ESR_ELx_EC_HVC64 ||
+		    ec == ESR_ELx_EC_SMC32 || ec == ESR_ELx_EC_SMC64) {
 			u32 adj =  kvm_vcpu_trap_il_is32bit(esr) ? 4 : 2;
 			*vcpu_pc(vcpu) -= adj;
 		}
diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index 369f22f49f3d..7bf4840bf90e 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -356,8 +356,8 @@ static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
 static bool __hyp_text __hyp_handle_fpsimd(struct kvm_vcpu *vcpu)
 {
 	u32 esr = kvm_vcpu_get_esr(vcpu);
+	u8 ec = kvm_vcpu_trap_get_class(esr);
 	bool vhe, sve_guest, sve_host;
-	u8 hsr_ec;
 
 	if (!system_supports_fpsimd())
 		return false;
@@ -372,14 +372,13 @@ static bool __hyp_text __hyp_handle_fpsimd(struct kvm_vcpu *vcpu)
 		vhe = has_vhe();
 	}
 
-	hsr_ec = kvm_vcpu_trap_get_class(esr);
-	if (hsr_ec != ESR_ELx_EC_FP_ASIMD &&
-	    hsr_ec != ESR_ELx_EC_SVE)
+	if (ec != ESR_ELx_EC_FP_ASIMD &&
+	    ec != ESR_ELx_EC_SVE)
 		return false;
 
 	/* Don't handle SVE traps for non-SVE vcpus here: */
 	if (!sve_guest)
-		if (hsr_ec != ESR_ELx_EC_FP_ASIMD)
+		if (ec != ESR_ELx_EC_FP_ASIMD)
 			return false;
 
 	/* Valid trap.  Switch the context: */
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 012fff834a4b..58f81ab519af 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -2182,10 +2182,10 @@ static void unhandled_cp_access(struct kvm_vcpu *vcpu,
 				struct sys_reg_params *params)
 {
 	u32 esr = kvm_vcpu_get_esr(vcpu);
-	u8 hsr_ec = kvm_vcpu_trap_get_class(esr);
+	u8 ec = kvm_vcpu_trap_get_class(esr);
 	int cp = -1;
 
-	switch(hsr_ec) {
+	switch (ec) {
 	case ESR_ELx_EC_CP15_32:
 	case ESR_ELx_EC_CP15_64:
 		cp = 15;
@@ -2216,17 +2216,17 @@ static int kvm_handle_cp_64(struct kvm_vcpu *vcpu,
 			    size_t nr_specific)
 {
 	struct sys_reg_params params;
-	u32 hsr = kvm_vcpu_get_esr(vcpu);
-	int Rt = kvm_vcpu_sys_get_rt(hsr);
-	int Rt2 = (hsr >> 10) & 0x1f;
+	u32 esr = kvm_vcpu_get_esr(vcpu);
+	int Rt = kvm_vcpu_sys_get_rt(esr);
+	int Rt2 = (esr >> 10) & 0x1f;
 
 	params.is_aarch32 = true;
 	params.is_32bit = false;
-	params.CRm = (hsr >> 1) & 0xf;
-	params.is_write = ((hsr & 1) == 0);
+	params.CRm = (esr >> 1) & 0xf;
+	params.is_write = ((esr & 1) == 0);
 
 	params.Op0 = 0;
-	params.Op1 = (hsr >> 16) & 0xf;
+	params.Op1 = (esr >> 16) & 0xf;
 	params.Op2 = 0;
 	params.CRn = 0;
 
@@ -2273,18 +2273,18 @@ static int kvm_handle_cp_32(struct kvm_vcpu *vcpu,
 			    size_t nr_specific)
 {
 	struct sys_reg_params params;
-	u32 hsr = kvm_vcpu_get_esr(vcpu);
-	int Rt  = kvm_vcpu_sys_get_rt(hsr);
+	u32 esr = kvm_vcpu_get_esr(vcpu);
+	int Rt = kvm_vcpu_sys_get_rt(esr);
 
 	params.is_aarch32 = true;
 	params.is_32bit = true;
-	params.CRm = (hsr >> 1) & 0xf;
+	params.CRm = (esr >> 1) & 0xf;
 	params.regval = vcpu_get_reg(vcpu, Rt);
-	params.is_write = ((hsr & 1) == 0);
-	params.CRn = (hsr >> 10) & 0xf;
+	params.is_write = ((esr & 1) == 0);
+	params.CRn = (esr >> 10) & 0xf;
 	params.Op0 = 0;
-	params.Op1 = (hsr >> 14) & 0x7;
-	params.Op2 = (hsr >> 17) & 0x7;
+	params.Op1 = (esr >> 14) & 0x7;
+	params.Op2 = (esr >> 17) & 0x7;
 
 	if (!emulate_cp(vcpu, &params, target_specific, nr_specific) ||
 	    !emulate_cp(vcpu, &params, global, nr_global)) {
-- 
2.23.0




* [PATCH RFCv1 4/7] kvm/arm64: Export kvm_handle_user_mem_abort() with prefault mode
  2020-04-10  8:58 [PATCH RFCv1 0/7] Support Async Page Fault Gavin Shan
                   ` (2 preceding siblings ...)
  2020-04-10  8:58 ` [PATCH RFCv1 3/7] kvm/arm64: Replace hsr with esr Gavin Shan
@ 2020-04-10  8:58 ` Gavin Shan
  2020-04-10  8:58 ` [PATCH RFCv1 5/7] kvm/arm64: Allow inject data abort with specified DFSC Gavin Shan
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Gavin Shan @ 2020-04-10  8:58 UTC (permalink / raw)
  To: kvmarm
  Cc: mark.rutland, drjones, suzuki.poulose, maz, sudeep.holla,
	eric.auger, james.morse, shan.gavin, catalin.marinas, will,
	linux-arm-kernel

This renames user_mem_abort() to kvm_handle_user_mem_abort() and
exports it. The function will be used by the asynchronous page fault
code to populate a page table entry once the corresponding page has
been populated from the backing device (e.g. a swap partition):

   * Parameter @fault_status is replaced by @esr.
   * The parameters are reordered based on their importance.

This shouldn't cause any functional changes.
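
As a rough illustration of the intended prefault usage in the later
asynchronous page fault patch (only the new signature comes from this
patch; the surrounding variables are assumed):

    /* hypothetical call site in the async-PF wake-up path */
    ret = kvm_handle_user_mem_abort(vcpu, esr, memslot,
                                    fault_ipa, hva, true);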

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 arch/arm64/include/asm/kvm_host.h |  4 ++++
 virt/kvm/arm/mmu.c                | 14 ++++++++------
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 32c8a675e5a4..f77c706777ec 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -437,6 +437,10 @@ int __kvm_arm_vcpu_set_events(struct kvm_vcpu *vcpu,
 			      struct kvm_vcpu_events *events);
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
+int kvm_handle_user_mem_abort(struct kvm_vcpu *vcpu, unsigned int esr,
+			      struct kvm_memory_slot *memslot,
+			      phys_addr_t fault_ipa, unsigned long hva,
+			      bool prefault);
 int kvm_unmap_hva_range(struct kvm *kvm,
 			unsigned long start, unsigned long end);
 int kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index e462e0368fd9..95aaabb2b1fc 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1656,12 +1656,12 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
 	       (hva & ~(map_size - 1)) + map_size <= uaddr_end;
 }
 
-static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
-			  struct kvm_memory_slot *memslot, unsigned long hva,
-			  unsigned long fault_status)
+int kvm_handle_user_mem_abort(struct kvm_vcpu *vcpu, unsigned int esr,
+			      struct kvm_memory_slot *memslot,
+			      phys_addr_t fault_ipa, unsigned long hva,
+			      bool prefault)
 {
-	int ret;
-	u32 esr = kvm_vcpu_get_esr(vcpu);
+	unsigned int fault_status = kvm_vcpu_trap_get_fault_type(esr);
 	bool write_fault, writable, force_pte = false;
 	bool exec_fault, needs_exec;
 	unsigned long mmu_seq;
@@ -1674,6 +1674,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	pgprot_t mem_type = PAGE_S2;
 	bool logging_active = memslot_is_logging(memslot);
 	unsigned long vma_pagesize, flags = 0;
+	int ret;
 
 	write_fault = kvm_is_write_fault(esr);
 	exec_fault = kvm_vcpu_trap_is_iabt(esr);
@@ -1995,7 +1996,8 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		goto out_unlock;
 	}
 
-	ret = user_mem_abort(vcpu, fault_ipa, memslot, hva, fault_status);
+	ret = kvm_handle_user_mem_abort(vcpu, esr, memslot,
+					fault_ipa, hva, false);
 	if (ret == 0)
 		ret = 1;
 out:
-- 
2.23.0




* [PATCH RFCv1 5/7] kvm/arm64: Allow inject data abort with specified DFSC
  2020-04-10  8:58 [PATCH RFCv1 0/7] Support Async Page Fault Gavin Shan
                   ` (3 preceding siblings ...)
  2020-04-10  8:58 ` [PATCH RFCv1 4/7] kvm/arm64: Export kvm_handle_user_mem_abort() with prefault mode Gavin Shan
@ 2020-04-10  8:58 ` Gavin Shan
  2020-04-10  8:58 ` [PATCH RFCv1 6/7] kvm/arm64: Support async page fault Gavin Shan
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 15+ messages in thread
From: Gavin Shan @ 2020-04-10  8:58 UTC (permalink / raw)
  To: kvmarm
  Cc: mark.rutland, drjones, suzuki.poulose, maz, sudeep.holla,
	eric.auger, james.morse, shan.gavin, catalin.marinas, will,
	linux-arm-kernel

A data abort will be used as the signal for asynchronous page faults,
carrying a specific IMPDEF Data Fault Status Code (DFSC). Currently,
there is no API to inject a data abort with a specified DFSC. This
fills the gap by introducing kvm_inject_dabt_with_dfsc().
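
The expected usage from the later asynchronous page fault patch is
roughly the following; the DFSC constant name is assumed here for
illustration only:

    /* signal the guest with an IMPDEF DFSC (constant name assumed) */
    kvm_inject_dabt_with_dfsc(vcpu, kvm_vcpu_get_hfar(vcpu),
                              ESR_ELx_FSC_ASYNC_PF);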

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 arch/arm64/include/asm/kvm_emulate.h |  4 ++++
 arch/arm64/kvm/inject_fault.c        | 34 ++++++++++++++++++++++++----
 virt/kvm/arm/aarch32.c               | 27 +++++++++++++++-------
 3 files changed, 53 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index 2873bf6dc85e..fdf6a01b9dcb 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -31,9 +31,13 @@ void kvm_skip_instr32(struct kvm_vcpu *vcpu, bool is_wide_instr);
 void kvm_inject_undefined(struct kvm_vcpu *vcpu);
 void kvm_inject_vabt(struct kvm_vcpu *vcpu);
 void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
+void kvm_inject_dabt_with_dfsc(struct kvm_vcpu *vcpu,
+			       unsigned long addr, unsigned int dfsc);
 void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
 void kvm_inject_undef32(struct kvm_vcpu *vcpu);
 void kvm_inject_dabt32(struct kvm_vcpu *vcpu, unsigned long addr);
+void kvm_inject_dabt32_with_dfsc(struct kvm_vcpu *vcpu,
+				 unsigned long addr, unsigned int dfsc);
 void kvm_inject_pabt32(struct kvm_vcpu *vcpu, unsigned long addr);
 
 static __always_inline bool vcpu_el1_is_32bit(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/kvm/inject_fault.c b/arch/arm64/kvm/inject_fault.c
index 0ae7c2e40e02..35794d0de0e9 100644
--- a/arch/arm64/kvm/inject_fault.c
+++ b/arch/arm64/kvm/inject_fault.c
@@ -110,7 +110,9 @@ static unsigned long get_except64_pstate(struct kvm_vcpu *vcpu)
 	return new;
 }
 
-static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr)
+static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt,
+			 unsigned long addr, bool dfsc_valid,
+			 unsigned int dfsc)
 {
 	unsigned long cpsr = *vcpu_cpsr(vcpu);
 	bool is_aarch32 = vcpu_mode_is_32bit(vcpu);
@@ -143,7 +145,12 @@ static void inject_abt64(struct kvm_vcpu *vcpu, bool is_iabt, unsigned long addr
 	if (!is_iabt)
 		esr |= ESR_ELx_EC_DABT_LOW << ESR_ELx_EC_SHIFT;
 
-	vcpu_write_sys_reg(vcpu, esr | ESR_ELx_FSC_EXTABT, ESR_EL1);
+	if (dfsc_valid)
+		esr |= dfsc;
+	else
+		esr |= ESR_ELx_FSC_EXTABT;
+
+	vcpu_write_sys_reg(vcpu, esr, ESR_EL1);
 }
 
 static void inject_undef64(struct kvm_vcpu *vcpu)
@@ -180,7 +187,26 @@ void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr)
 	if (vcpu_el1_is_32bit(vcpu))
 		kvm_inject_dabt32(vcpu, addr);
 	else
-		inject_abt64(vcpu, false, addr);
+		inject_abt64(vcpu, false, addr, false, 0);
+}
+
+/**
+ * kvm_inject_dabt_with_dfsc - inject a data abort into the guest
+ * @vcpu: The VCPU to receive the data abort
+ * @addr: The address to report in the DFAR
+ * @dfsc: The data fault status code to be reported in DFSR
+ *
+ * It is assumed that this code is called from the VCPU thread and that the
+ * VCPU therefore is not currently executing guest code.
+ */
+void kvm_inject_dabt_with_dfsc(struct kvm_vcpu *vcpu,
+			       unsigned long addr,
+			       unsigned int dfsc)
+{
+	if (vcpu_el1_is_32bit(vcpu))
+		kvm_inject_dabt32_with_dfsc(vcpu, addr, dfsc);
+	else
+		inject_abt64(vcpu, false, addr, true, dfsc);
 }
 
 /**
@@ -196,7 +222,7 @@ void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr)
 	if (vcpu_el1_is_32bit(vcpu))
 		kvm_inject_pabt32(vcpu, addr);
 	else
-		inject_abt64(vcpu, true, addr);
+		inject_abt64(vcpu, true, addr, false, 0);
 }
 
 /**
diff --git a/virt/kvm/arm/aarch32.c b/virt/kvm/arm/aarch32.c
index 0a356aa91aa1..82bded4cab25 100644
--- a/virt/kvm/arm/aarch32.c
+++ b/virt/kvm/arm/aarch32.c
@@ -163,7 +163,8 @@ void kvm_inject_undef32(struct kvm_vcpu *vcpu)
  * pseudocode.
  */
 static void inject_abt32(struct kvm_vcpu *vcpu, bool is_pabt,
-			 unsigned long addr)
+			 unsigned long addr, bool dfsc_valid,
+			 unsigned int dfsc)
 {
 	u32 vect_offset;
 	u32 *far, *fsr;
@@ -184,21 +185,31 @@ static void inject_abt32(struct kvm_vcpu *vcpu, bool is_pabt,
 	*far = addr;
 
 	/* Give the guest an IMPLEMENTATION DEFINED exception */
-	is_lpae = (vcpu_cp15(vcpu, c2_TTBCR) >> 31);
-	if (is_lpae) {
-		*fsr = DFSR_LPAE | DFSR_FSC_EXTABT_LPAE;
+	if (dfsc_valid) {
+		*fsr = dfsc;
 	} else {
-		/* no need to shuffle FS[4] into DFSR[10] as its 0 */
-		*fsr = DFSR_FSC_EXTABT_nLPAE;
+		is_lpae = (vcpu_cp15(vcpu, c2_TTBCR) >> 31);
+		if (is_lpae) {
+			*fsr = DFSR_LPAE | DFSR_FSC_EXTABT_LPAE;
+		} else {
+			/* no need to shuffle FS[4] into DFSR[10] as its 0 */
+			*fsr = DFSR_FSC_EXTABT_nLPAE;
+		}
 	}
 }
 
 void kvm_inject_dabt32(struct kvm_vcpu *vcpu, unsigned long addr)
 {
-	inject_abt32(vcpu, false, addr);
+	inject_abt32(vcpu, false, addr, false, 0);
+}
+
+void kvm_inject_dabt32_with_dfsc(struct kvm_vcpu *vcpu,
+				 unsigned long addr, unsigned int dfsc)
+{
+	inject_abt32(vcpu, false, addr, true, dfsc);
 }
 
 void kvm_inject_pabt32(struct kvm_vcpu *vcpu, unsigned long addr)
 {
-	inject_abt32(vcpu, true, addr);
+	inject_abt32(vcpu, true, addr, false, 0);
 }
-- 
2.23.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFCv1 6/7] kvm/arm64: Support async page fault
  2020-04-10  8:58 [PATCH RFCv1 0/7] Support Async Page Fault Gavin Shan
                   ` (4 preceding siblings ...)
  2020-04-10  8:58 ` [PATCH RFCv1 5/7] kvm/arm64: Allow inject data abort with specified DFSC Gavin Shan
@ 2020-04-10  8:58 ` Gavin Shan
  2020-04-10  8:58 ` [PATCH RFCv1 7/7] arm64: " Gavin Shan
  2020-04-10 12:52 ` [PATCH RFCv1 0/7] Support Async Page Fault Marc Zyngier
  7 siblings, 0 replies; 15+ messages in thread
From: Gavin Shan @ 2020-04-10  8:58 UTC (permalink / raw)
  To: kvmarm
  Cc: mark.rutland, drjones, suzuki.poulose, maz, sudeep.holla,
	eric.auger, james.morse, shan.gavin, catalin.marinas, will,
	linux-arm-kernel

There are two stages of page faults and the stage one page fault is
handled by the guest itself. The guest traps to the host when the page
fault is caused by the stage 2 page table, for example when the mapping
is missing. The guest is suspended until the requested page is
populated. Populating the requested page can involve IO activities if
the page was swapped out previously. In this case, the guest has to
suspend for at least a few milliseconds, regardless of the overall
system load.

This adds support for asynchronous page faults to improve the above
situation. A signal (PAGE_NOT_PRESENT) is sent to the guest, which may
reschedule to another runnable process if rescheduling is allowed.
Otherwise, the CPU is put into power-saving mode, which effectively
causes the vCPU to be rescheduled from the host's point of view. Another
signal (PAGE_READY) is sent to the guest once the requested page has
been populated. The suspended task is scheduled or woken up when the
guest receives that signal. More details are highlighted below. Note
that the implementation is pretty similar to what x86 has.

   * A signal (PAGE_NOT_PRESENT) is sent to the guest if the requested
     page isn't ready. In the meantime, a worker is started to populate
     the page asynchronously in the background. The stage 2 page table
     entry is updated accordingly and another signal (PAGE_READY) is
     fired after the requested page is populated.

   * An IMPDEF system register (SYS_ASYNC_PF_EL1) is added. The register
     accepts the physical address of the control block, which is 64-bits
     aligned and represented by struct kvm_vcpu_pv_apf_data. The low bits
     of the control block's physical address are used to enable/disable
     asynchronous page faults, enable the requested features etc. (see
     the sketch after this list).

   * A hash table keyed by gfn is maintained for each vCPU, to avoid
     firing duplicate signals for the same gfn.

   * The signal is conveyed through a data abort with the IMPDEF Data
     Fault Status Code (DFSC), which is 0x34. The specific events are
     stored in the control block, waiting for the guest to read.
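
As a rough guest-side sketch of the register interface (the real guest
implementation, including the per-CPU apf_data control block, comes in
the last patch of the series), enabling and disabling asynchronous page
faults boils down to:

    /* enable: pass the control block address with the enable bit set */
    u64 pa = virt_to_phys(this_cpu_ptr(&apf_data)) | KVM_ASYNC_PF_ENABLED;
    write_sysreg_s(pa, SYS_ASYNC_PF_EL1);

    /* disable: write back zero */
    write_sysreg_s(0, SYS_ASYNC_PF_EL1);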

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 arch/arm64/include/asm/kvm_host.h      |  42 ++++
 arch/arm64/include/asm/kvm_para.h      |  27 +++
 arch/arm64/include/asm/sysreg.h        |   3 +
 arch/arm64/include/uapi/asm/Kbuild     |   3 -
 arch/arm64/include/uapi/asm/kvm_para.h |  22 ++
 arch/arm64/kvm/Kconfig                 |   1 +
 arch/arm64/kvm/Makefile                |   2 +
 arch/arm64/kvm/sys_regs.c              |  53 +++++
 virt/kvm/arm/arm.c                     |  32 ++-
 virt/kvm/arm/async_pf.c                | 290 +++++++++++++++++++++++++
 virt/kvm/arm/mmu.c                     |  29 ++-
 11 files changed, 498 insertions(+), 6 deletions(-)
 create mode 100644 arch/arm64/include/asm/kvm_para.h
 delete mode 100644 arch/arm64/include/uapi/asm/Kbuild
 create mode 100644 arch/arm64/include/uapi/asm/kvm_para.h
 create mode 100644 virt/kvm/arm/async_pf.c

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index f77c706777ec..24fbfa36a951 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -250,6 +250,23 @@ struct vcpu_reset_state {
 	bool		reset;
 };
 
+#ifdef CONFIG_KVM_ASYNC_PF
+
+/* Should be a power of two number */
+#define ASYNC_PF_PER_VCPU	64
+
+/*
+ * The association of gfn and token. The token will be sent to the guest
+ * as the page fault address. Also, the guest could be in aarch32 mode,
+ * so its length should be 32 bits.
+ */
+struct kvm_arch_async_pf {
+	u32     token;
+	gfn_t   gfn;
+	u32	esr;
+};
+#endif /* CONFIG_KVM_ASYNC_PF */
+
 struct kvm_vcpu_arch {
 	struct kvm_cpu_context ctxt;
 	void *sve_state;
@@ -351,6 +368,16 @@ struct kvm_vcpu_arch {
 		u64 last_steal;
 		gpa_t base;
 	} steal;
+
+#ifdef CONFIG_KVM_ASYNC_PF
+	struct {
+		struct gfn_to_hva_cache	cache;
+		gfn_t			gfns[ASYNC_PF_PER_VCPU];
+		u64			msr_val;
+		u16			id;
+		bool			send_user_only;
+	} apf;
+#endif
 };
 
 /* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
@@ -604,6 +631,21 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
 
 static inline void __cpu_init_stage2(void) {}
 
+#ifdef CONFIG_KVM_ASYNC_PF
+bool kvm_async_pf_hash_find(struct kvm_vcpu *vcpu, gfn_t gfn);
+bool kvm_arch_can_inject_async_page_not_present(struct kvm_vcpu *vcpu);
+bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu);
+int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, u32 esr,
+			    gpa_t gpa, gfn_t gfn);
+void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
+				     struct kvm_async_pf *work);
+void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
+				     struct kvm_async_pf *work);
+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
+			       struct kvm_async_pf *work);
+int kvm_async_pf_update_reg(struct kvm_vcpu *vcpu, u64 data);
+#endif /* CONFIG_KVM_ASYNC_PF */
+
 /* Guest/host FPSIMD coordination helpers */
 int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/include/asm/kvm_para.h b/arch/arm64/include/asm/kvm_para.h
new file mode 100644
index 000000000000..0ea481dd1c7a
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_para.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_ARM_KVM_PARA_H
+#define _ASM_ARM_KVM_PARA_H
+
+#include <uapi/asm/kvm_para.h>
+
+static inline bool kvm_check_and_clear_guest_paused(void)
+{
+	return false;
+}
+
+static inline unsigned int kvm_arch_para_features(void)
+{
+	return 0;
+}
+
+static inline unsigned int kvm_arch_para_hints(void)
+{
+	return 0;
+}
+
+static inline bool kvm_para_available(void)
+{
+	return false;
+}
+
+#endif /* _ASM_ARM_KVM_PARA_H */
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index ebc622432831..cdc6adbb4feb 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -520,6 +520,9 @@
 #define SYS_CNTV_CTL_EL02		sys_reg(3, 5, 14, 3, 1)
 #define SYS_CNTV_CVAL_EL02		sys_reg(3, 5, 14, 3, 2)
 
+/* Asynchronous Page Fault */
+#define SYS_ASYNC_PF_EL1		sys_reg(3, 7, 15, 15, 7)
+
 /* Common SCTLR_ELx flags. */
 #define SCTLR_ELx_DSSBS	(BIT(44))
 #define SCTLR_ELx_ENIA	(BIT(31))
diff --git a/arch/arm64/include/uapi/asm/Kbuild b/arch/arm64/include/uapi/asm/Kbuild
deleted file mode 100644
index 602d137932dc..000000000000
--- a/arch/arm64/include/uapi/asm/Kbuild
+++ /dev/null
@@ -1,3 +0,0 @@
-# SPDX-License-Identifier: GPL-2.0
-
-generic-y += kvm_para.h
diff --git a/arch/arm64/include/uapi/asm/kvm_para.h b/arch/arm64/include/uapi/asm/kvm_para.h
new file mode 100644
index 000000000000..ed94ad870a92
--- /dev/null
+++ b/arch/arm64/include/uapi/asm/kvm_para.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _UAPI_ASM_ARM_KVM_PARA_H
+#define _UAPI_ASM_ARM_KVM_PARA_H
+
+#include <linux/types.h>
+
+#define KVM_FEATURE_ASYNC_PF	0
+
+/* Async PF */
+#define KVM_ASYNC_PF_DFSC		0x34
+#define KVM_ASYNC_PF_ENABLED		(1 << 0)
+#define KVM_ASYNC_PF_SEND_ALWAYS	(1 << 1)
+#define KVM_PV_REASON_PAGE_NOT_PRESENT	1
+#define KVM_PV_REASON_PAGE_READY	2
+
+struct kvm_vcpu_pv_apf_data {
+	__u32	reason;
+	__u8	pad[60];
+	__u32	enabled;
+};
+
+#endif /* _UAPI_ASM_ARM_KVM_PARA_H */
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 449386d76441..1053e16b1739 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -34,6 +34,7 @@ config KVM
 	select KVM_VFIO
 	select HAVE_KVM_EVENTFD
 	select HAVE_KVM_IRQFD
+	select KVM_ASYNC_PF
 	select KVM_ARM_PMU if HW_PERF_EVENTS
 	select HAVE_KVM_MSI
 	select HAVE_KVM_IRQCHIP
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 5ffbdc39e780..3be24c1e401f 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -37,3 +37,5 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-debug.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/irqchip.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
 kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
+kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
+kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/arm/async_pf.o
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 58f81ab519af..c7d6003473e9 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1340,6 +1340,55 @@ static bool access_ccsidr(struct kvm_vcpu *vcpu, struct sys_reg_params *p,
 	return true;
 }
 
+#ifdef CONFIG_KVM_ASYNC_PF
+static bool access_async_pf(struct kvm_vcpu *vcpu,
+			    struct sys_reg_params *p,
+			    const struct sys_reg_desc *r)
+{
+	int ret;
+
+	if (!p->is_write) {
+		p->regval = vcpu->arch.apf.msr_val;
+		return true;
+	}
+
+	ret = kvm_async_pf_update_reg(vcpu, p->regval);
+	return ret ? false : true;
+}
+
+static void reset_async_pf(struct kvm_vcpu *vcpu,
+			   const struct sys_reg_desc *r)
+{
+	kvm_async_pf_update_reg(vcpu, 0);
+}
+
+static int get_user_async_pf(struct kvm_vcpu *vcpu,
+			     const struct sys_reg_desc *r,
+			     const struct kvm_one_reg *reg,
+			     void __user *uaddr)
+{
+	u64 val = vcpu->arch.apf.msr_val;
+
+	return reg_to_user(uaddr, &val, reg->id);
+}
+
+static int put_user_async_pf(struct kvm_vcpu *vcpu,
+			     const struct sys_reg_desc *r,
+			     const struct kvm_one_reg *reg,
+			     void __user *uaddr)
+{
+	const u64 id = sys_reg_to_index(r);
+	u64 val;
+	int ret;
+
+	ret = reg_from_user(&val, uaddr, id);
+	if (ret)
+		return ret;
+
+	return kvm_async_pf_update_reg(vcpu, val);
+}
+#endif /* CONFIG_KVM_ASYNC_PF */
+
 /* sys_reg_desc initialiser for known cpufeature ID registers */
 #define ID_SANITISED(name) {			\
 	SYS_DESC(SYS_##name),			\
@@ -1740,6 +1789,10 @@ static const struct sys_reg_desc sys_reg_descs[] = {
 	{ SYS_DESC(SYS_DACR32_EL2), NULL, reset_unknown, DACR32_EL2 },
 	{ SYS_DESC(SYS_IFSR32_EL2), NULL, reset_unknown, IFSR32_EL2 },
 	{ SYS_DESC(SYS_FPEXC32_EL2), NULL, reset_val, FPEXC32_EL2, 0x700 },
+#ifdef CONFIG_KVM_ASYNC_PF
+	{ SYS_DESC(SYS_ASYNC_PF_EL1), access_async_pf, reset_async_pf,
+	  SYS_ASYNC_PF_EL1, 0, get_user_async_pf, put_user_async_pf },
+#endif
 };
 
 static bool trap_dbgidr(struct kvm_vcpu *vcpu,
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 2cbb57485760..5767953879f9 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -222,6 +222,11 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		 */
 		r = 1;
 		break;
+#ifdef CONFIG_KVM_ASYNC_PF
+	case KVM_CAP_ASYNC_PF:
+		r = 1;
+		break;
+#endif
 	default:
 		r = kvm_arch_vm_ioctl_check_extension(kvm, ext);
 		break;
@@ -426,8 +431,27 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
 	bool irq_lines = *vcpu_hcr(v) & (HCR_VI | HCR_VF);
-	return ((irq_lines || kvm_vgic_vcpu_pending_irq(v))
-		&& !v->arch.power_off && !v->arch.pause);
+
+	if ((irq_lines || kvm_vgic_vcpu_pending_irq(v)) &&
+	    !v->arch.power_off && !v->arch.pause)
+		return true;
+
+#ifdef CONFIG_KVM_ASYNC_PF
+	if (v->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED) {
+		u32 val;
+		int ret;
+
+		if (!list_empty_careful(&v->async_pf.done))
+			return true;
+
+		ret = kvm_read_guest_cached(v->kvm, &v->arch.apf.cache,
+					    &val, sizeof(val));
+		if (ret || val)
+			return true;
+	}
+#endif
+
+	return false;
 }
 
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
@@ -683,6 +707,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 		check_vcpu_requests(vcpu);
 
+#ifdef CONFIG_KVM_ASYNC_PF
+		kvm_check_async_pf_completion(vcpu);
+#endif
+
 		/*
 		 * Preparing the interrupts to be injected also
 		 * involves poking the GIC, which must be done in a
diff --git a/virt/kvm/arm/async_pf.c b/virt/kvm/arm/async_pf.c
new file mode 100644
index 000000000000..929d11d1b566
--- /dev/null
+++ b/virt/kvm/arm/async_pf.c
@@ -0,0 +1,290 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Asynchronous Page Fault Support
+ *
+ * Copyright (C) 2020 Red Hat, Inc., Gavin Shan
+ *
+ * Based on arch/x86/kernel/kvm.c
+ */
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_emulate.h>
+
+static inline u32 kvm_async_pf_hash_fn(gfn_t gfn)
+{
+	return hash_32(gfn & 0xffffffff, order_base_2(ASYNC_PF_PER_VCPU));
+}
+
+static inline u32 kvm_async_pf_hash_next(u32 key)
+{
+	return (key + 1) & (ASYNC_PF_PER_VCPU - 1);
+}
+
+static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
+{
+	int i;
+
+	for (i = 0; i < ASYNC_PF_PER_VCPU; i++)
+		vcpu->arch.apf.gfns[i] = ~0;
+}
+
+/*
+ * Add gfn to the hash table. It's ensured there is a free entry
+ * when this function is called.
+ */
+static void kvm_async_pf_hash_add(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+	u32 key = kvm_async_pf_hash_fn(gfn);
+
+	while (vcpu->arch.apf.gfns[key] != ~0)
+		key = kvm_async_pf_hash_next(key);
+
+	vcpu->arch.apf.gfns[key] = gfn;
+}
+
+static u32 kvm_async_pf_hash_slot(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+	u32 key = kvm_async_pf_hash_fn(gfn);
+	int i;
+
+	for (i = 0; i < ASYNC_PF_PER_VCPU; i++) {
+		if (vcpu->arch.apf.gfns[key] == gfn ||
+		    vcpu->arch.apf.gfns[key] == ~0)
+			break;
+
+		key = kvm_async_pf_hash_next(key);
+	}
+
+	return key;
+}
+
+static void kvm_async_pf_hash_remove(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+	u32 i, j, k;
+
+	i = j = kvm_async_pf_hash_slot(vcpu, gfn);
+	while (true) {
+		vcpu->arch.apf.gfns[i] = ~0;
+
+		do {
+			j = kvm_async_pf_hash_next(j);
+			if (vcpu->arch.apf.gfns[j] == ~0)
+				return;
+
+			k = kvm_async_pf_hash_fn(vcpu->arch.apf.gfns[j]);
+			/*
+			 * k lies cyclically in ]i,j]
+			 * |    i.k.j |
+			 * |....j i.k.| or  |.k..j i...|
+			 */
+		} while ((i <= j) ? (i < k && k <= j) : (i < k || k <= j));
+
+		vcpu->arch.apf.gfns[i] = vcpu->arch.apf.gfns[j];
+		i = j;
+	}
+}
+
+bool kvm_async_pf_hash_find(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+	u32 key = kvm_async_pf_hash_slot(vcpu, gfn);
+
+	return vcpu->arch.apf.gfns[key] == gfn;
+}
+
+static inline int kvm_async_pf_read_cache(struct kvm_vcpu *vcpu, u32 *val)
+{
+	return kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.apf.cache,
+				     val, sizeof(*val));
+}
+
+static inline int kvm_async_pf_write_cache(struct kvm_vcpu *vcpu, u32 val)
+{
+	return kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.apf.cache,
+				      &val, sizeof(val));
+}
+
+bool kvm_arch_can_inject_async_page_not_present(struct kvm_vcpu *vcpu)
+{
+	u64 vbar;
+	unsigned long pc;
+	u32 val;
+	int ret;
+
+	if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED))
+		return false;
+
+	if (vcpu->arch.apf.send_user_only && vcpu_mode_priv(vcpu))
+		return false;
+
+	/* Pending page fault, which isn't acknowledged by the guest */
+	ret = kvm_async_pf_read_cache(vcpu, &val);
+	if (ret || val)
+		return false;
+
+	/*
+	 * Events can't be injected through a data abort because it would
+	 * clobber ELR_EL1, which might not have been consumed by the
+	 * guest yet.
+	 */
+	vbar = vcpu_read_sys_reg(vcpu, VBAR_EL1);
+	pc = *vcpu_pc(vcpu);
+	if (pc >= vbar && pc < (vbar + 0x800))
+		return false;
+
+	return true;
+}
+
+bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
+{
+	u64 vbar;
+	unsigned long pc;
+	u32 val;
+	int ret;
+
+	if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED))
+		return true;
+
+	/* Pending page fault, which isn't acknowledged by the guest */
+	ret = kvm_async_pf_read_cache(vcpu, &val);
+	if (ret || val)
+		return false;
+
+	/*
+	 * Events can't be injected through a data abort because it would
+	 * clobber ELR_EL1, which might not have been consumed by the
+	 * guest yet.
+	 */
+	vbar = vcpu_read_sys_reg(vcpu, VBAR_EL1);
+	pc = *vcpu_pc(vcpu);
+	if (pc >= vbar && pc < (vbar + 0x800))
+		return false;
+
+	return true;
+}
+
+int kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, u32 esr,
+			    gpa_t gpa, gfn_t gfn)
+{
+	struct kvm_arch_async_pf arch;
+	unsigned long hva = kvm_vcpu_gfn_to_hva(vcpu, gfn);
+
+	arch.token = (vcpu->arch.apf.id++ << 16) | vcpu->vcpu_id;
+	arch.gfn = gfn;
+	arch.esr = esr;
+
+	return kvm_setup_async_pf(vcpu, gpa, hva, &arch);
+}
+
+/*
+ * It's guaranteed that there is no pending asynchronous page fault when
+ * this is called. It means all previously issued asynchronous page
+ * faults have been acknowledged.
+ */
+void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
+				     struct kvm_async_pf *work)
+{
+	int ret;
+
+	kvm_async_pf_hash_add(vcpu, work->arch.gfn);
+	ret = kvm_async_pf_write_cache(vcpu, KVM_PV_REASON_PAGE_NOT_PRESENT);
+	if (ret) {
+		kvm_err("%s: Error %d writing cache\n", __func__, ret);
+		kvm_async_pf_hash_remove(vcpu, work->arch.gfn);
+		return;
+	}
+
+	kvm_inject_dabt_with_dfsc(vcpu, work->arch.token, KVM_ASYNC_PF_DFSC);
+}
+
+/*
+ * It's guaranteed that there is no pending asynchronous page fault when
+ * this is called. It means all previously issued asynchronous page
+ * faults have been acknowledged.
+ */
+void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
+				 struct kvm_async_pf *work)
+{
+	int ret;
+
+	/* Broadcast wakeup */
+	if (work->wakeup_all)
+		work->arch.token = ~0;
+	else
+		kvm_async_pf_hash_remove(vcpu, work->arch.gfn);
+
+	ret = kvm_async_pf_write_cache(vcpu, KVM_PV_REASON_PAGE_READY);
+	if (ret) {
+		kvm_err("%s: Error %d writing cache\n", __func__, ret);
+		return;
+	}
+
+	kvm_inject_dabt_with_dfsc(vcpu, work->arch.token, KVM_ASYNC_PF_DFSC);
+}
+
+void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu,
+			       struct kvm_async_pf *work)
+{
+	struct kvm_memory_slot *memslot;
+	unsigned int esr = work->arch.esr;
+	phys_addr_t gpa = work->cr2_or_gpa;
+	gfn_t gfn = gpa >> PAGE_SHIFT;
+	unsigned long hva;
+	bool write_fault, writable;
+	int idx;
+
+	/*
+	 * The gpa was validated before the work was started. However, the
+	 * memory slots might have changed since then, so we need to redo
+	 * the validation here.
+	 */
+	idx = srcu_read_lock(&vcpu->kvm->srcu);
+
+	write_fault = kvm_is_write_fault(esr);
+	memslot = gfn_to_memslot(vcpu->kvm, gfn);
+	hva = gfn_to_hva_memslot_prot(memslot, gfn, &writable);
+	if (kvm_is_error_hva(hva) || (write_fault && !writable))
+		goto out;
+
+	kvm_handle_user_mem_abort(vcpu, esr, memslot, gpa, hva, true);
+
+out:
+	srcu_read_unlock(&vcpu->kvm->srcu, idx);
+}
+
+int kvm_async_pf_update_reg(struct kvm_vcpu *vcpu, u64 data)
+{
+	bool enabled, enable;
+	gpa_t gpa = (data & ~0x3F);
+	int ret;
+
+	enabled = !!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED);
+	enable = !!(data & KVM_ASYNC_PF_ENABLED);
+	if (enable == enabled) {
+		kvm_debug("%s: Async PF has been %s (0x%llx -> 0x%llx)\n",
+			  __func__, enabled ? "enabled" : "disabled",
+			  vcpu->arch.apf.msr_val, data);
+		return 0;
+	}
+
+	if (enable) {
+		ret = kvm_gfn_to_hva_cache_init(vcpu->kvm,
+						&vcpu->arch.apf.cache,
+						gpa, sizeof(u32));
+		if (ret) {
+			kvm_err("%s: Error %d initializing cache on 0x%llx\n",
+				__func__, ret, data);
+			return ret;
+		}
+
+		kvm_async_pf_hash_reset(vcpu);
+		vcpu->arch.apf.send_user_only =
+			!(data & KVM_ASYNC_PF_SEND_ALWAYS);
+		kvm_async_pf_wakeup_all(vcpu);
+		vcpu->arch.apf.msr_val = data;
+	} else {
+		kvm_clear_async_pf_completion_queue(vcpu);
+		vcpu->arch.apf.msr_val = data;
+	}
+
+	return 0;
+}
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 95aaabb2b1fc..a303815845a2 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1656,6 +1656,30 @@ static bool fault_supports_stage2_huge_mapping(struct kvm_memory_slot *memslot,
 	       (hva & ~(map_size - 1)) + map_size <= uaddr_end;
 }
 
+static bool try_async_pf(struct kvm_vcpu *vcpu, u32 esr, gpa_t gpa,
+			 gfn_t gfn, kvm_pfn_t *pfn, bool write,
+			 bool *writable, bool prefault)
+{
+	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+#ifdef CONFIG_KVM_ASYNC_PF
+	bool async = false;
+
+	/* Bail if *pfn has correct page */
+	*pfn = __gfn_to_pfn_memslot(slot, gfn, false, &async, write, writable);
+	if (!async)
+		return false;
+
+	if (!prefault && kvm_arch_can_inject_async_page_not_present(vcpu)) {
+		if (kvm_async_pf_hash_find(vcpu, gfn) ||
+		    kvm_arch_setup_async_pf(vcpu, esr, gpa, gfn))
+			return true;
+	}
+#endif
+
+	*pfn = __gfn_to_pfn_memslot(slot, gfn, false, NULL, write, writable);
+	return false;
+}
+
 int kvm_handle_user_mem_abort(struct kvm_vcpu *vcpu, unsigned int esr,
 			      struct kvm_memory_slot *memslot,
 			      phys_addr_t fault_ipa, unsigned long hva,
@@ -1737,7 +1761,10 @@ int kvm_handle_user_mem_abort(struct kvm_vcpu *vcpu, unsigned int esr,
 	 */
 	smp_rmb();
 
-	pfn = gfn_to_pfn_prot(kvm, gfn, write_fault, &writable);
+	if (try_async_pf(vcpu, esr, fault_ipa, gfn, &pfn,
+			 write_fault, &writable, prefault))
+		return 1;
+
 	if (pfn == KVM_PFN_ERR_HWPOISON) {
 		kvm_send_hwpoison_signal(hva, vma_shift);
 		return 0;
-- 
2.23.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH RFCv1 7/7] arm64: Support async page fault
  2020-04-10  8:58 [PATCH RFCv1 0/7] Support Async Page Fault Gavin Shan
                   ` (5 preceding siblings ...)
  2020-04-10  8:58 ` [PATCH RFCv1 6/7] kvm/arm64: Support async page fault Gavin Shan
@ 2020-04-10  8:58 ` Gavin Shan
  2020-04-10 12:52 ` [PATCH RFCv1 0/7] Support Async Page Fault Marc Zyngier
  7 siblings, 0 replies; 15+ messages in thread
From: Gavin Shan @ 2020-04-10  8:58 UTC (permalink / raw)
  To: kvmarm
  Cc: mark.rutland, drjones, suzuki.poulose, maz, sudeep.holla,
	eric.auger, james.morse, shan.gavin, catalin.marinas, will,
	linux-arm-kernel

This supports asynchronous page faults for the guest. The design is
similar to what x86 has: on receiving a PAGE_NOT_PRESENT signal from
the hypervisor, the current task is either rescheduled or put into
power-saving mode. The task will be woken up when the PAGE_READY signal
is received.

The signals are conveyed through data aborts with a specific (IMPDEF)
Data Fault Status Code (DFSC), as sketched below. Besides, a hash table
is introduced to track the tasks that have been put into the waiting
state, so that waits and wakeups stay consistent.

The feature is guarded by CONFIG_KVM_GUEST, which is added by this
patch.
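
As a rough sketch of the guest's fault handler for that DFSC (the full
implementation is in the diff below), the handler dispatches on the
reason recorded in the per-CPU control block:

    u32 reason = __this_cpu_read(apf_data.reason);

    __this_cpu_write(apf_data.reason, 0);
    switch (reason) {
    case KVM_PV_REASON_PAGE_NOT_PRESENT:
        kvm_pv_async_pf_wait((u32)addr, !user_mode(regs)); /* suspend */
        break;
    case KVM_PV_REASON_PAGE_READY:
        kvm_pv_async_pf_wake((u32)addr);                   /* wake up */
        break;
    }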

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 arch/arm64/Kconfig                 |  11 ++
 arch/arm64/include/asm/exception.h |   5 +
 arch/arm64/include/asm/kvm_para.h  |  42 ++++-
 arch/arm64/kernel/smp.c            |  47 ++++++
 arch/arm64/mm/fault.c              | 239 ++++++++++++++++++++++++++++-
 5 files changed, 336 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 40fb05d96c60..2d5e5ee62d6d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -1045,6 +1045,17 @@ config PARAVIRT
 	  under a hypervisor, potentially improving performance significantly
 	  over full virtualization.
 
+config KVM_GUEST
+	bool "KVM Guest Support"
+	depends on PARAVIRT
+	default y
+	help
+	  This option enables various optimizations for running under the KVM
+	  hypervisor. Overhead for the kernel when not running inside KVM should
+	  be minimal.
+
+	  In case of doubt, say Y
+
 config PARAVIRT_TIME_ACCOUNTING
 	bool "Paravirtual steal time accounting"
 	select PARAVIRT
diff --git a/arch/arm64/include/asm/exception.h b/arch/arm64/include/asm/exception.h
index 7a6e81ca23a8..17ac2db36472 100644
--- a/arch/arm64/include/asm/exception.h
+++ b/arch/arm64/include/asm/exception.h
@@ -46,4 +46,9 @@ void bad_el0_sync(struct pt_regs *regs, int reason, unsigned int esr);
 void do_cp15instr(unsigned int esr, struct pt_regs *regs);
 void do_el0_svc(struct pt_regs *regs);
 void do_el0_svc_compat(struct pt_regs *regs);
+
+#ifdef CONFIG_KVM_GUEST
+void kvm_pv_async_pf_enable(void);
+void kvm_pv_async_pf_disable(void);
+#endif /* CONFIG_KVM_GUEST */
 #endif	/* __ASM_EXCEPTION_H */
diff --git a/arch/arm64/include/asm/kvm_para.h b/arch/arm64/include/asm/kvm_para.h
index 0ea481dd1c7a..a43bed479c2b 100644
--- a/arch/arm64/include/asm/kvm_para.h
+++ b/arch/arm64/include/asm/kvm_para.h
@@ -3,6 +3,30 @@
 #define _ASM_ARM_KVM_PARA_H
 
 #include <uapi/asm/kvm_para.h>
+#include <linux/of.h>
+
+#ifdef CONFIG_KVM_GUEST
+static inline int kvm_para_available(void)
+{
+	struct device_node *hyper_node;
+	int ret = 0;
+
+	hyper_node = of_find_node_by_path("/hypervisor");
+	if (!hyper_node)
+		return 0;
+
+	if (of_device_is_compatible(hyper_node, "linux,kvm"))
+		ret = 1;
+
+	of_node_put(hyper_node);
+	return ret;
+}
+#else
+static inline int kvm_para_available(void)
+{
+	return 0;
+}
+#endif /* CONFIG_KVM_GUEST */
 
 static inline bool kvm_check_and_clear_guest_paused(void)
 {
@@ -11,17 +35,21 @@ static inline bool kvm_check_and_clear_guest_paused(void)
 
 static inline unsigned int kvm_arch_para_features(void)
 {
-	return 0;
+	struct device_node *hyper_node;
+	unsigned int features = 0;
+
+	if (!kvm_para_available())
+		return 0;
+
+	hyper_node = of_find_node_by_path("/hypervisor");
+	of_property_read_u32(hyper_node, "para-features", &features);
+	of_node_put(hyper_node);
+
+	return features;
 }
 
 static inline unsigned int kvm_arch_para_hints(void)
 {
 	return 0;
 }
-
-static inline bool kvm_para_available(void)
-{
-	return false;
-}
-
 #endif /* _ASM_ARM_KVM_PARA_H */
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 061f60fe452f..cc97a8462d7f 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -40,6 +40,7 @@
 #include <asm/cputype.h>
 #include <asm/cpu_ops.h>
 #include <asm/daifflags.h>
+#include <asm/exception.h>
 #include <asm/kvm_mmu.h>
 #include <asm/mmu_context.h>
 #include <asm/numa.h>
@@ -443,6 +444,38 @@ void __init smp_cpus_done(unsigned int max_cpus)
 	mark_linear_text_alias_ro();
 }
 
+#ifdef CONFIG_KVM_GUEST
+static void kvm_cpu_reboot(void *unused)
+{
+	kvm_pv_async_pf_disable();
+}
+
+static int kvm_cpu_reboot_notify(struct notifier_block *nb,
+				 unsigned long code, void *unused)
+{
+	if (code == SYS_RESTART)
+		on_each_cpu(kvm_cpu_reboot, NULL, 1);
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block kvm_cpu_reboot_nb = {
+	.notifier_call = kvm_cpu_reboot_notify,
+};
+
+static int kvm_cpu_online(unsigned int cpu)
+{
+	kvm_pv_async_pf_enable();
+	return 0;
+}
+
+static int kvm_cpu_offline(unsigned int cpu)
+{
+	kvm_pv_async_pf_disable();
+	return 0;
+}
+#endif /* CONFIG_KVM_GUEST */
+
 void __init smp_prepare_boot_cpu(void)
 {
 	set_my_cpu_offset(per_cpu_offset(smp_processor_id()));
@@ -458,6 +491,20 @@ void __init smp_prepare_boot_cpu(void)
 	/* Conditionally switch to GIC PMR for interrupt masking */
 	if (system_uses_irq_prio_masking())
 		init_gic_priority_masking();
+
+
+	/* Enable async page fault */
+#ifdef CONFIG_KVM_GUEST
+	register_reboot_notifier(&kvm_cpu_reboot_nb);
+	if (cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+		"arm/kvm:online", kvm_cpu_online, kvm_cpu_offline) < 0) {
+		pr_warn("%s: Failed to install cpu hotplug callbacks\n",
+			__func__);
+		return;
+	}
+
+	kvm_pv_async_pf_enable();
+#endif /* CONFIG_KVM_GUEST */
 }
 
 static u64 __init of_get_cpu_mpidr(struct device_node *dn)
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 1027851d469a..39c7570fe303 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -19,10 +19,12 @@
 #include <linux/page-flags.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/debug.h>
+#include <linux/swait.h>
 #include <linux/highmem.h>
 #include <linux/perf_event.h>
 #include <linux/preempt.h>
 #include <linux/hugetlb.h>
+#include <linux/kvm_para.h>
 
 #include <asm/acpi.h>
 #include <asm/bug.h>
@@ -48,8 +50,31 @@ struct fault_info {
 	const char *name;
 };
 
+#ifdef CONFIG_KVM_GUEST
+#define KVM_TASK_SLEEP_HASHBITS		8
+#define KVM_TASK_SLEEP_HASHSIZE	(1 << KVM_TASK_SLEEP_HASHBITS)
+
+struct kvm_task_sleep_node {
+	struct hlist_node	link;
+	struct swait_queue_head	wq;
+	u32			token;
+	int			cpu;
+	bool			halted;
+};
+
+struct kvm_task_sleep_head {
+	raw_spinlock_t		lock;
+	struct hlist_head	list;
+};
+#endif /* CONFIG_KVM_GUEST */
+
 static const struct fault_info fault_info[];
 static struct fault_info debug_fault_info[];
+#ifdef CONFIG_KVM_GUEST
+static DEFINE_PER_CPU(struct kvm_vcpu_pv_apf_data, apf_data) __aligned(64);
+static struct kvm_task_sleep_head async_pf_sleepers[KVM_TASK_SLEEP_HASHSIZE];
+static bool async_pf_initialized;
+#endif
 
 static inline const struct fault_info *esr_to_fault_info(unsigned int esr)
 {
@@ -623,6 +648,178 @@ static int do_alignment_fault(unsigned long addr, unsigned int esr,
 	return 0;
 }
 
+#ifdef CONFIG_KVM_GUEST
+static struct kvm_task_sleep_node *kvm_pv_async_pf_find(
+		struct kvm_task_sleep_head *b, u32 token)
+{
+	struct kvm_task_sleep_node *n;
+	struct hlist_node *p;
+
+	hlist_for_each(p, &b->list) {
+		n = hlist_entry(p, typeof(*n), link);
+		if (n->token == token)
+			return n;
+	}
+
+	return NULL;
+}
+
+static void kvm_pv_async_pf_wait(u32 token, int interrupt_kernel)
+{
+	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
+	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
+	struct kvm_task_sleep_node n, *e;
+	DECLARE_SWAITQUEUE(wait);
+
+	raw_spin_lock(&b->lock);
+	e = kvm_pv_async_pf_find(b, token);
+	if (e) {
+		/* dummy entry exist -> wake up was delivered ahead of PF */
+		hlist_del(&e->link);
+		kfree(e);
+		raw_spin_unlock(&b->lock);
+
+		return;
+	}
+
+	n.token = token;
+	n.cpu = smp_processor_id();
+	n.halted = is_idle_task(current) ||
+		   (IS_ENABLED(CONFIG_PREEMPT_COUNT) ?
+			preempt_count() > 1 || rcu_preempt_depth() :
+			interrupt_kernel);
+	init_swait_queue_head(&n.wq);
+	hlist_add_head(&n.link, &b->list);
+	raw_spin_unlock(&b->lock);
+
+	for (;;) {
+		if (!n.halted) {
+			prepare_to_swait_exclusive(&n.wq, &wait,
+						   TASK_UNINTERRUPTIBLE);
+		}
+
+		if (hlist_unhashed(&n.link))
+			break;
+
+		/*
+		 * Enable the IRQ explicitly. Otherwise, the task
+		 * won't be scheduled or woken up properly.
+		 */
+		local_irq_enable();
+
+		if (!n.halted) {
+			schedule();
+		} else {
+			dsb(sy);
+			wfi();
+		}
+
+		local_irq_disable();
+	}
+
+	if (!n.halted)
+		finish_swait(&n.wq, &wait);
+}
+
+static inline void kvm_pv_async_pf_wake_one(struct kvm_task_sleep_node *n)
+{
+	/* The task will be woken up once it is detached */
+	hlist_del_init(&n->link);
+
+	if (!n->halted)
+		swake_up_one(&n->wq);
+	else
+		smp_send_reschedule(n->cpu);
+}
+
+static void kvm_pv_async_pf_wake_all(void)
+{
+	struct kvm_task_sleep_head *b;
+	struct kvm_task_sleep_node *n;
+	struct hlist_node *p, *next;
+	int i;
+
+	for (i = 0; i < KVM_TASK_SLEEP_HASHSIZE; i++) {
+		b = &async_pf_sleepers[i];
+
+		raw_spin_lock(&b->lock);
+
+		hlist_for_each_safe(p, next, &b->list) {
+			n = hlist_entry(p, typeof(*n), link);
+			if (n->cpu != smp_processor_id())
+				continue;
+
+			kvm_pv_async_pf_wake_one(n);
+		}
+
+		raw_spin_unlock(&b->lock);
+	}
+}
+
+static void kvm_pv_async_pf_wake(u32 token)
+{
+	u32 key = hash_32(token, KVM_TASK_SLEEP_HASHBITS);
+	struct kvm_task_sleep_head *b = &async_pf_sleepers[key];
+	struct kvm_task_sleep_node *n;
+
+	if (token == ~0) {
+		kvm_pv_async_pf_wake_all();
+		return;
+	}
+
+again:
+	raw_spin_lock(&b->lock);
+
+	n = kvm_pv_async_pf_find(b, token);
+	if (!n) {
+		/*
+		 * Async PF was not yet handled. Add dummy entry
+		 * for the token. Busy wait until other CPU handles
+		 * the async PF on allocation failure.
+		 */
+		n = kzalloc(sizeof(*n), GFP_ATOMIC);
+		if (!n) {
+			raw_spin_unlock(&b->lock);
+			cpu_relax();
+			goto again;
+		}
+		n->token = token;
+		n->cpu = smp_processor_id();
+		init_swait_queue_head(&n->wq);
+		hlist_add_head(&n->link, &b->list);
+	} else {
+		kvm_pv_async_pf_wake_one(n);
+	}
+
+	raw_spin_unlock(&b->lock);
+}
+#endif /* CONFIG_KVM_GUEST */
+
+static int do_lockdown(unsigned long addr, unsigned int esr,
+		       struct pt_regs *regs)
+{
+#ifdef CONFIG_KVM_GUEST
+	u32 reason = 0;
+
+	if (__this_cpu_read(apf_data.enabled)) {
+		reason = __this_cpu_read(apf_data.reason);
+		__this_cpu_write(apf_data.reason, 0);
+	}
+
+	switch (reason) {
+	case KVM_PV_REASON_PAGE_NOT_PRESENT:
+		kvm_pv_async_pf_wait((u32)addr, !user_mode(regs));
+		return 0;
+	case KVM_PV_REASON_PAGE_READY:
+		kvm_pv_async_pf_wake((u32)addr);
+		return 0;
+	}
+#endif /* CONFIG_KVM_GUEST */
+
+	pr_info("%s: addr=0x%lx, esr=0x%x\n", __func__, addr, esr);
+	return 1;
+}
+
 static int do_bad(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 {
 	return 1; /* "fault" */
@@ -703,7 +900,8 @@ static const struct fault_info fault_info[] = {
 	{ do_bad,		SIGKILL, SI_KERNEL,	"Unsupported atomic hardware update fault"	},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 50"			},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 51"			},
-	{ do_bad,		SIGKILL, SI_KERNEL,	"implementation fault (lockdown abort)" },
+	{ do_lockdown,		SIGKILL, SI_KERNEL,
+	  "implementation fault (lockdown abort)" },
 	{ do_bad,		SIGBUS,  BUS_OBJERR,	"implementation fault (unsupported exclusive)" },
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 54"			},
 	{ do_bad,		SIGKILL, SI_KERNEL,	"unknown 55"			},
@@ -878,3 +1076,42 @@ void do_debug_exception(unsigned long addr_if_watchpoint, unsigned int esr,
 	debug_exception_exit(regs);
 }
 NOKPROBE_SYMBOL(do_debug_exception);
+
+#ifdef CONFIG_KVM_GUEST
+void kvm_pv_async_pf_enable(void)
+{
+	u64 pa;
+	int i;
+
+	if (!kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) ||
+	     __this_cpu_read(apf_data.enabled))
+		return;
+
+	if (!async_pf_initialized) {
+		async_pf_initialized = true;
+		for (i = 0; i < KVM_TASK_SLEEP_HASHSIZE; i++)
+			raw_spin_lock_init(&async_pf_sleepers[i].lock);
+	}
+
+	/* FIXME: Enable KVM_ASYNC_PF_SEND_ALWAYS on CONFIG_PREEMPTION */
+	pa = virt_to_phys(this_cpu_ptr(&apf_data));
+	pa |= KVM_ASYNC_PF_ENABLED;
+
+	__this_cpu_write(apf_data.enabled, 1);
+	write_sysreg_s(pa, SYS_ASYNC_PF_EL1);
+
+	pr_info("Async PF enabled on CPU %d\n", smp_processor_id());
+}
+
+void kvm_pv_async_pf_disable(void)
+{
+	if (!kvm_para_has_feature(KVM_FEATURE_ASYNC_PF) ||
+	    !__this_cpu_read(apf_data.enabled))
+		return;
+
+	write_sysreg_s(0, SYS_ASYNC_PF_EL1);
+	__this_cpu_write(apf_data.enabled, 0);
+
+	pr_info("Async PF disabled on CPU %d\n", smp_processor_id());
+}
+#endif /* CONFIG_KVM_GUEST */
-- 
2.23.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH RFCv1 0/7] Support Async Page Fault
  2020-04-10  8:58 [PATCH RFCv1 0/7] Support Async Page Fault Gavin Shan
                   ` (6 preceding siblings ...)
  2020-04-10  8:58 ` [PATCH RFCv1 7/7] arm64: " Gavin Shan
@ 2020-04-10 12:52 ` Marc Zyngier
  2020-04-14  5:39   ` Gavin Shan
  7 siblings, 1 reply; 15+ messages in thread
From: Marc Zyngier @ 2020-04-10 12:52 UTC (permalink / raw)
  To: Gavin Shan
  Cc: mark.rutland, drjones, suzuki.poulose, catalin.marinas,
	eric.auger, james.morse, shan.gavin, sudeep.holla, will, kvmarm,
	linux-arm-kernel

Hi Gavin,

On 2020-04-10 09:58, Gavin Shan wrote:
> There are two stages of page faults and the stage one page fault is
> handled by guest itself. The guest is trapped to host when the page
> fault is caused by stage 2 page table, for example missing. The guest
> is suspended until the requested page is populated. To populate the
> requested page can be costly and might be related to IO activities
> if the page was swapped out previously. In this case, the guest has
> to suspend for a few of milliseconds at least, regardless of the
> overall system load.
> 
> The series adds support to asychornous page fault to improve above
> situation. If it's costly to populate the requested page, a signal
> (PAGE_NOT_PRESENT) is sent to guest so that the faulting process can
> be rescheduled if it can be. Otherwise, it is put into power-saving
> mode. Another signal (PAGE_READY) is sent to guest once the requested
> page is populated so that the faulting process can be waken up either
> from either waiting state or power-saving state.
> 
> In order to fulfil the control flow and convey signals between host
> and guest. A IMPDEF system register (SYS_ASYNC_PF_EL1) is introduced.
> The register accepts control block's physical address, plus requested
> features. Also, the signal is sent using data abort with the specific
> IMPDEF Data Fault Status Code (DFSC). The specific signal is stored
> in the control block by host, to be consumed by guest.
> 
> Todo
> ====
> * CONFIG_KVM_ASYNC_PF_SYNC is disabled for now because the exception
>   injection can't work in nested mode. It might be something to be
>   improved in future.
> * KVM_ASYNC_PF_SEND_ALWAYS is disabled even with CONFIG_PREEMPTION
>   because it's simply not working reliably.
> * Tracepoints, which should something to be done in short term.
> * kvm-unit-test cases.
> * More testing and debugging are needed. Sometimes, the guest can be
>   stuck and the root cause needs to be figured out.

Let me add another few things:

- KVM/arm is (supposed to be) an architectural hypervisor. It means
  that one of the design goal is to have as few differences as possible
  from the actual hardware. I'm not keen on deviating from it (next
  thing you know, you'll add all the PV horror from Xen, HV, VMware...). 

- The idea of butchering the arm64 mm subsystem to handle a new exotic
  style of exceptions is not something I am looking forward to. We
  might as well PV the whole MMU, Xen style, and be done with it. I'll
  let the arm64 maintainers comment on this though.

- We don't add IMPDEF sysregs, period. That's reserved for the HW. If
  you want to trap, there's the HVC instruction to that effect.

- If this is such a great improvement, where are the performance
  numbers?

- The fact that it apparently cannot work with nesting nor with
  preemption tends to indicate that it isn't future proof.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFCv1 0/7] Support Async Page Fault
  2020-04-10 12:52 ` [PATCH RFCv1 0/7] Support Async Page Fault Marc Zyngier
@ 2020-04-14  5:39   ` Gavin Shan
  2020-04-14 11:05     ` Mark Rutland
  0 siblings, 1 reply; 15+ messages in thread
From: Gavin Shan @ 2020-04-14  5:39 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: mark.rutland, drjones, suzuki.poulose, catalin.marinas,
	eric.auger, james.morse, shan.gavin, sudeep.holla, will, kvmarm,
	linux-arm-kernel

Hi Marc,

On 4/10/20 10:52 PM, Marc Zyngier wrote:
> Hi Gavin,
> 
> On 2020-04-10 09:58, Gavin Shan wrote:
>> There are two stages of page faults and the stage one page fault is
>> handled by guest itself. The guest is trapped to host when the page
>> fault is caused by stage 2 page table, for example missing. The guest
>> is suspended until the requested page is populated. To populate the
>> requested page can be costly and might be related to IO activities
>> if the page was swapped out previously. In this case, the guest has
>> to suspend for a few of milliseconds at least, regardless of the
>> overall system load.
>>
>> The series adds support to asychornous page fault to improve above
>> situation. If it's costly to populate the requested page, a signal
>> (PAGE_NOT_PRESENT) is sent to guest so that the faulting process can
>> be rescheduled if it can be. Otherwise, it is put into power-saving
>> mode. Another signal (PAGE_READY) is sent to guest once the requested
>> page is populated so that the faulting process can be waken up either
>> from either waiting state or power-saving state.
>>
>> In order to fulfil the control flow and convey signals between host
>> and guest. A IMPDEF system register (SYS_ASYNC_PF_EL1) is introduced.
>> The register accepts control block's physical address, plus requested
>> features. Also, the signal is sent using data abort with the specific
>> IMPDEF Data Fault Status Code (DFSC). The specific signal is stored
>> in the control block by host, to be consumed by guest.
>>
>> Todo
>> ====
>> * CONFIG_KVM_ASYNC_PF_SYNC is disabled for now because the exception
>>    injection can't work in nested mode. It might be something to be
>>    improved in future.
>> * KVM_ASYNC_PF_SEND_ALWAYS is disabled even with CONFIG_PREEMPTION
>>    because it's simply not working reliably.
>> * Tracepoints, which should something to be done in short term.
>> * kvm-unit-test cases.
>> * More testing and debugging are needed. Sometimes, the guest can be
>>    stuck and the root cause needs to be figured out.
> 
> Let me add another few things:
> 
> - KVM/arm is (supposed to be) an architectural hypervisor. It means
>    that one of the design goal is to have as few differences as possible
>    from the actual hardware. I'm not keen on deviating from it (next
>    thing you know, you'll add all the PV horror from Xen, HV, VMware...).
> 
> - The idea of butchering the arm64 mm subsystem to handle a new exotic
>    style of exceptions is not something I am looking forward to. We
>    might as well PV the whole MMU, Xen style, and be done with it. I'll
>    let the arm64 maintainers comment on this though.
> 

Thanks for your comments. The feature won't be enabled on the guest side
unless CONFIG_KVM_GUEST is enabled. More details can be found in PATCH[7/7].
So it would be one specific feature supported by KVM. I'm not familiar with
Xen and would like to learn how the MMU is para-virtualized there. Do you
have any documents you'd recommend starting with? Otherwise, I will try
Google later.

> - We don't add IMPDEF sysregs, period. That's reserved for the HW. If
>    you want to trap, there's the HVC instruction to that effect.
> 

Yes, HVC can be used for trapping, as PV stolen time does. However, I guess
it has to be guarded by a specification? For example, the para-virtualized
time calls are specified by DEN0057A, as highlighted in include/linux/arm-smccc.h.

/* Paravirtualised time calls (defined by ARM DEN0057A) */
#define ARM_SMCCC_HV_PV_TIME_FEATURES                           \
         ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,                 \
                            ARM_SMCCC_SMC_64,                    \
                            ARM_SMCCC_OWNER_STANDARD_HYP,        \
                            0x20)

#define ARM_SMCCC_HV_PV_TIME_ST                                 \
         ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,                 \
                            ARM_SMCCC_SMC_64,                    \
                            ARM_SMCCC_OWNER_STANDARD_HYP,        \
                            0x21)

I really don't understand how IMPDEF sysregs are used by hardware vendors.
Do we have existing functionality which depends on an IMPDEF sysreg?
I was thinking IMPDEF sysregs could be used by software as well, but
it seems I'm wrong.

> - If this is such a great improvement, where are the performance
>    numbers?
> 

Yep, indeed. I'm still looking for an appropriate workload and hopefully
I can share performance data in RFCv2 :)

> - The fact that it apparently cannot work with nesting nor with
>    preemption tends to indicate that it isn't future proof.
> 

I didn't make myself clear about the nesting. The data abort exception is injected
by tweaking ELR_EL1/SPSR_EL1 if the guest is running in 64-bit EL1 mode. These
registers are loaded when the guest gets a chance to run. However, it's impossible
to inject two (nested) data abort exceptions at once. It's something different from
a nested VM.

There was a heated discussion about the preemption support. It's on the TODO list
and needs to be sorted out in the future.

https://lore.kernel.org/patchwork/patch/1206121/


> Thanks,
> 
> 	M.
> 

Thanks,
Gavin


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFCv1 0/7] Support Async Page Fault
  2020-04-14  5:39   ` Gavin Shan
@ 2020-04-14 11:05     ` Mark Rutland
  2020-04-16  7:59       ` Gavin Shan
  0 siblings, 1 reply; 15+ messages in thread
From: Mark Rutland @ 2020-04-14 11:05 UTC (permalink / raw)
  To: Gavin Shan
  Cc: drjones, suzuki.poulose, Marc Zyngier, sudeep.holla, eric.auger,
	james.morse, shan.gavin, catalin.marinas, will, kvmarm,
	linux-arm-kernel

Hi Gavin,

On Tue, Apr 14, 2020 at 03:39:56PM +1000, Gavin Shan wrote:
> On 4/10/20 10:52 PM, Marc Zyngier wrote:
> > On 2020-04-10 09:58, Gavin Shan wrote:
> > > In order to fulfil the control flow and convey signals between host
> > > and guest. A IMPDEF system register (SYS_ASYNC_PF_EL1) is introduced.
> > > The register accepts control block's physical address, plus requested
> > > features. Also, the signal is sent using data abort with the specific
> > > IMPDEF Data Fault Status Code (DFSC). The specific signal is stored
> > > in the control block by host, to be consumed by guest.

> > - We don't add IMPDEF sysregs, period. That's reserved for the HW. If
> >    you want to trap, there's the HVC instruction to that effect.

> I really don't understand how IMPDEF sysreg is used by hardware vendors.
> Do we have an existing functionality, which depends on IMPDEF sysreg?
> I was thinking the IMPDEF sysreg can be used by software either, but
> it seems I'm wrong.

The key is in the name: an IMPLEMENTATION DEFINED register is defined by
the implementation (i.e. the specific CPU microarchitecture), so it's
wrong for software to come up with an arbitrary semantic as this will
differ from the implementation's defined semantic for the register.

Typically, IMP DEF registers are used for things that firmware needs to
do (e.g. enter/exit coherency), or for bringup-time debug (e.g. poking
into TLB/cache internals), and are not usually intended for general
purpose software.

Linux generally avoids the use of IMP DEF registers, but does so in some
cases (e.g. for PMUs) after FW explicitly describes that those are safe
to access.

Thanks,
Mark.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFCv1 0/7] Support Async Page Fault
  2020-04-14 11:05     ` Mark Rutland
@ 2020-04-16  7:59       ` Gavin Shan
  2020-04-16  9:16         ` Mark Rutland
  0 siblings, 1 reply; 15+ messages in thread
From: Gavin Shan @ 2020-04-16  7:59 UTC (permalink / raw)
  To: Mark Rutland
  Cc: drjones, suzuki.poulose, Marc Zyngier, sudeep.holla, eric.auger,
	james.morse, shan.gavin, catalin.marinas, will, kvmarm,
	linux-arm-kernel

Hi Mark,

On 4/14/20 9:05 PM, Mark Rutland wrote:
> On Tue, Apr 14, 2020 at 03:39:56PM +1000, Gavin Shan wrote:
>> On 4/10/20 10:52 PM, Marc Zyngier wrote:
>>> On 2020-04-10 09:58, Gavin Shan wrote:
>>>> In order to fulfil the control flow and convey signals between host
>>>> and guest. A IMPDEF system register (SYS_ASYNC_PF_EL1) is introduced.
>>>> The register accepts control block's physical address, plus requested
>>>> features. Also, the signal is sent using data abort with the specific
>>>> IMPDEF Data Fault Status Code (DFSC). The specific signal is stored
>>>> in the control block by host, to be consumed by guest.
> 
>>> - We don't add IMPDEF sysregs, period. That's reserved for the HW. If
>>>     you want to trap, there's the HVC instruction to that effect.
> 
>> I really don't understand how IMPDEF sysreg is used by hardware vendors.
>> Do we have an existing functionality, which depends on IMPDEF sysreg?
>> I was thinking the IMPDEF sysreg can be used by software either, but
>> it seems I'm wrong.
> 
> The key is in the name: an IMPLEMENTATION DEFINED register is defined by
> the implementation (i.e. the specific CPU microarchitecture), so it's
> wrong for software to come up with an arbitrary semantic as this will
> differ from the implementation's defined semantic for the register.
> 
> Typically, IMP DEF resgisters are used for things that firmware needs to
> do (e.g. enter/exit coherency), or for bringup-time debug (e.g. poking
> into TLB/cache internals), and are not usually intended for general
> purpose software.
> 
> Linux generally avoids the use of IMP DEF registers, but does so in some
> cases (e.g. for PMUs) after FW explicitly describes that those are safe
> to access.
> 

Thanks for the explanation and details, which make things much clearer. Since
the IMPDEF system register can't be used in this way, a hypercall (HVC) would
be considered to serve the same purpose - delivering signals from host to guest.
However, the hypercall numbers and behaviours are guarded by specifications. For
example, the hypercalls used by para-virtualized stolen time, which are defined
in include/linux/arm-smccc.h, are specified by ARM DEN0057A [1]. So do I need a
specification to be created, in which the hypercalls used by this feature are
defined? If that's not needed, can I pick hypercalls that aren't used and define
their behaviour myself?

[1] http://infocenter.arm.com/help/topic/com.arm.doc.den0057a/DEN0057A_Paravirtualized_Time_for_Arm_based_Systems_v1_0.pdf

Another thing I want to check is the ESR_EL1[DFSC]. In this series,
the asynchronous page fault is identified by an IMPDEF DFSC (Data Fault Status
Code) in ESR_EL1. According to what we discussed, an IMPDEF DFSC shouldn't
be fired (produced) by software. Should it only be produced by hardware as well?
What I understood is that IMPDEF describes hardware behaviour. If this is true,
I need to avoid using an IMPDEF DFSC in the next revision :)


Thanks,
Gavin




_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH RFCv1 0/7] Support Async Page Fault
  2020-04-16  7:59       ` Gavin Shan
@ 2020-04-16  9:16         ` Mark Rutland
  2020-04-16  9:21           ` Will Deacon
  2020-04-17 10:34           ` Gavin Shan
  0 siblings, 2 replies; 15+ messages in thread
From: Mark Rutland @ 2020-04-16  9:16 UTC (permalink / raw)
  To: Gavin Shan
  Cc: drjones, suzuki.poulose, Marc Zyngier, sudeep.holla, eric.auger,
	james.morse, shan.gavin, catalin.marinas, will, kvmarm,
	linux-arm-kernel

On Thu, Apr 16, 2020 at 05:59:33PM +1000, Gavin Shan wrote:
> On 4/14/20 9:05 PM, Mark Rutland wrote:
> > On Tue, Apr 14, 2020 at 03:39:56PM +1000, Gavin Shan wrote:
> > > On 4/10/20 10:52 PM, Marc Zyngier wrote:
> > > > On 2020-04-10 09:58, Gavin Shan wrote:
> > > > > In order to fulfil the control flow and convey signals between host
> > > > > and guest. A IMPDEF system register (SYS_ASYNC_PF_EL1) is introduced.
> > > > > The register accepts control block's physical address, plus requested
> > > > > features. Also, the signal is sent using data abort with the specific
> > > > > IMPDEF Data Fault Status Code (DFSC). The specific signal is stored
> > > > > in the control block by host, to be consumed by guest.
> > 
> > > > - We don't add IMPDEF sysregs, period. That's reserved for the HW. If
> > > >     you want to trap, there's the HVC instruction to that effect.
> > 
> > > I really don't understand how IMPDEF sysreg is used by hardware vendors.
> > > Do we have an existing functionality, which depends on IMPDEF sysreg?
> > > I was thinking the IMPDEF sysreg can be used by software either, but
> > > it seems I'm wrong.
> > 
> > The key is in the name: an IMPLEMENTATION DEFINED register is defined by
> > the implementation (i.e. the specific CPU microarchitecture), so it's
> > wrong for software to come up with an arbitrary semantic as this will
> > differ from the implementation's defined semantic for the register.
> > 
> > Typically, IMP DEF resgisters are used for things that firmware needs to
> > do (e.g. enter/exit coherency), or for bringup-time debug (e.g. poking
> > into TLB/cache internals), and are not usually intended for general
> > purpose software.
> > 
> > Linux generally avoids the use of IMP DEF registers, but does so in some
> > cases (e.g. for PMUs) after FW explicitly describes that those are safe
> > to access.
> 
> Thanks for the explanation and details, which make things much clearer. Since
> the IMPDEF system register can't be used in this way, a hypercall (HVC) could
> be considered to serve the same purpose - delivering signals from host to guest.

I'm not sure I follow how you'd use HVC to inject a signal into a guest;
the HVC would have to be issued by the guest to the host. Unless you're
injecting the signal via some other mechanism (e.g. an interrupt), and
the guest issues the HVC in response to that?

> However, the hypercall numbers and behaviors are governed by a
> specification. For example, the hypercalls used by para-virtualized
> stolen time, which are defined in include/linux/arm-smccc.h, are
> specified by ARM DEN0057A [1]. So do I need a specification to be
> created, defining the hypercalls used by this feature? If that's
> not needed, can I pick hypercall numbers that aren't in use and
> define their behaviors myself?
> 
> [1] http://infocenter.arm.com/help/topic/com.arm.doc.den0057a/DEN0057A_Paravirtualized_Time_for_Arm_based_Systems_v1_0.pdf

Take a look at the SMCCC / SMC Calling Convention:

 https://developer.arm.com/docs/den0028/c

... that defines ranges set aside for hypervisor-specific usage, and
despite its name it also applies to HVC calls.

There's been intermittent work to add a probing story for that, so that
part is subject to change, but for prototyping you can just choose an
arbitrary number in that range -- just be sure to mention in the commit
message and cover letter that this part isn't complete.
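
As a rough illustration only (untested; the function number 0x42 and the
names below are arbitrary placeholders I'm making up, not anything defined
by the spec or by this series), a prototype ID in the vendor hypervisor
range can be built with the existing include/linux/arm-smccc.h helpers and
issued from the guest like so:

#include <linux/arm-smccc.h>
#include <linux/errno.h>
#include <linux/types.h>

#define ARM_SMCCC_VENDOR_HYP_KVM_APF_FUNC_ID			\
	ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL,		\
			   ARM_SMCCC_SMC_64,			\
			   ARM_SMCCC_OWNER_VENDOR_HYP, 0x42)

/* Guest side: register the control block and enable the feature. */
static int kvm_apf_enable(phys_addr_t control_block_pa)
{
	struct arm_smccc_res res;

	arm_smccc_1_1_hvc(ARM_SMCCC_VENDOR_HYP_KVM_APF_FUNC_ID,
			  control_block_pa, &res);

	return res.a0 == SMCCC_RET_SUCCESS ? 0 : -EOPNOTSUPP;
}

The host then routes the call on the function ID in its HVC exit handler,
and is free to define the argument and return registers however it likes
within that vendor range.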

> Another thing I want to check is the ESR_EL1[DFSC]. In this series, the
> asynchronous page fault is identified by an IMPDEF DFSC (Data Fault Status
> Code) in ESR_EL1. According to what we discussed, an IMPDEF DFSC shouldn't
> be produced by software; it should only be produced by hardware, right? My
> understanding is that IMPDEF describes hardware behavior. If that's true, I
> need to avoid using an IMPDEF DFSC in the next revision :)

Yes, similar applies here.

If the guest is making a hypercall, you can return the fault info as the
response in GPRs, so I don't think you need to touch any architectural
fault registers.
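
For example (an untested sketch; the function name and register layout here
are placeholders I'm making up, not anything this series defines), the
host-side handler could hand the information back with the smccc_set_retval()
helper from include/kvm/arm_hypercalls.h:

#include <linux/arm-smccc.h>
#include <kvm/arm_hypercalls.h>

/* Host side: report fault info to the guest purely via x0-x3. */
static void kvm_apf_set_fault_info(struct kvm_vcpu *vcpu,
				   unsigned long reason, unsigned long token)
{
	/* x0 = status, x1 = fault reason, x2 = token, x3 = unused */
	smccc_set_retval(vcpu, SMCCC_RET_SUCCESS, reason, token, 0);
}

That keeps ESR/FAR untouched and avoids any need for an IMPDEF encoding.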

Thanks,
Mark.


* Re: [PATCH RFCv1 0/7] Support Async Page Fault
  2020-04-16  9:16         ` Mark Rutland
@ 2020-04-16  9:21           ` Will Deacon
  2020-04-17 10:34           ` Gavin Shan
  1 sibling, 0 replies; 15+ messages in thread
From: Will Deacon @ 2020-04-16  9:21 UTC (permalink / raw)
  To: Mark Rutland
  Cc: drjones, Gavin Shan, suzuki.poulose, catalin.marinas,
	sudeep.holla, eric.auger, james.morse, shan.gavin, Marc Zyngier,
	kvmarm, linux-arm-kernel

On Thu, Apr 16, 2020 at 10:16:22AM +0100, Mark Rutland wrote:
> On Thu, Apr 16, 2020 at 05:59:33PM +1000, Gavin Shan wrote:
> > However, the hypercall numbers and behaviors are governed by a
> > specification. For example, the hypercalls used by para-virtualized
> > stolen time, which are defined in include/linux/arm-smccc.h, are
> > specified by ARM DEN0057A [1]. So do I need a specification to be
> > created, defining the hypercalls used by this feature? If that's
> > not needed, can I pick hypercall numbers that aren't in use and
> > define their behaviors myself?
> > 
> > [1] http://infocenter.arm.com/help/topic/com.arm.doc.den0057a/DEN0057A_Paravirtualized_Time_for_Arm_based_Systems_v1_0.pdf
> 
> Take a look at the SMCCC / SMC Calling Convention:
> 
>  https://developer.arm.com/docs/den0028/c
> 
> ... that defines ranges set aside for hypervisor-specific usage, and
> despite its name it also applies to HVC calls.
> 
> There's been intermittent work to add a probing story for that, so that
> part is subject to change, but for prototyping you can just choose an
> arbitrary number in that range -- just be sure to mention in the commit
> message and cover letter that this part isn't complete.

Right, might be simplest to start off with:

https://android-kvm.googlesource.com/linux/+/refs/heads/willdeacon/hvc

Will


* Re: [PATCH RFCv1 0/7] Support Async Page Fault
  2020-04-16  9:16         ` Mark Rutland
  2020-04-16  9:21           ` Will Deacon
@ 2020-04-17 10:34           ` Gavin Shan
  1 sibling, 0 replies; 15+ messages in thread
From: Gavin Shan @ 2020-04-17 10:34 UTC (permalink / raw)
  To: Mark Rutland
  Cc: drjones, suzuki.poulose, Marc Zyngier, sudeep.holla, eric.auger,
	james.morse, shan.gavin, catalin.marinas, will, kvmarm,
	linux-arm-kernel

Hi Mark,

On 4/16/20 7:16 PM, Mark Rutland wrote:
> On Thu, Apr 16, 2020 at 05:59:33PM +1000, Gavin Shan wrote:
>> On 4/14/20 9:05 PM, Mark Rutland wrote:
>>> On Tue, Apr 14, 2020 at 03:39:56PM +1000, Gavin Shan wrote:
>>>> On 4/10/20 10:52 PM, Marc Zyngier wrote:
>>>>> On 2020-04-10 09:58, Gavin Shan wrote:
>>>>>> In order to fulfil the control flow and convey signals between host
>>>>>> and guest. A IMPDEF system register (SYS_ASYNC_PF_EL1) is introduced.
>>>>>> The register accepts control block's physical address, plus requested
>>>>>> features. Also, the signal is sent using data abort with the specific
>>>>>> IMPDEF Data Fault Status Code (DFSC). The specific signal is stored
>>>>>> in the control block by host, to be consumed by guest.
>>>
>>>>> - We don't add IMPDEF sysregs, period. That's reserved for the HW. If
>>>>>      you want to trap, there's the HVC instruction to that effect.
>>>
>>>> I really don't understand how IMPDEF sysregs are used by hardware vendors.
>>>> Do we have existing functionality that depends on an IMPDEF sysreg?
>>>> I was thinking the IMPDEF sysreg could be used by software as well, but
>>>> it seems I'm wrong.
>>>
>>> The key is in the name: an IMPLEMENTATION DEFINED register is defined by
>>> the implementation (i.e. the specific CPU microarchitecture), so it's
>>> wrong for software to come up with an arbitrary semantic as this will
>>> differ from the implementation's defined semantic for the register.
>>>
>>> Typically, IMP DEF registers are used for things that firmware needs to
>>> do (e.g. enter/exit coherency), or for bringup-time debug (e.g. poking
>>> into TLB/cache internals), and are not usually intended for general
>>> purpose software.
>>>
>>> Linux generally avoids the use of IMP DEF registers, but does so in some
>>> cases (e.g. for PMUs) after FW explicitly describes that those are safe
>>> to access.
>>
>> Thanks for the explanation and details, which make things much clearer. Since
>> the IMPDEF system register can't be used in this way, a hypercall (HVC) could
>> be considered to serve the same purpose - delivering signals from host to guest.
> 
> I'm not sure I follow how you'd use HVC to inject a signal into a guest;
> the HVC would have to be issued by the guest to the host. Unless you're
> injecting the signal via some other mechanism (e.g. an interrupt), and
> the guest issues the HVC in response to that?
> 

Yeah, I expressed it the wrong way. It should be: the HVC is issued by the
guest to signal the host. Sorry for the confusion.

>> However, the hypercall numbers and behaviors are governed by a
>> specification. For example, the hypercalls used by para-virtualized
>> stolen time, which are defined in include/linux/arm-smccc.h, are
>> specified by ARM DEN0057A [1]. So do I need a specification to be
>> created, defining the hypercalls used by this feature? If that's
>> not needed, can I pick hypercall numbers that aren't in use and
>> define their behaviors myself?
>>
>> [1] http://infocenter.arm.com/help/topic/com.arm.doc.den0057a/DEN0057A_Paravirtualized_Time_for_Arm_based_Systems_v1_0.pdf
> 
> Take a look at the SMCCC / SMC Calling Convention:
> 
>   https://developer.arm.com/docs/den0028/c
> 
> ... that defines ranges set aside for hypervisor-specific usage, and
> despite its name it also applies to HVC calls.
> 
> There's been intermittent work to add a probing story for that, so that
> part is subject to change, but for prototyping you can just choose an
> arbitrary number in that range -- just be sure to mention in the commit
> message and cover letter that this part isn't complete.
> 

Sure, thanks for the pointer, which is very useful. Will has already shared
the git repo link for the probing story. I'll take a look and come back to
you if I have more questions. Yes, an arbitrary number in that range is fine
for prototyping.

>> Another thing I want to check is the ESR_EL1[DFSC]. In this series, the
>> asynchronous page fault is identified by an IMPDEF DFSC (Data Fault Status
>> Code) in ESR_EL1. According to what we discussed, an IMPDEF DFSC shouldn't
>> be produced by software; it should only be produced by hardware, right? My
>> understanding is that IMPDEF describes hardware behavior. If that's true, I
>> need to avoid using an IMPDEF DFSC in the next revision :)
> 
> Yes, similar applies here.
> 
> If the guest is making a hypercall, you can return the fault info as the
> response in GPRs, so I don't think you need to touch any architectural
> fault registers.
> 

The guest passively receives the async page fault from the host, which means
no hypercall is issued by the guest at that point. I think the asynchronous
property can be stored in the control block by the host and retrieved by the
guest when the async page fault is handled. In this way I don't need a
specific (IMPDEF) DFSC. Note that the physical address of the control block
is passed to the host when the functionality is enabled via HVC.
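
To make that concrete, here is a minimal, untested sketch of the sort of
layout I have in mind; the structure, field meanings and names below are
placeholders rather than anything in the current series:

#include <linux/compiler.h>
#include <linux/percpu.h>
#include <linux/types.h>

struct kvm_apf_control_block {
	u32 reason;	/* 0 = none, 1 = PAGE_NOT_PRESENT, 2 = PAGE_READY */
	u32 token;	/* identifies the outstanding fault */
	u8  pad[56];	/* pad the block out to a 64-byte cache line */
};

static DEFINE_PER_CPU(struct kvm_apf_control_block, apf_data) __aligned(64);

/* Guest side: called from the abort handler to pick up the signal. */
static u32 kvm_apf_read_and_clear_reason(void)
{
	struct kvm_apf_control_block *cb = this_cpu_ptr(&apf_data);
	u32 reason = READ_ONCE(cb->reason);

	WRITE_ONCE(cb->reason, 0);
	return reason;
}

The guest would pass the physical address of its per-CPU block to the host
in the enable HVC, and the host would fill in "reason" and "token" before
notifying the guest.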

Thanks,
Gavin



end of thread, other threads:[~2020-04-17 10:35 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-10  8:58 [PATCH RFCv1 0/7] Support Async Page Fault Gavin Shan
2020-04-10  8:58 ` [PATCH RFCv1 1/7] kvm/arm64: Rename kvm_vcpu_get_hsr() to kvm_vcpu_get_esr() Gavin Shan
2020-04-10  8:58 ` [PATCH RFCv1 2/7] kvm/arm64: Detach ESR operator from vCPU struct Gavin Shan
2020-04-10  8:58 ` [PATCH RFCv1 3/7] kvm/arm64: Replace hsr with esr Gavin Shan
2020-04-10  8:58 ` [PATCH RFCv1 4/7] kvm/arm64: Export kvm_handle_user_mem_abort() with prefault mode Gavin Shan
2020-04-10  8:58 ` [PATCH RFCv1 5/7] kvm/arm64: Allow inject data abort with specified DFSC Gavin Shan
2020-04-10  8:58 ` [PATCH RFCv1 6/7] kvm/arm64: Support async page fault Gavin Shan
2020-04-10  8:58 ` [PATCH RFCv1 7/7] arm64: " Gavin Shan
2020-04-10 12:52 ` [PATCH RFCv1 0/7] Support Async Page Fault Marc Zyngier
2020-04-14  5:39   ` Gavin Shan
2020-04-14 11:05     ` Mark Rutland
2020-04-16  7:59       ` Gavin Shan
2020-04-16  9:16         ` Mark Rutland
2020-04-16  9:21           ` Will Deacon
2020-04-17 10:34           ` Gavin Shan
