kvmarm.lists.cs.columbia.edu archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/5] KVM: arm64: Add pvtime LPT support
@ 2020-08-17  8:41 Keqian Zhu
  2020-08-17  8:41 ` [RFC PATCH 1/5] KVM: arm64: Document pvtime LPT interface Keqian Zhu
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Keqian Zhu @ 2020-08-17  8:41 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm, kvm
  Cc: Marc Zyngier, Steven Price, Catalin Marinas, Will Deacon

Hi all,

This patch series picks up the LPT pvtime feature originally developed
by Steven Price: https://patchwork.kernel.org/cover/10726499/

Backgroud:

There is demand for cross-platform migration, which means we have to
solve different CPU features and arch counter frequency between hosts.
This patch series can solve the latter problem.

About LPT:

This implements support for Live Physical Time (LPT) which provides the
guest with a method to derive a stable counter of time during which the
guest is executing even when the guest is being migrated between hosts
with different physical counter frequencies.

Changes on Steven Price's work:
1. LPT structure: use symmatical semantics of scale multiplier, and use
   fraction bits instead of "shift" to make everything clear.
2. Structure allocation: host kernel does not allocates the LPT structure,
   instead it is allocated by userspace through VM attributes. The save/restore
   functionality can be removed.
3. Since LPT structure just need update once for each guest run, add a flag to
   indicate the update status. This has two benifits: 1) avoid multiple update
   by each vCPUs. 2) If the update flag is not set, then return NOT SUPPORT for
   coressponding guest HVC call.
4. Add VM device attributes interface for userspace configuration.
5. Add a base LPT read/write layer to reduce code.
6. Support ptimer scaling.
7. Support timer event stream translation.

Things need concern:
1. https://developer.arm.com/docs/den0057/a needs update.
2. If a guest fails to initialize and use LPT, when it migrates to a host with
   different native frequency, it will see the virtual counter jump. However
   this concern is not a problem.
   1) Guest does not support LPT or host does not enable LPT: we won't migrate
      guest to a host with different frequency.
   2) Guest kernel faces problem when initilize LPT: guest can not boot up, so
      counter jump is not a problem at this situation.


Patch 2 and 3 implement host KVM part.
Patch 4 and 5 implement guest driver part.

Keqian Zhu (5):
  KVM: arm64: Document pvtime LPT interface
  KVM: arm64: Support Live Physical Time reporting
  KVM: arm64: Provide VM device attributes for LPT time
  clocksource: arm_arch_timer: Add pvtime LPT initialization
  clocksource: arm_arch_timer: Use pvtime LPT

 Documentation/virt/kvm/arm/pvtime.rst |  78 +++++++++++--
 Documentation/virt/kvm/devices/vm.rst |  30 +++++
 arch/arm64/include/asm/arch_timer.h   | 179 ++++++++++++++++++++++++++--
 arch/arm64/include/asm/kvm_host.h     |  14 +++
 arch/arm64/include/uapi/asm/kvm.h     |   5 +
 arch/arm64/kvm/arm.c                  |  71 ++++++++++++
 arch/arm64/kvm/hypercalls.c           |   5 +
 arch/arm64/kvm/pvtime.c               | 212 +++++++++++++++++++++++++++++++++-
 drivers/clocksource/arm_arch_timer.c  | 120 +++++++++++++++----
 9 files changed, 669 insertions(+), 45 deletions(-)

-- 
1.8.3.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH 1/5] KVM: arm64: Document pvtime LPT interface
  2020-08-17  8:41 [RFC PATCH 0/5] KVM: arm64: Add pvtime LPT support Keqian Zhu
@ 2020-08-17  8:41 ` Keqian Zhu
  2020-08-17  8:41 ` [RFC PATCH 2/5] KVM: arm64: Support Live Physical Time reporting Keqian Zhu
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Keqian Zhu @ 2020-08-17  8:41 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm, kvm
  Cc: Marc Zyngier, Steven Price, Catalin Marinas, Will Deacon

The "Arm Paravirtualized Time for Arm-Base Systems" specification
DEN 0057A just contains "Stolen Time" now, as the details of "Live
Physical Time" have not been fully agreed.

This introduces a new LPT structure with more understandable fields,
as scale_mult and rscale_mult are symmetrical.

Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
---
 Documentation/virt/kvm/arm/pvtime.rst | 78 ++++++++++++++++++++++++++++++-----
 Documentation/virt/kvm/devices/vm.rst | 30 ++++++++++++++
 2 files changed, 97 insertions(+), 11 deletions(-)

diff --git a/Documentation/virt/kvm/arm/pvtime.rst b/Documentation/virt/kvm/arm/pvtime.rst
index 94bffe2..fd11915 100644
--- a/Documentation/virt/kvm/arm/pvtime.rst
+++ b/Documentation/virt/kvm/arm/pvtime.rst
@@ -8,14 +8,17 @@ support for AArch64 guests:
 
 https://developer.arm.com/docs/den0057/a
 
-KVM/arm64 implements the stolen time part of this specification by providing
-some hypervisor service calls to support a paravirtualized guest obtaining a
-view of the amount of time stolen from its execution.
+KVM/arm64 implements the stolen time and live physical time (LPT) parts of this
+specification by providing some hypervisor service calls to a paravirtualized
+guest. With stolen time support, guest can obtain the amount of time stolen
+from its execution. LPT represents time during which the guest is running and
+it realizes cross-native-frequency migrations.
 
-Two new SMCCC compatible hypercalls are defined:
+Three new SMCCC compatible hypercalls are defined:
 
 * PV_TIME_FEATURES: 0xC5000020
 * PV_TIME_ST:       0xC5000021
+* PV_TIME_LPT:      0xC5000022
 
 These are only available in the SMC64/HVC64 calling convention as
 paravirtualized time is not available to 32 bit Arm guests. The existence of
@@ -26,7 +29,8 @@ PV_TIME_FEATURES
     ============= ========    ==========
     Function ID:  (uint32)    0xC5000020
     PV_call_id:   (uint32)    The function to query for support.
-                              Currently only PV_TIME_ST is supported.
+                              Currently only PV_TIME_ST and PV_TIME_LPT are
+                              supported.
     Return value: (int64)     NOT_SUPPORTED (-1) or SUCCESS (0) if the relevant
                               PV-time feature is supported by the hypervisor.
     ============= ========    ==========
@@ -39,17 +43,23 @@ PV_TIME_ST
                               NOT_SUPPORTED (-1)
     ============= ========    ==========
 
-The IPA returned by PV_TIME_ST should be mapped by the guest as normal memory
-with inner and outer write back caching attributes, in the inner shareable
-domain. A total of 16 bytes from the IPA returned are guaranteed to be
-meaningfully filled by the hypervisor (see structure below).
+PV_TIME_LPT
+    ============= ========    ==========
+    Function ID:  (uint32)    0xC5000022
+    Return value: (int64)     IPA of the LPT data structure for this guest.
+                              On failure:
+                              NOT_SUPPORTED (-1)
+    ============= ========    ==========
 
-PV_TIME_ST returns the structure for the calling VCPU.
+The IPA returned by PV_TIME_ST and PV_TIME_LPT should be mapped by the guest as
+normal memory with inner and outer write back caching attributes, in the inner
+shareable domain.
 
 Stolen Time
 -----------
 
-The structure pointed to by the PV_TIME_ST hypercall is as follows:
+A total of 16 bytes from the IPA returned are guaranteed to be meaningfully
+filled by the hypervisor. The structure description is as follows:
 
 +-------------+-------------+-------------+----------------------------+
 | Field       | Byte Length | Byte Offset | Description                |
@@ -78,3 +88,49 @@ the region using 64k pages and avoids conflicting attributes with other memory.
 
 For the user space interface see Documentation/virt/kvm/devices/vcpu.rst
 section "3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL".
+
+Live Physical Time
+-----------
+
+A total of 48 bytes from the IPA returned are guaranteed to be meaningfully
+filled by the hypervisor. The structure description is as follows:
+
++-----------------+-------------+-------------+----------------------------+
+| Field           | Byte Length | Byte Offset | Description                |
++=================+=============+=============+============================+
+| Revision        |      4      |      0      | Must be 0 for version 1.0  |
++-----------------+-------------+-------------+----------------------------+
+| Attributes      |      4      |      4      | Must be 0                  |
++-----------------+-------------+-------------+----------------------------+
+| sequence_number |      8      |      8      | Bit 0: reserved            |
+|                 |             |             | Bits 1-63: number of runs, |
+|                 |             |             | including the initial run. |
++-----------------+-------------+-------------+----------------------------|
+| native_freq     |      4      |      16     | Native frequency           |
++-----------------+-------------+-------------+----------------------------|
+| pv_freq         |      4      |      20     | Paravirtualized frequency  |
+|                 |             |             | seen by guest              |
++-----------------+-------------+-------------+----------------------------|
+| scale_mult      |      8      |      24     | Multiplier to scale native |
+|                 |             |             | cycles to PV cycles        |
++-----------------+-------------+-------------+----------------------------|
+| rscale_mult     |      8      |      32     | Multiplier to reversely    |
+|                 |             |             | scale PV cycles to native  |
+|                 |             |             | cycles                     |
++-----------------+-------------+-------------+----------------------------|
+| fracbits        |      4      |      40     | The bits of fraction of    |
+|                 |             |             | scale_mult                 |
++-----------------+-------------+-------------+----------------------------|
+| rfracbits       |      4      |      44     | The bits of fraction of    |
+|                 |             |             | rscale_mult                |
++-----------------+-------------+-------------+----------------------------|
+
+All values in the structure are stored little-endian.
+
+The structure will be updated by the hypervisor prior to scheduling VCPUs. It
+will be present within a reserved region of the normal memory given to the
+guest. The guest should not attempt to write into this memory. There is a
+structure per guest.
+
+For the user space interface see Documentation/virt/kvm/devices/vm.rst
+section "6. GROUP: KVM_ARM_VM_PVTIME_LPT_CTRL".
diff --git a/Documentation/virt/kvm/devices/vm.rst b/Documentation/virt/kvm/devices/vm.rst
index 0aa5b1c..7e5a189 100644
--- a/Documentation/virt/kvm/devices/vm.rst
+++ b/Documentation/virt/kvm/devices/vm.rst
@@ -314,3 +314,33 @@ Allows userspace to query the status of migration mode.
 	     if it is enabled
 :Returns:   -EFAULT if the given address is not accessible from kernel space;
 	    0 in case of success.
+
+6. GROUP: KVM_ARM_VM_PVTIME_LPT_CTRL
+====================================
+
+:Architectures: ARM64
+
+6.1. ATTRIBUTE: KVM_ARM_VM_PVTIME_LPT_IPA (r/w)
+-----------------------------------------------
+
+Specifies the base address of the live physical time (LPT) structure for this
+guest. The base address must be 64 byte aligned and exist within a valid guest
+memory region. See Documentation/virt/kvm/arm/pvtime.rst for more information
+including the layout of the LPT structure.
+
+Parameters: 64-bit base address
+Returns: -ENXIO:  LPT not implemented
+         -EEXIST: Base address already set for this guest
+         -EINVAL: Base address not 64 byte aligned or not within guest memory
+
+6.2. ATTRIBUTE: KVM_ARM_VM_PVTIME_LPT_FREQ (r/w)
+------------------------------------------------
+
+Specifies the paravirtualized (PV) frequency for this guest. The PV frequency
+is independent with native frequency, so we can support cross-native-frequency
+migration.
+
+Parameters: 32-bit PV frequency
+Returns: -ENXIO:  LPT not implemented
+         -EEXIST: PV frequency already set for this guest
+         -EINVAL: PV frequency is zero
-- 
1.8.3.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH 2/5] KVM: arm64: Support Live Physical Time reporting
  2020-08-17  8:41 [RFC PATCH 0/5] KVM: arm64: Add pvtime LPT support Keqian Zhu
  2020-08-17  8:41 ` [RFC PATCH 1/5] KVM: arm64: Document pvtime LPT interface Keqian Zhu
@ 2020-08-17  8:41 ` Keqian Zhu
  2020-08-17  8:41 ` [RFC PATCH 3/5] KVM: arm64: Provide VM device attributes for LPT time Keqian Zhu
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Keqian Zhu @ 2020-08-17  8:41 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm, kvm
  Cc: Marc Zyngier, Steven Price, Catalin Marinas, Will Deacon

Provide a method for a guest to derive a paravirtualized counter/timer
which isn't dependent on the host's counter frequency. This allows a
guest to be migrated onto a new host which doesn't have the same
frequency without the virtual counter being disturbed.

The host provides a shared structure which contains coefficients that
can be used to map the real counter from the host (the Arm "virtual
counter") to a paravirtualized view of time. On migration the new host
updates the coefficients to ensure that the guests view of time (after
using the coefficients) doesn't change and that the derived counter
progresses at the same real frequency.

Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
---
 arch/arm64/include/asm/kvm_host.h |  10 +++
 arch/arm64/kvm/arm.c              |   7 +++
 arch/arm64/kvm/hypercalls.c       |   5 ++
 arch/arm64/kvm/pvtime.c           | 125 +++++++++++++++++++++++++++++++++++++-
 4 files changed, 146 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e21d4a0..0c6a564 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -95,6 +95,13 @@ struct kvm_arch {
 	 * supported.
 	 */
 	bool return_nisv_io_abort_to_user;
+
+	/* Guest PV Live Physical Time state */
+	struct {
+		u32 fpv; /* PV frequency */
+		gpa_t base; /* Base IPA of shared structure */
+		bool updated; /* Indicate whether it is updated by KVM */
+	} lpt;
 };
 
 #define KVM_NR_MEM_OBJS     40
@@ -506,6 +513,9 @@ int io_mem_abort(struct kvm_vcpu *vcpu, struct kvm_run *run,
 gpa_t kvm_init_stolen_time(struct kvm_vcpu *vcpu);
 void kvm_update_stolen_time(struct kvm_vcpu *vcpu);
 
+gpa_t kvm_init_lpt_time(struct kvm *kvm);
+int kvm_update_lpt_time(struct kvm *kvm);
+
 int kvm_arm_pvtime_set_attr(struct kvm_vcpu *vcpu,
 			    struct kvm_device_attr *attr);
 int kvm_arm_pvtime_get_attr(struct kvm_vcpu *vcpu,
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 90cb905..671f1461 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -135,6 +135,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	/* The maximum number of VCPUs is limited by the host's GIC model */
 	kvm->arch.max_vcpus = kvm_arm_default_max_vcpus();
 
+	/* Should be setup by userspace before guest run */
+	kvm->arch.lpt.base = GPA_INVALID;
+
 	return ret;
 out_free_stage2_pgd:
 	kvm_free_stage2_pgd(kvm);
@@ -528,6 +531,10 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.has_run_once = true;
 
+	ret = kvm_update_lpt_time(kvm);
+	if (ret)
+		return ret;
+
 	if (likely(irqchip_in_kernel(kvm))) {
 		/*
 		 * Map the VGIC hardware resources before running a vcpu the
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 550dfa3..254491b 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -62,6 +62,11 @@ int kvm_hvc_call_handler(struct kvm_vcpu *vcpu)
 		if (gpa != GPA_INVALID)
 			val = gpa;
 		break;
+	case ARM_SMCCC_HV_PV_TIME_LPT:
+		gpa = kvm_init_lpt_time(vcpu->kvm);
+		if (gpa != GPA_INVALID)
+			val = gpa;
+		break;
 	default:
 		return kvm_psci_call(vcpu);
 	}
diff --git a/arch/arm64/kvm/pvtime.c b/arch/arm64/kvm/pvtime.c
index 2b24e7f..24131ca 100644
--- a/arch/arm64/kvm/pvtime.c
+++ b/arch/arm64/kvm/pvtime.c
@@ -43,7 +43,9 @@ long kvm_hypercall_pv_features(struct kvm_vcpu *vcpu)
 	switch (feature) {
 	case ARM_SMCCC_HV_PV_TIME_FEATURES:
 	case ARM_SMCCC_HV_PV_TIME_ST:
-		val = SMCCC_RET_SUCCESS;
+	case ARM_SMCCC_HV_PV_TIME_LPT:
+		if (vcpu->kvm->arch.lpt.updated)
+			val = SMCCC_RET_SUCCESS;
 		break;
 	}
 
@@ -134,3 +136,124 @@ int kvm_arm_pvtime_has_attr(struct kvm_vcpu *vcpu,
 	}
 	return -ENXIO;
 }
+
+static int pvclock_lpt_update_vtimer(struct kvm *kvm,
+				     struct pvclock_vm_lpt_time *pvclock)
+{
+	u32 current_freq = arch_timer_get_rate();
+	u64 current_time = kvm_phys_timer_read();
+	u32 previous_freq;
+	struct kvm_vcpu *vcpu;
+	int i;
+
+	/* The first run? */
+	if (le64_to_cpu(pvclock->sequence_number) == 0)
+		return 0;
+
+	/* PV frequency must not change! */
+	if (le32_to_cpu(pvclock->pv_freq) != kvm->arch.lpt.fpv)
+		return -EFAULT;
+
+	previous_freq = le32_to_cpu(pvclock->native_freq);
+	if (previous_freq == current_freq)
+		return 0;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
+		u64 cntvct, new_cntvct;
+		u32 cnt_tval, new_cnt_tval;
+
+		/* Update cntvoff based on new cntvct */
+		cntvct = current_time - vtimer->cntvoff;
+		new_cntvct = mul_u64_u32_div(cntvct,
+					current_freq,
+					previous_freq);
+		vtimer->cntvoff = current_time - new_cntvct;
+
+		/* Update cnt_cval based on new cnt_tval */
+		cnt_tval = vtimer->cnt_cval - cntvct;
+		new_cnt_tval = mul_u64_u32_div(cnt_tval,
+					current_freq,
+					previous_freq);
+		vtimer->cnt_cval = new_cntvct + new_cnt_tval;
+	}
+
+	return 0;
+}
+
+static void pvclock_lpt_update_structure(struct kvm *kvm,
+					 struct pvclock_vm_lpt_time *pvclock)
+{
+	u64 sequence_number, scale_mult, rscale_mult;
+	u32 native_freq, pv_freq;
+	u32 scale_intbits, fracbits;
+	u32 rscale_intbits, rfracbits;
+
+	sequence_number = le64_to_cpu(pvclock->sequence_number) + 2;
+
+	native_freq = arch_timer_get_rate();
+	pv_freq = kvm->arch.lpt.fpv;
+
+	/* At least one bit for int part */
+	scale_intbits = rscale_intbits = 1;
+	if (pv_freq >= native_freq)
+		scale_intbits = ilog2(pv_freq / native_freq) + 1;
+	else
+		rscale_intbits = ilog2(native_freq / pv_freq) + 1;
+
+	fracbits = 64 - scale_intbits;
+	scale_mult = mul_u64_u32_div(BIT_ULL(fracbits), pv_freq, native_freq);
+	rfracbits = 64 - rscale_intbits;
+	rscale_mult = mul_u64_u32_div(BIT_ULL(rfracbits), native_freq, pv_freq);
+
+	pvclock->sequence_number = cpu_to_le64(sequence_number);
+	pvclock->native_freq = cpu_to_le32(native_freq);
+	pvclock->pv_freq = cpu_to_le32(pv_freq);
+	pvclock->scale_mult = cpu_to_le64(scale_mult);
+	pvclock->rscale_mult = cpu_to_le64(rscale_mult);
+	pvclock->fracbits = cpu_to_le32(fracbits);
+	pvclock->rfracbits = cpu_to_le32(rfracbits);
+}
+
+int kvm_update_lpt_time(struct kvm *kvm)
+{
+	u32 pv_freq = kvm->arch.lpt.fpv;
+	u64 lpt_ipa = kvm->arch.lpt.base;
+	struct pvclock_vm_lpt_time pvclock;
+	int ret = 0;
+
+	/* Userspace does not enable LPT? */
+	if (pv_freq == 0 && lpt_ipa == GPA_INVALID)
+		return 0;
+
+	/* Userspace fault programming? */
+	if (pv_freq == 0 || lpt_ipa == GPA_INVALID)
+		return -EINVAL;
+
+	mutex_lock(&kvm->lock);
+	if (kvm->arch.lpt.updated)
+		goto unlock;
+
+	ret = kvm_read_guest_lock(kvm, lpt_ipa, &pvclock, sizeof(pvclock));
+	if (ret < 0)
+		goto unlock;
+
+	ret = pvclock_lpt_update_vtimer(kvm, &pvclock);
+	if (ret < 0)
+		goto unlock;
+
+	pvclock_lpt_update_structure(kvm, &pvclock);
+
+	ret = kvm_write_guest_lock(kvm, lpt_ipa, &pvclock, sizeof(pvclock));
+	if (!ret)
+		kvm->arch.lpt.updated = true;
+
+unlock:
+	mutex_unlock(&kvm->lock);
+	return ret;
+}
+
+gpa_t kvm_init_lpt_time(struct kvm *kvm)
+{
+	return kvm->arch.lpt.base;
+}
-- 
1.8.3.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH 3/5] KVM: arm64: Provide VM device attributes for LPT time
  2020-08-17  8:41 [RFC PATCH 0/5] KVM: arm64: Add pvtime LPT support Keqian Zhu
  2020-08-17  8:41 ` [RFC PATCH 1/5] KVM: arm64: Document pvtime LPT interface Keqian Zhu
  2020-08-17  8:41 ` [RFC PATCH 2/5] KVM: arm64: Support Live Physical Time reporting Keqian Zhu
@ 2020-08-17  8:41 ` Keqian Zhu
  2020-08-17  8:41 ` [RFC PATCH 4/5] clocksource: arm_arch_timer: Add pvtime LPT initialization Keqian Zhu
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Keqian Zhu @ 2020-08-17  8:41 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm, kvm
  Cc: Marc Zyngier, Steven Price, Catalin Marinas, Will Deacon

Allow user space to inform the KVM host what the PV frequency is
and where in the physical memory map the paravirtualized LPT time
structures should be located. User space can set attributes on the
VM for that guest.

The address is given in terms of the physical address visible to
the guest and must be 64 byte aligned. The guest will discover the
address via a hypercall. PV frequency is 32 bits and must not be 0.

Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
---
 arch/arm64/include/asm/kvm_host.h |  4 ++
 arch/arm64/include/uapi/asm/kvm.h |  5 +++
 arch/arm64/kvm/arm.c              | 64 ++++++++++++++++++++++++++++
 arch/arm64/kvm/pvtime.c           | 87 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 160 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 0c6a564..cbe330c 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -523,6 +523,10 @@ int kvm_arm_pvtime_get_attr(struct kvm_vcpu *vcpu,
 int kvm_arm_pvtime_has_attr(struct kvm_vcpu *vcpu,
 			    struct kvm_device_attr *attr);
 
+int kvm_arm_pvtime_lpt_set_attr(struct kvm *kvm, struct kvm_device_attr *attr);
+int kvm_arm_pvtime_lpt_get_attr(struct kvm *kvm, struct kvm_device_attr *attr);
+int kvm_arm_pvtime_lpt_has_attr(struct kvm *kvm, struct kvm_device_attr *attr);
+
 static inline void kvm_arm_pvtime_vcpu_init(struct kvm_vcpu_arch *vcpu_arch)
 {
 	vcpu_arch->steal.base = GPA_INVALID;
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index ba85bb2..7b045c7 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -325,6 +325,11 @@ struct kvm_vcpu_events {
 #define   KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES	3
 #define   KVM_DEV_ARM_ITS_CTRL_RESET		4
 
+/* Device Control API on kvm fd */
+#define KVM_ARM_VM_PVTIME_LPT_CTRL	0
+#define   KVM_ARM_VM_PVTIME_LPT_IPA	0
+#define   KVM_ARM_VM_PVTIME_LPT_FREQ	1
+
 /* Device Control API on vcpu fd */
 #define KVM_ARM_VCPU_PMU_V3_CTRL	0
 #define   KVM_ARM_VCPU_PMU_V3_IRQ	0
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 671f1461..4a867e5 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1235,11 +1235,60 @@ static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
 	}
 }
 
+static int kvm_arm_vm_set_attr(struct kvm *kvm, struct kvm_device_attr *attr)
+{
+	int ret;
+
+	switch (attr->group) {
+	case KVM_ARM_VM_PVTIME_LPT_CTRL:
+		ret = kvm_arm_pvtime_lpt_set_attr(kvm, attr);
+		break;
+	default:
+		ret = -ENXIO;
+		break;
+	}
+
+	return ret;
+}
+
+static int kvm_arm_vm_get_attr(struct kvm *kvm, struct kvm_device_attr *attr)
+{
+	int ret;
+
+	switch (attr->group) {
+	case KVM_ARM_VM_PVTIME_LPT_CTRL:
+		ret = kvm_arm_pvtime_lpt_get_attr(kvm, attr);
+		break;
+	default:
+		ret = -ENXIO;
+		break;
+	}
+
+	return ret;
+}
+
+static int kvm_arm_vm_has_attr(struct kvm *kvm, struct kvm_device_attr *attr)
+{
+	int ret;
+
+	switch (attr->group) {
+	case KVM_ARM_VM_PVTIME_LPT_CTRL:
+		ret = kvm_arm_pvtime_lpt_has_attr(kvm, attr);
+		break;
+	default:
+		ret = -ENXIO;
+		break;
+	}
+
+	return ret;
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
 		       unsigned int ioctl, unsigned long arg)
 {
 	struct kvm *kvm = filp->private_data;
 	void __user *argp = (void __user *)arg;
+	struct kvm_device_attr attr;
 
 	switch (ioctl) {
 	case KVM_CREATE_IRQCHIP: {
@@ -1271,6 +1320,21 @@ long kvm_arch_vm_ioctl(struct file *filp,
 
 		return 0;
 	}
+	case KVM_SET_DEVICE_ATTR: {
+		if (copy_from_user(&attr, argp, sizeof(attr)))
+			return -EFAULT;
+		return kvm_arm_vm_set_attr(kvm, &attr);
+	}
+	case KVM_GET_DEVICE_ATTR: {
+		if (copy_from_user(&attr, argp, sizeof(attr)))
+			return -EFAULT;
+		return kvm_arm_vm_get_attr(kvm, &attr);
+	}
+	case KVM_HAS_DEVICE_ATTR: {
+		if (copy_from_user(&attr, argp, sizeof(attr)))
+			return  -EFAULT;
+		return kvm_arm_vm_has_attr(kvm, &attr);
+	}
 	default:
 		return -EINVAL;
 	}
diff --git a/arch/arm64/kvm/pvtime.c b/arch/arm64/kvm/pvtime.c
index 24131ca..3f93473 100644
--- a/arch/arm64/kvm/pvtime.c
+++ b/arch/arm64/kvm/pvtime.c
@@ -257,3 +257,90 @@ gpa_t kvm_init_lpt_time(struct kvm *kvm)
 {
 	return kvm->arch.lpt.base;
 }
+
+int kvm_arm_pvtime_lpt_set_attr(struct kvm *kvm, struct kvm_device_attr *attr)
+{
+	u64 __user *user = (u64 __user *)attr->addr;
+	u64 ipa;
+	u32 freq;
+	int idx;
+	int ret = 0;
+
+	switch (attr->attr) {
+	case KVM_ARM_VM_PVTIME_LPT_IPA:
+		if (get_user(ipa, user)) {
+			ret = -EFAULT;
+			break;
+		}
+		if (!IS_ALIGNED(ipa, 64)) {
+			ret = -EINVAL;
+			break;
+		}
+		if (kvm->arch.lpt.base != GPA_INVALID) {
+			ret = -EEXIST;
+			break;
+		}
+		/* Check the address is in a valid memslot */
+		idx = srcu_read_lock(&kvm->srcu);
+		if (kvm_is_error_hva(gfn_to_hva(kvm, ipa >> PAGE_SHIFT)))
+			ret = -EINVAL;
+		srcu_read_unlock(&kvm->srcu, idx);
+		if (ret)
+			break;
+
+		kvm->arch.lpt.base = ipa;
+		break;
+	case KVM_ARM_VM_PVTIME_LPT_FREQ:
+		if (get_user(freq, user)) {
+			ret = -EFAULT;
+			break;
+		}
+		if (freq == 0) {
+			ret = -EINVAL;
+			break;
+		}
+		if (kvm->arch.lpt.fpv != 0) {
+			ret = -EEXIST;
+			break;
+		}
+		kvm->arch.lpt.fpv = freq;
+		break;
+	default:
+		ret = -ENXIO;
+		break;
+	}
+
+	return ret;
+}
+
+int kvm_arm_pvtime_lpt_get_attr(struct kvm *kvm, struct kvm_device_attr *attr)
+{
+	u64 __user *user = (u64 __user *)attr->addr;
+	int ret = 0;
+
+	switch (attr->attr) {
+	case KVM_ARM_VM_PVTIME_LPT_IPA:
+		if (put_user(kvm->arch.lpt.base, user))
+			ret = -EFAULT;
+		break;
+	case KVM_ARM_VM_PVTIME_LPT_FREQ:
+		if (put_user(kvm->arch.lpt.fpv, user))
+			ret = -EFAULT;
+		break;
+	default:
+		ret = -ENXIO;
+		break;
+	}
+
+	return ret;
+}
+
+int kvm_arm_pvtime_lpt_has_attr(struct kvm *kvm, struct kvm_device_attr *attr)
+{
+	switch (attr->attr) {
+	case KVM_ARM_VM_PVTIME_LPT_IPA:
+	case KVM_ARM_VM_PVTIME_LPT_FREQ:
+		return 0;
+	}
+	return -ENXIO;
+}
-- 
1.8.3.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH 4/5] clocksource: arm_arch_timer: Add pvtime LPT initialization
  2020-08-17  8:41 [RFC PATCH 0/5] KVM: arm64: Add pvtime LPT support Keqian Zhu
                   ` (2 preceding siblings ...)
  2020-08-17  8:41 ` [RFC PATCH 3/5] KVM: arm64: Provide VM device attributes for LPT time Keqian Zhu
@ 2020-08-17  8:41 ` Keqian Zhu
  2020-08-17  8:41 ` [RFC PATCH 5/5] clocksource: arm_arch_timer: Use pvtime LPT Keqian Zhu
  2020-08-18 14:41 ` [RFC PATCH 0/5] KVM: arm64: Add pvtime LPT support Marc Zyngier
  5 siblings, 0 replies; 13+ messages in thread
From: Keqian Zhu @ 2020-08-17  8:41 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm, kvm
  Cc: Marc Zyngier, Steven Price, Catalin Marinas, Will Deacon

Enable paravirtualized time to be used in a KVM guest if the host
supports it. This allows the guest to derive a counter which is clocked
at a persistent rate even when the guest is migrated.

If we discover that the system supports SMCCC v1.1 then we probe to
determine whether the hypervisor supports paravirtualized features and
finally whether it supports "Live Physical Time" reporting. If so a
shared structure is made available to the guest containing coefficients
to calculate the derived clock.

Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
---
 drivers/clocksource/arm_arch_timer.c | 69 ++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index 6c3e841..eb2e57a 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -26,6 +26,7 @@
 #include <linux/acpi.h>
 
 #include <asm/arch_timer.h>
+#include <asm/pvclock-abi.h>
 #include <asm/virt.h>
 
 #include <clocksource/arm_arch_timer.h>
@@ -84,6 +85,66 @@ static int __init early_evtstrm_cfg(char *buf)
 }
 early_param("clocksource.arm_arch_timer.evtstrm", early_evtstrm_cfg);
 
+/* PV-time LPT */
+#ifdef CONFIG_ARM64
+struct pvclock_vm_lpt_time *lpt_info;
+EXPORT_SYMBOL_GPL(lpt_info);
+DEFINE_STATIC_KEY_FALSE(pvclock_lpt_key_enabled);
+EXPORT_SYMBOL_GPL(pvclock_lpt_key_enabled);
+
+static bool has_pv_lpt_clock(void)
+{
+	struct arm_smccc_res res;
+
+	if (arm_smccc_1_1_get_conduit() == SMCCC_CONDUIT_NONE)
+		return false;
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_ARCH_FEATURES_FUNC_ID,
+			   ARM_SMCCC_HV_PV_TIME_FEATURES, &res);
+	if (res.a0 != SMCCC_RET_SUCCESS)
+		return false;
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_HV_PV_TIME_FEATURES,
+			     ARM_SMCCC_HV_PV_TIME_LPT, &res);
+	return res.a0 == SMCCC_RET_SUCCESS;
+}
+
+static int pvclock_lpt_init(void)
+{
+	struct arm_smccc_res res;
+
+	if (!has_pv_lpt_clock())
+		return 0;
+
+	arm_smccc_1_1_invoke(ARM_SMCCC_HV_PV_TIME_LPT, 0, &res);
+	if (res.a0 == SMCCC_RET_NOT_SUPPORTED)
+		return 0;
+
+	lpt_info = memremap(res.a0, sizeof(*lpt_info), MEMREMAP_WB);
+	if (!lpt_info) {
+		pr_warn("Failed to map pvclock LPT data structure\n");
+		return -EFAULT;
+	}
+
+	if (le32_to_cpu(lpt_info->revision) != 0 ||
+	    le32_to_cpu(lpt_info->attributes) != 0) {
+		pr_warn_once("Unexpected revision or attributes "
+			     "in pvclock LPT data structure\n");
+		return -EFAULT;
+	}
+
+	static_branch_enable(&pvclock_lpt_key_enabled);
+	pr_info("Using pvclock LPT\n");
+	return 0;
+}
+#else /* CONFIG_ARM64 */
+static int pvclock_lpt_init(void)
+{
+	return 0;
+}
+#endif /* CONFIG_ARM64 */
+
+
 /*
  * Architected system timer support.
  */
@@ -1285,6 +1346,10 @@ static int __init arch_timer_of_init(struct device_node *np)
 
 	arch_timer_populate_kvm_info();
 
+	ret = pvclock_lpt_init();
+	if (ret)
+		return ret;
+
 	rate = arch_timer_get_cntfrq();
 	arch_timer_of_configure_rate(rate, np);
 
@@ -1613,6 +1678,10 @@ static int __init arch_timer_acpi_init(struct acpi_table_header *table)
 
 	arch_timer_populate_kvm_info();
 
+	ret = pvclock_lpt_init();
+	if (ret)
+		return ret;
+
 	/*
 	 * When probing via ACPI, we have no mechanism to override the sysreg
 	 * CNTFRQ value. This *must* be correct.
-- 
1.8.3.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH 5/5] clocksource: arm_arch_timer: Use pvtime LPT
  2020-08-17  8:41 [RFC PATCH 0/5] KVM: arm64: Add pvtime LPT support Keqian Zhu
                   ` (3 preceding siblings ...)
  2020-08-17  8:41 ` [RFC PATCH 4/5] clocksource: arm_arch_timer: Add pvtime LPT initialization Keqian Zhu
@ 2020-08-17  8:41 ` Keqian Zhu
  2020-08-18 14:41 ` [RFC PATCH 0/5] KVM: arm64: Add pvtime LPT support Marc Zyngier
  5 siblings, 0 replies; 13+ messages in thread
From: Keqian Zhu @ 2020-08-17  8:41 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel, kvmarm, kvm
  Cc: Marc Zyngier, Steven Price, Catalin Marinas, Will Deacon

Enable paravirtualized time to be used in a KVM guest if the host
supports it. This allows the guest to derive a counter which is clocked
at a persistent rate even when the guest is migrated.

If we discover that the system supports SMCCC v1.1 then we probe to
determine whether the hypervisor supports paravirtualized features and
finally whether it supports "Live Physical Time" reporting. If so a
shared structure is made available to the guest containing coefficients
to calculate the derived clock.

The guest kernel uses the coefficients to present a clock to user space
that is always clocked at the same rate whenever the guest is running
('live'), even if the physical clock changes (due to the guest being
migrated).

The existing workaround framework for CNTVCT is used to trap user space
accesses to the timer registers so we can present the derived clock.

Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
---
 arch/arm64/include/asm/arch_timer.h  | 179 ++++++++++++++++++++++++++++++++---
 drivers/clocksource/arm_arch_timer.c |  59 +++++++-----
 2 files changed, 204 insertions(+), 34 deletions(-)

diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
index 9f0ec21..bbaecf1 100644
--- a/arch/arm64/include/asm/arch_timer.h
+++ b/arch/arm64/include/asm/arch_timer.h
@@ -10,6 +10,7 @@
 
 #include <asm/barrier.h>
 #include <asm/hwcap.h>
+#include <asm/pvclock-abi.h>
 #include <asm/sysreg.h>
 
 #include <linux/bug.h>
@@ -64,25 +65,178 @@ struct arch_timer_erratum_workaround {
 DECLARE_PER_CPU(const struct arch_timer_erratum_workaround *,
 		timer_unstable_counter_workaround);
 
+
+extern struct pvclock_vm_lpt_time *lpt_info;
+DECLARE_STATIC_KEY_FALSE(pvclock_lpt_key_enabled);
+
+/* LPT read/write base layer */
+
+#define lpt_read_base(target, trans, read) ({				\
+	__le64 _seq_begin, _seq_end;					\
+	u64 _nval, _pval;						\
+									\
+	do {								\
+		_seq_begin = READ_ONCE(lpt_info->sequence_number);	\
+		/* LPT structure can be treated as readonly device */	\
+		rmb();							\
+									\
+		_nval = read(target);					\
+		_pval = trans(_nval);					\
+									\
+		rmb();							\
+		_seq_end = READ_ONCE(lpt_info->sequence_number);	\
+	} while (unlikely(_seq_begin != _seq_end));			\
+									\
+	_pval;								\
+})
+
+#define lpt_write_base(val, target, trans, write) ({			\
+	__le64 _seq_begin, _seq_end;					\
+	u64 _pval = val;						\
+	u64 _nval;							\
+									\
+	do {								\
+		_seq_begin = READ_ONCE(lpt_info->sequence_number);	\
+		/* LPT structure can be treated as readonly device */	\
+		rmb();							\
+									\
+		_nval = trans(_pval);					\
+		write(_nval, target);					\
+									\
+		rmb();							\
+		_seq_end = READ_ONCE(lpt_info->sequence_number);	\
+	} while (unlikely(_seq_begin != _seq_end));			\
+})
+
+#define lpt_read(target, trans, read) ({				\
+	u64 _val;							\
+									\
+	if (static_branch_unlikely(&pvclock_lpt_key_enabled)) {		\
+		_val = lpt_read_base(target, trans, read);		\
+	} else {							\
+		_val = read(target);					\
+	}								\
+									\
+	_val;								\
+})
+
+#define lpt_write(val, target, trans, write) ({				\
+	if (static_branch_unlikely(&pvclock_lpt_key_enabled)) {		\
+		lpt_write_base(val, target, trans, write);		\
+	} else {							\
+		write(val, target);					\
+	}								\
+})
+
+/* LPT read/write layer for timer and count */
+
+static inline u64 native_to_pv_cycles(u64 cnt)
+{
+	u64 scale_mult = le64_to_cpu(lpt_info->scale_mult);
+	u32 fracbits = le32_to_cpu(lpt_info->fracbits);
+
+	return mul_u64_u64_shr(scale_mult, cnt, fracbits);
+}
+
+static inline u64 pv_to_native_cycles(u64 cnt)
+{
+	u64 rscale_mult = le64_to_cpu(lpt_info->rscale_mult);
+	u32 rfracbits = le32_to_cpu(lpt_info->rfracbits);
+
+	return mul_u64_u64_shr(rscale_mult, cnt, rfracbits);
+}
+
+#define arch_timer_read_mediated(reg) ({				\
+	lpt_read(reg, native_to_pv_cycles, read_sysreg);		\
+})
+
+#define arch_timer_write_mediated(val, reg) ({				\
+	u64 _val = val;							\
+	lpt_write(_val, reg, pv_to_native_cycles, write_sysreg);	\
+})
+
+#define mem_timer_read_mediated(addr) ({				\
+	lpt_read(addr, native_to_pv_cycles, readl_relaxed);		\
+})
+
+#define mem_timer_write_mediated(val, addr) ({				\
+	u64 _val = val;							\
+	lpt_write(_val, addr, pv_to_native_cycles, writel_relaxed);	\
+})
+
+/* LPT read/write layer for cntkctl_el1 */
+
+static inline int cntkctl_evnti_shift(void)
+{
+	u32 native_freq = le32_to_cpu(lpt_info->native_freq);
+	u32 pv_freq = le32_to_cpu(lpt_info->pv_freq);
+	int div, shift;
+
+	if (pv_freq >= native_freq)
+		div = pv_freq / native_freq;
+	else
+		div = native_freq / pv_freq;
+
+	/* Find the closest power of two to the divisor */
+	shift = fls(div);
+	if ((shift == 1) || (shift > 1 && !(shift & (1 << (shift - 2)))))
+		shift--;
+
+	return pv_freq >= native_freq ? shift : -shift;
+}
+
+static inline u64 parse_cntkctl(u64 val, bool native_to_pv)
+{
+	int evnti = (val >> ARCH_TIMER_EVT_TRIGGER_SHIFT) & 0xF;
+
+	if (native_to_pv)
+		evnti = evnti + cntkctl_evnti_shift();
+	else
+		evnti = evnti - cntkctl_evnti_shift();
+
+	evnti = min(15, max(0, evnti));
+	val &= ~ARCH_TIMER_EVT_TRIGGER_MASK;
+	val |= evnti << ARCH_TIMER_EVT_TRIGGER_SHIFT;
+
+	return val;
+}
+
+#define TRANS_CNTKCTL_N(nval) ({					\
+	parse_cntkctl(nval, true);					\
+})
+
+#define TRANS_CNTKCTL_P(pval) ({					\
+	parse_cntkctl(pval, false);					\
+})
+
+#define arch_timer_read_cntkctl_mediated() ({				\
+	lpt_read(cntkctl_el1, TRANS_CNTKCTL_N, read_sysreg);		\
+})
+
+#define arch_timer_write_cntkctl_mediated(val) ({			\
+	u64 _val = val;							\
+	lpt_write(_val, cntkctl_el1, TRANS_CNTKCTL_P, write_sysreg);	\
+})
+
 /* inline sysreg accessors that make erratum_handler() work */
 static inline notrace u32 arch_timer_read_cntp_tval_el0(void)
 {
-	return read_sysreg(cntp_tval_el0);
+	return arch_timer_read_mediated(cntp_tval_el0);
 }
 
 static inline notrace u32 arch_timer_read_cntv_tval_el0(void)
 {
-	return read_sysreg(cntv_tval_el0);
+	return arch_timer_read_mediated(cntv_tval_el0);
 }
 
 static inline notrace u64 arch_timer_read_cntpct_el0(void)
 {
-	return read_sysreg(cntpct_el0);
+	return arch_timer_read_mediated(cntpct_el0);
 }
 
 static inline notrace u64 arch_timer_read_cntvct_el0(void)
 {
-	return read_sysreg(cntvct_el0);
+	return arch_timer_read_mediated(cntvct_el0);
 }
 
 #define arch_timer_reg_read_stable(reg)					\
@@ -110,7 +264,7 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
 			write_sysreg(val, cntp_ctl_el0);
 			break;
 		case ARCH_TIMER_REG_TVAL:
-			write_sysreg(val, cntp_tval_el0);
+			arch_timer_write_mediated(val, cntp_tval_el0);
 			break;
 		}
 	} else if (access == ARCH_TIMER_VIRT_ACCESS) {
@@ -119,7 +273,7 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
 			write_sysreg(val, cntv_ctl_el0);
 			break;
 		case ARCH_TIMER_REG_TVAL:
-			write_sysreg(val, cntv_tval_el0);
+			arch_timer_write_mediated(val, cntv_tval_el0);
 			break;
 		}
 	}
@@ -151,17 +305,20 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
 
 static inline u32 arch_timer_get_cntfrq(void)
 {
-	return read_sysreg(cntfrq_el0);
+	if (static_branch_unlikely(&pvclock_lpt_key_enabled))
+		return le32_to_cpu(lpt_info->pv_freq);
+	else
+		return read_sysreg(cntfrq_el0);
 }
 
 static inline u32 arch_timer_get_cntkctl(void)
 {
-	return read_sysreg(cntkctl_el1);
+	return arch_timer_read_cntkctl_mediated();
 }
 
 static inline void arch_timer_set_cntkctl(u32 cntkctl)
 {
-	write_sysreg(cntkctl, cntkctl_el1);
+	arch_timer_write_cntkctl_mediated(cntkctl);
 	isb();
 }
 
@@ -199,7 +356,7 @@ static __always_inline u64 __arch_counter_get_cntpct(void)
 	u64 cnt;
 
 	isb();
-	cnt = read_sysreg(cntpct_el0);
+	cnt = arch_timer_read_mediated(cntpct_el0);
 	arch_counter_enforce_ordering(cnt);
 	return cnt;
 }
@@ -219,7 +376,7 @@ static __always_inline u64 __arch_counter_get_cntvct(void)
 	u64 cnt;
 
 	isb();
-	cnt = read_sysreg(cntvct_el0);
+	cnt = arch_timer_read_mediated(cntvct_el0);
 	arch_counter_enforce_ordering(cnt);
 	return cnt;
 }
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
index eb2e57a..28277b0 100644
--- a/drivers/clocksource/arm_arch_timer.c
+++ b/drivers/clocksource/arm_arch_timer.c
@@ -26,7 +26,6 @@
 #include <linux/acpi.h>
 
 #include <asm/arch_timer.h>
-#include <asm/pvclock-abi.h>
 #include <asm/virt.h>
 
 #include <clocksource/arm_arch_timer.h>
@@ -137,11 +136,24 @@ static int pvclock_lpt_init(void)
 	pr_info("Using pvclock LPT\n");
 	return 0;
 }
+
+static bool pvclock_lpt_enabled(void)
+{
+	return static_branch_unlikely(&pvclock_lpt_key_enabled);
+}
 #else /* CONFIG_ARM64 */
 static int pvclock_lpt_init(void)
 {
 	return 0;
 }
+
+static bool pvclock_lpt_enabled(void)
+{
+	return false;
+}
+
+#define mem_timer_read_mediated(val, addr) (readl_relaxed(val, addr))
+#define mem_timer_write_mediated(val, addr) (writel_relaxed(val, addr))
 #endif /* CONFIG_ARM64 */
 
 
@@ -160,7 +172,7 @@ void arch_timer_reg_write(int access, enum arch_timer_reg reg, u32 val,
 			writel_relaxed(val, timer->base + CNTP_CTL);
 			break;
 		case ARCH_TIMER_REG_TVAL:
-			writel_relaxed(val, timer->base + CNTP_TVAL);
+			mem_timer_write_mediated(val, timer->base + CNTP_TVAL);
 			break;
 		}
 	} else if (access == ARCH_TIMER_MEM_VIRT_ACCESS) {
@@ -170,7 +182,7 @@ void arch_timer_reg_write(int access, enum arch_timer_reg reg, u32 val,
 			writel_relaxed(val, timer->base + CNTV_CTL);
 			break;
 		case ARCH_TIMER_REG_TVAL:
-			writel_relaxed(val, timer->base + CNTV_TVAL);
+			mem_timer_write_mediated(val, timer->base + CNTV_TVAL);
 			break;
 		}
 	} else {
@@ -279,8 +291,8 @@ struct ate_acpi_oem_info {
 	int _retries = 200;				\
 							\
 	do {						\
-		_old = read_sysreg(reg);		\
-		_new = read_sysreg(reg);		\
+		_old = arch_timer_read_mediated(reg);	\
+		_new = arch_timer_read_mediated(reg);	\
 		_retries--;				\
 	} while (unlikely(_old != _new) && _retries);	\
 							\
@@ -325,8 +337,8 @@ static u64 notrace fsl_a008585_read_cntvct_el0(void)
 	int _retries = 50;					\
 								\
 	do {							\
-		_old = read_sysreg(reg);			\
-		_new = read_sysreg(reg);			\
+		_old = arch_timer_read_mediated(reg);		\
+		_new = arch_timer_read_mediated(reg);		\
 		_retries--;					\
 	} while (unlikely((_new - _old) >> 5) && _retries);	\
 								\
@@ -383,8 +395,8 @@ static u64 notrace arm64_858921_read_cntpct_el0(void)
 {
 	u64 old, new;
 
-	old = read_sysreg(cntpct_el0);
-	new = read_sysreg(cntpct_el0);
+	old = arch_timer_read_mediated(cntpct_el0);
+	new = arch_timer_read_mediated(cntpct_el0);
 	return (((old ^ new) >> 32) & 1) ? old : new;
 }
 
@@ -392,8 +404,8 @@ static u64 notrace arm64_858921_read_cntvct_el0(void)
 {
 	u64 old, new;
 
-	old = read_sysreg(cntvct_el0);
-	new = read_sysreg(cntvct_el0);
+	old = arch_timer_read_mediated(cntvct_el0);
+	new = arch_timer_read_mediated(cntvct_el0);
 	return (((old ^ new) >> 32) & 1) ? old : new;
 }
 #endif
@@ -411,7 +423,7 @@ static u64 notrace arm64_858921_read_cntvct_el0(void)
 	int _retries = 150;						\
 									\
 	do {								\
-		_val = read_sysreg(reg);				\
+		_val = arch_timer_read_mediated(reg);			\
 		_retries--;						\
 	} while (((_val + 1) & GENMASK(9, 0)) <= 1 && _retries);	\
 									\
@@ -431,12 +443,14 @@ static u64 notrace sun50i_a64_read_cntvct_el0(void)
 
 static u32 notrace sun50i_a64_read_cntp_tval_el0(void)
 {
-	return read_sysreg(cntp_cval_el0) - sun50i_a64_read_cntpct_el0();
+	return arch_timer_read_mediated(cntp_cval_el0) -
+	       sun50i_a64_read_cntpct_el0();
 }
 
 static u32 notrace sun50i_a64_read_cntv_tval_el0(void)
 {
-	return read_sysreg(cntv_cval_el0) - sun50i_a64_read_cntvct_el0();
+	return arch_timer_read_mediated(cntv_cval_el0) -
+	       sun50i_a64_read_cntvct_el0();
 }
 #endif
 
@@ -458,10 +472,10 @@ static void erratum_set_next_event_tval_generic(const int access, unsigned long
 
 	if (access == ARCH_TIMER_PHYS_ACCESS) {
 		cval = evt + arch_counter_get_cntpct();
-		write_sysreg(cval, cntp_cval_el0);
+		arch_timer_write_mediated(cval, cntp_cval_el0);
 	} else {
 		cval = evt + arch_counter_get_cntvct();
-		write_sysreg(cval, cntv_cval_el0);
+		arch_timer_write_mediated(cval, cntv_cval_el0);
 	}
 
 	arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk);
@@ -906,12 +920,11 @@ static void arch_counter_set_user_access(void)
 			| ARCH_TIMER_VIRT_EVT_EN
 			| ARCH_TIMER_USR_PCT_ACCESS_EN);
 
-	/*
-	 * Enable user access to the virtual counter if it doesn't
-	 * need to be workaround. The vdso may have been already
+	/* Trap user access to the virtual counter if we support LPT
+	 * or it needs to be workaround. The vdso may have been already
 	 * disabled though.
 	 */
-	if (arch_timer_this_cpu_has_cntvct_wa())
+	if (pvclock_lpt_enabled() || arch_timer_this_cpu_has_cntvct_wa())
 		pr_info("CPU%d: Trapping CNTVCT access\n", smp_processor_id());
 	else
 		cntkctl |= ARCH_TIMER_USR_VCT_ACCESS_EN;
@@ -1029,9 +1042,9 @@ static u64 arch_counter_get_cntvct_mem(void)
 	u32 vct_lo, vct_hi, tmp_hi;
 
 	do {
-		vct_hi = readl_relaxed(arch_counter_base + CNTVCT_HI);
-		vct_lo = readl_relaxed(arch_counter_base + CNTVCT_LO);
-		tmp_hi = readl_relaxed(arch_counter_base + CNTVCT_HI);
+		vct_hi = mem_timer_read_mediated(arch_counter_base + CNTVCT_HI);
+		vct_lo = mem_timer_read_mediated(arch_counter_base + CNTVCT_LO);
+		tmp_hi = mem_timer_read_mediated(arch_counter_base + CNTVCT_HI);
 	} while (vct_hi != tmp_hi);
 
 	return ((u64) vct_hi << 32) | vct_lo;
-- 
1.8.3.1

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 0/5] KVM: arm64: Add pvtime LPT support
  2020-08-17  8:41 [RFC PATCH 0/5] KVM: arm64: Add pvtime LPT support Keqian Zhu
                   ` (4 preceding siblings ...)
  2020-08-17  8:41 ` [RFC PATCH 5/5] clocksource: arm_arch_timer: Use pvtime LPT Keqian Zhu
@ 2020-08-18 14:41 ` Marc Zyngier
  2020-08-19  8:54   ` Steven Price
  5 siblings, 1 reply; 13+ messages in thread
From: Marc Zyngier @ 2020-08-18 14:41 UTC (permalink / raw)
  To: Keqian Zhu
  Cc: kvm, Catalin Marinas, linux-kernel, Steven Price, Will Deacon,
	kvmarm, linux-arm-kernel

On 2020-08-17 09:41, Keqian Zhu wrote:
> Hi all,
> 
> This patch series picks up the LPT pvtime feature originally developed
> by Steven Price: https://patchwork.kernel.org/cover/10726499/
> 
> Backgroud:
> 
> There is demand for cross-platform migration, which means we have to
> solve different CPU features and arch counter frequency between hosts.
> This patch series can solve the latter problem.
> 
> About LPT:
> 
> This implements support for Live Physical Time (LPT) which provides the
> guest with a method to derive a stable counter of time during which the
> guest is executing even when the guest is being migrated between hosts
> with different physical counter frequencies.
> 
> Changes on Steven Price's work:
> 1. LPT structure: use symmatical semantics of scale multiplier, and use
>    fraction bits instead of "shift" to make everything clear.
> 2. Structure allocation: host kernel does not allocates the LPT 
> structure,
>    instead it is allocated by userspace through VM attributes. The 
> save/restore
>    functionality can be removed.
> 3. Since LPT structure just need update once for each guest run, add a 
> flag to
>    indicate the update status. This has two benifits: 1) avoid multiple 
> update
>    by each vCPUs. 2) If the update flag is not set, then return NOT 
> SUPPORT for
>    coressponding guest HVC call.
> 4. Add VM device attributes interface for userspace configuration.
> 5. Add a base LPT read/write layer to reduce code.
> 6. Support ptimer scaling.
> 7. Support timer event stream translation.
> 
> Things need concern:
> 1. https://developer.arm.com/docs/den0057/a needs update.

LPT was explicitly removed from the spec because it doesn't really
solve the problem, specially for the firmware: EFI knows
nothing about this, for example. How is it going to work?
Also, nobody was ever able to explain how this would work for
nested virt.

ARMv8.4 and ARMv8.6 have the feature set that is required to solve
this problem without adding more PV to the kernel.

         M.
-- 
Jazz is not dead. It just smells funny...
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 0/5] KVM: arm64: Add pvtime LPT support
  2020-08-18 14:41 ` [RFC PATCH 0/5] KVM: arm64: Add pvtime LPT support Marc Zyngier
@ 2020-08-19  8:54   ` Steven Price
  2020-08-21  3:54     ` zhukeqian
                       ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Steven Price @ 2020-08-19  8:54 UTC (permalink / raw)
  To: Marc Zyngier, Keqian Zhu
  Cc: kvm, Catalin Marinas, linux-kernel, Will Deacon, kvmarm,
	linux-arm-kernel

On 18/08/2020 15:41, Marc Zyngier wrote:
> On 2020-08-17 09:41, Keqian Zhu wrote:
>> Hi all,
>>
>> This patch series picks up the LPT pvtime feature originally developed
>> by Steven Price: https://patchwork.kernel.org/cover/10726499/
>>
>> Backgroud:
>>
>> There is demand for cross-platform migration, which means we have to
>> solve different CPU features and arch counter frequency between hosts.
>> This patch series can solve the latter problem.
>>
>> About LPT:
>>
>> This implements support for Live Physical Time (LPT) which provides the
>> guest with a method to derive a stable counter of time during which the
>> guest is executing even when the guest is being migrated between hosts
>> with different physical counter frequencies.
>>
>> Changes on Steven Price's work:
>> 1. LPT structure: use symmatical semantics of scale multiplier, and use
>>    fraction bits instead of "shift" to make everything clear.
>> 2. Structure allocation: host kernel does not allocates the LPT 
>> structure,
>>    instead it is allocated by userspace through VM attributes. The 
>> save/restore
>>    functionality can be removed.
>> 3. Since LPT structure just need update once for each guest run, add a 
>> flag to
>>    indicate the update status. This has two benifits: 1) avoid 
>> multiple update
>>    by each vCPUs. 2) If the update flag is not set, then return NOT 
>> SUPPORT for
>>    coressponding guest HVC call.
>> 4. Add VM device attributes interface for userspace configuration.
>> 5. Add a base LPT read/write layer to reduce code.
>> 6. Support ptimer scaling.
>> 7. Support timer event stream translation.
>>
>> Things need concern:
>> 1. https://developer.arm.com/docs/den0057/a needs update.
> 
> LPT was explicitly removed from the spec because it doesn't really
> solve the problem, specially for the firmware: EFI knows
> nothing about this, for example. How is it going to work?
> Also, nobody was ever able to explain how this would work for
> nested virt.
> 
> ARMv8.4 and ARMv8.6 have the feature set that is required to solve
> this problem without adding more PV to the kernel.

Hi Marc,

These are good points, however we do still have the situation that CPUs 
that don't have ARMv8.4/8.6 clearly cannot implement this. I presume the 
use-case Keqian is looking at predates the necessary support in the CPU 
- Keqian if you can provide more details on the architecture(s) involved 
that would be helpful.

Nested virt is indeed more of an issue - we did have some ideas around 
using SDEI that never made it to the spec. However I would argue that 
the most pragmatic approach would be to not support the combination of 
nested virt and LPT. Hopefully that can wait until the counter scaling 
support is available and not require PV.

We are discussing (re-)releasing the spec with the LPT parts added. If 
you have fundamental objections then please me know.

Thanks,

Steve
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 0/5] KVM: arm64: Add pvtime LPT support
  2020-08-19  8:54   ` Steven Price
@ 2020-08-21  3:54     ` zhukeqian
  2020-08-22 10:31     ` Marc Zyngier
  2020-08-25 12:52     ` Mark Rutland
  2 siblings, 0 replies; 13+ messages in thread
From: zhukeqian @ 2020-08-21  3:54 UTC (permalink / raw)
  To: Steven Price, Marc Zyngier
  Cc: kvm, Catalin Marinas, linux-kernel, yebiaoxiang, Will Deacon,
	kvmarm, linux-arm-kernel



On 2020/8/19 16:54, Steven Price wrote:
> On 18/08/2020 15:41, Marc Zyngier wrote:
>> On 2020-08-17 09:41, Keqian Zhu wrote:
>>> Hi all,
>>>
>>> This patch series picks up the LPT pvtime feature originally developed
>>> by Steven Price: https://patchwork.kernel.org/cover/10726499/
>>>
>>> Backgroud:
>>>
>>> There is demand for cross-platform migration, which means we have to
>>> solve different CPU features and arch counter frequency between hosts.
>>> This patch series can solve the latter problem.
>>>
>>> About LPT:
>>>
>>> This implements support for Live Physical Time (LPT) which provides the
>>> guest with a method to derive a stable counter of time during which the
>>> guest is executing even when the guest is being migrated between hosts
>>> with different physical counter frequencies.
>>>
>>> Changes on Steven Price's work:
>>> 1. LPT structure: use symmatical semantics of scale multiplier, and use
>>>    fraction bits instead of "shift" to make everything clear.
>>> 2. Structure allocation: host kernel does not allocates the LPT structure,
>>>    instead it is allocated by userspace through VM attributes. The save/restore
>>>    functionality can be removed.
>>> 3. Since LPT structure just need update once for each guest run, add a flag to
>>>    indicate the update status. This has two benifits: 1) avoid multiple update
>>>    by each vCPUs. 2) If the update flag is not set, then return NOT SUPPORT for
>>>    coressponding guest HVC call.
>>> 4. Add VM device attributes interface for userspace configuration.
>>> 5. Add a base LPT read/write layer to reduce code.
>>> 6. Support ptimer scaling.
>>> 7. Support timer event stream translation.
>>>
>>> Things need concern:
>>> 1. https://developer.arm.com/docs/den0057/a needs update.
>>
>> LPT was explicitly removed from the spec because it doesn't really
>> solve the problem, specially for the firmware: EFI knows
>> nothing about this, for example. How is it going to work?
>> Also, nobody was ever able to explain how this would work for
>> nested virt.
>>
>> ARMv8.4 and ARMv8.6 have the feature set that is required to solve
>> this problem without adding more PV to the kernel.
> 
> Hi Marc,
> 
> These are good points, however we do still have the situation that CPUs that don't have ARMv8.4/8.6 clearly cannot implement this. I presume the use-case Keqian is looking at predates the necessary support in the CPU - Keqian if you can provide more details on the architecture(s) involved that would be helpful.
> 
> Nested virt is indeed more of an issue - we did have some ideas around using SDEI that never made it to the spec. However I would argue that the most pragmatic approach would be to not support the combination of nested virt and LPT. Hopefully that can wait until the counter scaling support is available and not require PV.
> 
> We are discussing (re-)releasing the spec with the LPT parts added. If you have fundamental objections then please me know.
> 
> Thanks,
> 
> Steve
> .
> 
Hi Marc and Steven,

In fact, I have realize a demo which utilize v8.6-ECV to present a constant timer freq to guest. It seems
work well, but this approach has some shortcoming:

1. Guest access to cntvct cntv_ctl cntv_tval cntv_cval must trap to EL2. Every trap will take about
   hundreds of nano-seconds. For every timer interrupt, there is about 5~6 traps, so it will spend
   several us (this seems not a serious problem :-) ). But trap will cause big deviation for nano-sleep.
2. We have to make cntfrq be a context of guest. However, only the highest exception level has right to
   modify cntfrq. It means we have to add a new SMC call.
3. cntkctl controls event stream freq, so KVM should also translate the guest access of cntkctl. However
   we cannot trap guest access of that. Any solution for this problem?

I think LPT as a software solution can solve these problems. However, as Marc said, UEFI knows nothing about
LPT, and it will access vtimer/counter directly. The key point is how serious the impact is on UEFI.

I can see that some UEFI runtime services and drivers/applications will access timer/counter.
For runtime services, it is OK. Because we can translate the result which return from UEFI for Linux.
For drivers/applications, they will feel time goes faster or slower after migration. This is a problem indeed :-)

Thanks,
Keqian
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 0/5] KVM: arm64: Add pvtime LPT support
  2020-08-19  8:54   ` Steven Price
  2020-08-21  3:54     ` zhukeqian
@ 2020-08-22 10:31     ` Marc Zyngier
  2020-09-02 10:09       ` Steven Price
  2020-08-25 12:52     ` Mark Rutland
  2 siblings, 1 reply; 13+ messages in thread
From: Marc Zyngier @ 2020-08-22 10:31 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, Catalin Marinas, linux-kernel, Will Deacon, kvmarm,
	linux-arm-kernel

Hi Steven,

On Wed, 19 Aug 2020 09:54:40 +0100,
Steven Price <steven.price@arm.com> wrote:
> 
> On 18/08/2020 15:41, Marc Zyngier wrote:
> > On 2020-08-17 09:41, Keqian Zhu wrote:
> >> Hi all,
> >> 
> >> This patch series picks up the LPT pvtime feature originally developed
> >> by Steven Price: https://patchwork.kernel.org/cover/10726499/
> >> 
> >> Backgroud:
> >> 
> >> There is demand for cross-platform migration, which means we have to
> >> solve different CPU features and arch counter frequency between hosts.
> >> This patch series can solve the latter problem.
> >> 
> >> About LPT:
> >> 
> >> This implements support for Live Physical Time (LPT) which provides the
> >> guest with a method to derive a stable counter of time during which the
> >> guest is executing even when the guest is being migrated between hosts
> >> with different physical counter frequencies.
> >> 
> >> Changes on Steven Price's work:
> >> 1. LPT structure: use symmatical semantics of scale multiplier, and use
> >>    fraction bits instead of "shift" to make everything clear.
> >> 2. Structure allocation: host kernel does not allocates the LPT
> >> structure,
> >>    instead it is allocated by userspace through VM attributes. The
> >> save/restore
> >>    functionality can be removed.
> >> 3. Since LPT structure just need update once for each guest run,
> >> add a flag to
> >>    indicate the update status. This has two benifits: 1) avoid
> >> multiple update
> >>    by each vCPUs. 2) If the update flag is not set, then return NOT
> >> SUPPORT for
> >>    coressponding guest HVC call.
> >> 4. Add VM device attributes interface for userspace configuration.
> >> 5. Add a base LPT read/write layer to reduce code.
> >> 6. Support ptimer scaling.
> >> 7. Support timer event stream translation.
> >> 
> >> Things need concern:
> >> 1. https://developer.arm.com/docs/den0057/a needs update.
> > 
> > LPT was explicitly removed from the spec because it doesn't really
> > solve the problem, specially for the firmware: EFI knows
> > nothing about this, for example. How is it going to work?
> > Also, nobody was ever able to explain how this would work for
> > nested virt.
> > 
> > ARMv8.4 and ARMv8.6 have the feature set that is required to solve
> > this problem without adding more PV to the kernel.
> 
> Hi Marc,
> 
> These are good points, however we do still have the situation that
> CPUs that don't have ARMv8.4/8.6 clearly cannot implement this. I
> presume the use-case Keqian is looking at predates the necessary
> support in the CPU - Keqian if you can provide more details on the
> architecture(s) involved that would be helpful.

My take on this is that it is a fictional use case. In my experience,
migration happens across *identical* systems, and *any* difference
visible to guests will cause things to go wrong. Errata management
gets in the way, as usual (name *one* integration that isn't broken
one way or another!).

Allowing migration across heterogeneous hosts requires a solution to
the errata management problem, which everyone (including me) has
decided to ignore so far (and I claim that not having a constant timer
frequency exposed to guests is an architecture bug).

> Nested virt is indeed more of an issue - we did have some ideas around
> using SDEI that never made it to the spec.

SDEI? Sigh... Why would SDEI be useful for NV and not for !NV?

> However I would argue that the most pragmatic approach would be to
> not support the combination of nested virt and LPT. Hopefully that
> can wait until the counter scaling support is available and not
> require PV.

And have yet another set of band aids that paper over the fact that we
can't get a consistent story on virtualization? No, thank you.

NV is (IMHO) much more important than LPT as it has a chance of
getting used. LPT is just another tick box, and the fact that ARM is
ready to ignore sideline a decent portion of the architecture is a
clear sign that it hasn't been thought out.

> We are discussing (re-)releasing the spec with the LPT parts added. If
> you have fundamental objections then please me know.

I do, see above. I'm stating that the use case doesn't really exist
given the state of the available HW and the fragmentation of the
architecture, and that ignoring the most important innovation in the
virtualization architecture since ARMv7 is at best short-sighted.

Time scaling is just an instance of the errata management problem, and
that is the issue that needs solving. Papering over part of the
problem is not helping.

	M.

-- 
Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 0/5] KVM: arm64: Add pvtime LPT support
  2020-08-19  8:54   ` Steven Price
  2020-08-21  3:54     ` zhukeqian
  2020-08-22 10:31     ` Marc Zyngier
@ 2020-08-25 12:52     ` Mark Rutland
  2 siblings, 0 replies; 13+ messages in thread
From: Mark Rutland @ 2020-08-25 12:52 UTC (permalink / raw)
  To: Steven Price
  Cc: kvm, Marc Zyngier, linux-kernel, Catalin Marinas, Will Deacon,
	kvmarm, linux-arm-kernel

On Wed, Aug 19, 2020 at 09:54:40AM +0100, Steven Price wrote:
> On 18/08/2020 15:41, Marc Zyngier wrote:
> > On 2020-08-17 09:41, Keqian Zhu wrote:

> We are discussing (re-)releasing the spec with the LPT parts added. If you
> have fundamental objections then please me know.

Like Marc, I argued strongly for the removal of the LPT bits on the
premise that it didn't really work (e.g. when transistioning between SW
agents) and so it was a pure maintenance burden.

I don't think the technical arguments have changed, and I don't think
it's a good idea to try to ressurect this. Please rope me in if
this comes up in internal discussions.

Mark.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 0/5] KVM: arm64: Add pvtime LPT support
  2020-08-22 10:31     ` Marc Zyngier
@ 2020-09-02 10:09       ` Steven Price
  2020-09-07  2:48         ` zhukeqian
  0 siblings, 1 reply; 13+ messages in thread
From: Steven Price @ 2020-09-02 10:09 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm, Catalin Marinas, linux-kernel, Will Deacon, kvmarm,
	linux-arm-kernel

Hi Marc,

Sorry for the slow response, I've been on holiday.

On 22/08/2020 11:31, Marc Zyngier wrote:
> Hi Steven,
> 
> On Wed, 19 Aug 2020 09:54:40 +0100,
> Steven Price <steven.price@arm.com> wrote:
>>
>> On 18/08/2020 15:41, Marc Zyngier wrote:
>>> On 2020-08-17 09:41, Keqian Zhu wrote:
[...]
>>>>
>>>> Things need concern:
>>>> 1. https://developer.arm.com/docs/den0057/a needs update.
>>>
>>> LPT was explicitly removed from the spec because it doesn't really
>>> solve the problem, specially for the firmware: EFI knows
>>> nothing about this, for example. How is it going to work?
>>> Also, nobody was ever able to explain how this would work for
>>> nested virt.
>>>
>>> ARMv8.4 and ARMv8.6 have the feature set that is required to solve
>>> this problem without adding more PV to the kernel.
>>
>> Hi Marc,
>>
>> These are good points, however we do still have the situation that
>> CPUs that don't have ARMv8.4/8.6 clearly cannot implement this. I
>> presume the use-case Keqian is looking at predates the necessary
>> support in the CPU - Keqian if you can provide more details on the
>> architecture(s) involved that would be helpful.
> 
> My take on this is that it is a fictional use case. In my experience,
> migration happens across *identical* systems, and *any* difference
> visible to guests will cause things to go wrong. Errata management
> gets in the way, as usual (name *one* integration that isn't broken
> one way or another!).

Keqian appears to have a use case - but obviously I don't know the 
details. I guess Keqian needs to convince you of that.

> Allowing migration across heterogeneous hosts requires a solution to
> the errata management problem, which everyone (including me) has
> decided to ignore so far (and I claim that not having a constant timer
> frequency exposed to guests is an architecture bug).

I agree - errata management needs to be solved before LPT. Between 
restricted subsets of hosts this doesn't seem impossible, but I guess we 
should stall LPT until a credible solution is proposed. I'm certainly 
not proposing one at the moment.

>> Nested virt is indeed more of an issue - we did have some ideas around
>> using SDEI that never made it to the spec.
> 
> SDEI? Sigh... Why would SDEI be useful for NV and not for !NV?

SDEI provides a way of injecting a synchronous exception on migration - 
although that certainly isn't the only possible mechanism. For NV we 
have the problem that a guest-guest may be running at the point of 
migration. However it's not practical for the host hypervisor to provide 
the necessary table directly to the guest-guest which means the 
guest-hypervisor must update the tables before the guest-guest is 
allowed to run on the new host. The only plausible route I could see for 
this is injecting a synchronous exception into the guest (per VCPU) to 
ensure any guest-guests running are exited at migration time.

!NV is easier because we don't have to worry about multiple levels of 
para-virtualisation.

>> However I would argue that the most pragmatic approach would be to
>> not support the combination of nested virt and LPT. Hopefully that
>> can wait until the counter scaling support is available and not
>> require PV.
> 
> And have yet another set of band aids that paper over the fact that we
> can't get a consistent story on virtualization? No, thank you.
> 
> NV is (IMHO) much more important than LPT as it has a chance of
> getting used. LPT is just another tick box, and the fact that ARM is
> ready to ignore sideline a decent portion of the architecture is a
> clear sign that it hasn't been thought out.

Different people have different priorities. NV is definitely important 
for many people. LPT may also be important if you've already got a bunch 
of VMs running on machines and you want to be able to (gradually) 
replace them with newer hosts which happen to have a different clock 
frequency. Those VMs running now clearly aren't using NV.

However, I have to admit it's not me that has the use-case, so I'll 
leave it for others who might actually know the specifics to explain the 
details.

>> We are discussing (re-)releasing the spec with the LPT parts added. If
>> you have fundamental objections then please me know.
> 
> I do, see above. I'm stating that the use case doesn't really exist
> given the state of the available HW and the fragmentation of the
> architecture, and that ignoring the most important innovation in the
> virtualization architecture since ARMv7 is at best short-sighted.
> 
> Time scaling is just an instance of the errata management problem, and
> that is the issue that needs solving. Papering over part of the
> problem is not helping.

I fully agree - errata management is definitely the first step that 
needs solving. This is why I abandoned LPT originally because I don't 
have a generic solution and the testing I did involved really ugly hacks 
just to make the migration possible.

For now I propose we (again) park LPT until some progress has been made 
on errata management.

Thanks,

Steve
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 0/5] KVM: arm64: Add pvtime LPT support
  2020-09-02 10:09       ` Steven Price
@ 2020-09-07  2:48         ` zhukeqian
  0 siblings, 0 replies; 13+ messages in thread
From: zhukeqian @ 2020-09-07  2:48 UTC (permalink / raw)
  To: Steven Price, Marc Zyngier
  Cc: kvm, Catalin Marinas, linux-kernel, Will Deacon, kvmarm,
	linux-arm-kernel

Hi Marc and Steven,

On 2020/9/2 18:09, Steven Price wrote:
> Hi Marc,
> 
> Sorry for the slow response, I've been on holiday.
> 
> On 22/08/2020 11:31, Marc Zyngier wrote:
>> Hi Steven,
>>
>> On Wed, 19 Aug 2020 09:54:40 +0100,
>> Steven Price <steven.price@arm.com> wrote:
>>>
>>> On 18/08/2020 15:41, Marc Zyngier wrote:
>>>> On 2020-08-17 09:41, Keqian Zhu wrote:
> [...]
>>>>>
>>>>> Things need concern:
>>>>> 1. https://developer.arm.com/docs/den0057/a needs update.
>>>>
>>>> LPT was explicitly removed from the spec because it doesn't really
>>>> solve the problem, specially for the firmware: EFI knows
>>>> nothing about this, for example. How is it going to work?
>>>> Also, nobody was ever able to explain how this would work for
>>>> nested virt.
>>>>
>>>> ARMv8.4 and ARMv8.6 have the feature set that is required to solve
>>>> this problem without adding more PV to the kernel.
>>>
>>> Hi Marc,
>>>
>>> These are good points, however we do still have the situation that
>>> CPUs that don't have ARMv8.4/8.6 clearly cannot implement this. I
>>> presume the use-case Keqian is looking at predates the necessary
>>> support in the CPU - Keqian if you can provide more details on the
>>> architecture(s) involved that would be helpful.
>>
>> My take on this is that it is a fictional use case. In my experience,
>> migration happens across *identical* systems, and *any* difference
>> visible to guests will cause things to go wrong. Errata management
>> gets in the way, as usual (name *one* integration that isn't broken
>> one way or another!).
> 
> Keqian appears to have a use case - but obviously I don't know the details. I guess Keqian needs to convince you of that.
Sure, there is use case, but I'm very sorry that it's inconvenient to show the detail. Maybe cross-chip migration
will be supported by arm64 eventually, so I think use case is not a key problem.

> 
>> Allowing migration across heterogeneous hosts requires a solution to
>> the errata management problem, which everyone (including me) has
>> decided to ignore so far (and I claim that not having a constant timer
>> frequency exposed to guests is an architecture bug).
> 
> I agree - errata management needs to be solved before LPT. Between restricted subsets of hosts this doesn't seem impossible, but I guess we should stall LPT until a credible solution is proposed. I'm certainly not proposing one at the moment.
> 
>>> Nested virt is indeed more of an issue - we did have some ideas around
>>> using SDEI that never made it to the spec.
>>
>> SDEI? Sigh... Why would SDEI be useful for NV and not for !NV?
> 
> SDEI provides a way of injecting a synchronous exception on migration - although that certainly isn't the only possible mechanism. For NV we have the problem that a guest-guest may be running at the point of migration. However it's not practical for the host hypervisor to provide the necessary table directly to the guest-guest which means the guest-hypervisor must update the tables before the guest-guest is allowed to run on the new host. The only plausible route I could see for this is injecting a synchronous exception into the guest (per VCPU) to ensure any guest-guests running are exited at migration time.
> 
> !NV is easier because we don't have to worry about multiple levels of para-virtualisation.
> 
>>> However I would argue that the most pragmatic approach would be to
>>> not support the combination of nested virt and LPT. Hopefully that
>>> can wait until the counter scaling support is available and not
>>> require PV.
>>
>> And have yet another set of band aids that paper over the fact that we
>> can't get a consistent story on virtualization? No, thank you.
>>
>> NV is (IMHO) much more important than LPT as it has a chance of
>> getting used. LPT is just another tick box, and the fact that ARM is
>> ready to ignore sideline a decent portion of the architecture is a
>> clear sign that it hasn't been thought out.
> 
> Different people have different priorities. NV is definitely important for many people. LPT may also be important if you've already got a bunch of VMs running on machines and you want to be able to (gradually) replace them with newer hosts which happen to have a different clock frequency. Those VMs running now clearly aren't using NV.
> 
> However, I have to admit it's not me that has the use-case, so I'll leave it for others who might actually know the specifics to explain the details.
> 
>>> We are discussing (re-)releasing the spec with the LPT parts added. If
>>> you have fundamental objections then please me know.
>>
>> I do, see above. I'm stating that the use case doesn't really exist
>> given the state of the available HW and the fragmentation of the
>> architecture, and that ignoring the most important innovation in the
>> virtualization architecture since ARMv7 is at best short-sighted.
>>
>> Time scaling is just an instance of the errata management problem, and
>> that is the issue that needs solving. Papering over part of the
>> problem is not helping.
> 
> I fully agree - errata management is definitely the first step that needs solving. This is why I abandoned LPT originally because I don't have a generic solution and the testing I did involved really ugly hacks just to make the migration possible.
> 
> For now I propose we (again) park LPT until some progress has been made on errata management.
> 
> Thanks,
> 
> Steve
> .
As we have discussed, to support the vtimer part of cross-chip migration, we still face many problems.
Firstly, we have no complete solution to realize the basic functionality. For PV solution, LPT just handles
Linux kernel, other SW agents are not involved. For non-PV solution, ARMv8.4 ext and ARMv8.6 ext is not enough.
Besides the basic functionality, we should concern errata management and NV (I think this is not urgent).

Giving above, I agree with Steven that re-park LPT.

Thanks,
Keqian
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-09-07  2:48 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-17  8:41 [RFC PATCH 0/5] KVM: arm64: Add pvtime LPT support Keqian Zhu
2020-08-17  8:41 ` [RFC PATCH 1/5] KVM: arm64: Document pvtime LPT interface Keqian Zhu
2020-08-17  8:41 ` [RFC PATCH 2/5] KVM: arm64: Support Live Physical Time reporting Keqian Zhu
2020-08-17  8:41 ` [RFC PATCH 3/5] KVM: arm64: Provide VM device attributes for LPT time Keqian Zhu
2020-08-17  8:41 ` [RFC PATCH 4/5] clocksource: arm_arch_timer: Add pvtime LPT initialization Keqian Zhu
2020-08-17  8:41 ` [RFC PATCH 5/5] clocksource: arm_arch_timer: Use pvtime LPT Keqian Zhu
2020-08-18 14:41 ` [RFC PATCH 0/5] KVM: arm64: Add pvtime LPT support Marc Zyngier
2020-08-19  8:54   ` Steven Price
2020-08-21  3:54     ` zhukeqian
2020-08-22 10:31     ` Marc Zyngier
2020-09-02 10:09       ` Steven Price
2020-09-07  2:48         ` zhukeqian
2020-08-25 12:52     ` Mark Rutland

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).