kvmarm.lists.cs.columbia.edu archive mirror
* [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit
@ 2023-02-16 14:21 Marc Zyngier
  2023-02-16 14:21 ` [PATCH 01/16] arm64: Add CNTPOFF_EL2 register definition Marc Zyngier
                   ` (18 more replies)
  0 siblings, 19 replies; 55+ messages in thread
From: Marc Zyngier @ 2023-02-16 14:21 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Ricardo Koller, Simon Veith, dwmw2

This series aims at satisfying multiple goals:

- allow a VMM to atomically restore a timer offset for a whole VM
  instead of updating the offset each time a vcpu gets its counter
  written

- allow a VMM to save/restore the physical timer context, something
  that we cannot do at the moment due to the lack of offsetting

- provide a framework that is suitable for NV support, where we get
  both global and per-timer, per-vcpu offsetting

We fix a couple of issues along the way, both from a stylistic and
correctness perspective. This results in a new per VM KVM API that
allows a pair of global offsets to be set at any point in time,
overriding the timer counter writeback.

This has been moderately tested with nVHE, VHE and NV. I do not have
access to CNTPOFF-aware HW, so the jury is still out on that one. Note
that the NV patches in this series are here to give a perspective on
how this gets used.

I've updated the arch_timer selftest to allow offsets to be provided
from the command line, but the arch_timer test is pretty flimsy and
tends to fail with EINTR, even without this series. Something to
investigate.

Note that this is at best 6.4 material. I have a branch stashed at [1]
and based on kvmarm/next.

Thanks,

	M.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/timer-vm-offsets

Marc Zyngier (16):
  arm64: Add CNTPOFF_EL2 register definition
  arm64: Add HAS_ECV_CNTPOFF capability
  kvm: arm64: Expose {un,}lock_all_vcpus() to the rest of KVM
  KVM: arm64: timers: Use a per-vcpu, per-timer accumulator for
    fractional ns
  KVM: arm64: timers: Convert per-vcpu virtual offset to a global value
  KVM: arm64: timers: Use CNTPOFF_EL2 to offset the physical timer
  KVM: arm64: timers: Allow physical offset without CNTPOFF_EL2
  KVM: arm64: timers: Allow userspace to set the counter offsets
  KVM: arm64: timers: Allow save/restoring of the physical timer
  KVM: arm64: timers: Rationalise per-vcpu timer init
  KVM: arm64: Document KVM_ARM_SET_CNT_OFFSETS and co
  KVM: arm64: nv: timers: Add a per-timer, per-vcpu offset
  KVM: arm64: nv: timers: Support hyp timer emulation
  KVM: arm64: selftests: Add physical timer registers to the sysreg list
  KVM: arm64: selftests: Augment existing timer test to handle variable
    offsets
  KVM: arm64: selftests: Deal with spurious timer interrupts

 Documentation/virt/kvm/api.rst                |  47 ++
 arch/arm64/include/asm/kvm_host.h             |  14 +
 arch/arm64/include/uapi/asm/kvm.h             |  15 +
 arch/arm64/kernel/cpufeature.c                |  11 +
 arch/arm64/kvm/arch_timer.c                   | 443 ++++++++++++++----
 arch/arm64/kvm/arm.c                          |  47 ++
 arch/arm64/kvm/guest.c                        |  29 +-
 arch/arm64/kvm/hyp/nvhe/timer-sr.c            |  18 +-
 arch/arm64/kvm/hypercalls.c                   |   2 +-
 arch/arm64/kvm/trace_arm.h                    |   6 +-
 arch/arm64/kvm/vgic/vgic-kvm-device.c         |  38 --
 arch/arm64/kvm/vgic/vgic.c                    |  15 +
 arch/arm64/kvm/vgic/vgic.h                    |   3 -
 arch/arm64/tools/cpucaps                      |   1 +
 arch/arm64/tools/sysreg                       |   4 +
 include/clocksource/arm_arch_timer.h          |   1 +
 include/kvm/arm_arch_timer.h                  |  32 +-
 include/kvm/arm_vgic.h                        |   1 +
 include/uapi/linux/kvm.h                      |   3 +
 .../selftests/kvm/aarch64/arch_timer.c        |  26 +-
 .../selftests/kvm/aarch64/get-reg-list.c      |   5 +-
 21 files changed, 603 insertions(+), 158 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH 01/16] arm64: Add CNTPOFF_EL2 register definition
  2023-02-16 14:21 [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Marc Zyngier
@ 2023-02-16 14:21 ` Marc Zyngier
  2023-02-16 14:21 ` [PATCH 02/16] arm64: Add HAS_ECV_CNTPOFF capability Marc Zyngier
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 55+ messages in thread
From: Marc Zyngier @ 2023-02-16 14:21 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Ricardo Koller, Simon Veith, dwmw2

Add the definition for CNTPOFF_EL2 in the description file.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/tools/sysreg | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index 330569fb2336..e930820cface 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -1812,6 +1812,10 @@ Sysreg	CONTEXTIDR_EL2	3	4	13	0	1
 Fields	CONTEXTIDR_ELx
 EndSysreg
 
+Sysreg	CNTPOFF_EL2	3	4	14	0	6
+Field	63:0	PhysicalOffset
+EndSysreg
+
 Sysreg	CPACR_EL12	3	5	1	0	2
 Fields	CPACR_ELx
 EndSysreg
-- 
2.34.1



* [PATCH 02/16] arm64: Add HAS_ECV_CNTPOFF capability
  2023-02-16 14:21 [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Marc Zyngier
  2023-02-16 14:21 ` [PATCH 01/16] arm64: Add CNTPOFF_EL2 register definition Marc Zyngier
@ 2023-02-16 14:21 ` Marc Zyngier
  2023-02-22  4:30   ` Reiji Watanabe
  2023-02-16 14:21 ` [PATCH 03/16] kvm: arm64: Expose {un,}lock_all_vcpus() to the rest of KVM Marc Zyngier
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 55+ messages in thread
From: Marc Zyngier @ 2023-02-16 14:21 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Ricardo Koller, Simon Veith, dwmw2

Add the probing code for the FEAT_ECV variant that implements CNTPOFF_EL2.
Why it is optional is a mystery, but let's try and detect it.
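For context (my summary, not part of the patch): the feature lives in the
ID_AA64MMFR0_EL1.ECV field, where a value of 1 advertises base FEAT_ECV and
2 additionally advertises CNTPOFF_EL2 — hence the min_field_value of 2 in
the hunk below. A standalone sketch of that field check (helper name is
mine):

```c
#include <assert.h>
#include <stdint.h>

/* ID_AA64MMFR0_EL1.ECV lives in bits [63:60]. */
#define ID_AA64MMFR0_EL1_ECV_SHIFT	60

/*
 * Field value 1 advertises base FEAT_ECV; 2 additionally advertises
 * CNTPOFF_EL2, which is what the capability's min_field_value = 2 tests.
 */
static int has_ecv_cntpoff(uint64_t mmfr0)
{
	return ((mmfr0 >> ID_AA64MMFR0_EL1_ECV_SHIFT) & 0xf) >= 2;
}
```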

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kernel/cpufeature.c | 11 +++++++++++
 arch/arm64/tools/cpucaps       |  1 +
 2 files changed, 12 insertions(+)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 23bd2a926b74..36852f96898d 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2186,6 +2186,17 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
 		.sign = FTR_UNSIGNED,
 		.min_field_value = 1,
 	},
+	{
+		.desc = "Enhanced Counter Virtualization (CNTPOFF)",
+		.capability = ARM64_HAS_ECV_CNTPOFF,
+		.type = ARM64_CPUCAP_SYSTEM_FEATURE,
+		.matches = has_cpuid_feature,
+		.sys_reg = SYS_ID_AA64MMFR0_EL1,
+		.field_pos = ID_AA64MMFR0_EL1_ECV_SHIFT,
+		.field_width = 4,
+		.sign = FTR_UNSIGNED,
+		.min_field_value = 2,
+	},
 #ifdef CONFIG_ARM64_PAN
 	{
 		.desc = "Privileged Access Never",
diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
index 82c7e579a8ba..6a26a4678406 100644
--- a/arch/arm64/tools/cpucaps
+++ b/arch/arm64/tools/cpucaps
@@ -23,6 +23,7 @@ HAS_DCPOP
 HAS_DIT
 HAS_E0PD
 HAS_ECV
+HAS_ECV_CNTPOFF
 HAS_EPAN
 HAS_GENERIC_AUTH
 HAS_GENERIC_AUTH_ARCH_QARMA3
-- 
2.34.1



* [PATCH 03/16] kvm: arm64: Expose {un,}lock_all_vcpus() to the rest of KVM
  2023-02-16 14:21 [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Marc Zyngier
  2023-02-16 14:21 ` [PATCH 01/16] arm64: Add CNTPOFF_EL2 register definition Marc Zyngier
  2023-02-16 14:21 ` [PATCH 02/16] arm64: Add HAS_ECV_CNTPOFF capability Marc Zyngier
@ 2023-02-16 14:21 ` Marc Zyngier
  2023-02-23 22:30   ` Colton Lewis
  2023-02-16 14:21 ` [PATCH 04/16] KVM: arm64: timers: Use a per-vcpu, per-timer accumulator for fractional ns Marc Zyngier
                   ` (15 subsequent siblings)
  18 siblings, 1 reply; 55+ messages in thread
From: Marc Zyngier @ 2023-02-16 14:21 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Ricardo Koller, Simon Veith, dwmw2

Being able to lock/unlock all vcpus in one go is a feature that
only the vgic has enjoyed so far. Let's be brave and expose it
to the world.
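The trylock-and-rollback idiom being moved out of the vgic is generic; a
standalone pthreads sketch of the same idea (a model with names of my own,
not the KVM code):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define NR_VCPUS 4

/* Stand-ins for the per-vcpu mutexes taken by lock_all_vcpus(). */
static pthread_mutex_t vcpu_lock[NR_VCPUS] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

/* Unlock locks from @idx down to 0, mirroring unlock_vcpus(). */
static void unlock_from(int idx)
{
	for (; idx >= 0; idx--)
		pthread_mutex_unlock(&vcpu_lock[idx]);
}

/* Try to take every lock; on any failure, release only what we took. */
static bool lock_all(void)
{
	for (int i = 0; i < NR_VCPUS; i++) {
		if (pthread_mutex_trylock(&vcpu_lock[i])) {
			unlock_from(i - 1);
			return false;
		}
	}
	return true;
}
```

The KVM version iterates with kvm_for_each_vcpu() over vcpu->mutex, but the
failure handling is the same: only already-acquired locks are rolled back.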

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h     |  3 +++
 arch/arm64/kvm/arm.c                  | 39 +++++++++++++++++++++++++++
 arch/arm64/kvm/vgic/vgic-kvm-device.c | 38 --------------------------
 arch/arm64/kvm/vgic/vgic.h            |  3 ---
 4 files changed, 42 insertions(+), 41 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index a1892a8f6032..2a8175f38439 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -919,6 +919,9 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu);
 
 int __init kvm_sys_reg_table_init(void);
 
+bool lock_all_vcpus(struct kvm *kvm);
+void unlock_all_vcpus(struct kvm *kvm);
+
 /* MMIO helpers */
 void kvm_mmio_write_buf(void *buf, unsigned int len, unsigned long data);
 unsigned long kvm_mmio_read_buf(const void *buf, unsigned int len);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 3bd732eaf087..097750a01497 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1484,6 +1484,45 @@ long kvm_arch_vm_ioctl(struct file *filp,
 	}
 }
 
+/* unlocks vcpus from @vcpu_lock_idx and smaller */
+static void unlock_vcpus(struct kvm *kvm, int vcpu_lock_idx)
+{
+	struct kvm_vcpu *tmp_vcpu;
+
+	for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
+		tmp_vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
+		mutex_unlock(&tmp_vcpu->mutex);
+	}
+}
+
+void unlock_all_vcpus(struct kvm *kvm)
+{
+	unlock_vcpus(kvm, atomic_read(&kvm->online_vcpus) - 1);
+}
+
+/* Returns true if all vcpus were locked, false otherwise */
+bool lock_all_vcpus(struct kvm *kvm)
+{
+	struct kvm_vcpu *tmp_vcpu;
+	unsigned long c;
+
+	/*
+	 * Any time a vcpu is in an ioctl (including running), the
+	 * core KVM code tries to grab the vcpu->mutex.
+	 *
+	 * By grabbing the vcpu->mutex of all VCPUs we ensure that no
+	 * other VCPUs can fiddle with the state while we access it.
+	 */
+	kvm_for_each_vcpu(c, tmp_vcpu, kvm) {
+		if (!mutex_trylock(&tmp_vcpu->mutex)) {
+			unlock_vcpus(kvm, c - 1);
+			return false;
+		}
+	}
+
+	return true;
+}
+
 static unsigned long nvhe_percpu_size(void)
 {
 	return (unsigned long)CHOOSE_NVHE_SYM(__per_cpu_end) -
diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
index edeac2380591..04dd68835b3f 100644
--- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
+++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
@@ -342,44 +342,6 @@ int vgic_v2_parse_attr(struct kvm_device *dev, struct kvm_device_attr *attr,
 	return 0;
 }
 
-/* unlocks vcpus from @vcpu_lock_idx and smaller */
-static void unlock_vcpus(struct kvm *kvm, int vcpu_lock_idx)
-{
-	struct kvm_vcpu *tmp_vcpu;
-
-	for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
-		tmp_vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
-		mutex_unlock(&tmp_vcpu->mutex);
-	}
-}
-
-void unlock_all_vcpus(struct kvm *kvm)
-{
-	unlock_vcpus(kvm, atomic_read(&kvm->online_vcpus) - 1);
-}
-
-/* Returns true if all vcpus were locked, false otherwise */
-bool lock_all_vcpus(struct kvm *kvm)
-{
-	struct kvm_vcpu *tmp_vcpu;
-	unsigned long c;
-
-	/*
-	 * Any time a vcpu is run, vcpu_load is called which tries to grab the
-	 * vcpu->mutex.  By grabbing the vcpu->mutex of all VCPUs we ensure
-	 * that no other VCPUs are run and fiddle with the vgic state while we
-	 * access it.
-	 */
-	kvm_for_each_vcpu(c, tmp_vcpu, kvm) {
-		if (!mutex_trylock(&tmp_vcpu->mutex)) {
-			unlock_vcpus(kvm, c - 1);
-			return false;
-		}
-	}
-
-	return true;
-}
-
 /**
  * vgic_v2_attr_regs_access - allows user space to access VGIC v2 state
  *
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index 7f7f3c5ed85a..f9923beedd27 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -273,9 +273,6 @@ int vgic_init(struct kvm *kvm);
 void vgic_debug_init(struct kvm *kvm);
 void vgic_debug_destroy(struct kvm *kvm);
 
-bool lock_all_vcpus(struct kvm *kvm);
-void unlock_all_vcpus(struct kvm *kvm);
-
 static inline int vgic_v3_max_apr_idx(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *cpu_if = &vcpu->arch.vgic_cpu;
-- 
2.34.1



* [PATCH 04/16] KVM: arm64: timers: Use a per-vcpu, per-timer accumulator for fractional ns
  2023-02-16 14:21 [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Marc Zyngier
                   ` (2 preceding siblings ...)
  2023-02-16 14:21 ` [PATCH 03/16] kvm: arm64: Expose {un,}lock_all_vcpus() to the rest of KVM Marc Zyngier
@ 2023-02-16 14:21 ` Marc Zyngier
  2023-02-23 22:30   ` Colton Lewis
  2023-02-16 14:21 ` [PATCH 05/16] KVM: arm64: timers: Convert per-vcpu virtual offset to a global value Marc Zyngier
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 55+ messages in thread
From: Marc Zyngier @ 2023-02-16 14:21 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Ricardo Koller, Simon Veith, dwmw2

Instead of accumulating the fractional ns value generated every time
we compute a ns delta in a global variable, use a per-vcpu, per-timer
variable. This keeps the fractional ns local to the timer instead of
contributing to any odd, unrelated timer.
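The conversion in question is the usual fixed-point mult/shift
cycles-to-ns path; a simplified standalone model (a sketch, not the kernel
implementation, with made-up constants) of how the sub-ns remainder is
carried between calls:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model of cyclecounter_cyc2ns(): cycles are converted to ns
 * with a fixed-point multiplier/shift, and the sub-ns remainder
 * accumulates in *frac so it isn't lost across calls. The patch gives
 * each vcpu timer context its own frac word instead of a shared one.
 */
static uint64_t cyc2ns(uint64_t cycles, uint32_t mult, uint32_t shift,
		       uint64_t *frac)
{
	uint64_t ns = cycles * mult + *frac;

	*frac = ns & ((1ULL << shift) - 1);	/* carry fraction forward */
	return ns >> shift;
}
```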

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/arch_timer.c  | 2 +-
 include/kvm/arm_arch_timer.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 00610477ec7b..329a8444724f 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -219,7 +219,7 @@ static u64 kvm_counter_compute_delta(struct arch_timer_context *timer_ctx,
 		ns = cyclecounter_cyc2ns(timecounter->cc,
 					 val - now,
 					 timecounter->mask,
-					 &timecounter->frac);
+					 &timer_ctx->ns_frac);
 		return ns;
 	}
 
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 71916de7c6c4..76174f4fb646 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -31,6 +31,7 @@ struct arch_timer_context {
 
 	/* Emulated Timer (may be unused) */
 	struct hrtimer			hrtimer;
+	u64				ns_frac;
 
 	/*
 	 * We have multiple paths which can save/restore the timer state onto
-- 
2.34.1



* [PATCH 05/16] KVM: arm64: timers: Convert per-vcpu virtual offset to a global value
  2023-02-16 14:21 [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Marc Zyngier
                   ` (3 preceding siblings ...)
  2023-02-16 14:21 ` [PATCH 04/16] KVM: arm64: timers: Use a per-vcpu, per-timer accumulator for fractional ns Marc Zyngier
@ 2023-02-16 14:21 ` Marc Zyngier
  2023-02-22  6:15   ` Reiji Watanabe
  2023-02-16 14:21 ` [PATCH 06/16] KVM: arm64: timers: Use CNTPOFF_EL2 to offset the physical timer Marc Zyngier
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 55+ messages in thread
From: Marc Zyngier @ 2023-02-16 14:21 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Ricardo Koller, Simon Veith, dwmw2

Having a per-vcpu virtual offset is a pain. It needs to be synchronized
on each update, and expands badly to a setup where different timers can
have different offsets, or have composite offsets (as with NV).

So let's start by replacing the use of the CNTVOFF_EL2 shadow register
(which we want to reclaim for NV anyway), and make the virtual timer
carry a pointer to a VM-wide offset.

This simplifies the code significantly.
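The invariant behind this: the guest's virtual counter is the host
physical counter minus CNTVOFF, so a userspace counter write of value V
reduces to a single VM-wide offset update, voffset = now - V. A tiny
sketch of that arithmetic (helper names are mine):

```c
#include <assert.h>
#include <stdint.h>

/* VM-wide offset, as in struct arch_timer_vm_offsets::voffset. */
static uint64_t voffset;

/* Guest view: CNTVCT_EL0 = CNTPCT_EL0 - CNTVOFF_EL2. */
static uint64_t guest_cntvct(uint64_t host_cntpct)
{
	return host_cntpct - voffset;
}

/* Userspace writes counter value @val at host time @now. */
static void set_guest_counter(uint64_t now, uint64_t val)
{
	voffset = now - val;	/* one write covers every vcpu */
}
```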

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h |  3 +++
 arch/arm64/kvm/arch_timer.c       | 45 +++++++------------------------
 arch/arm64/kvm/hypercalls.c       |  2 +-
 include/kvm/arm_arch_timer.h      | 15 +++++++++++
 4 files changed, 29 insertions(+), 36 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2a8175f38439..3adac0c5e175 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -193,6 +193,9 @@ struct kvm_arch {
 	/* Interrupt controller */
 	struct vgic_dist	vgic;
 
+	/* Timers */
+	struct arch_timer_vm_offsets offsets;
+
 	/* Mandated version of PSCI */
 	u32 psci_version;
 
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 329a8444724f..1bb44f668608 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -84,14 +84,10 @@ u64 timer_get_cval(struct arch_timer_context *ctxt)
 
 static u64 timer_get_offset(struct arch_timer_context *ctxt)
 {
-	struct kvm_vcpu *vcpu = ctxt->vcpu;
+	if (ctxt->offset.vm_offset)
+		return *ctxt->offset.vm_offset;
 
-	switch(arch_timer_ctx_index(ctxt)) {
-	case TIMER_VTIMER:
-		return __vcpu_sys_reg(vcpu, CNTVOFF_EL2);
-	default:
-		return 0;
-	}
+	return 0;
 }
 
 static void timer_set_ctl(struct arch_timer_context *ctxt, u32 ctl)
@@ -128,15 +124,12 @@ static void timer_set_cval(struct arch_timer_context *ctxt, u64 cval)
 
 static void timer_set_offset(struct arch_timer_context *ctxt, u64 offset)
 {
-	struct kvm_vcpu *vcpu = ctxt->vcpu;
-
-	switch(arch_timer_ctx_index(ctxt)) {
-	case TIMER_VTIMER:
-		__vcpu_sys_reg(vcpu, CNTVOFF_EL2) = offset;
-		break;
-	default:
+	if (!ctxt->offset.vm_offset) {
 		WARN(offset, "timer %ld\n", arch_timer_ctx_index(ctxt));
+		return;
 	}
+
+	WRITE_ONCE(*ctxt->offset.vm_offset, offset);
 }
 
 u64 kvm_phys_timer_read(void)
@@ -765,25 +758,6 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-/* Make the updates of cntvoff for all vtimer contexts atomic */
-static void update_vtimer_cntvoff(struct kvm_vcpu *vcpu, u64 cntvoff)
-{
-	unsigned long i;
-	struct kvm *kvm = vcpu->kvm;
-	struct kvm_vcpu *tmp;
-
-	mutex_lock(&kvm->lock);
-	kvm_for_each_vcpu(i, tmp, kvm)
-		timer_set_offset(vcpu_vtimer(tmp), cntvoff);
-
-	/*
-	 * When called from the vcpu create path, the CPU being created is not
-	 * included in the loop above, so we just set it here as well.
-	 */
-	timer_set_offset(vcpu_vtimer(vcpu), cntvoff);
-	mutex_unlock(&kvm->lock);
-}
-
 void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = vcpu_timer(vcpu);
@@ -791,10 +765,11 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
 	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
 
 	vtimer->vcpu = vcpu;
+	vtimer->offset.vm_offset = &vcpu->kvm->arch.offsets.voffset;
 	ptimer->vcpu = vcpu;
 
 	/* Synchronize cntvoff across all vtimers of a VM. */
-	update_vtimer_cntvoff(vcpu, kvm_phys_timer_read());
+	timer_set_offset(vtimer, kvm_phys_timer_read());
 	timer_set_offset(ptimer, 0);
 
 	hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
@@ -840,7 +815,7 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
 		break;
 	case KVM_REG_ARM_TIMER_CNT:
 		timer = vcpu_vtimer(vcpu);
-		update_vtimer_cntvoff(vcpu, kvm_phys_timer_read() - value);
+		timer_set_offset(timer, kvm_phys_timer_read() - value);
 		break;
 	case KVM_REG_ARM_TIMER_CVAL:
 		timer = vcpu_vtimer(vcpu);
diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
index 64c086c02c60..169d629f97cb 100644
--- a/arch/arm64/kvm/hypercalls.c
+++ b/arch/arm64/kvm/hypercalls.c
@@ -44,7 +44,7 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
 	feature = smccc_get_arg1(vcpu);
 	switch (feature) {
 	case KVM_PTP_VIRT_COUNTER:
-		cycles = systime_snapshot.cycles - vcpu_read_sys_reg(vcpu, CNTVOFF_EL2);
+		cycles = systime_snapshot.cycles - vcpu->kvm->arch.offsets.voffset;
 		break;
 	case KVM_PTP_PHYS_COUNTER:
 		cycles = systime_snapshot.cycles;
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 76174f4fb646..41fda6255174 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -23,6 +23,19 @@ enum kvm_arch_timer_regs {
 	TIMER_REG_CTL,
 };
 
+struct arch_timer_offset {
+	/*
+	 * If set, pointer to one of the offsets in the kvm's offset
+	 * structure. If NULL, assume a zero offset.
+	 */
+	u64	*vm_offset;
+};
+
+struct arch_timer_vm_offsets {
+	/* Offset applied to the virtual timer/counter */
+	u64	voffset;
+};
+
 struct arch_timer_context {
 	struct kvm_vcpu			*vcpu;
 
@@ -33,6 +46,8 @@ struct arch_timer_context {
 	struct hrtimer			hrtimer;
 	u64				ns_frac;
 
+	/* Offset for this counter/timer */
+	struct arch_timer_offset	offset;
 	/*
 	 * We have multiple paths which can save/restore the timer state onto
 	 * the hardware, so we need some way of keeping track of where the
-- 
2.34.1



* [PATCH 06/16] KVM: arm64: timers: Use CNTPOFF_EL2 to offset the physical timer
  2023-02-16 14:21 [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Marc Zyngier
                   ` (4 preceding siblings ...)
  2023-02-16 14:21 ` [PATCH 05/16] KVM: arm64: timers: Convert per-vcpu virtual offset to a global value Marc Zyngier
@ 2023-02-16 14:21 ` Marc Zyngier
  2023-02-23 22:34   ` Colton Lewis
  2023-02-16 14:21 ` [PATCH 07/16] KVM: arm64: timers: Allow physical offset without CNTPOFF_EL2 Marc Zyngier
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 55+ messages in thread
From: Marc Zyngier @ 2023-02-16 14:21 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Ricardo Koller, Simon Veith, dwmw2

With ECV and CNTPOFF_EL2, it is very easy to offer an offset for
the physical timer. So let's do just that.

Nothing can set the offset yet, so this should have no effect
whatsoever (famous last words...).

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/arch_timer.c          | 18 +++++++++++++++++-
 include/clocksource/arm_arch_timer.h |  1 +
 include/kvm/arm_arch_timer.h         |  2 ++
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 1bb44f668608..a6d5c556659e 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -52,6 +52,11 @@ static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
 			      struct arch_timer_context *timer,
 			      enum kvm_arch_timer_regs treg);
 
+static bool has_cntpoff(void)
+{
+	return (has_vhe() && cpus_have_final_cap(ARM64_HAS_ECV_CNTPOFF));
+}
+
 u32 timer_get_ctl(struct arch_timer_context *ctxt)
 {
 	struct kvm_vcpu *vcpu = ctxt->vcpu;
@@ -84,7 +89,7 @@ u64 timer_get_cval(struct arch_timer_context *ctxt)
 
 static u64 timer_get_offset(struct arch_timer_context *ctxt)
 {
-	if (ctxt->offset.vm_offset)
+	if (ctxt && ctxt->offset.vm_offset)
 		return *ctxt->offset.vm_offset;
 
 	return 0;
@@ -432,6 +437,12 @@ static void set_cntvoff(u64 cntvoff)
 	kvm_call_hyp(__kvm_timer_set_cntvoff, cntvoff);
 }
 
+static void set_cntpoff(u64 cntpoff)
+{
+	if (has_cntpoff())
+		write_sysreg_s(cntpoff, SYS_CNTPOFF_EL2);
+}
+
 static void timer_save_state(struct arch_timer_context *ctx)
 {
 	struct arch_timer_cpu *timer = vcpu_timer(ctx->vcpu);
@@ -480,6 +491,7 @@ static void timer_save_state(struct arch_timer_context *ctx)
 		write_sysreg_el0(0, SYS_CNTP_CTL);
 		isb();
 
+		set_cntpoff(0);
 		break;
 	case NR_KVM_TIMERS:
 		BUG();
@@ -550,6 +562,7 @@ static void timer_restore_state(struct arch_timer_context *ctx)
 		write_sysreg_el0(timer_get_ctl(ctx), SYS_CNTV_CTL);
 		break;
 	case TIMER_PTIMER:
+		set_cntpoff(timer_get_offset(ctx));
 		write_sysreg_el0(timer_get_cval(ctx), SYS_CNTP_CVAL);
 		isb();
 		write_sysreg_el0(timer_get_ctl(ctx), SYS_CNTP_CTL);
@@ -767,6 +780,7 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
 	vtimer->vcpu = vcpu;
 	vtimer->offset.vm_offset = &vcpu->kvm->arch.offsets.voffset;
 	ptimer->vcpu = vcpu;
+	ptimer->offset.vm_offset = &vcpu->kvm->arch.offsets.poffset;
 
 	/* Synchronize cntvoff across all vtimers of a VM. */
 	timer_set_offset(vtimer, kvm_phys_timer_read());
@@ -1297,6 +1311,8 @@ void kvm_timer_init_vhe(void)
 	val = read_sysreg(cnthctl_el2);
 	val |= (CNTHCTL_EL1PCEN << cnthctl_shift);
 	val |= (CNTHCTL_EL1PCTEN << cnthctl_shift);
+	if (cpus_have_final_cap(ARM64_HAS_ECV_CNTPOFF))
+		val |= CNTHCTL_ECV;
 	write_sysreg(val, cnthctl_el2);
 }
 
diff --git a/include/clocksource/arm_arch_timer.h b/include/clocksource/arm_arch_timer.h
index 057c8964aefb..cbbc9a6dc571 100644
--- a/include/clocksource/arm_arch_timer.h
+++ b/include/clocksource/arm_arch_timer.h
@@ -21,6 +21,7 @@
 #define CNTHCTL_EVNTEN			(1 << 2)
 #define CNTHCTL_EVNTDIR			(1 << 3)
 #define CNTHCTL_EVNTI			(0xF << 4)
+#define CNTHCTL_ECV			(1 << 12)
 
 enum arch_timer_reg {
 	ARCH_TIMER_REG_CTRL,
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 41fda6255174..4267eeb24353 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -34,6 +34,8 @@ struct arch_timer_offset {
 struct arch_timer_vm_offsets {
 	/* Offset applied to the virtual timer/counter */
 	u64	voffset;
+	/* Offset applied to the physical timer/counter */
+	u64	poffset;
 };
 
 struct arch_timer_context {
-- 
2.34.1



* [PATCH 07/16] KVM: arm64: timers: Allow physical offset without CNTPOFF_EL2
  2023-02-16 14:21 [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Marc Zyngier
                   ` (5 preceding siblings ...)
  2023-02-16 14:21 ` [PATCH 06/16] KVM: arm64: timers: Use CNTPOFF_EL2 to offset the physical timer Marc Zyngier
@ 2023-02-16 14:21 ` Marc Zyngier
  2023-02-23 22:40   ` Colton Lewis
  2023-02-16 14:21 ` [PATCH 08/16] KVM: arm64: timers: Allow userspace to set the counter offsets Marc Zyngier
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 55+ messages in thread
From: Marc Zyngier @ 2023-02-16 14:21 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Ricardo Koller, Simon Veith, dwmw2

CNTPOFF_EL2 is awesome, but it is mostly vapourware, and no publicly
available implementation has it. So for us mere mortals, let's
implement the emulated version of this thing.

It means trapping accesses to the physical counter and timer, and
emulating some of them as necessary.

As for CNTPOFF_EL2, nobody sets the offset yet.
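Without CNTPOFF_EL2 the offset gets folded into the comparator instead:
the saved CVAL is kept in guest time, and the hardware is programmed with
guest CVAL + offset, so "host counter >= hw CVAL" is exactly
"guest counter >= guest CVAL". A sketch of that equivalence (names are
mine, not from the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Guest-visible physical counter when a software offset is in effect. */
static uint64_t guest_cntpct(uint64_t host_cntpct, uint64_t poffset)
{
	return host_cntpct - poffset;
}

/* What restore programs into the hardware comparator without CNTPOFF. */
static uint64_t hw_cval(uint64_t guest_cval, uint64_t poffset)
{
	return guest_cval + poffset;
}

/* The condition the hardware timer evaluates. */
static bool timer_fires(uint64_t host_cntpct, uint64_t cval)
{
	return host_cntpct >= cval;
}
```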

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/arch_timer.c        | 97 +++++++++++++++++++++++-------
 arch/arm64/kvm/hyp/nvhe/timer-sr.c | 18 ++++--
 2 files changed, 88 insertions(+), 27 deletions(-)

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index a6d5c556659e..444ea6dca218 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -458,6 +458,8 @@ static void timer_save_state(struct arch_timer_context *ctx)
 		goto out;
 
 	switch (index) {
+		u64 cval;
+
 	case TIMER_VTIMER:
 		timer_set_ctl(ctx, read_sysreg_el0(SYS_CNTV_CTL));
 		timer_set_cval(ctx, read_sysreg_el0(SYS_CNTV_CVAL));
@@ -485,7 +487,12 @@ static void timer_save_state(struct arch_timer_context *ctx)
 		break;
 	case TIMER_PTIMER:
 		timer_set_ctl(ctx, read_sysreg_el0(SYS_CNTP_CTL));
-		timer_set_cval(ctx, read_sysreg_el0(SYS_CNTP_CVAL));
+		cval = read_sysreg_el0(SYS_CNTP_CVAL);
+
+		if (!has_cntpoff())
+			cval -= timer_get_offset(ctx);
+
+		timer_set_cval(ctx, cval);
 
 		/* Disable the timer */
 		write_sysreg_el0(0, SYS_CNTP_CTL);
@@ -555,6 +562,8 @@ static void timer_restore_state(struct arch_timer_context *ctx)
 		goto out;
 
 	switch (index) {
+		u64 cval, offset;
+
 	case TIMER_VTIMER:
 		set_cntvoff(timer_get_offset(ctx));
 		write_sysreg_el0(timer_get_cval(ctx), SYS_CNTV_CVAL);
@@ -562,8 +571,12 @@ static void timer_restore_state(struct arch_timer_context *ctx)
 		write_sysreg_el0(timer_get_ctl(ctx), SYS_CNTV_CTL);
 		break;
 	case TIMER_PTIMER:
-		set_cntpoff(timer_get_offset(ctx));
-		write_sysreg_el0(timer_get_cval(ctx), SYS_CNTP_CVAL);
+		cval = timer_get_cval(ctx);
+		offset = timer_get_offset(ctx);
+		set_cntpoff(offset);
+		if (!has_cntpoff())
+			cval += offset;
+		write_sysreg_el0(cval, SYS_CNTP_CVAL);
 		isb();
 		write_sysreg_el0(timer_get_ctl(ctx), SYS_CNTP_CTL);
 		break;
@@ -634,6 +647,62 @@ static void kvm_timer_vcpu_load_nogic(struct kvm_vcpu *vcpu)
 		enable_percpu_irq(host_vtimer_irq, host_vtimer_irq_flags);
 }
 
+/* If _pred is true, set bit in _set, otherwise set it in _clr */
+#define assign_clear_set_bit(_pred, _bit, _clr, _set)			\
+	do {								\
+		if (_pred)						\
+			(_set) |= (_bit);				\
+		else							\
+			(_clr) |= (_bit);				\
+	} while (0)
+
+static void timer_set_traps(struct kvm_vcpu *vcpu, struct timer_map *map)
+{
+	bool tpt, tpc;
+	u64 clr, set;
+
+	/*
+	 * No trapping gets configured here with nVHE. See
+	 * __timer_enable_traps(), which is where the stuff happens.
+	 */
+	if (!has_vhe())
+		return;
+
+	/*
+	 * Our default policy is not to trap anything. As we progress
+	 * within this function, reality kicks in and we start adding
+	 * traps based on emulation requirements.
+	 */
+	tpt = tpc = false;
+
+	/*
+	 * We have two possibilities to deal with a physical offset:
+	 *
+	 * - Either we have CNTPOFF (yay!) or the offset is 0:
+	 *   we let the guest freely access the HW
+	 *
+	 * - or neither of these conditions applies:
+	 *   we trap accesses to the HW, but still use it
+	 *   after correcting the physical offset
+	 */
+	if (!has_cntpoff() && timer_get_offset(map->direct_ptimer))
+		tpt = tpc = true;
+
+	/*
+	 * Now that we have collected our requirements, compute the
+	 * trap and enable bits.
+	 */
+	set = 0;
+	clr = 0;
+
+	assign_clear_set_bit(tpt, CNTHCTL_EL1PCEN << 10, set, clr);
+	assign_clear_set_bit(tpc, CNTHCTL_EL1PCTEN << 10, set, clr);
+
+	/* This only happens on VHE, so use the CNTKCTL_EL1 accessor */
+	sysreg_clear_set(cntkctl_el1, clr, set);
+}
+
 void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = vcpu_timer(vcpu);
@@ -657,9 +726,10 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
 	timer_restore_state(map.direct_vtimer);
 	if (map.direct_ptimer)
 		timer_restore_state(map.direct_ptimer);
-
 	if (map.emul_ptimer)
 		timer_emulate(map.emul_ptimer);
+
+	timer_set_traps(vcpu, &map);
 }
 
 bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
@@ -1293,27 +1363,12 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
 }
 
 /*
- * On VHE system, we only need to configure the EL2 timer trap register once,
- * not for every world switch.
- * The host kernel runs at EL2 with HCR_EL2.TGE == 1,
- * and this makes those bits have no effect for the host kernel execution.
+ * If we have CNTPOFF, permanently set ECV to enable it.
  */
 void kvm_timer_init_vhe(void)
 {
-	/* When HCR_EL2.E2H ==1, EL1PCEN and EL1PCTEN are shifted by 10 */
-	u32 cnthctl_shift = 10;
-	u64 val;
-
-	/*
-	 * VHE systems allow the guest direct access to the EL1 physical
-	 * timer/counter.
-	 */
-	val = read_sysreg(cnthctl_el2);
-	val |= (CNTHCTL_EL1PCEN << cnthctl_shift);
-	val |= (CNTHCTL_EL1PCTEN << cnthctl_shift);
 	if (cpus_have_final_cap(ARM64_HAS_ECV_CNTPOFF))
-		val |= CNTHCTL_ECV;
-	write_sysreg(val, cnthctl_el2);
+		sysreg_clear_set(cntkctl_el1, 0, CNTHCTL_ECV);
 }
 
 static void set_timer_irqs(struct kvm *kvm, int vtimer_irq, int ptimer_irq)
diff --git a/arch/arm64/kvm/hyp/nvhe/timer-sr.c b/arch/arm64/kvm/hyp/nvhe/timer-sr.c
index 9072e71693ba..d644f9d5d903 100644
--- a/arch/arm64/kvm/hyp/nvhe/timer-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/timer-sr.c
@@ -9,6 +9,7 @@
 #include <linux/kvm_host.h>
 
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
 
 void __kvm_timer_set_cntvoff(u64 cntvoff)
 {
@@ -35,14 +36,19 @@ void __timer_disable_traps(struct kvm_vcpu *vcpu)
  */
 void __timer_enable_traps(struct kvm_vcpu *vcpu)
 {
-	u64 val;
+	u64 clr = 0, set = 0;
 
 	/*
 	 * Disallow physical timer access for the guest
-	 * Physical counter access is allowed
+	 * Physical counter access is allowed if no offset is enforced,
+	 * or when running protected (we don't offset anything in that case).
 	 */
-	val = read_sysreg(cnthctl_el2);
-	val &= ~CNTHCTL_EL1PCEN;
-	val |= CNTHCTL_EL1PCTEN;
-	write_sysreg(val, cnthctl_el2);
+	clr = CNTHCTL_EL1PCEN;
+	if (is_protected_kvm_enabled() ||
+	    !kern_hyp_va(vcpu->kvm)->arch.offsets.poffset)
+		set |= CNTHCTL_EL1PCTEN;
+	else
+		clr |= CNTHCTL_EL1PCTEN;
+
+	sysreg_clear_set(cnthctl_el2, clr, set);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 08/16] KVM: arm64: timers: Allow userspace to set the counter offsets
  2023-02-16 14:21 [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Marc Zyngier
                   ` (6 preceding siblings ...)
  2023-02-16 14:21 ` [PATCH 07/16] KVM: arm64: timers: Allow physical offset without CNTPOFF_EL2 Marc Zyngier
@ 2023-02-16 14:21 ` Marc Zyngier
  2023-02-16 22:09   ` Oliver Upton
  2023-02-23 22:41   ` Colton Lewis
  2023-02-16 14:21 ` [PATCH 09/16] KVM: arm64: timers: Allow save/restoring of the physical timer Marc Zyngier
                   ` (10 subsequent siblings)
  18 siblings, 2 replies; 55+ messages in thread
From: Marc Zyngier @ 2023-02-16 14:21 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Ricardo Koller, Simon Veith, dwmw2

And this is the moment you have all been waiting for: setting the
counter offsets from userspace.

We expose a brand new capability that reports the ability to set
the offsets for both the virtual and physical sides, independently.

In keeping with the architecture, the offsets are expressed as
a delta that is subtracted from the physical counter value.

Once this new API is used, there is no going back, and the counters
can no longer be written to set the offsets implicitly (the writes
are instead ignored).

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h |  4 +++
 arch/arm64/include/uapi/asm/kvm.h | 15 ++++++++++
 arch/arm64/kvm/arch_timer.c       | 46 +++++++++++++++++++++++++++----
 arch/arm64/kvm/arm.c              |  8 ++++++
 include/uapi/linux/kvm.h          |  3 ++
 5 files changed, 71 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 3adac0c5e175..8514a37cf8d5 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -221,6 +221,8 @@ struct kvm_arch {
 #define KVM_ARCH_FLAG_EL1_32BIT				4
 	/* PSCI SYSTEM_SUSPEND enabled for the guest */
 #define KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED		5
+	/* VM counter offsets */
+#define KVM_ARCH_FLAG_COUNTER_OFFSETS			6
 
 	unsigned long flags;
 
@@ -1010,6 +1012,8 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
 
 long kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
 				struct kvm_arm_copy_mte_tags *copy_tags);
+int kvm_vm_ioctl_set_counter_offsets(struct kvm *kvm,
+				     struct kvm_arm_counter_offsets *offsets);
 
 /* Guest/host FPSIMD coordination helpers */
 int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index f8129c624b07..2d7557a160bd 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -198,6 +198,21 @@ struct kvm_arm_copy_mte_tags {
 	__u64 reserved[2];
 };
 
+/*
+ * Counter/Timer offset structure. Describes the virtual/physical offsets.
+ * To be used with KVM_ARM_SET_CNT_OFFSETS.
+ */
+struct kvm_arm_counter_offsets {
+	__u64 virtual_offset;
+	__u64 physical_offset;
+
+#define KVM_COUNTER_SET_VOFFSET_FLAG	(1UL << 0)
+#define KVM_COUNTER_SET_POFFSET_FLAG	(1UL << 1)
+
+	__u64 flags;
+	__u64 reserved;
+};
+
 #define KVM_ARM_TAGS_TO_GUEST		0
 #define KVM_ARM_TAGS_FROM_GUEST		1
 
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 444ea6dca218..b04544b702f9 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -852,9 +852,11 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
 	ptimer->vcpu = vcpu;
 	ptimer->offset.vm_offset = &vcpu->kvm->arch.offsets.poffset;
 
-	/* Synchronize cntvoff across all vtimers of a VM. */
-	timer_set_offset(vtimer, kvm_phys_timer_read());
-	timer_set_offset(ptimer, 0);
+	/* Synchronize offsets across timers of a VM if not already provided */
+	if (!test_bit(KVM_ARCH_FLAG_COUNTER_OFFSETS, &vcpu->kvm->arch.flags)) {
+		timer_set_offset(vtimer, kvm_phys_timer_read());
+		timer_set_offset(ptimer, 0);
+	}
 
 	hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
 	timer->bg_timer.function = kvm_bg_timer_expire;
@@ -898,8 +900,11 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
 		kvm_arm_timer_write(vcpu, timer, TIMER_REG_CTL, value);
 		break;
 	case KVM_REG_ARM_TIMER_CNT:
-		timer = vcpu_vtimer(vcpu);
-		timer_set_offset(timer, kvm_phys_timer_read() - value);
+		if (!test_bit(KVM_ARCH_FLAG_COUNTER_OFFSETS,
+			      &vcpu->kvm->arch.flags)) {
+			timer = vcpu_vtimer(vcpu);
+			timer_set_offset(timer, kvm_phys_timer_read() - value);
+		}
 		break;
 	case KVM_REG_ARM_TIMER_CVAL:
 		timer = vcpu_vtimer(vcpu);
@@ -909,6 +914,13 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
 		timer = vcpu_ptimer(vcpu);
 		kvm_arm_timer_write(vcpu, timer, TIMER_REG_CTL, value);
 		break;
+	case KVM_REG_ARM_PTIMER_CNT:
+		if (!test_bit(KVM_ARCH_FLAG_COUNTER_OFFSETS,
+			      &vcpu->kvm->arch.flags)) {
+			timer = vcpu_ptimer(vcpu);
+			timer_set_offset(timer, kvm_phys_timer_read() - value);
+		}
+		break;
 	case KVM_REG_ARM_PTIMER_CVAL:
 		timer = vcpu_ptimer(vcpu);
 		kvm_arm_timer_write(vcpu, timer, TIMER_REG_CVAL, value);
@@ -1446,3 +1458,27 @@ int kvm_arm_timer_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 
 	return -ENXIO;
 }
+
+int kvm_vm_ioctl_set_counter_offsets(struct kvm *kvm,
+				     struct kvm_arm_counter_offsets *offsets)
+{
+	if (offsets->reserved ||
+	    (offsets->flags & ~(KVM_COUNTER_SET_VOFFSET_FLAG |
+				KVM_COUNTER_SET_POFFSET_FLAG)))
+		return -EINVAL;
+
+	if (!lock_all_vcpus(kvm))
+		return -EBUSY;
+
+	set_bit(KVM_ARCH_FLAG_COUNTER_OFFSETS, &kvm->arch.flags);
+
+	if (offsets->flags & KVM_COUNTER_SET_VOFFSET_FLAG)
+		kvm->arch.offsets.voffset = offsets->virtual_offset;
+
+	if (offsets->flags & KVM_COUNTER_SET_POFFSET_FLAG)
+		kvm->arch.offsets.poffset = offsets->physical_offset;
+
+	unlock_all_vcpus(kvm);
+
+	return 0;
+}
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 097750a01497..1182d8ce7319 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -220,6 +220,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_VCPU_ATTRIBUTES:
 	case KVM_CAP_PTP_KVM:
 	case KVM_CAP_ARM_SYSTEM_SUSPEND:
+	case KVM_CAP_COUNTER_OFFSETS:
 		r = 1;
 		break;
 	case KVM_CAP_SET_GUEST_DEBUG2:
@@ -1479,6 +1480,13 @@ long kvm_arch_vm_ioctl(struct file *filp,
 			return -EFAULT;
 		return kvm_vm_ioctl_mte_copy_tags(kvm, &copy_tags);
 	}
+	case KVM_ARM_SET_CNT_OFFSETS: {
+		struct kvm_arm_counter_offsets offsets;
+
+		if (copy_from_user(&offsets, argp, sizeof(offsets)))
+			return -EFAULT;
+		return kvm_vm_ioctl_set_counter_offsets(kvm, &offsets);
+	}
 	default:
 		return -EINVAL;
 	}
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 55155e262646..3753765dbc4f 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1175,6 +1175,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_DIRTY_LOG_RING_ACQ_REL 223
 #define KVM_CAP_S390_PROTECTED_ASYNC_DISABLE 224
 #define KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP 225
+#define KVM_CAP_COUNTER_OFFSETS 226
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1534,6 +1535,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_SET_PMU_EVENT_FILTER  _IOW(KVMIO,  0xb2, struct kvm_pmu_event_filter)
 #define KVM_PPC_SVM_OFF		  _IO(KVMIO,  0xb3)
 #define KVM_ARM_MTE_COPY_TAGS	  _IOR(KVMIO,  0xb4, struct kvm_arm_copy_mte_tags)
+/* Available with KVM_CAP_COUNTER_OFFSETS */
+#define KVM_ARM_SET_CNT_OFFSETS	  _IOW(KVMIO,  0xb5, struct kvm_arm_counter_offsets)
 
 /* ioctl for vm fd */
 #define KVM_CREATE_DEVICE	  _IOWR(KVMIO,  0xe0, struct kvm_create_device)
-- 
2.34.1



* [PATCH 09/16] KVM: arm64: timers: Allow save/restoring of the physical timer
  2023-02-16 14:21 [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Marc Zyngier
                   ` (7 preceding siblings ...)
  2023-02-16 14:21 ` [PATCH 08/16] KVM: arm64: timers: Allow userspace to set the counter offsets Marc Zyngier
@ 2023-02-16 14:21 ` Marc Zyngier
  2023-02-16 14:21 ` [PATCH 10/16] KVM: arm64: timers: Rationalise per-vcpu timer init Marc Zyngier
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 55+ messages in thread
From: Marc Zyngier @ 2023-02-16 14:21 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Ricardo Koller, Simon Veith, dwmw2

Nothing like being 10 years late to a party!

Now that userspace can set counter offsets, we can save/restore
the physical timer as well!

Nobody really cared so far, but you're welcome anyway.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/guest.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 07444fa22888..46e910819de6 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -590,11 +590,16 @@ static unsigned long num_core_regs(const struct kvm_vcpu *vcpu)
 	return copy_core_reg_indices(vcpu, NULL);
 }
 
-/**
- * ARM64 versions of the TIMER registers, always available on arm64
- */
+static const u64 timer_reg_list[] = {
+	KVM_REG_ARM_TIMER_CTL,
+	KVM_REG_ARM_TIMER_CNT,
+	KVM_REG_ARM_TIMER_CVAL,
+	KVM_REG_ARM_PTIMER_CTL,
+	KVM_REG_ARM_PTIMER_CNT,
+	KVM_REG_ARM_PTIMER_CVAL,
+};
 
-#define NUM_TIMER_REGS 3
+#define NUM_TIMER_REGS ARRAY_SIZE(timer_reg_list)
 
 static bool is_timer_reg(u64 index)
 {
@@ -602,6 +607,9 @@ static bool is_timer_reg(u64 index)
 	case KVM_REG_ARM_TIMER_CTL:
 	case KVM_REG_ARM_TIMER_CNT:
 	case KVM_REG_ARM_TIMER_CVAL:
+	case KVM_REG_ARM_PTIMER_CTL:
+	case KVM_REG_ARM_PTIMER_CNT:
+	case KVM_REG_ARM_PTIMER_CVAL:
 		return true;
 	}
 	return false;
@@ -609,14 +617,11 @@ static bool is_timer_reg(u64 index)
 
 static int copy_timer_indices(struct kvm_vcpu *vcpu, u64 __user *uindices)
 {
-	if (put_user(KVM_REG_ARM_TIMER_CTL, uindices))
-		return -EFAULT;
-	uindices++;
-	if (put_user(KVM_REG_ARM_TIMER_CNT, uindices))
-		return -EFAULT;
-	uindices++;
-	if (put_user(KVM_REG_ARM_TIMER_CVAL, uindices))
-		return -EFAULT;
+	for (int i = 0; i < NUM_TIMER_REGS; i++) {
+		if (put_user(timer_reg_list[i], uindices))
+			return -EFAULT;
+		uindices++;
+	}
 
 	return 0;
 }
-- 
2.34.1



* [PATCH 10/16] KVM: arm64: timers: Rationalise per-vcpu timer init
  2023-02-16 14:21 [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Marc Zyngier
                   ` (8 preceding siblings ...)
  2023-02-16 14:21 ` [PATCH 09/16] KVM: arm64: timers: Allow save/restoring of the physical timer Marc Zyngier
@ 2023-02-16 14:21 ` Marc Zyngier
  2023-02-16 14:21 ` [PATCH 11/16] KVM: arm64: Document KVM_ARM_SET_CNT_OFFSETS and co Marc Zyngier
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 55+ messages in thread
From: Marc Zyngier @ 2023-02-16 14:21 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Ricardo Koller, Simon Veith, dwmw2

The way we initialise our timer contexts may be satisfactory
for two timers, but will be getting pretty annoying with four.

Clean up the whole thing by removing the code duplication and
getting rid of unused IRQ configuration elements.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/arch_timer.c  | 73 +++++++++++++++++++-----------------
 include/kvm/arm_arch_timer.h |  1 -
 2 files changed, 39 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index b04544b702f9..f968d6d3479b 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -30,14 +30,9 @@ static u32 host_ptimer_irq_flags;
 
 static DEFINE_STATIC_KEY_FALSE(has_gic_active_state);
 
-static const struct kvm_irq_level default_ptimer_irq = {
-	.irq	= 30,
-	.level	= 1,
-};
-
-static const struct kvm_irq_level default_vtimer_irq = {
-	.irq	= 27,
-	.level	= 1,
+static const u8 default_ppi[] = {
+	[TIMER_PTIMER]  = 30,
+	[TIMER_VTIMER]  = 27,
 };
 
 static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx);
@@ -821,12 +816,14 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
 	 * resets the timer to be disabled and unmasked and is compliant with
 	 * the ARMv7 architecture.
 	 */
-	timer_set_ctl(vcpu_vtimer(vcpu), 0);
-	timer_set_ctl(vcpu_ptimer(vcpu), 0);
+	for (int i = 0; i < NR_KVM_TIMERS; i++)
+		timer_set_ctl(vcpu_get_timer(vcpu, i), 0);
+
 
 	if (timer->enabled) {
-		kvm_timer_update_irq(vcpu, false, vcpu_vtimer(vcpu));
-		kvm_timer_update_irq(vcpu, false, vcpu_ptimer(vcpu));
+		for (int i = 0; i < NR_KVM_TIMERS; i++)
+			kvm_timer_update_irq(vcpu, false,
+					     vcpu_get_timer(vcpu, i));
 
 		if (irqchip_in_kernel(vcpu->kvm)) {
 			kvm_vgic_reset_mapped_irq(vcpu, map.direct_vtimer->irq.irq);
@@ -841,39 +838,47 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+static void timer_context_init(struct kvm_vcpu *vcpu, int timerid)
+{
+	struct arch_timer_context *ctxt = vcpu_get_timer(vcpu, timerid);
+	struct kvm *kvm = vcpu->kvm;
+
+	ctxt->vcpu = vcpu;
+
+	if (timerid == TIMER_VTIMER)
+		ctxt->offset.vm_offset = &kvm->arch.offsets.voffset;
+	else
+		ctxt->offset.vm_offset = &kvm->arch.offsets.poffset;
+
+	hrtimer_init(&ctxt->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
+	ctxt->hrtimer.function = kvm_hrtimer_expire;
+	ctxt->irq.irq = default_ppi[timerid];
+
+	switch (timerid) {
+	case TIMER_PTIMER:
+		ctxt->host_timer_irq = host_ptimer_irq;
+		break;
+	case TIMER_VTIMER:
+		ctxt->host_timer_irq = host_vtimer_irq;
+		break;
+	}
+}
+
 void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
 {
 	struct arch_timer_cpu *timer = vcpu_timer(vcpu);
-	struct arch_timer_context *vtimer = vcpu_vtimer(vcpu);
-	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
 
-	vtimer->vcpu = vcpu;
-	vtimer->offset.vm_offset = &vcpu->kvm->arch.offsets.voffset;
-	ptimer->vcpu = vcpu;
-	ptimer->offset.vm_offset = &vcpu->kvm->arch.offsets.poffset;
+	for (int i = 0; i < NR_KVM_TIMERS; i++)
+		timer_context_init(vcpu, i);
 
 	/* Synchronize offsets across timers of a VM if not already provided */
 	if (!test_bit(KVM_ARCH_FLAG_COUNTER_OFFSETS, &vcpu->kvm->arch.flags)) {
-		timer_set_offset(vtimer, kvm_phys_timer_read());
-		timer_set_offset(ptimer, 0);
+		timer_set_offset(vcpu_vtimer(vcpu), kvm_phys_timer_read());
+		timer_set_offset(vcpu_ptimer(vcpu), 0);
 	}
 
 	hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
 	timer->bg_timer.function = kvm_bg_timer_expire;
-
-	hrtimer_init(&vtimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
-	hrtimer_init(&ptimer->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
-	vtimer->hrtimer.function = kvm_hrtimer_expire;
-	ptimer->hrtimer.function = kvm_hrtimer_expire;
-
-	vtimer->irq.irq = default_vtimer_irq.irq;
-	ptimer->irq.irq = default_ptimer_irq.irq;
-
-	vtimer->host_timer_irq = host_vtimer_irq;
-	ptimer->host_timer_irq = host_ptimer_irq;
-
-	vtimer->host_timer_irq_flags = host_vtimer_irq_flags;
-	ptimer->host_timer_irq_flags = host_ptimer_irq_flags;
 }
 
 void kvm_timer_cpu_up(void)
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 4267eeb24353..62ef4883e644 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -59,7 +59,6 @@ struct arch_timer_context {
 
 	/* Duplicated state from arch_timer.c for convenience */
 	u32				host_timer_irq;
-	u32				host_timer_irq_flags;
 };
 
 struct timer_map {
-- 
2.34.1



* [PATCH 11/16] KVM: arm64: Document KVM_ARM_SET_CNT_OFFSETS and co
  2023-02-16 14:21 [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Marc Zyngier
                   ` (9 preceding siblings ...)
  2023-02-16 14:21 ` [PATCH 10/16] KVM: arm64: timers: Rationalise per-vcpu timer init Marc Zyngier
@ 2023-02-16 14:21 ` Marc Zyngier
  2023-02-16 14:21 ` [PATCH 12/16] KVM: arm64: nv: timers: Add a per-timer, per-vcpu offset Marc Zyngier
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 55+ messages in thread
From: Marc Zyngier @ 2023-02-16 14:21 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Ricardo Koller, Simon Veith, dwmw2

Add some basic documentation on the effects of KVM_ARM_SET_CNT_OFFSETS.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 Documentation/virt/kvm/api.rst | 47 ++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 0a67cb738013..bdd361fd90f4 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5937,6 +5937,53 @@ delivery must be provided via the "reg_aen" struct.
 The "pad" and "reserved" fields may be used for future extensions and should be
 set to 0s by userspace.
 
+4.138 KVM_ARM_SET_CNT_OFFSETS
+-----------------------------
+
+:Capability: KVM_CAP_COUNTER_OFFSETS
+:Architectures: arm64
+:Type: vm ioctl
+:Parameters: struct kvm_arm_counter_offsets (in)
+:Returns: 0 on success, <0 on error
+
+This capability indicates that userspace is able to apply a set of
+VM-wide offsets to the virtual and physical counters as viewed by the
+guest using the KVM_ARM_SET_CNT_OFFSETS ioctl and the following data
+structure:
+
+	struct kvm_arm_counter_offsets {
+		__u64 virtual_offset;
+		__u64 physical_offset;
+
+	#define KVM_COUNTER_SET_VOFFSET_FLAG   (1UL << 0)
+	#define KVM_COUNTER_SET_POFFSET_FLAG   (1UL << 1)
+
+		__u64 flags;
+		__u64 reserved;
+	};
+
+Each of the two offsets describes a number of counter cycles that are
+subtracted from the corresponding counter (similar to the effects of
+the CNTVOFF_EL2 and CNTPOFF_EL2 system registers). For each offset
+that userspace wishes to change, it must set the corresponding flag in
+the "flags" field. These values always apply to all vcpus (already
+created or created after this ioctl) in this VM.
+
+It is userspace's responsibility to compute the offsets based, for
+example, on previous values of the guest counters.
+
+With nested virtualisation, the virtual offset has no effect on the
+execution of the guest, and the nested hypervisor is responsible for
+the offset that is presented to its own guests.
+
+Any flag value that isn't described here, or any value other than 0
+for the "reserved" field may result in an error being returned.
+
+Note that using this ioctl results in KVM ignoring subsequent
+userspace writes to the CNTVCT_EL0 and CNTPCT_EL0 registers using the
+SET_ONE_REG interface. No error will be returned, but the resulting
+offset will not be applied.
+
 5. The kvm_run structure
 ========================
 
-- 
2.34.1



* [PATCH 12/16] KVM: arm64: nv: timers: Add a per-timer, per-vcpu offset
  2023-02-16 14:21 [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Marc Zyngier
                   ` (10 preceding siblings ...)
  2023-02-16 14:21 ` [PATCH 11/16] KVM: arm64: Document KVM_ARM_SET_CNT_OFFSETS and co Marc Zyngier
@ 2023-02-16 14:21 ` Marc Zyngier
  2023-02-24 20:07   ` Colton Lewis
  2023-02-16 14:21 ` [PATCH 13/16] KVM: arm64: nv: timers: Support hyp timer emulation Marc Zyngier
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 55+ messages in thread
From: Marc Zyngier @ 2023-02-16 14:21 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Ricardo Koller, Simon Veith, dwmw2

Being able to set a global offset isn't enough.

With NV, we also need a per-vcpu, per-timer offset (for example,
CNTVCT_EL0 being offset by CNTVOFF_EL2).

Use a method similar to the VM-wide offset: have a timer point
to the shadow register that contains the offset value.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/kvm/arch_timer.c  | 13 ++++++++++---
 include/kvm/arm_arch_timer.h |  5 +++++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index f968d6d3479b..1f442ac0c499 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -84,10 +84,17 @@ u64 timer_get_cval(struct arch_timer_context *ctxt)
 
 static u64 timer_get_offset(struct arch_timer_context *ctxt)
 {
-	if (ctxt && ctxt->offset.vm_offset)
-		return *ctxt->offset.vm_offset;
+	u64 offset = 0;
 
-	return 0;
+	if (!ctxt)
+		return 0;
+
+	if (ctxt->offset.vm_offset)
+		offset += *ctxt->offset.vm_offset;
+	if (ctxt->offset.vcpu_offset)
+		offset += *ctxt->offset.vcpu_offset;
+
+	return offset;
 }
 
 static void timer_set_ctl(struct arch_timer_context *ctxt, u32 ctl)
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index 62ef4883e644..e76e513b90c5 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -29,6 +29,11 @@ struct arch_timer_offset {
 	 * structure. If NULL, assume a zero offset.
 	 */
 	u64	*vm_offset;
+	/*
+	 * If set, pointer to one of the offsets in the vcpu's sysreg
+	 * array. If NULL, assume a zero offset.
+	 */
+	u64	*vcpu_offset;
 };
 
 struct arch_timer_vm_offsets {
-- 
2.34.1



* [PATCH 13/16] KVM: arm64: nv: timers: Support hyp timer emulation
  2023-02-16 14:21 [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Marc Zyngier
                   ` (11 preceding siblings ...)
  2023-02-16 14:21 ` [PATCH 12/16] KVM: arm64: nv: timers: Add a per-timer, per-vcpu offset Marc Zyngier
@ 2023-02-16 14:21 ` Marc Zyngier
  2023-02-24 20:08   ` Colton Lewis
  2023-02-16 14:21 ` [PATCH 14/16] KVM: arm64: selftests: Add physical timer registers to the sysreg list Marc Zyngier
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 55+ messages in thread
From: Marc Zyngier @ 2023-02-16 14:21 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Ricardo Koller, Simon Veith, dwmw2, Christoffer Dall

Emulating EL2 also means emulating the EL2 timers. To do so, we expand
our timer framework to deal with at most 4 timers. At any given time,
two timers are using the HW timers, and the two others are purely
emulated.

The role of deciding which is which at any given time is left to a
mapping function which is called every time we need to make such a
decision.

Co-developed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 arch/arm64/include/asm/kvm_host.h |   4 +
 arch/arm64/kvm/arch_timer.c       | 187 ++++++++++++++++++++++++++++--
 arch/arm64/kvm/trace_arm.h        |   6 +-
 arch/arm64/kvm/vgic/vgic.c        |  15 +++
 include/kvm/arm_arch_timer.h      |   8 +-
 include/kvm/arm_vgic.h            |   1 +
 6 files changed, 207 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 8514a37cf8d5..4dceec1a5924 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -367,6 +367,10 @@ enum vcpu_sysreg {
 	TPIDR_EL2,	/* EL2 Software Thread ID Register */
 	CNTHCTL_EL2,	/* Counter-timer Hypervisor Control register */
 	SP_EL2,		/* EL2 Stack Pointer */
+	CNTHP_CTL_EL2,
+	CNTHP_CVAL_EL2,
+	CNTHV_CTL_EL2,
+	CNTHV_CVAL_EL2,
 
 	NR_SYS_REGS	/* Nothing after this line! */
 };
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 1f442ac0c499..0aac92c2d7e5 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -16,6 +16,7 @@
 #include <asm/arch_timer.h>
 #include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
+#include <asm/kvm_nested.h>
 
 #include <kvm/arm_vgic.h>
 #include <kvm/arm_arch_timer.h>
@@ -33,6 +34,8 @@ static DEFINE_STATIC_KEY_FALSE(has_gic_active_state);
 static const u8 default_ppi[] = {
 	[TIMER_PTIMER]  = 30,
 	[TIMER_VTIMER]  = 27,
+	[TIMER_HPTIMER] = 26,
+	[TIMER_HVTIMER] = 28,
 };
 
 static bool kvm_timer_irq_can_fire(struct arch_timer_context *timer_ctx);
@@ -46,6 +49,11 @@ static void kvm_arm_timer_write(struct kvm_vcpu *vcpu,
 static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
 			      struct arch_timer_context *timer,
 			      enum kvm_arch_timer_regs treg);
+static bool kvm_arch_timer_get_input_level(int vintid);
+
+static struct irq_ops arch_timer_irq_ops = {
+	.get_input_level = kvm_arch_timer_get_input_level,
+};
 
 static bool has_cntpoff(void)
 {
@@ -61,6 +69,10 @@ u32 timer_get_ctl(struct arch_timer_context *ctxt)
 		return __vcpu_sys_reg(vcpu, CNTV_CTL_EL0);
 	case TIMER_PTIMER:
 		return __vcpu_sys_reg(vcpu, CNTP_CTL_EL0);
+	case TIMER_HVTIMER:
+		return __vcpu_sys_reg(vcpu, CNTHV_CTL_EL2);
+	case TIMER_HPTIMER:
+		return __vcpu_sys_reg(vcpu, CNTHP_CTL_EL2);
 	default:
 		WARN_ON(1);
 		return 0;
@@ -76,6 +88,10 @@ u64 timer_get_cval(struct arch_timer_context *ctxt)
 		return __vcpu_sys_reg(vcpu, CNTV_CVAL_EL0);
 	case TIMER_PTIMER:
 		return __vcpu_sys_reg(vcpu, CNTP_CVAL_EL0);
+	case TIMER_HVTIMER:
+		return __vcpu_sys_reg(vcpu, CNTHV_CVAL_EL2);
+	case TIMER_HPTIMER:
+		return __vcpu_sys_reg(vcpu, CNTHP_CVAL_EL2);
 	default:
 		WARN_ON(1);
 		return 0;
@@ -108,6 +124,12 @@ static void timer_set_ctl(struct arch_timer_context *ctxt, u32 ctl)
 	case TIMER_PTIMER:
 		__vcpu_sys_reg(vcpu, CNTP_CTL_EL0) = ctl;
 		break;
+	case TIMER_HVTIMER:
+		__vcpu_sys_reg(vcpu, CNTHV_CTL_EL2) = ctl;
+		break;
+	case TIMER_HPTIMER:
+		__vcpu_sys_reg(vcpu, CNTHP_CTL_EL2) = ctl;
+		break;
 	default:
 		WARN_ON(1);
 	}
@@ -124,6 +146,12 @@ static void timer_set_cval(struct arch_timer_context *ctxt, u64 cval)
 	case TIMER_PTIMER:
 		__vcpu_sys_reg(vcpu, CNTP_CVAL_EL0) = cval;
 		break;
+	case TIMER_HVTIMER:
+		__vcpu_sys_reg(vcpu, CNTHV_CVAL_EL2) = cval;
+		break;
+	case TIMER_HPTIMER:
+		__vcpu_sys_reg(vcpu, CNTHP_CVAL_EL2) = cval;
+		break;
 	default:
 		WARN_ON(1);
 	}
@@ -146,13 +174,27 @@ u64 kvm_phys_timer_read(void)
 
 static void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map)
 {
-	if (has_vhe()) {
+	if (vcpu_has_nv(vcpu)) {
+		if (is_hyp_ctxt(vcpu)) {
+			map->direct_vtimer = vcpu_hvtimer(vcpu);
+			map->direct_ptimer = vcpu_hptimer(vcpu);
+			map->emul_vtimer = vcpu_vtimer(vcpu);
+			map->emul_ptimer = vcpu_ptimer(vcpu);
+		} else {
+			map->direct_vtimer = vcpu_vtimer(vcpu);
+			map->direct_ptimer = vcpu_ptimer(vcpu);
+			map->emul_vtimer = vcpu_hvtimer(vcpu);
+			map->emul_ptimer = vcpu_hptimer(vcpu);
+		}
+	} else if (has_vhe()) {
 		map->direct_vtimer = vcpu_vtimer(vcpu);
 		map->direct_ptimer = vcpu_ptimer(vcpu);
+		map->emul_vtimer = NULL;
 		map->emul_ptimer = NULL;
 	} else {
 		map->direct_vtimer = vcpu_vtimer(vcpu);
 		map->direct_ptimer = NULL;
+		map->emul_vtimer = NULL;
 		map->emul_ptimer = vcpu_ptimer(vcpu);
 	}
 
@@ -247,8 +289,11 @@ static bool vcpu_has_wfit_active(struct kvm_vcpu *vcpu)
 
 static u64 wfit_delay_ns(struct kvm_vcpu *vcpu)
 {
-	struct arch_timer_context *ctx = vcpu_vtimer(vcpu);
 	u64 val = vcpu_get_reg(vcpu, kvm_vcpu_sys_get_rt(vcpu));
+	struct arch_timer_context *ctx;
+
+	ctx = (vcpu_has_nv(vcpu) && is_hyp_ctxt(vcpu)) ? vcpu_hvtimer(vcpu)
+						       : vcpu_vtimer(vcpu);
 
 	return kvm_counter_compute_delta(ctx, val);
 }
@@ -345,9 +390,11 @@ static bool kvm_timer_should_fire(struct arch_timer_context *timer_ctx)
 
 		switch (index) {
 		case TIMER_VTIMER:
+		case TIMER_HVTIMER:
 			cnt_ctl = read_sysreg_el0(SYS_CNTV_CTL);
 			break;
 		case TIMER_PTIMER:
+		case TIMER_HPTIMER:
 			cnt_ctl = read_sysreg_el0(SYS_CNTP_CTL);
 			break;
 		case NR_KVM_TIMERS:
@@ -463,6 +510,7 @@ static void timer_save_state(struct arch_timer_context *ctx)
 		u64 cval;
 
 	case TIMER_VTIMER:
+	case TIMER_HVTIMER:
 		timer_set_ctl(ctx, read_sysreg_el0(SYS_CNTV_CTL));
 		timer_set_cval(ctx, read_sysreg_el0(SYS_CNTV_CVAL));
 
@@ -488,6 +536,7 @@ static void timer_save_state(struct arch_timer_context *ctx)
 		set_cntvoff(0);
 		break;
 	case TIMER_PTIMER:
+	case TIMER_HPTIMER:
 		timer_set_ctl(ctx, read_sysreg_el0(SYS_CNTP_CTL));
 		cval = read_sysreg_el0(SYS_CNTP_CVAL);
 
@@ -531,6 +580,7 @@ static void kvm_timer_blocking(struct kvm_vcpu *vcpu)
 	 */
 	if (!kvm_timer_irq_can_fire(map.direct_vtimer) &&
 	    !kvm_timer_irq_can_fire(map.direct_ptimer) &&
+	    !kvm_timer_irq_can_fire(map.emul_vtimer) &&
 	    !kvm_timer_irq_can_fire(map.emul_ptimer) &&
 	    !vcpu_has_wfit_active(vcpu))
 		return;
@@ -567,12 +617,14 @@ static void timer_restore_state(struct arch_timer_context *ctx)
 		u64 cval, offset;
 
 	case TIMER_VTIMER:
+	case TIMER_HVTIMER:
 		set_cntvoff(timer_get_offset(ctx));
 		write_sysreg_el0(timer_get_cval(ctx), SYS_CNTV_CVAL);
 		isb();
 		write_sysreg_el0(timer_get_ctl(ctx), SYS_CNTV_CTL);
 		break;
 	case TIMER_PTIMER:
+	case TIMER_HPTIMER:
 		cval = timer_get_cval(ctx);
 		offset = timer_get_offset(ctx);
 		set_cntpoff(offset);
@@ -658,6 +710,57 @@ static void kvm_timer_vcpu_load_nogic(struct kvm_vcpu *vcpu)
 			(_clr) |= (_bit);				\
 	} while (0)
 
+static void kvm_timer_vcpu_load_nested_switch(struct kvm_vcpu *vcpu,
+					      struct timer_map *map)
+{
+	int hw, ret;
+
+	if (!irqchip_in_kernel(vcpu->kvm))
+		return;
+
+	/*
+	 * We only ever unmap the vtimer irq on a VHE system that runs nested
+	 * virtualization, in which case we have both a valid emul_vtimer,
+	 * emul_ptimer, direct_vtimer, and direct_ptimer.
+	 *
+	 * Since this is called from kvm_timer_vcpu_load(), a change between
+	 * vEL2 and vEL1/0 will have just happened, and the timer_map will
+	 * represent this, and therefore we switch the emul/direct mappings
+	 * below.
+	 */
+	hw = kvm_vgic_get_map(vcpu, map->direct_vtimer->irq.irq);
+	if (hw < 0) {
+		kvm_vgic_unmap_phys_irq(vcpu, map->emul_vtimer->irq.irq);
+		kvm_vgic_unmap_phys_irq(vcpu, map->emul_ptimer->irq.irq);
+
+		ret = kvm_vgic_map_phys_irq(vcpu,
+					    map->direct_vtimer->host_timer_irq,
+					    map->direct_vtimer->irq.irq,
+					    &arch_timer_irq_ops);
+		WARN_ON_ONCE(ret);
+		ret = kvm_vgic_map_phys_irq(vcpu,
+					    map->direct_ptimer->host_timer_irq,
+					    map->direct_ptimer->irq.irq,
+					    &arch_timer_irq_ops);
+		WARN_ON_ONCE(ret);
+
+		/*
+		 * The virtual offset behaviour is "interesting", as it
+		 * always applies when HCR_EL2.E2H==0, but only when
+		 * accessed from EL1 when HCR_EL2.E2H==1. So make sure we
+		 * track E2H when putting the HV timer in "direct" mode.
+		 */
+		if (map->direct_vtimer == vcpu_hvtimer(vcpu)) {
+			struct arch_timer_offset *offs = &map->direct_vtimer->offset;
+
+			if (vcpu_el2_e2h_is_set(vcpu))
+				offs->vcpu_offset = NULL;
+			else
+				offs->vcpu_offset = &__vcpu_sys_reg(vcpu, CNTVOFF_EL2);
+		}
+	}
+}
+
 static void timer_set_traps(struct kvm_vcpu *vcpu, struct timer_map *map)
 {
 	bool tpt, tpc;
@@ -691,6 +794,21 @@ static void timer_set_traps(struct kvm_vcpu *vcpu, struct timer_map *map)
 		tpt = tpc = true;
 
 	/*
+	 * Apply the enable bits that the guest hypervisor has requested for
+	 * its own guest. We can only add traps that wouldn't have been set
+	 * above.
+	 */
+	if (vcpu_has_nv(vcpu) && !is_hyp_ctxt(vcpu)) {
+		u64 val = __vcpu_sys_reg(vcpu, CNTHCTL_EL2);
+
+		/* Use the VHE format for mental sanity */
+		if (!vcpu_el2_e2h_is_set(vcpu))
+			val = (val & (CNTHCTL_EL1PCEN | CNTHCTL_EL1PCTEN)) << 10;
+
+		tpt |= !(val & (CNTHCTL_EL1PCEN << 10));
+		tpc |= !(val & (CNTHCTL_EL1PCTEN << 10));
+	}
+
+	/*
 	 * Now that we have collected our requirements, compute the
 	 * trap and enable bits.
@@ -716,6 +834,9 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
 	get_timer_map(vcpu, &map);
 
 	if (static_branch_likely(&has_gic_active_state)) {
+		if (vcpu_has_nv(vcpu))
+			kvm_timer_vcpu_load_nested_switch(vcpu, &map);
+
 		kvm_timer_vcpu_load_gic(map.direct_vtimer);
 		if (map.direct_ptimer)
 			kvm_timer_vcpu_load_gic(map.direct_ptimer);
@@ -728,6 +849,8 @@ void kvm_timer_vcpu_load(struct kvm_vcpu *vcpu)
 	timer_restore_state(map.direct_vtimer);
 	if (map.direct_ptimer)
 		timer_restore_state(map.direct_ptimer);
+	if (map.emul_vtimer)
+		timer_emulate(map.emul_vtimer);
 	if (map.emul_ptimer)
 		timer_emulate(map.emul_ptimer);
 
@@ -774,6 +897,8 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu)
 	 * In any case, we re-schedule the hrtimer for the physical timer when
 	 * coming back to the VCPU thread in kvm_timer_vcpu_load().
 	 */
+	if (map.emul_vtimer)
+		soft_timer_cancel(&map.emul_vtimer->hrtimer);
 	if (map.emul_ptimer)
 		soft_timer_cancel(&map.emul_ptimer->hrtimer);
 
@@ -826,6 +951,17 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
 	for (int i = 0; i < NR_KVM_TIMERS; i++)
 		timer_set_ctl(vcpu_get_timer(vcpu, i), 0);
 
+	/*
+	 * A vcpu running at EL2 is in charge of the offset applied to
+	 * the virtual timer, so use the physical VM offset, and point
+	 * the vcpu offset to CNTVOFF_EL2.
+	 */
+	if (vcpu_has_nv(vcpu)) {
+		struct arch_timer_offset *offs = &vcpu_vtimer(vcpu)->offset;
+
+		offs->vcpu_offset = &__vcpu_sys_reg(vcpu, CNTVOFF_EL2);
+		offs->vm_offset = &vcpu->kvm->arch.offsets.poffset;
+	}
 
 	if (timer->enabled) {
 		for (int i = 0; i < NR_KVM_TIMERS; i++)
@@ -839,6 +975,8 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
 		}
 	}
 
+	if (map.emul_vtimer)
+		soft_timer_cancel(&map.emul_vtimer->hrtimer);
 	if (map.emul_ptimer)
 		soft_timer_cancel(&map.emul_ptimer->hrtimer);
 
@@ -863,9 +1001,11 @@ static void timer_context_init(struct kvm_vcpu *vcpu, int timerid)
 
 	switch (timerid) {
 	case TIMER_PTIMER:
+	case TIMER_HPTIMER:
 		ctxt->host_timer_irq = host_ptimer_irq;
 		break;
 	case TIMER_VTIMER:
+	case TIMER_HVTIMER:
 		ctxt->host_timer_irq = host_vtimer_irq;
 		break;
 	}
@@ -1010,6 +1150,10 @@ static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
 		val = kvm_phys_timer_read() - timer_get_offset(timer);
 		break;
 
+	case TIMER_REG_VOFF:
+		val = *timer->offset.vcpu_offset;
+		break;
+
 	default:
 		BUG();
 	}
@@ -1028,7 +1172,7 @@ u64 kvm_arm_timer_read_sysreg(struct kvm_vcpu *vcpu,
 	get_timer_map(vcpu, &map);
 	timer = vcpu_get_timer(vcpu, tmr);
 
-	if (timer == map.emul_ptimer)
+	if (timer == map.emul_vtimer || timer == map.emul_ptimer)
 		return kvm_arm_timer_read(vcpu, timer, treg);
 
 	preempt_disable();
@@ -1060,6 +1204,10 @@ static void kvm_arm_timer_write(struct kvm_vcpu *vcpu,
 		timer_set_cval(timer, val);
 		break;
 
+	case TIMER_REG_VOFF:
+		*timer->offset.vcpu_offset = val;
+		break;
+
 	default:
 		BUG();
 	}
@@ -1075,7 +1223,7 @@ void kvm_arm_timer_write_sysreg(struct kvm_vcpu *vcpu,
 
 	get_timer_map(vcpu, &map);
 	timer = vcpu_get_timer(vcpu, tmr);
-	if (timer == map.emul_ptimer) {
+	if (timer == map.emul_vtimer || timer == map.emul_ptimer) {
 		soft_timer_cancel(&timer->hrtimer);
 		kvm_arm_timer_write(vcpu, timer, treg, val);
 		timer_emulate(timer);
@@ -1155,10 +1303,6 @@ static const struct irq_domain_ops timer_domain_ops = {
 	.free	= timer_irq_domain_free,
 };
 
-static struct irq_ops arch_timer_irq_ops = {
-	.get_input_level = kvm_arch_timer_get_input_level,
-};
-
 static void kvm_irq_fixup_flags(unsigned int virq, u32 *flags)
 {
 	*flags = irq_get_trigger_type(virq);
@@ -1300,7 +1444,7 @@ void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu)
 
 static bool timer_irqs_are_valid(struct kvm_vcpu *vcpu)
 {
-	int vtimer_irq, ptimer_irq, ret;
+	int vtimer_irq, ptimer_irq, hvtimer_irq, hptimer_irq, ret;
 	unsigned long i;
 
 	vtimer_irq = vcpu_vtimer(vcpu)->irq.irq;
@@ -1313,16 +1457,28 @@ static bool timer_irqs_are_valid(struct kvm_vcpu *vcpu)
 	if (ret)
 		return false;
 
+	hvtimer_irq = vcpu_hvtimer(vcpu)->irq.irq;
+	ret = kvm_vgic_set_owner(vcpu, hvtimer_irq, vcpu_hvtimer(vcpu));
+	if (ret)
+		return false;
+
+	hptimer_irq = vcpu_hptimer(vcpu)->irq.irq;
+	ret = kvm_vgic_set_owner(vcpu, hptimer_irq, vcpu_hptimer(vcpu));
+	if (ret)
+		return false;
+
 	kvm_for_each_vcpu(i, vcpu, vcpu->kvm) {
 		if (vcpu_vtimer(vcpu)->irq.irq != vtimer_irq ||
-		    vcpu_ptimer(vcpu)->irq.irq != ptimer_irq)
+		    vcpu_ptimer(vcpu)->irq.irq != ptimer_irq ||
+		    vcpu_hvtimer(vcpu)->irq.irq != hvtimer_irq ||
+		    vcpu_hptimer(vcpu)->irq.irq != hptimer_irq)
 			return false;
 	}
 
 	return true;
 }
 
-bool kvm_arch_timer_get_input_level(int vintid)
+static bool kvm_arch_timer_get_input_level(int vintid)
 {
 	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 	struct arch_timer_context *timer;
@@ -1334,6 +1490,10 @@ bool kvm_arch_timer_get_input_level(int vintid)
 		timer = vcpu_vtimer(vcpu);
 	else if (vintid == vcpu_ptimer(vcpu)->irq.irq)
 		timer = vcpu_ptimer(vcpu);
+	else if (vintid == vcpu_hvtimer(vcpu)->irq.irq)
+		timer = vcpu_hvtimer(vcpu);
+	else if (vintid == vcpu_hptimer(vcpu)->irq.irq)
+		timer = vcpu_hptimer(vcpu);
 	else
 		BUG();
 
@@ -1403,6 +1563,7 @@ static void set_timer_irqs(struct kvm *kvm, int vtimer_irq, int ptimer_irq)
 	kvm_for_each_vcpu(i, vcpu, kvm) {
 		vcpu_vtimer(vcpu)->irq.irq = vtimer_irq;
 		vcpu_ptimer(vcpu)->irq.irq = ptimer_irq;
+		/* TODO: Add support for hv/hp timers */
 	}
 }
 
@@ -1413,6 +1574,8 @@ int kvm_arm_timer_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 	struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
 	int irq;
 
+	/* TODO: Add support for hv/hp timers */
+
 	if (!irqchip_in_kernel(vcpu->kvm))
 		return -EINVAL;
 
@@ -1445,6 +1608,8 @@ int kvm_arm_timer_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 	struct arch_timer_context *timer;
 	int irq;
 
+	/* TODO: Add support for hv/hp timers */
+
 	switch (attr->attr) {
 	case KVM_ARM_VCPU_TIMER_IRQ_VTIMER:
 		timer = vcpu_vtimer(vcpu);
diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h
index f3e46a976125..6ce5c025218d 100644
--- a/arch/arm64/kvm/trace_arm.h
+++ b/arch/arm64/kvm/trace_arm.h
@@ -206,6 +206,7 @@ TRACE_EVENT(kvm_get_timer_map,
 		__field(	unsigned long,		vcpu_id	)
 		__field(	int,			direct_vtimer	)
 		__field(	int,			direct_ptimer	)
+		__field(	int,			emul_vtimer	)
 		__field(	int,			emul_ptimer	)
 	),
 
@@ -214,14 +215,17 @@ TRACE_EVENT(kvm_get_timer_map,
 		__entry->direct_vtimer		= arch_timer_ctx_index(map->direct_vtimer);
 		__entry->direct_ptimer =
 			(map->direct_ptimer) ? arch_timer_ctx_index(map->direct_ptimer) : -1;
+		__entry->emul_vtimer =
+			(map->emul_vtimer) ? arch_timer_ctx_index(map->emul_vtimer) : -1;
 		__entry->emul_ptimer =
 			(map->emul_ptimer) ? arch_timer_ctx_index(map->emul_ptimer) : -1;
 	),
 
-	TP_printk("VCPU: %ld, dv: %d, dp: %d, ep: %d",
+	TP_printk("VCPU: %ld, dv: %d, dp: %d, ev: %d, ep: %d",
 		  __entry->vcpu_id,
 		  __entry->direct_vtimer,
 		  __entry->direct_ptimer,
+		  __entry->emul_vtimer,
 		  __entry->emul_ptimer)
 );
 
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index d97e6080b421..ae491ef97188 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -573,6 +573,21 @@ int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid)
 	return 0;
 }
 
+int kvm_vgic_get_map(struct kvm_vcpu *vcpu, unsigned int vintid)
+{
+	struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, vintid);
+	unsigned long flags;
+	int ret = -1;
+
+	raw_spin_lock_irqsave(&irq->irq_lock, flags);
+	if (irq->hw)
+		ret = irq->hwintid;
+	raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
+
+	vgic_put_irq(vcpu->kvm, irq);
+	return ret;
+}
+
 /**
  * kvm_vgic_set_owner - Set the owner of an interrupt for a VM
  *
diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
index e76e513b90c5..8ccd90b45587 100644
--- a/include/kvm/arm_arch_timer.h
+++ b/include/kvm/arm_arch_timer.h
@@ -13,6 +13,8 @@
 enum kvm_arch_timers {
 	TIMER_PTIMER,
 	TIMER_VTIMER,
+	TIMER_HVTIMER,
+	TIMER_HPTIMER,
 	NR_KVM_TIMERS
 };
 
@@ -21,6 +23,7 @@ enum kvm_arch_timer_regs {
 	TIMER_REG_CVAL,
 	TIMER_REG_TVAL,
 	TIMER_REG_CTL,
+	TIMER_REG_VOFF,
 };
 
 struct arch_timer_offset {
@@ -69,6 +72,7 @@ struct arch_timer_context {
 struct timer_map {
 	struct arch_timer_context *direct_vtimer;
 	struct arch_timer_context *direct_ptimer;
+	struct arch_timer_context *emul_vtimer;
 	struct arch_timer_context *emul_ptimer;
 };
 
@@ -105,12 +109,12 @@ void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu);
 
 void kvm_timer_init_vhe(void);
 
-bool kvm_arch_timer_get_input_level(int vintid);
-
 #define vcpu_timer(v)	(&(v)->arch.timer_cpu)
 #define vcpu_get_timer(v,t)	(&vcpu_timer(v)->timers[(t)])
 #define vcpu_vtimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_VTIMER])
 #define vcpu_ptimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_PTIMER])
+#define vcpu_hvtimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_HVTIMER])
+#define vcpu_hptimer(v)	(&(v)->arch.timer_cpu.timers[TIMER_HPTIMER])
 
 #define arch_timer_ctx_index(ctx)	((ctx) - vcpu_timer((ctx)->vcpu)->timers)
 
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index d3ad51fde9db..402b545959af 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -380,6 +380,7 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int intid,
 int kvm_vgic_map_phys_irq(struct kvm_vcpu *vcpu, unsigned int host_irq,
 			  u32 vintid, struct irq_ops *ops);
 int kvm_vgic_unmap_phys_irq(struct kvm_vcpu *vcpu, unsigned int vintid);
+int kvm_vgic_get_map(struct kvm_vcpu *vcpu, unsigned int vintid);
 bool kvm_vgic_map_is_active(struct kvm_vcpu *vcpu, unsigned int vintid);
 
 int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH 14/16] KVM: arm64: selftests: Add physical timer registers to the sysreg list
  2023-02-16 14:21 [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Marc Zyngier
                   ` (12 preceding siblings ...)
  2023-02-16 14:21 ` [PATCH 13/16] KVM: arm64: nv: timers: Support hyp timer emulation Marc Zyngier
@ 2023-02-16 14:21 ` Marc Zyngier
  2023-02-16 14:21 ` [PATCH 15/16] KVM: arm64: selftests: Augment existing timer test to handle variable offsets Marc Zyngier
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 55+ messages in thread
From: Marc Zyngier @ 2023-02-16 14:21 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Ricardo Koller, Simon Veith, dwmw2

Now that KVM exposes CNTPCT_EL0, CNTP_CTL_EL0 and CNTP_CVAL_EL0 to
userspace, add them to the get-reg-list selftest.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 tools/testing/selftests/kvm/aarch64/get-reg-list.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/aarch64/get-reg-list.c b/tools/testing/selftests/kvm/aarch64/get-reg-list.c
index d287dd2cac0a..1b976b333d2c 100644
--- a/tools/testing/selftests/kvm/aarch64/get-reg-list.c
+++ b/tools/testing/selftests/kvm/aarch64/get-reg-list.c
@@ -651,7 +651,7 @@ int main(int ac, char **av)
  * The current blessed list was primed with the output of kernel version
  * v4.15 with --core-reg-fixup and then later updated with new registers.
  *
- * The blessed list is up to date with kernel version v5.13-rc3
+ * The blessed list is up to date with kernel version v6.4 (or so we hope)
  */
 static __u64 base_regs[] = {
 	KVM_REG_ARM64 | KVM_REG_SIZE_U64 | KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(regs.regs[0]),
@@ -858,6 +858,9 @@ static __u64 base_regs[] = {
 	ARM64_SYS_REG(3, 2, 0, 0, 0),	/* CSSELR_EL1 */
 	ARM64_SYS_REG(3, 3, 13, 0, 2),	/* TPIDR_EL0 */
 	ARM64_SYS_REG(3, 3, 13, 0, 3),	/* TPIDRRO_EL0 */
+	ARM64_SYS_REG(3, 3, 14, 0, 1),	/* CNTPCT_EL0 */
+	ARM64_SYS_REG(3, 3, 14, 2, 1),	/* CNTP_CTL_EL0 */
+	ARM64_SYS_REG(3, 3, 14, 2, 2),	/* CNTP_CVAL_EL0 */
 	ARM64_SYS_REG(3, 4, 3, 0, 0),	/* DACR32_EL2 */
 	ARM64_SYS_REG(3, 4, 5, 0, 1),	/* IFSR32_EL2 */
 	ARM64_SYS_REG(3, 4, 5, 3, 0),	/* FPEXC32_EL2 */
-- 
2.34.1



* [PATCH 15/16] KVM: arm64: selftests: Augment existing timer test to handle variable offsets
  2023-02-16 14:21 [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Marc Zyngier
                   ` (13 preceding siblings ...)
  2023-02-16 14:21 ` [PATCH 14/16] KVM: arm64: selftests: Add physical timer registers to the sysreg list Marc Zyngier
@ 2023-02-16 14:21 ` Marc Zyngier
  2023-03-06 22:08   ` Colton Lewis
  2023-02-16 14:21 ` [PATCH 16/16] KVM: arm64: selftests: Deal with spurious timer interrupts Marc Zyngier
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 55+ messages in thread
From: Marc Zyngier @ 2023-02-16 14:21 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Ricardo Koller, Simon Veith, dwmw2

Allow a user to specify the physical and virtual offsets on the
command-line.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 .../selftests/kvm/aarch64/arch_timer.c        | 23 ++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/aarch64/arch_timer.c b/tools/testing/selftests/kvm/aarch64/arch_timer.c
index 26556a266021..62af0e7d10b4 100644
--- a/tools/testing/selftests/kvm/aarch64/arch_timer.c
+++ b/tools/testing/selftests/kvm/aarch64/arch_timer.c
@@ -47,6 +47,7 @@ struct test_args {
 	int nr_iter;
 	int timer_period_ms;
 	int migration_freq_ms;
+	struct kvm_arm_counter_offsets offsets;
 };
 
 static struct test_args test_args = {
@@ -54,6 +55,7 @@ static struct test_args test_args = {
 	.nr_iter = NR_TEST_ITERS_DEF,
 	.timer_period_ms = TIMER_TEST_PERIOD_MS_DEF,
 	.migration_freq_ms = TIMER_TEST_MIGRATION_FREQ_MS,
+	.offsets = { .flags = 0 },
 };
 
 #define msecs_to_usecs(msec)		((msec) * 1000LL)
@@ -372,6 +374,13 @@ static struct kvm_vm *test_vm_create(void)
 	vm_init_descriptor_tables(vm);
 	vm_install_exception_handler(vm, VECTOR_IRQ_CURRENT, guest_irq_handler);
 
+	if (test_args.offsets.flags) {
+		__TEST_REQUIRE(kvm_has_cap(KVM_CAP_COUNTER_OFFSETS),
+			       "no support for global offsets");
+
+		vm_ioctl(vm, KVM_ARM_SET_CNT_OFFSETS, &test_args.offsets);
+	}
+
 	for (i = 0; i < nr_vcpus; i++)
 		vcpu_init_descriptor_tables(vcpus[i]);
 
@@ -403,6 +412,10 @@ static void test_print_help(char *name)
 		TIMER_TEST_PERIOD_MS_DEF);
 	pr_info("\t-m: Frequency (in ms) of vCPUs to migrate to different pCPU. 0 to turn off (default: %u)\n",
 		TIMER_TEST_MIGRATION_FREQ_MS);
+	pr_info("\t-o: Virtual counter offset (in counter cycles, default: %llu)\n",
+		test_args.offsets.virtual_offset);
+	pr_info("\t-O: Physical counter offset (in counter cycles, default: %llu)\n",
+		test_args.offsets.physical_offset);
 	pr_info("\t-h: print this help screen\n");
 }
 
@@ -410,7 +423,7 @@ static bool parse_args(int argc, char *argv[])
 {
 	int opt;
 
-	while ((opt = getopt(argc, argv, "hn:i:p:m:")) != -1) {
+	while ((opt = getopt(argc, argv, "hn:i:p:m:o:O:")) != -1) {
 		switch (opt) {
 		case 'n':
 			test_args.nr_vcpus = atoi_positive("Number of vCPUs", optarg);
@@ -429,6 +442,14 @@ static bool parse_args(int argc, char *argv[])
 		case 'm':
 			test_args.migration_freq_ms = atoi_non_negative("Frequency", optarg);
 			break;
+		case 'o':
+			test_args.offsets.virtual_offset = strtol(optarg, NULL, 0);
+			test_args.offsets.flags |= KVM_COUNTER_SET_VOFFSET_FLAG;
+			break;
+		case 'O':
+			test_args.offsets.physical_offset = strtol(optarg, NULL, 0);
+			test_args.offsets.flags |= KVM_COUNTER_SET_POFFSET_FLAG;
+			break;
 		case 'h':
 		default:
 			goto err;
-- 
2.34.1



* [PATCH 16/16] KVM: arm64: selftests: Deal with spurious timer interrupts
  2023-02-16 14:21 [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Marc Zyngier
                   ` (14 preceding siblings ...)
  2023-02-16 14:21 ` [PATCH 15/16] KVM: arm64: selftests: Augment existing timer test to handle variable offsets Marc Zyngier
@ 2023-02-16 14:21 ` Marc Zyngier
  2023-02-21 16:28 ` [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Veith, Simon
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 55+ messages in thread
From: Marc Zyngier @ 2023-02-16 14:21 UTC (permalink / raw)
  To: kvmarm, kvm, linux-arm-kernel
  Cc: James Morse, Suzuki K Poulose, Oliver Upton, Zenghui Yu,
	Ricardo Koller, Simon Veith, dwmw2

Make sure the timer test can properly handle a spurious timer
interrupt, something that is far from being unlikely.

Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 tools/testing/selftests/kvm/aarch64/arch_timer.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/testing/selftests/kvm/aarch64/arch_timer.c b/tools/testing/selftests/kvm/aarch64/arch_timer.c
index 62af0e7d10b4..3f6f0bfa68a2 100644
--- a/tools/testing/selftests/kvm/aarch64/arch_timer.c
+++ b/tools/testing/selftests/kvm/aarch64/arch_timer.c
@@ -158,6 +158,9 @@ static void guest_irq_handler(struct ex_regs *regs)
 	uint32_t cpu = guest_get_vcpuid();
 	struct test_vcpu_shared_data *shared_data = &vcpu_shared_data[cpu];
 
+	if (intid == IAR_SPURIOUS)
+		return;
+
 	guest_validate_irq(intid, shared_data);
 
 	WRITE_ONCE(shared_data->nr_iter, shared_data->nr_iter + 1);
-- 
2.34.1



* Re: [PATCH 08/16] KVM: arm64: timers: Allow userspace to set the counter offsets
  2023-02-16 14:21 ` [PATCH 08/16] KVM: arm64: timers: Allow userspace to set the counter offsets Marc Zyngier
@ 2023-02-16 22:09   ` Oliver Upton
  2023-02-17 10:17     ` Marc Zyngier
  2023-02-23 22:41   ` Colton Lewis
  1 sibling, 1 reply; 55+ messages in thread
From: Oliver Upton @ 2023-02-16 22:09 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, James Morse, Suzuki K Poulose,
	Zenghui Yu, Ricardo Koller, Simon Veith, dwmw2

Hi Marc,

On Thu, Feb 16, 2023 at 02:21:15PM +0000, Marc Zyngier wrote:
> And this is the moment you have all been waiting for: setting the
> counter offsets from userspace.
> 
> We expose a brand new capability that reports the ability to set
> the offsets for both the virtual and physical sides, independently.
> 
> In keeping with the architecture, the offsets are expressed as
> a delta that is subtracted from the physical counter value.
> 
> Once this new API is used, there is no going back, and the counters
> cannot be written to in order to set the offsets implicitly (the writes
> are instead ignored).

Is there any particular reason to use an explicit ioctl as opposed to
the KVM_{GET,SET}_DEVICE_ATTR ioctls? Dunno where you stand on it, but I
quite like that interface for simple state management. We also avoid
eating up more UAPI bits in the global namespace.

You could also split up the two offsets as individual attributes, but I
really couldn't care less. Its userspace's problem after all!

And on that note, any VMM patches to match this implementation? kvmtool
would suffice, just want something that runs a real VM and pokes these
ioctls.

> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_host.h |  4 +++
>  arch/arm64/include/uapi/asm/kvm.h | 15 ++++++++++
>  arch/arm64/kvm/arch_timer.c       | 46 +++++++++++++++++++++++++++----
>  arch/arm64/kvm/arm.c              |  8 ++++++
>  include/uapi/linux/kvm.h          |  3 ++
>  5 files changed, 71 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 3adac0c5e175..8514a37cf8d5 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -221,6 +221,8 @@ struct kvm_arch {
>  #define KVM_ARCH_FLAG_EL1_32BIT				4
>  	/* PSCI SYSTEM_SUSPEND enabled for the guest */
>  #define KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED		5
> +	/* VM counter offsets */
> +#define KVM_ARCH_FLAG_COUNTER_OFFSETS			6

nit: I'm not too bothered by the test_bit(...) mouthful when its used
sparingly but it may be nice to have a helper if it is repeated enough
times.

>  	unsigned long flags;
>  
> @@ -1010,6 +1012,8 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
>  
>  long kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
>  				struct kvm_arm_copy_mte_tags *copy_tags);
> +int kvm_vm_ioctl_set_counter_offsets(struct kvm *kvm,
> +				     struct kvm_arm_counter_offsets *offsets);
>  
>  /* Guest/host FPSIMD coordination helpers */
>  int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index f8129c624b07..2d7557a160bd 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -198,6 +198,21 @@ struct kvm_arm_copy_mte_tags {
>  	__u64 reserved[2];
>  };
>  
> +/*
> + * Counter/Timer offset structure. Describe the virtual/physical offsets.
> + * To be used with KVM_ARM_SET_CNT_OFFSETS.
> + */
> +struct kvm_arm_counter_offsets {
> +	__u64 virtual_offset;
> +	__u64 physical_offset;
> +
> +#define KVM_COUNTER_SET_VOFFSET_FLAG	(1UL << 0)
> +#define KVM_COUNTER_SET_POFFSET_FLAG	(1UL << 1)
> +
> +	__u64 flags;
> +	__u64 reserved;
> +};
> +
>  #define KVM_ARM_TAGS_TO_GUEST		0
>  #define KVM_ARM_TAGS_FROM_GUEST		1
>  
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index 444ea6dca218..b04544b702f9 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -852,9 +852,11 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
>  	ptimer->vcpu = vcpu;
>  	ptimer->offset.vm_offset = &vcpu->kvm->arch.offsets.poffset;
>  
> -	/* Synchronize cntvoff across all vtimers of a VM. */
> -	timer_set_offset(vtimer, kvm_phys_timer_read());
> -	timer_set_offset(ptimer, 0);
> +	/* Synchronize offsets across timers of a VM if not already provided */
> +	if (!test_bit(KVM_ARCH_FLAG_COUNTER_OFFSETS, &vcpu->kvm->arch.flags)) {
> +		timer_set_offset(vtimer, kvm_phys_timer_read());
> +		timer_set_offset(ptimer, 0);
> +	}
>  
>  	hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
>  	timer->bg_timer.function = kvm_bg_timer_expire;
> @@ -898,8 +900,11 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
>  		kvm_arm_timer_write(vcpu, timer, TIMER_REG_CTL, value);
>  		break;
>  	case KVM_REG_ARM_TIMER_CNT:
> -		timer = vcpu_vtimer(vcpu);
> -		timer_set_offset(timer, kvm_phys_timer_read() - value);
> +		if (!test_bit(KVM_ARCH_FLAG_COUNTER_OFFSETS,
> +			      &vcpu->kvm->arch.flags)) {
> +			timer = vcpu_vtimer(vcpu);
> +			timer_set_offset(timer, kvm_phys_timer_read() - value);
> +		}

If we've already got userspace to buy in to the new UAPI, should we
return an error instead of silently failing? Might be good to catch
misbehavior in the act, even if it is benign as this patch is written.

>  		break;
>  	case KVM_REG_ARM_TIMER_CVAL:
>  		timer = vcpu_vtimer(vcpu);
> @@ -909,6 +914,13 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
>  		timer = vcpu_ptimer(vcpu);
>  		kvm_arm_timer_write(vcpu, timer, TIMER_REG_CTL, value);
>  		break;
> +	case KVM_REG_ARM_PTIMER_CNT:
> +		if (!test_bit(KVM_ARCH_FLAG_COUNTER_OFFSETS,
> +			      &vcpu->kvm->arch.flags)) {
> +			timer = vcpu_ptimer(vcpu);
> +			timer_set_offset(timer, kvm_phys_timer_read() - value);
> +		}
> +		break;
>  	case KVM_REG_ARM_PTIMER_CVAL:
>  		timer = vcpu_ptimer(vcpu);
>  		kvm_arm_timer_write(vcpu, timer, TIMER_REG_CVAL, value);
> @@ -1446,3 +1458,27 @@ int kvm_arm_timer_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
>  
>  	return -ENXIO;
>  }
> +
> +int kvm_vm_ioctl_set_counter_offsets(struct kvm *kvm,
> +				     struct kvm_arm_counter_offsets *offsets)
> +{
> +	if (offsets->reserved ||
> +	    (offsets->flags & ~(KVM_COUNTER_SET_VOFFSET_FLAG |
> +				KVM_COUNTER_SET_POFFSET_FLAG)))
> +		return -EINVAL;
> +
> +	if (!lock_all_vcpus(kvm))
> +		return -EBUSY;

Is there any reason why we can't just order this ioctl before vCPU
creation altogether, or is there a need to do this at runtime? We're
about to tolerate multiple writers to the offset value, and I think the
only thing we need to guarantee is that the below flag is set before
vCPU ioctls have a chance to run.

Actually, the one benefit of enforcing a strict ordering is that you can
hide the old register indices completely from KVM_GET_REG_LIST to
provide further discouragement to userspace.

Otherwise, if you choose to keep it this way then the requirement to
have no vCPU ioctls in flight needs to be explicitly documented as that
is a bit of a tricky interface.

> +	set_bit(KVM_ARCH_FLAG_COUNTER_OFFSETS, &kvm->arch.flags);
> +
> +	if (offsets->flags & KVM_COUNTER_SET_VOFFSET_FLAG)
> +		kvm->arch.offsets.voffset = offsets->virtual_offset;
> +
> +	if (offsets->flags & KVM_COUNTER_SET_POFFSET_FLAG)
> +		kvm->arch.offsets.poffset = offsets->physical_offset;
> +
> +	unlock_all_vcpus(kvm);
> +
> +	return 0;
> +}

-- 
Thanks,
Oliver


* Re: [PATCH 08/16] KVM: arm64: timers: Allow userspace to set the counter offsets
  2023-02-16 22:09   ` Oliver Upton
@ 2023-02-17 10:17     ` Marc Zyngier
  2023-02-17 22:11       ` Oliver Upton
  0 siblings, 1 reply; 55+ messages in thread
From: Marc Zyngier @ 2023-02-17 10:17 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, kvm, linux-arm-kernel, James Morse, Suzuki K Poulose,
	Zenghui Yu, Ricardo Koller, Simon Veith, dwmw2

Hi Oliver,

On Thu, 16 Feb 2023 22:09:47 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> Hi Marc,
> 
> On Thu, Feb 16, 2023 at 02:21:15PM +0000, Marc Zyngier wrote:
> > And this is the moment you have all been waiting for: setting the
> > counter offsets from userspace.
> > 
> > We expose a brand new capability that reports the ability to set
> > the offsets for both the virtual and physical sides, independently.
> > 
> > In keeping with the architecture, the offsets are expressed as
> > a delta that is subtracted from the physical counter value.
> > 
> > Once this new API is used, there is no going back, and the counters
> > cannot be written to in order to set the offsets implicitly (the writes
> > are instead ignored).
> 
> Is there any particular reason to use an explicit ioctl as opposed to
> the KVM_{GET,SET}_DEVICE_ATTR ioctls? Dunno where you stand on it, but I
> quite like that interface for simple state management. We also avoid
> eating up more UAPI bits in the global namespace.

The problem with that is that it requires yet another KVM device for
this, and I'm lazy. It also makes it a bit harder for the VMM to buy
into this (need to track another FD, for example).

> You could also split up the two offsets as individual attributes, but I
> really couldn't care less. Its userspace's problem after all!

The current approach also allows you to deal with the two offsets
individually (you can update one or both as you see fit).

> 
> And on that note, any VMM patches to match this implementation? kvmtool
> would suffice, just want something that runs a real VM and pokes these
> ioctls.

I did hack kvmtool for that, but that's very uninteresting, as it
doesn't do any save/restore. I want to have a go at QEMU in my copious
spare time, stay tuned.

> 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/include/asm/kvm_host.h |  4 +++
> >  arch/arm64/include/uapi/asm/kvm.h | 15 ++++++++++
> >  arch/arm64/kvm/arch_timer.c       | 46 +++++++++++++++++++++++++++----
> >  arch/arm64/kvm/arm.c              |  8 ++++++
> >  include/uapi/linux/kvm.h          |  3 ++
> >  5 files changed, 71 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 3adac0c5e175..8514a37cf8d5 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -221,6 +221,8 @@ struct kvm_arch {
> >  #define KVM_ARCH_FLAG_EL1_32BIT				4
> >  	/* PSCI SYSTEM_SUSPEND enabled for the guest */
> >  #define KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED		5
> > +	/* VM counter offsets */
> > +#define KVM_ARCH_FLAG_COUNTER_OFFSETS			6
> 
> nit: I'm not too bothered by the test_bit(...) mouthful when its used
> sparingly but it may be nice to have a helper if it is repeated enough
> times.

3 times. I'll slap a helper in, can't hurt.

> 
> >  	unsigned long flags;
> >  
> > @@ -1010,6 +1012,8 @@ int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
> >  
> >  long kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
> >  				struct kvm_arm_copy_mte_tags *copy_tags);
> > +int kvm_vm_ioctl_set_counter_offsets(struct kvm *kvm,
> > +				     struct kvm_arm_counter_offsets *offsets);
> >  
> >  /* Guest/host FPSIMD coordination helpers */
> >  int kvm_arch_vcpu_run_map_fp(struct kvm_vcpu *vcpu);
> > diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> > index f8129c624b07..2d7557a160bd 100644
> > --- a/arch/arm64/include/uapi/asm/kvm.h
> > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > @@ -198,6 +198,21 @@ struct kvm_arm_copy_mte_tags {
> >  	__u64 reserved[2];
> >  };
> >  
> > +/*
> > + * Counter/Timer offset structure. Describe the virtual/physical offsets.
> > + * To be used with KVM_ARM_SET_CNT_OFFSETS.
> > + */
> > +struct kvm_arm_counter_offsets {
> > +	__u64 virtual_offset;
> > +	__u64 physical_offset;
> > +
> > +#define KVM_COUNTER_SET_VOFFSET_FLAG	(1UL << 0)
> > +#define KVM_COUNTER_SET_POFFSET_FLAG	(1UL << 1)
> > +
> > +	__u64 flags;
> > +	__u64 reserved;
> > +};
> > +
> >  #define KVM_ARM_TAGS_TO_GUEST		0
> >  #define KVM_ARM_TAGS_FROM_GUEST		1
> >  
> > diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> > index 444ea6dca218..b04544b702f9 100644
> > --- a/arch/arm64/kvm/arch_timer.c
> > +++ b/arch/arm64/kvm/arch_timer.c
> > @@ -852,9 +852,11 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
> >  	ptimer->vcpu = vcpu;
> >  	ptimer->offset.vm_offset = &vcpu->kvm->arch.offsets.poffset;
> >  
> > -	/* Synchronize cntvoff across all vtimers of a VM. */
> > -	timer_set_offset(vtimer, kvm_phys_timer_read());
> > -	timer_set_offset(ptimer, 0);
> > +	/* Synchronize offsets across timers of a VM if not already provided */
> > +	if (!test_bit(KVM_ARCH_FLAG_COUNTER_OFFSETS, &vcpu->kvm->arch.flags)) {
> > +		timer_set_offset(vtimer, kvm_phys_timer_read());
> > +		timer_set_offset(ptimer, 0);
> > +	}
> >  
> >  	hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
> >  	timer->bg_timer.function = kvm_bg_timer_expire;
> > @@ -898,8 +900,11 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
> >  		kvm_arm_timer_write(vcpu, timer, TIMER_REG_CTL, value);
> >  		break;
> >  	case KVM_REG_ARM_TIMER_CNT:
> > -		timer = vcpu_vtimer(vcpu);
> > -		timer_set_offset(timer, kvm_phys_timer_read() - value);
> > +		if (!test_bit(KVM_ARCH_FLAG_COUNTER_OFFSETS,
> > +			      &vcpu->kvm->arch.flags)) {
> > +			timer = vcpu_vtimer(vcpu);
> > +			timer_set_offset(timer, kvm_phys_timer_read() - value);
> > +		}
> 
> If we've already got userspace to buy in to the new UAPI, should we
> return an error instead of silently failing? Might be good to catch
> misbehavior in the act, even if it is benign as this patch is
> written.

I don't necessarily see it as a misbehaviour. I'm trying to make it
cheap for people to integrate this into their flow, and no two VMMs do
that the same way (plus, I only have QEMU as an example -- I'm sure
Amazon, Google, and whoever else do it in very different ways).

It is also pretty annoying, as the VMM would then have to filter the
registers it restores, which is extra work. Instead, the VMM knows
that, having used the new API, the registers become WI (writes are
ignored).
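To make the intended flow concrete, here is a sketch of the restore-side arithmetic a VMM would do before issuing the new vm ioctl. The struct layout and flag names are copied from the patch above (with `__u64` mapped to `uint64_t`); `compute_offset()`/`guest_counter()` and the values are invented for illustration, and the actual ioctl(2) call is omitted since the ioctl number is not final:

```c
#include <assert.h>
#include <stdint.h>

/* UAPI sketch, mirroring the proposed struct above. */
#define KVM_COUNTER_SET_VOFFSET_FLAG	(1ULL << 0)
#define KVM_COUNTER_SET_POFFSET_FLAG	(1ULL << 1)

struct kvm_arm_counter_offsets {
	uint64_t virtual_offset;
	uint64_t physical_offset;
	uint64_t flags;		/* which offsets to set */
	uint64_t reserved;	/* must be zero */
};

/*
 * Offsets are subtracted from the physical counter, so restoring a
 * saved guest counter value means: offset = host_now - guest_saved.
 * Unsigned wrap-around does the right thing if the guest is "ahead".
 */
static uint64_t compute_offset(uint64_t host_now, uint64_t guest_saved)
{
	return host_now - guest_saved;
}

/* What the guest then reads, given the same host counter value. */
static uint64_t guest_counter(uint64_t host_now, uint64_t offset)
{
	return host_now - offset;
}
```

The VMM would fill in both offsets, set both flags, leave `reserved` at zero, and issue the vm-scoped ioctl once, instead of writing the counter registers through KVM_SET_ONE_REG on each vcpu.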

> 
> >  		break;
> >  	case KVM_REG_ARM_TIMER_CVAL:
> >  		timer = vcpu_vtimer(vcpu);
> > @@ -909,6 +914,13 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
> >  		timer = vcpu_ptimer(vcpu);
> >  		kvm_arm_timer_write(vcpu, timer, TIMER_REG_CTL, value);
> >  		break;
> > +	case KVM_REG_ARM_PTIMER_CNT:
> > +		if (!test_bit(KVM_ARCH_FLAG_COUNTER_OFFSETS,
> > +			      &vcpu->kvm->arch.flags)) {
> > +			timer = vcpu_ptimer(vcpu);
> > +			timer_set_offset(timer, kvm_phys_timer_read() - value);
> > +		}
> > +		break;
> >  	case KVM_REG_ARM_PTIMER_CVAL:
> >  		timer = vcpu_ptimer(vcpu);
> >  		kvm_arm_timer_write(vcpu, timer, TIMER_REG_CVAL, value);
> > @@ -1446,3 +1458,27 @@ int kvm_arm_timer_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
> >  
> >  	return -ENXIO;
> >  }
> > +
> > +int kvm_vm_ioctl_set_counter_offsets(struct kvm *kvm,
> > +				     struct kvm_arm_counter_offsets *offsets)
> > +{
> > +	if (offsets->reserved ||
> > +	    (offsets->flags & ~(KVM_COUNTER_SET_VOFFSET_FLAG |
> > +				KVM_COUNTER_SET_POFFSET_FLAG)))
> > +		return -EINVAL;
> > +
> > +	if (!lock_all_vcpus(kvm))
> > +		return -EBUSY;
> 
> Is there any reason why we can't just order this ioctl before vCPU
> creation altogether, or is there a need to do this at runtime? We're
> about to tolerate multiple writers to the offset value, and I think the
> only thing we need to guarantee is that the below flag is set before
> vCPU ioctls have a chance to run.

Again, we don't know for sure whether the final offset is available
before vcpu creation time. My idea for QEMU would be to perform the
offset adjustment as late as possible, right before executing the VM,
after having restored the vcpus with whatever value they had.

> Actually, the one benefit of enforcing a strict ordering is that you can
> hide the old register indices completely from KVM_GET_REG_LIST to
> provide further discouragement to userspace.

But if you do that, how do you know what offset to program on the
target? You need the VM's view of time. You could compute it from the
physical counter and the offset locally, but you then need to transfer
it across. That's a change in the wire protocol, which is far more
than I'm prepared to take on.

> Otherwise, if you choose to keep it this way then the requirement to
> have no vCPU ioctls in flight needs to be explicitly documented as that
> is a bit of a tricky interface.

Ah, good point. Completely missed that.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 08/16] KVM: arm64: timers: Allow userspace to set the counter offsets
  2023-02-17 10:17     ` Marc Zyngier
@ 2023-02-17 22:11       ` Oliver Upton
  2023-02-22 11:56         ` Marc Zyngier
  0 siblings, 1 reply; 55+ messages in thread
From: Oliver Upton @ 2023-02-17 22:11 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, James Morse, Suzuki K Poulose,
	Zenghui Yu, Ricardo Koller, Simon Veith, dwmw2

On Fri, Feb 17, 2023 at 10:17:27AM +0000, Marc Zyngier wrote:
> Hi Oliver,
> 
> On Thu, 16 Feb 2023 22:09:47 +0000,
> Oliver Upton <oliver.upton@linux.dev> wrote:
> > 
> > Hi Marc,
> > 
> > On Thu, Feb 16, 2023 at 02:21:15PM +0000, Marc Zyngier wrote:
> > > And this is the moment you have all been waiting for: setting the
> > > counter offsets from userspace.
> > > 
> > > We expose a brand new capability that reports the ability to set
> > > the offsets for both the virtual and physical sides, independently.
> > > 
> > > In keeping with the architecture, the offsets are expressed as
> > a delta that is subtracted from the physical counter value.
> > > 
> > Once this new API is used, there is no going back, and the counters
> > cannot be written to set the offsets implicitly (the writes
> > are instead ignored).
> > 
> > Is there any particular reason to use an explicit ioctl as opposed to
> > the KVM_{GET,SET}_DEVICE_ATTR ioctls? Dunno where you stand on it, but I
> > quite like that interface for simple state management. We also avoid
> > eating up more UAPI bits in the global namespace.
> 
> The problem with that is that it requires yet another KVM device for
> this, and I'm lazy. It also makes it a bit harder for the VMM to buy
> into this (need to track another FD, for example).

You can also accept the device ioctls on the actual VM FD, quite like
we do for the vCPU right now. And hey, I've got a patch that gets you
most of the way there!

https://lore.kernel.org/kvmarm/20230211013759.3556016-3-oliver.upton@linux.dev/

> > Is there any reason why we can't just order this ioctl before vCPU
> > creation altogether, or is there a need to do this at runtime? We're
> > about to tolerate multiple writers to the offset value, and I think the
> > only thing we need to guarantee is that the below flag is set before
> > vCPU ioctls have a chance to run.
> 
> Again, we don't know for sure whether the final offset is available
> before vcpu creation time. My idea for QEMU would be to perform the
> offset adjustment as late as possible, right before executing the VM,
> after having restored the vcpus with whatever value they had.

So how does userspace work out an offset based on available information?
The part that hasn't clicked for me yet is where userspace gets the
current value of the true physical counter to calculate an offset.

We could make it ABI that the guest's physical counter matches that of
the host by default. Of course, that has been the case since the
beginning of time but it is now directly user-visible.

The only part I don't like about that is that we aren't fully creating
an abstraction around host and guest system time. So here's my current
mental model of how we represent the generic timer to userspace:

                                +-----------------------+
                                |                       |
                                | Host System Counter   |
                                |          (1)          |
                                +-----------------------+
                                           |
                               +-----------+-----------+
                               |                       |
       +-----------------+  +-----+                 +-----+  +--------------------+
       | (2) CNTPOFF_EL2 |--| sub |                 | sub |--| (3) CNTVOFF_EL2    |
       +-----------------+  +-----+                 +-----+  +--------------------+
                               |                       |
                               |                       |
                     +-----------------+         +----------------+
                     | (5) CNTPCT_EL0  |         | (4) CNTVCT_EL0 |
                     +-----------------+         +----------------+

AFAICT, this UAPI exposes abstractions for (2) and (3) to userspace, but
userspace cannot directly get at (1).

Chewing on this a bit more, I don't think userspace has any business
messing with virtual and physical time independently, especially when
nested virtualization comes into play.

I think the illusion to userspace needs to be built around the notion of
a system counter:

                                +-----------------------+
                                |                       |
                                | Host System Counter   |
                                |          (1)          |
                                +-----------------------+
                                           |
                                           |
                                        +-----+   +-------------------+
                                        | sub |---| (6) system_offset |
                                        +-----+   +-------------------+
                                           |
                                           |
                                +-----------------------+
                                |                       |
                                | Guest System Counter  |
                                |          (7)          |
                                +-----------------------+
                                           |
                               +-----------+-----------+
                               |                       |
       +-----------------+  +-----+                 +-----+  +--------------------+
       | (2) CNTPOFF_EL2 |--| sub |                 | sub |--| (3) CNTVOFF_EL2    |
       +-----------------+  +-----+                 +-----+  +--------------------+
                               |                       |
                               |                       |
                     +-----------------+         +----------------+
                     | (5) CNTPCT_EL0  |         | (4) CNTVCT_EL0 |
                     +-----------------+         +----------------+

And from a UAPI perspective, we would either expose (1) and (6) to let
userspace calculate an offset or simply allow (7) to be directly
read/written.

That frees up the meaning of the counter offsets as being purely a
virtual EL2 thing. These registers would reset to 0, and non-NV guests
could never change their value.

Under the hood KVM would program the true offset registers as:

	CNT{P,V}OFF_EL2 = 'virtual CNT{P,V}OFF_EL2' + system_offset

With this we would effectively configure CNTPCT = CNTVCT = 0 at the
point of VM creation. The only crappy thing is that it requires full
physical counter/timer emulation for non-ECV systems, but the guest
shouldn't be using the physical counter in the first place.
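Written out as plain arithmetic (a model of the proposal only, not KVM code; the struct and function names are invented, and the numbers in the comments refer to the diagram above):

```c
#include <assert.h>
#include <stdint.h>

/* One VM-wide offset (6) plus the guest hypervisor's own offsets (2)/(3). */
struct timer_offsets {
	uint64_t system_offset;	/* (6): host (1) -> guest (7) system counter */
	uint64_t cntvoff_el2;	/* (3): virtual EL2's CNTVOFF, 0 for non-NV  */
	uint64_t cntpoff_el2;	/* (2): virtual EL2's CNTPOFF, 0 for non-NV  */
};

/* (7): the guest system counter derived from the host counter (1). */
static uint64_t guest_system_counter(uint64_t host_cnt,
				     const struct timer_offsets *o)
{
	return host_cnt - o->system_offset;
}

/*
 * (4)/(5): the effective offset KVM would program into the real
 * registers is the sum of the two layers, i.e.
 * CNT{P,V}OFF_EL2 = 'virtual CNT{P,V}OFF_EL2' + system_offset.
 */
static uint64_t guest_cntvct(uint64_t host_cnt, const struct timer_offsets *o)
{
	return host_cnt - (o->system_offset + o->cntvoff_el2);
}

static uint64_t guest_cntpct(uint64_t host_cnt, const struct timer_offsets *o)
{
	return host_cnt - (o->system_offset + o->cntpoff_el2);
}
```

With system_offset initialised to the host counter value at VM creation, both guest counters start at 0; a non-NV guest, which can never change (2)/(3), only ever sees the system_offset layer.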

Yes, this sucks for guests running on hosts w/ NV but not ECV. If anyone
can tell me how an L0 hypervisor is supposed to do NV without ECV, I'm
all ears.

Does any of what I've written make even remote sense, or have I gone
entirely off the rails with my ASCII art? :)

-- 
Thanks,
Oliver


* Re: [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit
  2023-02-16 14:21 [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Marc Zyngier
                   ` (15 preceding siblings ...)
  2023-02-16 14:21 ` [PATCH 16/16] KVM: arm64: selftests: Deal with spurious timer interrupts Marc Zyngier
@ 2023-02-21 16:28 ` Veith, Simon
  2023-02-21 22:17   ` Marc Zyngier
  2023-02-23 22:29 ` Colton Lewis
  2023-02-24 20:07 ` Colton Lewis
  18 siblings, 1 reply; 55+ messages in thread
From: Veith, Simon @ 2023-02-21 16:28 UTC (permalink / raw)
  To: maz, kvmarm, kvm, linux-arm-kernel
  Cc: dwmw2, yuzenghui, suzuki.poulose, james.morse, oliver.upton, ricarkol

Hello Marc,

On Thu, 2023-02-16 at 14:21 +0000, Marc Zyngier wrote:
> This series aims at satisfying multiple goals:
> 
> - allow a VMM to atomically restore a timer offset for a whole VM
>   instead of updating the offset each time a vcpu gets its counter
>   written
> 
> - allow a VMM to save/restore the physical timer context, something
>   that we cannot do at the moment due to the lack of offsetting
> 
> - provide a framework that is suitable for NV support, where we get
>   both global and per timer, per vcpu offsetting

Thank you so much for following up on my (admittedly very basic) patch
with your own proposal!

> This has been moderately tested with nVHE, VHE and NV. I do not have
> access to CNTPOFF-aware HW, so the jury is still out on that one

Same here about CNTPOFF -- I gave it a quick spin on Graviton2 and
Graviton3, and neither chip claims the ARM64_HAS_ECV_CNTPOFF capability
from your patch.

I am working on testing your series with our userspace and will report
back.

Thanks
Simon



Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879




* Re: [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit
  2023-02-21 16:28 ` [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Veith, Simon
@ 2023-02-21 22:17   ` Marc Zyngier
  0 siblings, 0 replies; 55+ messages in thread
From: Marc Zyngier @ 2023-02-21 22:17 UTC (permalink / raw)
  To: Veith, Simon
  Cc: kvmarm, kvm, linux-arm-kernel, dwmw2, yuzenghui, suzuki.poulose,
	james.morse, oliver.upton, ricarkol

Hi Simon,

On Tue, 21 Feb 2023 16:28:36 +0000,
"Veith, Simon" <sveith@amazon.de> wrote:
> 
> Hello Marc,
> 
> On Thu, 2023-02-16 at 14:21 +0000, Marc Zyngier wrote:
> > This series aims at satisfying multiple goals:
> > 
> > - allow a VMM to atomically restore a timer offset for a whole VM
> >   instead of updating the offset each time a vcpu gets its counter
> >   written
> > 
> > - allow a VMM to save/restore the physical timer context, something
> >   that we cannot do at the moment due to the lack of offsetting
> > 
> > - provide a framework that is suitable for NV support, where we get
> >   both global and per timer, per vcpu offsetting
> 
> Thank you so much for following up on my (admittedly very basic) patch
> with your own proposal!

No worries. There is nothing like "nerd sniping"... ;-)

> > This has been moderately tested with nVHE, VHE and NV. I do not have
> > access to CNTPOFF-aware HW, so the jury is still out on that one
> 
> Same here about CNTPOFF -- I gave it a quick spin on Graviton2 and
> Graviton3, and neither chip claims the ARM64_HAS_ECV_CNTPOFF capability
> from your patch.

Nah, G2/G3 are old stuff (v8.2 and v8.3+ respectively). You need at
least a v8.6 system, something that is made of the best Unobtainium
money can buy. M2 has ECV, but without CNTPOFF, which is just silly.

> I am working on testing your series with our userspace and will report
> back.

Thanks a lot, that'd be super helpful. Please don't get too attached
to the userspace interface though, as it is likely to change (Oliver
has an interesting suggestion to simplify it, but I need to convince
myself that it doesn't break migration of the physical timer).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 02/16] arm64: Add HAS_ECV_CNTPOFF capability
  2023-02-16 14:21 ` [PATCH 02/16] arm64: Add HAS_ECV_CNTPOFF capability Marc Zyngier
@ 2023-02-22  4:30   ` Reiji Watanabe
  2023-02-22 10:47     ` Marc Zyngier
  0 siblings, 1 reply; 55+ messages in thread
From: Reiji Watanabe @ 2023-02-22  4:30 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Ricardo Koller, Simon Veith, dwmw2

Hi Marc,

On Thu, Feb 16, 2023 at 6:21 AM Marc Zyngier <maz@kernel.org> wrote:
>
> Add the probing code for the FEAT_ECV variant that implements CNTPOFF_EL2.
> Why it is optional is a mystery, but let's try and detect it.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/kernel/cpufeature.c | 11 +++++++++++
>  arch/arm64/tools/cpucaps       |  1 +
>  2 files changed, 12 insertions(+)
>
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 23bd2a926b74..36852f96898d 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -2186,6 +2186,17 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
>                 .sign = FTR_UNSIGNED,
>                 .min_field_value = 1,
>         },
> +       {
> +               .desc = "Enhanced Counter Virtualization (CNTPOFF)",
> +               .capability = ARM64_HAS_ECV_CNTPOFF,
> +               .type = ARM64_CPUCAP_SYSTEM_FEATURE,
> +               .matches = has_cpuid_feature,
> +               .sys_reg = SYS_ID_AA64MMFR0_EL1,
> +               .field_pos = ID_AA64MMFR0_EL1_ECV_SHIFT,
> +               .field_width = 4,
> +               .sign = FTR_UNSIGNED,
> +               .min_field_value = 2,

Nit: You might want to use ID_AA64MMFR0_EL1_ECV_CNTPOFF (instead of 2) ?

Reviewed-by: Reiji Watanabe <reijiw@google.com>

Thank you,
Reiji

> +       },
>  #ifdef CONFIG_ARM64_PAN
>         {
>                 .desc = "Privileged Access Never",
> diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps
> index 82c7e579a8ba..6a26a4678406 100644
> --- a/arch/arm64/tools/cpucaps
> +++ b/arch/arm64/tools/cpucaps
> @@ -23,6 +23,7 @@ HAS_DCPOP
>  HAS_DIT
>  HAS_E0PD
>  HAS_ECV
> +HAS_ECV_CNTPOFF
>  HAS_EPAN
>  HAS_GENERIC_AUTH
>  HAS_GENERIC_AUTH_ARCH_QARMA3
> --
> 2.34.1
>
>


* Re: [PATCH 05/16] KVM: arm64: timers: Convert per-vcpu virtual offset to a global value
  2023-02-16 14:21 ` [PATCH 05/16] KVM: arm64: timers: Convert per-vcpu virtual offset to a global value Marc Zyngier
@ 2023-02-22  6:15   ` Reiji Watanabe
  2023-02-22 10:54     ` Marc Zyngier
  0 siblings, 1 reply; 55+ messages in thread
From: Reiji Watanabe @ 2023-02-22  6:15 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Ricardo Koller, Simon Veith, dwmw2

Hi Marc,

On Thu, Feb 16, 2023 at 6:21 AM Marc Zyngier <maz@kernel.org> wrote:
>
> Having a per-vcpu virtual offset is a pain. It needs to be synchronized
> on each update, and expands badly to a setup where different timers can
> have different offsets, or have composite offsets (as with NV).
>
> So let's start by replacing the use of the CNTVOFF_EL2 shadow register
> (which we want to reclaim for NV anyway), and make the virtual timer
> carry a pointer to a VM-wide offset.

That's nice!

>
> This simplifies the code significantly.
>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  arch/arm64/include/asm/kvm_host.h |  3 +++
>  arch/arm64/kvm/arch_timer.c       | 45 +++++++------------------------
>  arch/arm64/kvm/hypercalls.c       |  2 +-
>  include/kvm/arm_arch_timer.h      | 15 +++++++++++
>  4 files changed, 29 insertions(+), 36 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 2a8175f38439..3adac0c5e175 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -193,6 +193,9 @@ struct kvm_arch {
>         /* Interrupt controller */
>         struct vgic_dist        vgic;
>
> +       /* Timers */
> +       struct arch_timer_vm_offsets offsets;

Nit: Perhaps using a more explicit name for the 'offsets' field might
be better than simply 'offsets', since it is a field of kvm_arch (not
a field of an arch-timer-related struct).
(e.g. timer_offsets ?)

> +
>         /* Mandated version of PSCI */
>         u32 psci_version;
>
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index 329a8444724f..1bb44f668608 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -84,14 +84,10 @@ u64 timer_get_cval(struct arch_timer_context *ctxt)
>
>  static u64 timer_get_offset(struct arch_timer_context *ctxt)
>  {
> -       struct kvm_vcpu *vcpu = ctxt->vcpu;
> +       if (ctxt->offset.vm_offset)
> +               return *ctxt->offset.vm_offset;
>
> -       switch(arch_timer_ctx_index(ctxt)) {
> -       case TIMER_VTIMER:
> -               return __vcpu_sys_reg(vcpu, CNTVOFF_EL2);
> -       default:
> -               return 0;
> -       }
> +       return 0;
>  }
>
>  static void timer_set_ctl(struct arch_timer_context *ctxt, u32 ctl)
> @@ -128,15 +124,12 @@ static void timer_set_cval(struct arch_timer_context *ctxt, u64 cval)
>
>  static void timer_set_offset(struct arch_timer_context *ctxt, u64 offset)
>  {
> -       struct kvm_vcpu *vcpu = ctxt->vcpu;
> -
> -       switch(arch_timer_ctx_index(ctxt)) {
> -       case TIMER_VTIMER:
> -               __vcpu_sys_reg(vcpu, CNTVOFF_EL2) = offset;
> -               break;
> -       default:
> +       if (!ctxt->offset.vm_offset) {
>                 WARN(offset, "timer %ld\n", arch_timer_ctx_index(ctxt));
> +               return;
>         }
> +
> +       WRITE_ONCE(*ctxt->offset.vm_offset, offset);
>  }
>
>  u64 kvm_phys_timer_read(void)
> @@ -765,25 +758,6 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
>         return 0;
>  }
>
> -/* Make the updates of cntvoff for all vtimer contexts atomic */
> -static void update_vtimer_cntvoff(struct kvm_vcpu *vcpu, u64 cntvoff)
> -{
> -       unsigned long i;
> -       struct kvm *kvm = vcpu->kvm;
> -       struct kvm_vcpu *tmp;
> -
> -       mutex_lock(&kvm->lock);
> -       kvm_for_each_vcpu(i, tmp, kvm)
> -               timer_set_offset(vcpu_vtimer(tmp), cntvoff);
> -
> -       /*
> -        * When called from the vcpu create path, the CPU being created is not
> -        * included in the loop above, so we just set it here as well.
> -        */
> -       timer_set_offset(vcpu_vtimer(vcpu), cntvoff);
> -       mutex_unlock(&kvm->lock);
> -}
> -
>  void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
>  {
>         struct arch_timer_cpu *timer = vcpu_timer(vcpu);
> @@ -791,10 +765,11 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
>         struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
>
>         vtimer->vcpu = vcpu;
> +       vtimer->offset.vm_offset = &vcpu->kvm->arch.offsets.voffset;
>         ptimer->vcpu = vcpu;
>
>         /* Synchronize cntvoff across all vtimers of a VM. */
> -       update_vtimer_cntvoff(vcpu, kvm_phys_timer_read());
> +       timer_set_offset(vtimer, kvm_phys_timer_read());
>         timer_set_offset(ptimer, 0);
>
>         hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
> @@ -840,7 +815,7 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
>                 break;
>         case KVM_REG_ARM_TIMER_CNT:
>                 timer = vcpu_vtimer(vcpu);
> -               update_vtimer_cntvoff(vcpu, kvm_phys_timer_read() - value);
> +               timer_set_offset(timer, kvm_phys_timer_read() - value);
>                 break;
>         case KVM_REG_ARM_TIMER_CVAL:
>                 timer = vcpu_vtimer(vcpu);
> diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> index 64c086c02c60..169d629f97cb 100644
> --- a/arch/arm64/kvm/hypercalls.c
> +++ b/arch/arm64/kvm/hypercalls.c
> @@ -44,7 +44,7 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
>         feature = smccc_get_arg1(vcpu);
>         switch (feature) {
>         case KVM_PTP_VIRT_COUNTER:
> -               cycles = systime_snapshot.cycles - vcpu_read_sys_reg(vcpu, CNTVOFF_EL2);
> +               cycles = systime_snapshot.cycles - vcpu->kvm->arch.offsets.voffset;
>                 break;
>         case KVM_PTP_PHYS_COUNTER:
>                 cycles = systime_snapshot.cycles;
> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index 76174f4fb646..41fda6255174 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -23,6 +23,19 @@ enum kvm_arch_timer_regs {
>         TIMER_REG_CTL,
>  };
>
> +struct arch_timer_offset {
> +       /*
> +        * If set, pointer to one of the offsets in the kvm's offset
> +        * structure. If NULL, assume a zero offset.
> +        */
> +       u64     *vm_offset;
> +};
> +
> +struct arch_timer_vm_offsets {
> +       /* Offset applied to the virtual timer/counter */
> +       u64     voffset;
> +};
> +
>  struct arch_timer_context {
>         struct kvm_vcpu                 *vcpu;
>
> @@ -33,6 +46,8 @@ struct arch_timer_context {
>         struct hrtimer                  hrtimer;
>         u64                             ns_frac;
>
> +       /* Offset for this counter/timer */
> +       struct arch_timer_offset        offset;
>         /*
>          * We have multiple paths which can save/restore the timer state onto
>          * the hardware, so we need some way of keeping track of where the
> --
> 2.34.1
>
>

It appears that we can remove CNTVOFF_EL2 from the sysreg file.
Having said that, I don't think it matters anyway, since the
following patch appears to use it again.

Reviewed-by: Reiji Watanabe <reijiw@google.com>

Thank you,
Reiji


* Re: [PATCH 02/16] arm64: Add HAS_ECV_CNTPOFF capability
  2023-02-22  4:30   ` Reiji Watanabe
@ 2023-02-22 10:47     ` Marc Zyngier
  0 siblings, 0 replies; 55+ messages in thread
From: Marc Zyngier @ 2023-02-22 10:47 UTC (permalink / raw)
  To: Reiji Watanabe
  Cc: kvmarm, kvm, linux-arm-kernel, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Ricardo Koller, Simon Veith, dwmw2

On Wed, 22 Feb 2023 04:30:00 +0000,
Reiji Watanabe <reijiw@google.com> wrote:
> 
> Hi Marc,
> 
> On Thu, Feb 16, 2023 at 6:21 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > Add the probing code for the FEAT_ECV variant that implements CNTPOFF_EL2.
> > Why it is optional is a mystery, but let's try and detect it.
> >
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/kernel/cpufeature.c | 11 +++++++++++
> >  arch/arm64/tools/cpucaps       |  1 +
> >  2 files changed, 12 insertions(+)
> >
> > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> > index 23bd2a926b74..36852f96898d 100644
> > --- a/arch/arm64/kernel/cpufeature.c
> > +++ b/arch/arm64/kernel/cpufeature.c
> > @@ -2186,6 +2186,17 @@ static const struct arm64_cpu_capabilities arm64_features[] = {
> >                 .sign = FTR_UNSIGNED,
> >                 .min_field_value = 1,
> >         },
> > +       {
> > +               .desc = "Enhanced Counter Virtualization (CNTPOFF)",
> > +               .capability = ARM64_HAS_ECV_CNTPOFF,
> > +               .type = ARM64_CPUCAP_SYSTEM_FEATURE,
> > +               .matches = has_cpuid_feature,
> > +               .sys_reg = SYS_ID_AA64MMFR0_EL1,
> > +               .field_pos = ID_AA64MMFR0_EL1_ECV_SHIFT,
> > +               .field_width = 4,
> > +               .sign = FTR_UNSIGNED,
> > +               .min_field_value = 2,
> 
> Nit: You might want to use ID_AA64MMFR0_EL1_ECV_CNTPOFF (instead of 2) ?

Ah, of course! ;-)

> Reviewed-by: Reiji Watanabe <reijiw@google.com>

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 05/16] KVM: arm64: timers: Convert per-vcpu virtual offset to a global value
  2023-02-22  6:15   ` Reiji Watanabe
@ 2023-02-22 10:54     ` Marc Zyngier
  0 siblings, 0 replies; 55+ messages in thread
From: Marc Zyngier @ 2023-02-22 10:54 UTC (permalink / raw)
  To: Reiji Watanabe
  Cc: kvmarm, kvm, linux-arm-kernel, James Morse, Suzuki K Poulose,
	Oliver Upton, Zenghui Yu, Ricardo Koller, Simon Veith, dwmw2

On Wed, 22 Feb 2023 06:15:56 +0000,
Reiji Watanabe <reijiw@google.com> wrote:
> 
> Hi Marc,
> 
> On Thu, Feb 16, 2023 at 6:21 AM Marc Zyngier <maz@kernel.org> wrote:
> >
> > Having a per-vcpu virtual offset is a pain. It needs to be synchronized
> > on each update, and expands badly to a setup where different timers can
> > have different offsets, or have composite offsets (as with NV).
> >
> > So let's start by replacing the use of the CNTVOFF_EL2 shadow register
> > (which we want to reclaim for NV anyway), and make the virtual timer
> > carry a pointer to a VM-wide offset.
> 
> That's nice!
> 
> >
> > This simplifies the code significantly.
> >
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > ---
> >  arch/arm64/include/asm/kvm_host.h |  3 +++
> >  arch/arm64/kvm/arch_timer.c       | 45 +++++++------------------------
> >  arch/arm64/kvm/hypercalls.c       |  2 +-
> >  include/kvm/arm_arch_timer.h      | 15 +++++++++++
> >  4 files changed, 29 insertions(+), 36 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 2a8175f38439..3adac0c5e175 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -193,6 +193,9 @@ struct kvm_arch {
> >         /* Interrupt controller */
> >         struct vgic_dist        vgic;
> >
> > +       /* Timers */
> > +       struct arch_timer_vm_offsets offsets;
> 
> Nit: Perhaps using a more explicit name for the 'offsets' field might
> be better than simply 'offsets', since it is a field of kvm_arch (not
> a field of an arch-timer-related struct).
> (e.g. timer_offsets ?)

Good point. Actually, maybe this should be a more generic structure
(arch_timer_global, or something similar), containing the offset(s).

I'll have a think.

> 
> > +
> >         /* Mandated version of PSCI */
> >         u32 psci_version;
> >
> > diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> > index 329a8444724f..1bb44f668608 100644
> > --- a/arch/arm64/kvm/arch_timer.c
> > +++ b/arch/arm64/kvm/arch_timer.c
> > @@ -84,14 +84,10 @@ u64 timer_get_cval(struct arch_timer_context *ctxt)
> >
> >  static u64 timer_get_offset(struct arch_timer_context *ctxt)
> >  {
> > -       struct kvm_vcpu *vcpu = ctxt->vcpu;
> > +       if (ctxt->offset.vm_offset)
> > +               return *ctxt->offset.vm_offset;
> >
> > -       switch(arch_timer_ctx_index(ctxt)) {
> > -       case TIMER_VTIMER:
> > -               return __vcpu_sys_reg(vcpu, CNTVOFF_EL2);
> > -       default:
> > -               return 0;
> > -       }
> > +       return 0;
> >  }
> >
> >  static void timer_set_ctl(struct arch_timer_context *ctxt, u32 ctl)
> > @@ -128,15 +124,12 @@ static void timer_set_cval(struct arch_timer_context *ctxt, u64 cval)
> >
> >  static void timer_set_offset(struct arch_timer_context *ctxt, u64 offset)
> >  {
> > -       struct kvm_vcpu *vcpu = ctxt->vcpu;
> > -
> > -       switch(arch_timer_ctx_index(ctxt)) {
> > -       case TIMER_VTIMER:
> > -               __vcpu_sys_reg(vcpu, CNTVOFF_EL2) = offset;
> > -               break;
> > -       default:
> > +       if (!ctxt->offset.vm_offset) {
> >                 WARN(offset, "timer %ld\n", arch_timer_ctx_index(ctxt));
> > +               return;
> >         }
> > +
> > +       WRITE_ONCE(*ctxt->offset.vm_offset, offset);
> >  }
> >
> >  u64 kvm_phys_timer_read(void)
> > @@ -765,25 +758,6 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
> >         return 0;
> >  }
> >
> > -/* Make the updates of cntvoff for all vtimer contexts atomic */
> > -static void update_vtimer_cntvoff(struct kvm_vcpu *vcpu, u64 cntvoff)
> > -{
> > -       unsigned long i;
> > -       struct kvm *kvm = vcpu->kvm;
> > -       struct kvm_vcpu *tmp;
> > -
> > -       mutex_lock(&kvm->lock);
> > -       kvm_for_each_vcpu(i, tmp, kvm)
> > -               timer_set_offset(vcpu_vtimer(tmp), cntvoff);
> > -
> > -       /*
> > -        * When called from the vcpu create path, the CPU being created is not
> > -        * included in the loop above, so we just set it here as well.
> > -        */
> > -       timer_set_offset(vcpu_vtimer(vcpu), cntvoff);
> > -       mutex_unlock(&kvm->lock);
> > -}
> > -
> >  void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
> >  {
> >         struct arch_timer_cpu *timer = vcpu_timer(vcpu);
> > @@ -791,10 +765,11 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
> >         struct arch_timer_context *ptimer = vcpu_ptimer(vcpu);
> >
> >         vtimer->vcpu = vcpu;
> > +       vtimer->offset.vm_offset = &vcpu->kvm->arch.offsets.voffset;
> >         ptimer->vcpu = vcpu;
> >
> >         /* Synchronize cntvoff across all vtimers of a VM. */
> > -       update_vtimer_cntvoff(vcpu, kvm_phys_timer_read());
> > +       timer_set_offset(vtimer, kvm_phys_timer_read());
> >         timer_set_offset(ptimer, 0);
> >
> >         hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
> > @@ -840,7 +815,7 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *vcpu, u64 regid, u64 value)
> >                 break;
> >         case KVM_REG_ARM_TIMER_CNT:
> >                 timer = vcpu_vtimer(vcpu);
> > -               update_vtimer_cntvoff(vcpu, kvm_phys_timer_read() - value);
> > +               timer_set_offset(timer, kvm_phys_timer_read() - value);
> >                 break;
> >         case KVM_REG_ARM_TIMER_CVAL:
> >                 timer = vcpu_vtimer(vcpu);
> > diff --git a/arch/arm64/kvm/hypercalls.c b/arch/arm64/kvm/hypercalls.c
> > index 64c086c02c60..169d629f97cb 100644
> > --- a/arch/arm64/kvm/hypercalls.c
> > +++ b/arch/arm64/kvm/hypercalls.c
> > @@ -44,7 +44,7 @@ static void kvm_ptp_get_time(struct kvm_vcpu *vcpu, u64 *val)
> >         feature = smccc_get_arg1(vcpu);
> >         switch (feature) {
> >         case KVM_PTP_VIRT_COUNTER:
> > -               cycles = systime_snapshot.cycles - vcpu_read_sys_reg(vcpu, CNTVOFF_EL2);
> > +               cycles = systime_snapshot.cycles - vcpu->kvm->arch.offsets.voffset;
> >                 break;
> >         case KVM_PTP_PHYS_COUNTER:
> >                 cycles = systime_snapshot.cycles;
> > diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> > index 76174f4fb646..41fda6255174 100644
> > --- a/include/kvm/arm_arch_timer.h
> > +++ b/include/kvm/arm_arch_timer.h
> > @@ -23,6 +23,19 @@ enum kvm_arch_timer_regs {
> >         TIMER_REG_CTL,
> >  };
> >
> > +struct arch_timer_offset {
> > +       /*
> > +        * If set, pointer to one of the offsets in the kvm's offset
> > +        * structure. If NULL, assume a zero offset.
> > +        */
> > +       u64     *vm_offset;
> > +};
> > +
> > +struct arch_timer_vm_offsets {
> > +       /* Offset applied to the virtual timer/counter */
> > +       u64     voffset;
> > +};
> > +
> >  struct arch_timer_context {
> >         struct kvm_vcpu                 *vcpu;
> >
> > @@ -33,6 +46,8 @@ struct arch_timer_context {
> >         struct hrtimer                  hrtimer;
> >         u64                             ns_frac;
> >
> > +       /* Offset for this counter/timer */
> > +       struct arch_timer_offset        offset;
> >         /*
> >          * We have multiple paths which can save/restore the timer state onto
> >          * the hardware, so we need some way of keeping track of where the
> > --
> > 2.34.1
> >
> >
> 
> It appears that we can remove CNTVOFF_EL2 from the sysreg file.
> Having said that, I don't think it matters anyway since the
> following patch appears to use that again.

Indeed, and this is why I didn't remove it in the first place (we need
it for NV).

> 
> Reviewed-by: Reiji Watanabe <reijiw@google.com>

Thanks!

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 08/16] KVM: arm64: timers: Allow userspace to set the counter offsets
  2023-02-17 22:11       ` Oliver Upton
@ 2023-02-22 11:56         ` Marc Zyngier
  2023-02-22 16:34           ` Oliver Upton
  0 siblings, 1 reply; 55+ messages in thread
From: Marc Zyngier @ 2023-02-22 11:56 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, kvm, linux-arm-kernel, James Morse, Suzuki K Poulose,
	Zenghui Yu, Ricardo Koller, Simon Veith, dwmw2

On Fri, 17 Feb 2023 22:11:36 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> On Fri, Feb 17, 2023 at 10:17:27AM +0000, Marc Zyngier wrote:
> > Hi Oliver,
> > 
> > On Thu, 16 Feb 2023 22:09:47 +0000,
> > Oliver Upton <oliver.upton@linux.dev> wrote:
> > > 
> > > Hi Marc,
> > > 
> > > On Thu, Feb 16, 2023 at 02:21:15PM +0000, Marc Zyngier wrote:
> > > > And this is the moment you have all been waiting for: setting the
> > > > counter offsets from userspace.
> > > > 
> > > > We expose a brand new capability that reports the ability to set
> > > > the offsets for both the virtual and physical sides, independently.
> > > > 
> > > > In keeping with the architecture, the offsets are expressed as
> > > > a delta that is substracted from the physical counter value.
> > > > 
> > > > Once this new API is used, there is no going back, and the counters
> > > > cannot be written to to set the offsets implicitly (the writes
> > > > are instead ignored).
> > > 
> > > Is there any particular reason to use an explicit ioctl as opposed to
> > > the KVM_{GET,SET}_DEVICE_ATTR ioctls? Dunno where you stand on it, but I
> > > quite like that interface for simple state management. We also avoid
> > > eating up more UAPI bits in the global namespace.
> > 
> > The problem with that is that it requires yet another KVM device for
> > this, and I'm lazy. It also makes it a bit harder for the VMM to buy
> > into this (need to track another FD, for example).
> 
> You can also accept the device ioctls on the actual VM FD, quite like
> we do for the vCPU right now. And hey, I've got a patch that gets you
> most of the way there!
> 
> https://lore.kernel.org/kvmarm/20230211013759.3556016-3-oliver.upton@linux.dev/

Huh... I don't know yet if I love it or hate it. At the end of the day,
this is just another ioctl, so I don't care either way.

> > > Is there any reason why we can't just order this ioctl before vCPU
> > > creation altogether, or is there a need to do this at runtime? We're
> > > about to tolerate multiple writers to the offset value, and I think the
> > > only thing we need to guarantee is that the below flag is set before
> > > vCPU ioctls have a chance to run.
> > 
> > Again, we don't know for sure whether the final offset is available
> > before vcpu creation time. My idea for QEMU would be to perform the
> > offset adjustment as late as possible, right before executing the VM,
> > after having restored the vcpus with whatever value they had.
> 
> So how does userspace work out an offset based on available information?
> The part that hasn't clicked for me yet is where userspace gets the
> current value of the true physical counter to calculate an offset.

What's wrong with CNTVCT_EL0?

> We could make it ABI that the guest's physical counter matches that of
> the host by default. Of course, that has been the case since the
> beginning of time but it is now directly user-visible.
> 
> The only part I don't like about that is that we aren't fully creating
> an abstraction around host and guest system time. So here's my current
> mental model of how we represent the generic timer to userspace:
> 
> 				+-----------------------+
> 				|	   		|
> 				| Host System Counter	|
> 				|	   (1) 		|
> 				+-----------------------+
> 				    	   |
> 			       +-----------+-----------+
> 			       |		       |
>        +-----------------+  +-----+		    +-----+  +--------------------+
>        | (2) CNTPOFF_EL2 |--| sub |		    | sub |--| (3) CNTVOFF_EL2    |
>        +-----------------+  +-----+	     	    +-----+  +--------------------+
> 			       |           	       |
> 			       |		       |
> 		     +-----------------+	 +----------------+
> 		     | (5) CNTPCT_EL0  |         | (4) CNTVCT_EL0 |
> 		     +-----------------+	 +----------------+
> 
> AFAICT, this UAPI exposes abstractions for (2) and (3) to userspace, but
> userspace cannot directly get at (1).

Of course it can! CNTVCT_EL0 is accessible from userspace, and is
guaranteed to have an offset of 0 on a host.

> 
> Chewing on this a bit more, I don't think userspace has any business
> messing with virtual and physical time independently, especially when
> nested virtualization comes into play.

Well, NV already ignores the virtual offset completely (see how the
virtual timer gets its offset reassigned at reset time).

> 
> I think the illusion to userspace needs to be built around the notion of
> a system counter:
> 
>                                 +-----------------------+
>                                 |                       |
>                                 | Host System Counter   |
>                                 |          (1)          |
>                                 +-----------------------+
> 					   |
> 					   |
> 					+-----+   +-------------------+
> 					| sub |---| (6) system_offset |
> 					+-----+   +-------------------+
> 					   |
> 					   |
>                                 +-----------------------+
>                                 |                       |
>                                 | Guest System Counter  |
>                                 |          (7)          |
>                                 +-----------------------+
>                                            |
>                                +-----------+-----------+
>                                |                       |
>        +-----------------+  +-----+                 +-----+  +--------------------+
>        | (2) CNTPOFF_EL2 |--| sub |                 | sub |--| (3) CNTVOFF_EL2    |
>        +-----------------+  +-----+                 +-----+  +--------------------+
>                                |                       |
>                                |                       |
>                      +-----------------+         +----------------+
>                      | (5) CNTPCT_EL0  |         | (4) CNTVCT_EL0 |
>                      +-----------------+         +----------------+
> 
> And from a UAPI perspective, we would either expose (1) and (6) to let
> userspace calculate an offset or simply allow (7) to be directly
> read/written.

I previously toyed with this idea, and I really like it. However, the
problem with this is that it breaks the current behaviour of having
two different values for CNTVCT and CNTPCT in the guest, and CNTPCT
representing the counter value on the host.

Such a VM cannot be migrated *today*, but not everybody cares about
migration. My "dual offset" approach allows the current behaviour to
persist, and such a VM to be migrated. The luser even gets the choice
of preserving counter continuity in the guest or to stay without a
physical offset and reflect the host's counter.

Is it a good behaviour? Of course not. Does anyone depend on it? I
have no idea, but odds are that someone does. Can we break their toys?
The jury is still out.

> 
> That frees up the meaning of the counter offsets as being purely a
> virtual EL2 thing. These registers would reset to 0, and non-NV guests
> could never change their value.
> 
> Under the hood KVM would program the true offset registers as:
> 
> 	CNT{P,V}OFF_EL2 = 'virtual CNT{P,V}OFF_EL2' + system_offset
> 
> With this we would effectively configure CNTPCT = CNTVCT = 0 at the
> point of VM creation. Only crappy thing is it requires full physical
> counter/timer emulation for non-ECV systems, but the guest shouldn't be
> using the physical counter in the first place.

And I think that's the point where we differ. I can completely imagine
some in-VM code using the physical counter to export some timestamping
to the host (for tracing purposes, amongst other things).

> Yes, this sucks for guests running on hosts w/ NV but not ECV. If anyone
> can tell me how an L0 hypervisor is supposed to do NV without ECV, I'm
> all ears.

You absolutely can run with NV2 without ECV. You just get a bad
quality of emulation for the EL0 timers. But that's about it.

> Does any of what I've written make remote sense or have I gone entirely
> off the rails with my ASCII art? :)

Your ASCII art is beautiful, only a tad too wide! ;-) What you suggest
makes a lot of sense, but it leaves existing behaviours in the lurch.
Can we pretend they don't exist? You tell me!

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

* Re: [PATCH 08/16] KVM: arm64: timers: Allow userspace to set the counter offsets
  2023-02-22 11:56         ` Marc Zyngier
@ 2023-02-22 16:34           ` Oliver Upton
  2023-02-23 18:25             ` Marc Zyngier
  0 siblings, 1 reply; 55+ messages in thread
From: Oliver Upton @ 2023-02-22 16:34 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, James Morse, Suzuki K Poulose,
	Zenghui Yu, Ricardo Koller, Simon Veith, dwmw2

On Wed, Feb 22, 2023 at 11:56:53AM +0000, Marc Zyngier wrote:

[...]

> > AFAICT, this UAPI exposes abstractions for (2) and (3) to userspace, but
> > userspace cannot directly get at (1).
> 
> Of course it can! CNTVCT_EL0 is accessible from userspace, and is
> guaranteed to have an offset of 0 on a host.

Derp, yes :)

> > 
> > Chewing on this a bit more, I don't think userspace has any business
> > messing with virtual and physical time independently, especially when
> > nested virtualization comes into play.
> 
> Well, NV already ignores the virtual offset completely (see how the
> virtual timer gets its offset reassigned at reset time).

I'll need to have a look at that, but if we need to ignore user input on
the shiny new interface for NV then I really do wonder if it is the
right fit.

> I previously toyed with this idea, and I really like it. However, the
> problem with this is that it breaks the current behaviour of having
> two different values for CNTVCT and CNTPCT in the guest, and CNTPCT
> representing the counter value on the host.
> 
> Such a VM cannot be migrated *today*, but not everybody cares about
> migration. My "dual offset" approach allows the current behaviour to
> persist, and such a VM to be migrated. The luser even gets the choice
> of preserving counter continuity in the guest or to stay without a
> physical offset and reflect the host's counter.
> 
> Is it a good behaviour? Of course not. Does anyone depend on it? I
> have no idea, but odds are that someone does. Can we break their toys?
> The jury is still out.

Well, I think the new interface presents an opportunity to change the
rules around counter migration, and the illusion of two distinct offsets
for physical / virtual counters will need to be broken soon enough for
NV. Do we need to bend over backwards for a theoretical use case with
the new UAPI? If anyone depends on the existing behavior then they can
continue to use the old UAPI to partially migrate the guest counters.

My previous suggestion of tying the physical and virtual counters
together at VM creation would definitely break such a use case, though,
so we'd be at the point of requiring explicit opt-in from userspace.

> > 
> > That frees up the meaning of the counter offsets as being purely a
> > virtual EL2 thing. These registers would reset to 0, and non-NV guests
> > could never change their value.
> > 
> > Under the hood KVM would program the true offset registers as:
> > 
> > 	CNT{P,V}OFF_EL2 = 'virtual CNT{P,V}OFF_EL2' + system_offset
> > 
> > With this we would effectively configure CNTPCT = CNTVCT = 0 at the
> > point of VM creation. Only crappy thing is it requires full physical
> > counter/timer emulation for non-ECV systems, but the guest shouldn't be
> > using the physical counter in the first place.
> 
> And I think that's the point where we differ. I can completely imagine
> some in-VM code using the physical counter to export some timestamping
> to the host (for tracing purposes, amongst other things).

So in this case the guest and userspace would already be in cahoots, so
userspace could choose to not use UAPI. Hell, if userspace did
absolutely nothing then it all continues to work.

> > Yes, this sucks for guests running on hosts w/ NV but not ECV. If anyone
> > can tell me how an L0 hypervisor is supposed to do NV without ECV, I'm
> > all ears.
> 
> You absolutely can run with NV2 without ECV. You just get a bad
> quality of emulation for the EL0 timers. But that's about it.

'do NV well', I should've said :)

> > Does any of what I've written make remote sense or have I gone entirely
> > off the rails with my ASCII art? :)
> 
> Your ASCII art is beautiful, only a tad too wide! ;-) What you suggest
> makes a lot of sense, but it leaves existing behaviours in the lurch.
> Can we pretend they don't exist? You tell me!

Oh, we're definitely on the hook for any existing misuse of observable
KVM behavior. I just think if we're at the point of adding new UAPI we
may as well lay down some new rules with userspace to avoid surprises.

OTOH, ignoring the virtual offset for NV is another way out of the mess,
but it just bothers me we're about to ignore input on a brand new
UAPI...

-- 
Thanks,
Oliver

* Re: [PATCH 08/16] KVM: arm64: timers: Allow userspace to set the counter offsets
  2023-02-22 16:34           ` Oliver Upton
@ 2023-02-23 18:25             ` Marc Zyngier
  2023-03-08  7:46               ` Oliver Upton
  0 siblings, 1 reply; 55+ messages in thread
From: Marc Zyngier @ 2023-02-23 18:25 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, kvm, linux-arm-kernel, James Morse, Suzuki K Poulose,
	Zenghui Yu, Ricardo Koller, Simon Veith, dwmw2

On Wed, 22 Feb 2023 16:34:32 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> On Wed, Feb 22, 2023 at 11:56:53AM +0000, Marc Zyngier wrote:
> 
> > > Chewing on this a bit more, I don't think userspace has any business
> > > messing with virtual and physical time independently, especially when
> > > nested virtualization comes into play.
> > 
> > Well, NV already ignores the virtual offset completely (see how the
> > virtual timer gets its offset reassigned at reset time).
> 
> I'll need to have a look at that, but if we need to ignore user input on
> the shiny new interface for NV then I really do wonder if it is the
> right fit.

The thing is, I'm not keen on making the interface modal. Modes suck and
lead to invasive userspace changes. I'd rather ignore irrelevant
parameters than change the way userspace works. And you don't *have*
to provide the virtual offset with NV.

> > I previously toyed with this idea, and I really like it. However, the
> > problem with this is that it breaks the current behaviour of having
> > two different values for CNTVCT and CNTPCT in the guest, and CNTPCT
> > representing the counter value on the host.
> > 
> > Such a VM cannot be migrated *today*, but not everybody cares about
> > migration. My "dual offset" approach allows the current behaviour to
> > persist, and such a VM to be migrated. The luser even gets the choice
> > of preserving counter continuity in the guest or to stay without a
> > physical offset and reflect the host's counter.
> > 
> > Is it a good behaviour? Of course not. Does anyone depend on it? I
> > have no idea, but odds are that someone does. Can we break their toys?
> > The jury is still out.
> 
> Well, I think the new interface presents an opportunity to change the
> rules around counter migration, and the illusion of two distinct offsets
> for physical / virtual counters will need to be broken soon enough for
> NV.

Broken in the kernel, sure. Do we need to involve userspace in what
was an initial mis-design of the API? Changing these rules means
changing the way userspace works, and I'm not keen on going there.

> Do we need to bend over backwards for a theoretical use case with
> the new UAPI? If anyone depends on the existing behavior then they can
> continue to use the old UAPI to partially migrate the guest counters.

I don't buy the old/new thing. My take is that these things should be
cumulative if there isn't a hard reason to break the existing API.

> My previous suggestion of tying the physical and virtual counters
> together at VM creation would definitely break such a use case, though,
> so we'd be at the point of requiring explicit opt-in from userspace.

I'm trying to find a middle ground, so bear with me. Here's the
situation as I see it:

(1) a VM that is migrating today can only set the virtual offset and
    doesn't affect the physical counter. This behaviour must be
    preserved if we cannot prove that nobody relies on it.

(2) setting the physical offset could be done by two means:

    (a) writing the counter register (like we do for CNTVCT)
    (b) providing an offset via a side channel

I think (1) must stay forever, just like we still support the old
GICv2 implicit initialisation.

(2a) is also desirable as it requires no extra work on the VMM side.
Just restore the damn thing, and nothing changes (we're finally able
to migrate the physical timer). (2b) really is icing on the cake.

The question is whether we can come up with an API offering (2b) that
still allows (1) and (2a). I'd be happy with a new API that, when
used, resets both offsets to the same value, matching your pretty
picture. But the dual offsetting still has to exist internally.

When it comes to NV, it uses either the physical offset that has been
provided by writing CNTPCT, or the one that has been written via the
new API. Under the hood, this is the same piece of data, of course.

The only meaningful difference with my initial proposal is that there
is no new virtual offset API. It is either register writes that obey
the same rules as before, or a single offset setting.

> > And I think that's the point where we differ. I can completely imagine
> > some in-VM code using the physical counter to export some timestamping
> > to the host (for tracing purposes, amongst other things).
> 
> So in this case the guest and userspace would already be in cahoots, so
> userspace could choose to not use UAPI. Hell, if userspace did
> absolutely nothing then it all continues to work.

Userspace, yes. Not necessarily the VMM. Let's say my guest spits a
bunch of timestamped traces over a network connection, which I then
correlate with host traces (all similarities with a shipping product
are completely fortuitous...). The VMM isn't involved at all here.

> Oh, we're definitely on the hook for any existing misuse of observable
> KVM behavior. I just think if we're at the point of adding new UAPI we
> may as well lay down some new rules with userspace to avoid surprises.
> 
> OTOH, ignoring the virtual offset for NV is another way out of the mess,
> but it just bothers me we're about to ignore input on a brand new
> UAPI...

I think my single-offset suggestion should answer your questioning.

Thoughts?

	M.

-- 
Without deviation from the norm, progress is not possible.

* Re: [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit
  2023-02-16 14:21 [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Marc Zyngier
                   ` (16 preceding siblings ...)
  2023-02-21 16:28 ` [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Veith, Simon
@ 2023-02-23 22:29 ` Colton Lewis
  2023-02-24  8:45   ` Marc Zyngier
  2023-02-24 20:07 ` Colton Lewis
  18 siblings, 1 reply; 55+ messages in thread
From: Colton Lewis @ 2023-02-23 22:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, james.morse, suzuki.poulose,
	oliver.upton, yuzenghui, ricarkol, sveith, dwmw2

Hello Marc,

This patch is of special interest to me as I was working on my own
ECV/CNTPOFF implementation, although mine was more narrow and doesn't
address any of your other goals. As far as I can tell at the moment,
your patch can cover the uses of mine, so I will dedicate
myself to reviewing and testing yours. Please include me on any future
rerolls of this patch.

Marc Zyngier <maz@kernel.org> writes:

> This series aims at satisfying multiple goals:

> - allow a VMM to atomically restore a timer offset for a whole VM
>    instead of updating the offset each time a vcpu get its counter
>    written

> - allow a VMM to save/restore the physical timer context, something
>    that we cannot do at the moment due to the lack of offsetting

> - provide a framework that is suitable for NV support, where we get
>    both global and per timer, per vcpu offsetting


If I am understanding your changes correctly, you introduce some VM-wide
timers with no hardware backing and a new API to access them from
userspace. This is useful both because it makes the code simpler (no
need to manually keep registers in sync), and so CNT{V,P}OFF can be
appropriately virtualized with NV. Is that a fair summary of the
important bits?

> This has been moderately tested with nVHE, VHE and NV. I do not have
> access to CNTPOFF-aware HW, so the jury is still out on that one. Note
> that the NV patches in this series are here to give a perspective on
> how this gets used.

> I've updated the arch_timer selftest to allow offsets to be provided
> from the command line, but the arch_test is pretty flimsy and tends to
> fail with an error==EINTR, even without this series. Something to
> investigate.

I can help you with testing because I have access to CNTPOFF-aware
hardware and emulators. Is there anything you are especially interested
to try out?

* Re: [PATCH 03/16] kvm: arm64: Expose {un,}lock_all_vcpus() to the reset of KVM
  2023-02-16 14:21 ` [PATCH 03/16] kvm: arm64: Expose {un,}lock_all_vcpus() to the reset of KVM Marc Zyngier
@ 2023-02-23 22:30   ` Colton Lewis
  0 siblings, 0 replies; 55+ messages in thread
From: Colton Lewis @ 2023-02-23 22:30 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, james.morse, suzuki.poulose,
	oliver.upton, yuzenghui, ricarkol, sveith, dwmw2

nit: kvm should be capitalized in the summary line

* Re: [PATCH 04/16] KVM: arm64: timers: Use a per-vcpu, per-timer accumulator for fractional ns
  2023-02-16 14:21 ` [PATCH 04/16] KVM: arm64: timers: Use a per-vcpu, per-timer accumulator for fractional ns Marc Zyngier
@ 2023-02-23 22:30   ` Colton Lewis
  0 siblings, 0 replies; 55+ messages in thread
From: Colton Lewis @ 2023-02-23 22:30 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, james.morse, suzuki.poulose,
	oliver.upton, yuzenghui, ricarkol, sveith, dwmw2

Marc Zyngier <maz@kernel.org> writes:

> Instead of accumulating the fractionnal ns value generated everytime
                               ^                              ^
nit: fractional
nit: every time

* Re: [PATCH 06/16] KVM: arm64: timers: Use CNTPOFF_EL2 to offset the physical timer
  2023-02-16 14:21 ` [PATCH 06/16] KVM: arm64: timers: Use CNTPOFF_EL2 to offset the physical timer Marc Zyngier
@ 2023-02-23 22:34   ` Colton Lewis
  2023-02-24  8:59     ` Marc Zyngier
  0 siblings, 1 reply; 55+ messages in thread
From: Colton Lewis @ 2023-02-23 22:34 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, james.morse, suzuki.poulose,
	oliver.upton, yuzenghui, ricarkol, sveith, dwmw2


CNTPOFF_EL2 should probably be added to the vcpu_sysreg enum in
kvm_host.h for this patch. This seemed like the most relevant commit.

Marc Zyngier <maz@kernel.org> writes:

> +static bool has_cntpoff(void)
> +{
> +	return (has_vhe() && cpus_have_final_cap(ARM64_HAS_ECV_CNTPOFF));
> +}
> +

This being used to guard the register in set_cntpoff seems to say that
we can only write CNTPOFF in VHE mode. Since it's possible the
underlying hardware could still support CNTPOFF even if KVM is running
in nVHE mode, should there be a way to set CNTPOFF in nVHE mode? Is that
possible?

This caused some confusion for me on my own implementation: some
teammates thought so, but I could not get an implementation working in
nVHE mode.

> @@ -84,7 +89,7 @@ u64 timer_get_cval(struct arch_timer_context *ctxt)

>   static u64 timer_get_offset(struct arch_timer_context *ctxt)
>   {
> -	if (ctxt->offset.vm_offset)
> +	if (ctxt && ctxt->offset.vm_offset)
>   		return *ctxt->offset.vm_offset;

>   	return 0;

nit: this change should be in the previous commit

> @@ -480,6 +491,7 @@ static void timer_save_state(struct  
> arch_timer_context *ctx)
>   		write_sysreg_el0(0, SYS_CNTP_CTL);
>   		isb();

> +		set_cntpoff(0);
>   		break;
>   	case NR_KVM_TIMERS:
>   		BUG();

This seems to say CNTPOFF will be reset to 0 every time a vcpu
switches out. What if the host originally had some value other than 0?
Is KVM responsible for that context?

* Re: [PATCH 07/16] KVM: arm64: timers: Allow physical offset without CNTPOFF_EL2
  2023-02-16 14:21 ` [PATCH 07/16] KVM: arm64: timers: Allow physical offset without CNTPOFF_EL2 Marc Zyngier
@ 2023-02-23 22:40   ` Colton Lewis
  2023-02-24 10:54     ` Marc Zyngier
  0 siblings, 1 reply; 55+ messages in thread
From: Colton Lewis @ 2023-02-23 22:40 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, james.morse, suzuki.poulose,
	oliver.upton, yuzenghui, ricarkol, sveith, dwmw2


Marc Zyngier <maz@kernel.org> writes:

> +/* If _pred is true, set bit in _set, otherwise set it in _clr */
> +#define assign_clear_set_bit(_pred, _bit, _clr, _set)			\
> +	do {								\
> +		if (_pred)						\
> +			(_set) |= (_bit);				\
> +		else							\
> +			(_clr) |= (_bit);				\
> +	} while (0)
> +

I don't think the do-while wrapper is necessary. Is there any reason
besides style guide conformance?

> +	/*
> +	 * We have two possibility to deal with a physical offset:
> +	 *
> +	 * - Either we have CNTPOFF (yay!) or the offset is 0:
> +	 *   we let the guest freely access the HW
> +	 *
> +	 * - or neither of these condition apply:
> +	 *   we trap accesses to the HW, but still use it
> +	 *   after correcting the physical offset
> +	 */
> +	if (!has_cntpoff() && timer_get_offset(map->direct_ptimer))
> +		tpt = tpc = true;

If there are only two possibilities, then two different booleans make
things more complicated than they have to be.

> +	assign_clear_set_bit(tpt, CNTHCTL_EL1PCEN << 10, set, clr);
> +	assign_clear_set_bit(tpc, CNTHCTL_EL1PCTEN << 10, set, clr);

Might be good to name the 10 something like VHE_SHIFT so people know why
it is applied.

> +
> +
> +	timer_set_traps(vcpu, &map);
>   }

>   bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
> @@ -1293,27 +1363,12 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
>   }

>   /*
> - * On VHE system, we only need to configure the EL2 timer trap register  
> once,
> - * not for every world switch.
> - * The host kernel runs at EL2 with HCR_EL2.TGE == 1,
> - * and this makes those bits have no effect for the host kernel  
> execution.
> + * If we have CNTPOFF, permanently set ECV to enable it.
>    */
>   void kvm_timer_init_vhe(void)
>   {
> -	/* When HCR_EL2.E2H ==1, EL1PCEN and EL1PCTEN are shifted by 10 */
> -	u32 cnthctl_shift = 10;
> -	u64 val;
> -
> -	/*
> -	 * VHE systems allow the guest direct access to the EL1 physical
> -	 * timer/counter.
> -	 */
> -	val = read_sysreg(cnthctl_el2);
> -	val |= (CNTHCTL_EL1PCEN << cnthctl_shift);
> -	val |= (CNTHCTL_EL1PCTEN << cnthctl_shift);
>   	if (cpus_have_final_cap(ARM64_HAS_ECV_CNTPOFF))
> -		val |= CNTHCTL_ECV;
> -	write_sysreg(val, cnthctl_el2);
> +		sysreg_clear_set(cntkctl_el1, 0, CNTHCTL_ECV);
>   }

What is the reason for moving these register writes from initialization
to vcpu load time? This contradicts the comment that says this is only
needed once and not at every world switch. Seems like doing more work
for no reason.

* Re: [PATCH 08/16] KVM: arm64: timers: Allow userspace to set the counter offsets
  2023-02-16 14:21 ` [PATCH 08/16] KVM: arm64: timers: Allow userspace to set the counter offsets Marc Zyngier
  2023-02-16 22:09   ` Oliver Upton
@ 2023-02-23 22:41   ` Colton Lewis
  2023-02-24 11:24     ` Marc Zyngier
  1 sibling, 1 reply; 55+ messages in thread
From: Colton Lewis @ 2023-02-23 22:41 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, james.morse, suzuki.poulose,
	oliver.upton, yuzenghui, ricarkol, sveith, dwmw2

Marc Zyngier <maz@kernel.org> writes:

> Once this new API is used, there is no going back, and the counters
> cannot be written to to set the offsets implicitly (the writes
> are instead ignored).

Why do this? I can't see a reason for disabling the other API the first
time this one is used.

> In keeping with the architecture, the offsets are expressed as
> a delta that is substracted from the physical counter value.
                   ^
nit: subtracted

> +/*
> + * Counter/Timer offset structure. Describe the virtual/physical offsets.
> + * To be used with KVM_ARM_SET_CNT_OFFSETS.
> + */
> +struct kvm_arm_counter_offsets {
> +	__u64 virtual_offset;
> +	__u64 physical_offset;
> +
> +#define KVM_COUNTER_SET_VOFFSET_FLAG	(1UL << 0)
> +#define KVM_COUNTER_SET_POFFSET_FLAG	(1UL << 1)
> +
> +	__u64 flags;
> +	__u64 reserved;
> +};
> +

It looks weird to have the #defines in the middle of the struct like
that. I think it would be easier to read with the #defines before the
struct.

> @@ -852,9 +852,11 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
>   	ptimer->vcpu = vcpu;
>   	ptimer->offset.vm_offset = &vcpu->kvm->arch.offsets.poffset;

> -	/* Synchronize cntvoff across all vtimers of a VM. */
> -	timer_set_offset(vtimer, kvm_phys_timer_read());
> -	timer_set_offset(ptimer, 0);
> +	/* Synchronize offsets across timers of a VM if not already provided */
> +	if (!test_bit(KVM_ARCH_FLAG_COUNTER_OFFSETS, &vcpu->kvm->arch.flags)) {
> +		timer_set_offset(vtimer, kvm_phys_timer_read());
> +		timer_set_offset(ptimer, 0);
> +	}

>   	hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
>   	timer->bg_timer.function = kvm_bg_timer_expire;

The code says "assign the offsets if the KVM_ARCH_FLAG_COUNTER_OFFSETS
flag is not on". The flag name is confusing and made it hard for me to
understand the intent. I think the intent is to only assign the offsets
if the user has not called the API to provide some offsets (that would
have been assigned in the API call along with flipping the flag
on). With that in mind, I would prefer the flag name reference the
user. KVM_ARCH_FLAG_USER_OFFSETS


* Re: [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit
  2023-02-23 22:29 ` Colton Lewis
@ 2023-02-24  8:45   ` Marc Zyngier
  0 siblings, 0 replies; 55+ messages in thread
From: Marc Zyngier @ 2023-02-24  8:45 UTC (permalink / raw)
  To: Colton Lewis
  Cc: kvmarm, kvm, linux-arm-kernel, james.morse, suzuki.poulose,
	oliver.upton, yuzenghui, ricarkol, sveith, dwmw2

On Thu, 23 Feb 2023 22:29:29 +0000,
Colton Lewis <coltonlewis@google.com> wrote:
> 
> Hello Marc,
> 
> This patch is of special interest to me as I was working on my own
> ECV/CNTPOFF implementation, although mine was more narrow and doesn't
> address any of your other goals. As far as I can tell at the moment,
> your patch can cover the uses of mine, so I will dedicate
> myself to reviewing and testing yours. Please include me on any future
> rerolls of this patch.
> 
> Marc Zyngier <maz@kernel.org> writes:
> 
> > This series aims at satisfying multiple goals:
> 
> > - allow a VMM to atomically restore a timer offset for a whole VM
> >    instead of updating the offset each time a vcpu get its counter
> >    written
> 
> > - allow a VMM to save/restore the physical timer context, something
> >    that we cannot do at the moment due to the lack of offsetting
> 
> > - provide a framework that is suitable for NV support, where we get
> >    both global and per timer, per vcpu offsetting
> 
> 
> If I am understanding your changes correctly, you introduce some VM-wide
> timers with no hardware backing and a new API to access them from
> userspace. This is useful both because it makes the code simpler (no
> need to manually keep registers in sync), and so CNT{V,P}OFF can be
> appropriately virtualized with NV. Is that a fair summary of the
> important bits?

I think you are conflating a lot of things:

- We always were able to fully emulate a timer: see how non-VHE
  emulates the physical timer. What this allows is to move a HW timer
  from being "direct" to being *partially* emulated, still using the
  HW, but offsetting the counter (and thus requiring trapping).

- CNTPOFF isn't virtualised yet. Actually, none of ECV is virtualised.
  And I'm not convinced it can be virtualised if the HW doesn't
  support it.

- There is no new API for the timers. We only expose the physical
  timer within the existing API. The new API is solely for the purpose
  of setting the offsets.

> 
> > This has been moderately tested with nVHE, VHE and NV. I do not have
> > access to CNTPOFF-aware HW, so the jury is still out on that one. Note
> > that the NV patches in this series are here to give a perspective on
> > how this gets used.
> 
> > I've updated the arch_timer selftest to allow offsets to be provided
> > from the command line, but the arch_test is pretty flimsy and tends to
> > fail with an error==EINTR, even without this series. Something to
> > investigate.
> 
> I can help you with testing because I have access to CNTPOFF-aware
> hardware and emulators. Is there anything you are especially interested
> to try out?

Well, anything that makes use of CNTPOFF would be most helpful,
assuming you're confident that the quality of implementation of the HW
is sufficient (I have terrible memories of finding timer bugs in
emulation due to badly emulated clock domains...).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 06/16] KVM: arm64: timers: Use CNTPOFF_EL2 to offset the physical timer
  2023-02-23 22:34   ` Colton Lewis
@ 2023-02-24  8:59     ` Marc Zyngier
  0 siblings, 0 replies; 55+ messages in thread
From: Marc Zyngier @ 2023-02-24  8:59 UTC (permalink / raw)
  To: Colton Lewis
  Cc: kvmarm, kvm, linux-arm-kernel, james.morse, suzuki.poulose,
	oliver.upton, yuzenghui, ricarkol, sveith, dwmw2

On Thu, 23 Feb 2023 22:34:19 +0000,
Colton Lewis <coltonlewis@google.com> wrote:
> 
> 
> CNTPOFF_EL2 should probably be added to the vcpu_sysreg enum in
> kvm_host.h for this patch. This seemed like the most relevant commit.
> 
> Marc Zyngier <maz@kernel.org> writes:
> 
> > +static bool has_cntpoff(void)
> > +{
> > +	return (has_vhe() && cpus_have_final_cap(ARM64_HAS_ECV_CNTPOFF));
> > +}
> > +
> 
> This being used to guard the register in set_cntpoff seems to say that
> we can only write CNTPOFF in VHE mode. Since it's possible the
> underlying hardware could still support CNTPOFF even if KVM is running
> in nVHE mode, should there be a way to set CNTPOFF in nVHE mode? Is that
> possible?

It would have to happen at EL2 (the register is called CNTPOFF_EL2,
which should give you a clue). This code execute at EL1 when running
nVHE, so any CNTPOFF_EL2 access would UNDEF.

> 
> This caused some confusion for me on my implementation as some teammates
> thought so but I could not get an implementation working for nVHE mode.

Using CNTPOFF for nVHE is utterly pointless unless you start
offloading the host's physical timer to the EL2 physical timer on
entry. Otherwise, you completely screw the host timer and everybody
dies.

> > @@ -84,7 +89,7 @@ u64 timer_get_cval(struct arch_timer_context *ctxt)
> 
> >   static u64 timer_get_offset(struct arch_timer_context *ctxt)
> >   {
> > -	if (ctxt->offset.vm_offset)
> > +	if (ctxt && ctxt->offset.vm_offset)
> >   		return *ctxt->offset.vm_offset;
> 
> >   	return 0;
> 
> nit: this change should be in the previous commit

No. I don't even think this is required anymore, due to the way the
offset is now handled on the VHE-only path.

> 
> > @@ -480,6 +491,7 @@ static void timer_save_state(struct
> > arch_timer_context *ctx)
> >   		write_sysreg_el0(0, SYS_CNTP_CTL);
> >   		isb();
> 
> > +		set_cntpoff(0);
> >   		break;
> >   	case NR_KVM_TIMERS:
> >   		BUG();
> 
> This seems to say CNTPOFF will be reset to 0 every time a vcpu
> switches out. What if the host originally had some value other than 0?
> Is KVM responsible for that context?

The host never has any value other than zero, just like it doesn't
have a value other than 0 for CNTVOFF_EL2. See the comment a few lines
above that explain the rationale for CNTVOFF, which mostly applies to
CNTPOFF as well.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 07/16] KVM: arm64: timers: Allow physical offset without CNTPOFF_EL2
  2023-02-23 22:40   ` Colton Lewis
@ 2023-02-24 10:54     ` Marc Zyngier
  0 siblings, 0 replies; 55+ messages in thread
From: Marc Zyngier @ 2023-02-24 10:54 UTC (permalink / raw)
  To: Colton Lewis
  Cc: kvmarm, kvm, linux-arm-kernel, james.morse, suzuki.poulose,
	oliver.upton, yuzenghui, ricarkol, sveith, dwmw2

On Thu, 23 Feb 2023 22:40:25 +0000,
Colton Lewis <coltonlewis@google.com> wrote:
> 
> 
> Marc Zyngier <maz@kernel.org> writes:
> 
> > +/* If _pred is true, set bit in _set, otherwise set it in _clr */
> > +#define assign_clear_set_bit(_pred, _bit, _clr, _set)			\
> > +	do {								\
> > +		if (_pred)						\
> > +			(_set) |= (_bit);				\
> > +		else							\
> > +			(_clr) |= (_bit);				\
> > +	} while (0)
> > +
> 
> I don't think the do-while wrapper is necessary. Is there any reason
> besides style guide conformance?

It is if you want to avoid a stray ';'.

> > +	/*
> > +	 * We have two possibility to deal with a physical offset:
> > +	 *
> > +	 * - Either we have CNTPOFF (yay!) or the offset is 0:
> > +	 *   we let the guest freely access the HW
> > +	 *
> > +	 * - or neither of these condition apply:
> > +	 *   we trap accesses to the HW, but still use it
> > +	 *   after correcting the physical offset
> > +	 */
> > +	if (!has_cntpoff() && timer_get_offset(map->direct_ptimer))
> > +		tpt = tpc = true;
> 
> If there are only two possibilities, then two different booleans make
> things more complicated than it has to be.

Each boolean denotes a different architectural state. They are
separate so that someone can:

- easily understand what is going on

- affect one without affecting the other when extending this code

The "common state" is what we had before, and it was a real pig to
reverse engineer *my own code*. Yes, this is job security, but I don't
think that's a good enough reason! ;-)

So I contend that two bools make things far simpler to reason about
these things.

> 
> > +	assign_clear_set_bit(tpt, CNTHCTL_EL1PCEN << 10, set, clr);
> > +	assign_clear_set_bit(tpc, CNTHCTL_EL1PCTEN << 10, set, clr);
> 
> Might be good to name the 10 something like VHE_SHIFT so people know why
> it is applied.

VHE_SHIFT really doesn't mean more than '10' because it doesn't tell
you *why* you have to do this.

The real way of solving that one is to move everything to the sysreg
generation *and* have a way to contextualise the sysreg generation
based on features and other controls (see the discussion about
FEAT_CCIDX as an example).

>
> > +
> > +
> > +	timer_set_traps(vcpu, &map);
> >   }
> 
> >   bool kvm_timer_should_notify_user(struct kvm_vcpu *vcpu)
> > @@ -1293,27 +1363,12 @@ int kvm_timer_enable(struct kvm_vcpu *vcpu)
> >   }
> 
> >   /*
> > - * On VHE system, we only need to configure the EL2 timer trap
> > register once,
> > - * not for every world switch.
> > - * The host kernel runs at EL2 with HCR_EL2.TGE == 1,
> > - * and this makes those bits have no effect for the host kernel
> > execution.
> > + * If we have CNTPOFF, permanently set ECV to enable it.
> >    */
> >   void kvm_timer_init_vhe(void)
> >   {
> > -	/* When HCR_EL2.E2H ==1, EL1PCEN and EL1PCTEN are shifted by 10 */
> > -	u32 cnthctl_shift = 10;
> > -	u64 val;
> > -
> > -	/*
> > -	 * VHE systems allow the guest direct access to the EL1 physical
> > -	 * timer/counter.
> > -	 */
> > -	val = read_sysreg(cnthctl_el2);
> > -	val |= (CNTHCTL_EL1PCEN << cnthctl_shift);
> > -	val |= (CNTHCTL_EL1PCTEN << cnthctl_shift);
> >   	if (cpus_have_final_cap(ARM64_HAS_ECV_CNTPOFF))
> > -		val |= CNTHCTL_ECV;
> > -	write_sysreg(val, cnthctl_el2);
> > +		sysreg_clear_set(cntkctl_el1, 0, CNTHCTL_ECV);
> >   }
> 
> What is the reason for moving these register writes from initialization
> to vcpu load time? This contradicts the comment that says this is only
> needed once and not at every world switch. Seems like doing more work
> for no reason.

You did notice that the comment got *removed*, so that there is no
contradiction?

You also understand that with a physical offset, and in the absence of
CNTPOFF, we cannot grant access to the physical counter/timer to the
guest?

Finally, given that we always have to write various bits of
CNTKCTL_EL1 for other reasons, moving these settings shouldn't result
in any extra work (especially considering that they don't require any
extra synchronisation).

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 08/16] KVM: arm64: timers: Allow userspace to set the counter offsets
  2023-02-23 22:41   ` Colton Lewis
@ 2023-02-24 11:24     ` Marc Zyngier
  0 siblings, 0 replies; 55+ messages in thread
From: Marc Zyngier @ 2023-02-24 11:24 UTC (permalink / raw)
  To: Colton Lewis
  Cc: kvmarm, kvm, linux-arm-kernel, james.morse, suzuki.poulose,
	oliver.upton, yuzenghui, ricarkol, sveith, dwmw2

On Thu, 23 Feb 2023 22:41:13 +0000,
Colton Lewis <coltonlewis@google.com> wrote:
> 
> Marc Zyngier <maz@kernel.org> writes:
> 
> > Once this new API is used, there is no going back, and the counters
> > cannot be written to to set the offsets implicitly (the writes
> > are instead ignored).
> 
> Why do this? I can't see a reason for disabling the other API the first
> time this one is used.

I can't see a reason not to. The new API is VM-wide. The old one
operates on a per-vcpu basis. What sense does it make to accept
something that directly conflicts with the previous actions from
userspace?

Once userspace has bought into the new API, it should use it
consistently.  The only reason we don't reject the write with an error
is to allow userspace to keep using the vcpu register dump as an
opaque object that it doesn't have to scan and amend.

> 
> > In keeping with the architecture, the offsets are expressed as
> > a delta that is substracted from the physical counter value.
>                   ^
> nit: subtracted
> 
> > +/*
> > + * Counter/Timer offset structure. Describe the virtual/physical offsets.
> > + * To be used with KVM_ARM_SET_CNT_OFFSETS.
> > + */
> > +struct kvm_arm_counter_offsets {
> > +	__u64 virtual_offset;
> > +	__u64 physical_offset;
> > +
> > +#define KVM_COUNTER_SET_VOFFSET_FLAG	(1UL << 0)
> > +#define KVM_COUNTER_SET_POFFSET_FLAG	(1UL << 1)
> > +
> > +	__u64 flags;
> > +	__u64 reserved;
> > +};
> > +
> 
> It looks weird to have the #defines in the middle of the struct like
> that. I think it would be easier to read with the #defines before the
> struct.

I do like it, as it perfectly shows in which context these #defines
are valid. This is also a common idiom used all over the existing KVM
code (just take a look at kvm_run for the canonical example).

> 
> > @@ -852,9 +852,11 @@ void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu)
> >   	ptimer->vcpu = vcpu;
> >   	ptimer->offset.vm_offset = &vcpu->kvm->arch.offsets.poffset;
> 
> > -	/* Synchronize cntvoff across all vtimers of a VM. */
> > -	timer_set_offset(vtimer, kvm_phys_timer_read());
> > -	timer_set_offset(ptimer, 0);
> > +	/* Synchronize offsets across timers of a VM if not already provided */
> > +	if (!test_bit(KVM_ARCH_FLAG_COUNTER_OFFSETS, &vcpu->kvm->arch.flags)) {
> > +		timer_set_offset(vtimer, kvm_phys_timer_read());
> > +		timer_set_offset(ptimer, 0);
> > +	}
> 
> >   	hrtimer_init(&timer->bg_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS_HARD);
> >   	timer->bg_timer.function = kvm_bg_timer_expire;
> 
> The code says "assign the offsets if the KVM_ARCH_FLAG_COUNTER_OFFSETS
> flag is not on". The flag name is confusing and made it hard for me to
> understand the intent. I think the intent is to only assign the offsets
> if the user has not called the API to provide some offsets (that would
> have been assigned in the API call along with flipping the flag
> on). With that in mind, I would prefer the flag name reference the
> user. KVM_ARCH_FLAG_USER_OFFSETS

All offsets are provided by the user, no matter what API they used, so
I don't think this adds much clarity. The real distinction is between
the offsets being set by writing a vcpu attribute or a VM attribute.

By this token, I'd suggest KVM_ARM_FLAG_VM_COUNTER_OFFSETS.

Thoughts?

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit
  2023-02-16 14:21 [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Marc Zyngier
                   ` (17 preceding siblings ...)
  2023-02-23 22:29 ` Colton Lewis
@ 2023-02-24 20:07 ` Colton Lewis
  18 siblings, 0 replies; 55+ messages in thread
From: Colton Lewis @ 2023-02-24 20:07 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, james.morse, suzuki.poulose,
	oliver.upton, yuzenghui, ricarkol, sveith, dwmw2

Hello Marc,

This patch is of special interest to me as I was working on my own
ECV/CNTPOFF implementation, although mine was more narrow and doesn't
address any of your other goals. As far as I can tell at the moment,
your patch can cover the uses of mine, so I will dedicate
myself to reviewing and testing yours. Please include me on any future
rerolls of this patch.

Marc Zyngier <maz@kernel.org> writes:

> This series aims at satisfying multiple goals:

> - allow a VMM to atomically restore a timer offset for a whole VM
>    instead of updating the offset each time a vcpu get its counter
>    written

> - allow a VMM to save/restore the physical timer context, something
>    that we cannot do at the moment due to the lack of offsetting

> - provide a framework that is suitable for NV support, where we get
>    both global and per timer, per vcpu offsetting


If I am understanding your changes correctly, you introduce some VM-wide
timers with no hardware backing and a new API to access them from
userspace. This is useful both because it makes the code simpler (no
need to manually keep registers in sync), and so CNT{V,P}OFF can be
appropriately virtualized with NV. Is that a fair summary of the
important bits?

> This has been moderately tested with nVHE, VHE and NV. I do not have
> access to CNTPOFF-aware HW, so the jury is still out on that one. Note
> that the NV patches in this series are here to give a perspective on
> how this gets used.

> I've updated the arch_timer selftest to allow offsets to be provided
> from the command line, but the arch_test is pretty flimsy and tends to
> fail with an error==EINTR, even without this series. Something to
> investigate.

I can help you with testing because I have access to CNTPOFF-aware
hardware and emulators. Is there anything you are especially interested
to try out?


* Re: [PATCH 12/16] KVM: arm64: nv: timers: Add a per-timer, per-vcpu offset
  2023-02-16 14:21 ` [PATCH 12/16] KVM: arm64: nv: timers: Add a per-timer, per-vcpu offset Marc Zyngier
@ 2023-02-24 20:07   ` Colton Lewis
  2023-02-25 10:32     ` Marc Zyngier
  0 siblings, 1 reply; 55+ messages in thread
From: Colton Lewis @ 2023-02-24 20:07 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, james.morse, suzuki.poulose,
	oliver.upton, yuzenghui, ricarkol, sveith, dwmw2

Marc Zyngier <maz@kernel.org> writes:

> diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> index 62ef4883e644..e76e513b90c5 100644
> --- a/include/kvm/arm_arch_timer.h
> +++ b/include/kvm/arm_arch_timer.h
> @@ -29,6 +29,11 @@ struct arch_timer_offset {
>   	 * structure. If NULL, assume a zero offset.
>   	 */
>   	u64	*vm_offset;
> +	/*
> +	 * If set, pointer to one of the offsets in the vcpu's sysreg
> +	 * array. If NULL, assume a zero offset.
> +	 */
> +	u64	*vcpu_offset;
>   };

>   struct arch_timer_vm_offsets {
> --
> 2.34.1

This pointer isn't initialized until next commit and this commit is
small so I think it should be merged with the next one.


* Re: [PATCH 13/16] KVM: arm64: nv: timers: Support hyp timer emulation
  2023-02-16 14:21 ` [PATCH 13/16] KVM: arm64: nv: timers: Support hyp timer emulation Marc Zyngier
@ 2023-02-24 20:08   ` Colton Lewis
  2023-02-25 10:34     ` Marc Zyngier
  0 siblings, 1 reply; 55+ messages in thread
From: Colton Lewis @ 2023-02-24 20:08 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, james.morse, suzuki.poulose,
	oliver.upton, yuzenghui, ricarkol, sveith, dwmw2,
	christoffer.dall

I hope someone else looks at this one because I don't understand NV.

Marc Zyngier <maz@kernel.org> writes:

> +	struct arch_timer_context *ctx;
> +
> +	ctx = (vcpu_has_nv(vcpu) && is_hyp_ctxt(vcpu)) ? vcpu_hvtimer(vcpu)
> +						       : vcpu_vtimer(vcpu);

I don't think that ternary is the easiest to read and would prefer an if
statement.


* Re: [PATCH 12/16] KVM: arm64: nv: timers: Add a per-timer, per-vcpu offset
  2023-02-24 20:07   ` Colton Lewis
@ 2023-02-25 10:32     ` Marc Zyngier
  0 siblings, 0 replies; 55+ messages in thread
From: Marc Zyngier @ 2023-02-25 10:32 UTC (permalink / raw)
  To: Colton Lewis
  Cc: kvmarm, kvm, linux-arm-kernel, james.morse, suzuki.poulose,
	oliver.upton, yuzenghui, ricarkol, sveith, dwmw2

On Fri, 24 Feb 2023 20:07:59 +0000,
Colton Lewis <coltonlewis@google.com> wrote:
> 
> Marc Zyngier <maz@kernel.org> writes:
> 
> > diff --git a/include/kvm/arm_arch_timer.h b/include/kvm/arm_arch_timer.h
> > index 62ef4883e644..e76e513b90c5 100644
> > --- a/include/kvm/arm_arch_timer.h
> > +++ b/include/kvm/arm_arch_timer.h
> > @@ -29,6 +29,11 @@ struct arch_timer_offset {
> >   	 * structure. If NULL, assume a zero offset.
> >   	 */
> >   	u64	*vm_offset;
> > +	/*
> > +	 * If set, pointer to one of the offsets in the vcpu's sysreg
> > +	 * array. If NULL, assume a zero offset.
> > +	 */
> > +	u64	*vcpu_offset;
> >   };
> 
> >   struct arch_timer_vm_offsets {
> > --
> > 2.34.1
> 
> This pointer isn't initialized until next commit and this commit is
> small so I think it should be merged with the next one.

Like the rest of the vcpu structure, this is of course initialised to
zero. Yes, the patch is small. It does *one* thing, which is to plug a
new offset.

Should it be buried into a much bigger patch? I don't think so.

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 13/16] KVM: arm64: nv: timers: Support hyp timer emulation
  2023-02-24 20:08   ` Colton Lewis
@ 2023-02-25 10:34     ` Marc Zyngier
  0 siblings, 0 replies; 55+ messages in thread
From: Marc Zyngier @ 2023-02-25 10:34 UTC (permalink / raw)
  To: Colton Lewis
  Cc: kvmarm, kvm, linux-arm-kernel, james.morse, suzuki.poulose,
	oliver.upton, yuzenghui, ricarkol, sveith, dwmw2,
	christoffer.dall

On Fri, 24 Feb 2023 20:08:15 +0000,
Colton Lewis <coltonlewis@google.com> wrote:
> 
> I hope someone else looks at this one because I don't understand NV.

Don't worry, you're not alone.

> Marc Zyngier <maz@kernel.org> writes:
> 
> > +	struct arch_timer_context *ctx;
> > +
> > +	ctx = (vcpu_has_nv(vcpu) && is_hyp_ctxt(vcpu)) ? vcpu_hvtimer(vcpu)
> > +						       : vcpu_vtimer(vcpu);
> 
> I don't think that ternary is the easiest to read and would prefer an if
> statement.

As the saying goes, there is no accounting for taste. Do you see a
technical issue with this statement?

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 15/16] KVM: arm64: selftests: Augment existing timer test to handle variable offsets
  2023-02-16 14:21 ` [PATCH 15/16] KVM: arm64: selftests: Augment existing timer test to handle variable offsets Marc Zyngier
@ 2023-03-06 22:08   ` Colton Lewis
  2023-03-09  9:01     ` Marc Zyngier
  0 siblings, 1 reply; 55+ messages in thread
From: Colton Lewis @ 2023-03-06 22:08 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, james.morse, suzuki.poulose,
	oliver.upton, yuzenghui, ricarkol, sveith, dwmw2

Hi Marc,

First of all, thanks for your previous responses to my comments. Many of
them clarified things I did not fully understand on my own.

As I stated in another email, I've been testing this series on ECV
capable hardware. Things look good but I have been able to reproduce a
consistent assertion failure in this selftest when setting a
sufficiently large physical offset. I have so far not been able to
determine the cause of the failure and wonder if you have any insight as
to what might be causing this and how to debug.

The following example reproduces the error every time I have tried:

mvbbq9:/data/coltonlewis/ecv/arm64-obj/kselftest/kvm# ./aarch64/arch_timer  
-O 0xffff
==== Test Assertion Failure ====
   aarch64/arch_timer.c:239: false
   pid=48094 tid=48095 errno=4 - Interrupted system call
      1  0x4010fb: test_vcpu_run at arch_timer.c:239
      2  0x42a5bf: start_thread at pthread_create.o:0
      3  0x46845b: thread_start at clone.o:0
   Failed guest assert: xcnt >= cval at aarch64/arch_timer.c:151
values: 2500645901305, 2500645961845; 9939, vcpu 0; stage; 3; iter: 2

Observations:

- Failure always occurs at stage 3 or 4 (physical timer stages)
- xcnt_diff_us is always slightly less than 10000, or 10 ms
- Reducing offset size reduces the probability of failure linearly (for
   example, -O 0x8000 will fail close to half the time)
- Failure occurs with a wide range of different period values and
   whether or not migrations happen



* Re: [PATCH 08/16] KVM: arm64: timers: Allow userspace to set the counter offsets
  2023-02-23 18:25             ` Marc Zyngier
@ 2023-03-08  7:46               ` Oliver Upton
  2023-03-08  7:53                 ` Oliver Upton
  2023-03-09  8:25                 ` Marc Zyngier
  0 siblings, 2 replies; 55+ messages in thread
From: Oliver Upton @ 2023-03-08  7:46 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, James Morse, Suzuki K Poulose,
	Zenghui Yu, Ricardo Koller, Simon Veith, dwmw2

Hey Marc,

On Thu, Feb 23, 2023 at 06:25:57PM +0000, Marc Zyngier wrote:

[...]

> > Do we need to bend over backwards for a theoretical use case with
> > the new UAPI? If anyone depends on the existing behavior then they can
> > continue to use the old UAPI to partially migrate the guest counters.
> 
> I don't buy the old/new thing. My take is that these things should be
> cumulative if there isn't a hard reason to break the existing API.

Unsurprisingly, I may have been a bit confusing in my replies to you.

I have zero interest in breaking the existing API. Any suggestion of
'changing the rules' was more along the lines of providing an alternate
scheme for the counters and letting the quirks of the old interface
continue.

> > My previous suggestion of tying the physical and virtual counters
> > together at VM creation would definitely break such a use case, though,
> > so we'd be at the point of requiring explicit opt-in from userspace.
> 
> I'm trying to find a middle ground, so bear with me. Here's the
> situation as I see it:
> 
> (1) a VM that is migrating today can only set the virtual offset and
>     doesn't affect the physical counter. This behaviour must be
>     preserved if we cannot prove that nobody relies on it.
> 
> (2) setting the physical offset could be done by two means:
> 
>     (a) writing the counter register (like we do for CNTVCT)
>     (b) providing an offset via a side channel
> 
> I think (1) must stay forever, just like we still support the old
> GICv2 implicit initialisation.

No argument here. Unless userspace pokes some new bit of UAPI, the old
behavior of CNTVCT must live on.

> (2a) is also desirable as it requires no extra work on the VMM side.
> Just restore the damn thing, and nothing changes (we're finally able
> to migrate the physical timer). (2b) really is icing on the cake.
> 
> The question is whether we can come up with an API offering (2b) that
> still allows (1) and (2a). I'd be happy with a new API that, when
> used, resets both offsets to the same value, matching your pretty
> picture. But the dual offsetting still has to exist internally.
> 
> When it comes to NV, it uses either the physical offset that has been
> provided by writing CNTPCT, or the one that has been written via the
> new API. Under the hood, this is the same piece of data, of course.
> 
> The only meaningful difference with my initial proposal is that there
> is no new virtual offset API. It is either register writes that obey
> the same rules as before, or a single offset setting.

I certainly agree that (2a) is highly desirable to get existing VMMs to
'do the right thing' for free. Playing devil's advocate, would this not
also break the tracing example you've given of correlating timestamps
between the host and guest? I wouldn't expect a userspace + VM tracing
contraption to live migrate but restoring from a snapshot seems
plausible.

Regardless, I like the general direction you've proposed. IIUC, you'll
want to go ahead with ignoring writes to CNT{P,V}CT if the offset was
written by userspace, right?

-- 
Thanks,
Oliver


* Re: [PATCH 08/16] KVM: arm64: timers: Allow userspace to set the counter offsets
  2023-03-08  7:46               ` Oliver Upton
@ 2023-03-08  7:53                 ` Oliver Upton
  2023-03-09  8:29                   ` Marc Zyngier
  2023-03-09  8:25                 ` Marc Zyngier
  1 sibling, 1 reply; 55+ messages in thread
From: Oliver Upton @ 2023-03-08  7:53 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, James Morse, Suzuki K Poulose,
	Zenghui Yu, Ricardo Koller, Simon Veith, dwmw2

On Wed, Mar 08, 2023 at 07:46:00AM +0000, Oliver Upton wrote:
> Hey Marc,
> 
> On Thu, Feb 23, 2023 at 06:25:57PM +0000, Marc Zyngier wrote:
> 
> [...]
> 
> > > Do we need to bend over backwards for a theoretical use case with
> > > the new UAPI? If anyone depends on the existing behavior then they can
> > > continue to use the old UAPI to partially migrate the guest counters.
> > 
> > I don't buy the old/new thing. My take is that these things should be
> > cumulative if there isn't a hard reason to break the existing API.
> 
> Unsurprisingly, I may have been a bit confusing in my replies to you.
> 
> I have zero interest in breaking the existing API. Any suggestion of
> 'changing the rules' was more along the lines of providing an alternate
> scheme for the counters and letting the quirks of the old interface
> continue.
> 
> > > My previous suggestion of tying the physical and virtual counters
> > > together at VM creation would definitely break such a use case, though,
> > > so we'd be at the point of requiring explicit opt-in from userspace.
> > 
> > I'm trying to find a middle ground, so bear with me. Here's the
> > situation as I see it:
> > 
> > (1) a VM that is migrating today can only set the virtual offset and
> >     doesn't affect the physical counter. This behaviour must be
> >     preserved if we cannot prove that nobody relies on it.
> > 
> > (2) setting the physical offset could be done by two means:
> > 
> >     (a) writing the counter register (like we do for CNTVCT)
> >     (b) providing an offset via a side channel
> > 
> > I think (1) must stay forever, just like we still support the old
> > GICv2 implicit initialisation.
> 
> No argument here. Unless userspace pokes some new bit of UAPI, the old
> behavior of CNTVCT must live on.
> 
> > (2a) is also desirable as it requires no extra work on the VMM side.
> > Just restore the damn thing, and nothing changes (we're finally able
> > to migrate the physical timer). (2b) really is icing on the cake.
> > 
> > The question is whether we can come up with an API offering (2b) that
> > still allows (1) and (2a). I'd be happy with a new API that, when
> > used, resets both offsets to the same value, matching your pretty
> > picture. But the dual offsetting still has to exist internally.
> > 
> > When it comes to NV, it uses either the physical offset that has been
> > provided by writing CNTPCT, or the one that has been written via the
> > new API. Under the hood, this is the same piece of data, of course.
> > 
> > The only meaningful difference with my initial proposal is that there
> > is no new virtual offset API. It is either register writes that obey
> > the same rules as before, or a single offset setting.
> 
> I certainly agree that (2a) is highly desirable to get existing VMMs to
> 'do the right thing' for free. Playing devil's advocate, would this not
> also break the tracing example you've given of correlating timestamps
> between the host and guest? I wouldn't expect a userspace + VM tracing
> contraption to live migrate but restoring from a snapshot seems
> plausible.

The problem I'm alluding to here is that the VMM will save/restore
the physical counter value and cause KVM to offset the physical counter.
Live migration is a pretty obvious example, but resuming from a snapshot
after resetting a system would be similarly affected.

> Regardless, I like the general direction you've proposed. IIUC, you'll
> want to go ahead with ignoring writes to CNT{P,V}CT if the offset was
> written by userspace, right?
> 
> -- 
> Thanks,
> Oliver
> 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH 08/16] KVM: arm64: timers: Allow userspace to set the counter offsets
  2023-03-08  7:46               ` Oliver Upton
  2023-03-08  7:53                 ` Oliver Upton
@ 2023-03-09  8:25                 ` Marc Zyngier
  1 sibling, 0 replies; 55+ messages in thread
From: Marc Zyngier @ 2023-03-09  8:25 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, kvm, linux-arm-kernel, James Morse, Suzuki K Poulose,
	Zenghui Yu, Ricardo Koller, Simon Veith, dwmw2

Hi Oliver,

On Wed, 08 Mar 2023 07:46:00 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:

[...]

> I certainly agree that (2a) is highly desirable to get existing VMMs to
> 'do the right thing' for free. Playing devil's advocate, would this not
> also break the tracing example you've given of correlating timestamps
> between the host and guest? I wouldn't expect a userspace + VM tracing
> contraption to live migrate but restoring from a snapshot seems
> plausible.

It really depends when this VM was saved.

If you saved it on an old host that doesn't expose CNTPCT and restore
it on a new host, the physical offset is still zero (this is already
special-cased), and there is no difference in behaviour.

If you saved it from a host that does expose CNTPCT, then the
behaviour changes. But should we really care?

> Regardless, I like the general direction you've proposed. IIUC, you'll
> want to go ahead with ignoring writes to CNT{P,V}CT if the offset was
> written by userspace, right?

That'd be my preference in order not to break the "blind restore"
behaviour that QEMU already uses for about everything.

I'll repost the series shortly, as it has grown some extra goodies
such as moving the PPI settings out of the way...

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 08/16] KVM: arm64: timers: Allow userspace to set the counter offsets
  2023-03-08  7:53                 ` Oliver Upton
@ 2023-03-09  8:29                   ` Marc Zyngier
  0 siblings, 0 replies; 55+ messages in thread
From: Marc Zyngier @ 2023-03-09  8:29 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, kvm, linux-arm-kernel, James Morse, Suzuki K Poulose,
	Zenghui Yu, Ricardo Koller, Simon Veith, dwmw2

On Wed, 08 Mar 2023 07:53:46 +0000,
Oliver Upton <oliver.upton@linux.dev> wrote:
> 
> > I certainly agree that (2a) is highly desirable to get existing VMMs to
> > 'do the right thing' for free. Playing devil's advocate, would this not
> > also break the tracing example you've given of correlating timestamps
> > between the host and guest? I wouldn't expect a userspace + VM tracing
> > contraption to live migrate but restoring from a snapshot seems
> > plausible.
> 
> The problem I'm alluding to here is that the VMM will save/restore
> the physical counter value and cause KVM to offset the physical counter.
> Live migration is a pretty obvious example, but resuming from a snapshot
> after resetting a system would be similarly affected.

My take on this is that if you have produced the snapshot on a
pre-CNTPCT host, there will be no change in behaviour. If you've
produced the snapshot on a new host, you get the new behaviour.

I am willing to be accommodating to the use case, but only to a
certain extent! ;-)

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 15/16] KVM: arm64: selftests: Augment existing timer test to handle variable offsets
  2023-03-06 22:08   ` Colton Lewis
@ 2023-03-09  9:01     ` Marc Zyngier
  2023-03-10 19:26       ` Colton Lewis
  0 siblings, 1 reply; 55+ messages in thread
From: Marc Zyngier @ 2023-03-09  9:01 UTC (permalink / raw)
  To: Colton Lewis
  Cc: kvmarm, kvm, linux-arm-kernel, james.morse, suzuki.poulose,
	oliver.upton, yuzenghui, ricarkol, sveith, dwmw2

On Mon, 06 Mar 2023 22:08:04 +0000,
Colton Lewis <coltonlewis@google.com> wrote:
> 
> Hi Marc,
> 
> First of all, thanks for your previous responses to my comments. Many of
> them clarified things I did not fully understand on my own.
> 
> As I stated in another email, I've been testing this series on ECV
> capable hardware. Things look good but I have been able to reproduce a
> consistent assertion failure in this selftest when setting a
> sufficiently large physical offset. I have so far not been able to
> determine the cause of the failure and wonder if you have any insight as
> to what might be causing this and how to debug.
> 
> The following example reproduces the error every time I have tried:
> 
> mvbbq9:/data/coltonlewis/ecv/arm64-obj/kselftest/kvm#
> ./aarch64/arch_timer -O 0xffff
> ==== Test Assertion Failure ====
>   aarch64/arch_timer.c:239: false
>   pid=48094 tid=48095 errno=4 - Interrupted system call
>      1  0x4010fb: test_vcpu_run at arch_timer.c:239
>      2  0x42a5bf: start_thread at pthread_create.o:0
>      3  0x46845b: thread_start at clone.o:0
>   Failed guest assert: xcnt >= cval at aarch64/arch_timer.c:151
> values: 2500645901305, 2500645961845; 9939, vcpu 0; stage; 3; iter: 2

The fun part is that you can see similar things without the series:

==== Test Assertion Failure ====
  aarch64/arch_timer.c:239: false
  pid=647 tid=651 errno=4 - Interrupted system call
     1  0x00000000004026db: test_vcpu_run at arch_timer.c:239
     2  0x00007fffb13cedd7: ?? ??:0
     3  0x00007fffb1437e9b: ?? ??:0
  Failed guest assert: config_iter + 1 == irq_iter at aarch64/arch_timer.c:188
values: 2, 3; 0, vcpu 3; stage; 4; iter: 3

That's on a vanilla kernel (6.2-rc4) on an M1 with the test run
without any argument in a loop. After a few iterations, it blows.

>
> Observations:
> 
> - Failure always occurs at stage 3 or 4 (physical timer stages)
> - xcnt_diff_us is always slightly less than 10000, or 10 ms
> - Reducing offset size reduces the probability of failure linearly (for
>   example, -O 0x8000 will fail close to half the time)
> - Failure occurs with a wide range of different period values and
>   whether or not migrations happen

The problem is that I don't understand enough of the test to make a
judgement call. I hardly get *what* it is testing. Do you?

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 15/16] KVM: arm64: selftests: Augment existing timer test to handle variable offsets
  2023-03-09  9:01     ` Marc Zyngier
@ 2023-03-10 19:26       ` Colton Lewis
  2023-03-12 15:53         ` Marc Zyngier
  2023-03-13 11:43         ` Marc Zyngier
  0 siblings, 2 replies; 55+ messages in thread
From: Colton Lewis @ 2023-03-10 19:26 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, james.morse, suzuki.poulose,
	oliver.upton, yuzenghui, ricarkol, sveith, dwmw2

Marc Zyngier <maz@kernel.org> writes:

>> mvbbq9:/data/coltonlewis/ecv/arm64-obj/kselftest/kvm#
>> ./aarch64/arch_timer -O 0xffff
>> ==== Test Assertion Failure ====
>>    aarch64/arch_timer.c:239: false
>>    pid=48094 tid=48095 errno=4 - Interrupted system call
>>       1  0x4010fb: test_vcpu_run at arch_timer.c:239
>>       2  0x42a5bf: start_thread at pthread_create.o:0
>>       3  0x46845b: thread_start at clone.o:0
>>    Failed guest assert: xcnt >= cval at aarch64/arch_timer.c:151
>> values: 2500645901305, 2500645961845; 9939, vcpu 0; stage; 3; iter: 2

> The fun part is that you can see similar things without the series:

> ==== Test Assertion Failure ====
>    aarch64/arch_timer.c:239: false
>    pid=647 tid=651 errno=4 - Interrupted system call
>       1  0x00000000004026db: test_vcpu_run at arch_timer.c:239
>       2  0x00007fffb13cedd7: ?? ??:0
>       3  0x00007fffb1437e9b: ?? ??:0
>    Failed guest assert: config_iter + 1 == irq_iter at  
> aarch64/arch_timer.c:188
> values: 2, 3; 0, vcpu 3; stage; 4; iter: 3

> That's on a vanilla kernel (6.2-rc4) on an M1 with the test run
> without any argument in a loop. After a few iterations, it blows.

These things are different failures. The first I've only ever found when
setting the -O option. What command did you use to trigger the second if
there were any non-default options?

Another interesting finding is that I can't reproduce any problems using
ARM's emulated platform. There is a possibility these errors are
ultimately down to individual hardware quirks, but that's still worth
understanding since everyone uses hardware and not emulators.

> The problem is that I don't understand enough of the test to make a
> judgement call. I hardly get *what* it is testing. Do you?

My understanding is that the test validates timer interrupts are occurring
when the ARM manual says they should. It sets a comparison value (cval)
at some point a few milliseconds into the future and waits for the
counter (xcnt) to be greater than or equal to the comparison value, at
which point an interrupt should fire.

The failure I posted occurs at a line that says

GUEST_ASSERT_3(xcnt >= cval, xcnt, cval, xcnt_diff_us);

The counter was less than the comparison value, which implies the
interrupt fired early. Do we care? I don't know. I think it's weird that
this occurs when I set a physical offset with -O and no other time.

I've also noticed that the greater the offset I set, the greater the
difference between xcnt and cval. I think the physical offset is not
being accounted for every place it should. At the very least, that
indicates change is required in the test.

The failure you posted occurs at a line that says

GUEST_ASSERT_2(config_iter + 1 == irq_iter,
		config_iter + 1, irq_iter);

I gather from context that the values were unequal because an expected
interrupt never fired or was not counted. Do we care? I don't know. I
think someone should.


* Re: [PATCH 15/16] KVM: arm64: selftests: Augment existing timer test to handle variable offsets
  2023-03-10 19:26       ` Colton Lewis
@ 2023-03-12 15:53         ` Marc Zyngier
  2023-03-13 11:43         ` Marc Zyngier
  1 sibling, 0 replies; 55+ messages in thread
From: Marc Zyngier @ 2023-03-12 15:53 UTC (permalink / raw)
  To: Colton Lewis
  Cc: kvmarm, kvm, linux-arm-kernel, james.morse, suzuki.poulose,
	oliver.upton, yuzenghui, ricarkol, sveith, dwmw2

On Fri, 10 Mar 2023 19:26:47 +0000,
Colton Lewis <coltonlewis@google.com> wrote:
> 
> Marc Zyngier <maz@kernel.org> writes:
> 
> >> mvbbq9:/data/coltonlewis/ecv/arm64-obj/kselftest/kvm#
> >> ./aarch64/arch_timer -O 0xffff
> >> ==== Test Assertion Failure ====
> >>    aarch64/arch_timer.c:239: false
> >>    pid=48094 tid=48095 errno=4 - Interrupted system call
> >>       1  0x4010fb: test_vcpu_run at arch_timer.c:239
> >>       2  0x42a5bf: start_thread at pthread_create.o:0
> >>       3  0x46845b: thread_start at clone.o:0
> >>    Failed guest assert: xcnt >= cval at aarch64/arch_timer.c:151
> >> values: 2500645901305, 2500645961845; 9939, vcpu 0; stage; 3; iter: 2
> 
> > The fun part is that you can see similar things without the series:
> 
> > ==== Test Assertion Failure ====
> >    aarch64/arch_timer.c:239: false
> >    pid=647 tid=651 errno=4 - Interrupted system call
> >       1  0x00000000004026db: test_vcpu_run at arch_timer.c:239
> >       2  0x00007fffb13cedd7: ?? ??:0
> >       3  0x00007fffb1437e9b: ?? ??:0
> >    Failed guest assert: config_iter + 1 == irq_iter at
> > aarch64/arch_timer.c:188
> > values: 2, 3; 0, vcpu 3; stage; 4; iter: 3
> 
> > That's on a vanilla kernel (6.2-rc4) on an M1 with the test run
> > without any argument in a loop. After a few iterations, it blows.
> 
> These things are different failures. The first I've only ever found when
> setting the -O option. What command did you use to trigger the second if
> there were any non-default options?

As I already said: "without any argument".

maz@babette:~$ ./arch_timer 
==== Test Assertion Failure ====
  aarch64/arch_timer.c:239: false
  pid=1110 tid=1113 errno=4 - Interrupted system call
     1	0x000000000040268b: test_vcpu_run at arch_timer.c:239
     2	0x00007fff9c48edd7: ?? ??:0
     3	0x00007fff9c4f7e9b: ?? ??:0
  Failed guest assert: config_iter + 1 == irq_iter at aarch64/arch_timer.c:188
values: 3, 4; 0, vcpu 1; stage; 4; iter: 4

As simple as it gets. So either KVM is terminally buggy (quite
possible), or this test is. My money is on the second one.

> Another interesting finding is that I can't reproduce any problems using
> ARM's emulated platform. There is a possibility these errors are
> ultimately down to individual hardware quirks, but that's still worth
> understanding since everyone uses hardware and not emulators.
> 
> > The problem is that I don't understand enough of the test to make a
> > judgement call. I hardly get *what* it is testing. Do you?
> 
> My understanding is that the test validates timer interrupts are occurring
> when the ARM manual says they should. It sets a comparison value (cval)
> at some point a few milliseconds into the future and waits for the
> counter (xcnt) to be greater than or equal to the comparison value, at
> which point an interrupt should fire.
> 
> The failure I posted occurs at a line that says
> 
> GUEST_ASSERT_3(xcnt >= cval, xcnt, cval, xcnt_diff_us);
> 
> The counter was less than the comparison value, which implies the
> interrupt fired early. Do we care? I don't know. I think it's weird that
> this occurs when I set a physical offset with -O and no other time.

The thing is, you say nothing about your hardware. What is it? does it
have ECV? Does it have CNTPOFF? If it has any of those, does it help
if you disable this support?

> I've also noticed that the greater the offset I set, the greater the
> difference between xcnt and cval. I think the physical offset is not
> being accounted for every place it should. At the very least, that
> indicates change is required in the test.
> 
> The failure you posted occurs at a line that says
> 
> GUEST_ASSERT_2(config_iter + 1 == irq_iter,
> 		config_iter + 1, irq_iter);
> 
> I gather from context that the values were unequal because an expected
> interrupt never fired or was not counted. Do we care? I don't know. I
> think someone should.

What is the point of a test that fails randomly without anyone
understanding what it is supposed to do? If that's the state of the
selftests, maybe I should just go and remove the aarch64 directory.

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 15/16] KVM: arm64: selftests: Augment existing timer test to handle variable offsets
  2023-03-10 19:26       ` Colton Lewis
  2023-03-12 15:53         ` Marc Zyngier
@ 2023-03-13 11:43         ` Marc Zyngier
  2023-03-14 17:47           ` Colton Lewis
  1 sibling, 1 reply; 55+ messages in thread
From: Marc Zyngier @ 2023-03-13 11:43 UTC (permalink / raw)
  To: Colton Lewis
  Cc: kvmarm, kvm, linux-arm-kernel, james.morse, suzuki.poulose,
	oliver.upton, yuzenghui, ricarkol, sveith, dwmw2

On Fri, 10 Mar 2023 19:26:47 +0000,
Colton Lewis <coltonlewis@google.com> wrote:
> 
> Marc Zyngier <maz@kernel.org> writes:
> 
> >> mvbbq9:/data/coltonlewis/ecv/arm64-obj/kselftest/kvm#
> >> ./aarch64/arch_timer -O 0xffff
> >> ==== Test Assertion Failure ====
> >>    aarch64/arch_timer.c:239: false
> >>    pid=48094 tid=48095 errno=4 - Interrupted system call
> >>       1  0x4010fb: test_vcpu_run at arch_timer.c:239
> >>       2  0x42a5bf: start_thread at pthread_create.o:0
> >>       3  0x46845b: thread_start at clone.o:0
> >>    Failed guest assert: xcnt >= cval at aarch64/arch_timer.c:151
> >> values: 2500645901305, 2500645961845; 9939, vcpu 0; stage; 3; iter: 2
> 
> > The fun part is that you can see similar things without the series:
> 
> > ==== Test Assertion Failure ====
> >    aarch64/arch_timer.c:239: false
> >    pid=647 tid=651 errno=4 - Interrupted system call
> >       1  0x00000000004026db: test_vcpu_run at arch_timer.c:239
> >       2  0x00007fffb13cedd7: ?? ??:0
> >       3  0x00007fffb1437e9b: ?? ??:0
> >    Failed guest assert: config_iter + 1 == irq_iter at
> > aarch64/arch_timer.c:188
> > values: 2, 3; 0, vcpu 3; stage; 4; iter: 3
> 
> > That's on a vanilla kernel (6.2-rc4) on an M1 with the test run
> > without any argument in a loop. After a few iterations, it blows.

I finally got to the bottom of that one. This is yet another case of
the test making the assumption that spurious interrupts don't exist...

Here, the timer interrupt has been masked at the source, but the GIC
(or its emulation) can be slow to retire it. So we take it again,
spuriously, and account it as a true interrupt. None of the asserts in
the timer handler fire because they only check the *previous* state.

Eventually, the interrupt retires and we progress to the next
iteration. But in the meantime, we have incremented the irq counter by
the number of spurious events, and the test fails.

The obvious fix is to check for the timer state in the handler and
exit early if the timer interrupt is masked or the timer disabled.
With that, I don't see these failures anymore.

I've folded that into the patch that already deals with some spurious
events.

	M.

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH 15/16] KVM: arm64: selftests: Augment existing timer test to handle variable offsets
  2023-03-13 11:43         ` Marc Zyngier
@ 2023-03-14 17:47           ` Colton Lewis
  2023-03-14 18:18             ` Marc Zyngier
  0 siblings, 1 reply; 55+ messages in thread
From: Colton Lewis @ 2023-03-14 17:47 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvmarm, kvm, linux-arm-kernel, james.morse, suzuki.poulose,
	oliver.upton, yuzenghui, ricarkol, sveith, dwmw2

Marc Zyngier <maz@kernel.org> writes:

> On Fri, 10 Mar 2023 19:26:47 +0000,
> Colton Lewis <coltonlewis@google.com> wrote:

>> Marc Zyngier <maz@kernel.org> writes:

>> >> mvbbq9:/data/coltonlewis/ecv/arm64-obj/kselftest/kvm#
>> >> ./aarch64/arch_timer -O 0xffff
>> >> ==== Test Assertion Failure ====
>> >>    aarch64/arch_timer.c:239: false
>> >>    pid=48094 tid=48095 errno=4 - Interrupted system call
>> >>       1  0x4010fb: test_vcpu_run at arch_timer.c:239
>> >>       2  0x42a5bf: start_thread at pthread_create.o:0
>> >>       3  0x46845b: thread_start at clone.o:0
>> >>    Failed guest assert: xcnt >= cval at aarch64/arch_timer.c:151
>> >> values: 2500645901305, 2500645961845; 9939, vcpu 0; stage; 3; iter: 2

>> > The fun part is that you can see similar things without the series:

>> > ==== Test Assertion Failure ====
>> >    aarch64/arch_timer.c:239: false
>> >    pid=647 tid=651 errno=4 - Interrupted system call
>> >       1  0x00000000004026db: test_vcpu_run at arch_timer.c:239
>> >       2  0x00007fffb13cedd7: ?? ??:0
>> >       3  0x00007fffb1437e9b: ?? ??:0
>> >    Failed guest assert: config_iter + 1 == irq_iter at
>> > aarch64/arch_timer.c:188
>> > values: 2, 3; 0, vcpu 3; stage; 4; iter: 3

>> > That's on a vanilla kernel (6.2-rc4) on an M1 with the test run
>> > without any argument in a loop. After a few iterations, it blows.

> I finally got to the bottom of that one. This is yet another case of
> the test making the assumption that spurious interrupts don't exist...

That's great!

> Here, the timer interrupt has been masked at the source, but the GIC
> (or its emulation) can be slow to retire it. So we take it again,
> spuriously, and account it as a true interrupt. None of the asserts in
> the timer handler fire because they only check the *previous* state.

> Eventually, the interrupt retires and we progress to the next
> iteration. But in the meantime, we have incremented the irq counter by
> the number of spurious events, and the test fails.

> The obvious fix is to check for the timer state in the handler and
> exit early if the timer interrupt is masked or the timer disabled.
> With that, I don't see these failures anymore.

> I've folded that into the patch that already deals with some spurious
> events.

I'll be looking at it and will keep in mind your questions about my
hardware should I find any issues. Yes it has ECV and CNTPOFF but no I
didn't try turning it off for this because my issue occured only when
setting a physical offset and that can't be done without ECV.


* Re: [PATCH 15/16] KVM: arm64: selftests: Augment existing timer test to handle variable offsets
  2023-03-14 17:47           ` Colton Lewis
@ 2023-03-14 18:18             ` Marc Zyngier
  0 siblings, 0 replies; 55+ messages in thread
From: Marc Zyngier @ 2023-03-14 18:18 UTC (permalink / raw)
  To: Colton Lewis
  Cc: kvmarm, kvm, linux-arm-kernel, james.morse, suzuki.poulose,
	oliver.upton, yuzenghui, ricarkol, sveith, dwmw2

On Tue, 14 Mar 2023 17:47:01 +0000,
Colton Lewis <coltonlewis@google.com> wrote:
> 
> I'll be looking at it and will keep in mind your questions about my
> hardware should I find any issues. Yes it has ECV and CNTPOFF but no I
> didn't try turning it off for this because my issue occurred only when
> setting a physical offset and that can't be done without ECV.

And yet this is exactly what these patches do: allow setting a
physical offset without ECV+CNTPOFF.

CNTPOFF is only a performance optimisation. It could be that my
handling of CNTPOFF is buggy, or that your HW is less than stellar, or
both. Testing the various combinations would certainly help.

	M.

-- 
Without deviation from the norm, progress is not possible.


end of thread, other threads:[~2023-03-14 18:18 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-02-16 14:21 [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Marc Zyngier
2023-02-16 14:21 ` [PATCH 01/16] arm64: Add CNTPOFF_EL2 register definition Marc Zyngier
2023-02-16 14:21 ` [PATCH 02/16] arm64: Add HAS_ECV_CNTPOFF capability Marc Zyngier
2023-02-22  4:30   ` Reiji Watanabe
2023-02-22 10:47     ` Marc Zyngier
2023-02-16 14:21 ` [PATCH 03/16] kvm: arm64: Expose {un,}lock_all_vcpus() to the reset of KVM Marc Zyngier
2023-02-23 22:30   ` Colton Lewis
2023-02-16 14:21 ` [PATCH 04/16] KVM: arm64: timers: Use a per-vcpu, per-timer accumulator for fractional ns Marc Zyngier
2023-02-23 22:30   ` Colton Lewis
2023-02-16 14:21 ` [PATCH 05/16] KVM: arm64: timers: Convert per-vcpu virtual offset to a global value Marc Zyngier
2023-02-22  6:15   ` Reiji Watanabe
2023-02-22 10:54     ` Marc Zyngier
2023-02-16 14:21 ` [PATCH 06/16] KVM: arm64: timers: Use CNTPOFF_EL2 to offset the physical timer Marc Zyngier
2023-02-23 22:34   ` Colton Lewis
2023-02-24  8:59     ` Marc Zyngier
2023-02-16 14:21 ` [PATCH 07/16] KVM: arm64: timers: Allow physical offset without CNTPOFF_EL2 Marc Zyngier
2023-02-23 22:40   ` Colton Lewis
2023-02-24 10:54     ` Marc Zyngier
2023-02-16 14:21 ` [PATCH 08/16] KVM: arm64: timers: Allow userspace to set the counter offsets Marc Zyngier
2023-02-16 22:09   ` Oliver Upton
2023-02-17 10:17     ` Marc Zyngier
2023-02-17 22:11       ` Oliver Upton
2023-02-22 11:56         ` Marc Zyngier
2023-02-22 16:34           ` Oliver Upton
2023-02-23 18:25             ` Marc Zyngier
2023-03-08  7:46               ` Oliver Upton
2023-03-08  7:53                 ` Oliver Upton
2023-03-09  8:29                   ` Marc Zyngier
2023-03-09  8:25                 ` Marc Zyngier
2023-02-23 22:41   ` Colton Lewis
2023-02-24 11:24     ` Marc Zyngier
2023-02-16 14:21 ` [PATCH 09/16] KVM: arm64: timers: Allow save/restoring of the physical timer Marc Zyngier
2023-02-16 14:21 ` [PATCH 10/16] KVM: arm64: timers: Rationalise per-vcpu timer init Marc Zyngier
2023-02-16 14:21 ` [PATCH 11/16] KVM: arm64: Document KVM_ARM_SET_CNT_OFFSETS and co Marc Zyngier
2023-02-16 14:21 ` [PATCH 12/16] KVM: arm64: nv: timers: Add a per-timer, per-vcpu offset Marc Zyngier
2023-02-24 20:07   ` Colton Lewis
2023-02-25 10:32     ` Marc Zyngier
2023-02-16 14:21 ` [PATCH 13/16] KVM: arm64: nv: timers: Support hyp timer emulation Marc Zyngier
2023-02-24 20:08   ` Colton Lewis
2023-02-25 10:34     ` Marc Zyngier
2023-02-16 14:21 ` [PATCH 14/16] KVM: arm64: selftests: Add physical timer registers to the sysreg list Marc Zyngier
2023-02-16 14:21 ` [PATCH 15/16] KVM: arm64: selftests: Augment existing timer test to handle variable offsets Marc Zyngier
2023-03-06 22:08   ` Colton Lewis
2023-03-09  9:01     ` Marc Zyngier
2023-03-10 19:26       ` Colton Lewis
2023-03-12 15:53         ` Marc Zyngier
2023-03-13 11:43         ` Marc Zyngier
2023-03-14 17:47           ` Colton Lewis
2023-03-14 18:18             ` Marc Zyngier
2023-02-16 14:21 ` [PATCH 16/16] KVM: arm64: selftests: Deal with spurious timer interrupts Marc Zyngier
2023-02-21 16:28 ` [PATCH 00/16] KVM: arm64: Rework timer offsetting for fun and profit Veith, Simon
2023-02-21 22:17   ` Marc Zyngier
2023-02-23 22:29 ` Colton Lewis
2023-02-24  8:45   ` Marc Zyngier
2023-02-24 20:07 ` Colton Lewis
