* [PATCH 00/10] KVM: Add idempotent controls for migrating system counter state
@ 2021-06-08 21:47 ` Oliver Upton
  0 siblings, 0 replies; 46+ messages in thread
From: Oliver Upton @ 2021-06-08 21:47 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, Oliver Upton

KVM's current means of saving/restoring system counters is plagued with
temporal issues. At least on ARM64 and x86, we migrate the guest's
system counter by-value through the respective guest system register
values (cntvct_el0, ia32_tsc). Restoring system counters by-value is
brittle as the state is not idempotent: the host system counter is still
oscillating between the attempted save and restore. Furthermore, VMMs
may wish to transparently live migrate guest VMs, which requires folding
the elapsed time of the live migration blackout into the guest system
counter view. The VMM thread could be preempted for any number of
reasons (the scheduler, or an L0 hypervisor when running nested) between
the time it calculates the desired guest counter value and the time KVM
actually sets that counter state.

Despite the value-based interface that we present to userspace, KVM
actually has idempotent guest controls by way of system counter offsets.
We can avoid all of the issues associated with a value-based interface
by abstracting these offset controls in new ioctls. This series
introduces KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls, meant to provide
userspace with idempotent controls of the guest system counter.
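The difference can be sketched with a toy model (illustrative C only,
not KVM code; the helper names are hypothetical):

```c
#include <stdint.h>

/*
 * Toy model, not KVM code: the host counter keeps ticking between save
 * and restore, so any delay corrupts a value-based restore, while an
 * offset-based restore is unaffected by when it is applied.
 */

/* Value-based restore: a new offset is derived from the stale saved
 * value at whatever time the write finally lands. */
static uint64_t guest_view_after_value_restore(uint64_t saved_guest_val,
					       uint64_t host_now)
{
	uint64_t derived_offset = host_now - saved_guest_val;

	return host_now - derived_offset;
}

/* Offset-based restore: the saved offset is applied verbatim. */
static uint64_t guest_view_after_offset_restore(uint64_t saved_offset,
						uint64_t host_now)
{
	return host_now - saved_offset;
}
```

With a save taken at host count 1000 (guest offset 200, so guest value
800) and a restore landing 500 ticks later, the value-based path leaves
the guest reading 800 (its clock silently lost 500 ticks), while the
offset-based path yields 1300, exactly as if the guest had kept running.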

Patch 1 defines the ioctls, and was separated from the two provided
implementations for the sake of review. If it is more intuitive, this
patch can be squashed into the implementation commit.

Patch 2 realizes initial support for ARM64, migrating only the state
associated with the guest's virtual counter-timer. Patch 3 introduces a
KVM selftest to assert that userspace manipulation via the
aforementioned ioctls produces the expected system counter values within
the guest.

Patch 4 extends the ARM64 implementation by adding support for
physical counter-timer offsetting. This is currently backed by a
trap-and-emulate implementation, but can also be virtualized in hardware
that fully implements ARMv8.6-ECV. ECV support has been elided from this
series out of convenience for the author :) Patch 5 adds some test cases
to the newly-minted kvm selftest to validate expectations of physical
counter-timer emulation.

Patch 6 introduces another KVM selftest for aarch64, intended to
measure the overhead of physical counter-timer emulation. Data from
this test can be found below; in short, emulation trades some overhead
for correctness, but the cost is modest.

Patches 7-8 add support for the ioctls to x86 by shoehorning the
controls into the pre-existing synchronization heuristics. Patch 7
provides necessary helper methods for the implementation to play nice
with those heuristics, and patch 8 actually implements the ioctls.

Patch 9 adds x86 test cases to the system counter KVM selftest. Lastly,
patch 10 documents the ioctls for both x86 and arm64.

All patches apply cleanly to kvm/next at the following commit:

a4345a7cecfb ("Merge tag 'kvmarm-fixes-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD")

Physical counter benchmark
--------------------------

The following data was collected by running 10000 iterations of the
benchmark test from Patch 6 on an Ampere Mt. Jade reference server, a 2S
machine with two 80-core Ampere Altra SoCs. Measurements were collected
for both VHE and nVHE operation using the `kvm-arm.mode=` command-line
parameter.

nVHE
----

+--------------------+--------+---------+
|       Metric       | Native | Trapped |
+--------------------+--------+---------+
| Average            | 54ns   | 148ns   |
| Standard Deviation | 124ns  | 122ns   |
| 95th Percentile    | 258ns  | 348ns   |
+--------------------+--------+---------+

VHE
---

+--------------------+--------+---------+
|       Metric       | Native | Trapped |
+--------------------+--------+---------+
| Average            | 53ns   | 152ns   |
| Standard Deviation | 92ns   | 94ns    |
| 95th Percentile    | 204ns  | 307ns   |
+--------------------+--------+---------+

Oliver Upton (10):
  KVM: Introduce KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls
  KVM: arm64: Implement initial support for KVM_CAP_SYSTEM_COUNTER_STATE
  selftests: KVM: Introduce system_counter_state_test
  KVM: arm64: Add userspace control of the guest's physical counter
  selftests: KVM: Add test cases for physical counter offsetting
  selftests: KVM: Add counter emulation benchmark
  KVM: x86: Refactor tsc synchronization code
  KVM: x86: Implement KVM_CAP_SYSTEM_COUNTER_STATE
  selftests: KVM: Add support for x86 to system_counter_state_test
  Documentation: KVM: Document KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls

 Documentation/virt/kvm/api.rst                |  98 +++++++
 Documentation/virt/kvm/locking.rst            |  11 +
 arch/arm64/include/asm/kvm_host.h             |   6 +
 arch/arm64/include/asm/sysreg.h               |   1 +
 arch/arm64/include/uapi/asm/kvm.h             |  17 ++
 arch/arm64/kvm/arch_timer.c                   |  84 +++++-
 arch/arm64/kvm/arm.c                          |  25 ++
 arch/arm64/kvm/hyp/include/hyp/switch.h       |  31 +++
 arch/arm64/kvm/hyp/nvhe/timer-sr.c            |  16 +-
 arch/x86/include/asm/kvm_host.h               |   1 +
 arch/x86/include/uapi/asm/kvm.h               |   8 +
 arch/x86/kvm/x86.c                            | 176 +++++++++---
 include/uapi/linux/kvm.h                      |   5 +
 tools/testing/selftests/kvm/.gitignore        |   2 +
 tools/testing/selftests/kvm/Makefile          |   3 +
 .../kvm/aarch64/counter_emulation_benchmark.c | 209 ++++++++++++++
 .../selftests/kvm/include/aarch64/processor.h |  24 ++
 .../selftests/kvm/system_counter_state_test.c | 256 ++++++++++++++++++
 18 files changed, 926 insertions(+), 47 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/aarch64/counter_emulation_benchmark.c
 create mode 100644 tools/testing/selftests/kvm/system_counter_state_test.c

-- 
2.32.0.rc1.229.g3e70b5a671-goog



* [PATCH 01/10] KVM: Introduce KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls
  2021-06-08 21:47 ` Oliver Upton
@ 2021-06-08 21:47   ` Oliver Upton
  -1 siblings, 0 replies; 46+ messages in thread
From: Oliver Upton @ 2021-06-08 21:47 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, Oliver Upton

Correctly migrating a guest's view of the system counter is challenging.
To date, KVM has utilized value-based interfaces (e.g. the guest value
of the IA32_TSC MSR on x86, CNTVCT_EL0 on arm64) for migrating guest
counter state. Restoring counters by value is problematic, especially
for operators who intend to have the guest's counters account for
elapsed time during live migration. This is because the guest counters
_still_ increment after an appropriate restore value has been calculated
for the target machine.

Furthermore, maintaining the phase relationship between vCPU counters is
impossible with a value-based approach. The only hope for maintaining
the phase relationship of counters is to restore them by offset.

Introduce a new pair of vcpu ioctls, KVM_GET_SYSTEM_COUNTER_STATE and
KVM_SET_SYSTEM_COUNTER_STATE, that aim to do just that. Each
implementing architecture will define its own counter state structure,
allowing for flexibility with ISAs that may have multiple counters
(like arm64).
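As an aside on the ABI, the ioctl numbers defined below follow the usual
asm-generic encoding. A minimal sketch (reimplementing the packing by
hand for illustration, and assuming the 72-byte arm64 state layout
introduced later in the series) shows how the direction, size, type and
number fields combine:

```c
#include <stdint.h>

/* Mirror of the arm64 state layout from later in the series (72 bytes). */
struct kvm_system_counter_state {
	uint32_t flags;
	uint32_t pad;
	uint64_t cntvoff;
	uint64_t rsvd[7];
};

/*
 * How _IOWR()/_IOW() pack an ioctl number on Linux (asm-generic):
 * bits 31:30 direction, 29:16 argument size, 15:8 type ("magic"), 7:0 nr.
 */
static uint32_t ioc(uint32_t dir, uint32_t type, uint32_t nr, uint32_t size)
{
	return (dir << 30) | (size << 16) | (type << 8) | nr;
}

#define IOC_WRITE	1u
#define IOC_READ	2u
#define KVMIO		0xAEu	/* KVM's ioctl magic number */
```

For example, `_IOWR(KVMIO, 0xcc, struct kvm_system_counter_state)`
packs to `ioc(IOC_READ | IOC_WRITE, KVMIO, 0xcc, 72)`.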

Reviewed-by: Jing Zhang <jingzhangos@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 include/uapi/linux/kvm.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 3fd9a7e9d90c..562650c14e39 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1082,6 +1082,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_SGX_ATTRIBUTE 196
 #define KVM_CAP_VM_COPY_ENC_CONTEXT_FROM 197
 #define KVM_CAP_PTP_KVM 198
+#define KVM_CAP_SYSTEM_COUNTER_STATE 199
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -1645,6 +1646,10 @@ struct kvm_xen_vcpu_attr {
 #define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_DATA	0x4
 #define KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST	0x5
 
+/* Available with KVM_CAP_SYSTEM_COUNTER_STATE */
+#define KVM_GET_SYSTEM_COUNTER_STATE	_IOWR(KVMIO, 0xcc, struct kvm_system_counter_state)
+#define KVM_SET_SYSTEM_COUNTER_STATE	_IOW(KVMIO, 0xcd, struct kvm_system_counter_state)
+
 /* Secure Encrypted Virtualization command */
 enum sev_cmd_id {
 	/* Guest initialization commands */
-- 
2.32.0.rc1.229.g3e70b5a671-goog



* [PATCH 02/10] KVM: arm64: Implement initial support for KVM_CAP_SYSTEM_COUNTER_STATE
  2021-06-08 21:47 ` Oliver Upton
@ 2021-06-08 21:47   ` Oliver Upton
  -1 siblings, 0 replies; 46+ messages in thread
From: Oliver Upton @ 2021-06-08 21:47 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, Oliver Upton

ARMv8 provides a virtual counter-timer offset register (CNTVOFF_EL2)
that is applied to guest views of the virtual counter-timer. To date,
KVM has not exposed this to userspace in any way, and instead affords a
value-based scheme of migrating the virtual counter-timer by directly
reading/writing the guest's CNTVCT_EL0. This is problematic because the
counter continues to tick while the register is being written, meaning
it is possible for drift to sneak into the guest's time scale. This is
exacerbated by the fact that KVM calculates a new CNTVOFF_EL2 every
time the register is written, which is then broadcast to all virtual
CPUs. The only way to avoid causing guest time to drift is to restore
counter-timers by offset.
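Concretely, the architecture defines the guest's virtual counter as
CNTVCT_EL0 = CNTPCT_EL0 - CNTVOFF_EL2. A small sketch (illustrative
values, not KVM code) shows why deriving the offset from a stale
CNTVCT_EL0 write drifts the guest clock, while restoring CNTVOFF_EL2
directly does not:

```c
#include <stdint.h>

/* Architectural relation: CNTVCT_EL0 = CNTPCT_EL0 - CNTVOFF_EL2. */
static uint64_t cntvct(uint64_t cntpct, uint64_t cntvoff)
{
	return cntpct - cntvoff;
}

/* A value-based write of CNTVCT_EL0 makes KVM derive a fresh
 * CNTVOFF_EL2 from the host counter at write time, not at the time
 * the VMM sampled the guest counter. */
static uint64_t cntvoff_from_write(uint64_t cntpct_at_write,
				   uint64_t written_cntvct)
{
	return cntpct_at_write - written_cntvct;
}
```

If the VMM samples CNTVCT_EL0 = 800 when CNTPCT_EL0 = 1000 (so
CNTVOFF_EL2 = 200) but the write only lands at CNTPCT_EL0 = 1500, the
derived offset becomes 700 and the guest reads 800 rather than 1300:
500 ticks of guest time vanish. Restoring the offset of 200 itself is
immune to the delay.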

Implement initial support for KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls
to migrate the value of CNTVOFF_EL2. These ioctls yield precise control
of the virtual counter-timers to userspace, allowing it to define its
own heuristics for managing vCPU offsets.

Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Jing Zhang <jingzhangos@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/arm64/include/asm/kvm_host.h |  5 +++++
 arch/arm64/include/uapi/asm/kvm.h | 10 ++++++++++
 arch/arm64/kvm/arch_timer.c       | 22 ++++++++++++++++++++++
 arch/arm64/kvm/arm.c              | 25 +++++++++++++++++++++++++
 4 files changed, 62 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7cd7d5c8c4bc..31107d5e61af 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -781,4 +781,9 @@ void __init kvm_hyp_reserve(void);
 static inline void kvm_hyp_reserve(void) { }
 #endif
 
+int kvm_arm_vcpu_get_system_counter_state(struct kvm_vcpu *vcpu,
+					  struct kvm_system_counter_state *state);
+int kvm_arm_vcpu_set_system_counter_state(struct kvm_vcpu *vcpu,
+					  struct kvm_system_counter_state *state);
+
 #endif /* __ARM64_KVM_HOST_H__ */
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 24223adae150..d3987089c524 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -184,6 +184,16 @@ struct kvm_vcpu_events {
 	__u32 reserved[12];
 };
 
+/* for KVM_{GET,SET}_SYSTEM_COUNTER_STATE */
+struct kvm_system_counter_state {
+	/* indicates what fields are valid in the structure */
+	__u32 flags;
+	__u32 pad;
+	/* guest counter-timer offset, relative to host cntpct_el0 */
+	__u64 cntvoff;
+	__u64 rsvd[7];
+};
+
 /* If you need to interpret the index values, here is the key: */
 #define KVM_REG_ARM_COPROC_MASK		0x000000000FFF0000
 #define KVM_REG_ARM_COPROC_SHIFT	16
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 74e0699661e9..955a7a183362 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -1259,3 +1259,25 @@ int kvm_arm_timer_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 
 	return -ENXIO;
 }
+
+int kvm_arm_vcpu_get_system_counter_state(struct kvm_vcpu *vcpu,
+					  struct kvm_system_counter_state *state)
+{
+	if (state->flags)
+		return -EINVAL;
+
+	state->cntvoff = timer_get_offset(vcpu_vtimer(vcpu));
+
+	return 0;
+}
+
+int kvm_arm_vcpu_set_system_counter_state(struct kvm_vcpu *vcpu,
+					  struct kvm_system_counter_state *state)
+{
+	if (state->flags)
+		return -EINVAL;
+
+	timer_set_offset(vcpu_vtimer(vcpu), state->cntvoff);
+
+	return 0;
+}
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 1126eae27400..b78ffb4db9dd 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -207,6 +207,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_SET_GUEST_DEBUG:
 	case KVM_CAP_VCPU_ATTRIBUTES:
 	case KVM_CAP_PTP_KVM:
+	case KVM_CAP_SYSTEM_COUNTER_STATE:
 		r = 1;
 		break;
 	case KVM_CAP_SET_GUEST_DEBUG2:
@@ -1273,6 +1274,30 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 
 		return kvm_arm_vcpu_finalize(vcpu, what);
 	}
+	case KVM_GET_SYSTEM_COUNTER_STATE: {
+		struct kvm_system_counter_state state;
+
+		if (copy_from_user(&state, argp, sizeof(state)))
+			return -EFAULT;
+
+		r = kvm_arm_vcpu_get_system_counter_state(vcpu, &state);
+		if (r)
+			break;
+
+		if (copy_to_user(argp, &state, sizeof(state)))
+			return -EFAULT;
+
+		break;
+	}
+	case KVM_SET_SYSTEM_COUNTER_STATE: {
+		struct kvm_system_counter_state state;
+
+		if (copy_from_user(&state, argp, sizeof(state)))
+			return -EFAULT;
+
+		r = kvm_arm_vcpu_set_system_counter_state(vcpu, &state);
+		break;
+	}
 	default:
 		r = -EINVAL;
 	}
-- 
2.32.0.rc1.229.g3e70b5a671-goog



* [PATCH 03/10] selftests: KVM: Introduce system_counter_state_test
  2021-06-08 21:47 ` Oliver Upton
@ 2021-06-08 21:47   ` Oliver Upton
  -1 siblings, 0 replies; 46+ messages in thread
From: Oliver Upton @ 2021-06-08 21:47 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, Oliver Upton

Test that the KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls correctly
configure the guest's view of the virtual counter-timer (CNTVCT_EL0).

Reviewed-by: Jing Zhang <jingzhangos@google.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/include/aarch64/processor.h |  24 +++
 .../selftests/kvm/system_counter_state_test.c | 199 ++++++++++++++++++
 4 files changed, 225 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/system_counter_state_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index bd83158e0e0b..1a5782d8a0d4 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -34,6 +34,7 @@
 /x86_64/xen_vmcall_test
 /x86_64/xss_msr_test
 /x86_64/vmx_pmu_msrs_test
+/system_counter_state_test
 /demand_paging_test
 /dirty_log_test
 /dirty_log_perf_test
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index e439d027939d..b14f16dc954a 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -80,6 +80,7 @@ TEST_GEN_PROGS_x86_64 += steal_time
 TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
 TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list-sve
 TEST_GEN_PROGS_aarch64 += aarch64/vgic_init
+TEST_GEN_PROGS_aarch64 += system_counter_state_test
 TEST_GEN_PROGS_aarch64 += demand_paging_test
 TEST_GEN_PROGS_aarch64 += dirty_log_test
 TEST_GEN_PROGS_aarch64 += dirty_log_perf_test
diff --git a/tools/testing/selftests/kvm/include/aarch64/processor.h b/tools/testing/selftests/kvm/include/aarch64/processor.h
index b7fa0c8551db..48c964ce62ff 100644
--- a/tools/testing/selftests/kvm/include/aarch64/processor.h
+++ b/tools/testing/selftests/kvm/include/aarch64/processor.h
@@ -52,6 +52,30 @@ static inline void set_reg(struct kvm_vm *vm, uint32_t vcpuid, uint64_t id, uint
 	vcpu_ioctl(vm, vcpuid, KVM_SET_ONE_REG, &reg);
 }
 
+static inline uint64_t read_cntpct_ordered(void)
+{
+	uint64_t r;
+
+	asm volatile("isb\n\t"
+		     "mrs %0, cntpct_el0\n\t"
+		     "isb\n\t"
+		     : "=r"(r));
+
+	return r;
+}
+
+static inline uint64_t read_cntvct_ordered(void)
+{
+	uint64_t r;
+
+	asm volatile("isb\n\t"
+		     "mrs %0, cntvct_el0\n\t"
+		     "isb\n\t"
+		     : "=r"(r));
+
+	return r;
+}
+
 void aarch64_vcpu_setup(struct kvm_vm *vm, int vcpuid, struct kvm_vcpu_init *init);
 void aarch64_vcpu_add_default(struct kvm_vm *vm, uint32_t vcpuid,
 			      struct kvm_vcpu_init *init, void *guest_code);
diff --git a/tools/testing/selftests/kvm/system_counter_state_test.c b/tools/testing/selftests/kvm/system_counter_state_test.c
new file mode 100644
index 000000000000..059971f6cb87
--- /dev/null
+++ b/tools/testing/selftests/kvm/system_counter_state_test.c
@@ -0,0 +1,199 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * system_counter_state_test.c -- suite of tests for correctness of
+ * KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls.
+ *
+ * Copyright (c) 2021, Google LLC.
+ */
+#define _GNU_SOURCE
+#include <asm/kvm.h>
+#include <linux/kvm.h>
+#include <stdint.h>
+
+#include "kvm_util.h"
+#include "processor.h"
+#include "test_util.h"
+
+#define VCPU_ID 0
+
+#ifdef __aarch64__
+
+enum counter {
+	VIRTUAL,
+	PHYSICAL,
+};
+
+static char *counter_name(enum counter counter)
+{
+	switch (counter) {
+	case VIRTUAL:
+		return "virtual";
+	case PHYSICAL:
+		return "physical";
+	default:
+		TEST_ASSERT(false, "unrecognized counter: %d", counter);
+	}
+
+	/* never reached */
+	return NULL;
+}
+
+struct system_counter_state_test {
+	enum counter counter;
+	struct kvm_system_counter_state state;
+};
+
+static struct system_counter_state_test test_cases[] = {
+	{
+		.counter = VIRTUAL,
+		.state = {
+			.cntvoff = 0
+		}
+	},
+	{
+		.counter = VIRTUAL,
+		.state = {
+			.cntvoff = 1000000
+		}
+	},
+	{
+		.counter = VIRTUAL,
+		.state = {
+			.cntvoff = -1
+		}
+	},
+};
+
+static void pr_test(struct system_counter_state_test *test)
+{
+	pr_info("counter: %s, cntvoff: %llu\n", counter_name(test->counter), test->state.cntvoff);
+}
+
+static struct kvm_system_counter_state *
+get_system_counter_state(struct system_counter_state_test *test)
+{
+	return &test->state;
+}
+
+/*
+ * Reads the guest counter-timer under test.
+ */
+static uint64_t guest_read_counter(struct system_counter_state_test *test)
+{
+	switch (test->counter) {
+	case PHYSICAL:
+		return read_cntpct_ordered();
+	case VIRTUAL:
+		return read_cntvct_ordered();
+	default:
+		GUEST_ASSERT(0);
+	}
+
+	/* never reached */
+	return -1;
+}
+
+/*
+ * Reads the host physical counter-timer and transforms it into a guest value
+ * according to the kvm_system_counter_state structure.
+ */
+static uint64_t host_read_guest_counter(struct system_counter_state_test *test)
+{
+	uint64_t r;
+
+	r = read_cntvct_ordered();
+	switch (test->counter) {
+	case VIRTUAL:
+		r -= test->state.cntvoff;
+		break;
+	default:
+		TEST_ASSERT(false, "unrecognized counter: %d", test->counter);
+	}
+
+	return r;
+}
+
+#else
+#error test not implemented for architecture being built!
+#endif
+
+static void guest_main(void)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
+		struct system_counter_state_test *test = &test_cases[i];
+
+		GUEST_SYNC(guest_read_counter(test));
+	}
+
+	GUEST_DONE();
+}
+
+static bool enter_guest(struct kvm_vm *vm, uint64_t *guest_counter)
+{
+	struct ucall uc;
+
+	vcpu_ioctl(vm, VCPU_ID, KVM_RUN, NULL);
+
+	switch (get_ucall(vm, VCPU_ID, &uc)) {
+	case UCALL_DONE:
+		return true;
+	case UCALL_SYNC:
+		if (guest_counter)
+			*guest_counter = uc.args[1];
+		break;
+	case UCALL_ABORT:
+		TEST_ASSERT(false, "%s at %s:%ld", (const char *)uc.args[0],
+			    __FILE__, uc.args[1]);
+		break;
+	default:
+		TEST_ASSERT(false, "unexpected exit: %s",
+			    exit_reason_str(vcpu_state(vm, VCPU_ID)->exit_reason));
+		break;
+	}
+
+	/* more work to do in the guest */
+	return false;
+}
+
+static void run_tests(struct kvm_vm *vm)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
+		struct system_counter_state_test *test = &test_cases[i];
+		uint64_t start, end, obs;
+
+		pr_info("test %d: ", i);
+		pr_test(test);
+
+		vcpu_ioctl(vm, VCPU_ID, KVM_SET_SYSTEM_COUNTER_STATE,
+			   get_system_counter_state(test));
+
+		start = host_read_guest_counter(test);
+		TEST_ASSERT(!enter_guest(vm, &obs), "guest completed unexpectedly");
+		end = host_read_guest_counter(test);
+
+		TEST_ASSERT(start < obs && obs < end,
+			    "guest counter value (%lu) outside expected bounds: (%lu, %lu)",
+			    obs, start, end);
+	}
+
+	TEST_ASSERT(enter_guest(vm, NULL), "guest didn't run to completion");
+}
+
+int main(void)
+{
+	struct kvm_vm *vm;
+
+	if (!kvm_check_cap(KVM_CAP_SYSTEM_COUNTER_STATE)) {
+		print_skip("KVM_CAP_SYSTEM_COUNTER_STATE not supported");
+		exit(KSFT_SKIP);
+	}
+
+	vm = vm_create_default(0, 0, guest_main);
+	ucall_init(vm, NULL);
+	run_tests(vm);
+	kvm_vm_free(vm);
+}
-- 
2.32.0.rc1.229.g3e70b5a671-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 04/10] KVM: arm64: Add userspace control of the guest's physical counter
  2021-06-08 21:47 ` Oliver Upton
@ 2021-06-08 21:47   ` Oliver Upton
  -1 siblings, 0 replies; 46+ messages in thread
From: Oliver Upton @ 2021-06-08 21:47 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, Oliver Upton

ARMv8.6 adds an extension to the architecture providing hypervisors with
more extensive controls of the guest's counters. A particularly
interesting control is CNTPOFF_EL2, a fixed offset subtracted from the
physical counter value to derive the guest's value. VMMs that live
migrate their guests may be particularly interested in this feature in
order to provide a consistent view of the physical counter across live
migrations.

In the interim, KVM can emulate this behavior by simply enabling traps
on CNTPCT_EL0 and subtracting an offset.

Add a new field to kvm_system_counter_state allowing a VMM to request an
offset to the physical counter. If this offset is nonzero, enable traps
on CNTPCT_EL0. Emulate guest reads to the register in the fast path to
keep counter reads reasonably performant, avoiding a full exit from the
guest.
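
The key property of the offset-based interface is idempotence: the guest counters read back as (host CNTPCT_EL0 - offset), so a VMM absorbs or exposes live-migration blackout purely by adjusting saved offsets, with no race against the host counter. A minimal host-side sketch of that arithmetic (the struct here is a hypothetical stand-in for the relevant fields of the proposed kvm_system_counter_state, not the uapi definition itself):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the offset fields of the proposed
 * struct kvm_system_counter_state (see the uapi change in this patch).
 */
struct counter_offsets {
	uint64_t cntvoff;	/* guest virtual counter offset */
	uint64_t cntpoff;	/* guest physical counter offset */
};

/*
 * Guest counters read as (host CNTPCT_EL0 - offset). Growing both
 * offsets by the number of host ticks spent in migration blackout
 * hides the blackout from the guest; leaving them untouched exposes
 * it. Unsigned wrap-around is fine: counter arithmetic is modulo 2^64.
 */
static void hide_blackout(struct counter_offsets *o, uint64_t blackout_ticks)
{
	o->cntvoff += blackout_ticks;
	o->cntpoff += blackout_ticks;
}
```

Unlike a value-based CNTVCT_EL0/CNTPCT_EL0 restore, this computation can be redone any number of times from the same saved offsets; a preempted VMM thread never bakes a stale counter sample into guest state.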

Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/arm64/include/asm/kvm_host.h       |  1 +
 arch/arm64/include/asm/sysreg.h         |  1 +
 arch/arm64/include/uapi/asm/kvm.h       |  9 +++-
 arch/arm64/kvm/arch_timer.c             | 66 +++++++++++++++++++++++--
 arch/arm64/kvm/hyp/include/hyp/switch.h | 31 ++++++++++++
 arch/arm64/kvm/hyp/nvhe/timer-sr.c      | 16 ++++--
 6 files changed, 117 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 31107d5e61af..a3abafcea328 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -200,6 +200,7 @@ enum vcpu_sysreg {
 	SP_EL1,
 	SPSR_EL1,
 
+	CNTPOFF_EL2,
 	CNTVOFF_EL2,
 	CNTV_CVAL_EL0,
 	CNTV_CTL_EL0,
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 65d15700a168..193da426690a 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -505,6 +505,7 @@
 #define SYS_AMEVCNTR0_MEM_STALL		SYS_AMEVCNTR0_EL0(3)
 
 #define SYS_CNTFRQ_EL0			sys_reg(3, 3, 14, 0, 0)
+#define SYS_CNTPCT_EL0			sys_reg(3, 3, 14, 0, 1)
 
 #define SYS_CNTP_TVAL_EL0		sys_reg(3, 3, 14, 2, 0)
 #define SYS_CNTP_CTL_EL0		sys_reg(3, 3, 14, 2, 1)
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index d3987089c524..ee709e2f0292 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -184,6 +184,8 @@ struct kvm_vcpu_events {
 	__u32 reserved[12];
 };
 
+#define KVM_SYSTEM_COUNTER_STATE_PHYS_OFFSET	(1ul << 0)
+
 /* for KVM_{GET,SET}_SYSTEM_COUNTER_STATE */
 struct kvm_system_counter_state {
 	/* indicates what fields are valid in the structure */
@@ -191,7 +193,12 @@ struct kvm_system_counter_state {
 	__u32 pad;
 	/* guest counter-timer offset, relative to host cntpct_el0 */
 	__u64 cntvoff;
-	__u64 rsvd[7];
+	/*
+	 * Guest physical counter-timer offset, relative to host cntpct_el0.
+	 * Valid when KVM_SYSTEM_COUNTER_STATE_PHYS_OFFSET is set.
+	 */
+	__u64 cntpoff;
+	__u64 rsvd[6];
 };
 
 /* If you need to interpret the index values, here is the key: */
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 955a7a183362..a74642d1515f 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -50,6 +50,7 @@ static void kvm_arm_timer_write(struct kvm_vcpu *vcpu,
 static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
 			      struct arch_timer_context *timer,
 			      enum kvm_arch_timer_regs treg);
+static bool kvm_timer_emulation_required(struct arch_timer_context *ctx);
 
 u32 timer_get_ctl(struct arch_timer_context *ctxt)
 {
@@ -86,6 +87,8 @@ static u64 timer_get_offset(struct arch_timer_context *ctxt)
 	struct kvm_vcpu *vcpu = ctxt->vcpu;
 
 	switch(arch_timer_ctx_index(ctxt)) {
+	case TIMER_PTIMER:
+		return __vcpu_sys_reg(vcpu, CNTPOFF_EL2);
 	case TIMER_VTIMER:
 		return __vcpu_sys_reg(vcpu, CNTVOFF_EL2);
 	default:
@@ -130,6 +133,9 @@ static void timer_set_offset(struct arch_timer_context *ctxt, u64 offset)
 	struct kvm_vcpu *vcpu = ctxt->vcpu;
 
 	switch(arch_timer_ctx_index(ctxt)) {
+	case TIMER_PTIMER:
+		__vcpu_sys_reg(vcpu, CNTPOFF_EL2) = offset;
+		break;
 	case TIMER_VTIMER:
 		__vcpu_sys_reg(vcpu, CNTVOFF_EL2) = offset;
 		break;
@@ -145,7 +151,7 @@ u64 kvm_phys_timer_read(void)
 
 static void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map)
 {
-	if (has_vhe()) {
+	if (has_vhe() && !kvm_timer_emulation_required(vcpu_ptimer(vcpu))) {
 		map->direct_vtimer = vcpu_vtimer(vcpu);
 		map->direct_ptimer = vcpu_ptimer(vcpu);
 		map->emul_ptimer = NULL;
@@ -746,6 +752,30 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
+bool kvm_timer_emulation_required(struct arch_timer_context *ctx)
+{
+	int idx = arch_timer_ctx_index(ctx);
+
+	switch (idx) {
+	/*
+	 * hardware doesn't support offsetting of the physical counter/timer.
+	 * If offsetting is requested, enable emulation of the physical
+	 * counter/timer.
+	 */
+	case TIMER_PTIMER:
+		return timer_get_offset(ctx);
+	/*
+	 * Conversely, hardware does support offsetting of the virtual
+	 * counter/timer.
+	 */
+	case TIMER_VTIMER:
+		return false;
+	default:
+		WARN_ON(1);
+		return false;
+	}
+}
+
 /* Make the updates of cntvoff for all vtimer contexts atomic */
 static void update_vtimer_cntvoff(struct kvm_vcpu *vcpu, u64 cntvoff)
 {
@@ -1184,6 +1214,24 @@ void kvm_timer_init_vhe(void)
 	write_sysreg(val, cnthctl_el2);
 }
 
+static void kvm_timer_update_traps_vhe(struct kvm_vcpu *vcpu)
+{
+	u32 cnthctl_shift = 10;
+	u64 val;
+
+	if (!kvm_timer_emulation_required(vcpu_ptimer(vcpu)))
+		return;
+
+	/*
+	 * We must trap accesses to the physical counter/timer to emulate the
+	 * nonzero offset.
+	 */
+	val = read_sysreg(cnthctl_el2);
+	val &= ~(CNTHCTL_EL1PCEN << cnthctl_shift);
+	val &= ~(CNTHCTL_EL1PCTEN << cnthctl_shift);
+	write_sysreg(val, cnthctl_el2);
+}
+
 static void set_timer_irqs(struct kvm *kvm, int vtimer_irq, int ptimer_irq)
 {
 	struct kvm_vcpu *vcpu;
@@ -1260,24 +1308,36 @@ int kvm_arm_timer_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
 	return -ENXIO;
 }
 
+#define KVM_SYSTEM_COUNTER_STATE_VALID_FLAG_BITS	\
+		(KVM_SYSTEM_COUNTER_STATE_PHYS_OFFSET)
+
 int kvm_arm_vcpu_get_system_counter_state(struct kvm_vcpu *vcpu,
 					  struct kvm_system_counter_state *state)
 {
-	if (state->flags)
+	if (state->flags & ~KVM_SYSTEM_COUNTER_STATE_VALID_FLAG_BITS)
 		return -EINVAL;
 
 	state->cntvoff = timer_get_offset(vcpu_vtimer(vcpu));
 
+	if (state->flags & KVM_SYSTEM_COUNTER_STATE_PHYS_OFFSET)
+		state->cntpoff = timer_get_offset(vcpu_ptimer(vcpu));
+
 	return 0;
 }
 
 int kvm_arm_vcpu_set_system_counter_state(struct kvm_vcpu *vcpu,
 					  struct kvm_system_counter_state *state)
 {
-	if (state->flags)
+	if (state->flags & ~KVM_SYSTEM_COUNTER_STATE_VALID_FLAG_BITS)
 		return -EINVAL;
 
 	timer_set_offset(vcpu_vtimer(vcpu), state->cntvoff);
 
+	if (state->flags & KVM_SYSTEM_COUNTER_STATE_PHYS_OFFSET)
+		timer_set_offset(vcpu_ptimer(vcpu), state->cntpoff);
+
+	if (has_vhe())
+		kvm_timer_update_traps_vhe(vcpu);
+
 	return 0;
 }
diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
index e4a2f295a394..12ada31e12e2 100644
--- a/arch/arm64/kvm/hyp/include/hyp/switch.h
+++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
@@ -287,6 +287,30 @@ static inline bool __hyp_handle_fpsimd(struct kvm_vcpu *vcpu)
 	return true;
 }
 
+static inline u64 __hyp_read_cntpct(struct kvm_vcpu *vcpu)
+{
+	return read_sysreg(cntpct_el0) - __vcpu_sys_reg(vcpu, CNTPOFF_EL2);
+}
+
+static inline bool __hyp_handle_counter(struct kvm_vcpu *vcpu)
+{
+	u32 sysreg = esr_sys64_to_sysreg(kvm_vcpu_get_esr(vcpu));
+	int rt = kvm_vcpu_sys_get_rt(vcpu);
+	u64 rv;
+
+	switch (sysreg) {
+	case SYS_CNTPCT_EL0:
+		rv = __hyp_read_cntpct(vcpu);
+		break;
+	default:
+		return false;
+	}
+
+	vcpu_set_reg(vcpu, rt, rv);
+	__kvm_skip_instr(vcpu);
+	return true;
+}
+
 static inline bool handle_tx2_tvm(struct kvm_vcpu *vcpu)
 {
 	u32 sysreg = esr_sys64_to_sysreg(kvm_vcpu_get_esr(vcpu));
@@ -439,6 +463,13 @@ static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
 	if (*exit_code != ARM_EXCEPTION_TRAP)
 		goto exit;
 
+	/*
+	 * We trap accesses to the physical counter value register (CNTPCT_EL0)
+	 * if userspace has requested a physical counter offset.
+	 */
+	if (__hyp_handle_counter(vcpu))
+		goto guest;
+
 	if (cpus_have_final_cap(ARM64_WORKAROUND_CAVIUM_TX2_219_TVM) &&
 	    kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_SYS64 &&
 	    handle_tx2_tvm(vcpu))
diff --git a/arch/arm64/kvm/hyp/nvhe/timer-sr.c b/arch/arm64/kvm/hyp/nvhe/timer-sr.c
index 9072e71693ba..1b8e6e47a4ea 100644
--- a/arch/arm64/kvm/hyp/nvhe/timer-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/timer-sr.c
@@ -35,14 +35,24 @@ void __timer_disable_traps(struct kvm_vcpu *vcpu)
  */
 void __timer_enable_traps(struct kvm_vcpu *vcpu)
 {
-	u64 val;
+	u64 val, cntpoff;
+
+	cntpoff = __vcpu_sys_reg(vcpu, CNTPOFF_EL2);
 
 	/*
 	 * Disallow physical timer access for the guest
-	 * Physical counter access is allowed
 	 */
 	val = read_sysreg(cnthctl_el2);
 	val &= ~CNTHCTL_EL1PCEN;
-	val |= CNTHCTL_EL1PCTEN;
+
+	/*
+	 * Disallow physical counter access for the guest if offsetting is
+	 * requested.
+	 */
+	if (cntpoff)
+		val &= ~CNTHCTL_EL1PCTEN;
+	else
+		val |= CNTHCTL_EL1PCTEN;
+
 	write_sysreg(val, cnthctl_el2);
 }
-- 
2.32.0.rc1.229.g3e70b5a671-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread


* [PATCH 05/10] selftests: KVM: Add test cases for physical counter offsetting
  2021-06-08 21:47 ` Oliver Upton
@ 2021-06-08 21:47   ` Oliver Upton
  -1 siblings, 0 replies; 46+ messages in thread
From: Oliver Upton @ 2021-06-08 21:47 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, Oliver Upton

Add support for physical counter offsetting to counter_state_test.
Assert that guest reads of the physical counter are within the expected
bounds.

Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 .../selftests/kvm/system_counter_state_test.c | 27 ++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/system_counter_state_test.c b/tools/testing/selftests/kvm/system_counter_state_test.c
index 059971f6cb87..f537eb7b928c 100644
--- a/tools/testing/selftests/kvm/system_counter_state_test.c
+++ b/tools/testing/selftests/kvm/system_counter_state_test.c
@@ -62,11 +62,34 @@ static struct system_counter_state_test test_cases[] = {
 			.cntvoff = -1
 		}
 	},
+	{
+		.counter = PHYSICAL,
+		.state = {
+			.flags = KVM_SYSTEM_COUNTER_STATE_PHYS_OFFSET,
+			.cntpoff = 0
+		}
+	},
+	{
+		.counter = PHYSICAL,
+		.state = {
+			.flags = KVM_SYSTEM_COUNTER_STATE_PHYS_OFFSET,
+			.cntpoff = 1000000
+		}
+	},
+	{
+		.counter = PHYSICAL,
+		.state = {
+			.flags = KVM_SYSTEM_COUNTER_STATE_PHYS_OFFSET,
+			.cntpoff = -1
+		}
+	},
 };
 
 static void pr_test(struct system_counter_state_test *test)
 {
-	pr_info("counter: %s, cntvoff: %lld\n", counter_name(test->counter), test->state.cntvoff);
+	pr_info("counter: %s, cntvoff: %lld, cntpoff: %lld\n",
+	       counter_name(test->counter), test->state.cntvoff,
+	       test->state.cntpoff);
 }
 
 static struct kvm_system_counter_state *
@@ -103,6 +126,8 @@ static uint64_t host_read_guest_counter(struct system_counter_state_test *test)
 
 	r = read_cntvct_ordered();
 	switch (test->counter) {
+	case PHYSICAL:
+		r -= test->state.cntpoff;
+		break;
 	case VIRTUAL:
 		r -= test->state.cntvoff;
 		break;
-- 
2.32.0.rc1.229.g3e70b5a671-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 06/10] selftests: KVM: Add counter emulation benchmark
  2021-06-08 21:47 ` Oliver Upton
@ 2021-06-08 21:47   ` Oliver Upton
  -1 siblings, 0 replies; 46+ messages in thread
From: Oliver Upton @ 2021-06-08 21:47 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, Oliver Upton

Add a test case for counter emulation on arm64. A side effect of how KVM
handles physical counter offsetting on non-ECV systems is that the
virtual counter always hits hardware while the physical counter may be
emulated. Force emulation by writing a nonzero offset to the physical
counter and compare the elapsed cycles to a direct read of the hardware
register.

Reviewed-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 tools/testing/selftests/kvm/.gitignore        |   1 +
 tools/testing/selftests/kvm/Makefile          |   1 +
 .../kvm/aarch64/counter_emulation_benchmark.c | 209 ++++++++++++++++++
 3 files changed, 211 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/aarch64/counter_emulation_benchmark.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index 1a5782d8a0d4..ac6a7d17d04a 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -1,4 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
+/aarch64/counter_emulation_benchmark
 /aarch64/get-reg-list
 /aarch64/get-reg-list-sve
 /aarch64/vgic_init
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index b14f16dc954a..c2e5a7d877b1 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -77,6 +77,7 @@ TEST_GEN_PROGS_x86_64 += memslot_modification_stress_test
 TEST_GEN_PROGS_x86_64 += set_memory_region_test
 TEST_GEN_PROGS_x86_64 += steal_time
 
+TEST_GEN_PROGS_aarch64 += aarch64/counter_emulation_benchmark
 TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
 TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list-sve
 TEST_GEN_PROGS_aarch64 += aarch64/vgic_init
diff --git a/tools/testing/selftests/kvm/aarch64/counter_emulation_benchmark.c b/tools/testing/selftests/kvm/aarch64/counter_emulation_benchmark.c
new file mode 100644
index 000000000000..c403e0762200
--- /dev/null
+++ b/tools/testing/selftests/kvm/aarch64/counter_emulation_benchmark.c
@@ -0,0 +1,209 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * counter_emulation_benchmark.c -- test to measure the effects of counter
+ * emulation on guest reads of the physical counter.
+ *
+ * Copyright (c) 2021, Google LLC.
+ */
+
+#define _GNU_SOURCE
+#include <asm/kvm.h>
+#include <linux/kvm.h>
+#include <stdio.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+#include "kvm_util.h"
+#include "processor.h"
+#include "test_util.h"
+
+#define VCPU_ID 0
+
+static struct counter_values {
+	uint64_t cntvct_start;
+	uint64_t cntpct;
+	uint64_t cntvct_end;
+} counter_values;
+
+static uint64_t nr_iterations = 1000;
+
+static void do_test(void)
+{
+	/*
+	 * Open-coded approach instead of using helper methods to keep a tight
+	 * interval around the physical counter read.
+	 */
+	asm volatile("isb\n\t"
+		     "mrs %[cntvct_start], cntvct_el0\n\t"
+		     "isb\n\t"
+		     "mrs %[cntpct], cntpct_el0\n\t"
+		     "isb\n\t"
+		     "mrs %[cntvct_end], cntvct_el0\n\t"
+		     "isb\n\t"
+		     : [cntvct_start] "=r"(counter_values.cntvct_start),
+		     [cntpct] "=r"(counter_values.cntpct),
+		     [cntvct_end] "=r"(counter_values.cntvct_end));
+}
+
+static void guest_main(void)
+{
+	int i;
+
+	/* first pass: native counter reads (no physical offset applied) */
+	for (i = 0; i < nr_iterations; i++) {
+		do_test();
+		GUEST_SYNC(i);
+	}
+
+	/* second pass: physical counter reads trapped and emulated by KVM */
+	for (i = 0; i < nr_iterations; i++) {
+		do_test();
+		GUEST_SYNC(i);
+	}
+
+	GUEST_DONE();
+}
+
+static bool enter_guest(struct kvm_vm *vm)
+{
+	struct ucall uc;
+
+	vcpu_ioctl(vm, VCPU_ID, KVM_RUN, NULL);
+
+	switch (get_ucall(vm, VCPU_ID, &uc)) {
+	case UCALL_DONE:
+		return true;
+	case UCALL_SYNC:
+		break;
+	case UCALL_ABORT:
+		TEST_ASSERT(false, "%s at %s:%ld", (const char *)uc.args[0],
+			    __FILE__, uc.args[1]);
+		break;
+	default:
+		TEST_ASSERT(false, "unexpected exit: %s",
+			    exit_reason_str(vcpu_state(vm, VCPU_ID)->exit_reason));
+		break;
+	}
+
+	/* more work to do in the guest */
+	return false;
+}
+
+static double counter_frequency(void)
+{
+	uint64_t freq;
+
+	asm volatile("mrs %0, cntfrq_el0"
+		     : "=r" (freq));
+
+	return freq / 1000000.0;
+}
+
+static void log_csv(FILE *csv, bool trapped)
+{
+	double freq = counter_frequency();
+
+	fprintf(csv, "%s,%.02f,%lu,%lu,%lu\n",
+		trapped ? "true" : "false", freq,
+		counter_values.cntvct_start,
+		counter_values.cntpct,
+		counter_values.cntvct_end);
+}
+
+static double run_loop(struct kvm_vm *vm, FILE *csv, bool trapped)
+{
+	double avg = 0;
+	int i;
+
+	for (i = 0; i < nr_iterations; i++) {
+		uint64_t delta;
+
+		TEST_ASSERT(!enter_guest(vm), "guest exited unexpectedly");
+		sync_global_from_guest(vm, counter_values);
+
+		if (csv)
+			log_csv(csv, trapped);
+
+		delta = counter_values.cntvct_end - counter_values.cntvct_start;
+		avg = ((avg * i) + delta) / (i + 1);
+	}
+
+	return avg;
+}
+
+static void run_tests(struct kvm_vm *vm, FILE *csv)
+{
+	struct kvm_system_counter_state state = {0};
+	double avg_trapped, avg_native, freq;
+
+	freq = counter_frequency();
+
+	if (csv)
+		fputs("trapped,freq_mhz,cntvct_start,cntpct,cntvct_end\n", csv);
+
+	/* no physical offsetting; kvm allows reads of cntpct_el0 */
+	vcpu_ioctl(vm, VCPU_ID, KVM_SET_SYSTEM_COUNTER_STATE, &state);
+	avg_native = run_loop(vm, csv, false);
+
+	/* force emulation of the physical counter */
+	state.flags = KVM_SYSTEM_COUNTER_STATE_PHYS_OFFSET;
+	state.cntpoff = 1;
+	vcpu_ioctl(vm, VCPU_ID, KVM_SET_SYSTEM_COUNTER_STATE, &state);
+	avg_trapped = run_loop(vm, csv, true);
+
+	TEST_ASSERT(enter_guest(vm), "guest didn't run to completion");
+	pr_info("%lu iterations: average cycles (@%.02fMHz) native: %.02f, trapped: %.02f\n",
+		nr_iterations, freq, avg_native, avg_trapped);
+}
+
+static void usage(const char *program_name)
+{
+	fprintf(stderr,
+		"Usage: %s [-h] [-o csv_file] [-n iterations]\n"
+		"  -h prints this message\n"
+		"  -n number of test iterations (default: %lu)\n"
+		"  -o csv file to write data\n",
+		program_name, nr_iterations);
+}
+
+int main(int argc, char **argv)
+{
+	struct kvm_vm *vm;
+	FILE *csv = NULL;
+	int opt;
+
+	if (!kvm_check_cap(KVM_CAP_SYSTEM_COUNTER_STATE)) {
+		print_skip("KVM_CAP_SYSTEM_COUNTER_STATE not supported");
+		exit(KSFT_SKIP);
+	}
+
+	while ((opt = getopt(argc, argv, "hn:o:")) != -1) {
+		switch (opt) {
+		case 'o':
+			csv = fopen(optarg, "w");
+			if (!csv) {
+				fprintf(stderr, "failed to open file '%s': %d\n",
+					optarg, errno);
+				exit(1);
+			}
+			break;
+		case 'n':
+			nr_iterations = strtoul(optarg, NULL, 0);
+			break;
+		default:
+			fprintf(stderr, "unrecognized option: '-%c'\n", opt);
+			/* fallthrough */
+		case 'h':
+			usage(argv[0]);
+			exit(1);
+		}
+	}
+
+	vm = vm_create_default(0, 0, guest_main);
+	sync_global_to_guest(vm, nr_iterations);
+	ucall_init(vm, NULL);
+	run_tests(vm, csv);
+	kvm_vm_free(vm);
+
+	if (csv)
+		fclose(csv);
+}
-- 
2.32.0.rc1.229.g3e70b5a671-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 07/10] KVM: x86: Refactor tsc synchronization code
  2021-06-08 21:47 ` Oliver Upton
@ 2021-06-08 21:47   ` Oliver Upton
  -1 siblings, 0 replies; 46+ messages in thread
From: Oliver Upton @ 2021-06-08 21:47 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, Oliver Upton

Refactor kvm_synchronize_tsc to make a new function that allows callers
to specify TSC parameters (offset, value, nanoseconds, etc.) explicitly
for the sake of participating in TSC synchronization.

This changes the locking semantics around TSC writes: writes to the TSC
now take the pvclock gtod lock while holding the tsc write lock, whereas
before these locks were disjoint.

Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 Documentation/virt/kvm/locking.rst |  11 +++
 arch/x86/kvm/x86.c                 | 106 +++++++++++++++++------------
 2 files changed, 74 insertions(+), 43 deletions(-)

diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 1fc860c007a3..d21e0e480d63 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -25,6 +25,9 @@ On x86:
   holding kvm->arch.mmu_lock (typically with ``read_lock``, otherwise
   there's no need to take kvm->arch.tdp_mmu_pages_lock at all).
 
+- kvm->arch.tsc_write_lock is taken outside
+  kvm->arch.pvclock_gtod_sync_lock
+
 Everything else is a leaf: no other lock is taken inside the critical
 sections.
 
@@ -211,6 +214,14 @@ time it will be set using the Dirty tracking mechanism described above.
 :Comment:	'raw' because hardware enabling/disabling must be atomic /wrt
 		migration.
 
+:Name:		kvm_arch::pvclock_gtod_sync_lock
+:Type:		raw_spinlock_t
+:Arch:		x86
+:Protects:	kvm_arch::{cur_tsc_generation,cur_tsc_nsec,cur_tsc_write,
+			cur_tsc_offset,nr_vcpus_matched_tsc}
+:Comment:	'raw' because updating the kvm master clock must not be
+		preempted.
+
 :Name:		kvm_arch::tsc_write_lock
 :Type:		raw_spinlock
 :Arch:		x86
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9b6bca616929..61069995a592 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2351,13 +2351,73 @@ static inline bool kvm_check_tsc_unstable(void)
 	return check_tsc_unstable();
 }
 
+/*
+ * Infers attempts to synchronize the guest's TSC from host writes. Sets the
+ * offset for the vcpu and tracks the TSC matching generation that the vcpu
+ * participates in.
+ *
+ * Must hold kvm->arch.tsc_write_lock to call this function.
+ */
+static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
+				  u64 ns, bool matched)
+{
+	struct kvm *kvm = vcpu->kvm;
+	bool already_matched;
+	unsigned long flags;
+
+	lockdep_assert_held(&kvm->arch.tsc_write_lock);
+
+	already_matched =
+	       (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
+
+	/*
+	 * We track the most recent recorded KHZ, write and time to
+	 * allow the matching interval to be extended at each write.
+	 */
+	kvm->arch.last_tsc_nsec = ns;
+	kvm->arch.last_tsc_write = tsc;
+	kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
+
+	vcpu->arch.last_guest_tsc = tsc;
+
+	/* Keep track of which generation this VCPU has synchronized to */
+	vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation;
+	vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec;
+	vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
+
+	kvm_vcpu_write_tsc_offset(vcpu, offset);
+
+	spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags);
+	if (!matched) {
+		/*
+		 * We split periods of matched TSC writes into generations.
+		 * For each generation, we track the original measured
+		 * nanosecond time, offset, and write, so if TSCs are in
+		 * sync, we can match exact offset, and if not, we can match
+		 * exact software computation in compute_guest_tsc()
+		 *
+		 * These values are tracked in kvm->arch.cur_xxx variables.
+		 */
+		kvm->arch.nr_vcpus_matched_tsc = 0;
+		kvm->arch.cur_tsc_generation++;
+		kvm->arch.cur_tsc_nsec = ns;
+		kvm->arch.cur_tsc_write = tsc;
+		kvm->arch.cur_tsc_offset = offset;
+	} else if (!already_matched) {
+		kvm->arch.nr_vcpus_matched_tsc++;
+	}
+
+	kvm_track_tsc_matching(vcpu);
+	spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags);
+}
+
 static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 {
 	struct kvm *kvm = vcpu->kvm;
 	u64 offset, ns, elapsed;
 	unsigned long flags;
-	bool matched;
-	bool already_matched;
+	bool matched = false;
 	bool synchronizing = false;
 
 	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
@@ -2403,51 +2463,11 @@ static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data)
 			offset = kvm_compute_tsc_offset(vcpu, data);
 		}
 		matched = true;
-		already_matched = (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation);
-	} else {
-		/*
-		 * We split periods of matched TSC writes into generations.
-		 * For each generation, we track the original measured
-		 * nanosecond time, offset, and write, so if TSCs are in
-		 * sync, we can match exact offset, and if not, we can match
-		 * exact software computation in compute_guest_tsc()
-		 *
-		 * These values are tracked in kvm->arch.cur_xxx variables.
-		 */
-		kvm->arch.cur_tsc_generation++;
-		kvm->arch.cur_tsc_nsec = ns;
-		kvm->arch.cur_tsc_write = data;
-		kvm->arch.cur_tsc_offset = offset;
-		matched = false;
 	}
 
-	/*
-	 * We also track th most recent recorded KHZ, write and time to
-	 * allow the matching interval to be extended at each write.
-	 */
-	kvm->arch.last_tsc_nsec = ns;
-	kvm->arch.last_tsc_write = data;
-	kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
-
-	vcpu->arch.last_guest_tsc = data;
+	__kvm_synchronize_tsc(vcpu, offset, data, ns, matched);
 
-	/* Keep track of which generation this VCPU has synchronized to */
-	vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation;
-	vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec;
-	vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write;
-
-	kvm_vcpu_write_tsc_offset(vcpu, offset);
 	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
-
-	spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags);
-	if (!matched) {
-		kvm->arch.nr_vcpus_matched_tsc = 0;
-	} else if (!already_matched) {
-		kvm->arch.nr_vcpus_matched_tsc++;
-	}
-
-	kvm_track_tsc_matching(vcpu);
-	spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags);
 }
 
 static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
-- 
2.32.0.rc1.229.g3e70b5a671-goog


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 08/10] KVM: x86: Implement KVM_CAP_SYSTEM_COUNTER_STATE
  2021-06-08 21:47 ` Oliver Upton
@ 2021-06-08 21:47   ` Oliver Upton
  -1 siblings, 0 replies; 46+ messages in thread
From: Oliver Upton @ 2021-06-08 21:47 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, Oliver Upton

To date, VMM-directed TSC synchronization and migration have been messy.
KVM has some baked-in heuristics around TSC writes to infer whether the
VMM is attempting to synchronize. This is problematic, as it depends on
the host writing to the guest's TSC within 1 second of the last write.

A much cleaner approach to configuring the guest's views of the TSC is
to simply migrate the TSC offset for every vCPU. Offsets are idempotent,
and thus are not subject to change depending on when the VMM actually
reads the values from KVM. The VMM can then read the TSC once to
capture the instant at which the guest's TSCs are paused.

Implement the KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls and advertise
KVM_CAP_SYSTEM_COUNTER_STATE to userspace.

Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 arch/x86/include/asm/kvm_host.h |  1 +
 arch/x86/include/uapi/asm/kvm.h |  8 ++++
 arch/x86/kvm/x86.c              | 70 +++++++++++++++++++++++++++++++++
 3 files changed, 79 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 55efbacfc244..8768173f614c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1018,6 +1018,7 @@ struct kvm_arch {
 	u64 last_tsc_nsec;
 	u64 last_tsc_write;
 	u32 last_tsc_khz;
+	u64 last_tsc_offset;
 	u64 cur_tsc_nsec;
 	u64 cur_tsc_write;
 	u64 cur_tsc_offset;
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 0662f644aad9..60ad6b9ebcd6 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -490,4 +490,12 @@ struct kvm_pmu_event_filter {
 #define KVM_PMU_EVENT_ALLOW 0
 #define KVM_PMU_EVENT_DENY 1
 
+/* for KVM_CAP_SYSTEM_COUNTER_STATE */
+struct kvm_system_counter_state {
+	__u32 flags;
+	__u32 pad;
+	__u64 tsc_offset;
+	__u64 rsvd[6];
+};
+
 #endif /* _ASM_X86_KVM_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 61069995a592..bb3ecb5cd548 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2332,6 +2332,11 @@ u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
 }
 EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
 
+static u64 kvm_vcpu_read_tsc_offset(struct kvm_vcpu *vcpu)
+{
+	return vcpu->arch.l1_tsc_offset;
+}
+
 static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset)
 {
 	vcpu->arch.l1_tsc_offset = offset;
@@ -2377,6 +2382,7 @@ static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc,
 	kvm->arch.last_tsc_nsec = ns;
 	kvm->arch.last_tsc_write = tsc;
 	kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz;
+	kvm->arch.last_tsc_offset = offset;
 
 	vcpu->arch.last_guest_tsc = tsc;
 
@@ -2485,6 +2491,44 @@ static inline void adjust_tsc_offset_host(struct kvm_vcpu *vcpu, s64 adjustment)
 	adjust_tsc_offset_guest(vcpu, adjustment);
 }
 
+static int kvm_vcpu_get_system_counter_state(struct kvm_vcpu *vcpu,
+					     struct kvm_system_counter_state *state)
+{
+	if (state->flags)
+		return -EINVAL;
+
+	state->tsc_offset = kvm_vcpu_read_tsc_offset(vcpu);
+	return 0;
+}
+
+static int kvm_vcpu_set_system_counter_state(struct kvm_vcpu *vcpu,
+					     struct kvm_system_counter_state *state)
+{
+	struct kvm *kvm = vcpu->kvm;
+	u64 offset, tsc, ns;
+	unsigned long flags;
+	bool matched;
+
+	if (state->flags)
+		return -EINVAL;
+
+	offset = state->tsc_offset;
+
+	raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags);
+
+	matched = (vcpu->arch.virtual_tsc_khz &&
+		   kvm->arch.last_tsc_khz == vcpu->arch.virtual_tsc_khz &&
+		   kvm->arch.last_tsc_offset == offset);
+
+	tsc = kvm_scale_tsc(vcpu, rdtsc()) + offset;
+	ns = get_kvmclock_base_ns();
+
+	__kvm_synchronize_tsc(vcpu, offset, tsc, ns, matched);
+	raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags);
+
+	return 0;
+}
+
 #ifdef CONFIG_X86_64
 
 static u64 read_tsc(void)
@@ -3912,6 +3956,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_SGX_ATTRIBUTE:
 #endif
 	case KVM_CAP_VM_COPY_ENC_CONTEXT_FROM:
+	case KVM_CAP_SYSTEM_COUNTER_STATE:
 		r = 1;
 		break;
 	case KVM_CAP_SET_GUEST_DEBUG2:
@@ -5200,6 +5245,31 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
 		break;
 	}
 #endif
+	case KVM_GET_SYSTEM_COUNTER_STATE: {
+		struct kvm_system_counter_state state;
+
+		r = -EFAULT;
+		if (copy_from_user(&state, argp, sizeof(state)))
+			goto out;
+
+		r = kvm_vcpu_get_system_counter_state(vcpu, &state);
+		if (r)
+			goto out;
+		if (copy_to_user(argp, &state, sizeof(state)))
+			r = -EFAULT;
+
+		break;
+	}
+	case KVM_SET_SYSTEM_COUNTER_STATE: {
+		struct kvm_system_counter_state state;
+
+		r = -EFAULT;
+		if (copy_from_user(&state, argp, sizeof(state)))
+			goto out;
+
+		r = kvm_vcpu_set_system_counter_state(vcpu, &state);
+		break;
+	}
 	default:
 		r = -EINVAL;
 	}
-- 
2.32.0.rc1.229.g3e70b5a671-goog



* [PATCH 09/10] selftests: KVM: Add support for x86 to system_counter_state_test
  2021-06-08 21:47 ` Oliver Upton
@ 2021-06-08 21:47   ` Oliver Upton
  -1 siblings, 0 replies; 46+ messages in thread
From: Oliver Upton @ 2021-06-08 21:47 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, Oliver Upton

Test that userspace manipulation of the TSC offset (via
KVM_{GET,SET}_SYSTEM_COUNTER_STATE) results in the expected guest view
of the TSC.

Reviewed-by: David Matlack <dmatlack@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 tools/testing/selftests/kvm/Makefile          |  1 +
 .../selftests/kvm/system_counter_state_test.c | 32 +++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index c2e5a7d877b1..28207474c79c 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -76,6 +76,7 @@ TEST_GEN_PROGS_x86_64 += kvm_page_table_test
 TEST_GEN_PROGS_x86_64 += memslot_modification_stress_test
 TEST_GEN_PROGS_x86_64 += set_memory_region_test
 TEST_GEN_PROGS_x86_64 += steal_time
+TEST_GEN_PROGS_x86_64 += system_counter_state_test
 
 TEST_GEN_PROGS_aarch64 += aarch64/counter_emulation_benchmark
 TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
diff --git a/tools/testing/selftests/kvm/system_counter_state_test.c b/tools/testing/selftests/kvm/system_counter_state_test.c
index f537eb7b928c..9dcbc95bba3f 100644
--- a/tools/testing/selftests/kvm/system_counter_state_test.c
+++ b/tools/testing/selftests/kvm/system_counter_state_test.c
@@ -138,6 +138,38 @@ static uint64_t host_read_guest_counter(struct system_counter_state_test *test)
 	return r;
 }
 
+#elif __x86_64__
+
+/* no additional information required beyond the counter state. */
+#define system_counter_state_test kvm_system_counter_state
+
+static struct system_counter_state_test test_cases[] = {
+	{ .tsc_offset = 0 },
+	{ .tsc_offset = 1000000 },
+	{ .tsc_offset = -1 },
+};
+
+static void pr_test(struct system_counter_state_test *test)
+{
+	printf("tsc_offset: %lld\n", test->tsc_offset);
+}
+
+static struct kvm_system_counter_state *
+get_system_counter_state(struct system_counter_state_test *test)
+{
+	return test;
+}
+
+static uint64_t guest_read_counter(struct system_counter_state_test *test)
+{
+	return rdtsc();
+}
+
+static uint64_t host_read_guest_counter(struct system_counter_state_test *test)
+{
+	return rdtsc() + test->tsc_offset;
+}
+
 #else
 #error test not implemented for architecture being built!
 #endif
-- 
2.32.0.rc1.229.g3e70b5a671-goog



* [PATCH 10/10] Documentation: KVM: Document KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls
  2021-06-08 21:47 ` Oliver Upton
@ 2021-06-08 21:47   ` Oliver Upton
  -1 siblings, 0 replies; 46+ messages in thread
From: Oliver Upton @ 2021-06-08 21:47 UTC (permalink / raw)
  To: kvm, kvmarm
  Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, Oliver Upton

Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Reviewed-by: Ricardo Koller <ricarkol@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
---
 Documentation/virt/kvm/api.rst | 98 ++++++++++++++++++++++++++++++++++
 1 file changed, 98 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 7fcb2fd38f42..85654748156a 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5034,6 +5034,104 @@ see KVM_XEN_VCPU_SET_ATTR above.
 The KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST type may not be used
 with the KVM_XEN_VCPU_GET_ATTR ioctl.
 
+4.130 KVM_GET_SYSTEM_COUNTER_STATE
+----------------------------------
+
+:Capability: KVM_CAP_SYSTEM_COUNTER_STATE
+:Architectures: arm64, x86
+:Type: vcpu ioctl
+:Parameters: struct kvm_system_counter_state
+:Returns: 0 on success, < 0 on error
+
+Allows the vCPU counter state to be read. Each architecture defines
+its own kvm_system_counter_state structure depending on the backing hardware
+controls used for the guest's counter.
+
+ARM64
+
+::
+
+  struct kvm_system_counter_state {
+	/* indicates what fields are valid in the structure */
+	__u32 flags;
+
+Enumerates what fields are valid in the kvm_system_counter_state structure.
+Userspace should set this field to indicate what fields it wants the kernel
+to populate.
+
+::
+
+	__u32 pad;
+	/* guest virtual counter-timer offset, relative to host cntpct_el0 */
+	__u64 cntvoff;
+
+Offset for the guest virtual counter-timer, as it relates to the host's
+physical counter-timer (CNTPCT_EL0).
+
+::
+
+	/*
+	 * Guest physical counter-timer offset, relative to host cntpct_el0.
+	 * Valid when KVM_SYSTEM_COUNTER_STATE_PHYS_OFFSET is set.
+	 */
+	__u64 cntpoff;
+
+Offset for the guest physical counter-timer, as it relates to the host's
+physical counter-timer (CNTPCT_EL0). This field is populated when the
+KVM_SYSTEM_COUNTER_STATE_PHYS_OFFSET bit is set in the flags field.
+
+::
+
+	__u64 rsvd[5];
+  };
+
+x86
+
+::
+
+  struct kvm_system_counter_state {
+	__u32 flags;
+
+Enumerates what fields are valid in the kvm_system_counter_state structure.
+Currently, no flag bits are defined, so this field must be set to zero.
+
+::
+
+	__u32 pad;
+	__u64 tsc_offset;
+
+Offset for the guest TSC, as it relates to the host's TSC.
+
+::
+
+	__u64 rsvd[6];
+  };
+
+4.131 KVM_SET_SYSTEM_COUNTER_STATE
+----------------------------------
+
+:Capability: KVM_CAP_SYSTEM_COUNTER_STATE
+:Architectures: arm64, x86
+:Type: vcpu ioctl
+:Parameters: struct kvm_system_counter_state
+:Returns: 0 on success, < 0 on error.
+
+Allows the vCPU counter state to be written. For more details on the
+structure, see KVM_GET_SYSTEM_COUNTER_STATE above.
+
+ARM64
+
+VMMs should either use this ioctl *OR* directly write to the vCPU's
+CNTVCT_EL0 register. Mixing both methods of restoring counter state
+can cause drift between virtual CPUs.
+
+x86
+
+VMMs should either use this ioctl *OR* directly write to the vCPU's
+IA32_TSC register. Mixing both methods of restoring TSC state can
+cause drift between virtual CPUs.
+
 5. The kvm_run structure
 ========================
 
-- 
2.32.0.rc1.229.g3e70b5a671-goog



* Re: [PATCH 02/10] KVM: arm64: Implement initial support for KVM_CAP_SYSTEM_COUNTER_STATE
  2021-06-08 21:47   ` Oliver Upton
@ 2021-06-08 21:55     ` Oliver Upton
  -1 siblings, 0 replies; 46+ messages in thread
From: Oliver Upton @ 2021-06-08 21:55 UTC (permalink / raw)
  To: kvm list, kvmarm
  Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata

On Tue, Jun 8, 2021 at 4:48 PM Oliver Upton <oupton@google.com> wrote:
>
> ARMv8 provides for a virtual counter-timer offset that is added to guest
> views of the virtual counter-timer (CNTVOFF_EL2). To date, KVM has not
> provided userspace with any perception of this, and instead affords a
> value-based scheme of migrating the virtual counter-timer by directly
> reading/writing the guest's CNTVCT_EL0. This is problematic because
> counters continue to elapse while the register is being written, meaning
> it is possible for drift to sneak in to the guest's time scale. This is
> exacerbated by the fact that KVM will calculate an appropriate
> CNTVOFF_EL2 every time the register is written, which will be broadcast
> to all virtual CPUs. The only possible way to avoid causing guest time
> to drift is to restore counter-timers by offset.
>
> Implement initial support for KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls
> to migrate the value of CNTVOFF_EL2. These ioctls yield precise control
> of the virtual counter-timers to userspace, allowing it to define its
> own heuristics for managing vCPU offsets.
>
> Reviewed-by: Jim Mattson <jmattson@google.com>
> Reviewed-by: Jing Zhang <jingzhangos@google.com>
> Reviewed-by: Peter Shier <pshier@google.com>
> Signed-off-by: Oliver Upton <oupton@google.com>
> ---
>  arch/arm64/include/asm/kvm_host.h |  5 +++++
>  arch/arm64/include/uapi/asm/kvm.h | 10 ++++++++++
>  arch/arm64/kvm/arch_timer.c       | 22 ++++++++++++++++++++++
>  arch/arm64/kvm/arm.c              | 25 +++++++++++++++++++++++++
>  4 files changed, 62 insertions(+)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 7cd7d5c8c4bc..31107d5e61af 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -781,4 +781,9 @@ void __init kvm_hyp_reserve(void);
>  static inline void kvm_hyp_reserve(void) { }
>  #endif
>
> +int kvm_arm_vcpu_get_system_counter_state(struct kvm_vcpu *vcpu,
> +                                         struct kvm_system_counter_state *state);
> +int kvm_arm_vcpu_set_system_counter_state(struct kvm_vcpu *vcpu,
> +                                         struct kvm_system_counter_state *state);
> +
>  #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index 24223adae150..d3987089c524 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -184,6 +184,16 @@ struct kvm_vcpu_events {
>         __u32 reserved[12];
>  };
>
> +/* for KVM_{GET,SET}_SYSTEM_COUNTER_STATE */
> +struct kvm_system_counter_state {
> +       /* indicates what fields are valid in the structure */
> +       __u32 flags;
> +       __u32 pad;
> +       /* guest counter-timer offset, relative to host cntpct_el0 */
> +       __u64 cntvoff;
> +       __u64 rsvd[7];
> +};
> +
>  /* If you need to interpret the index values, here is the key: */
>  #define KVM_REG_ARM_COPROC_MASK                0x000000000FFF0000
>  #define KVM_REG_ARM_COPROC_SHIFT       16
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index 74e0699661e9..955a7a183362 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -1259,3 +1259,25 @@ int kvm_arm_timer_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
>
>         return -ENXIO;
>  }
> +
> +int kvm_arm_vcpu_get_system_counter_state(struct kvm_vcpu *vcpu,
> +                                         struct kvm_system_counter_state *state)
> +{
> +       if (state->flags)
> +               return -EINVAL;
> +
> +       state->cntvoff = timer_get_offset(vcpu_vtimer(vcpu));
> +
> +       return 0;
> +}
> +
> +int kvm_arm_vcpu_set_system_counter_state(struct kvm_vcpu *vcpu,
> +                                         struct kvm_system_counter_state *state)
> +{
> +       if (state->flags)
> +               return -EINVAL;
> +
> +       timer_set_offset(vcpu_vtimer(vcpu), state->cntvoff);

Adding some discussion that Ricardo and I had regarding this portion
of the patch:

Ricardo asks whether it would make more sense to have the
KVM_SET_SYSTEM_COUNTER_STATE ioctl broadcast the counter offset to all
vCPUs, as we do for the value-based KVM_SET_ONE_REG implementation. To
me, broadcasting was more necessary for the value-based interface, since
it is difficult (if not impossible) to synchronize vCPUs by value; with
offsets, some of the onus to do the right thing shifts to the VMM. No
strong opinions either way, so I'm open to suggestions here.
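
To make the alternative concrete, a broadcast flavor of the setter might
look roughly like the sketch below (not runnable on its own; the helper
names timer_set_offset()/vcpu_vtimer() are from this series, while the
broadcast loop itself is hypothetical and not part of any patch here):

```
static int kvm_arm_vcpu_set_system_counter_state(struct kvm_vcpu *vcpu,
						 struct kvm_system_counter_state *state)
{
	struct kvm_vcpu *tmp;
	int i;

	if (state->flags)
		return -EINVAL;

	/* Hypothetical: propagate the offset to every vCPU in the VM. */
	kvm_for_each_vcpu(i, tmp, vcpu->kvm)
		timer_set_offset(vcpu_vtimer(tmp), state->cntvoff);

	return 0;
}
```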

--
Thanks,
Oliver

> +
> +       return 0;
> +}
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 1126eae27400..b78ffb4db9dd 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -207,6 +207,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>         case KVM_CAP_SET_GUEST_DEBUG:
>         case KVM_CAP_VCPU_ATTRIBUTES:
>         case KVM_CAP_PTP_KVM:
> +       case KVM_CAP_SYSTEM_COUNTER_STATE:
>                 r = 1;
>                 break;
>         case KVM_CAP_SET_GUEST_DEBUG2:
> @@ -1273,6 +1274,30 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>
>                 return kvm_arm_vcpu_finalize(vcpu, what);
>         }
> +       case KVM_GET_SYSTEM_COUNTER_STATE: {
> +               struct kvm_system_counter_state state;
> +
> +               if (copy_from_user(&state, argp, sizeof(state)))
> +                       return -EFAULT;
> +
> +               r = kvm_arm_vcpu_get_system_counter_state(vcpu, &state);
> +               if (r)
> +                       break;
> +
> +               if (copy_to_user(argp, &state, sizeof(state)))
> +                       return -EFAULT;
> +
> +               break;
> +       }
> +       case KVM_SET_SYSTEM_COUNTER_STATE: {
> +               struct kvm_system_counter_state state;
> +
> +               if (copy_from_user(&state, argp, sizeof(state)))
> +                       return -EFAULT;
> +
> +               r = kvm_arm_vcpu_set_system_counter_state(vcpu, &state);
> +               break;
> +       }
>         default:
>                 r = -EINVAL;
>         }
> --
> 2.32.0.rc1.229.g3e70b5a671-goog
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 04/10] KVM: arm64: Add userspace control of the guest's physical counter
  2021-06-08 21:47   ` Oliver Upton
@ 2021-06-08 21:58     ` Oliver Upton
  -1 siblings, 0 replies; 46+ messages in thread
From: Oliver Upton @ 2021-06-08 21:58 UTC (permalink / raw)
  To: kvm list, kvmarm
  Cc: Paolo Bonzini, Sean Christopherson, Marc Zyngier, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata

On Tue, Jun 8, 2021 at 4:48 PM Oliver Upton <oupton@google.com> wrote:
>
> ARMv8.6 adds an extension to the architecture providing hypervisors with
> more extensive controls of the guest's counters. A particularly
> interesting control is CNTPOFF_EL2, a fixed offset subtracted from the
> physical counter value to derive the guest's value. VMMs that live
> migrate their guests may be particularly interested in this feature in
> order to provide a consistent view of the physical counter across live
> migrations.
>
> In the interim, KVM can emulate this behavior by simply enabling traps
> on CNTPCT_EL0 and subtracting an offset.
>
> Add a new field to kvm_system_counter_state allowing a VMM to request an
> offset to the physical counter. If this offset is nonzero, enable traps
> on CNTPCT_EL0. Emulate guest reads to the register in the fast path to
> keep counter reads reasonably performant, avoiding a full exit from the
> guest.
>
> Reviewed-by: Peter Shier <pshier@google.com>
> Reviewed-by: Ricardo Koller <ricarkol@google.com>
> Signed-off-by: Oliver Upton <oupton@google.com>
> ---
>  arch/arm64/include/asm/kvm_host.h       |  1 +
>  arch/arm64/include/asm/sysreg.h         |  1 +
>  arch/arm64/include/uapi/asm/kvm.h       |  9 +++-
>  arch/arm64/kvm/arch_timer.c             | 66 +++++++++++++++++++++++--
>  arch/arm64/kvm/hyp/include/hyp/switch.h | 31 ++++++++++++
>  arch/arm64/kvm/hyp/nvhe/timer-sr.c      | 16 ++++--
>  6 files changed, 117 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 31107d5e61af..a3abafcea328 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -200,6 +200,7 @@ enum vcpu_sysreg {
>         SP_EL1,
>         SPSR_EL1,
>
> +       CNTPOFF_EL2,
>         CNTVOFF_EL2,
>         CNTV_CVAL_EL0,
>         CNTV_CTL_EL0,
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 65d15700a168..193da426690a 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -505,6 +505,7 @@
>  #define SYS_AMEVCNTR0_MEM_STALL                SYS_AMEVCNTR0_EL0(3)
>
>  #define SYS_CNTFRQ_EL0                 sys_reg(3, 3, 14, 0, 0)
> +#define SYS_CNTPCT_EL0                 sys_reg(3, 3, 14, 0, 1)
>
>  #define SYS_CNTP_TVAL_EL0              sys_reg(3, 3, 14, 2, 0)
>  #define SYS_CNTP_CTL_EL0               sys_reg(3, 3, 14, 2, 1)
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index d3987089c524..ee709e2f0292 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -184,6 +184,8 @@ struct kvm_vcpu_events {
>         __u32 reserved[12];
>  };
>
> +#define KVM_SYSTEM_COUNTER_STATE_PHYS_OFFSET   (1ul << 0)
> +
>  /* for KVM_{GET,SET}_SYSTEM_COUNTER_STATE */
>  struct kvm_system_counter_state {
>         /* indicates what fields are valid in the structure */
> @@ -191,7 +193,12 @@ struct kvm_system_counter_state {
>         __u32 pad;
>         /* guest counter-timer offset, relative to host cntpct_el0 */
>         __u64 cntvoff;
> -       __u64 rsvd[7];
> +       /*
> +        * Guest physical counter-timer offset, relative to host cntpct_el0.
> +        * Valid when KVM_SYSTEM_COUNTER_STATE_PHYS_OFFSET is set.
> +        */
> +       __u64 cntpoff;
> +       __u64 rsvd[6];
>  };
>
>  /* If you need to interpret the index values, here is the key: */
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index 955a7a183362..a74642d1515f 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -50,6 +50,7 @@ static void kvm_arm_timer_write(struct kvm_vcpu *vcpu,
>  static u64 kvm_arm_timer_read(struct kvm_vcpu *vcpu,
>                               struct arch_timer_context *timer,
>                               enum kvm_arch_timer_regs treg);
> +static bool kvm_timer_emulation_required(struct arch_timer_context *ctx);
>
>  u32 timer_get_ctl(struct arch_timer_context *ctxt)
>  {
> @@ -86,6 +87,8 @@ static u64 timer_get_offset(struct arch_timer_context *ctxt)
>         struct kvm_vcpu *vcpu = ctxt->vcpu;
>
>         switch(arch_timer_ctx_index(ctxt)) {
> +       case TIMER_PTIMER:
> +               return __vcpu_sys_reg(vcpu, CNTPOFF_EL2);
>         case TIMER_VTIMER:
>                 return __vcpu_sys_reg(vcpu, CNTVOFF_EL2);
>         default:
> @@ -130,6 +133,9 @@ static void timer_set_offset(struct arch_timer_context *ctxt, u64 offset)
>         struct kvm_vcpu *vcpu = ctxt->vcpu;
>
>         switch(arch_timer_ctx_index(ctxt)) {
> +       case TIMER_PTIMER:
> +               __vcpu_sys_reg(vcpu, CNTPOFF_EL2) = offset;
> +               break;
>         case TIMER_VTIMER:
>                 __vcpu_sys_reg(vcpu, CNTVOFF_EL2) = offset;
>                 break;
> @@ -145,7 +151,7 @@ u64 kvm_phys_timer_read(void)
>
>  static void get_timer_map(struct kvm_vcpu *vcpu, struct timer_map *map)
>  {
> -       if (has_vhe()) {
> +       if (has_vhe() && !kvm_timer_emulation_required(vcpu_ptimer(vcpu))) {
>                 map->direct_vtimer = vcpu_vtimer(vcpu);
>                 map->direct_ptimer = vcpu_ptimer(vcpu);
>                 map->emul_ptimer = NULL;
> @@ -746,6 +752,30 @@ int kvm_timer_vcpu_reset(struct kvm_vcpu *vcpu)
>         return 0;
>  }
>
> +bool kvm_timer_emulation_required(struct arch_timer_context *ctx)
> +{
> +       int idx = arch_timer_ctx_index(ctx);
> +
> +       switch (idx) {
> +       /*
> +        * Hardware doesn't support offsetting of the physical counter/timer.
> +        * If offsetting is requested, enable emulation of the physical
> +        * counter/timer.
> +        */
> +       case TIMER_PTIMER:
> +               return timer_get_offset(ctx);
> +       /*
> +        * Conversely, hardware does support offsetting of the virtual
> +        * counter/timer.
> +        */
> +       case TIMER_VTIMER:
> +               return false;
> +       default:
> +               WARN_ON(1);
> +               return false;
> +       }
> +}
> +
>  /* Make the updates of cntvoff for all vtimer contexts atomic */
>  static void update_vtimer_cntvoff(struct kvm_vcpu *vcpu, u64 cntvoff)
>  {
> @@ -1184,6 +1214,24 @@ void kvm_timer_init_vhe(void)
>         write_sysreg(val, cnthctl_el2);
>  }
>
> +static void kvm_timer_update_traps_vhe(struct kvm_vcpu *vcpu)
> +{
> +       u32 cnthctl_shift = 10;
> +       u64 val;
> +
> +       if (!kvm_timer_emulation_required(vcpu_ptimer(vcpu)))
> +               return;
> +
> +       /*
> +        * We must trap accesses to the physical counter/timer to emulate the
> +        * nonzero offset.
> +        */
> +       val = read_sysreg(cnthctl_el2);
> +       val &= ~(CNTHCTL_EL1PCEN << cnthctl_shift);
> +       val &= ~(CNTHCTL_EL1PCTEN << cnthctl_shift);
> +       write_sysreg(val, cnthctl_el2);
> +}
> +
>  static void set_timer_irqs(struct kvm *kvm, int vtimer_irq, int ptimer_irq)
>  {
>         struct kvm_vcpu *vcpu;
> @@ -1260,24 +1308,36 @@ int kvm_arm_timer_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
>         return -ENXIO;
>  }
>
> +#define KVM_SYSTEM_COUNTER_STATE_VALID_FLAG_BITS       \
> +               (KVM_SYSTEM_COUNTER_STATE_PHYS_OFFSET)
> +
>  int kvm_arm_vcpu_get_system_counter_state(struct kvm_vcpu *vcpu,
>                                           struct kvm_system_counter_state *state)
>  {
> -       if (state->flags)
> +       if (state->flags & ~KVM_SYSTEM_COUNTER_STATE_VALID_FLAG_BITS)
>                 return -EINVAL;
>
>         state->cntvoff = timer_get_offset(vcpu_vtimer(vcpu));
>
> +       if (state->flags & KVM_SYSTEM_COUNTER_STATE_PHYS_OFFSET)
> +               state->cntpoff = timer_get_offset(vcpu_ptimer(vcpu));
> +
>         return 0;
>  }
>
>  int kvm_arm_vcpu_set_system_counter_state(struct kvm_vcpu *vcpu,
>                                           struct kvm_system_counter_state *state)
>  {
> -       if (state->flags)
> +       if (state->flags & ~KVM_SYSTEM_COUNTER_STATE_VALID_FLAG_BITS)
>                 return -EINVAL;
>
>         timer_set_offset(vcpu_vtimer(vcpu), state->cntvoff);
>
> +       if (state->flags & KVM_SYSTEM_COUNTER_STATE_PHYS_OFFSET)
> +               timer_set_offset(vcpu_ptimer(vcpu), state->cntpoff);
> +
> +       if (has_vhe())
> +               kvm_timer_update_traps_vhe(vcpu);
> +
>         return 0;
>  }
> diff --git a/arch/arm64/kvm/hyp/include/hyp/switch.h b/arch/arm64/kvm/hyp/include/hyp/switch.h
> index e4a2f295a394..12ada31e12e2 100644
> --- a/arch/arm64/kvm/hyp/include/hyp/switch.h
> +++ b/arch/arm64/kvm/hyp/include/hyp/switch.h
> @@ -287,6 +287,30 @@ static inline bool __hyp_handle_fpsimd(struct kvm_vcpu *vcpu)
>         return true;
>  }
>
> +static inline u64 __hyp_read_cntpct(struct kvm_vcpu *vcpu)
> +{
> +       return read_sysreg(cntpct_el0) - __vcpu_sys_reg(vcpu, CNTPOFF_EL2);

Question for those with much more experience on the ARM side: is there
any decent way to infer the counter bit-width to properly emulate
here? This code is problematic when migrating a narrower guest (e.g. a
56-bit counter) to a wider host (say an ARMv8.6 implementation with a
64-bit counter). Otherwise, it would seem that userspace needs to
explicitly request a counter width.

--
Thanks,
Oliver

> +}
> +
> +static inline bool __hyp_handle_counter(struct kvm_vcpu *vcpu)
> +{
> +       u32 sysreg = esr_sys64_to_sysreg(kvm_vcpu_get_esr(vcpu));
> +       int rt = kvm_vcpu_sys_get_rt(vcpu);
> +       u64 rv;
> +
> +       switch (sysreg) {
> +       case SYS_CNTPCT_EL0:
> +               rv = __hyp_read_cntpct(vcpu);
> +               break;
> +       default:
> +               return false;
> +       }
> +
> +       vcpu_set_reg(vcpu, rt, rv);
> +       __kvm_skip_instr(vcpu);
> +       return true;
> +}
> +
>  static inline bool handle_tx2_tvm(struct kvm_vcpu *vcpu)
>  {
>         u32 sysreg = esr_sys64_to_sysreg(kvm_vcpu_get_esr(vcpu));
> @@ -439,6 +463,13 @@ static inline bool fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>         if (*exit_code != ARM_EXCEPTION_TRAP)
>                 goto exit;
>
> +       /*
> +        * We trap accesses to the physical counter value register (CNTPCT_EL0)
> +        * if userspace has requested a physical counter offset.
> +        */
> +       if (__hyp_handle_counter(vcpu))
> +               goto guest;
> +
>         if (cpus_have_final_cap(ARM64_WORKAROUND_CAVIUM_TX2_219_TVM) &&
>             kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_SYS64 &&
>             handle_tx2_tvm(vcpu))
> diff --git a/arch/arm64/kvm/hyp/nvhe/timer-sr.c b/arch/arm64/kvm/hyp/nvhe/timer-sr.c
> index 9072e71693ba..1b8e6e47a4ea 100644
> --- a/arch/arm64/kvm/hyp/nvhe/timer-sr.c
> +++ b/arch/arm64/kvm/hyp/nvhe/timer-sr.c
> @@ -35,14 +35,24 @@ void __timer_disable_traps(struct kvm_vcpu *vcpu)
>   */
>  void __timer_enable_traps(struct kvm_vcpu *vcpu)
>  {
> -       u64 val;
> +       u64 val, cntpoff;
> +
> +       cntpoff = __vcpu_sys_reg(vcpu, CNTPOFF_EL2);
>
>         /*
>          * Disallow physical timer access for the guest
> -        * Physical counter access is allowed
>          */
>         val = read_sysreg(cnthctl_el2);
>         val &= ~CNTHCTL_EL1PCEN;
> -       val |= CNTHCTL_EL1PCTEN;
> +
> +       /*
> +        * Disallow physical counter access for the guest if offsetting is
> +        * requested.
> +        */
> +       if (cntpoff)
> +               val &= ~CNTHCTL_EL1PCTEN;
> +       else
> +               val |= CNTHCTL_EL1PCTEN;
> +
>         write_sysreg(val, cnthctl_el2);
>  }
> --
> 2.32.0.rc1.229.g3e70b5a671-goog
>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 02/10] KVM: arm64: Implement initial support for KVM_CAP_SYSTEM_COUNTER_STATE
  2021-06-08 21:47   ` Oliver Upton
@ 2021-06-09 10:23     ` Marc Zyngier
  -1 siblings, 0 replies; 46+ messages in thread
From: Marc Zyngier @ 2021-06-09 10:23 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvm, kvmarm, Paolo Bonzini, Sean Christopherson, Peter Shier,
	Jim Mattson, David Matlack, Ricardo Koller, Jing Zhang,
	Raghavendra Rao Anata, Alexandru Elisei, James Morse,
	Suzuki K Poulose

Hi Oliver,

Please Cc the KVM/arm64 reviewers (now added). Also, please consider
subscribing to the kvmarm mailing list so that I don't have to
manually approve your posts ;-).

On Tue, 08 Jun 2021 22:47:34 +0100,
Oliver Upton <oupton@google.com> wrote:
> 
> ARMv8 provides for a virtual counter-timer offset that is added to guest
> views of the virtual counter-timer (CNTVOFF_EL2). To date, KVM has not
> provided userspace with any perception of this, and instead affords a
> value-based scheme of migrating the virtual counter-timer by directly
> reading/writing the guest's CNTVCT_EL0. This is problematic because
> counters continue to elapse while the register is being written, meaning
> it is possible for drift to sneak into the guest's time scale. This is
> exacerbated by the fact that KVM will calculate an appropriate
> CNTVOFF_EL2 every time the register is written, which will be broadcast
> to all virtual CPUs. The only possible way to avoid causing guest time
> to drift is to restore counter-timers by offset.

Well, the current method has one huge advantage: time can never go
backward from the guest PoV if you restore what you have saved. Yes,
time can elapse, but you don't even need to migrate to observe that.

>
> Implement initial support for KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls
> to migrate the value of CNTVOFF_EL2. These ioctls yield precise control
> of the virtual counter-timers to userspace, allowing it to define its
> own heuristics for managing vCPU offsets.

I'm not really in favour of inventing a completely new API, for
multiple reasons:

- CNTVOFF is an EL2 concept. I'd rather not expose it as such as it
  becomes really confusing with NV (which does expose its own CNTVOFF
  via the ONE_REG interface)

- You seem to allow each vcpu to get its own offset. I don't think
  that's right. The architecture defines that all PEs have the same
  view of the counters, and an EL1 guest should be given that
  illusion.

- by having a parallel save/restore interface, you make it harder to
  reason about what happens with concurrent calls to both interfaces

- the userspace API is already horribly bloated, and I'm not overly
  keen on adding more if we can avoid it.

I'd rather you extend the current ONE_REG interface and make it modal,
either allowing the restore of an absolute value or an offset for
CNTVCT_EL0. This would also keep a consistent behaviour when restoring
vcpus. The same logic would apply to the physical offset.

As for how to make it modal, we have plenty of bits left in the
ONE_REG encoding. Pick one, and make that a "relative" attribute. This
will result in some minor surgery in the get/set code paths, but at
least no entirely new mechanism.

One question though: how do you plan to reliably compute the offset?
As far as I can see, it is subject to the same issues you described
above (while the guest is being restored, time flies), and you have
the added risk of exposing a counter going backward from a guest
perspective.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 46+ messages in thread
* Re: [PATCH 00/10] KVM: Add idempotent controls for migrating system counter state
  2021-06-08 21:47 ` Oliver Upton
@ 2021-06-09 13:05   ` Paolo Bonzini
  -1 siblings, 0 replies; 46+ messages in thread
From: Paolo Bonzini @ 2021-06-09 13:05 UTC (permalink / raw)
  To: Oliver Upton, kvm, kvmarm, Maxim Levitsky
  Cc: Sean Christopherson, Marc Zyngier, Peter Shier, Jim Mattson,
	David Matlack, Ricardo Koller, Jing Zhang, Raghavendra Rao Anata

On 08/06/21 23:47, Oliver Upton wrote:
> KVM's current means of saving/restoring system counters is plagued with
> temporal issues. At least on ARM64 and x86, we migrate the guest's
> system counter by-value through the respective guest system register
> values (cntvct_el0, ia32_tsc). Restoring system counters by-value is
> brittle as the state is not idempotent: the host system counter is still
> oscillating between the attempted save and restore. Furthermore, VMMs
> may wish to transparently live migrate guest VMs, meaning that they
> include the elapsed time due to live migration blackout in the guest
> system counter view. The VMM thread could be preempted for any number of
> reasons (scheduler, L0 hypervisor under nested) between the time that
> it calculates the desired guest counter value and when KVM actually sets
> this counter state.
> 
> Despite the value-based interface that we present to userspace, KVM
> actually has idempotent guest controls by way of system counter offsets.
> We can avoid all of the issues associated with a value-based interface
> by abstracting these offset controls in new ioctls. This series
> introduces KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls, meant to provide
> userspace with idempotent controls of the guest system counter.

Hi Oliver,

I wonder how this compares to the idea of initializing the TSC via a 
synchronized (nanoseconds, TSC) pair. 
(https://lore.kernel.org/r/20201130133559.233242-2-mlevitsk@redhat.com), 
and whether it makes sense to apply that idea to ARM as well.  If so, it 
certainly is a good idea to use the same capability and ioctl, even 
though the details of the struct would be architecture-dependent.

In your patches there isn't much architecture dependency in struct 
kvm_system_counter_state.  However, Maxim's patch also added an 
MSR_IA32_TSC_ADJUST value to the struct, thus ensuring that the host 
could write not just an arbitrary TSC value, but also tie it to an 
arbitrary MSR_IA32_TSC_ADJUST value.  Specifying both in the same ioctl 
simplifies the userspace API.

Paolo

> Patch 1 defines the ioctls, and was separated from the two provided
> implementations for the sake of review. If it is more intuitive, this
> patch can be squashed into the implementation commit.
> 
> Patch 2 realizes initial support for ARM64, migrating only the state
> associated with the guest's virtual counter-timer. Patch 3 introduces a
> KVM selftest to assert that userspace manipulation via the
> aforementioned ioctls produces the expected system counter values within
> the guest.
> 
> Patch 4 extends upon the ARM64 implementation by adding support for
> physical counter-timer offsetting. This is currently backed by a
> trap-and-emulate implementation, but can also be virtualized in hardware
> that fully implements ARMv8.6-ECV. ECV support has been elided from this
> series out of convenience for the author :) Patch 5 adds some test cases
> to the newly-minted kvm selftest to validate expectations of physical
> counter-timer emulation.
> 
> Patch 6 introduces yet another KVM selftest for aarch64, intended to
> measure the effects of physical counter-timer emulation. Data for this
> test can be found below, but basically there is some tradeoff of
> overhead for the sake of correctness, but it isn't too bad.
> 
> Patches 7-8 add support for the ioctls to x86 by shoehorning the
> controls into the pre-existing synchronization heuristics. Patch 7
> provides necessary helper methods for the implementation to play nice
> with those heuristics, and patch 8 actually implements the ioctls.
> 
> Patch 9 adds x86 test cases to the system counter KVM selftest. Lastly,
> patch 10 documents the ioctls for both x86 and arm64.
> 
> All patches apply cleanly to kvm/next at the following commit:
> 
> a4345a7cecfb ("Merge tag 'kvmarm-fixes-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD")
> 
> Physical counter benchmark
> --------------------------
> 
> The following data was collected by running 10000 iterations of the
> benchmark test from Patch 6 on an Ampere Mt. Jade reference server, a 2S
> machine with 2 80-core Ampere Altra SoCs. Measurements were collected
> for both VHE and nVHE operation using the `kvm-arm.mode=` command-line
> parameter.
> 
> nVHE
> ----
> 
> +--------------------+--------+---------+
> |       Metric       | Native | Trapped |
> +--------------------+--------+---------+
> | Average            | 54ns   | 148ns   |
> | Standard Deviation | 124ns  | 122ns   |
> | 95th Percentile    | 258ns  | 348ns   |
> +--------------------+--------+---------+
> 
> VHE
> ---
> 
> +--------------------+--------+---------+
> |       Metric       | Native | Trapped |
> +--------------------+--------+---------+
> | Average            | 53ns   | 152ns   |
> | Standard Deviation | 92ns   | 94ns    |
> | 95th Percentile    | 204ns  | 307ns   |
> +--------------------+--------+---------+
> 
> Oliver Upton (10):
>    KVM: Introduce KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls
>    KVM: arm64: Implement initial support for KVM_CAP_SYSTEM_COUNTER_STATE
>    selftests: KVM: Introduce system_counter_state_test
>    KVM: arm64: Add userspace control of the guest's physical counter
>    selftests: KVM: Add test cases for physical counter offsetting
>    selftests: KVM: Add counter emulation benchmark
>    KVM: x86: Refactor tsc synchronization code
>    KVM: x86: Implement KVM_CAP_SYSTEM_COUNTER_STATE
>    selftests: KVM: Add support for x86 to system_counter_state_test
>    Documentation: KVM: Document KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls
> 
>   Documentation/virt/kvm/api.rst                |  98 +++++++
>   Documentation/virt/kvm/locking.rst            |  11 +
>   arch/arm64/include/asm/kvm_host.h             |   6 +
>   arch/arm64/include/asm/sysreg.h               |   1 +
>   arch/arm64/include/uapi/asm/kvm.h             |  17 ++
>   arch/arm64/kvm/arch_timer.c                   |  84 +++++-
>   arch/arm64/kvm/arm.c                          |  25 ++
>   arch/arm64/kvm/hyp/include/hyp/switch.h       |  31 +++
>   arch/arm64/kvm/hyp/nvhe/timer-sr.c            |  16 +-
>   arch/x86/include/asm/kvm_host.h               |   1 +
>   arch/x86/include/uapi/asm/kvm.h               |   8 +
>   arch/x86/kvm/x86.c                            | 176 +++++++++---
>   include/uapi/linux/kvm.h                      |   5 +
>   tools/testing/selftests/kvm/.gitignore        |   2 +
>   tools/testing/selftests/kvm/Makefile          |   3 +
>   .../kvm/aarch64/counter_emulation_benchmark.c | 209 ++++++++++++++
>   .../selftests/kvm/include/aarch64/processor.h |  24 ++
>   .../selftests/kvm/system_counter_state_test.c | 256 ++++++++++++++++++
>   18 files changed, 926 insertions(+), 47 deletions(-)
>   create mode 100644 tools/testing/selftests/kvm/aarch64/counter_emulation_benchmark.c
>   create mode 100644 tools/testing/selftests/kvm/system_counter_state_test.c
> 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 02/10] KVM: arm64: Implement initial support for KVM_CAP_SYSTEM_COUNTER_STATE
  2021-06-09 10:23     ` Marc Zyngier
@ 2021-06-09 14:51       ` Oliver Upton
  -1 siblings, 0 replies; 46+ messages in thread
From: Oliver Upton @ 2021-06-09 14:51 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm list, kvmarm, Paolo Bonzini, Sean Christopherson,
	Peter Shier, Jim Mattson, David Matlack, Ricardo Koller,
	Jing Zhang, Raghavendra Rao Anata, Alexandru Elisei, James Morse,
	Suzuki K Poulose

On Wed, Jun 9, 2021 at 5:23 AM Marc Zyngier <maz@kernel.org> wrote:
>
> Hi Oliver,
>
> Please Cc the KVM/arm64 reviewers (now added). Also, please consider
> subscribing to the kvmarm mailing list so that I don't have to
> manually approve your posts ;-).

/facepalm

Thought I had done this already. Re-requested to join kvmarm@. Seems
that gmail politely decided the mailing list was spam, so no
confirmation email came through.

> On Tue, 08 Jun 2021 22:47:34 +0100,
> Oliver Upton <oupton@google.com> wrote:
> >
> > ARMv8 provides for a virtual counter-timer offset that is added to guest
> > views of the virtual counter-timer (CNTVOFF_EL2). To date, KVM has not
> > provided userspace with any perception of this, and instead affords a
> > value-based scheme of migrating the virtual counter-timer by directly
> > reading/writing the guest's CNTVCT_EL0. This is problematic because
> > counters continue to elapse while the register is being written, meaning
> > it is possible for drift to sneak into the guest's time scale. This is
> > exacerbated by the fact that KVM will calculate an appropriate
> > CNTVOFF_EL2 every time the register is written, which will be broadcast
> > to all virtual CPUs. The only possible way to avoid causing guest time
> > to drift is to restore counter-timers by offset.
>
> Well, the current method has one huge advantage: time can never go
> backward from the guest PoV if you restore what you have saved. Yes,
> time can elapse, but you don't even need to migrate to observe that.
>
> >
> > Implement initial support for KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls
> > to migrate the value of CNTVOFF_EL2. These ioctls yield precise control
> > of the virtual counter-timers to userspace, allowing it to define its
> > own heuristics for managing vCPU offsets.
>
> I'm not really in favour of inventing a completely new API, for
> multiple reasons:
>
> - CNTVOFF is an EL2 concept. I'd rather not expose it as such as it
>   becomes really confusing with NV (which does expose its own CNTVOFF
>   via the ONE_REG interface)

Very true. At least on x86, there's a fair bit of plumbing to handle
the KVM-owned L0 offset reg and the guest-owned L1 offset reg.

> - You seem to allow each vcpu to get its own offset. I don't think
>   that's right. The architecture defines that all PEs have the same
>   view of the counters, and an EL1 guest should be given that
>   illusion.

Agreed. I would have preferred a VM-wide ioctl to do this, but since
x86 explicitly allows for drifted TSCs that can't be the case in a
generic ioctl. I can do the same broadcasting as we do in the case of
a VMM write to CNTVCT_EL0.

> - by having a parallel save/restore interface, you make it harder to
>   reason about what happens with concurrent calls to both interfaces
>
> - the userspace API is already horribly bloated, and I'm not overly
>   keen on adding more if we can avoid it.

Pssh. My ioctl numbers aren't _too_ close to the limit ;-)

>
> I'd rather you extend the current ONE_REG interface and make it modal,
> either allowing the restore of an absolute value or an offset for
> CNTVCT_EL0. This would also keep a consistent behaviour when restoring
> vcpus. The same logic would apply to the physical offset.
>
> As for how to make it modal, we have plenty of bits left in the
> ONE_REG encoding. Pick one, and make that a "relative" attribute. This
> will result in some minor surgery in the get/set code paths, but at
> least no entirely new mechanism.

Yeah, it'd be good to do it w/o adding new plumbing. The only reason
I'd considered it is because x86 might necessitate it. Not wanting to
apply bad convention to other arches, but keeping at least a somewhat
consistent UAPI would be nice.

> One question though: how do you plan to reliably compute the offset?
> As far as I can see, it is subject to the same issues you described
> above (while the guest is being restored, time flies), and you have
> the added risk of exposing a counter going backward from a guest
> perspective.

Indeed, we do have the risk of time going backwards, but I'd say that
the VMM shares in the responsibility to provide a consistent view of
the counter too.

Here's how I envisioned it working:

Record the time, cycles, and offset (T0, C0, Off0) when saving the
counter state. Record time and cycles (T1, C1) again when trying to
restore counter state. Compute the new offset:

Off1 = Off0 - (T1-T0) * CNTFRQ - (C0 - C1).

The primary concern here is idempotence. Once Off1 is calculated, it
doesn't matter how much time elapses between the calculation and the
call into KVM, it will always produce the intended result. If instead
we restore the counters by-value (whilst trying to account for elapsed
time), my impression is that we'd do the following: Record time and
guest counter (T0, G0) when saving counter state. Record time again
when trying to restore counter state.

In userspace, compute the time elapsed and fold it into the guest counter (G1):

G1 = G0 + (T1-T0) * CNTFRQ

And then in the kernel:

CNTVOFF = G1 - CNTPCT

Any number of things can happen in between the kernel and userspace
portions of this operation, causing some drift of the VM's counter.
Fundamentally I suppose the issue we have is that we sample the host
counter twice (T1, G1), when really we'd want to only do so once.

So, I'm open to any suggestions that avoid causing the guest counter
to drift; offsets just seemed the easiest approach, given that they
ought to be constant for the lifetime of a VM on a host and are the
backing state used by hardware.

--
Thanks,
Oliver

>
> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 02/10] KVM: arm64: Implement initial support for KVM_CAP_SYSTEM_COUNTER_STATE
@ 2021-06-09 14:51       ` Oliver Upton
  0 siblings, 0 replies; 46+ messages in thread
From: Oliver Upton @ 2021-06-09 14:51 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: kvm list, Sean Christopherson, Peter Shier,
	Raghavendra Rao Anata, David Matlack, Paolo Bonzini, kvmarm,
	Jim Mattson

On Wed, Jun 9, 2021 at 5:23 AM Marc Zyngier <maz@kernel.org> wrote:
>
> Hi Oliver,
>
> Please Cc the KVM/arm64 reviewers (now added). Also, please consider
> subscribing to the kvmarm mailing list so that I don't have to
> manually approve your posts ;-).

/facepalm

Thought I had done this already. Re-requested to join kvmarm@. Seems
that gmail politely decided the mailing list was spam, so no
confirmation email came through.

> On Tue, 08 Jun 2021 22:47:34 +0100,
> Oliver Upton <oupton@google.com> wrote:
> >
> > ARMv8 provides for a virtual counter-timer offset that is added to guest
> > views of the virtual counter-timer (CNTVOFF_EL2). To date, KVM has not
> > provided userspace with any perception of this, and instead affords a
> > value-based scheme of migrating the virtual counter-timer by directly
> > reading/writing the guest's CNTVCT_EL0. This is problematic because
> > counters continue to elapse while the register is being written, meaning
> > it is possible for drift to sneak in to the guest's time scale. This is
> > exacerbated by the fact that KVM will calculate an appropriate
> > CNTVOFF_EL2 every time the register is written, which will be broadcast
> > to all virtual CPUs. The only possible way to avoid causing guest time
> > to drift is to restore counter-timers by offset.
>
> Well, the current method has one huge advantage: time can never go
> backward from the guest PoV if you restore what you have saved. Yes,
> time can elapse, but you don't even need to migrate to observe that.
>
> >
> > Implement initial support for KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls
> > to migrate the value of CNTVOFF_EL2. These ioctls yield precise control
> > of the virtual counter-timers to userspace, allowing it to define its
> > own heuristics for managing vCPU offsets.
>
> I'm not really in favour of inventing a completely new API, for
> multiple reasons:
>
> - CNTVOFF is an EL2 concept. I'd rather not expose it as such as it
>   becomes really confusing with NV (which does expose its own CNTVOFF
>   via the ONE_REG interface)

Very true. At least on x86, there's a fair bit of plumbing to handle
the KVM-owned L0 offset reg and the guest-owned L1 offset reg.

> - You seem to allow each vcpu to get its own offset. I don't think
>   that's right. The architecture defines that all PEs have the same
>   view of the counters, and an EL1 guest should be given that
>   illusion.

Agreed. I would have preferred a VM-wide ioctl to do this, but since
x86 explicitly allows for drifted TSCs that can't be the case in a
generic ioctl. I can do the same broadcasting as we do in the case of
a VMM write to CNTVCT_EL0.

> - by having a parallel save/restore interface, you make it harder to
>   reason about what happens with concurrent calls to both interfaces
>
> - the userspace API is already horribly bloated, and I'm not overly
>   keen on adding more if we can avoid it.

Pssh. My ioctl numbers aren't _too_ close to the limit ;-)

>
> I'd rather you extend the current ONE_REG interface and make it modal,
> either allowing the restore of an absolute value or an offset for
> CNTVCT_EL0. This would also keep a consistent behaviour when restoring
> vcpus. The same logic would apply to the physical offset.
>
> As for how to make it modal, we have plenty of bits left in the
> ONE_REG encoding. Pick one, and make that a "relative" attribute. This
> will result in some minor surgery in the get/set code paths, but at
> least no entirely new mechanism.

Yeah, it'd be good to do it w/o adding new plumbing. The only reason
I'd considered it is because x86 might necessitate it. Not wanting to
apply bad convention to other arches, but keeping at least a somewhat
consistent UAPI would be nice.

> One question though: how do you plan to reliably compute the offset?
> As far as I can see, it is subject to the same issues you described
> above (while the guest is being restored, time flies), and you have
> the added risk of exposing a counter going backward from a guest
> perspective.

Indeed, we do have the risk of time going backwards, but I'd say that
the VMM shares in the responsibility to provide a consistent view of
the counter too.

Here's how I envisioned it working:

Record the time, cycles, and offset (T0, C0, Off0) when saving the
counter state. Record time and cycles (T1, C1) again when trying to
restore counter state. Compute the new offset:

Off1 = Off0 - (T1-T0) * CNTFRQ - (C0 - C1).
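The offset computation above can be sketched as follows (names are
illustrative, not actual KVM UAPI; this assumes T is in nanoseconds and
CNTFRQ in Hz):

```c
#include <stdint.h>

/*
 * Sketch of the offset-based restore described above. off0, c0, c1
 * are in counter cycles; t0_ns/t1_ns are wall-clock nanoseconds;
 * cntfrq is in Hz. Illustrative names only, not KVM UAPI.
 */
static uint64_t restored_offset(uint64_t off0, uint64_t t0_ns, uint64_t t1_ns,
                                uint64_t c0, uint64_t c1, uint64_t cntfrq)
{
	/* Cycles of wall-clock time elapsed between save and restore */
	uint64_t elapsed = (t1_ns - t0_ns) * cntfrq / 1000000000ULL;

	/*
	 * Off1 = Off0 - (T1 - T0) * CNTFRQ - (C0 - C1); mod-2^64
	 * unsigned arithmetic keeps this correct even though c1 > c0.
	 */
	return off0 - elapsed - (c0 - c1);
}
```

Since every input is captured before the computation, the result is the
same no matter when KVM finally consumes it.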

The primary concern here is idempotence. Once Off1 is calculated, it
doesn't matter how much time elapses between the calculation and the
call into KVM, it will always produce the intended result. If instead
we restore the counters by-value (whilst trying to account for elapsed
time), my impression is that we'd do the following: Record time and
guest counter (T0, G0) when saving counter state. Record time again
when trying to restore counter state.

In userspace, compute the time elapsed and fold it into the guest counter (G1):

G1 = G0 + (T1-T0) * CNTFRQ

And then in the kernel:

CNTVOFF = CNTPCT - G1

Any number of things can happen between the userspace and kernel
portions of this operation, causing some drift of the VM's counter.
Fundamentally, I suppose the issue we have is that we sample the host
counter twice (once for T1 in userspace, again for CNTPCT in the
kernel), when really we'd want to do so only once.
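The value-based flow can be sketched the same way; note how the host
counter is sampled twice, once implicitly via T1 in userspace and again
via CNTPCT in the kernel (illustrative names only, not KVM UAPI):

```c
#include <stdint.h>

/*
 * Sketch of the value-based restore described above. Any delay between
 * the userspace timestamp (t1_ns) and the kernel's counter read
 * (cntpct_at_write) shows up as drift in the guest's counter.
 * Illustrative names only, not KVM UAPI.
 */
static uint64_t value_based_offset(uint64_t g0, uint64_t t0_ns, uint64_t t1_ns,
                                   uint64_t cntfrq, uint64_t cntpct_at_write)
{
	/* userspace: G1 = G0 + (T1 - T0) * CNTFRQ */
	uint64_t g1 = g0 + (t1_ns - t0_ns) * cntfrq / 1000000000ULL;

	/* kernel, possibly preempted in between: CNTVOFF = CNTPCT - G1 */
	return cntpct_at_write - g1;
}
```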

So, I'm open to any suggestions that avoid causing the guest counter
to drift; offsets just seemed the easiest choice, given that they
ought to be constant for the lifetime of a VM on a host and are the
backing state used by the hardware.

--
Thanks,
Oliver

>
> Thanks,
>
>         M.
>
> --
> Without deviation from the norm, progress is not possible.
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


* Re: [PATCH 00/10] KVM: Add idempotent controls for migrating system counter state
  2021-06-09 13:05   ` Paolo Bonzini
@ 2021-06-09 15:11     ` Oliver Upton
  -1 siblings, 0 replies; 46+ messages in thread
From: Oliver Upton @ 2021-06-09 15:11 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm list, kvmarm, Maxim Levitsky, Sean Christopherson,
	Marc Zyngier, Peter Shier, Jim Mattson, David Matlack,
	Ricardo Koller, Jing Zhang, Raghavendra Rao Anata

On Wed, Jun 9, 2021 at 8:06 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 08/06/21 23:47, Oliver Upton wrote:
> > KVM's current means of saving/restoring system counters is plagued with
> > temporal issues. At least on ARM64 and x86, we migrate the guest's
> > system counter by-value through the respective guest system register
> > values (cntvct_el0, ia32_tsc). Restoring system counters by-value is
> > brittle as the state is not idempotent: the host system counter is still
> > oscillating between the attempted save and restore. Furthermore, VMMs
> > may wish to transparently live migrate guest VMs, meaning that they
> > include the elapsed time due to live migration blackout in the guest
> > system counter view. The VMM thread could be preempted for any number of
> > reasons (scheduler, L0 hypervisor under nested) between the time that
> > it calculates the desired guest counter value and when KVM actually sets
> > this counter state.
> >
> > Despite the value-based interface that we present to userspace, KVM
> > actually has idempotent guest controls by way of system counter offsets.
> > We can avoid all of the issues associated with a value-based interface
> > by abstracting these offset controls in new ioctls. This series
> > introduces KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls, meant to provide
> > userspace with idempotent controls of the guest system counter.
>
> Hi Oliver,
>
> I wonder how this compares to the idea of initializing the TSC via a
> synchronized (nanoseconds, TSC) pair.
> (https://lore.kernel.org/r/20201130133559.233242-2-mlevitsk@redhat.com),
> and whether it makes sense to apply that idea to ARM as well.  If so, it
> certainly is a good idea to use the same capability and ioctl, even
> though the details of the struct would be architecture-dependent.

Hey Paolo,

Yeah, great question, I had actually alluded to this on [02/10] in
talking to Marc about this.

Really, the issue we want to avoid is sampling the host counter twice,
which is unavoidable with the existing means of counter migration, as
the VMM must account for elapsed time itself. Maxim's patches appear
to address that very issue as well.

Perhaps this will clarify the motivation for my approach: what if the
kernel wasn't the authoritative source for wall time in a system?
Furthermore, VMMs may wish to define their own heuristics for counter
migration (e.g. we only allow the counter to 'jump' by X seconds
during migration blackout). If a VMM tried to assert its whims on the
TSC state before handing it down to the kernel, we would inadvertently
be sampling the host counter twice again. And, anything can happen
between the time we assert elapsed time is within SLO and KVM
computing the TSC offset (scheduling, L0 hypervisor preemption).

So, Maxim's changes would address my concerns in the general case, but
maybe not as much in edge cases where an operator may make decisions
about how much time can elapse while the guest hasn't had CPU time.

--
Thanks,
Oliver

> In your patches there isn't much architecture dependency in struct
> kvm_system_counter_state.  However,  Maxim's also added an
> MSR_IA32_TSC_ADJUST value to the struct, thus ensuring that the host
> could write not just an arbitrary TSC value, but also tie it to an
> arbitrary MSR_IA32_TSC_ADJUST value.  Specifying both in the same ioctl
> simplifies the userspace API.
>
> Paolo
>
> > Patch 1 defines the ioctls, and was separated from the two provided
> > implementations for the sake of review. If it is more intuitive, this
> > patch can be squashed into the implementation commit.
> >
> > Patch 2 realizes initial support for ARM64, migrating only the state
> > associated with the guest's virtual counter-timer. Patch 3 introduces a
> > KVM selftest to assert that userspace manipulation via the
> > aforementioned ioctls produces the expected system counter values within
> > the guest.
> >
> > Patch 4 extends upon the ARM64 implementation by adding support for
> > physical counter-timer offsetting. This is currently backed by a
> > trap-and-emulate implementation, but can also be virtualized in hardware
> > that fully implements ARMv8.6-ECV. ECV support has been elided from this
> > series out of convenience for the author :) Patch 5 adds some test cases
> > to the newly-minted kvm selftest to validate expectations of physical
> > counter-timer emulation.
> >
> > Patch 6 introduces yet another KVM selftest for aarch64, intended to
> > measure the effects of physical counter-timer emulation. Data for this
> > test can be found below, but basically there is some tradeoff of
> > overhead for the sake of correctness, but it isn't too bad.
> >
> > Patches 7-8 add support for the ioctls to x86 by shoehorning the
> > controls into the pre-existing synchronization heuristics. Patch 7
> > provides necessary helper methods for the implementation to play nice
> > with those heuristics, and patch 8 actually implements the ioctls.
> >
> > Patch 9 adds x86 test cases to the system counter KVM selftest. Lastly,
> > patch 10 documents the ioctls for both x86 and arm64.
> >
> > All patches apply cleanly to kvm/next at the following commit:
> >
> > a4345a7cecfb ("Merge tag 'kvmarm-fixes-5.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD")
> >
> > Physical counter benchmark
> > --------------------------
> >
> > The following data was collected by running 10000 iterations of the
> > benchmark test from Patch 6 on an Ampere Mt. Jade reference server, a 2S
> > machine with 2 80-core Ampere Altra SoCs. Measurements were collected
> > for both VHE and nVHE operation using the `kvm-arm.mode=` command-line
> > parameter.
> >
> > nVHE
> > ----
> >
> > +--------------------+--------+---------+
> > |       Metric       | Native | Trapped |
> > +--------------------+--------+---------+
> > | Average            | 54ns   | 148ns   |
> > | Standard Deviation | 124ns  | 122ns   |
> > | 95th Percentile    | 258ns  | 348ns   |
> > +--------------------+--------+---------+
> >
> > VHE
> > ---
> >
> > +--------------------+--------+---------+
> > |       Metric       | Native | Trapped |
> > +--------------------+--------+---------+
> > | Average            | 53ns   | 152ns   |
> > | Standard Deviation | 92ns   | 94ns    |
> > | 95th Percentile    | 204ns  | 307ns   |
> > +--------------------+--------+---------+
> >
> > Oliver Upton (10):
> >    KVM: Introduce KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls
> >    KVM: arm64: Implement initial support for KVM_CAP_SYSTEM_COUNTER_STATE
> >    selftests: KVM: Introduce system_counter_state_test
> >    KVM: arm64: Add userspace control of the guest's physical counter
> >    selftests: KVM: Add test cases for physical counter offsetting
> >    selftests: KVM: Add counter emulation benchmark
> >    KVM: x86: Refactor tsc synchronization code
> >    KVM: x86: Implement KVM_CAP_SYSTEM_COUNTER_STATE
> >    selftests: KVM: Add support for x86 to system_counter_state_test
> >    Documentation: KVM: Document KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls
> >
> >   Documentation/virt/kvm/api.rst                |  98 +++++++
> >   Documentation/virt/kvm/locking.rst            |  11 +
> >   arch/arm64/include/asm/kvm_host.h             |   6 +
> >   arch/arm64/include/asm/sysreg.h               |   1 +
> >   arch/arm64/include/uapi/asm/kvm.h             |  17 ++
> >   arch/arm64/kvm/arch_timer.c                   |  84 +++++-
> >   arch/arm64/kvm/arm.c                          |  25 ++
> >   arch/arm64/kvm/hyp/include/hyp/switch.h       |  31 +++
> >   arch/arm64/kvm/hyp/nvhe/timer-sr.c            |  16 +-
> >   arch/x86/include/asm/kvm_host.h               |   1 +
> >   arch/x86/include/uapi/asm/kvm.h               |   8 +
> >   arch/x86/kvm/x86.c                            | 176 +++++++++---
> >   include/uapi/linux/kvm.h                      |   5 +
> >   tools/testing/selftests/kvm/.gitignore        |   2 +
> >   tools/testing/selftests/kvm/Makefile          |   3 +
> >   .../kvm/aarch64/counter_emulation_benchmark.c | 209 ++++++++++++++
> >   .../selftests/kvm/include/aarch64/processor.h |  24 ++
> >   .../selftests/kvm/system_counter_state_test.c | 256 ++++++++++++++++++
> >   18 files changed, 926 insertions(+), 47 deletions(-)
> >   create mode 100644 tools/testing/selftests/kvm/aarch64/counter_emulation_benchmark.c
> >   create mode 100644 tools/testing/selftests/kvm/system_counter_state_test.c
> >
>



* Re: [PATCH 00/10] KVM: Add idempotent controls for migrating system counter state
  2021-06-09 15:11     ` Oliver Upton
@ 2021-06-09 17:05       ` Paolo Bonzini
  -1 siblings, 0 replies; 46+ messages in thread
From: Paolo Bonzini @ 2021-06-09 17:05 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvm list, kvmarm, Maxim Levitsky, Sean Christopherson,
	Marc Zyngier, Peter Shier, Jim Mattson, David Matlack,
	Ricardo Koller, Jing Zhang, Raghavendra Rao Anata

On 09/06/21 17:11, Oliver Upton wrote:
> Perhaps this will clarify the motivation for my approach: what if the
> kernel wasn't the authoritative source for wall time in a system?
> Furthermore, VMMs may wish to define their own heuristics for counter
> migration (e.g. we only allow the counter to 'jump' by X seconds
> during migration blackout). If a VMM tried to assert its whims on the
> TSC state before handing it down to the kernel, we would inadvertently
> be sampling the host counter twice again. And, anything can happen
> between the time we assert elapsed time is within SLO and KVM
> computing the TSC offset (scheduling, L0 hypervisor preemption).
> 
> So, Maxim's changes would address my concerns in the general case, but
> maybe not as much in edge cases where an operator may make decisions
> about how much time can elapse while the guest hasn't had CPU time.

I think I understand.  We still need a way to get a consistent
(host_TSC, nanosecond) pair on the source; the TSC offset is not
enough. This is arguably not a KVM issue, but we're still the ones
having to provide a solution, so we would need a slightly more
complicated interface.

1) In the kernel:

* KVM_GET_CLOCK should also return kvmclock_ns - realtime_ns and 
host_TSC.  It should set two flags in struct kvm_clock_data saying that 
the respective fields are valid.

* KVM_SET_CLOCK checks the flag for kvmclock_ns - realtime_ns.  If set, 
it looks at the kvmclock_ns - realtime_ns field and disregards the 
kvmclock_ns field.

2) On the source, userspace will:

* per-VM: invoke KVM_GET_CLOCK.  Migrate kvmclock_ns - realtime_ns and 
kvmclock_ns.  Stash host_TSC for subsequent use.

* per-vCPU: retrieve guest_TSC - host_TSC with your new ioctl.  Sum it 
to the stashed host_TSC value; migrate the resulting value (a guest TSC).

3) On the destination:

* per-VM: Pass the migrated kvmclock_ns - realtime_ns to KVM_SET_CLOCK. 
  Use KVM_GET_CLOCK to get a consistent pair of kvmclock_ns ("newNS" 
below) and host TSC ("newHostTSC").  Stash them for subsequent use, 
together with the migrated kvmclock_ns value ("sourceNS") that you 
haven't used yet.

* per-vCPU: using the data of the previous step, and the sourceGuestTSC 
in the migration stream, compute sourceGuestTSC + (newNS - sourceNS) * 
freq - newHostTSC.  This is the TSC offset to be passed to your new ioctl.
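The destination's per-vCPU step could look roughly like this (a sketch
under the assumption that the guest TSC frequency is expressed in kHz;
names are illustrative, not actual KVM UAPI):

```c
#include <stdint.h>

/*
 * Destination-side per-vCPU computation from the flow above.
 * source_ns/new_ns are kvmclock nanoseconds; TSC values are cycles;
 * tsc_khz is the guest's TSC frequency in kHz. Illustrative only.
 */
static uint64_t dest_tsc_offset(uint64_t source_guest_tsc,
                                uint64_t source_ns, uint64_t new_ns,
                                uint64_t new_host_tsc, uint64_t tsc_khz)
{
	/* Guest TSC cycles accrued during the migration blackout */
	uint64_t elapsed = (new_ns - source_ns) * tsc_khz / 1000000ULL;

	/* sourceGuestTSC + (newNS - sourceNS) * freq - newHostTSC */
	return source_guest_tsc + elapsed - new_host_tsc;
}
```

Because newNS and newHostTSC come from a single consistent
KVM_GET_CLOCK call, only one host sample is ever involved.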

Your approach still needs the "quirky" handling of host-initiated
MSR_IA32_TSC_ADJUST writes, which write the MSR without affecting the
VMCS offset.  This is just a documentation issue.

Does this make sense?

Paolo




* Re: [PATCH 00/10] KVM: Add idempotent controls for migrating system counter state
  2021-06-09 17:05       ` Paolo Bonzini
@ 2021-06-09 22:04         ` Oliver Upton
  -1 siblings, 0 replies; 46+ messages in thread
From: Oliver Upton @ 2021-06-09 22:04 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kvm list, kvmarm, Maxim Levitsky, Sean Christopherson,
	Marc Zyngier, Peter Shier, Jim Mattson, David Matlack,
	Ricardo Koller, Jing Zhang, Raghavendra Rao Anata

On Wed, Jun 9, 2021 at 12:05 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 09/06/21 17:11, Oliver Upton wrote:
> > Perhaps this will clarify the motivation for my approach: what if the
> > kernel wasn't the authoritative source for wall time in a system?
> > Furthermore, VMMs may wish to define their own heuristics for counter
> > migration (e.g. we only allow the counter to 'jump' by X seconds
> > during migration blackout). If a VMM tried to assert its whims on the
> > TSC state before handing it down to the kernel, we would inadvertently
> > be sampling the host counter twice again. And, anything can happen
> > between the time we assert elapsed time is within SLO and KVM
> > computing the TSC offset (scheduling, L0 hypervisor preemption).
> >
> > So, Maxim's changes would address my concerns in the general case, but
> > maybe not as much in edge cases where an operator may make decisions
> > about how much time can elapse while the guest hasn't had CPU time.
>
> I think I understand.  We still need a way to get a consistent
> (host_TSC, nanosecond) pair on the source, the TSC offset is not enough.
>   This is arguably not a KVM issue, but we're still the one having to
> provide a solution, so we would need a slightly more complicated interface.

Ah, right, good luck doing that without some help from the kernel. +1
to this _not_ being a KVM issue, but anyways...

> 1) In the kernel:
>
> * KVM_GET_CLOCK should also return kvmclock_ns - realtime_ns and
> host_TSC.  It should set two flags in struct kvm_clock_data saying that
> the respective fields are valid.
>
> * KVM_SET_CLOCK checks the flag for kvmclock_ns - realtime_ns.  If set,
> it looks at the kvmclock_ns - realtime_ns field and disregards the
> kvmclock_ns field.

Yeah, these additions all make sense to me.

>
> 2) On the source, userspace will:
>
> * per-VM: invoke KVM_GET_CLOCK.  Migrate kvmclock_ns - realtime_ns and
> kvmclock_ns.  Stash host_TSC for subsequent use.
>
> * per-vCPU: retrieve guest_TSC - host_TSC with your new ioctl.  Sum it
> to the stashed host_TSC value; migrate the resulting value (a guest TSC).
>
> 3) On the destination:
>
> * per-VM: Pass the migrated kvmclock_ns - realtime_ns to KVM_SET_CLOCK.
>   Use KVM_GET_CLOCK to get a consistent pair of kvmclock_ns ("newNS"
> below) and host TSC ("newHostTSC").  Stash them for subsequent use,
> together with the migrated kvmclock_ns value ("sourceNS") that you
> haven't used yet.
>
> * per-vCPU: using the data of the previous step, and the sourceGuestTSC
> in the migration stream, compute sourceGuestTSC + (newNS - sourceNS) *
> freq - newHostTSC.  This is the TSC offset to be passed to your new ioctl.
>
> Your approach still needs to use the "quirky" approach to host-initiated
> MSR_IA32_TSC_ADJUST writes, which write the MSR without affecting the
> VMCS offset.  This is just a documentation issue.

I think I follow what you're saying. To confirm:

My suggested ioctl for the vCPU will still exist, and it will still
affect the VMCS tsc offset, right? However, we need to do one of the
following:

- Stash the guest's MSR_IA32_TSC_ADJUST value in the
kvm_system_counter_state structure. During
KVM_SET_SYSTEM_COUNTER_STATE, check to see if the field is valid. If
so, treat it as a dumb value (i.e. show it to the guest but don't fold
it into the offset).
- Inform userspace that it must still migrate MSR_IA32_TSC_ADJUST, and
continue to our quirky behavior around host-initiated writes of the
MSR.

This is why Maxim's spin migrated a value for IA32_TSC_ADJUST, right?
Doing so ensures we don't have any guest-observable consequences due
to our migration of TSC state. To me, adding the guest IA32_TSC_ADJUST
MSR into the new counter state structure is probably best. No strong
opinions in either direction on this point, though :)
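For the first option, the structure might grow a field along these
lines (a purely hypothetical layout to illustrate the idea, not the
actual proposed UAPI):

```c
#include <stdint.h>

/*
 * Hypothetical layout only: folds the guest's IA32_TSC_ADJUST value
 * into the proposed state structure, with a flag marking it valid.
 * Not the actual KVM UAPI.
 */
struct kvm_system_counter_state {
	uint32_t flags;       /* e.g. a hypothetical TSC_ADJUST-valid bit */
	uint32_t pad;
	uint64_t tsc_offset;  /* guest_TSC - host_TSC */
	uint64_t tsc_adjust;  /* shown to the guest, not folded into offset */
};
```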

--
Thanks,
Oliver

>
> Does this make sense?
>
> Paolo
>


* Re: [PATCH 00/10] KVM: Add idempotent controls for migrating system counter state
  2021-06-09 22:04         ` Oliver Upton
@ 2021-06-10  6:22           ` Paolo Bonzini
  -1 siblings, 0 replies; 46+ messages in thread
From: Paolo Bonzini @ 2021-06-10  6:22 UTC (permalink / raw)
  To: Oliver Upton, Christian Borntraeger
  Cc: kvm list, kvmarm, Maxim Levitsky, Sean Christopherson,
	Marc Zyngier, Peter Shier, Jim Mattson, David Matlack,
	Ricardo Koller, Jing Zhang, Raghavendra Rao Anata

On 10/06/21 00:04, Oliver Upton wrote:
>> Your approach still needs to use the "quirky" approach to host-initiated
>> MSR_IA32_TSC_ADJUST writes, which write the MSR without affecting the
>> VMCS offset.  This is just a documentation issue.
> 
> My suggested ioctl for the vCPU will still exist, and it will still
> affect the VMCS tsc offset, right? However, we need to do one of the
> following:
> 
> - Stash the guest's MSR_IA32_TSC_ADJUST value in the
> kvm_system_counter_state structure. During
> KVM_SET_SYSTEM_COUNTER_STATE, check to see if the field is valid. If
> so, treat it as a dumb value (i.e. show it to the guest but don't fold
> it into the offset).

Yes, it's already folded into the guestTSC-hostTSC offset that the 
caller provides.

> - Inform userspace that it must still migrate MSR_IA32_TSC_ADJUST, and
> continue to our quirky behavior around host-initiated writes of the
> MSR.
> 
> This is why Maxim's spin migrated a value for IA32_TSC_ADJUST, right?

Yes, so that he could then remove (optionally) the quirk for 
host-initiated writes to the TSC and TSC_ADJUST MSRs.

> Doing so ensures we don't have any guest-observable consequences due
> to our migration of TSC state. To me, adding the guest IA32_TSC_ADJUST
> MSR into the new counter state structure is probably best. No strong
> opinions in either direction on this point, though :)

Either is good for me, since documentation will be very important either 
way.  This is a complex API to use due to the possibility of skewed TSCs.

Just one thing: please don't introduce a new ioctl; use 
KVM_GET_DEVICE_ATTR/KVM_SET_DEVICE_ATTR/KVM_HAS_DEVICE_ATTR instead.

Christian, based on what Oliver mentions here, it's probably useful for 
s390 to have functionality to get/set kvm->arch.epoch and kvm->arch.epdx 
in addition to the absolute TOD values that you are migrating now.

Paolo

>> 1) In the kernel:
>>
>> * KVM_GET_CLOCK should also return kvmclock_ns - realtime_ns and
>> host_TSC.  It should set two flags in struct kvm_clock_data saying that
>> the respective fields are valid.
>>
>> * KVM_SET_CLOCK checks the flag for kvmclock_ns - realtime_ns.  If set,
>> it looks at the kvmclock_ns - realtime_ns field and disregards the
>> kvmclock_ns field.
>>
>> 2) On the source, userspace will:
>>
>> * per-VM: invoke KVM_GET_CLOCK.  Migrate kvmclock_ns - realtime_ns and
>> kvmclock_ns.  Stash host_TSC for subsequent use.
>>
>> * per-vCPU: retrieve guest_TSC - host_TSC with your new ioctl.  Sum it
>> to the stashed host_TSC value; migrate the resulting value (a guest TSC).
>>
>> 3) On the destination:
>>
>> * per-VM: Pass the migrated kvmclock_ns - realtime_ns to KVM_SET_CLOCK.
>>   Use KVM_GET_CLOCK to get a consistent pair of kvmclock_ns ("newNS"
>> below) and host TSC ("newHostTSC").  Stash them for subsequent use,
>> together with the migrated kvmclock_ns value ("sourceNS") that you
>> haven't used yet.
>>
>> * per-vCPU: using the data of the previous step, and the sourceGuestTSC
>> in the migration stream, compute sourceGuestTSC + (newNS - sourceNS) *
>> freq - newHostTSC.  This is the TSC offset to be passed to your new ioctl.
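
A sketch of that per-vCPU arithmetic, with invented names (freq taken in kHz; this is not an actual KVM interface, just the computation spelled out):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of the per-vCPU arithmetic quoted above; all names are
 * invented and this is not an actual KVM interface. freq is taken in
 * kHz, so elapsed kvmclock nanoseconds scale by tsc_khz / 1e6 ticks
 * per nanosecond.
 */

/* Source side: reconstruct a guest TSC from the stashed host TSC and
 * the per-vCPU guestTSC - hostTSC offset. */
static uint64_t guest_tsc_on_source(uint64_t stashed_host_tsc,
                                    int64_t guest_minus_host)
{
    return stashed_host_tsc + (uint64_t)guest_minus_host;
}

/* Destination side:
 *   offset = sourceGuestTSC + (newNS - sourceNS) * freq - newHostTSC
 */
static int64_t dest_tsc_offset(uint64_t src_guest_tsc,
                               uint64_t source_ns, uint64_t new_ns,
                               uint64_t new_host_tsc, uint64_t tsc_khz)
{
    uint64_t elapsed_ticks = (new_ns - source_ns) * tsc_khz / 1000000;

    return (int64_t)(src_guest_tsc + elapsed_ticks - new_host_tsc);
}
```

(For a 3 GHz TSC, tsc_khz is 3000000, i.e. 3 ticks per nanosecond.)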


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 02/10] KVM: arm64: Implement initial support for KVM_CAP_SYSTEM_COUNTER_STATE
  2021-06-09 10:23     ` Marc Zyngier
@ 2021-06-10  6:26       ` Paolo Bonzini
  -1 siblings, 0 replies; 46+ messages in thread
From: Paolo Bonzini @ 2021-06-10  6:26 UTC (permalink / raw)
  To: Marc Zyngier, Oliver Upton
  Cc: kvm, kvmarm, Sean Christopherson, Peter Shier, Jim Mattson,
	David Matlack, Ricardo Koller, Jing Zhang, Raghavendra Rao Anata,
	Alexandru Elisei, James Morse, Suzuki K Poulose

On 09/06/21 12:23, Marc Zyngier wrote:
>> Implement initial support for KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls
>> to migrate the value of CNTVOFF_EL2. These ioctls yield precise control
>> of the virtual counter-timers to userspace, allowing it to define its
>> own heuristics for managing vCPU offsets.
> I'm not really in favour of inventing a completely new API, for
> multiple reasons:
> 
> - CNTVOFF is an EL2 concept. I'd rather not expose it as such as it
>    becomes really confusing with NV (which does expose its own CNTVOFF
>    via the ONE_REG interface)
> 
> - You seem to allow each vcpu to get its own offset. I don't think
>    that's right. The architecture defines that all PEs have the same
>    view of the counters, and an EL1 guest should be given that
>    illusion.
> 
> - by having a parallel save/restore interface, you make it harder to
>    reason about what happens with concurrent calls to both interfaces
> 
> - the userspace API is already horribly bloated, and I'm not overly
>    keen on adding more if we can avoid it.
> 
> I'd rather you extend the current ONE_REG interface and make it modal,
> either allowing the restore of an absolute value or an offset for
> CNTVCT_EL0. This would also keep a consistent behaviour when restoring
> vcpus. The same logic would apply to the physical offset.

What about using KVM_GET/SET_DEVICE_ATTR?  It makes sense that the guest 
value for nested virt goes through ONE_REG, while the host value goes 
through DEVICE_ATTR.
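
For illustration, the userspace side of the device-attr route looks roughly like this; the struct layout and ioctl number below match the existing uapi, but the group/attr constants are placeholders for whatever the series ends up defining:

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Redeclaration of the existing uapi layout so the sketch is
 * self-contained (matches struct kvm_device_attr in <linux/kvm.h>). */
struct kvm_device_attr {
    uint32_t flags;  /* no flags currently defined */
    uint32_t group;  /* device-defined */
    uint64_t attr;   /* group-defined */
    uint64_t addr;   /* userspace address of attr data */
};

#define KVMIO 0xAE
#define KVM_SET_DEVICE_ATTR _IOW(KVMIO, 0xe1, struct kvm_device_attr)

/* Placeholder values: the real group/attr numbers for a counter
 * offset attribute are whatever the eventual series defines. */
#define DEMO_TIMER_CTRL_GROUP 0
#define DEMO_CNTVCT_OFFSET    0

static int set_counter_offset(int fd, uint64_t offset)
{
    struct kvm_device_attr attr = {
        .group = DEMO_TIMER_CTRL_GROUP,
        .attr  = DEMO_CNTVCT_OFFSET,
        .addr  = (uint64_t)(uintptr_t)&offset,
    };

    /* On a real VM or vCPU fd this sets the attribute; probing with
     * KVM_HAS_DEVICE_ATTR first tells userspace whether it exists. */
    return ioctl(fd, KVM_SET_DEVICE_ATTR, &attr);
}
```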

Paolo


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 00/10] KVM: Add idempotent controls for migrating system counter state
  2021-06-10  6:22           ` Paolo Bonzini
@ 2021-06-10  6:53             ` Christian Borntraeger
  -1 siblings, 0 replies; 46+ messages in thread
From: Christian Borntraeger @ 2021-06-10  6:53 UTC (permalink / raw)
  To: Paolo Bonzini, Oliver Upton
  Cc: kvm list, kvmarm, Maxim Levitsky, Sean Christopherson,
	Marc Zyngier, Peter Shier, Jim Mattson, David Matlack,
	Ricardo Koller, Jing Zhang, Raghavendra Rao Anata



On 10.06.21 08:22, Paolo Bonzini wrote:
> On 10/06/21 00:04, Oliver Upton wrote:
>>> Your approach still needs to use the "quirky" approach to host-initiated
>>> MSR_IA32_TSC_ADJUST writes, which write the MSR without affecting the
>>> VMCS offset.  This is just a documentation issue.
>>
>> My suggested ioctl for the vCPU will still exist, and it will still
>> affect the VMCS tsc offset, right? However, we need to do one of the
>> following:
>>
>> - Stash the guest's MSR_IA32_TSC_ADJUST value in the
>> kvm_system_counter_state structure. During
>> KVM_SET_SYSTEM_COUNTER_STATE, check to see if the field is valid. If
>> so, treat it as a dumb value (i.e. show it to the guest but don't fold
>> it into the offset).
> 
> Yes, it's already folded into the guestTSC-hostTSC offset that the caller provides.
> 
>> - Inform userspace that it must still migrate MSR_IA32_TSC_ADJUST, and
>> continue to our quirky behavior around host-initiated writes of the
>> MSR.
>>
>> This is why Maxim's spin migrated a value for IA32_TSC_ADJUST, right?
> 
> Yes, so that he could then remove (optionally) the quirk for host-initiated writes to the TSC and TSC_ADJUST MSRs.
> 
>> Doing so ensures we don't have any guest-observable consequences due
>> to our migration of TSC state. To me, adding the guest IA32_TSC_ADJUST
>> MSR into the new counter state structure is probably best. No strong
>> opinions in either direction on this point, though :)
> 
> Either is good for me, since documentation will be very important either way.  This is a complex API to use due to the possibility of skewed TSCs.
> 
> Just one thing, please don't introduce a new ioctl and use KVM_GET_DEVICE_ATTR/KVM_SET_DEVICE_ATTR/KVM_HAS_DEVICE_ATTR.
> 
> Christian, based on what Oliver mentions here, it's probably useful for s390 to have functionality to get/set kvm->arch.epoch and kvm->arch.epdx in addition to the absolute TOD values that you are migrating now.

Yes, a scheme where we migrate the offsets (assuming that the hosts are synced) would often be better.
If the hosts are not synced, things will be harder. I will have a look at this series; thanks for the pointer.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 02/10] KVM: arm64: Implement initial support for KVM_CAP_SYSTEM_COUNTER_STATE
  2021-06-09 14:51       ` Oliver Upton
@ 2021-06-10  6:54         ` Paolo Bonzini
  -1 siblings, 0 replies; 46+ messages in thread
From: Paolo Bonzini @ 2021-06-10  6:54 UTC (permalink / raw)
  To: Oliver Upton, Marc Zyngier
  Cc: kvm list, kvmarm, Sean Christopherson, Peter Shier, Jim Mattson,
	David Matlack, Ricardo Koller, Jing Zhang, Raghavendra Rao Anata,
	Alexandru Elisei, James Morse, Suzuki K Poulose

On 09/06/21 16:51, Oliver Upton wrote:
>> - You seem to allow each vcpu to get its own offset. I don't think
>>    that's right. The architecture defines that all PEs have the same
>>    view of the counters, and an EL1 guest should be given that
>>    illusion.
> Agreed. I would have preferred a VM-wide ioctl to do this, but since
> x86 explicitly allows for drifted TSCs that can't be the case in a
> generic ioctl. I can do the same broadcasting as we do in the case of
> a VMM write to CNTVCT_EL0.
> 

If you use VM-wide GET/SET_DEVICE_ATTR, please make it retrieve the host 
CLOCK_REALTIME and counter at the same time.

Paolo


^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2021-06-10  6:55 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-08 21:47 [PATCH 00/10] KVM: Add idempotent controls for migrating system counter state Oliver Upton
2021-06-08 21:47 ` Oliver Upton
2021-06-08 21:47 ` [PATCH 01/10] KVM: Introduce KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls Oliver Upton
2021-06-08 21:47   ` [PATCH 01/10] KVM: Introduce KVM_{GET, SET}_SYSTEM_COUNTER_STATE ioctls Oliver Upton
2021-06-08 21:47 ` [PATCH 02/10] KVM: arm64: Implement initial support for KVM_CAP_SYSTEM_COUNTER_STATE Oliver Upton
2021-06-08 21:47   ` Oliver Upton
2021-06-08 21:55   ` Oliver Upton
2021-06-08 21:55     ` Oliver Upton
2021-06-09 10:23   ` Marc Zyngier
2021-06-09 10:23     ` Marc Zyngier
2021-06-09 14:51     ` Oliver Upton
2021-06-09 14:51       ` Oliver Upton
2021-06-10  6:54       ` Paolo Bonzini
2021-06-10  6:54         ` Paolo Bonzini
2021-06-10  6:26     ` Paolo Bonzini
2021-06-10  6:26       ` Paolo Bonzini
2021-06-08 21:47 ` [PATCH 03/10] selftests: KVM: Introduce system_counter_state_test Oliver Upton
2021-06-08 21:47   ` Oliver Upton
2021-06-08 21:47 ` [PATCH 04/10] KVM: arm64: Add userspace control of the guest's physical counter Oliver Upton
2021-06-08 21:47   ` Oliver Upton
2021-06-08 21:58   ` Oliver Upton
2021-06-08 21:58     ` Oliver Upton
2021-06-08 21:47 ` [PATCH 05/10] selftests: KVM: Add test cases for physical counter offsetting Oliver Upton
2021-06-08 21:47   ` Oliver Upton
2021-06-08 21:47 ` [PATCH 06/10] selftests: KVM: Add counter emulation benchmark Oliver Upton
2021-06-08 21:47   ` Oliver Upton
2021-06-08 21:47 ` [PATCH 07/10] KVM: x86: Refactor tsc synchronization code Oliver Upton
2021-06-08 21:47   ` Oliver Upton
2021-06-08 21:47 ` [PATCH 08/10] KVM: x86: Implement KVM_CAP_SYSTEM_COUNTER_STATE Oliver Upton
2021-06-08 21:47   ` Oliver Upton
2021-06-08 21:47 ` [PATCH 09/10] selftests: KVM: Add support for x86 to system_counter_state_test Oliver Upton
2021-06-08 21:47   ` Oliver Upton
2021-06-08 21:47 ` [PATCH 10/10] Documentation: KVM: Document KVM_{GET,SET}_SYSTEM_COUNTER_STATE ioctls Oliver Upton
2021-06-08 21:47   ` [PATCH 10/10] Documentation: KVM: Document KVM_{GET, SET}_SYSTEM_COUNTER_STATE ioctls Oliver Upton
2021-06-09 13:05 ` [PATCH 00/10] KVM: Add idempotent controls for migrating system counter state Paolo Bonzini
2021-06-09 13:05   ` Paolo Bonzini
2021-06-09 15:11   ` Oliver Upton
2021-06-09 15:11     ` Oliver Upton
2021-06-09 17:05     ` Paolo Bonzini
2021-06-09 17:05       ` Paolo Bonzini
2021-06-09 22:04       ` Oliver Upton
2021-06-09 22:04         ` Oliver Upton
2021-06-10  6:22         ` Paolo Bonzini
2021-06-10  6:22           ` Paolo Bonzini
2021-06-10  6:53           ` Christian Borntraeger
2021-06-10  6:53             ` Christian Borntraeger
