* [PATCH v2 0/9] KVM: arm/arm64: race fixes and vcpu requests
@ 2017-03-31 16:06 Andrew Jones
  2017-03-31 16:06 ` [PATCH v2 1/9] KVM: add kvm_request_pending Andrew Jones
                   ` (9 more replies)
  0 siblings, 10 replies; 85+ messages in thread
From: Andrew Jones @ 2017-03-31 16:06 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: cdall, marc.zyngier, pbonzini, rkrcmar

This series fixes some hard-to-reproduce races by introducing the use of
vcpu requests.  It also fixes a couple of easier-to-reproduce races, ones
that have been reproduced with the PSCI kvm-unit-test test.  The easy two
are addressed in two different ways: the first takes advantage of
power_off having been changed to a vcpu request; the second caches vcpu
MPIDRs in order to avoid extracting them from sys_regs.  I've tested the
series on a Mustang and a ThunderX, and compile-tested the ARM bits.

Patch 2/9 adds documentation, as, at least for me, understanding how vcpu
requests interact with vcpu kicks and vcpu mode, and the memory barriers
that interaction implies, is exhausting.  Hopefully the document is useful
to others.  I'm not married to it, though, so it can be deferred/dropped
as people like...
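
For readers new to the request API, here is a hypothetical, self-contained
C model of the semantics this series relies on.  The names mirror the
kernel's but this is only an illustration; the real implementations use
atomic bitops and memory barriers:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy, single-threaded stand-in for struct kvm_vcpu's request state. */
struct vcpu_model {
	unsigned long requests;
};

/* kvm_make_request() analogue: set the request bit. */
static void make_request(int req, struct vcpu_model *vcpu)
{
	vcpu->requests |= 1UL << req;
}

/* kvm_request_pending() analogue: is any request bit set? */
static bool request_pending(struct vcpu_model *vcpu)
{
	return vcpu->requests != 0;
}

/* kvm_check_request() analogue: test the bit, clearing it when set. */
static bool check_request(int req, struct vcpu_model *vcpu)
{
	if (vcpu->requests & (1UL << req)) {
		vcpu->requests &= ~(1UL << req);
		return true;
	}
	return false;
}
```

A request is observed exactly once by kvm_check_request(), after which
kvm_request_pending() no longer reports it.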

v2:
  - No longer based on Radim's vcpu request API rework[1], except for
    including "add kvm_request_pending" as patch 1/9 [drew]
  - Added vcpu request documentation [drew]
  - Dropped the introduction of user settable MPIDRs [Christoffer]
  - Added vcpu requests to all request-less vcpu kicks [Christoffer]

[1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1340496.html

Andrew Jones (7):
  KVM: Add documentation for VCPU requests
  KVM: arm/arm64: prepare to use vcpu requests
  KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  KVM: arm/arm64: replace vcpu->arch.power_off with a vcpu request
  KVM: arm/arm64: use a vcpu request on irq injection
  KVM: arm/arm64: PMU: remove request-less vcpu kick
  KVM: arm/arm64: avoid race by caching MPIDR

Levente Kurusa (1):
  KVM: arm/arm64: fix race in kvm_psci_vcpu_on

Radim Krčmář (1):
  KVM: add kvm_request_pending

 Documentation/virtual/kvm/vcpu-requests.rst | 114 ++++++++++++++++++++++++++++
 arch/arm/include/asm/kvm_emulate.h          |   2 +-
 arch/arm/include/asm/kvm_host.h             |  13 ++--
 arch/arm/kvm/arm.c                          |  68 +++++++++++------
 arch/arm/kvm/coproc.c                       |  20 +++--
 arch/arm/kvm/handle_exit.c                  |   1 +
 arch/arm/kvm/psci.c                         |  18 ++---
 arch/arm64/include/asm/kvm_emulate.h        |   2 +-
 arch/arm64/include/asm/kvm_host.h           |  13 ++--
 arch/arm64/kvm/handle_exit.c                |   1 +
 arch/arm64/kvm/sys_regs.c                   |  27 +++----
 arch/mips/kvm/trap_emul.c                   |   2 +-
 arch/powerpc/kvm/booke.c                    |   2 +-
 arch/powerpc/kvm/powerpc.c                  |   5 +-
 arch/s390/kvm/kvm-s390.c                    |   2 +-
 arch/x86/kvm/x86.c                          |   4 +-
 include/linux/kvm_host.h                    |   5 ++
 virt/kvm/arm/arch_timer.c                   |   1 +
 virt/kvm/arm/pmu.c                          |  29 +++----
 virt/kvm/arm/vgic/vgic.c                    |  12 ++-
 20 files changed, 245 insertions(+), 96 deletions(-)
 create mode 100644 Documentation/virtual/kvm/vcpu-requests.rst

-- 
2.9.3

^ permalink raw reply	[flat|nested] 85+ messages in thread

* [PATCH v2 1/9] KVM: add kvm_request_pending
  2017-03-31 16:06 [PATCH v2 0/9] KVM: arm/arm64: race fixes and vcpu requests Andrew Jones
@ 2017-03-31 16:06 ` Andrew Jones
  2017-04-04 15:30   ` Christoffer Dall
  2017-03-31 16:06 ` [PATCH v2 2/9] KVM: Add documentation for VCPU requests Andrew Jones
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 85+ messages in thread
From: Andrew Jones @ 2017-03-31 16:06 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: cdall, marc.zyngier, pbonzini, rkrcmar

From: Radim Krčmář <rkrcmar@redhat.com>

A first step in vcpu->requests encapsulation.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
---
 arch/mips/kvm/trap_emul.c  | 2 +-
 arch/powerpc/kvm/booke.c   | 2 +-
 arch/powerpc/kvm/powerpc.c | 5 ++---
 arch/s390/kvm/kvm-s390.c   | 2 +-
 arch/x86/kvm/x86.c         | 4 ++--
 include/linux/kvm_host.h   | 5 +++++
 6 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/mips/kvm/trap_emul.c b/arch/mips/kvm/trap_emul.c
index b1fa53b252ea..9ac8b1d62643 100644
--- a/arch/mips/kvm/trap_emul.c
+++ b/arch/mips/kvm/trap_emul.c
@@ -1029,7 +1029,7 @@ static void kvm_trap_emul_check_requests(struct kvm_vcpu *vcpu, int cpu,
 	struct mm_struct *mm;
 	int i;
 
-	if (likely(!vcpu->requests))
+	if (likely(!kvm_request_pending(vcpu)))
 		return;
 
 	if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) {
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 0514cbd4e533..65ed6595c9c2 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -682,7 +682,7 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
 
 	kvmppc_core_check_exceptions(vcpu);
 
-	if (vcpu->requests) {
+	if (kvm_request_pending(vcpu)) {
 		/* Exception delivery raised request; start over */
 		return 1;
 	}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 95c91a9de351..714674ea5be6 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -52,8 +52,7 @@ EXPORT_SYMBOL_GPL(kvmppc_pr_ops);
 
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
-	return !!(v->arch.pending_exceptions) ||
-	       v->requests;
+	return !!(v->arch.pending_exceptions) || kvm_request_pending(v);
 }
 
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
@@ -105,7 +104,7 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
 		 */
 		smp_mb();
 
-		if (vcpu->requests) {
+		if (kvm_request_pending(vcpu)) {
 			/* Make sure we process requests preemptable */
 			local_irq_enable();
 			trace_kvm_check_requests(vcpu);
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index fd6cd05bb6a7..40ad6c8d082f 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2396,7 +2396,7 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
 {
 retry:
 	kvm_s390_vcpu_request_handled(vcpu);
-	if (!vcpu->requests)
+	if (!kvm_request_pending(vcpu))
 		return 0;
 	/*
 	 * We use MMU_RELOAD just to re-arm the ipte notifier for the
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1faf620a6fdc..9714bb230524 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6726,7 +6726,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	bool req_immediate_exit = false;
 
-	if (vcpu->requests) {
+	if (kvm_request_pending(vcpu)) {
 		if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
 			kvm_mmu_unload(vcpu);
 		if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
@@ -6890,7 +6890,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_x86_ops->sync_pir_to_irr(vcpu);
 	}
 
-	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
+	if (vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu)
 	    || need_resched() || signal_pending(current)) {
 		vcpu->mode = OUTSIDE_GUEST_MODE;
 		smp_wmb();
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2c14ad9809da..946bf0b3c43c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1085,6 +1085,11 @@ static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 
 #endif /* CONFIG_HAVE_KVM_EVENTFD */
 
+static inline bool kvm_request_pending(struct kvm_vcpu *vcpu)
+{
+	return READ_ONCE(vcpu->requests);
+}
+
 static inline void kvm_make_request(int req, struct kvm_vcpu *vcpu)
 {
 	/*
-- 
2.9.3

* [PATCH v2 2/9] KVM: Add documentation for VCPU requests
  2017-03-31 16:06 [PATCH v2 0/9] KVM: arm/arm64: race fixes and vcpu requests Andrew Jones
  2017-03-31 16:06 ` [PATCH v2 1/9] KVM: add kvm_request_pending Andrew Jones
@ 2017-03-31 16:06 ` Andrew Jones
  2017-04-04 15:24   ` Christoffer Dall
  2017-04-06 10:18   ` Christian Borntraeger
  2017-03-31 16:06 ` [PATCH v2 3/9] KVM: arm/arm64: prepare to use vcpu requests Andrew Jones
                   ` (7 subsequent siblings)
  9 siblings, 2 replies; 85+ messages in thread
From: Andrew Jones @ 2017-03-31 16:06 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: cdall, marc.zyngier, pbonzini, rkrcmar

Signed-off-by: Andrew Jones <drjones@redhat.com>
---
 Documentation/virtual/kvm/vcpu-requests.rst | 114 ++++++++++++++++++++++++++++
 1 file changed, 114 insertions(+)
 create mode 100644 Documentation/virtual/kvm/vcpu-requests.rst

diff --git a/Documentation/virtual/kvm/vcpu-requests.rst b/Documentation/virtual/kvm/vcpu-requests.rst
new file mode 100644
index 000000000000..ea4a966d5c8a
--- /dev/null
+++ b/Documentation/virtual/kvm/vcpu-requests.rst
@@ -0,0 +1,114 @@
+=================
+KVM VCPU Requests
+=================
+
+Overview
+========
+
+KVM supports an internal API enabling threads to request a VCPU thread to
+perform some activity.  For example, a thread may request a VCPU to flush
+its TLB with a VCPU request.  The API consists of only four calls::
+
+  /* Check if VCPU @vcpu has request @req pending. Clears the request. */
+  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);
+
+  /* Check if any requests are pending for VCPU @vcpu. */
+  bool kvm_request_pending(struct kvm_vcpu *vcpu);
+
+  /* Make request @req of VCPU @vcpu. */
+  void kvm_make_request(int req, struct kvm_vcpu *vcpu);
+
+  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
+  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
+
+Typically a requester wants the VCPU to perform the activity as soon
+as possible after making the request.  This means most requests, i.e.
+kvm_make_request() calls, are followed by a call to kvm_vcpu_kick();
+kvm_make_all_cpus_request() already has the kicking of all VCPUs
+built into it.
+
+VCPU Kicks
+----------
+
+A VCPU kick does one of three things:
+
+ 1) wakes a sleeping VCPU (which sleeps outside guest mode).
+ 2) sends an IPI to a VCPU currently in guest mode, in order to bring it
+    out.
+ 3) nothing, when the VCPU is already outside guest mode and not sleeping.
+
+VCPU Request Internals
+======================
+
+VCPU requests are simply bit indices of the vcpu->requests bitmap.  This
+means general bitops[1], e.g. clear_bit(KVM_REQ_UNHALT, &vcpu->requests),
+may also be used.  The first 8 bits are reserved for architecture
+independent requests; all additional bits are available for architecture
+dependent requests.
+
+VCPU Requests with Associated State
+===================================
+
+Requesters that want the requested VCPU to handle new state need to ensure
+the state is observable to the requested VCPU thread's CPU at the time the
+CPU observes the request.  This means a write memory barrier should be
+inserted between the preparation of the state and the write of the VCPU
+request bitmap.  Additionally, on the requested VCPU thread's side, a
+corresponding read barrier should be issued after reading the request bit
+and before proceeding to use the state associated with it.  See the kernel
+memory barrier documentation [2].
+
+VCPU Requests and Guest Mode
+============================
+
+As long as the VCPU is either in guest mode, in which case it gets an
+IPI and will definitely see the request, or is outside guest mode but
+has yet to do its final request check, in which case it will see the
+request when it does that check, then things will work.  However, the
+transition into guest mode, after the last request check has been made,
+opens a window where a request could be made, but the VCPU would not see
+it until it exits guest mode some time later.  See the table below.
+
++------------------+-----------------+----------------+--------------+
+| vcpu->mode       | done last check | kick sends IPI | request seen |
++==================+=================+================+==============+
+| IN_GUEST_MODE    |      N/A        |      YES       |     YES      |
++------------------+-----------------+----------------+--------------+
+| !IN_GUEST_MODE   |      NO         |      NO        |     YES      |
++------------------+-----------------+----------------+--------------+
+| !IN_GUEST_MODE   |      YES        |      NO        |     NO       |
++------------------+-----------------+----------------+--------------+
+
+To ensure the third scenario shown in the table above cannot happen, we
+need to ensure the VCPU's mode change is observable by all CPUs prior to
+its final request check and that a requester's request is observable by
+the requested VCPU prior to the kick.  To do that we need general memory
+barriers between each pair of operations involving mode and requests, i.e.
+
+  CPU_i                                  CPU_j
+-------------------------------------------------------------------------
+  vcpu->mode = IN_GUEST_MODE;            kvm_make_request(REQ, vcpu);
+  smp_mb();                              smp_mb();
+  if (kvm_request_pending(vcpu))         if (vcpu->mode == IN_GUEST_MODE)
+      handle_requests();                     send_IPI(vcpu->cpu);
+
+Whether explicit barriers are needed, or reliance on implicit barriers
+is sufficient, is architecture dependent.  Alternatively, an architecture
+may choose to always send the IPI, as skipping it when it's not necessary
+is just an optimization.
+
+Additionally, the error-prone third scenario described above also shows
+why a request-less VCPU kick is almost never correct.  Without the
+assurance that a non-IPI-generating kick will still result in an action
+by the targeted VCPU, which the final kvm_request_pending() check
+provides, the kick may not initiate anything useful at all.  If, for
+instance, a request-less kick were made to a VCPU that was just about to
+set its mode to IN_GUEST_MODE, meaning no IPI is sent, then the VCPU may
+continue its entry without actually having done whatever it was the kick
+was meant to initiate.
+
+References
+==========
+
+[1] Documentation/core-api/atomic_ops.rst
+[2] Documentation/memory-barriers.txt
-- 
2.9.3

* [PATCH v2 3/9] KVM: arm/arm64: prepare to use vcpu requests
  2017-03-31 16:06 [PATCH v2 0/9] KVM: arm/arm64: race fixes and vcpu requests Andrew Jones
  2017-03-31 16:06 ` [PATCH v2 1/9] KVM: add kvm_request_pending Andrew Jones
  2017-03-31 16:06 ` [PATCH v2 2/9] KVM: Add documentation for VCPU requests Andrew Jones
@ 2017-03-31 16:06 ` Andrew Jones
  2017-04-04 15:34   ` Christoffer Dall
  2017-03-31 16:06 ` [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request Andrew Jones
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 85+ messages in thread
From: Andrew Jones @ 2017-03-31 16:06 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: cdall, marc.zyngier, pbonzini, rkrcmar

Make sure vcpu requests that we don't intend to handle don't remain
set in the request bitmap.  If we don't clear them, then
kvm_request_pending() may return true when we don't want it to.
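
A toy, single-threaded sketch of the problem being fixed; the names are
illustrative stand-ins for kvm_vcpu_block() setting KVM_REQ_UNHALT and
for kvm_request_pending():

```c
#include <assert.h>
#include <stdbool.h>

#define REQ_UNHALT 0	/* illustrative bit index for KVM_REQ_UNHALT */

/* Stand-in for kvm_vcpu_block(): waking up leaves the UNHALT bit set. */
static void block_then_wake(unsigned long *requests)
{
	*requests |= 1UL << REQ_UNHALT;
}

/* kvm_request_pending() analogue: true if any bit is left set. */
static bool pending(unsigned long requests)
{
	return requests != 0;
}

/* The fix: callers that don't handle UNHALT clear it immediately. */
static void clear_unhalt(unsigned long *requests)
{
	*requests &= ~(1UL << REQ_UNHALT);
}
```

Without the clear, the stale UNHALT bit makes every later pending()
check look like there is real work to do.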

Signed-off-by: Andrew Jones <drjones@redhat.com>
Acked-by: Christoffer Dall <cdall@linaro.org>
---
 arch/arm/kvm/handle_exit.c   | 1 +
 arch/arm/kvm/psci.c          | 1 +
 arch/arm64/kvm/handle_exit.c | 1 +
 3 files changed, 3 insertions(+)

diff --git a/arch/arm/kvm/handle_exit.c b/arch/arm/kvm/handle_exit.c
index 96af65a30d78..ffb2406e5905 100644
--- a/arch/arm/kvm/handle_exit.c
+++ b/arch/arm/kvm/handle_exit.c
@@ -72,6 +72,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		trace_kvm_wfx(*vcpu_pc(vcpu), false);
 		vcpu->stat.wfi_exit_stat++;
 		kvm_vcpu_block(vcpu);
+		clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
 	}
 
 	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
index c2b131527a64..82fe7eb5b6a7 100644
--- a/arch/arm/kvm/psci.c
+++ b/arch/arm/kvm/psci.c
@@ -57,6 +57,7 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
 	 * for KVM will preserve the register state.
 	 */
 	kvm_vcpu_block(vcpu);
+	clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
 
 	return PSCI_RET_SUCCESS;
 }
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index fa1b18e364fc..e4937fb2fb89 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -89,6 +89,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		trace_kvm_wfx_arm64(*vcpu_pc(vcpu), false);
 		vcpu->stat.wfi_exit_stat++;
 		kvm_vcpu_block(vcpu);
+		clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
 	}
 
 	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
-- 
2.9.3

* [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  2017-03-31 16:06 [PATCH v2 0/9] KVM: arm/arm64: race fixes and vcpu requests Andrew Jones
                   ` (2 preceding siblings ...)
  2017-03-31 16:06 ` [PATCH v2 3/9] KVM: arm/arm64: prepare to use vcpu requests Andrew Jones
@ 2017-03-31 16:06 ` Andrew Jones
  2017-04-04 13:39   ` Marc Zyngier
  2017-04-04 16:04   ` Christoffer Dall
  2017-03-31 16:06 ` [PATCH v2 5/9] KVM: arm/arm64: replace vcpu->arch.power_off " Andrew Jones
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 85+ messages in thread
From: Andrew Jones @ 2017-03-31 16:06 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: marc.zyngier, cdall, pbonzini

This not only ensures visibility of changes to pause by using
atomic ops, but also plugs a small race where a vcpu could get its
pause state enabled just after its last check before entering the
guest. With this patch, while the vcpu will still initially enter
the guest, it will exit immediately due to the IPI sent by the vcpu
kick issued after making the vcpu request.

We use bitops, rather than kvm_make/check_request(), because we
don't need the barriers they provide, nor do we want the side-effect
of kvm_check_request() clearing the request. For pause, only the
requester should do the clearing.
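
A hypothetical sketch of the ownership rule described above: the
requester both sets and clears the pause bit, while the VCPU side only
observes it (a test_bit, not a check-and-clear).  Function names are
illustrative, not the kernel's:

```c
#include <assert.h>
#include <stdbool.h>

#define KVM_REQ_PAUSE 8	/* bit index used by this series */

/* Requester side: owns both setting and clearing the bit. */
static void halt_vcpu(unsigned long *requests)
{
	*requests |= 1UL << KVM_REQ_PAUSE;
	/* kvm_vcpu_kick() would follow */
}

static void resume_vcpu(unsigned long *requests)
{
	*requests &= ~(1UL << KVM_REQ_PAUSE);
}

/*
 * VCPU side: only observes the bit.  If it used a check-and-clear
 * primitive instead, the pause state would be lost after one check.
 */
static bool pause_requested(unsigned long requests)
{
	return requests & (1UL << KVM_REQ_PAUSE);
}
```

Repeated checks keep reporting the pause until the requester resumes
the vcpu.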

Signed-off-by: Andrew Jones <drjones@redhat.com>
---
 arch/arm/include/asm/kvm_host.h   |  5 +----
 arch/arm/kvm/arm.c                | 45 +++++++++++++++++++++++++++------------
 arch/arm64/include/asm/kvm_host.h |  5 +----
 3 files changed, 33 insertions(+), 22 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 31ee468ce667..52c25536d254 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -45,7 +45,7 @@
 #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
 #endif
 
-#define KVM_REQ_VCPU_EXIT	8
+#define KVM_REQ_PAUSE		8
 
 u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
 int __attribute_const__ kvm_target_cpu(void);
@@ -173,9 +173,6 @@ struct kvm_vcpu_arch {
 	/* vcpu power-off state */
 	bool power_off;
 
-	 /* Don't run the guest (internal implementation need) */
-	bool pause;
-
 	/* IO related fields */
 	struct kvm_decode mmio_decode;
 
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 314eb6abe1ff..f3bfbb5f3d96 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -94,6 +94,18 @@ struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void)
 
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
 {
+	/*
+	 * If we return true from this function, then it means the vcpu is
+	 * either in guest mode, or has already indicated that it's in guest
+	 * mode. The indication is done by setting ->mode to IN_GUEST_MODE,
+	 * and must be done before the final kvm_request_pending() read. It's
+	 * important that the observability of that order be enforced and that
+	 * the request receiving CPU can observe any new request before the
+	 * requester issues a kick. Thus, the general barrier below pairs with
+	 * the general barrier in kvm_arch_vcpu_ioctl_run() which divides the
+	 * write to ->mode and the final request pending read.
+	 */
+	smp_mb();
 	return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
 }
 
@@ -404,7 +416,8 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
 	return ((!!v->arch.irq_lines || kvm_vgic_vcpu_pending_irq(v))
-		&& !v->arch.power_off && !v->arch.pause);
+		&& !v->arch.power_off
+		&& !test_bit(KVM_REQ_PAUSE, &v->requests));
 }
 
 /* Just ensure a guest exit from a particular CPU */
@@ -535,17 +548,12 @@ bool kvm_arch_intc_initialized(struct kvm *kvm)
 
 void kvm_arm_halt_guest(struct kvm *kvm)
 {
-	int i;
-	struct kvm_vcpu *vcpu;
-
-	kvm_for_each_vcpu(i, vcpu, kvm)
-		vcpu->arch.pause = true;
-	kvm_make_all_cpus_request(kvm, KVM_REQ_VCPU_EXIT);
+	kvm_make_all_cpus_request(kvm, KVM_REQ_PAUSE);
 }
 
 void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu)
 {
-	vcpu->arch.pause = true;
+	set_bit(KVM_REQ_PAUSE, &vcpu->requests);
 	kvm_vcpu_kick(vcpu);
 }
 
@@ -553,7 +561,7 @@ void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu)
 {
 	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
 
-	vcpu->arch.pause = false;
+	clear_bit(KVM_REQ_PAUSE, &vcpu->requests);
 	swake_up(wq);
 }
 
@@ -571,7 +579,7 @@ static void vcpu_sleep(struct kvm_vcpu *vcpu)
 	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
 
 	swait_event_interruptible(*wq, ((!vcpu->arch.power_off) &&
-				       (!vcpu->arch.pause)));
+		(!test_bit(KVM_REQ_PAUSE, &vcpu->requests))));
 }
 
 static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
@@ -624,7 +632,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 		update_vttbr(vcpu->kvm);
 
-		if (vcpu->arch.power_off || vcpu->arch.pause)
+		if (vcpu->arch.power_off || test_bit(KVM_REQ_PAUSE, &vcpu->requests))
 			vcpu_sleep(vcpu);
 
 		/*
@@ -647,8 +655,18 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 			run->exit_reason = KVM_EXIT_INTR;
 		}
 
+		/*
+		 * Indicate we're in guest mode now, before doing a final
+		 * check for pending vcpu requests. The general barrier
+		 * pairs with the one in kvm_arch_vcpu_should_kick().
+		 * Please see the comment there for more details.
+		 */
+		WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);
+		smp_mb();
+
 		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
-			vcpu->arch.power_off || vcpu->arch.pause) {
+			vcpu->arch.power_off || kvm_request_pending(vcpu)) {
+			WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
 			local_irq_enable();
 			kvm_pmu_sync_hwstate(vcpu);
 			kvm_timer_sync_hwstate(vcpu);
@@ -664,11 +682,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 */
 		trace_kvm_entry(*vcpu_pc(vcpu));
 		guest_enter_irqoff();
-		vcpu->mode = IN_GUEST_MODE;
 
 		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
 
-		vcpu->mode = OUTSIDE_GUEST_MODE;
+		WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
 		vcpu->stat.exits++;
 		/*
 		 * Back from guest
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e7705e7bb07b..6e1271a77e92 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -42,7 +42,7 @@
 
 #define KVM_VCPU_MAX_FEATURES 4
 
-#define KVM_REQ_VCPU_EXIT	8
+#define KVM_REQ_PAUSE		8
 
 int __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
@@ -256,9 +256,6 @@ struct kvm_vcpu_arch {
 	/* vcpu power-off state */
 	bool power_off;
 
-	/* Don't run the guest (internal implementation need) */
-	bool pause;
-
 	/* IO related fields */
 	struct kvm_decode mmio_decode;
 
-- 
2.9.3

* [PATCH v2 5/9] KVM: arm/arm64: replace vcpu->arch.power_off with a vcpu request
  2017-03-31 16:06 [PATCH v2 0/9] KVM: arm/arm64: race fixes and vcpu requests Andrew Jones
                   ` (3 preceding siblings ...)
  2017-03-31 16:06 ` [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request Andrew Jones
@ 2017-03-31 16:06 ` Andrew Jones
  2017-04-04 17:37   ` Christoffer Dall
  2017-03-31 16:06 ` [PATCH v2 6/9] KVM: arm/arm64: use a vcpu request on irq injection Andrew Jones
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 85+ messages in thread
From: Andrew Jones @ 2017-03-31 16:06 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: cdall, marc.zyngier, pbonzini, rkrcmar

As with pause, replacing power_off with a vcpu request ensures the
visibility of changes and avoids the final race before entering the
guest.

Signed-off-by: Andrew Jones <drjones@redhat.com>
---
 arch/arm/include/asm/kvm_host.h   |  4 +---
 arch/arm/kvm/arm.c                | 32 ++++++++++++++++++--------------
 arch/arm/kvm/psci.c               | 17 +++++------------
 arch/arm64/include/asm/kvm_host.h |  4 +---
 4 files changed, 25 insertions(+), 32 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 52c25536d254..afed5d44634d 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -46,6 +46,7 @@
 #endif
 
 #define KVM_REQ_PAUSE		8
+#define KVM_REQ_POWER_OFF	9
 
 u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
 int __attribute_const__ kvm_target_cpu(void);
@@ -170,9 +171,6 @@ struct kvm_vcpu_arch {
 	 * here.
 	 */
 
-	/* vcpu power-off state */
-	bool power_off;
-
 	/* IO related fields */
 	struct kvm_decode mmio_decode;
 
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index f3bfbb5f3d96..7ed39060b1cf 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -381,7 +381,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
 				    struct kvm_mp_state *mp_state)
 {
-	if (vcpu->arch.power_off)
+	if (test_bit(KVM_REQ_POWER_OFF, &vcpu->requests))
 		mp_state->mp_state = KVM_MP_STATE_STOPPED;
 	else
 		mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
@@ -394,10 +394,10 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 {
 	switch (mp_state->mp_state) {
 	case KVM_MP_STATE_RUNNABLE:
-		vcpu->arch.power_off = false;
+		clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
 		break;
 	case KVM_MP_STATE_STOPPED:
-		vcpu->arch.power_off = true;
+		set_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
 		break;
 	default:
 		return -EINVAL;
@@ -415,9 +415,9 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
  */
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
-	return ((!!v->arch.irq_lines || kvm_vgic_vcpu_pending_irq(v))
-		&& !v->arch.power_off
-		&& !test_bit(KVM_REQ_PAUSE, &v->requests));
+	return (!!v->arch.irq_lines || kvm_vgic_vcpu_pending_irq(v)) &&
+		!test_bit(KVM_REQ_POWER_OFF, &v->requests) &&
+		!test_bit(KVM_REQ_PAUSE, &v->requests);
 }
 
 /* Just ensure a guest exit from a particular CPU */
@@ -578,8 +578,9 @@ static void vcpu_sleep(struct kvm_vcpu *vcpu)
 {
 	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
 
-	swait_event_interruptible(*wq, ((!vcpu->arch.power_off) &&
-		(!test_bit(KVM_REQ_PAUSE, &vcpu->requests))));
+	swait_event_interruptible(*wq,
+		!test_bit(KVM_REQ_POWER_OFF, &vcpu->requests) &&
+		!test_bit(KVM_REQ_PAUSE, &vcpu->requests));
 }
 
 static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
@@ -632,8 +633,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 		update_vttbr(vcpu->kvm);
 
-		if (vcpu->arch.power_off || test_bit(KVM_REQ_PAUSE, &vcpu->requests))
-			vcpu_sleep(vcpu);
+		if (kvm_request_pending(vcpu)) {
+			if (test_bit(KVM_REQ_POWER_OFF, &vcpu->requests) ||
+			    test_bit(KVM_REQ_PAUSE, &vcpu->requests))
+				vcpu_sleep(vcpu);
+		}
 
 		/*
 		 * Preparing the interrupts to be injected also
@@ -664,8 +668,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);
 		smp_mb();
 
-		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
-			vcpu->arch.power_off || kvm_request_pending(vcpu)) {
+		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)
+		    || kvm_request_pending(vcpu)) {
 			WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
 			local_irq_enable();
 			kvm_pmu_sync_hwstate(vcpu);
@@ -892,9 +896,9 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
 	 * Handle the "start in power-off" case.
 	 */
 	if (test_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features))
-		vcpu->arch.power_off = true;
+		set_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
 	else
-		vcpu->arch.power_off = false;
+		clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
 
 	return 0;
 }
diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
index 82fe7eb5b6a7..f732484abc7a 100644
--- a/arch/arm/kvm/psci.c
+++ b/arch/arm/kvm/psci.c
@@ -64,7 +64,7 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
 
 static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
 {
-	vcpu->arch.power_off = true;
+	set_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
 }
 
 static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
@@ -88,7 +88,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 	 */
 	if (!vcpu)
 		return PSCI_RET_INVALID_PARAMS;
-	if (!vcpu->arch.power_off) {
+	if (!test_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
 		if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
 			return PSCI_RET_ALREADY_ON;
 		else
@@ -116,8 +116,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 	 * the general puspose registers are undefined upon CPU_ON.
 	 */
 	vcpu_set_reg(vcpu, 0, context_id);
-	vcpu->arch.power_off = false;
-	smp_mb();		/* Make sure the above is visible */
+	clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
 
 	wq = kvm_arch_vcpu_wq(vcpu);
 	swake_up(wq);
@@ -154,7 +153,7 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
 		mpidr = kvm_vcpu_get_mpidr_aff(tmp);
 		if ((mpidr & target_affinity_mask) == target_affinity) {
 			matching_cpus++;
-			if (!tmp->arch.power_off)
+			if (!test_bit(KVM_REQ_POWER_OFF, &tmp->requests))
 				return PSCI_0_2_AFFINITY_LEVEL_ON;
 		}
 	}
@@ -167,9 +166,6 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
 
 static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
 {
-	int i;
-	struct kvm_vcpu *tmp;
-
 	/*
 	 * The KVM ABI specifies that a system event exit may call KVM_RUN
 	 * again and may perform shutdown/reboot at a later time that when the
@@ -179,10 +175,7 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
 	 * after this call is handled and before the VCPUs have been
 	 * re-initialized.
 	 */
-	kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
-		tmp->arch.power_off = true;
-		kvm_vcpu_kick(tmp);
-	}
+	kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_POWER_OFF);
 
 	memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
 	vcpu->run->system_event.type = type;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 6e1271a77e92..e78895f675d0 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -43,6 +43,7 @@
 #define KVM_VCPU_MAX_FEATURES 4
 
 #define KVM_REQ_PAUSE		8
+#define KVM_REQ_POWER_OFF	9
 
 int __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
@@ -253,9 +254,6 @@ struct kvm_vcpu_arch {
 		u32	mdscr_el1;
 	} guest_debug_preserved;
 
-	/* vcpu power-off state */
-	bool power_off;
-
 	/* IO related fields */
 	struct kvm_decode mmio_decode;
 
-- 
2.9.3

* [PATCH v2 6/9] KVM: arm/arm64: use a vcpu request on irq injection
  2017-03-31 16:06 [PATCH v2 0/9] KVM: arm/arm64: race fixes and vcpu requests Andrew Jones
                   ` (4 preceding siblings ...)
  2017-03-31 16:06 ` [PATCH v2 5/9] KVM: arm/arm64: replace vcpu->arch.power_off " Andrew Jones
@ 2017-03-31 16:06 ` Andrew Jones
  2017-04-04 17:42   ` Christoffer Dall
  2017-04-04 18:51   ` Paolo Bonzini
  2017-03-31 16:06 ` [PATCH v2 7/9] KVM: arm/arm64: PMU: remove request-less vcpu kick Andrew Jones
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 85+ messages in thread
From: Andrew Jones @ 2017-03-31 16:06 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: marc.zyngier, cdall, pbonzini

Don't use request-less VCPU kicks when injecting IRQs. A kick meant
to trigger the interrupt injection may be sent while the VCPU is
outside guest mode (so no IPI is sent), but after it has already
called kvm_vgic_flush_hwstate(). In that case the VCPU won't see the
updated GIC state until its next exit, some time later, for some
other reason.

Signed-off-by: Andrew Jones <drjones@redhat.com>
---
 arch/arm/include/asm/kvm_host.h   |  1 +
 arch/arm/kvm/arm.c                |  1 +
 arch/arm64/include/asm/kvm_host.h |  1 +
 virt/kvm/arm/arch_timer.c         |  1 +
 virt/kvm/arm/vgic/vgic.c          | 12 ++++++++++--
 5 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index afed5d44634d..0b8a6d6b3cb3 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -47,6 +47,7 @@
 
 #define KVM_REQ_PAUSE		8
 #define KVM_REQ_POWER_OFF	9
+#define KVM_REQ_IRQ_PENDING	10
 
 u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
 int __attribute_const__ kvm_target_cpu(void);
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 7ed39060b1cf..a106feccf314 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -768,6 +768,7 @@ static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
 	 * trigger a world-switch round on the running physical CPU to set the
 	 * virtual IRQ/FIQ fields in the HCR appropriately.
 	 */
+	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
 	kvm_vcpu_kick(vcpu);
 
 	return 0;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index e78895f675d0..7057512b3474 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -44,6 +44,7 @@
 
 #define KVM_REQ_PAUSE		8
 #define KVM_REQ_POWER_OFF	9
+#define KVM_REQ_IRQ_PENDING	10
 
 int __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 35d7100e0815..3c48abbf951b 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -95,6 +95,7 @@ static void kvm_timer_inject_irq_work(struct work_struct *work)
 	 * If the vcpu is blocked we want to wake it up so that it will see
 	 * the timer has expired when entering the guest.
 	 */
+	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
 	kvm_vcpu_kick(vcpu);
 }
 
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 654dfd40e449..31fb89057f0c 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -283,8 +283,10 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
 		 * won't see this one until it exits for some other
 		 * reason.
 		 */
-		if (vcpu)
+		if (vcpu) {
+			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
 			kvm_vcpu_kick(vcpu);
+		}
 		return false;
 	}
 
@@ -330,6 +332,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
 	spin_unlock(&irq->irq_lock);
 	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
 
+	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
 	kvm_vcpu_kick(vcpu);
 
 	return true;
@@ -654,6 +657,9 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
 	spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock);
 	vgic_flush_lr_state(vcpu);
 	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
+
+	/* The GIC is now ready to deliver the IRQ. */
+	clear_bit(KVM_REQ_IRQ_PENDING, &vcpu->requests);
 }
 
 int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
@@ -691,8 +697,10 @@ void vgic_kick_vcpus(struct kvm *kvm)
 	 * a good kick...
 	 */
 	kvm_for_each_vcpu(c, vcpu, kvm) {
-		if (kvm_vgic_vcpu_pending_irq(vcpu))
+		if (kvm_vgic_vcpu_pending_irq(vcpu)) {
+			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
 			kvm_vcpu_kick(vcpu);
+		}
 	}
 }
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH v2 7/9] KVM: arm/arm64: PMU: remove request-less vcpu kick
  2017-03-31 16:06 [PATCH v2 0/9] KVM: arm/arm64: race fixes and vcpu requests Andrew Jones
                   ` (5 preceding siblings ...)
  2017-03-31 16:06 ` [PATCH v2 6/9] KVM: arm/arm64: use a vcpu request on irq injection Andrew Jones
@ 2017-03-31 16:06 ` Andrew Jones
  2017-04-04 17:46   ` Christoffer Dall
  2017-03-31 16:06 ` [PATCH v2 8/9] KVM: arm/arm64: fix race in kvm_psci_vcpu_on Andrew Jones
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 85+ messages in thread
From: Andrew Jones @ 2017-03-31 16:06 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: marc.zyngier, cdall, pbonzini

Refactor PMU overflow handling in order to remove the request-less
vcpu kick.  Now, since kvm_vgic_inject_irq() uses vcpu requests,
there is no chance that a kick sent at just the wrong time (after the
VCPU's call to kvm_pmu_flush_hwstate() but before it enters guest
mode) results in the guest failing to see updated GIC state until its
next exit, some time later, for some other reason.

Signed-off-by: Andrew Jones <drjones@redhat.com>
---
 virt/kvm/arm/pmu.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index 69ccce308458..9d725f3afb11 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -203,6 +203,19 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
 	return reg;
 }
 
+static void kvm_pmu_check_overflow(struct kvm_vcpu *vcpu)
+{
+	struct kvm_pmu *pmu = &vcpu->arch.pmu;
+	bool overflow;
+
+	overflow = !!kvm_pmu_overflow_status(vcpu);
+	if (pmu->irq_level != overflow) {
+		pmu->irq_level = overflow;
+		kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
+				    pmu->irq_num, overflow);
+	}
+}
+
 /**
  * kvm_pmu_overflow_set - set PMU overflow interrupt
  * @vcpu: The vcpu pointer
@@ -210,31 +223,19 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
  */
 void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val)
 {
-	u64 reg;
-
 	if (val == 0)
 		return;
 
 	vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= val;
-	reg = kvm_pmu_overflow_status(vcpu);
-	if (reg != 0)
-		kvm_vcpu_kick(vcpu);
+	kvm_pmu_check_overflow(vcpu);
 }
 
 static void kvm_pmu_update_state(struct kvm_vcpu *vcpu)
 {
-	struct kvm_pmu *pmu = &vcpu->arch.pmu;
-	bool overflow;
-
 	if (!kvm_arm_pmu_v3_ready(vcpu))
 		return;
 
-	overflow = !!kvm_pmu_overflow_status(vcpu);
-	if (pmu->irq_level != overflow) {
-		pmu->irq_level = overflow;
-		kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
-				    pmu->irq_num, overflow);
-	}
+	kvm_pmu_check_overflow(vcpu);
 }
 
 /**
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH v2 8/9] KVM: arm/arm64: fix race in kvm_psci_vcpu_on
  2017-03-31 16:06 [PATCH v2 0/9] KVM: arm/arm64: race fixes and vcpu requests Andrew Jones
                   ` (6 preceding siblings ...)
  2017-03-31 16:06 ` [PATCH v2 7/9] KVM: arm/arm64: PMU: remove request-less vcpu kick Andrew Jones
@ 2017-03-31 16:06 ` Andrew Jones
  2017-04-04 19:42   ` Christoffer Dall
  2017-03-31 16:06 ` [PATCH v2 9/9] KVM: arm/arm64: avoid race by caching MPIDR Andrew Jones
  2017-04-03 15:28 ` [PATCH v2 0/9] KVM: arm/arm64: race fixes and vcpu requests Christoffer Dall
  9 siblings, 1 reply; 85+ messages in thread
From: Andrew Jones @ 2017-03-31 16:06 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: cdall, marc.zyngier, pbonzini, rkrcmar, Levente Kurusa

From: Levente Kurusa <lkurusa@redhat.com>

When two vcpus issue PSCI_CPU_ON targeting the same vcpu at the same
time, it's possible for both of them to enter the target vcpu's setup
concurrently. This results in unexpected behavior at best, and the
potential for some nasty bugs at worst.

Signed-off-by: Levente Kurusa <lkurusa@redhat.com>
Signed-off-by: Andrew Jones <drjones@redhat.com>
---
 arch/arm/kvm/psci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
index f732484abc7a..0204daa899b1 100644
--- a/arch/arm/kvm/psci.c
+++ b/arch/arm/kvm/psci.c
@@ -88,7 +88,8 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 	 */
 	if (!vcpu)
 		return PSCI_RET_INVALID_PARAMS;
-	if (!test_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
+
+	if (!test_and_clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
 		if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
 			return PSCI_RET_ALREADY_ON;
 		else
@@ -116,7 +117,6 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 	 * the general puspose registers are undefined upon CPU_ON.
 	 */
 	vcpu_set_reg(vcpu, 0, context_id);
-	clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
 
 	wq = kvm_arch_vcpu_wq(vcpu);
 	swake_up(wq);
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 85+ messages in thread

* [PATCH v2 9/9] KVM: arm/arm64: avoid race by caching MPIDR
  2017-03-31 16:06 [PATCH v2 0/9] KVM: arm/arm64: race fixes and vcpu requests Andrew Jones
                   ` (7 preceding siblings ...)
  2017-03-31 16:06 ` [PATCH v2 8/9] KVM: arm/arm64: fix race in kvm_psci_vcpu_on Andrew Jones
@ 2017-03-31 16:06 ` Andrew Jones
  2017-04-04 19:44   ` Christoffer Dall
  2017-04-03 15:28 ` [PATCH v2 0/9] KVM: arm/arm64: race fixes and vcpu requests Christoffer Dall
  9 siblings, 1 reply; 85+ messages in thread
From: Andrew Jones @ 2017-03-31 16:06 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: cdall, marc.zyngier, pbonzini, rkrcmar

Cache the MPIDR in the vcpu structure to fix potential races that
can arise between vcpu reset and the extraction of the MPIDR from
the sys-reg array.

Signed-off-by: Andrew Jones <drjones@redhat.com>
---
 arch/arm/include/asm/kvm_emulate.h   |  2 +-
 arch/arm/include/asm/kvm_host.h      |  3 +++
 arch/arm/kvm/coproc.c                | 20 ++++++++++++--------
 arch/arm64/include/asm/kvm_emulate.h |  2 +-
 arch/arm64/include/asm/kvm_host.h    |  3 +++
 arch/arm64/kvm/sys_regs.c            | 27 ++++++++++++++-------------
 6 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 9a8a45aaf19a..1b922de46785 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -213,7 +213,7 @@ static inline u32 kvm_vcpu_hvc_get_imm(struct kvm_vcpu *vcpu)
 
 static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
 {
-	return vcpu_cp15(vcpu, c0_MPIDR) & MPIDR_HWID_BITMASK;
+	return vcpu->arch.vmpidr & MPIDR_HWID_BITMASK;
 }
 
 static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 0b8a6d6b3cb3..e0f461f0af67 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -151,6 +151,9 @@ struct kvm_vcpu_arch {
 	/* The CPU type we expose to the VM */
 	u32 midr;
 
+	/* vcpu MPIDR */
+	u32 vmpidr;
+
 	/* HYP trapping configuration */
 	u32 hcr;
 
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 3e5e4194ef86..c4df7c9c8ddb 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -101,14 +101,18 @@ int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 static void reset_mpidr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
 {
-	/*
-	 * Compute guest MPIDR. We build a virtual cluster out of the
-	 * vcpu_id, but we read the 'U' bit from the underlying
-	 * hardware directly.
-	 */
-	vcpu_cp15(vcpu, c0_MPIDR) = ((read_cpuid_mpidr() & MPIDR_SMP_BITMASK) |
-				     ((vcpu->vcpu_id >> 2) << MPIDR_LEVEL_BITS) |
-				     (vcpu->vcpu_id & 3));
+	if (!vcpu->arch.vmpidr) {
+		/*
+		 * Compute guest MPIDR. We build a virtual cluster out of the
+		 * vcpu_id, but we read the 'U' bit from the underlying
+		 * hardware directly.
+		 */
+		u32 mpidr = ((read_cpuid_mpidr() & MPIDR_SMP_BITMASK) |
+			     ((vcpu->vcpu_id >> 2) << MPIDR_LEVEL_BITS) |
+			     (vcpu->vcpu_id & 3));
+		vcpu->arch.vmpidr = mpidr;
+	}
+	vcpu_cp15(vcpu, c0_MPIDR) = vcpu->arch.vmpidr;
 }
 
 /* TRM entries A7:4.3.31 A15:4.3.28 - RO WI */
diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
index f5ea0ba70f07..c138bb15b507 100644
--- a/arch/arm64/include/asm/kvm_emulate.h
+++ b/arch/arm64/include/asm/kvm_emulate.h
@@ -242,7 +242,7 @@ static inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vcpu)
 
 static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
 {
-	return vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
+	return vcpu->arch.vmpidr_el2 & MPIDR_HWID_BITMASK;
 }
 
 static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7057512b3474..268c10d95a79 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -198,6 +198,9 @@ typedef struct kvm_cpu_context kvm_cpu_context_t;
 struct kvm_vcpu_arch {
 	struct kvm_cpu_context ctxt;
 
+	/* vcpu MPIDR */
+	u64 vmpidr_el2;
+
 	/* HYP configuration */
 	u64 hcr_el2;
 	u32 mdcr_el2;
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 0e26f8c2b56f..517aed6d8016 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -431,19 +431,20 @@ static void reset_amair_el1(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 
 static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
 {
-	u64 mpidr;
-
-	/*
-	 * Map the vcpu_id into the first three affinity level fields of
-	 * the MPIDR. We limit the number of VCPUs in level 0 due to a
-	 * limitation to 16 CPUs in that level in the ICC_SGIxR registers
-	 * of the GICv3 to be able to address each CPU directly when
-	 * sending IPIs.
-	 */
-	mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
-	mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
-	mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
-	vcpu_sys_reg(vcpu, MPIDR_EL1) = (1ULL << 31) | mpidr;
+	if (!vcpu->arch.vmpidr_el2) {
+		/*
+		 * Map the vcpu_id into the first three affinity level fields
+		 * of the MPIDR. We limit the number of VCPUs in level 0 due to
+		 * a limitation of 16 CPUs in that level in the ICC_SGIxR
+		 * registers of the GICv3, which are used to address each CPU
+		 * directly when sending IPIs.
+		 */
+		u64 mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
+		mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
+		mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
+		vcpu->arch.vmpidr_el2 = (1ULL << 31) | mpidr;
+	}
+	vcpu_sys_reg(vcpu, MPIDR_EL1) = vcpu->arch.vmpidr_el2;
 }
 
 static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 0/9] KVM: arm/arm64: race fixes and vcpu requests
  2017-03-31 16:06 [PATCH v2 0/9] KVM: arm/arm64: race fixes and vcpu requests Andrew Jones
                   ` (8 preceding siblings ...)
  2017-03-31 16:06 ` [PATCH v2 9/9] KVM: arm/arm64: avoid race by caching MPIDR Andrew Jones
@ 2017-04-03 15:28 ` Christoffer Dall
  2017-04-03 17:11   ` Paolo Bonzini
  2017-04-04  7:27   ` Andrew Jones
  9 siblings, 2 replies; 85+ messages in thread
From: Christoffer Dall @ 2017-04-03 15:28 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

Hi Drew,

On Fri, Mar 31, 2017 at 06:06:49PM +0200, Andrew Jones wrote:
> This series fixes some hard to produce races by introducing the use of
> vcpu requests.  It also fixes a couple easier to produce races, ones
> that have been produced with the PSCI kvm-unit-test test.  The easy two
> are addressed in two different ways: the first takes advantage of
> power_off having been changed to a vcpu request, the second caches vcpu
> MPIDRs in order to avoid extracting them from sys_regs.  I've tested the
> series on a Mustang and a ThunderX and compile-tested the ARM bits.
> 
> Patch 2/9 adds documentation, as, at least for me, understanding vcpu
> request interplay with vcpu kicks and vcpu mode and the memory barriers
> that interplay implies, is exhausting.  Hopefully the document is useful
> to others.  I'm not married to it though, so it can be deferred/dropped
> as people like...

Sounds helpful, I'll have a look.

> 
> v2:
>   - No longer based on Radim's vcpu request API rework[1], except for
>     including "add kvm_request_pending" as patch 1/9 [drew]

I lost track here; did those patches get merged or dropped and why are
we not basing this work on them anymore, and should patch 1/9 be applied
here or is it expected to land in the KVM tree via some other path?

>   - Added vcpu request documentation [drew]
>   - Dropped the introduction of user settable MPIDRs [Christoffer]
>   - Added vcpu requests to all request-less vcpu kicks [Christoffer]
> 

Didn't we also have an issue with a missing barrier if the cmpxchg
operation doesn't succeed?  Did that fall through the cracks or is it
just missing in the changelog?

Thanks,
-Christoffer

> [1] https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1340496.html
> 
> Andrew Jones (7):
>   KVM: Add documentation for VCPU requests
>   KVM: arm/arm64: prepare to use vcpu requests
>   KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
>   KVM: arm/arm64: replace vcpu->arch.power_off with a vcpu request
>   KVM: arm/arm64: use a vcpu request on irq injection
>   KVM: arm/arm64: PMU: remove request-less vcpu kick
>   KVM: arm/arm64: avoid race by caching MPIDR
> 
> Levente Kurusa (1):
>   KVM: arm/arm64: fix race in kvm_psci_vcpu_on
> 
> Radim Krčmář (1):
>   KVM: add kvm_request_pending
> 
>  Documentation/virtual/kvm/vcpu-requests.rst | 114 ++++++++++++++++++++++++++++
>  arch/arm/include/asm/kvm_emulate.h          |   2 +-
>  arch/arm/include/asm/kvm_host.h             |  13 ++--
>  arch/arm/kvm/arm.c                          |  68 +++++++++++------
>  arch/arm/kvm/coproc.c                       |  20 +++--
>  arch/arm/kvm/handle_exit.c                  |   1 +
>  arch/arm/kvm/psci.c                         |  18 ++---
>  arch/arm64/include/asm/kvm_emulate.h        |   2 +-
>  arch/arm64/include/asm/kvm_host.h           |  13 ++--
>  arch/arm64/kvm/handle_exit.c                |   1 +
>  arch/arm64/kvm/sys_regs.c                   |  27 +++----
>  arch/mips/kvm/trap_emul.c                   |   2 +-
>  arch/powerpc/kvm/booke.c                    |   2 +-
>  arch/powerpc/kvm/powerpc.c                  |   5 +-
>  arch/s390/kvm/kvm-s390.c                    |   2 +-
>  arch/x86/kvm/x86.c                          |   4 +-
>  include/linux/kvm_host.h                    |   5 ++
>  virt/kvm/arm/arch_timer.c                   |   1 +
>  virt/kvm/arm/pmu.c                          |  29 +++----
>  virt/kvm/arm/vgic/vgic.c                    |  12 ++-
>  20 files changed, 245 insertions(+), 96 deletions(-)
>  create mode 100644 Documentation/virtual/kvm/vcpu-requests.rst
> 
> -- 
> 2.9.3
> 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 0/9] KVM: arm/arm64: race fixes and vcpu requests
  2017-04-03 15:28 ` [PATCH v2 0/9] KVM: arm/arm64: race fixes and vcpu requests Christoffer Dall
@ 2017-04-03 17:11   ` Paolo Bonzini
  2017-04-04  7:27   ` Andrew Jones
  1 sibling, 0 replies; 85+ messages in thread
From: Paolo Bonzini @ 2017-04-03 17:11 UTC (permalink / raw)
  To: Christoffer Dall, Andrew Jones; +Cc: kvmarm, kvm, marc.zyngier, rkrcmar



On 03/04/2017 17:28, Christoffer Dall wrote:
>>   - No longer based on Radim's vcpu request API rework[1], except for
>>     including "add kvm_request_pending" as patch 1/9 [drew]
> 
> I lost track here; did those patches get merged or dropped and why are
> we not basing this work on them anymore, and should patch 1/9 be applied
> here or is it expected to land in the KVM tree via some other path?

Feel free to apply it, since any conflicts from those patches would not
reach Linus.

Paolo

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 0/9] KVM: arm/arm64: race fixes and vcpu requests
  2017-04-03 15:28 ` [PATCH v2 0/9] KVM: arm/arm64: race fixes and vcpu requests Christoffer Dall
  2017-04-03 17:11   ` Paolo Bonzini
@ 2017-04-04  7:27   ` Andrew Jones
  2017-04-04 16:05     ` Christoffer Dall
  1 sibling, 1 reply; 85+ messages in thread
From: Andrew Jones @ 2017-04-04  7:27 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: marc.zyngier, pbonzini, kvmarm, kvm

On Mon, Apr 03, 2017 at 05:28:45PM +0200, Christoffer Dall wrote:
> Hi Drew,
> 
> On Fri, Mar 31, 2017 at 06:06:49PM +0200, Andrew Jones wrote:
> > This series fixes some hard to produce races by introducing the use of
> > vcpu requests.  It also fixes a couple easier to produce races, ones
> > that have been produced with the PSCI kvm-unit-test test.  The easy two
> > are addressed in two different ways: the first takes advantage of
> > power_off having been changed to a vcpu request, the second caches vcpu
> > MPIDRs in order to avoid extracting them from sys_regs.  I've tested the
> > series on a Mustang and a ThunderX and compile-tested the ARM bits.
> > 
> > Patch 2/9 adds documentation, as, at least for me, understanding vcpu
> > request interplay with vcpu kicks and vcpu mode and the memory barriers
> > that interplay implies, is exhausting.  Hopefully the document is useful
> > to others.  I'm not married to it though, so it can be deferred/dropped
> > as people like...
> 
> Sounds helpful, I'll have a look.
> 
> > 
> > v2:
> >   - No longer based on Radim's vcpu request API rework[1], except for
> >     including "add kvm_request_pending" as patch 1/9 [drew]
> 
> I lost track here; did those patches get merged or dropped and why are
> we not basing this work on them anymore, and should patch 1/9 be applied
> here or is it expected to land in the KVM tree via some other path?

I think Radim still wants to rework the API, but, as his work doesn't
provide fixes or functional changes, his timeline may not be the same
as for this series.  He also wants to expand his rework to add an API
that combines kicking with requesting.  I'm not sure how all that will
look yet, so, in the end, I decided I might as well just use the current
API for now.  kvm_request_pending() was too nice an addition to drop,
though.

> 
> >   - Added vcpu request documentation [drew]
> >   - Dropped the introduction of user settable MPIDRs [Christoffer]
> >   - Added vcpu requests to all request-less vcpu kicks [Christoffer]
> > 
> 
> Didn't we also have an issue with a missing barrier if the cmpxchg
> operation doesn't succeed?  Did that fall through the cracks or is it
> just missing in the changelog?

Just missing from the changelog. Sorry about that.

  - Ensure we have a read barrier (or equivalent) prior to issuing the
    cmpxchg in kvm_vcpu_exiting_guest_mode(), as a failed cmpxchg does
    not guarantee any barrier [Christoffer]

Thanks,
drew

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  2017-03-31 16:06 ` [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request Andrew Jones
@ 2017-04-04 13:39   ` Marc Zyngier
  2017-04-04 14:47     ` Andrew Jones
  2017-04-04 16:04   ` Christoffer Dall
  1 sibling, 1 reply; 85+ messages in thread
From: Marc Zyngier @ 2017-04-04 13:39 UTC (permalink / raw)
  To: Andrew Jones, kvmarm, kvm; +Cc: cdall, pbonzini, rkrcmar

On 31/03/17 17:06, Andrew Jones wrote:
> This not only ensures visibility of changes to pause by using
> atomic ops, but also plugs a small race where a vcpu could get its
> pause state enabled just after its last check before entering the
> guest. With this patch, while the vcpu will still initially enter
> the guest, it will exit immediately due to the IPI sent by the vcpu
> kick issued after making the vcpu request.
> 
> We use bitops, rather than kvm_make/check_request(), because we
> don't need the barriers they provide, nor do we want the side-effect
> of kvm_check_request() clearing the request. For pause, only the
> requester should do the clearing.
> 
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
>  arch/arm/include/asm/kvm_host.h   |  5 +----
>  arch/arm/kvm/arm.c                | 45 +++++++++++++++++++++++++++------------
>  arch/arm64/include/asm/kvm_host.h |  5 +----
>  3 files changed, 33 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 31ee468ce667..52c25536d254 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -45,7 +45,7 @@
>  #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
>  #endif
>  
> -#define KVM_REQ_VCPU_EXIT	8
> +#define KVM_REQ_PAUSE		8

Small nit: can we have a #define for this 8? KVM_REQ_ARCH_BASE, or
something along those lines?

I've otherwise started hammering this series over a number of systems,
looking good so far.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  2017-04-04 13:39   ` Marc Zyngier
@ 2017-04-04 14:47     ` Andrew Jones
  2017-04-04 14:51       ` Paolo Bonzini
  0 siblings, 1 reply; 85+ messages in thread
From: Andrew Jones @ 2017-04-04 14:47 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvmarm, kvm, cdall, pbonzini, rkrcmar

On Tue, Apr 04, 2017 at 02:39:19PM +0100, Marc Zyngier wrote:
> On 31/03/17 17:06, Andrew Jones wrote:
> > This not only ensures visibility of changes to pause by using
> > atomic ops, but also plugs a small race where a vcpu could get its
> > pause state enabled just after its last check before entering the
> > guest. With this patch, while the vcpu will still initially enter
> > the guest, it will exit immediately due to the IPI sent by the vcpu
> > kick issued after making the vcpu request.
> > 
> > We use bitops, rather than kvm_make/check_request(), because we
> > don't need the barriers they provide, nor do we want the side-effect
> > of kvm_check_request() clearing the request. For pause, only the
> > requester should do the clearing.
> > 
> > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > ---
> >  arch/arm/include/asm/kvm_host.h   |  5 +----
> >  arch/arm/kvm/arm.c                | 45 +++++++++++++++++++++++++++------------
> >  arch/arm64/include/asm/kvm_host.h |  5 +----
> >  3 files changed, 33 insertions(+), 22 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > index 31ee468ce667..52c25536d254 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -45,7 +45,7 @@
> >  #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
> >  #endif
> >  
> > -#define KVM_REQ_VCPU_EXIT	8
> > +#define KVM_REQ_PAUSE		8
> 
> Small nit: can we have a #define for this 8? KVM_REQ_ARCH_BASE, or
> something along those lines?

Sounds good to me.  Should I even do something like

 #define KVM_REQ_ARCH_BASE 8

 #define KVM_ARCH_REQ(bit) ({ \
     BUILD_BUG_ON(((bit) + KVM_REQ_ARCH_BASE) >= BITS_PER_LONG); \
     ((bit) + KVM_REQ_ARCH_BASE); \
 })

 #define KVM_REQ_PAUSE KVM_ARCH_REQ(0)

or would that be overkill?  Also, whether we switch to just the base
define or to the macro, I guess it would be good to do it for all
architectures.

Thanks,
drew

> 
> I've otherwise started hammering this series over a number of systems,
> looking good so far.
> 
> Thanks,
> 
> 	M.
> -- 
> Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  2017-04-04 14:47     ` Andrew Jones
@ 2017-04-04 14:51       ` Paolo Bonzini
  2017-04-04 15:05         ` Marc Zyngier
  2017-04-04 17:07         ` Andrew Jones
  0 siblings, 2 replies; 85+ messages in thread
From: Paolo Bonzini @ 2017-04-04 14:51 UTC (permalink / raw)
  To: Andrew Jones, Marc Zyngier; +Cc: cdall, kvmarm, kvm



On 04/04/2017 16:47, Andrew Jones wrote:
>>> -#define KVM_REQ_VCPU_EXIT	8
>>> +#define KVM_REQ_PAUSE		8
>> Small nit: can we have a #define for this 8? KVM_REQ_ARCH_BASE, or
>> something along those lines?
> Sounds good to me.  Should I even do something like
> 
>  #define KVM_REQ_ARCH_BASE 8
> 
>  #define KVM_ARCH_REQ(bit) ({ \
>      BUILD_BUG_ON(((bit) + KVM_REQ_ARCH_BASE) >= BITS_PER_LONG); \

Please make this 32 so that we don't fail on 32-bit machines.

or even

BUILD_BUG_ON((unsigned)(bit) >= BITS_PER_LONG - KVM_REQ_ARCH_BASE);

in case someone is crazy enough to pass a negative value!

Paolo

>      ((bit) + KVM_REQ_ARCH_BASE); \
>  })
> 
>  #define KVM_REQ_PAUSE KVM_ARCH_REQ(0)
> 
> or would that be overkill?  Also, whether we switch to just the base
> define, or the macro, I guess it would be good to do for all
> architectures.

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  2017-04-04 14:51       ` Paolo Bonzini
@ 2017-04-04 15:05         ` Marc Zyngier
  2017-04-04 17:07         ` Andrew Jones
  1 sibling, 0 replies; 85+ messages in thread
From: Marc Zyngier @ 2017-04-04 15:05 UTC (permalink / raw)
  To: Paolo Bonzini, Andrew Jones; +Cc: kvmarm, kvm, cdall, rkrcmar

On 04/04/17 15:51, Paolo Bonzini wrote:
> 
> 
> On 04/04/2017 16:47, Andrew Jones wrote:
>>>> -#define KVM_REQ_VCPU_EXIT	8
>>>> +#define KVM_REQ_PAUSE		8
>>> Small nit: can we have a #define for this 8? KVM_REQ_ARCH_BASE, or
>>> something along those lines?
>> Sounds good to me.  Should I even do something like
>>
>>  #define KVM_REQ_ARCH_BASE 8
>>
>>  #define KVM_ARCH_REQ(bit) ({ \
>>      BUILD_BUG_ON(((bit) + KVM_REQ_ARCH_BASE) >= BITS_PER_LONG); \
> 
> Please make this 32 so that we don't fail on 32-bit machines.
> 
> or even
> 
> BUILD_BUG_ON((unsigned)(bit) >= BITS_PER_LONG - KVM_REQ_ARCH_BASE);
> 
> in case someone is crazy enough to pass a negative value!
> 
> Paolo
> 
>>      ((bit) + KVM_REQ_ARCH_BASE); \
>>  })
>>
>>  #define KVM_REQ_PAUSE KVM_ARCH_REQ(0)
>>
>> or would that be overkill?  Also, whether we switch to just the base
>> define, or the macro, I guess it would be good to do for all
>> architectures.
> 

Both suggestions look good to me.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 2/9] KVM: Add documentation for VCPU requests
  2017-03-31 16:06 ` [PATCH v2 2/9] KVM: Add documentation for VCPU requests Andrew Jones
@ 2017-04-04 15:24   ` Christoffer Dall
  2017-04-04 17:06     ` Andrew Jones
  2017-04-06 10:18   ` Christian Borntraeger
  1 sibling, 1 reply; 85+ messages in thread
From: Christoffer Dall @ 2017-04-04 15:24 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

Hi Drew,

On Fri, Mar 31, 2017 at 06:06:51PM +0200, Andrew Jones wrote:
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
>  Documentation/virtual/kvm/vcpu-requests.rst | 114 ++++++++++++++++++++++++++++
>  1 file changed, 114 insertions(+)
>  create mode 100644 Documentation/virtual/kvm/vcpu-requests.rst
> 
> diff --git a/Documentation/virtual/kvm/vcpu-requests.rst b/Documentation/virtual/kvm/vcpu-requests.rst
> new file mode 100644
> index 000000000000..ea4a966d5c8a
> --- /dev/null
> +++ b/Documentation/virtual/kvm/vcpu-requests.rst
> @@ -0,0 +1,114 @@
> +=================
> +KVM VCPU Requests
> +=================
> +
> +Overview
> +========
> +
> +KVM supports an internal API enabling threads to request a VCPU thread to
> +perform some activity.  For example, a thread may request a VCPU to flush
> +its TLB with a VCPU request.  The API consists of only four calls::
> +
> +  /* Check if VCPU @vcpu has request @req pending. Clears the request. */
> +  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);
> +
> +  /* Check if any requests are pending for VCPU @vcpu. */
> +  bool kvm_request_pending(struct kvm_vcpu *vcpu);
> +
> +  /* Make request @req of VCPU @vcpu. */
> +  void kvm_make_request(int req, struct kvm_vcpu *vcpu);
> +
> +  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
> +  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
> +
> +Typically a requester wants the VCPU to perform the activity as soon
> +as possible after making the request.  This means most requests,
> +kvm_make_request() calls, are followed by a call to kvm_vcpu_kick(),
> +and kvm_make_all_cpus_request() has the kicking of all VCPUs built
> +into it.
> +
> +VCPU Kicks
> +----------
> +
> +A VCPU kick does one of three things:
> +
> + 1) wakes a sleeping VCPU (which sleeps outside guest mode).

You could clarify this to say that a sleeping VCPU is a VCPU thread
which is not runnable and placed on waitqueue, and waking it makes
the thread runnable again.

> + 2) sends an IPI to a VCPU currently in guest mode, in order to bring it
> +    out.
> + 3) nothing, when the VCPU is already outside guest mode and not sleeping.
> +
> +VCPU Request Internals
> +======================
> +
> +VCPU requests are simply bit indices of the vcpu->requests bitmap.  This
> +means general bitops[1], e.g. clear_bit(KVM_REQ_UNHALT, &vcpu->requests),
> +may also be used.  The first 8 bits are reserved for architecture
> +independent requests, all additional bits are available for architecture
> +dependent requests.

Should we explain the ones that are generically defined and how they're
supposed to be used?  For example, we don't use them on ARM, and I don't
think I understand why another thread would ever make a PENDING_TIMER
request on a vcpu?
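
To make the bit layout concrete, here is a minimal userspace sketch of the
scheme the document describes (purely illustrative: the names and helpers
below are stand-ins for the kernel's atomic kvm_make_request() and
kvm_check_request(), not the real implementations):

```c
#include <assert.h>

/* Illustrative model: the first 8 request bits are generic, the rest
 * are architecture specific, as the text above says. */
#define REQ_ARCH_BASE 8
#define REQ_ARCH(bit) ((bit) + REQ_ARCH_BASE)

static void make_request(int req, unsigned long *requests)
{
	*requests |= 1UL << req;		/* set_bit() in the kernel */
}

static int check_request(int req, unsigned long *requests)
{
	if (*requests & (1UL << req)) {
		*requests &= ~(1UL << req);	/* checking also clears */
		return 1;
	}
	return 0;
}
```

Note how a successful check clears the bit, which is why the document
warns that kvm_check_request() has a side effect.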

> +
> +VCPU Requests with Associated State
> +===================================
> +
> +Requesters that want the requested VCPU to handle new state need to ensure
> +the state is observable to the requested VCPU thread's CPU at the time the

nit: need to ensure that the newly written state is observable ... by
the time it observes the request.

> +CPU observes the request.  This means a write memory barrier should be
                                                                 ^^^
							         must

> +insert between the preparation of the state and the write of the VCPU
    ^^^
   inserted

I would rephrase this as: '... after writing the new state to memory and
before setting the VCPU request bit.'


> +request bitmap.  Additionally, on the requested VCPU thread's side, a
> +corresponding read barrier should be issued after reading the request bit
                                ^^^       ^^^
			       must      inserted (for consistency)



> +and before proceeding to use the state associated with it.  See the kernel
                            ^^^    ^
		           read    new


> +memory barrier documentation [2].

I think it would be great if this document explains if this is currently
taken care of by the API you explain above or if there are cases where
people have to explicitly insert these barriers, and in that case, which
barriers they should use (if we know at this point already).
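
The write-barrier/read-barrier pairing described above can be sketched in
userspace with C11 release/acquire ordering standing in for
smp_wmb()/smp_rmb() (an illustrative model only; the kernel uses plain
accesses plus explicit barriers, and the names here are invented):

```c
#include <assert.h>
#include <stdatomic.h>

static int shared_state;	/* the state associated with the request */
static atomic_ulong requests;

/* Requester: publish the state first, then set the request bit with
 * release semantics, so the state is observable before the bit is. */
static void requester_publish(int new_state, int req)
{
	shared_state = new_state;
	atomic_fetch_or_explicit(&requests, 1UL << req,
				 memory_order_release);
}

/* VCPU side: read the bit with acquire semantics before touching the
 * state; only then is reading shared_state safe. */
static int vcpu_consume(int req)
{
	unsigned long pending =
		atomic_load_explicit(&requests, memory_order_acquire);

	if (pending & (1UL << req))
		return shared_state;
	return -1;
}
```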

> +
> +VCPU Requests and Guest Mode
> +============================
> +

I feel like an intro about the overall goal here is missing.  How about
something like this:

  When making requests to VCPUs, we want to avoid the receiving VCPU
  executing inside the guest for an arbitrary long time without handling
  the request.  The way we prevent this from happening is by keeping
  track of when a VCPU is running and sending an IPI to the physical CPU
  running the VCPU when that is the case.  However, each architecture
  implementation of KVM must take great care to ensure that requests are
  not missed when a VCPU stops running at the same time as a request
  is received.

Also, I'm not sure what the semantics are with kvm_vcpu_block().  Is it
ok to send a request to a VCPU and then the VCPU blocks and goes to
sleep forever even though there are pending requests?
kvm_vcpu_check_block() doesn't seem to check vcpu->requests which would
indicate that this is the case, but maybe architectures that actually do
use requests implement something else themselves?

> +As long as the guest is either in guest mode, in which case it gets an IPI

guest is in guest mode?

Perhaps this could be more clearly written as:

As long as the VCPU is running, it is marked as having vcpu->mode =
IN_GUEST_MODE.  A requesting thread observing IN_GUEST_MODE will send an
IPI to the CPU running the VCPU thread.  On the other hand, when a
requesting thread observes vcpu->mode == OUTSIDE_GUEST_MODE, it will not send
any IPIs, but will simply set the request bit, and the VCPU thread will be
able to check the requests before running the VCPU again.  However, the
transition...

> +and will definitely see the request, or is outside guest mode, but has yet
> +to do its final request check, and therefore when it does, it will see the
> +request, then things will work.  However, the transition from outside to
> +inside guest mode, after the last request check has been made, opens a
> +window where a request could be made, but the VCPU would not see until it
> +exits guest mode some time later.  See the table below.

This text, and the table below, only deals with the details of entering
the guest.  Should we talk about kvm_vcpu_exiting_guest_mode() and
anything related to exiting the guest?

> +
> ++------------------+-----------------+----------------+--------------+
> +| vcpu->mode       | done last check | kick sends IPI | request seen |
> ++==================+=================+================+==============+
> +| IN_GUEST_MODE    |      N/A        |      YES       |     YES      |
> ++------------------+-----------------+----------------+--------------+
> +| !IN_GUEST_MODE   |      NO         |      NO        |     YES      |
> ++------------------+-----------------+----------------+--------------+
> +| !IN_GUEST_MODE   |      YES        |      NO        |     NO       |
> ++------------------+-----------------+----------------+--------------+
> +
> +To ensure the third scenario shown in the table above cannot happen, we
> +need to ensure the VCPU's mode change is observable by all CPUs prior to
> +its final request check and that a requester's request is observable by
> +the requested VCPU prior to the kick.  To do that we need general memory
> +barriers between each pair of operations involving mode and requests, i.e.
> +
> +  CPU_i                                  CPU_j
> +-------------------------------------------------------------------------
> +  vcpu->mode = IN_GUEST_MODE;            kvm_make_request(REQ, vcpu);
> +  smp_mb();                              smp_mb();
> +  if (kvm_request_pending(vcpu))         if (vcpu->mode == IN_GUEST_MODE)
> +      handle_requests();                     send_IPI(vcpu->cpu);
> +
> +Whether explicit barriers are needed, or reliance on implicit barriers is
> +sufficient, is architecture dependent.  Alternatively, an architecture may
> +choose to just always send the IPI, as not sending it, when it's not
> +necessary, is just an optimization.

Is this universally true?  This is certainly true on ARM, because we
disable interrupts before doing all this, so the IPI remains pending and
causes an immediate exit, but if any of the above is done with
interrupts enabled, just sending an IPI does nothing to ensure the
request is observed.  Perhaps this is not a case we should care about.
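
The CPU_i/CPU_j pairing quoted above can also be modeled in userspace with
C11 seq_cst atomics. This sketch only exercises the two race-free
sequential interleavings (it does not, of course, prove the concurrent
case), and all names are illustrative:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* mode: 0 = OUTSIDE_GUEST_MODE, 1 = IN_GUEST_MODE */
static atomic_int mode;
static atomic_ulong requests;

/* VCPU side: announce guest mode, then do the final request check.
 * The seq_cst store/load pair plays the role of smp_mb(). */
static bool vcpu_entry_sees_request(void)
{
	atomic_store(&mode, 1);
	return atomic_load(&requests) != 0;
}

/* Requester side: post the request, then decide whether to kick. */
static bool requester_sends_ipi(void)
{
	atomic_fetch_or(&requests, 1UL << 8);
	return atomic_load(&mode) == 1;
}

static void reset(void)
{
	atomic_store(&mode, 0);
	atomic_store(&requests, 0);
}
```

In every ordering at least one side notices: either the entry check sees
the request, or the requester sees IN_GUEST_MODE and sends the IPI.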

> +
> +Additionally, the error prone third scenario described above also exhibits
> +why a request-less VCPU kick is almost never correct.  Without the
> +assurance that a non-IPI generating kick will still result in an action by
> +the requested VCPU, as the final kvm_request_pending() check does, then
> +the kick may not initiate anything useful at all.  If, for instance, a
> +request-less kick was made to a VCPU that was just about to set its mode
> +to IN_GUEST_MODE, meaning no IPI is sent, then the VCPU may continue its
> +entry without actually having done whatever it was the kick was meant to
> +initiate.

Indeed.


> +
> +References
> +==========
> +
> +[1] Documentation/core-api/atomic_ops.rst
> +[2] Documentation/memory-barriers.txt
> -- 
> 2.9.3
> 

This is a great writeup!  I enjoyed reading it and it made me think more
carefully about a number of things, so I definitely think we should
merge this.

Thanks,
-Christoffer

* Re: [PATCH v2 1/9] KVM: add kvm_request_pending
  2017-03-31 16:06 ` [PATCH v2 1/9] KVM: add kvm_request_pending Andrew Jones
@ 2017-04-04 15:30   ` Christoffer Dall
  2017-04-04 16:41     ` Andrew Jones
  0 siblings, 1 reply; 85+ messages in thread
From: Christoffer Dall @ 2017-04-04 15:30 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Fri, Mar 31, 2017 at 06:06:50PM +0200, Andrew Jones wrote:
> From: Radim Krčmář <rkrcmar@redhat.com>
> 
> A first step in vcpu->requests encapsulation.

Could we have a note here on why we need to access vcpu->requests using
READ_ONCE now?

Thanks,
-Christoffer

> 
> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
>  arch/mips/kvm/trap_emul.c  | 2 +-
>  arch/powerpc/kvm/booke.c   | 2 +-
>  arch/powerpc/kvm/powerpc.c | 5 ++---
>  arch/s390/kvm/kvm-s390.c   | 2 +-
>  arch/x86/kvm/x86.c         | 4 ++--
>  include/linux/kvm_host.h   | 5 +++++
>  6 files changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/mips/kvm/trap_emul.c b/arch/mips/kvm/trap_emul.c
> index b1fa53b252ea..9ac8b1d62643 100644
> --- a/arch/mips/kvm/trap_emul.c
> +++ b/arch/mips/kvm/trap_emul.c
> @@ -1029,7 +1029,7 @@ static void kvm_trap_emul_check_requests(struct kvm_vcpu *vcpu, int cpu,
>  	struct mm_struct *mm;
>  	int i;
>  
> -	if (likely(!vcpu->requests))
> +	if (likely(!kvm_request_pending(vcpu)))
>  		return;
>  
>  	if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) {
> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index 0514cbd4e533..65ed6595c9c2 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c
> @@ -682,7 +682,7 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
>  
>  	kvmppc_core_check_exceptions(vcpu);
>  
> -	if (vcpu->requests) {
> +	if (kvm_request_pending(vcpu)) {
>  		/* Exception delivery raised request; start over */
>  		return 1;
>  	}
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 95c91a9de351..714674ea5be6 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -52,8 +52,7 @@ EXPORT_SYMBOL_GPL(kvmppc_pr_ops);
>  
>  int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
>  {
> -	return !!(v->arch.pending_exceptions) ||
> -	       v->requests;
> +	return !!(v->arch.pending_exceptions) || kvm_request_pending(v);
>  }
>  
>  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
> @@ -105,7 +104,7 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
>  		 */
>  		smp_mb();
>  
> -		if (vcpu->requests) {
> +		if (kvm_request_pending(vcpu)) {
>  			/* Make sure we process requests preemptable */
>  			local_irq_enable();
>  			trace_kvm_check_requests(vcpu);
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index fd6cd05bb6a7..40ad6c8d082f 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -2396,7 +2396,7 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
>  {
>  retry:
>  	kvm_s390_vcpu_request_handled(vcpu);
> -	if (!vcpu->requests)
> +	if (!kvm_request_pending(vcpu))
>  		return 0;
>  	/*
>  	 * We use MMU_RELOAD just to re-arm the ipte notifier for the
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 1faf620a6fdc..9714bb230524 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6726,7 +6726,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  
>  	bool req_immediate_exit = false;
>  
> -	if (vcpu->requests) {
> +	if (kvm_request_pending(vcpu)) {
>  		if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
>  			kvm_mmu_unload(vcpu);
>  		if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
> @@ -6890,7 +6890,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>  			kvm_x86_ops->sync_pir_to_irr(vcpu);
>  	}
>  
> -	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
> +	if (vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu)
>  	    || need_resched() || signal_pending(current)) {
>  		vcpu->mode = OUTSIDE_GUEST_MODE;
>  		smp_wmb();
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 2c14ad9809da..946bf0b3c43c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1085,6 +1085,11 @@ static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
>  
>  #endif /* CONFIG_HAVE_KVM_EVENTFD */
>  
> +static inline bool kvm_request_pending(struct kvm_vcpu *vcpu)
> +{
> +	return READ_ONCE(vcpu->requests);
> +}
> +
>  static inline void kvm_make_request(int req, struct kvm_vcpu *vcpu)
>  {
>  	/*
> -- 
> 2.9.3
> 

* Re: [PATCH v2 3/9] KVM: arm/arm64: prepare to use vcpu requests
  2017-03-31 16:06 ` [PATCH v2 3/9] KVM: arm/arm64: prepare to use vcpu requests Andrew Jones
@ 2017-04-04 15:34   ` Christoffer Dall
  2017-04-04 17:06     ` Andrew Jones
  0 siblings, 1 reply; 85+ messages in thread
From: Christoffer Dall @ 2017-04-04 15:34 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Fri, Mar 31, 2017 at 06:06:52PM +0200, Andrew Jones wrote:
> Make sure we don't leave vcpu requests we don't intend to
> handle later set in the request bitmap. If we don't clear
> them, then kvm_request_pending() may return true when we
> don't want it to.
> 
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> Acked-by: Christoffer Dall <cdall@linaro.org>
> ---
>  arch/arm/kvm/handle_exit.c   | 1 +
>  arch/arm/kvm/psci.c          | 1 +
>  arch/arm64/kvm/handle_exit.c | 1 +
>  3 files changed, 3 insertions(+)
> 
> diff --git a/arch/arm/kvm/handle_exit.c b/arch/arm/kvm/handle_exit.c
> index 96af65a30d78..ffb2406e5905 100644
> --- a/arch/arm/kvm/handle_exit.c
> +++ b/arch/arm/kvm/handle_exit.c
> @@ -72,6 +72,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		trace_kvm_wfx(*vcpu_pc(vcpu), false);
>  		vcpu->stat.wfi_exit_stat++;
>  		kvm_vcpu_block(vcpu);
> +		clear_bit(KVM_REQ_UNHALT, &vcpu->requests);

I actually don't understand the idea behind KVM_REQ_UNHALT?

It seems there's a semantic difference that architectures should adhere
by when returning from kvm_vcpu_block() with or without KVM_REQ_UNHALT
set (i.e. if the vcpu was runnable when kvm_vcpu_check_block() was
called?) - can you explain what the deal is?  Perhaps that belongs in
the documentation patch.

>  	}
>  
>  	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> index c2b131527a64..82fe7eb5b6a7 100644
> --- a/arch/arm/kvm/psci.c
> +++ b/arch/arm/kvm/psci.c
> @@ -57,6 +57,7 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
>  	 * for KVM will preserve the register state.
>  	 */
>  	kvm_vcpu_block(vcpu);
> +	clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
>  
>  	return PSCI_RET_SUCCESS;
>  }
> diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> index fa1b18e364fc..e4937fb2fb89 100644
> --- a/arch/arm64/kvm/handle_exit.c
> +++ b/arch/arm64/kvm/handle_exit.c
> @@ -89,6 +89,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		trace_kvm_wfx_arm64(*vcpu_pc(vcpu), false);
>  		vcpu->stat.wfi_exit_stat++;
>  		kvm_vcpu_block(vcpu);
> +		clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
>  	}
>  
>  	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> -- 
> 2.9.3
> 

Ignoring my comment above, for the content of this patch:

Acked-by: Christoffer Dall <cdall@linaro.org>

* Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  2017-03-31 16:06 ` [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request Andrew Jones
  2017-04-04 13:39   ` Marc Zyngier
@ 2017-04-04 16:04   ` Christoffer Dall
  2017-04-04 16:24     ` Paolo Bonzini
  2017-04-04 17:57     ` Andrew Jones
  1 sibling, 2 replies; 85+ messages in thread
From: Christoffer Dall @ 2017-04-04 16:04 UTC (permalink / raw)
  To: Andrew Jones; +Cc: marc.zyngier, pbonzini, kvmarm, kvm

On Fri, Mar 31, 2017 at 06:06:53PM +0200, Andrew Jones wrote:
> This not only ensures visibility of changes to pause by using
> atomic ops, but also plugs a small race where a vcpu could get its
> pause state enabled just after its last check before entering the
> guest. With this patch, while the vcpu will still initially enter
> the guest, it will exit immediately due to the IPI sent by the vcpu
> kick issued after making the vcpu request.
> 
> We use bitops, rather than kvm_make/check_request(), because we
> don't need the barriers they provide,

why not?

> nor do we want the side-effect
> of kvm_check_request() clearing the request. For pause, only the
> requester should do the clearing.
> 
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
>  arch/arm/include/asm/kvm_host.h   |  5 +----
>  arch/arm/kvm/arm.c                | 45 +++++++++++++++++++++++++++------------
>  arch/arm64/include/asm/kvm_host.h |  5 +----
>  3 files changed, 33 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 31ee468ce667..52c25536d254 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -45,7 +45,7 @@
>  #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
>  #endif
>  
> -#define KVM_REQ_VCPU_EXIT	8
> +#define KVM_REQ_PAUSE		8
>  
>  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
>  int __attribute_const__ kvm_target_cpu(void);
> @@ -173,9 +173,6 @@ struct kvm_vcpu_arch {
>  	/* vcpu power-off state */
>  	bool power_off;
>  
> -	 /* Don't run the guest (internal implementation need) */
> -	bool pause;
> -
>  	/* IO related fields */
>  	struct kvm_decode mmio_decode;
>  
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 314eb6abe1ff..f3bfbb5f3d96 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -94,6 +94,18 @@ struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void)
>  
>  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
>  {
> +	/*
> +	 * If we return true from this function, then it means the vcpu is
> +	 * either in guest mode, or has already indicated that it's in guest
> +	 * mode. The indication is done by setting ->mode to IN_GUEST_MODE,
> +	 * and must be done before the final kvm_request_pending() read. It's
> +	 * important that the observability of that order be enforced and that
> +	 * the request receiving CPU can observe any new request before the
> +	 * requester issues a kick. Thus, the general barrier below pairs with
> +	 * the general barrier in kvm_arch_vcpu_ioctl_run() which divides the
> +	 * write to ->mode and the final request pending read.
> +	 */

I am having a hard time understanding this comment.  For example, I
don't understand the difference between 'is either in guest mode or has
already indicated it's in guest mode'.  Which case is which again, and
how are we checking for two cases below?

Also, the stuff about observability of an order is hard to follow, and
the comment assumes the reader is thinking about the specific race when
entering the guest.

I think we should focus on getting the documentation in place, refer to
the documentation from here, and be much more brief and say something
like:

	/*
	 * The memory barrier below pairs with the barrier in
	 * kvm_arch_vcpu_ioctl_run() between writes to vcpu->mode
	 * and reading vcpu->requests before entering the guest.
	 *
	 * Ensures that the VCPU thread's CPU can observe changes to
	 * vcpu->requests written prior to calling this function before
	 * it writes vcpu->mode = IN_GUEST_MODE, and correspondingly
	 * ensures that this CPU observes vcpu->mode == IN_GUEST_MODE
	 * only if the VCPU thread's CPU could observe writes to
	 * vcpu->requests from this CPU.
	 */

Is this correct?  I'm not really sure anymore?

There's also the obvious fact that we're adding this memory barrier
inside a function that checks if we should kick a vcpu, and there's no
documentation that says that this is always called in association with
setting a request, is there?

Finally, I don't understand why this would be a requirement only on ARM?

> +	smp_mb();
>  	return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
>  }
>  
> @@ -404,7 +416,8 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
>  int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
>  {
>  	return ((!!v->arch.irq_lines || kvm_vgic_vcpu_pending_irq(v))
> -		&& !v->arch.power_off && !v->arch.pause);
> +		&& !v->arch.power_off
> +		&& !test_bit(KVM_REQ_PAUSE, &v->requests));
>  }
>  
>  /* Just ensure a guest exit from a particular CPU */
> @@ -535,17 +548,12 @@ bool kvm_arch_intc_initialized(struct kvm *kvm)
>  
>  void kvm_arm_halt_guest(struct kvm *kvm)
>  {
> -	int i;
> -	struct kvm_vcpu *vcpu;
> -
> -	kvm_for_each_vcpu(i, vcpu, kvm)
> -		vcpu->arch.pause = true;
> -	kvm_make_all_cpus_request(kvm, KVM_REQ_VCPU_EXIT);
> +	kvm_make_all_cpus_request(kvm, KVM_REQ_PAUSE);
>  }
>  
>  void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu)
>  {
> -	vcpu->arch.pause = true;
> +	set_bit(KVM_REQ_PAUSE, &vcpu->requests);
>  	kvm_vcpu_kick(vcpu);
>  }
>  
> @@ -553,7 +561,7 @@ void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu)
>  {
>  	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
>  
> -	vcpu->arch.pause = false;
> +	clear_bit(KVM_REQ_PAUSE, &vcpu->requests);
>  	swake_up(wq);
>  }
>  
> @@ -571,7 +579,7 @@ static void vcpu_sleep(struct kvm_vcpu *vcpu)
>  	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
>  
>  	swait_event_interruptible(*wq, ((!vcpu->arch.power_off) &&
> -				       (!vcpu->arch.pause)));
> +		(!test_bit(KVM_REQ_PAUSE, &vcpu->requests))));
>  }
>  
>  static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
> @@ -624,7 +632,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  		update_vttbr(vcpu->kvm);
>  
> -		if (vcpu->arch.power_off || vcpu->arch.pause)
> +		if (vcpu->arch.power_off || test_bit(KVM_REQ_PAUSE, &vcpu->requests))
>  			vcpu_sleep(vcpu);
>  
>  		/*
> @@ -647,8 +655,18 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  			run->exit_reason = KVM_EXIT_INTR;
>  		}
>  
> +		/*
> +		 * Indicate we're in guest mode now, before doing a final
> +		 * check for pending vcpu requests. The general barrier
> +		 * pairs with the one in kvm_arch_vcpu_should_kick().
> +		 * Please see the comment there for more details.
> +		 */
> +		WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);
> +		smp_mb();

There are two changes here:

there's a change from a normal write to a WRITE_ONCE and there's also a
change to that adds a memory barrier.  I feel like I'd like to know if
these are tied together or two separate cleanups.  I also wonder if we
could split out more general changes from the pause thing to have a
better log of why we changed the run loop?

It looks to me like there could be a separate patch that encapsulated
the reads and writes of vcpu->mode into a function that does the
WRITE_ONCE and READ_ONCE with a nice comment.

> +
>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
> -			vcpu->arch.power_off || vcpu->arch.pause) {
> +			vcpu->arch.power_off || kvm_request_pending(vcpu)) {
> +			WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
>  			local_irq_enable();
>  			kvm_pmu_sync_hwstate(vcpu);
>  			kvm_timer_sync_hwstate(vcpu);
> @@ -664,11 +682,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		 */
>  		trace_kvm_entry(*vcpu_pc(vcpu));
>  		guest_enter_irqoff();
> -		vcpu->mode = IN_GUEST_MODE;
>  
>  		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
>  
> -		vcpu->mode = OUTSIDE_GUEST_MODE;
> +		WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
>  		vcpu->stat.exits++;
>  		/*
>  		 * Back from guest
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index e7705e7bb07b..6e1271a77e92 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -42,7 +42,7 @@
>  
>  #define KVM_VCPU_MAX_FEATURES 4
>  
> -#define KVM_REQ_VCPU_EXIT	8
> +#define KVM_REQ_PAUSE		8
>  
>  int __attribute_const__ kvm_target_cpu(void);
>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> @@ -256,9 +256,6 @@ struct kvm_vcpu_arch {
>  	/* vcpu power-off state */
>  	bool power_off;
>  
> -	/* Don't run the guest (internal implementation need) */
> -	bool pause;
> -
>  	/* IO related fields */
>  	struct kvm_decode mmio_decode;
>  
> -- 
> 2.9.3

Thanks,
-Christoffer

* Re: [PATCH v2 0/9] KVM: arm/arm64: race fixes and vcpu requests
  2017-04-04  7:27   ` Andrew Jones
@ 2017-04-04 16:05     ` Christoffer Dall
  0 siblings, 0 replies; 85+ messages in thread
From: Christoffer Dall @ 2017-04-04 16:05 UTC (permalink / raw)
  To: Andrew Jones; +Cc: marc.zyngier, pbonzini, kvmarm, kvm

On Tue, Apr 04, 2017 at 09:27:46AM +0200, Andrew Jones wrote:
> On Mon, Apr 03, 2017 at 05:28:45PM +0200, Christoffer Dall wrote:
> > Hi Drew,
> > 
> > On Fri, Mar 31, 2017 at 06:06:49PM +0200, Andrew Jones wrote:
> > > This series fixes some hard to produce races by introducing the use of
> > > vcpu requests.  It also fixes a couple easier to produce races, ones
> > > that have been produced with the PSCI kvm-unit-test test.  The easy two
> > > are addressed in two different ways: the first takes advantage of
> > > power_off having been changed to a vcpu request, the second caches vcpu
> > > MPIDRs in order to avoid extracting them from sys_regs.  I've tested the
> > > series on a Mustang and a ThunderX and compile-tested the ARM bits.
> > > 
> > > Patch 2/9 adds documentation, as, at least for me, understanding vcpu
> > > request interplay with vcpu kicks and vcpu mode and the memory barriers
> > > that interplay implies, is exhausting.  Hopefully the document is useful
> > > to others.  I'm not married to it though, so it can be deferred/dropped
> > > as people like...
> > 
> > Sounds helpful, I'll have a look.
> > 
> > > 
> > > v2:
> > >   - No longer based on Radim's vcpu request API rework[1], except for
> > >     including "add kvm_request_pending" as patch 1/9 [drew]
> > 
> > I lost track here; did those patches get merged or dropped and why are
> > we not basing this work on them anymore, and should patch 1/9 be applied
> > here or is it expected to land in the KVM tree via some other path?
> 
> I think Radim still wants to rework the API, but, as his work doesn't
> provide fixes or functional changes, his timeline may not be the same
> as for this series.  He also wants to expand his rework to add API
> that includes kicking with requesting.  I'm not sure how all that will
> look yet, so, in the end, I decided I might as well just use the current
> API for now.  kvm_request_pending() was too nice an addition to drop
> though.

Makes sense, thanks for the explanation.

> 
> > 
> > >   - Added vcpu request documentation [drew]
> > >   - Dropped the introduction of user settable MPIDRs [Christoffer]
> > >   - Added vcpu requests to all request-less vcpu kicks [Christoffer]
> > > 
> > 
> > Didn't we also have an issue with a missing barrier if the cmpxchg
> > operation doesn't succeed?  Did that fall though the cracks or is it
> > just missing in the changelog?
> 
> Just missing from the changelog. Sorry about that.

No worries.

> 
>   - Ensure we have a read barrier (or equivalent) prior to issuing the
>     cmpxchg in kvm_vcpu_exiting_guest_mode(), as a failed cmpxchg does
>     not guarantee any barrier [Christoffer]

Thanks for adding this, although I'm not able to convince myself that we
got all the detailed aspects of this correct, just yet, but hopefully
some of the questions I've asked on the individual patches can improve
this.

Thanks,
-Christoffer

* Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  2017-04-04 16:04   ` Christoffer Dall
@ 2017-04-04 16:24     ` Paolo Bonzini
  2017-04-04 17:19       ` Christoffer Dall
  2017-04-04 17:57     ` Andrew Jones
  1 sibling, 1 reply; 85+ messages in thread
From: Paolo Bonzini @ 2017-04-04 16:24 UTC (permalink / raw)
  To: Christoffer Dall, Andrew Jones; +Cc: kvmarm, kvm, marc.zyngier, rkrcmar



On 04/04/2017 18:04, Christoffer Dall wrote:
>> For pause, only the requester should do the clearing.

This suggests that maybe this should not be a request.  The request
would be just the need to act on a GIC command, exactly as before this patch.

What I don't understand is:

>> With this patch, while the vcpu will still initially enter
>> the guest, it will exit immediately due to the IPI sent by the vcpu
>> kick issued after making the vcpu request.

Isn't this also true of KVM_REQ_VCPU_EXIT that was used before?

So this:

+			vcpu->arch.power_off || kvm_request_pending(vcpu)) {
+			WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);

is the crux of the fix, you can keep using vcpu->arch.pause.

By the way, vcpu->arch.power_off can go away from this "if" too because
KVM_RUN and KVM_SET_MP_STATE are mutually exclusive through the vcpu mutex.
The earlier check is enough:

                 if (vcpu->arch.power_off || vcpu->arch.pause)
                         vcpu_sleep(vcpu);


>> +		/*
>> +		 * Indicate we're in guest mode now, before doing a final
>> +		 * check for pending vcpu requests. The general barrier
>> +		 * pairs with the one in kvm_arch_vcpu_should_kick().
>> +		 * Please see the comment there for more details.
>> +		 */
>> +		WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);
>> +		smp_mb();
> 
> There are two changes here:
> 
> there's a change from a normal write to a WRITE_ONCE and there's also a
> change to that adds a memory barrier.  I feel like I'd like to know if
> these are tied together or two separate cleanups.  I also wonder if we
> could split out more general changes from the pause thing to have a
> better log of why we changed the run loop?

You probably should just use smp_store_mb here.

Paolo

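smp_store_mb(var, value) is described in Documentation/memory-barriers.txt
as a store of value to var followed by a full memory barrier, so the
two-line WRITE_ONCE()/smp_mb() sequence in the patch collapses into one
call. A userspace analogue with C11 atomics (illustrative only; the names
are invented):

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int vcpu_mode;	/* illustrative stand-in for vcpu->mode */
enum { OUTSIDE_GUEST_MODE_, IN_GUEST_MODE_ };

/* A seq_cst store is the C11 analogue of smp_store_mb(): the store and
 * the full ordering come as one operation, replacing a separate
 * WRITE_ONCE() followed by smp_mb(). */
static void set_mode_in_guest(void)
{
	atomic_store(&vcpu_mode, IN_GUEST_MODE_);
}
```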
* Re: [PATCH v2 1/9] KVM: add kvm_request_pending
  2017-04-04 15:30   ` Christoffer Dall
@ 2017-04-04 16:41     ` Andrew Jones
  2017-04-05 13:10       ` Radim Krčmář
  0 siblings, 1 reply; 85+ messages in thread
From: Andrew Jones @ 2017-04-04 16:41 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Tue, Apr 04, 2017 at 05:30:14PM +0200, Christoffer Dall wrote:
> On Fri, Mar 31, 2017 at 06:06:50PM +0200, Andrew Jones wrote:
> > From: Radim Krčmář <rkrcmar@redhat.com>
> > 
> > A first step in vcpu->requests encapsulation.
> 
> Could we have a note here on why we need to access vcpu->requests using
> READ_ONCE now?

Sure, maybe we should put the note as a comment above the read in
kvm_request_pending().  Something like

 /*
  * vcpu->requests reads may appear in sequences that have strict
  * data or control dependencies.  Use READ_ONCE() to ensure the
  * compiler does not do anything that breaks the required ordering.
  */

Radim?

Thanks,
drew
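
As a userspace illustration of what READ_ONCE() guards against: it is
roughly a volatile access, which forbids the compiler from tearing,
refetching, or caching the load (a sketch with invented names, not the
kernel macro):

```c
#include <assert.h>

/* Userspace stand-in for READ_ONCE() on an unsigned long. */
#define MY_READ_ONCE_UL(x) (*(const volatile unsigned long *)&(x))

struct vcpu_model { unsigned long requests; };

/* Without the volatile access the compiler could legally re-read
 * v->requests, or hoist the load out of a loop, breaking sequences
 * with strict data or control dependencies on the bitmap value. */
static int request_pending(struct vcpu_model *v)
{
	return MY_READ_ONCE_UL(v->requests) != 0;
}
```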

* Re: [PATCH v2 2/9] KVM: Add documentation for VCPU requests
  2017-04-04 15:24   ` Christoffer Dall
@ 2017-04-04 17:06     ` Andrew Jones
  2017-04-04 17:23       ` Christoffer Dall
  0 siblings, 1 reply; 85+ messages in thread
From: Andrew Jones @ 2017-04-04 17:06 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Tue, Apr 04, 2017 at 05:24:03PM +0200, Christoffer Dall wrote:
> Hi Drew,
> 
> On Fri, Mar 31, 2017 at 06:06:51PM +0200, Andrew Jones wrote:
> > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > ---
> >  Documentation/virtual/kvm/vcpu-requests.rst | 114 ++++++++++++++++++++++++++++
> >  1 file changed, 114 insertions(+)
> >  create mode 100644 Documentation/virtual/kvm/vcpu-requests.rst
> > 
> > diff --git a/Documentation/virtual/kvm/vcpu-requests.rst b/Documentation/virtual/kvm/vcpu-requests.rst
> > new file mode 100644
> > index 000000000000..ea4a966d5c8a
> > --- /dev/null
> > +++ b/Documentation/virtual/kvm/vcpu-requests.rst
> > @@ -0,0 +1,114 @@
> > +=================
> > +KVM VCPU Requests
> > +=================
> > +
> > +Overview
> > +========
> > +
> > +KVM supports an internal API enabling threads to request a VCPU thread to
> > +perform some activity.  For example, a thread may request a VCPU to flush
> > +its TLB with a VCPU request.  The API consists of only four calls::
> > +
> > +  /* Check if VCPU @vcpu has request @req pending. Clears the request. */
> > +  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);
> > +
> > +  /* Check if any requests are pending for VCPU @vcpu. */
> > +  bool kvm_request_pending(struct kvm_vcpu *vcpu);
> > +
> > +  /* Make request @req of VCPU @vcpu. */
> > +  void kvm_make_request(int req, struct kvm_vcpu *vcpu);
> > +
> > +  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
> > +  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
> > +
> > +Typically a requester wants the VCPU to perform the activity as soon
> > +as possible after making the request.  This means most requests,
> > +kvm_make_request() calls, are followed by a call to kvm_vcpu_kick(),
> > +and kvm_make_all_cpus_request() has the kicking of all VCPUs built
> > +into it.
> > +
> > +VCPU Kicks
> > +----------
> > +
> > +A VCPU kick does one of three things:
> > +
> > + 1) wakes a sleeping VCPU (which sleeps outside guest mode).
> 
> You could clarify this to say that a sleeping VCPU is a VCPU thread
> which is not runnable and placed on waitqueue, and waking it makes
> the thread runnable again.
> 
> > + 2) sends an IPI to a VCPU currently in guest mode, in order to bring it
> > +    out.
> > + 3) nothing, when the VCPU is already outside guest mode and not sleeping.
> > +
> > +VCPU Request Internals
> > +======================
> > +
> > +VCPU requests are simply bit indices of the vcpu->requests bitmap.  This
> > +means general bitops[1], e.g. clear_bit(KVM_REQ_UNHALT, &vcpu->requests),
> > +may also be used.  The first 8 bits are reserved for architecture
> > +independent requests, all additional bits are available for architecture
> > +dependent requests.
> 
> Should we explain the ones that are generically defined and how they're
> supposed to be used?  For example, we don't use them on ARM, and I don't
> think I understand why another thread would ever make a PENDING_TIMER
> request on a vcpu?

Yes, I agree the general requests should be described.  I'll have to
figure out how :-)  Describing KVM_REQ_UNHALT will likely lead to a
subsection on kvm_vcpu_block(), as you bring up below.

> 
> > +
> > +VCPU Requests with Associated State
> > +===================================
> > +
> > +Requesters that want the requested VCPU to handle new state need to ensure
> > +the state is observable to the requested VCPU thread's CPU at the time the
> 
> nit: need to ensure that the newly written state is observable ... by
> the time it observes the request.
> 
> > +CPU observes the request.  This means a write memory barrier should be
>                                                                  ^^^
> 							         must
> 
> > +insert between the preparation of the state and the write of the VCPU
>     ^^^
>    inserted
> 
> I would rephrase this as: '... after writing the new state to memory and
> before setting the VCPU request bit.'
> 
> 
> > +request bitmap.  Additionally, on the requested VCPU thread's side, a
> > +corresponding read barrier should be issued after reading the request bit
>                                 ^^^       ^^^
> 			       must      inserted (for consistency)
> 
> 
> 
> > +and before proceeding to use the state associated with it.  See the kernel
>                             ^^^    ^
> 		           read    new
> 
> 
> > +memory barrier documentation [2].
> 
> I think it would be great if this document explains if this is currently
> taken care of by the API you explain above or if there are cases where
> people have to explicitly insert these barriers, and in that case, which
> barriers they should use (if we know at this point already).

Will do.  The current API does take care of it.  I'll state that.  I'd
have to grep around to see if there are any non-API users that also need
barriers, but as they could change, I probably wouldn't want to call them
out in the doc.  So I guess I'll still just wave my hand at that type of
use.
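To make the barrier pairing concrete: outside the kernel, the publish/consume discipline the document describes can be sketched with C11 release/acquire operations standing in for smp_wmb()/smp_rmb().  The names here are illustrative, not the KVM API:

```c
#include <stdatomic.h>

static int req_state;		/* state associated with the request */
static atomic_ulong requests;	/* the request bitmap */

/* Requester side: write the state, then set the request bit; the
 * release ordering on the fetch_or plays the role of the write
 * barrier described in the document. */
static void make_request_with_state(int req, int state)
{
	req_state = state;
	atomic_fetch_or_explicit(&requests, 1UL << req,
				 memory_order_release);
}

/* Requested side: test-and-clear the bit; the acquire ordering plays
 * the role of the read barrier, so req_state is seen up to date.
 * Returns the state, or -1 if the request was not pending. */
static int check_request_and_get_state(int req)
{
	unsigned long old = atomic_fetch_and_explicit(&requests,
						      ~(1UL << req),
						      memory_order_acquire);
	return (old & (1UL << req)) ? req_state : -1;
}
```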

> 
> > +
> > +VCPU Requests and Guest Mode
> > +============================
> > +
> 
> I feel like an intro about the overall goal here is missing.  How about
> something like this:
> 
>   When making requests to VCPUs, we want to avoid the receiving VCPU
>   executing inside the guest for an arbitrary long time without handling
>   the request.  The way we prevent this from happening is by keeping
>   track of when a VCPU is running and sending an IPI to the physical CPU
>   running the VCPU when that is the case.  However, each architecture
>   implementation of KVM must take great care to ensure that requests are
>   not missed when a VCPU stops running at the same time when a request
>   is received.
> 
> Also, I'm not sure what the semantics are with kvm_vcpu_block().  Is it
> ok to send a request to a VCPU and then the VCPU blocks and goes to
> sleep forever even though there are pending requests?
> kvm_vcpu_check_block() doesn't seem to check vcpu->requests which would
> indicate that this is the case, but maybe architectures that actually do
> use requests implement something else themselves?

I'll add a kvm_vcpu_block() subsection as part of the KVM_REQ_UNHALT
documentation.

> 
> > +As long as the guest is either in guest mode, in which case it gets an IPI
> 
> guest is in guest mode?

oops, s/guest/vcpu/

> 
> Perhaps this could be more clearly written as:
> 
> As long as the VCPU is running, it is marked as having vcpu->mode =
> IN_GUEST_MODE.  A requesting thread observing IN_GUEST_MODE will send an
> IPI to the CPU running the VCPU thread.  On the other hand, when a
> requesting thread observes vcpu->mode == OUTSIDE_GUEST_MODE, it will not send
> any IPIs, but will simply set the request bit, and the VCPU thread will be
> able to check the requests before running the VCPU again.  However, the
> transition...
> 
> > +and will definitely see the request, or is outside guest mode, but has yet
> > +to do its final request check, and therefore when it does, it will see the
> > +request, then things will work.  However, the transition from outside to
> > +inside guest mode, after the last request check has been made, opens a
> > +window where a request could be made, but the VCPU would not see until it
> > +exits guest mode some time later.  See the table below.
> 
> This text, and the table below, only deals with the details of entering
> the guest.  Should we talk about kvm_vcpu_exiting_guest_mode() and
> anything related to exiting the guest?

I think all !IN_GUEST_MODE should behave the same, so I was avoiding
the use of EXITING_GUEST_MODE and OUTSIDE_GUEST_MODE, which wouldn't be
hard to address, but then I'd also have to address
READING_SHADOW_PAGE_TABLES, which may complicate the document more than
necessary.  I'm not sure we need to address a VCPU exiting guest mode,
other than making sure it's clear that a VCPU that exits must check
requests before it enters again.

> 
> > +
> > ++------------------+-----------------+----------------+--------------+
> > +| vcpu->mode       | done last check | kick sends IPI | request seen |
> > ++==================+=================+================+==============+
> > +| IN_GUEST_MODE    |      N/A        |      YES       |     YES      |
> > ++------------------+-----------------+----------------+--------------+
> > +| !IN_GUEST_MODE   |      NO         |      NO        |     YES      |
> > ++------------------+-----------------+----------------+--------------+
> > +| !IN_GUEST_MODE   |      YES        |      NO        |     NO       |
> > ++------------------+-----------------+----------------+--------------+
> > +
> > +To ensure the third scenario shown in the table above cannot happen, we
> > +need to ensure the VCPU's mode change is observable by all CPUs prior to
> > +its final request check and that a requester's request is observable by
> > +the requested VCPU prior to the kick.  To do that we need general memory
> > +barriers between each pair of operations involving mode and requests, i.e.
> > +
> > +  CPU_i                                  CPU_j
> > +-------------------------------------------------------------------------
> > +  vcpu->mode = IN_GUEST_MODE;            kvm_make_request(REQ, vcpu);
> > +  smp_mb();                              smp_mb();
> > +  if (kvm_request_pending(vcpu))         if (vcpu->mode == IN_GUEST_MODE)
> > +      handle_requests();                     send_IPI(vcpu->cpu);
> > +
> > +Whether explicit barriers are needed, or reliance on implicit barriers is
> > +sufficient, is architecture dependent.  Alternatively, an architecture may
> > +choose to just always send the IPI, as not sending it, when it's not
> > +necessary, is just an optimization.
> 
> Is this universally true?  This is certainly true on ARM, because we
> disable interrupts before doing all this, so the IPI remains pending and
> causes an immediate exit, but if any of the above is done with
> interrupts enabled, just sending an IPI does nothing to ensure the
> request is observed.  Perhaps this is not a case we should care about.

I'll try to make this less generic, as some architectures may not work
this way.  Indeed, s390 doesn't seem to have kvm_vcpu_kick(), so I guess
things don't work this way for them.

> 
> > +
> > +Additionally, the error prone third scenario described above also exhibits
> > +why a request-less VCPU kick is almost never correct.  Without the
> > +assurance that a non-IPI generating kick will still result in an action by
> > +the requested VCPU, as the final kvm_request_pending() check does, then
> > +the kick may not initiate anything useful at all.  If, for instance, a
> > +request-less kick was made to a VCPU that was just about to set its mode
> > +to IN_GUEST_MODE, meaning no IPI is sent, then the VCPU may continue its
> > +entry without actually having done whatever it was the kick was meant to
> > +initiate.
> 
> Indeed.
> 
> 
> > +
> > +References
> > +==========
> > +
> > +[1] Documentation/core-api/atomic_ops.rst
> > +[2] Documentation/memory-barriers.txt
> > -- 
> > 2.9.3
> > 
> 
> This is a great writeup!  I enjoyed reading it and it made me think more
> carefully about a number of things, so I definitely think we should
> merge this.
>

Thanks Christoffer!  I'll take all your suggestions above and try to
answer your questions for v2.

drew


* Re: [PATCH v2 3/9] KVM: arm/arm64: prepare to use vcpu requests
  2017-04-04 15:34   ` Christoffer Dall
@ 2017-04-04 17:06     ` Andrew Jones
  0 siblings, 0 replies; 85+ messages in thread
From: Andrew Jones @ 2017-04-04 17:06 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Tue, Apr 04, 2017 at 05:34:01PM +0200, Christoffer Dall wrote:
> On Fri, Mar 31, 2017 at 06:06:52PM +0200, Andrew Jones wrote:
> > Make sure we don't leave vcpu requests we don't intend to
> > handle later set in the request bitmap. If we don't clear
> > them, then kvm_request_pending() may return true when we
> > don't want it to.
> > 
> > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > Acked-by: Christoffer Dall <cdall@linaro.org>
> > ---
> >  arch/arm/kvm/handle_exit.c   | 1 +
> >  arch/arm/kvm/psci.c          | 1 +
> >  arch/arm64/kvm/handle_exit.c | 1 +
> >  3 files changed, 3 insertions(+)
> > 
> > diff --git a/arch/arm/kvm/handle_exit.c b/arch/arm/kvm/handle_exit.c
> > index 96af65a30d78..ffb2406e5905 100644
> > --- a/arch/arm/kvm/handle_exit.c
> > +++ b/arch/arm/kvm/handle_exit.c
> > @@ -72,6 +72,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		trace_kvm_wfx(*vcpu_pc(vcpu), false);
> >  		vcpu->stat.wfi_exit_stat++;
> >  		kvm_vcpu_block(vcpu);
> > +		clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
> 
> I actually don't understand the idea behind KVM_REQ_UNHALT?
> 
> It seems there's a semantic difference that architectures should adhere
> by when returning from kvm_vcpu_block() with or without KVM_REQ_UNHALT
> set (i.e. if the vcpu was runnable when kvm_vcpu_check_blocK() was
> called?) - can you explain what the deal is?  Perhaps that belongs in
> the documentation patch.

Yup, will address this in the doc patch.

> 
> >  	}
> >  
> >  	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> > diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> > index c2b131527a64..82fe7eb5b6a7 100644
> > --- a/arch/arm/kvm/psci.c
> > +++ b/arch/arm/kvm/psci.c
> > @@ -57,6 +57,7 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
> >  	 * for KVM will preserve the register state.
> >  	 */
> >  	kvm_vcpu_block(vcpu);
> > +	clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
> >  
> >  	return PSCI_RET_SUCCESS;
> >  }
> > diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
> > index fa1b18e364fc..e4937fb2fb89 100644
> > --- a/arch/arm64/kvm/handle_exit.c
> > +++ b/arch/arm64/kvm/handle_exit.c
> > @@ -89,6 +89,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		trace_kvm_wfx_arm64(*vcpu_pc(vcpu), false);
> >  		vcpu->stat.wfi_exit_stat++;
> >  		kvm_vcpu_block(vcpu);
> > +		clear_bit(KVM_REQ_UNHALT, &vcpu->requests);
> >  	}
> >  
> >  	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
> > -- 
> > 2.9.3
> > 
> 
> Ignoring my comment above, for the content of this patch:
> 
> Acked-by: Christoffer Dall <cdall@linaro.org>

Thanks,
drew


* Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  2017-04-04 14:51       ` Paolo Bonzini
  2017-04-04 15:05         ` Marc Zyngier
@ 2017-04-04 17:07         ` Andrew Jones
  1 sibling, 0 replies; 85+ messages in thread
From: Andrew Jones @ 2017-04-04 17:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Marc Zyngier, kvmarm, kvm, cdall, rkrcmar

On Tue, Apr 04, 2017 at 04:51:40PM +0200, Paolo Bonzini wrote:
> 
> 
> On 04/04/2017 16:47, Andrew Jones wrote:
> >>> -#define KVM_REQ_VCPU_EXIT	8
> >>> +#define KVM_REQ_PAUSE		8
> >> Small nit: can we have a #define for this 8? KVM_REQ_ARCH_BASE, or
> >> something along those lines?
> > Sounds good to me.  Should I even do something like
> > 
> >  #define KVM_REQ_ARCH_BASE 8
> > 
> >  #define KVM_ARCH_REQ(bit) ({ \
> >      BUILD_BUG_ON(((bit) + KVM_REQ_ARCH_BASE) >= BITS_PER_LONG); \
> 
> Please make this 32 so that we don't fail on 32-bit machines.
> 
> or even
> 
> BUILD_BUG_ON((unsigned)(bit) >= BITS_PER_LONG - KVM_REQ_ARCH_BASE);
> 
> in case someone is crazy enough to pass a negative value!

Will do.
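For the record, with the 32-bit bound folded in, a userspace sketch of the proposed macro could look like this.  _Static_assert stands in for BUILD_BUG_ON(), which in the kernel would sit inside the statement expression; the per-definition assert macro is an invention of the sketch:

```c
/* Sketch of the proposed request-numbering helpers. */
#define KVM_REQ_ARCH_BASE	8

#define KVM_ARCH_REQ(bit)	((bit) + KVM_REQ_ARCH_BASE)

/* Paolo's bound: stay within 32 bits so 32-bit machines work too.
 * In the kernel this check would be a BUILD_BUG_ON() inside the
 * KVM_ARCH_REQ() statement expression; a standalone compile-time
 * assertion per definition approximates it here. */
#define KVM_ARCH_REQ_ASSERT(bit) \
	_Static_assert((unsigned)(bit) < 32 - KVM_REQ_ARCH_BASE, \
		       "arch request out of range")

#define KVM_REQ_PAUSE		KVM_ARCH_REQ(0)
KVM_ARCH_REQ_ASSERT(0);
```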

Thanks,
drew

> 
> Paolo
> 
> >      ((bit) + KVM_REQ_ARCH_BASE); \
> >  })
> > 
> >  #define KVM_REQ_PAUSE KVM_ARCH_REQ(0)
> > 
> > or would that be overkill?  Also, whether we switch to just the base
> > define, or the macro, I guess it would be good to do for all
> > architectures.
> 


* Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  2017-04-04 16:24     ` Paolo Bonzini
@ 2017-04-04 17:19       ` Christoffer Dall
  2017-04-04 17:35         ` Paolo Bonzini
  0 siblings, 1 reply; 85+ messages in thread
From: Christoffer Dall @ 2017-04-04 17:19 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: marc.zyngier, kvmarm, kvm

On Tue, Apr 04, 2017 at 06:24:36PM +0200, Paolo Bonzini wrote:
> 
> 
> On 04/04/2017 18:04, Christoffer Dall wrote:
> >> For pause, only the requester should do the clearing.
> 
> This suggests that maybe this should not be a request.  The request
> would be just the need to act on a GIC command, exactly as before this patch.

Maybe the semantics should be:

requester:                                vcpu:
----------                                -----
make_request(vcpu, KVM_REQ_PAUSE);
                                          handles the request by
					  clearing it and setting
					  vcpu->pause = true;
wait until vcpu->pause == true
make_request(vcpu, KVM_REQ_UNPAUSE);
                                          vcpu 'wakes up', clears the
					  UNPAUSE request and sets
					  vcpu->pause = false;

The benefit would be that we get to re-use the complicated "figure out
the VCPU mode and whether or not we should send an IPI and get the
barriers right" stuff.
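A single-threaded model of that handshake — ignoring the kick/IPI machinery it would reuse, and with made-up request numbers:

```c
#include <stdbool.h>

#define KVM_REQ_PAUSE	0	/* hypothetical request numbers */
#define KVM_REQ_UNPAUSE	1

struct vcpu {
	unsigned long requests;
	bool pause;
};

static void make_request(struct vcpu *v, int req)
{
	v->requests |= 1UL << req;
}

/* Model of kvm_check_request(): test and clear. */
static bool check_request(struct vcpu *v, int req)
{
	if (v->requests & (1UL << req)) {
		v->requests &= ~(1UL << req);
		return true;
	}
	return false;
}

/* VCPU side of the proposed handshake: the requests are cleared by the
 * VCPU itself, and pause becomes a plain flag only the VCPU writes. */
static void vcpu_handle_pause_requests(struct vcpu *v)
{
	if (check_request(v, KVM_REQ_PAUSE))
		v->pause = true;
	if (check_request(v, KVM_REQ_UNPAUSE))
		v->pause = false;
}
```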

> 
> What I don't understand is:
> 
> >> With this patch, while the vcpu will still initially enter
> >> the guest, it will exit immediately due to the IPI sent by the vcpu
> >> kick issued after making the vcpu request.
> 
> Isn't this also true of KVM_REQ_VCPU_EXIT that was used before?
> 
> So this:
> 
> +			vcpu->arch.power_off || kvm_request_pending(vcpu)) {
> +			WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
> 
> is the crux of the fix, you can keep using vcpu->arch.pause.

Probably; I feel like there's a fix here which should be a separate
patch from using a different requests instead of the KVM_REQ_VCPU_EXIT +
the pause flag.

> 
> By the way, vcpu->arch.power_off can go away from this "if" too because
> KVM_RUN and KVM_SET_MP_STATE are mutually exclusive through the vcpu mutex.

But we also allow setting the power_off flag from the in-kernel PSCI
emulation in the context of another VCPU thread.

> The earlier check is enough:
> 
>                  if (vcpu->arch.power_off || vcpu->arch.pause)
>                          vcpu_sleep(vcpu);
> 
> 
> >> +		/*
> >> +		 * Indicate we're in guest mode now, before doing a final
> >> +		 * check for pending vcpu requests. The general barrier
> >> +		 * pairs with the one in kvm_arch_vcpu_should_kick().
> >> +		 * Please see the comment there for more details.
> >> +		 */
> >> +		WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);
> >> +		smp_mb();
> > 
> > There are two changes here:
> > 
> > there's a change from a normal write to a WRITE_ONCE and there's also a
> > change to that adds a memory barrier.  I feel like I'd like to know if
> > these are tied together or two separate cleanups.  I also wonder if we
> > could split out more general changes from the pause thing to have a
> > better log of why we changed the run loop?
> 
> You probably should just use smp_store_mb here.
> 

That looks cleaner at least.

Thanks,
-Christoffer


* Re: [PATCH v2 2/9] KVM: Add documentation for VCPU requests
  2017-04-04 17:06     ` Andrew Jones
@ 2017-04-04 17:23       ` Christoffer Dall
  2017-04-04 17:36         ` Paolo Bonzini
  2017-04-05 14:11         ` Radim Krčmář
  0 siblings, 2 replies; 85+ messages in thread
From: Christoffer Dall @ 2017-04-04 17:23 UTC (permalink / raw)
  To: Andrew Jones; +Cc: marc.zyngier, pbonzini, kvmarm, kvm

On Tue, Apr 04, 2017 at 07:06:00PM +0200, Andrew Jones wrote:
> On Tue, Apr 04, 2017 at 05:24:03PM +0200, Christoffer Dall wrote:
> > Hi Drew,
> > 
> > On Fri, Mar 31, 2017 at 06:06:51PM +0200, Andrew Jones wrote:
> > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > ---
> > >  Documentation/virtual/kvm/vcpu-requests.rst | 114 ++++++++++++++++++++++++++++
> > >  1 file changed, 114 insertions(+)
> > >  create mode 100644 Documentation/virtual/kvm/vcpu-requests.rst
> > > 
> > > diff --git a/Documentation/virtual/kvm/vcpu-requests.rst b/Documentation/virtual/kvm/vcpu-requests.rst
> > > new file mode 100644
> > > index 000000000000..ea4a966d5c8a
> > > --- /dev/null
> > > +++ b/Documentation/virtual/kvm/vcpu-requests.rst
> > > @@ -0,0 +1,114 @@
> > > +=================
> > > +KVM VCPU Requests
> > > +=================
> > > +
> > > +Overview
> > > +========
> > > +
> > > +KVM supports an internal API enabling threads to request a VCPU thread to
> > > +perform some activity.  For example, a thread may request a VCPU to flush
> > > +its TLB with a VCPU request.  The API consists of only four calls::
> > > +
> > > +  /* Check if VCPU @vcpu has request @req pending. Clears the request. */
> > > +  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);
> > > +
> > > +  /* Check if any requests are pending for VCPU @vcpu. */
> > > +  bool kvm_request_pending(struct kvm_vcpu *vcpu);
> > > +
> > > +  /* Make request @req of VCPU @vcpu. */
> > > +  void kvm_make_request(int req, struct kvm_vcpu *vcpu);
> > > +
> > > +  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
> > > +  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
> > > +
> > > +Typically a requester wants the VCPU to perform the activity as soon
> > > +as possible after making the request.  This means most requests,
> > > +kvm_make_request() calls, are followed by a call to kvm_vcpu_kick(),
> > > +and kvm_make_all_cpus_request() has the kicking of all VCPUs built
> > > +into it.
> > > +
> > > +VCPU Kicks
> > > +----------
> > > +
> > > +A VCPU kick does one of three things:
> > > +
> > > + 1) wakes a sleeping VCPU (which sleeps outside guest mode).
> > 
> > You could clarify this to say that a sleeping VCPU is a VCPU thread
> > which is not runnable and placed on waitqueue, and waking it makes
> > the thread runnable again.
> > 
> > > + 2) sends an IPI to a VCPU currently in guest mode, in order to bring it
> > > +    out.
> > > + 3) nothing, when the VCPU is already outside guest mode and not sleeping.
> > > +
> > > +VCPU Request Internals
> > > +======================
> > > +
> > > +VCPU requests are simply bit indices of the vcpu->requests bitmap.  This
> > > +means general bitops[1], e.g. clear_bit(KVM_REQ_UNHALT, &vcpu->requests),
> > > +may also be used.  The first 8 bits are reserved for architecture
> > > +independent requests, all additional bits are available for architecture
> > > +dependent requests.
> > 
> > Should we explain the ones that are generically defined and how they're
> > supposed to be used?  For example, we don't use them on ARM, and I don't
> > think I understand why another thread would ever make a PENDING_TIMER
> > request on a vcpu?
> 
> Yes, I agree the general requests should be described.  I'll have to
> figure out how :-)  Describing KVM_REQ_UNHALT will likely lead to a
> subsection on kvm_vcpu_block(), as you bring up below.
> 
> > 
> > > +
> > > +VCPU Requests with Associated State
> > > +===================================
> > > +
> > > +Requesters that want the requested VCPU to handle new state need to ensure
> > > +the state is observable to the requested VCPU thread's CPU at the time the
> > 
> > nit: need to ensure that the newly written state is observable ... by
> > the time it observes the request.
> > 
> > > +CPU observes the request.  This means a write memory barrier should be
> >                                                                  ^^^
> > 							         must
> > 
> > > +insert between the preparation of the state and the write of the VCPU
> >     ^^^
> >    inserted
> > 
> > I would rephrase this as: '... after writing the new state to memory and
> > before setting the VCPU request bit.'
> > 
> > 
> > > +request bitmap.  Additionally, on the requested VCPU thread's side, a
> > > +corresponding read barrier should be issued after reading the request bit
> >                                 ^^^       ^^^
> > 			       must      inserted (for consistency)
> > 
> > 
> > 
> > > +and before proceeding to use the state associated with it.  See the kernel
> >                             ^^^    ^
> > 		           read    new
> > 
> > 
> > > +memory barrier documentation [2].
> > 
> > I think it would be great if this document explains if this is currently
> > taken care of by the API you explain above or if there are cases where
> > people have to explicitly insert these barriers, and in that case, which
> > barriers they should use (if we know at this point already).
> 
> Will do.  The current API does take care of it.  I'll state that.  I'd
> have to grep around to see if there are any non-API users that also need
> barriers, but as they could change, I probably wouldn't want to call them
> out in the doc.  So I guess I'll still just wave my hand at that type of
> use.
> 

Sounds good.

> > 
> > > +
> > > +VCPU Requests and Guest Mode
> > > +============================
> > > +
> > 
> > I feel like an intro about the overall goal here is missing.  How about
> > something like this:
> > 
> >   When making requests to VCPUs, we want to avoid the receiving VCPU
> >   executing inside the guest for an arbitrary long time without handling
> >   the request.  The way we prevent this from happening is by keeping
> >   track of when a VCPU is running and sending an IPI to the physical CPU
> >   running the VCPU when that is the case.  However, each architecture
> >   implementation of KVM must take great care to ensure that requests are
> >   not missed when a VCPU stops running at the same time when a request
> >   is received.
> > 
> > Also, I'm not sure what the semantics are with kvm_vcpu_block().  Is it
> > ok to send a request to a VCPU and then the VCPU blocks and goes to
> > sleep forever even though there are pending requests?
> > kvm_vcpu_check_block() doesn't seem to check vcpu->requests which would
> > indicate that this is the case, but maybe architectures that actually do
> > use requests implement something else themselves?
> 
> I'll add a kvm_vcpu_block() subsection as part of the KVM_REQ_UNHALT
> documentation.
> 
> > 
> > > +As long as the guest is either in guest mode, in which case it gets an IPI
> > 
> > guest is in guest mode?
> 
> oops, s/guest/vcpu/
> 
> > 
> > Perhaps this could be more clearly written as:
> > 
> > As long as the VCPU is running, it is marked as having vcpu->mode =
> > IN_GUEST_MODE.  A requesting thread observing IN_GUEST_MODE will send an
> > IPI to the CPU running the VCPU thread.  On the other hand, when a
> > requesting thread observes vcpu->mode == OUTSIDE_GUEST_MODE, it will not send
> > any IPIs, but will simply set the request bit, and the VCPU thread will be
> > able to check the requests before running the VCPU again.  However, the
> > transition...
> > 
> > > +and will definitely see the request, or is outside guest mode, but has yet
> > > +to do its final request check, and therefore when it does, it will see the
> > > +request, then things will work.  However, the transition from outside to
> > > +inside guest mode, after the last request check has been made, opens a
> > > +window where a request could be made, but the VCPU would not see until it
> > > +exits guest mode some time later.  See the table below.
> > 
> > This text, and the table below, only deals with the details of entering
> > the guest.  Should we talk about kvm_vcpu_exiting_guest_mode() and
> > anything related to exiting the guest?
> 
> I think all !IN_GUEST_MODE should behave the same, so I was avoiding
> the use of EXITING_GUEST_MODE and OUTSIDE_GUEST_MODE, which wouldn't be
> hard to address, but then I'd also have to address
> READING_SHADOW_PAGE_TABLES, which may complicate the document more than
> necessary.  I'm not sure we need to address a VCPU exiting guest mode,
> other than making sure it's clear that a VCPU that exits must check
> requests before it enters again.

But the problem is that kvm_make_all_cpus_request() only sends IPIs to
CPUs where the mode was different from OUTSIDE_GUEST_MODE, so there it's
about !OUTSIDE_GUEST_MODE rather than !IN_GUEST_MODE, so there's some
subtlety here which I feel like it's dangerous to paper over.

> 
> > 
> > > +
> > > ++------------------+-----------------+----------------+--------------+
> > > +| vcpu->mode       | done last check | kick sends IPI | request seen |
> > > ++==================+=================+================+==============+
> > > +| IN_GUEST_MODE    |      N/A        |      YES       |     YES      |
> > > ++------------------+-----------------+----------------+--------------+
> > > +| !IN_GUEST_MODE   |      NO         |      NO        |     YES      |
> > > ++------------------+-----------------+----------------+--------------+
> > > +| !IN_GUEST_MODE   |      YES        |      NO        |     NO       |
> > > ++------------------+-----------------+----------------+--------------+
> > > +
> > > +To ensure the third scenario shown in the table above cannot happen, we
> > > +need to ensure the VCPU's mode change is observable by all CPUs prior to
> > > +its final request check and that a requester's request is observable by
> > > +the requested VCPU prior to the kick.  To do that we need general memory
> > > +barriers between each pair of operations involving mode and requests, i.e.
> > > +
> > > +  CPU_i                                  CPU_j
> > > +-------------------------------------------------------------------------
> > > +  vcpu->mode = IN_GUEST_MODE;            kvm_make_request(REQ, vcpu);
> > > +  smp_mb();                              smp_mb();
> > > +  if (kvm_request_pending(vcpu))         if (vcpu->mode == IN_GUEST_MODE)
> > > +      handle_requests();                     send_IPI(vcpu->cpu);
> > > +
> > > +Whether explicit barriers are needed, or reliance on implicit barriers is
> > > +sufficient, is architecture dependent.  Alternatively, an architecture may
> > > +choose to just always send the IPI, as not sending it, when it's not
> > > +necessary, is just an optimization.
> > 
> > Is this universally true?  This is certainly true on ARM, because we
> > disable interrupts before doing all this, so the IPI remains pending and
> > causes an immediate exit, but if any of the above is done with
> > interrupts enabled, just sending an IPI does nothing to ensure the
> > request is observed.  Perhaps this is not a case we should care about.
> 
> I'll try to make this less generic, as some architectures may not work
> this way.  Indeed, s390 doesn't seem to have kvm_vcpu_kick(), so I guess
> things don't work this way for them.
> 
> > 
> > > +
> > > +Additionally, the error prone third scenario described above also exhibits
> > > +why a request-less VCPU kick is almost never correct.  Without the
> > > +assurance that a non-IPI generating kick will still result in an action by
> > > +the requested VCPU, as the final kvm_request_pending() check does, then
> > > +the kick may not initiate anything useful at all.  If, for instance, a
> > > +request-less kick was made to a VCPU that was just about to set its mode
> > > +to IN_GUEST_MODE, meaning no IPI is sent, then the VCPU may continue its
> > > +entry without actually having done whatever it was the kick was meant to
> > > +initiate.
> > 
> > Indeed.
> > 
> > 
> > > +
> > > +References
> > > +==========
> > > +
> > > +[1] Documentation/core-api/atomic_ops.rst
> > > +[2] Documentation/memory-barriers.txt
> > > -- 
> > > 2.9.3
> > > 
> > 
> > This is a great writeup!  I enjoyed reading it and it made me think more
> > carefully about a number of things, so I definitely think we should
> > merge this.
> >
> 
> Thanks Christoffer!  I'll take all your suggestions above and try to
> answer your questions for v2.
> 

Awesome, I hope Radim finds this useful for his series and the rework
later on.

Thanks,
-Christoffer


* Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  2017-04-04 17:19       ` Christoffer Dall
@ 2017-04-04 17:35         ` Paolo Bonzini
  2017-04-04 17:57           ` Christoffer Dall
  2017-04-04 18:18           ` Andrew Jones
  0 siblings, 2 replies; 85+ messages in thread
From: Paolo Bonzini @ 2017-04-04 17:35 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Andrew Jones, kvmarm, kvm, marc.zyngier, rkrcmar



On 04/04/2017 19:19, Christoffer Dall wrote:
> On Tue, Apr 04, 2017 at 06:24:36PM +0200, Paolo Bonzini wrote:
>>
>>
>> On 04/04/2017 18:04, Christoffer Dall wrote:
>>>> For pause, only the requester should do the clearing.
>>
>> This suggests that maybe this should not be a request.  The request
>> would be just the need to act on a GIC command, exactly as before this patch.
> 
> Maybe the semantics should be:
> 
> requester:                                vcpu:
> ----------                                -----
> make_request(vcpu, KVM_REQ_PAUSE);
>                                           handles the request by
> 					  clearing it and setting
> 					  vcpu->pause = true;
> wait until vcpu->pause == true
> make_request(vcpu, KVM_REQ_UNPAUSE);
>                                           vcpus 'wake up' clear the
> 					  UNPAUSE request and set
> 					  vcpu->pause = false;
> 
> The benefit would be that we get to re-use the complicated "figure out
> the VCPU mode and whether or not we should send an IPI and get the
> barriers right" stuff.

I don't think that's necessary.  As long as the complicated stuff
prevents the guest from being entered, the next run through the loop
will find that 'vcpu->arch.power_off || vcpu->arch.pause' is true and
go to sleep.

>> What I don't understand is:
>>
>>>> With this patch, while the vcpu will still initially enter
>>>> the guest, it will exit immediately due to the IPI sent by the vcpu
>>>> kick issued after making the vcpu request.
>>
>> Isn't this also true of KVM_REQ_VCPU_EXIT that was used before?
>>
>> So this:
>>
>> +			vcpu->arch.power_off || kvm_request_pending(vcpu)) {
>> +			WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
>>
>> is the crux of the fix, you can keep using vcpu->arch.pause.
> 
> Probably; I feel like there's a fix here which should be a separate
> patch from using a different requests instead of the KVM_REQ_VCPU_EXIT +
> the pause flag.

Yeah, and then the pause flag can stay.

>> By the way, vcpu->arch.power_off can go away from this "if" too because
>> KVM_RUN and KVM_SET_MP_STATE are mutually exclusive through the vcpu mutex.
> 
> But we also allow setting the power_off flag from the in-kernel PSCI
> emulation in the context of another VCPU thread.

Right.  That code does

                tmp->arch.power_off = true;
                kvm_vcpu_kick(tmp);

and I think what's really missing in arm.c is the "if (vcpu->mode ==
EXITING_GUEST_MODE)" check that is found in x86.c.  Then pausing can
also simply use kvm_vcpu_kick.

My understanding is that KVM-ARM is using KVM_REQ_VCPU_EXIT simply to
reuse the smp_call_function_many code in kvm_make_all_cpus_request.
Once you add EXITING_GUEST_MODE, ARM can just add a new function
kvm_kick_all_cpus and use it for both pause and power_off.

Paolo


* Re: [PATCH v2 2/9] KVM: Add documentation for VCPU requests
  2017-04-04 17:23       ` Christoffer Dall
@ 2017-04-04 17:36         ` Paolo Bonzini
  2017-04-05 14:11         ` Radim Krčmář
  1 sibling, 0 replies; 85+ messages in thread
From: Paolo Bonzini @ 2017-04-04 17:36 UTC (permalink / raw)
  To: Christoffer Dall, Andrew Jones; +Cc: kvmarm, kvm, marc.zyngier, rkrcmar



On 04/04/2017 19:23, Christoffer Dall wrote:
>> I think all !IN_GUEST_MODE should behave the same, so I was avoiding
>> the use of EXITING_GUEST_MODE and OUTSIDE_GUEST_MODE, which wouldn't be
>> hard to address, but then I'd also have to address
>> READING_SHADOW_PAGE_TABLES, which may complicate the document more than
>> necessary.  I'm not sure we need to address a VCPU exiting guest mode,
>> other than making sure it's clear that a VCPU that exits must check
>> requests before it enters again.
> 
> But the problem is that kvm_make_all_cpus_request() only sends IPIs to
> CPUs where the mode was different from OUTSIDE_GUEST_MODE, so there it's
> about !OUTSIDE_GUEST_MODE rather than !IN_GUEST_MODE, so there's some
> subtlety here which I feel like it's dangerous to paper over.

Don't bother documenting READING_SHADOW_PAGE_TABLES---but
EXITING_GUEST_MODE should be used in ARM and documented, because it's
the key in making kvm_vcpu_kick not racy.

Paolo


* Re: [PATCH v2 5/9] KVM: arm/arm64: replace vcpu->arch.power_off with a vcpu request
  2017-03-31 16:06 ` [PATCH v2 5/9] KVM: arm/arm64: replace vcpu->arch.power_off " Andrew Jones
@ 2017-04-04 17:37   ` Christoffer Dall
  0 siblings, 0 replies; 85+ messages in thread
From: Christoffer Dall @ 2017-04-04 17:37 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Fri, Mar 31, 2017 at 06:06:54PM +0200, Andrew Jones wrote:
> Like pause, replacing power_off with a vcpu request ensures
> visibility of changes and avoids the final race before entering
> the guest.

I think it's worth explaining the race in the commit message first, just
briefly.

> 
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
>  arch/arm/include/asm/kvm_host.h   |  4 +---
>  arch/arm/kvm/arm.c                | 32 ++++++++++++++++++--------------
>  arch/arm/kvm/psci.c               | 17 +++++------------
>  arch/arm64/include/asm/kvm_host.h |  4 +---
>  4 files changed, 25 insertions(+), 32 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 52c25536d254..afed5d44634d 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -46,6 +46,7 @@
>  #endif
>  
>  #define KVM_REQ_PAUSE		8
> +#define KVM_REQ_POWER_OFF	9
>  
>  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
>  int __attribute_const__ kvm_target_cpu(void);
> @@ -170,9 +171,6 @@ struct kvm_vcpu_arch {
>  	 * here.
>  	 */
>  
> -	/* vcpu power-off state */
> -	bool power_off;
> -
>  	/* IO related fields */
>  	struct kvm_decode mmio_decode;
>  
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index f3bfbb5f3d96..7ed39060b1cf 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -381,7 +381,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
>  				    struct kvm_mp_state *mp_state)
>  {
> -	if (vcpu->arch.power_off)
> +	if (test_bit(KVM_REQ_POWER_OFF, &vcpu->requests))
>  		mp_state->mp_state = KVM_MP_STATE_STOPPED;
>  	else
>  		mp_state->mp_state = KVM_MP_STATE_RUNNABLE;
> @@ -394,10 +394,10 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
>  {
>  	switch (mp_state->mp_state) {
>  	case KVM_MP_STATE_RUNNABLE:
> -		vcpu->arch.power_off = false;
> +		clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
>  		break;
>  	case KVM_MP_STATE_STOPPED:
> -		vcpu->arch.power_off = true;
> +		set_bit(KVM_REQ_POWER_OFF, &vcpu->requests);

this looks a bit dodgy; I am getting an even stronger feeling that we
should keep power_off = true, and here we can safely set it directly
because we have mutual exclusion from KVM_RUN, and that leaves us using
requests only to "ask the VCPU to do something for us, like setting its
power_off state", except...

>  		break;
>  	default:
>  		return -EINVAL;
> @@ -415,9 +415,9 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
>   */
>  int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
>  {
> -	return ((!!v->arch.irq_lines || kvm_vgic_vcpu_pending_irq(v))
> -		&& !v->arch.power_off
> -		&& !test_bit(KVM_REQ_PAUSE, &v->requests));
> +	return (!!v->arch.irq_lines || kvm_vgic_vcpu_pending_irq(v)) &&
> +		!test_bit(KVM_REQ_POWER_OFF, &v->requests) &&
> +		!test_bit(KVM_REQ_PAUSE, &v->requests);
>  }
>  
>  /* Just ensure a guest exit from a particular CPU */
> @@ -578,8 +578,9 @@ static void vcpu_sleep(struct kvm_vcpu *vcpu)
>  {
>  	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
>  
> -	swait_event_interruptible(*wq, ((!vcpu->arch.power_off) &&
> -		(!test_bit(KVM_REQ_PAUSE, &vcpu->requests))));
> +	swait_event_interruptible(*wq,
> +		!test_bit(KVM_REQ_POWER_OFF, &vcpu->requests) &&
> +		!test_bit(KVM_REQ_PAUSE, &vcpu->requests));
>  }
>  
>  static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
> @@ -632,8 +633,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  		update_vttbr(vcpu->kvm);
>  
> -		if (vcpu->arch.power_off || test_bit(KVM_REQ_PAUSE, &vcpu->requests))
> -			vcpu_sleep(vcpu);
> +		if (kvm_request_pending(vcpu)) {
> +			if (test_bit(KVM_REQ_POWER_OFF, &vcpu->requests) ||
> +			    test_bit(KVM_REQ_PAUSE, &vcpu->requests))
> +				vcpu_sleep(vcpu);
> +		}

...hmm, I do like that we only need to check the requests variable once,
and not check multiple flags, but at least we'd only have to do it once
(not after disabling interrupts again).

>  
>  		/*
>  		 * Preparing the interrupts to be injected also
> @@ -664,8 +668,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);
>  		smp_mb();
>  
> -		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
> -			vcpu->arch.power_off || kvm_request_pending(vcpu)) {
> +		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)
> +		    || kvm_request_pending(vcpu)) {
>  			WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
>  			local_irq_enable();
>  			kvm_pmu_sync_hwstate(vcpu);
> @@ -892,9 +896,9 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
>  	 * Handle the "start in power-off" case.
>  	 */
>  	if (test_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features))
> -		vcpu->arch.power_off = true;
> +		set_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
>  	else
> -		vcpu->arch.power_off = false;
> +		clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
>  
>  	return 0;
>  }
> diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> index 82fe7eb5b6a7..f732484abc7a 100644
> --- a/arch/arm/kvm/psci.c
> +++ b/arch/arm/kvm/psci.c
> @@ -64,7 +64,7 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
>  
>  static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
>  {
> -	vcpu->arch.power_off = true;
> +	set_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
>  }
>  
>  static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
> @@ -88,7 +88,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
>  	 */
>  	if (!vcpu)
>  		return PSCI_RET_INVALID_PARAMS;
> -	if (!vcpu->arch.power_off) {
> +	if (!test_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
>  		if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
>  			return PSCI_RET_ALREADY_ON;
>  		else
> @@ -116,8 +116,7 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
> >  	 * the general purpose registers are undefined upon CPU_ON.
>  	 */
>  	vcpu_set_reg(vcpu, 0, context_id);
> -	vcpu->arch.power_off = false;
> -	smp_mb();		/* Make sure the above is visible */
> +	clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
>  
>  	wq = kvm_arch_vcpu_wq(vcpu);
>  	swake_up(wq);
> @@ -154,7 +153,7 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
>  		mpidr = kvm_vcpu_get_mpidr_aff(tmp);
>  		if ((mpidr & target_affinity_mask) == target_affinity) {
>  			matching_cpus++;
> -			if (!tmp->arch.power_off)
> +			if (!test_bit(KVM_REQ_POWER_OFF, &tmp->requests))
>  				return PSCI_0_2_AFFINITY_LEVEL_ON;
>  		}
>  	}
> @@ -167,9 +166,6 @@ static unsigned long kvm_psci_vcpu_affinity_info(struct kvm_vcpu *vcpu)
>  
>  static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
>  {
> -	int i;
> -	struct kvm_vcpu *tmp;
> -
>  	/*
>  	 * The KVM ABI specifies that a system event exit may call KVM_RUN
> >  	 * again and may perform shutdown/reboot at a later time than when the
> @@ -179,10 +175,7 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
>  	 * after this call is handled and before the VCPUs have been
>  	 * re-initialized.
>  	 */
> -	kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
> -		tmp->arch.power_off = true;
> -		kvm_vcpu_kick(tmp);
> -	}
> +	kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_POWER_OFF);

certainly we want this part of the change in some form.

Thanks,
-Christoffer

>  
>  	memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
>  	vcpu->run->system_event.type = type;
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 6e1271a77e92..e78895f675d0 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -43,6 +43,7 @@
>  #define KVM_VCPU_MAX_FEATURES 4
>  
>  #define KVM_REQ_PAUSE		8
> +#define KVM_REQ_POWER_OFF	9
>  
>  int __attribute_const__ kvm_target_cpu(void);
>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> @@ -253,9 +254,6 @@ struct kvm_vcpu_arch {
>  		u32	mdscr_el1;
>  	} guest_debug_preserved;
>  
> -	/* vcpu power-off state */
> -	bool power_off;
> -
>  	/* IO related fields */
>  	struct kvm_decode mmio_decode;
>  
> -- 
> 2.9.3
> 


* Re: [PATCH v2 6/9] KVM: arm/arm64: use a vcpu request on irq injection
  2017-03-31 16:06 ` [PATCH v2 6/9] KVM: arm/arm64: use a vcpu request on irq injection Andrew Jones
@ 2017-04-04 17:42   ` Christoffer Dall
  2017-04-04 18:27     ` Andrew Jones
  2017-04-04 18:59     ` Paolo Bonzini
  2017-04-04 18:51   ` Paolo Bonzini
  1 sibling, 2 replies; 85+ messages in thread
From: Christoffer Dall @ 2017-04-04 17:42 UTC (permalink / raw)
  To: Andrew Jones; +Cc: marc.zyngier, pbonzini, kvmarm, kvm

On Fri, Mar 31, 2017 at 06:06:55PM +0200, Andrew Jones wrote:
> Don't use request-less VCPU kicks when injecting IRQs, as a VCPU
> kick meant to trigger the interrupt injection could be sent while
> the VCPU is outside guest mode, which means no IPI is sent, and
> after it has called kvm_vgic_flush_hwstate(), meaning it won't see
> the updated GIC state until its next exit some time later for some
> other reason.
> 
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
>  arch/arm/include/asm/kvm_host.h   |  1 +
>  arch/arm/kvm/arm.c                |  1 +
>  arch/arm64/include/asm/kvm_host.h |  1 +
>  virt/kvm/arm/arch_timer.c         |  1 +
>  virt/kvm/arm/vgic/vgic.c          | 12 ++++++++++--
>  5 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index afed5d44634d..0b8a6d6b3cb3 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -47,6 +47,7 @@
>  
>  #define KVM_REQ_PAUSE		8
>  #define KVM_REQ_POWER_OFF	9
> +#define KVM_REQ_IRQ_PENDING	10
>  
>  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
>  int __attribute_const__ kvm_target_cpu(void);
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 7ed39060b1cf..a106feccf314 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -768,6 +768,7 @@ static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
>  	 * trigger a world-switch round on the running physical CPU to set the
>  	 * virtual IRQ/FIQ fields in the HCR appropriately.
>  	 */
> +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>  	kvm_vcpu_kick(vcpu);
>  
>  	return 0;
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index e78895f675d0..7057512b3474 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -44,6 +44,7 @@
>  
>  #define KVM_REQ_PAUSE		8
>  #define KVM_REQ_POWER_OFF	9
> +#define KVM_REQ_IRQ_PENDING	10
>  
>  int __attribute_const__ kvm_target_cpu(void);
>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 35d7100e0815..3c48abbf951b 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -95,6 +95,7 @@ static void kvm_timer_inject_irq_work(struct work_struct *work)
>  	 * If the vcpu is blocked we want to wake it up so that it will see
>  	 * the timer has expired when entering the guest.
>  	 */
> +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>  	kvm_vcpu_kick(vcpu);
>  }
>  
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index 654dfd40e449..31fb89057f0c 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -283,8 +283,10 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
>  		 * won't see this one until it exits for some other
>  		 * reason.
>  		 */
> -		if (vcpu)
> +		if (vcpu) {
> +			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>  			kvm_vcpu_kick(vcpu);
> +		}
>  		return false;
>  	}
>  
> @@ -330,6 +332,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
>  	spin_unlock(&irq->irq_lock);
>  	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
>  
> +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>  	kvm_vcpu_kick(vcpu);
>  
>  	return true;
> @@ -654,6 +657,9 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
>  	spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock);
>  	vgic_flush_lr_state(vcpu);
>  	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
> +
> +	/* The GIC is now ready to deliver the IRQ. */
> +	clear_bit(KVM_REQ_IRQ_PENDING, &vcpu->requests);

this is not going to be called when we don't have the vgic, which means
that if vcpu_interrupt_line() is used as you modify it above, the
request will never get cleared.

>  }
>  
>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
> @@ -691,8 +697,10 @@ void vgic_kick_vcpus(struct kvm *kvm)
>  	 * a good kick...
>  	 */
>  	kvm_for_each_vcpu(c, vcpu, kvm) {
> -		if (kvm_vgic_vcpu_pending_irq(vcpu))
> +		if (kvm_vgic_vcpu_pending_irq(vcpu)) {
> +			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>  			kvm_vcpu_kick(vcpu);
> +		}
>  	}
>  }
>  
> -- 
> 2.9.3
> 

Thanks,
-Christoffer


* Re: [PATCH v2 7/9] KVM: arm/arm64: PMU: remove request-less vcpu kick
  2017-03-31 16:06 ` [PATCH v2 7/9] KVM: arm/arm64: PMU: remove request-less vcpu kick Andrew Jones
@ 2017-04-04 17:46   ` Christoffer Dall
  2017-04-04 18:29     ` Andrew Jones
  0 siblings, 1 reply; 85+ messages in thread
From: Christoffer Dall @ 2017-04-04 17:46 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Fri, Mar 31, 2017 at 06:06:56PM +0200, Andrew Jones wrote:
> Refactor PMU overflow handling in order to remove the request-less
> vcpu kick.  Now, since kvm_vgic_inject_irq() uses vcpu requests,
> there should be no chance that a kick sent at just the wrong time
> (between the VCPU's call to kvm_pmu_flush_hwstate() and before it
> enters guest mode) results in a failure for the guest to see updated
> GIC state until its next exit some time later for some other reason.
> 
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
>  virt/kvm/arm/pmu.c | 29 +++++++++++++++--------------
>  1 file changed, 15 insertions(+), 14 deletions(-)
> 
> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> index 69ccce308458..9d725f3afb11 100644
> --- a/virt/kvm/arm/pmu.c
> +++ b/virt/kvm/arm/pmu.c
> @@ -203,6 +203,19 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
>  	return reg;
>  }
>  
> +static void kvm_pmu_check_overflow(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_pmu *pmu = &vcpu->arch.pmu;
> +	bool overflow;
> +
> +	overflow = !!kvm_pmu_overflow_status(vcpu);
> +	if (pmu->irq_level != overflow) {
> +		pmu->irq_level = overflow;
> +		kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> +				    pmu->irq_num, overflow);
> +	}
> +}
> +

If we are changing the way the PMU works to adjust the interrupt
signaling whenever the PMU changes its internal state, do we still have
to call kvm_pmu_update_state() from each flush/sync path now?

>  /**
>   * kvm_pmu_overflow_set - set PMU overflow interrupt
>   * @vcpu: The vcpu pointer
> @@ -210,31 +223,19 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
>   */
>  void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val)
>  {
> -	u64 reg;
> -
>  	if (val == 0)
>  		return;
>  
>  	vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= val;
> -	reg = kvm_pmu_overflow_status(vcpu);
> -	if (reg != 0)
> -		kvm_vcpu_kick(vcpu);
> +	kvm_pmu_check_overflow(vcpu);
>  }
>  
>  static void kvm_pmu_update_state(struct kvm_vcpu *vcpu)
>  {
> -	struct kvm_pmu *pmu = &vcpu->arch.pmu;
> -	bool overflow;
> -
>  	if (!kvm_arm_pmu_v3_ready(vcpu))
>  		return;
>  
> -	overflow = !!kvm_pmu_overflow_status(vcpu);
> -	if (pmu->irq_level != overflow) {
> -		pmu->irq_level = overflow;
> -		kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> -				    pmu->irq_num, overflow);
> -	}
> +	kvm_pmu_check_overflow(vcpu);
>  }
>  
>  /**
> -- 
> 2.9.3
> 

Thanks,
-Christoffer


* Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  2017-04-04 16:04   ` Christoffer Dall
  2017-04-04 16:24     ` Paolo Bonzini
@ 2017-04-04 17:57     ` Andrew Jones
  2017-04-04 19:04       ` Christoffer Dall
  1 sibling, 1 reply; 85+ messages in thread
From: Andrew Jones @ 2017-04-04 17:57 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: marc.zyngier, pbonzini, kvmarm, kvm

On Tue, Apr 04, 2017 at 06:04:17PM +0200, Christoffer Dall wrote:
> On Fri, Mar 31, 2017 at 06:06:53PM +0200, Andrew Jones wrote:
> > This not only ensures visibility of changes to pause by using
> > atomic ops, but also plugs a small race where a vcpu could get its
> > pause state enabled just after its last check before entering the
> > guest. With this patch, while the vcpu will still initially enter
> > the guest, it will exit immediately due to the IPI sent by the vcpu
> > kick issued after making the vcpu request.
> > 
> > We use bitops, rather than kvm_make/check_request(), because we
> > don't need the barriers they provide,
> 
> why not?

I'll add that it's because the only state of interest is the request bit
itself.  Once the request is observable we're good to go; there's no need
to ensure that anything else is observable at the same time.

> 
> > nor do we want the side-effect
> > of kvm_check_request() clearing the request. For pause, only the
> > requester should do the clearing.
> > 
> > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > ---
> >  arch/arm/include/asm/kvm_host.h   |  5 +----
> >  arch/arm/kvm/arm.c                | 45 +++++++++++++++++++++++++++------------
> >  arch/arm64/include/asm/kvm_host.h |  5 +----
> >  3 files changed, 33 insertions(+), 22 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > index 31ee468ce667..52c25536d254 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -45,7 +45,7 @@
> >  #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
> >  #endif
> >  
> > -#define KVM_REQ_VCPU_EXIT	8
> > +#define KVM_REQ_PAUSE		8
> >  
> >  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
> >  int __attribute_const__ kvm_target_cpu(void);
> > @@ -173,9 +173,6 @@ struct kvm_vcpu_arch {
> >  	/* vcpu power-off state */
> >  	bool power_off;
> >  
> > -	 /* Don't run the guest (internal implementation need) */
> > -	bool pause;
> > -
> >  	/* IO related fields */
> >  	struct kvm_decode mmio_decode;
> >  
> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > index 314eb6abe1ff..f3bfbb5f3d96 100644
> > --- a/arch/arm/kvm/arm.c
> > +++ b/arch/arm/kvm/arm.c
> > @@ -94,6 +94,18 @@ struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void)
> >  
> >  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
> >  {
> > +	/*
> > +	 * If we return true from this function, then it means the vcpu is
> > +	 * either in guest mode, or has already indicated that it's in guest
> > +	 * mode. The indication is done by setting ->mode to IN_GUEST_MODE,
> > +	 * and must be done before the final kvm_request_pending() read. It's
> > +	 * important that the observability of that order be enforced and that
> > +	 * the request receiving CPU can observe any new request before the
> > +	 * requester issues a kick. Thus, the general barrier below pairs with
> > +	 * the general barrier in kvm_arch_vcpu_ioctl_run() which divides the
> > +	 * write to ->mode and the final request pending read.
> > +	 */
> 
> I am having a hard time understanding this comment.  For example, I
> don't understand the difference between 'is either in guest mode or has
> already indicated it's in guest mode'.  Which case is which again, and
> how are we checking for two cases below?
> 
> Also, the stuff about observability of an order is hard to follow, and
> the comment assumes the reader is thinking about the specific race when
> entering the guest.
> 
> I think we should focus on getting the documentation in place, refer to
> the documentation from here, and be much more brief and say something
> like:
> 
> 	/*
> 	 * The memory barrier below pairs with the barrier in
> 	 * kvm_arch_vcpu_ioctl_run() between writes to vcpu->mode
> 	 * and reading vcpu->requests before entering the guest.
> 	 *
> 	 * Ensures that the VCPU thread's CPU can observe changes to
> 	 * vcpu->requests written prior to calling this function before
> 	 * it writes vcpu->mode = IN_GUEST_MODE, and correspondingly
> 	 * ensures that this CPU observes vcpu->mode == IN_GUEST_MODE
> 	 * only if the VCPU thread's CPU could observe writes to
> 	 * vcpu->requests from this CPU.
> 	 /
> 
> Is this correct?  I'm not really sure anymore?

It's confusing because we have cross dependencies on the negatives of
two conditions.

Here's the cross dependencies:

  vcpu->mode = IN_GUEST_MODE;   ---   ---  kvm_make_request(REQ, vcpu);
  smp_mb();                        \ /     smp_mb();
                                    X
                                   / \
  if (kvm_request_pending(vcpu))<--   -->  if (vcpu->mode == IN_GUEST_MODE)

On each side, the smp_mb() ensures no reordering of that side's pair of
operations.  I.e. on the LHS the requests LOAD cannot be ordered before
the mode STORE, and on the RHS the mode LOAD cannot be ordered before
the requests STORE.  This is why they must be general barriers.

Now, for extra fun, the cross dependencies arise because we care about
the cases when we *don't* observe the respective dependency.

Condition 1:

  The final requests check in vcpu run, if (kvm_request_pending(vcpu))

  What we really care about though is !kvm_request_pending(vcpu).  When
  we observe !kvm_request_pending(vcpu) we know we're safe to enter the
  guest.  We know that any thread in the process of making a request has
  yet to check 'if (vcpu->mode == IN_GUEST_MODE)', so if it was just about
  to set a request, then it doesn't matter, as it will observe mode ==
  IN_GUEST_MODE afterwards (thanks to the paired smp_mb()) and send the
  IPI.

Condition 2:

  The kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE check we do
  here in this function, kvm_arch_vcpu_should_kick()

  What we really care about is (vcpu->mode != IN_GUEST_MODE).  When
  we observe (vcpu->mode != IN_GUEST_MODE) we know we're safe to not
  send the IPI.  We're safe because, by not observing IN_GUEST_MODE,
  we know the VCPU thread has yet to do its final requests check,
  since, thanks to the paired smp_mb(), we know that order must be
  enforced.


I'll try to merge what I originally wrote, with your suggestion, and
some of what I just wrote now.  But, also like you suggest, I'll put
the bulk of it in the document and then just reference it.

> 
> There's also the obvious fact that we're adding this memory barrier
> inside a funciton that checks if we should kick a vcpu, and there's no
> documentation that says that this is always called in association with
> setting a request, is there?

You're right, there's nothing forcing this.  It's just the undocumented
convention that kvm_vcpu_kick() is needed after making a request.  I can
try to add something to the doc to highlight the importance of
kvm_vcpu_kick(), which calls kvm_arch_vcpu_should_kick() and therefore
is a fairly safe place to put an explicit barrier if the architecture
requires one.

> 
> I finally don't undertand why this would be a requirement only on ARM?

At least x86's cmpxchg() always produces the equivalent of a general
memory barrier before and after the exchange, whereas ARM only provides
those barrier semantics when the exchange succeeds.

> 
> > +	smp_mb();
> >  	return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
> >  }
> >  
> > @@ -404,7 +416,8 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
> >  int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
> >  {
> >  	return ((!!v->arch.irq_lines || kvm_vgic_vcpu_pending_irq(v))
> > -		&& !v->arch.power_off && !v->arch.pause);
> > +		&& !v->arch.power_off
> > +		&& !test_bit(KVM_REQ_PAUSE, &v->requests));
> >  }
> >  
> >  /* Just ensure a guest exit from a particular CPU */
> > @@ -535,17 +548,12 @@ bool kvm_arch_intc_initialized(struct kvm *kvm)
> >  
> >  void kvm_arm_halt_guest(struct kvm *kvm)
> >  {
> > -	int i;
> > -	struct kvm_vcpu *vcpu;
> > -
> > -	kvm_for_each_vcpu(i, vcpu, kvm)
> > -		vcpu->arch.pause = true;
> > -	kvm_make_all_cpus_request(kvm, KVM_REQ_VCPU_EXIT);
> > +	kvm_make_all_cpus_request(kvm, KVM_REQ_PAUSE);
> >  }
> >  
> >  void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu)
> >  {
> > -	vcpu->arch.pause = true;
> > +	set_bit(KVM_REQ_PAUSE, &vcpu->requests);
> >  	kvm_vcpu_kick(vcpu);
> >  }
> >  
> > @@ -553,7 +561,7 @@ void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu)
> >  {
> >  	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
> >  
> > -	vcpu->arch.pause = false;
> > +	clear_bit(KVM_REQ_PAUSE, &vcpu->requests);
> >  	swake_up(wq);
> >  }
> >  
> > @@ -571,7 +579,7 @@ static void vcpu_sleep(struct kvm_vcpu *vcpu)
> >  	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
> >  
> >  	swait_event_interruptible(*wq, ((!vcpu->arch.power_off) &&
> > -				       (!vcpu->arch.pause)));
> > +		(!test_bit(KVM_REQ_PAUSE, &vcpu->requests))));
> >  }
> >  
> >  static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
> > @@ -624,7 +632,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  
> >  		update_vttbr(vcpu->kvm);
> >  
> > -		if (vcpu->arch.power_off || vcpu->arch.pause)
> > +		if (vcpu->arch.power_off || test_bit(KVM_REQ_PAUSE, &vcpu->requests))
> >  			vcpu_sleep(vcpu);
> >  
> >  		/*
> > @@ -647,8 +655,18 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  			run->exit_reason = KVM_EXIT_INTR;
> >  		}
> >  
> > +		/*
> > +		 * Indicate we're in guest mode now, before doing a final
> > +		 * check for pending vcpu requests. The general barrier
> > +		 * pairs with the one in kvm_arch_vcpu_should_kick().
> > +		 * Please see the comment there for more details.
> > +		 */
> > +		WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);
> > +		smp_mb();
> 
> There are two changes here:
> 
> there's a change from a normal write to a WRITE_ONCE and there's also a
> change to that adds a memory barrier.  I feel like I'd like to know if
> these are tied together or two separate cleanups.  I also wonder if we
> could split out more general changes from the pause thing to have a
> better log of why we changed the run loop?
> 
> It looks to me like there could be a separate patch that encapsulated
> the reads and writes of vcpu->mode into a function that does the
> WRITE_ONCE and READ_ONCE with a nice comment.

The thought crossed my mind as well, I guess I should have followed that
thought through.  Will do.

> 
> > +
> >  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
> > -			vcpu->arch.power_off || vcpu->arch.pause) {
> > +			vcpu->arch.power_off || kvm_request_pending(vcpu)) {
> > +			WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
> >  			local_irq_enable();
> >  			kvm_pmu_sync_hwstate(vcpu);
> >  			kvm_timer_sync_hwstate(vcpu);
> > @@ -664,11 +682,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		 */
> >  		trace_kvm_entry(*vcpu_pc(vcpu));
> >  		guest_enter_irqoff();
> > -		vcpu->mode = IN_GUEST_MODE;
> >  
> >  		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> >  
> > -		vcpu->mode = OUTSIDE_GUEST_MODE;
> > +		WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
> >  		vcpu->stat.exits++;
> >  		/*
> >  		 * Back from guest
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index e7705e7bb07b..6e1271a77e92 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -42,7 +42,7 @@
> >  
> >  #define KVM_VCPU_MAX_FEATURES 4
> >  
> > -#define KVM_REQ_VCPU_EXIT	8
> > +#define KVM_REQ_PAUSE		8
> >  
> >  int __attribute_const__ kvm_target_cpu(void);
> >  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> > @@ -256,9 +256,6 @@ struct kvm_vcpu_arch {
> >  	/* vcpu power-off state */
> >  	bool power_off;
> >  
> > -	/* Don't run the guest (internal implementation need) */
> > -	bool pause;
> > -
> >  	/* IO related fields */
> >  	struct kvm_decode mmio_decode;
> >  
> > -- 
> > 2.9.3
> 
> Thanks,
> -Christoffer

Thanks,
drew

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  2017-04-04 17:35         ` Paolo Bonzini
@ 2017-04-04 17:57           ` Christoffer Dall
  2017-04-04 18:15             ` Paolo Bonzini
  2017-04-04 18:18           ` Andrew Jones
  1 sibling, 1 reply; 85+ messages in thread
From: Christoffer Dall @ 2017-04-04 17:57 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: marc.zyngier, kvmarm, kvm

On Tue, Apr 04, 2017 at 07:35:11PM +0200, Paolo Bonzini wrote:
> 
> 
> On 04/04/2017 19:19, Christoffer Dall wrote:
> > On Tue, Apr 04, 2017 at 06:24:36PM +0200, Paolo Bonzini wrote:
> >>
> >>
> >> On 04/04/2017 18:04, Christoffer Dall wrote:
> >>>> For pause, only the requester should do the clearing.
> >>
> >> This suggests that maybe this should not be a request.  The request
> >> would be just the need to act on a GIC command, exactly as before this patch.
> > 
> > Maybe the semantics should be:
> > 
> > requester:                                vcpu:
> > ----------                                -----
> > make_request(vcpu, KVM_REQ_PAUSE);
> >                                           handles the request by
> > 					  clearing it and setting
> > 					  vcpu->pause = true;
> > wait until vcpu->pause == true
> > make_request(vcpu, KVM_REQ_UNPAUSE);
> >                                           vcpus 'wake up', clear the
> > 					  UNPAUSE request and set
> > 					  vcpu->pause = false;
> > 
> > The benefit would be that we get to re-use the complicated "figure out
> > the VCPU mode and whether or not we should send an IPI and get the
> > barriers right" stuff.
> 
> I don't think that's necessary.  As long as the complicated stuff
> prevents entering the VCPU, the next run through the loop will
> find that 'vcpu->arch.power_off || vcpu->arch.pause' is true and
> go to sleep.
> 
> >> What I don't understand is:
> >>
> >>>> With this patch, while the vcpu will still initially enter
> >>>> the guest, it will exit immediately due to the IPI sent by the vcpu
> >>>> kick issued after making the vcpu request.
> >>
> >> Isn't this also true of KVM_REQ_VCPU_EXIT that was used before?
> >>
> >> So this:
> >>
> >> +			vcpu->arch.power_off || kvm_request_pending(vcpu)) {
> >> +			WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
> >>
> >> is the crux of the fix, you can keep using vcpu->arch.pause.
> > 
> > Probably; I feel like there's a fix here which should be a separate
> > patch from using a different request instead of the KVM_REQ_VCPU_EXIT +
> > the pause flag.
> 
> Yeah, and then the pause flag can stay.
> 
> >> By the way, vcpu->arch.power_off can go away from this "if" too because
> >> KVM_RUN and KVM_SET_MP_STATE are mutually exclusive through the vcpu mutex.
> > 
> > But we also allow setting the power_off flag from the in-kernel PSCI
> > emulation in the context of another VCPU thread.
> 
> Right.  That code does
> 
>                 tmp->arch.power_off = true;
>                 kvm_vcpu_kick(tmp);
> 
> and I think what's really missing in arm.c is the "if (vcpu->mode ==
> EXITING_GUEST_MODE)" check that is found in x86.c.  Then pausing can
> also simply use kvm_vcpu_kick.

I see, that's why the cmpxchg() works the way it does.  We just still
need to move the vcpu->mode = IN_GUEST_MODE before our
with-interrupts-disabled check.

What I'm not sure is why you can get away without using a memory barrier
or WRITE_ONCE on x86, but is this simply because x86 is a strongly
ordered architecture?

> 
> My understanding is that KVM-ARM is using KVM_REQ_VCPU_EXIT simply to
> reuse the smp_call_function_many code in kvm_make_all_cpus_request.

Your understanding is correct.

> Once you add EXITING_GUEST_MODE, ARM can just add a new function
> kvm_kick_all_cpus and use it for both pause and power_off.
> 

Yes, that should work.

I think Drew's approach should also work, but at this point, I'm not
really sure which approach is better than the other.


Thanks,
-Christoffer

* Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  2017-04-04 17:57           ` Christoffer Dall
@ 2017-04-04 18:15             ` Paolo Bonzini
  2017-04-04 18:38               ` Christoffer Dall
  0 siblings, 1 reply; 85+ messages in thread
From: Paolo Bonzini @ 2017-04-04 18:15 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: marc.zyngier, kvmarm, kvm



On 04/04/2017 19:57, Christoffer Dall wrote:
>> Right.  That code does
>>
>>                 tmp->arch.power_off = true;
>>                 kvm_vcpu_kick(tmp);
>>
>> and I think what's really missing in arm.c is the "if (vcpu->mode ==
>> EXITING_GUEST_MODE)" check that is found in x86.c.  Then pausing can
>> also simply use kvm_vcpu_kick.
> I see, that's why the cmpxchg() works the way it does.  We just still
> need to move the vcpu->mode = IN_GUEST_MODE before our
> with-interrupts-disabled check.
> 
> What I'm not sure is why you can get away without using a memory barrier
> or WRITE_ONCE on x86, but is this simply because x86 is a strongly
> ordered architecture?

x86 does have a memory barrier:

        vcpu->mode = IN_GUEST_MODE;

        srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
        smp_mb__after_srcu_read_unlock();

        /*
         * This handles the case where a posted interrupt was
         * notified with kvm_vcpu_kick.
         */
        if (kvm_lapic_enabled(vcpu)) {
                if (kvm_x86_ops->sync_pir_to_irr && vcpu->arch.apicv_active)
                        kvm_x86_ops->sync_pir_to_irr(vcpu);
        }

        if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests

and WRITE_ONCE is not needed if you have a memory barrier (though I find it
more self-documenting to use it anyway).

Paolo

* Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  2017-04-04 17:35         ` Paolo Bonzini
  2017-04-04 17:57           ` Christoffer Dall
@ 2017-04-04 18:18           ` Andrew Jones
  2017-04-04 18:59             ` Paolo Bonzini
  1 sibling, 1 reply; 85+ messages in thread
From: Andrew Jones @ 2017-04-04 18:18 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Christoffer Dall, kvmarm, kvm, marc.zyngier, rkrcmar

On Tue, Apr 04, 2017 at 07:35:11PM +0200, Paolo Bonzini wrote:
> 
> 
> On 04/04/2017 19:19, Christoffer Dall wrote:
> > On Tue, Apr 04, 2017 at 06:24:36PM +0200, Paolo Bonzini wrote:
> >>
> >>
> >> On 04/04/2017 18:04, Christoffer Dall wrote:
> >>>> For pause, only the requester should do the clearing.
> >>
> >> This suggests that maybe this should not be a request.  The request
> >> would be just the need to act on a GIC command, exactly as before this patch.
> > 
> > Maybe the semantics should be:
> > 
> > requester:                                vcpu:
> > ----------                                -----
> > make_request(vcpu, KVM_REQ_PAUSE);
> >                                           handles the request by
> > 					  clearing it and setting
> > 					  vcpu->pause = true;
> > wait until vcpu->pause == true
> > make_request(vcpu, KVM_REQ_UNPAUSE);
> >                                           vcpus 'wake up', clear the
> > 					  UNPAUSE request and set
> > 					  vcpu->pause = false;

I thought of this originally, but then decided to [ab]use the concept
of pause being a boolean and requests being bits in a bitmap.  Simpler,
but arguably not as clean.

> > 
> > The benefit would be that we get to re-use the complicated "figure out
> > the VCPU mode and whether or not we should send an IPI and get the
> > barriers right" stuff.
> 
> I don't think that's necessary.  As long as the complicated stuff
> prevents entering the VCPU, the next run through the loop will
> find that 'vcpu->arch.power_off || vcpu->arch.pause' is true and
> go to sleep.
> 
> >> What I don't understand is:
> >>
> >>>> With this patch, while the vcpu will still initially enter
> >>>> the guest, it will exit immediately due to the IPI sent by the vcpu
> >>>> kick issued after making the vcpu request.
> >>
> >> Isn't this also true of KVM_REQ_VCPU_EXIT that was used before?

As you state below, KVM_REQ_VCPU_EXIT was getting used as a
kick-all-vcpus, but without the request/mode stuff it wasn't
sufficient for the small race window.

> >>
> >> So this:
> >>
> >> +			vcpu->arch.power_off || kvm_request_pending(vcpu)) {
> >> +			WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
> >>
> >> is the crux of the fix, you can keep using vcpu->arch.pause.
> > 
> > Probably; I feel like there's a fix here which should be a separate
> > patch from using a different request instead of the KVM_REQ_VCPU_EXIT +
> > the pause flag.
> 
> Yeah, and then the pause flag can stay.
> 
> >> By the way, vcpu->arch.power_off can go away from this "if" too because
> >> KVM_RUN and KVM_SET_MP_STATE are mutually exclusive through the vcpu mutex.
> > 
> > But we also allow setting the power_off flag from the in-kernel PSCI
> > emulation in the context of another VCPU thread.
> 
> Right.  That code does
> 
>                 tmp->arch.power_off = true;
>                 kvm_vcpu_kick(tmp);
> 
> and I think what's really missing in arm.c is the "if (vcpu->mode ==
> EXITING_GUEST_MODE)" check that is found in x86.c.  Then pausing can
> also simply use kvm_vcpu_kick.
> 
> My understanding is that KVM-ARM is using KVM_REQ_VCPU_EXIT simply to
> reuse the smp_call_function_many code in kvm_make_all_cpus_request.
> Once you add EXITING_GUEST_MODE, ARM can just add a new function
> kvm_kick_all_cpus and use it for both pause and power_off.
> 

I was wondering about the justification of
'if (vcpu->mode == EXITING_GUEST_MODE)' in the x86 code, as it seemed
redundant to me with the requests.  I'll have another think on it to see
whether request-less kicks can be satisfied in all cases by this check, as
long as we ensure the mode-set, barrier, mode-check ordering in vcpu run.

Thanks,
drew

* Re: [PATCH v2 6/9] KVM: arm/arm64: use a vcpu request on irq injection
  2017-04-04 17:42   ` Christoffer Dall
@ 2017-04-04 18:27     ` Andrew Jones
  2017-04-04 18:59     ` Paolo Bonzini
  1 sibling, 0 replies; 85+ messages in thread
From: Andrew Jones @ 2017-04-04 18:27 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Tue, Apr 04, 2017 at 07:42:08PM +0200, Christoffer Dall wrote:
> On Fri, Mar 31, 2017 at 06:06:55PM +0200, Andrew Jones wrote:
> > Don't use request-less VCPU kicks when injecting IRQs, as a VCPU
> > kick meant to trigger the interrupt injection could be sent while
> > the VCPU is outside guest mode, which means no IPI is sent, and
> > after it has called kvm_vgic_flush_hwstate(), meaning it won't see
> > the updated GIC state until its next exit some time later for some
> > other reason.
> > 
> > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > ---
> >  arch/arm/include/asm/kvm_host.h   |  1 +
> >  arch/arm/kvm/arm.c                |  1 +
> >  arch/arm64/include/asm/kvm_host.h |  1 +
> >  virt/kvm/arm/arch_timer.c         |  1 +
> >  virt/kvm/arm/vgic/vgic.c          | 12 ++++++++++--
> >  5 files changed, 14 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > index afed5d44634d..0b8a6d6b3cb3 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -47,6 +47,7 @@
> >  
> >  #define KVM_REQ_PAUSE		8
> >  #define KVM_REQ_POWER_OFF	9
> > +#define KVM_REQ_IRQ_PENDING	10
> >  
> >  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
> >  int __attribute_const__ kvm_target_cpu(void);
> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > index 7ed39060b1cf..a106feccf314 100644
> > --- a/arch/arm/kvm/arm.c
> > +++ b/arch/arm/kvm/arm.c
> > @@ -768,6 +768,7 @@ static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
> >  	 * trigger a world-switch round on the running physical CPU to set the
> >  	 * virtual IRQ/FIQ fields in the HCR appropriately.
> >  	 */
> > +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> >  	kvm_vcpu_kick(vcpu);
> >  
> >  	return 0;
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index e78895f675d0..7057512b3474 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -44,6 +44,7 @@
> >  
> >  #define KVM_REQ_PAUSE		8
> >  #define KVM_REQ_POWER_OFF	9
> > +#define KVM_REQ_IRQ_PENDING	10
> >  
> >  int __attribute_const__ kvm_target_cpu(void);
> >  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> > index 35d7100e0815..3c48abbf951b 100644
> > --- a/virt/kvm/arm/arch_timer.c
> > +++ b/virt/kvm/arm/arch_timer.c
> > @@ -95,6 +95,7 @@ static void kvm_timer_inject_irq_work(struct work_struct *work)
> >  	 * If the vcpu is blocked we want to wake it up so that it will see
> >  	 * the timer has expired when entering the guest.
> >  	 */
> > +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> >  	kvm_vcpu_kick(vcpu);
> >  }
> >  
> > diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> > index 654dfd40e449..31fb89057f0c 100644
> > --- a/virt/kvm/arm/vgic/vgic.c
> > +++ b/virt/kvm/arm/vgic/vgic.c
> > @@ -283,8 +283,10 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
> >  		 * won't see this one until it exits for some other
> >  		 * reason.
> >  		 */
> > -		if (vcpu)
> > +		if (vcpu) {
> > +			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> >  			kvm_vcpu_kick(vcpu);
> > +		}
> >  		return false;
> >  	}
> >  
> > @@ -330,6 +332,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
> >  	spin_unlock(&irq->irq_lock);
> >  	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
> >  
> > +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> >  	kvm_vcpu_kick(vcpu);
> >  
> >  	return true;
> > @@ -654,6 +657,9 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
> >  	spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock);
> >  	vgic_flush_lr_state(vcpu);
> >  	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
> > +
> > +	/* The GIC is now ready to deliver the IRQ. */
> > +	clear_bit(KVM_REQ_IRQ_PENDING, &vcpu->requests);
> 
> this is not going to be called when we don't have the vgic, which means
> that if vcpu_interrupt_line() is used as you modify it above, the
> request will never get cleared.

Ah, thanks. I'll try to sort that out in a better way.  Of course, given
Paolo's comments about request-less kicks working due to the mode
change in kvm_vcpu_kick() (when an additional condition is added
before the vcpu enters guest mode), maybe we don't need this patch,
or the next one, at all.

Thanks,
drew

> 
> >  }
> >  
> >  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
> > @@ -691,8 +697,10 @@ void vgic_kick_vcpus(struct kvm *kvm)
> >  	 * a good kick...
> >  	 */
> >  	kvm_for_each_vcpu(c, vcpu, kvm) {
> > -		if (kvm_vgic_vcpu_pending_irq(vcpu))
> > +		if (kvm_vgic_vcpu_pending_irq(vcpu)) {
> > +			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> >  			kvm_vcpu_kick(vcpu);
> > +		}
> >  	}
> >  }
> >  
> > -- 
> > 2.9.3
> > 
> 
> Thanks,
> -Christoffer

* Re: [PATCH v2 7/9] KVM: arm/arm64: PMU: remove request-less vcpu kick
  2017-04-04 17:46   ` Christoffer Dall
@ 2017-04-04 18:29     ` Andrew Jones
  2017-04-04 19:35       ` Christoffer Dall
  0 siblings, 1 reply; 85+ messages in thread
From: Andrew Jones @ 2017-04-04 18:29 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Tue, Apr 04, 2017 at 07:46:12PM +0200, Christoffer Dall wrote:
> On Fri, Mar 31, 2017 at 06:06:56PM +0200, Andrew Jones wrote:
> > Refactor PMU overflow handling in order to remove the request-less
> > vcpu kick.  Now, since kvm_vgic_inject_irq() uses vcpu requests,
> > there should be no chance that a kick sent at just the wrong time
> > (between the VCPU's call to kvm_pmu_flush_hwstate() and before it
> > enters guest mode) results in a failure for the guest to see updated
> > GIC state until its next exit some time later for some other reason.
> > 
> > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > ---
> >  virt/kvm/arm/pmu.c | 29 +++++++++++++++--------------
> >  1 file changed, 15 insertions(+), 14 deletions(-)
> > 
> > diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> > index 69ccce308458..9d725f3afb11 100644
> > --- a/virt/kvm/arm/pmu.c
> > +++ b/virt/kvm/arm/pmu.c
> > @@ -203,6 +203,19 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
> >  	return reg;
> >  }
> >  
> > +static void kvm_pmu_check_overflow(struct kvm_vcpu *vcpu)
> > +{
> > +	struct kvm_pmu *pmu = &vcpu->arch.pmu;
> > +	bool overflow;
> > +
> > +	overflow = !!kvm_pmu_overflow_status(vcpu);
> > +	if (pmu->irq_level != overflow) {
> > +		pmu->irq_level = overflow;
> > +		kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> > +				    pmu->irq_num, overflow);
> > +	}
> > +}
> > +
> 
> If we are changing the way the PMU works to adjust the interrupt
> signaling whenever the PMU changes its internal state, do we still have
> to call kvm_pmu_update_state() from each flush/sync path now?

The thought crossed my mind to rework that completely, in order to remove
that flush/sync, but I went for the smaller patch for this series.  I
can take a look at it though.

Thanks,
drew

> 
> >  /**
> >   * kvm_pmu_overflow_set - set PMU overflow interrupt
> >   * @vcpu: The vcpu pointer
> > @@ -210,31 +223,19 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
> >   */
> >  void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val)
> >  {
> > -	u64 reg;
> > -
> >  	if (val == 0)
> >  		return;
> >  
> >  	vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= val;
> > -	reg = kvm_pmu_overflow_status(vcpu);
> > -	if (reg != 0)
> > -		kvm_vcpu_kick(vcpu);
> > +	kvm_pmu_check_overflow(vcpu);
> >  }
> >  
> >  static void kvm_pmu_update_state(struct kvm_vcpu *vcpu)
> >  {
> > -	struct kvm_pmu *pmu = &vcpu->arch.pmu;
> > -	bool overflow;
> > -
> >  	if (!kvm_arm_pmu_v3_ready(vcpu))
> >  		return;
> >  
> > -	overflow = !!kvm_pmu_overflow_status(vcpu);
> > -	if (pmu->irq_level != overflow) {
> > -		pmu->irq_level = overflow;
> > -		kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> > -				    pmu->irq_num, overflow);
> > -	}
> > +	kvm_pmu_check_overflow(vcpu);
> >  }
> >  
> >  /**
> > -- 
> > 2.9.3
> > 
> 
> Thanks,
> -Christoffer

* Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  2017-04-04 18:15             ` Paolo Bonzini
@ 2017-04-04 18:38               ` Christoffer Dall
  0 siblings, 0 replies; 85+ messages in thread
From: Christoffer Dall @ 2017-04-04 18:38 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: marc.zyngier, kvmarm, kvm

On Tue, Apr 04, 2017 at 08:15:09PM +0200, Paolo Bonzini wrote:
> 
> 
> On 04/04/2017 19:57, Christoffer Dall wrote:
> >> Right.  That code does
> >>
> >>                 tmp->arch.power_off = true;
> >>                 kvm_vcpu_kick(tmp);
> >>
> >> and I think what's really missing in arm.c is the "if (vcpu->mode ==
> >> EXITING_GUEST_MODE)" check that is found in x86.c.  Then pausing can
> >> also simply use kvm_vcpu_kick.
> > I see, that's why the cmpxchg() works the way it does.  We just still
> > need to move the vcpu->mode = IN_GUEST_MODE before our
> > with-interrupts-disabled check.
> > 
> > What I'm not sure is why you can get away without using a memory barrier
> > or WRITE_ONCE on x86, but is this simply because x86 is a strongly
> > ordered architecture?
> 
> x86 does have a memory barrier:
> 
>         vcpu->mode = IN_GUEST_MODE;
> 
>         srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx);
>         smp_mb__after_srcu_read_unlock();

duh, the long complicated barrier version made me totally miss it.
Sorry.

> 
>         /*
>          * This handles the case where a posted interrupt was
>          * notified with kvm_vcpu_kick.
>          */
>         if (kvm_lapic_enabled(vcpu)) {
>                 if (kvm_x86_ops->sync_pir_to_irr && vcpu->arch.apicv_active)
>                         kvm_x86_ops->sync_pir_to_irr(vcpu);
>         }
> 
>         if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
> 
> and WRITE_ONCE is not needed if you have a memory barrier (though I find it
> more self-documenting to use it anyway).
> 

ok, thanks.

-Christoffer

* Re: [PATCH v2 6/9] KVM: arm/arm64: use a vcpu request on irq injection
  2017-03-31 16:06 ` [PATCH v2 6/9] KVM: arm/arm64: use a vcpu request on irq injection Andrew Jones
  2017-04-04 17:42   ` Christoffer Dall
@ 2017-04-04 18:51   ` Paolo Bonzini
  1 sibling, 0 replies; 85+ messages in thread
From: Paolo Bonzini @ 2017-04-04 18:51 UTC (permalink / raw)
  To: Andrew Jones, kvmarm, kvm; +Cc: marc.zyngier, cdall



On 31/03/2017 18:06, Andrew Jones wrote:
> Don't use request-less VCPU kicks when injecting IRQs, as a VCPU
> kick meant to trigger the interrupt injection could be sent while
> the VCPU is outside guest mode, which means no IPI is sent, and
> after it has called kvm_vgic_flush_hwstate(), meaning it won't see
> the updated GIC state until its next exit some time later for some
> other reason.
> 
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
>  arch/arm/include/asm/kvm_host.h   |  1 +
>  arch/arm/kvm/arm.c                |  1 +
>  arch/arm64/include/asm/kvm_host.h |  1 +
>  virt/kvm/arm/arch_timer.c         |  1 +
>  virt/kvm/arm/vgic/vgic.c          | 12 ++++++++++--
>  5 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index afed5d44634d..0b8a6d6b3cb3 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -47,6 +47,7 @@
>  
>  #define KVM_REQ_PAUSE		8
>  #define KVM_REQ_POWER_OFF	9
> +#define KVM_REQ_IRQ_PENDING	10
>  
>  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
>  int __attribute_const__ kvm_target_cpu(void);
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 7ed39060b1cf..a106feccf314 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -768,6 +768,7 @@ static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
>  	 * trigger a world-switch round on the running physical CPU to set the
>  	 * virtual IRQ/FIQ fields in the HCR appropriately.
>  	 */
> +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>  	kvm_vcpu_kick(vcpu);
>  
>  	return 0;
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index e78895f675d0..7057512b3474 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -44,6 +44,7 @@
>  
>  #define KVM_REQ_PAUSE		8
>  #define KVM_REQ_POWER_OFF	9
> +#define KVM_REQ_IRQ_PENDING	10
>  
>  int __attribute_const__ kvm_target_cpu(void);
>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 35d7100e0815..3c48abbf951b 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -95,6 +95,7 @@ static void kvm_timer_inject_irq_work(struct work_struct *work)
>  	 * If the vcpu is blocked we want to wake it up so that it will see
>  	 * the timer has expired when entering the guest.
>  	 */
> +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>  	kvm_vcpu_kick(vcpu);
>  }
>  
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index 654dfd40e449..31fb89057f0c 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -283,8 +283,10 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
>  		 * won't see this one until it exits for some other
>  		 * reason.
>  		 */
> -		if (vcpu)
> +		if (vcpu) {
> +			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>  			kvm_vcpu_kick(vcpu);
> +		}
>  		return false;
>  	}
>  
> @@ -330,6 +332,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
>  	spin_unlock(&irq->irq_lock);
>  	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
>  
> +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>  	kvm_vcpu_kick(vcpu);
>  
>  	return true;
> @@ -654,6 +657,9 @@ void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
>  	spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock);
>  	vgic_flush_lr_state(vcpu);
>  	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
> +
> +	/* The GIC is now ready to deliver the IRQ. */
> +	clear_bit(KVM_REQ_IRQ_PENDING, &vcpu->requests);
>  }
>  
>  int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
> @@ -691,8 +697,10 @@ void vgic_kick_vcpus(struct kvm *kvm)
>  	 * a good kick...
>  	 */
>  	kvm_for_each_vcpu(c, vcpu, kvm) {
> -		if (kvm_vgic_vcpu_pending_irq(vcpu))
> +		if (kvm_vgic_vcpu_pending_irq(vcpu)) {
> +			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>  			kvm_vcpu_kick(vcpu);
> +		}
>  	}
>  }
>  
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

* Re: [PATCH v2 6/9] KVM: arm/arm64: use a vcpu request on irq injection
  2017-04-04 17:42   ` Christoffer Dall
  2017-04-04 18:27     ` Andrew Jones
@ 2017-04-04 18:59     ` Paolo Bonzini
  1 sibling, 0 replies; 85+ messages in thread
From: Paolo Bonzini @ 2017-04-04 18:59 UTC (permalink / raw)
  To: Christoffer Dall, Andrew Jones; +Cc: marc.zyngier, kvmarm, kvm



On 04/04/2017 19:42, Christoffer Dall wrote:
> 
> this is not going to be called when we don't have the vgic, which means
> that if vcpu_interrupt_line() is used as you modify it above, the
> request will never get cleared.

Heh, I'll stop pretending I can give positive reviews of ARM patches and
keep bash^Wproviding constructive criticism. :)

In fact, I forgot to add that x86 recently moved to request-less
kvm_vcpu_kick when hardware interrupt injection is available.  This was
after I noticed that requests were just used as a workaround for
incorrect ordering of local_irq_disable, vcpu->mode=IN_GUEST_MODE, and
so on.

Paolo

* Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  2017-04-04 18:18           ` Andrew Jones
@ 2017-04-04 18:59             ` Paolo Bonzini
  0 siblings, 0 replies; 85+ messages in thread
From: Paolo Bonzini @ 2017-04-04 18:59 UTC (permalink / raw)
  To: Andrew Jones; +Cc: marc.zyngier, Christoffer Dall, kvmarm, kvm



On 04/04/2017 20:18, Andrew Jones wrote:
>> My understanding is that KVM-ARM is using KVM_REQ_VCPU_EXIT simply to
>> reuse the smp_call_function_many code in kvm_make_all_cpus_request.
>> Once you add EXITING_GUEST_MODE, ARM can just add a new function
>> kvm_kick_all_cpus and use it for both pause and power_off.
>
> I was wondering about the justification of
> 'if (vcpu->mode == EXITING_GUEST_MODE)' in the x86 code, as it seemed
> redundant to me with the requests.  I'll have another think on it to see
> whether request-less kicks can be satisfied in all cases by this check, as
> long as we ensure the mode-set, barrier, mode-check ordering in vcpu run.

Yes, this is the justification.  You should add that to
kvm_arch_vcpu_ioctl_run to close the race window (as well as the
kvm_request_pending, just for good measure).  These two are not really
optional, they are part of how kvm_vcpu_exiting_guest_mode and requests
are supposed to work.  kvm_vcpu_exiting_guest_mode is optional, but ARM
is using it and it's a pity to undo it.

Once you have done this, you can choose whether to use requests or not
for pause and poweroff, but I think it will not be necessary.

Paolo

* Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  2017-04-04 17:57     ` Andrew Jones
@ 2017-04-04 19:04       ` Christoffer Dall
  2017-04-04 20:10         ` Paolo Bonzini
  0 siblings, 1 reply; 85+ messages in thread
From: Christoffer Dall @ 2017-04-04 19:04 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Tue, Apr 04, 2017 at 07:57:18PM +0200, Andrew Jones wrote:
> On Tue, Apr 04, 2017 at 06:04:17PM +0200, Christoffer Dall wrote:
> > On Fri, Mar 31, 2017 at 06:06:53PM +0200, Andrew Jones wrote:
> > > This not only ensures visibility of changes to pause by using
> > > atomic ops, but also plugs a small race where a vcpu could get its
> > > pause state enabled just after its last check before entering the
> > > guest. With this patch, while the vcpu will still initially enter
> > > the guest, it will exit immediately due to the IPI sent by the vcpu
> > > kick issued after making the vcpu request.
> > > 
> > > We use bitops, rather than kvm_make/check_request(), because we
> > > don't need the barriers they provide,
> > 
> > why not?
> 
> I'll add that it's because the only state of interest is the request bit
itself.  When the request is observable we're good to go; there's no need
to ensure that, at the time the request is observable, anything else is too.
> 
> > 
> > > nor do we want the side-effect
> > > of kvm_check_request() clearing the request. For pause, only the
> > > requester should do the clearing.
> > > 
> > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > ---
> > >  arch/arm/include/asm/kvm_host.h   |  5 +----
> > >  arch/arm/kvm/arm.c                | 45 +++++++++++++++++++++++++++------------
> > >  arch/arm64/include/asm/kvm_host.h |  5 +----
> > >  3 files changed, 33 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > > index 31ee468ce667..52c25536d254 100644
> > > --- a/arch/arm/include/asm/kvm_host.h
> > > +++ b/arch/arm/include/asm/kvm_host.h
> > > @@ -45,7 +45,7 @@
> > >  #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
> > >  #endif
> > >  
> > > -#define KVM_REQ_VCPU_EXIT	8
> > > +#define KVM_REQ_PAUSE		8
> > >  
> > >  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
> > >  int __attribute_const__ kvm_target_cpu(void);
> > > @@ -173,9 +173,6 @@ struct kvm_vcpu_arch {
> > >  	/* vcpu power-off state */
> > >  	bool power_off;
> > >  
> > > -	 /* Don't run the guest (internal implementation need) */
> > > -	bool pause;
> > > -
> > >  	/* IO related fields */
> > >  	struct kvm_decode mmio_decode;
> > >  
> > > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > > index 314eb6abe1ff..f3bfbb5f3d96 100644
> > > --- a/arch/arm/kvm/arm.c
> > > +++ b/arch/arm/kvm/arm.c
> > > @@ -94,6 +94,18 @@ struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void)
> > >  
> > >  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
> > >  {
> > > +	/*
> > > +	 * If we return true from this function, then it means the vcpu is
> > > +	 * either in guest mode, or has already indicated that it's in guest
> > > +	 * mode. The indication is done by setting ->mode to IN_GUEST_MODE,
> > > +	 * and must be done before the final kvm_request_pending() read. It's
> > > +	 * important that the observability of that order be enforced and that
> > > +	 * the request receiving CPU can observe any new request before the
> > > +	 * requester issues a kick. Thus, the general barrier below pairs with
> > > +	 * the general barrier in kvm_arch_vcpu_ioctl_run() which divides the
> > > +	 * write to ->mode and the final request pending read.
> > > +	 */
> > 
> > I am having a hard time understanding this comment.  For example, I
> > don't understand the difference between 'is either in guest mode or has
> > already indicated it's in guest mode'.  Which case is which again, and
> > how are we checking for two cases below?
> > 
> > Also, the stuff about observability of an order is hard to follow, and
> > the comment assumes the reader is thinking about the specific race when
> > entering the guest.
> > 
> > I think we should focus on getting the documentation in place, refer to
> > the documentation from here, and be much more brief and say something
> > like:
> > 
> > 	/*
> > 	 * The memory barrier below pairs with the barrier in
> > 	 * kvm_arch_vcpu_ioctl_run() between writes to vcpu->mode
> > 	 * and reading vcpu->requests before entering the guest.
> > 	 *
> > 	 * Ensures that the VCPU thread's CPU can observe changes to
> > 	 * vcpu->requests written prior to calling this function before
> > 	 * it writes vcpu->mode = IN_GUEST_MODE, and correspondingly
> > 	 * ensures that this CPU observes vcpu->mode == IN_GUEST_MODE
> > 	 * only if the VCPU thread's CPU could observe writes to
> > 	 * vcpu->requests from this CPU.
> > 	 /
> > 
> > Is this correct?  I'm not really sure anymore?
> 
> It's confusing because we have cross dependencies on the negatives of
> two conditions.
> 
> Here are the cross dependencies:
> 
>   vcpu->mode = IN_GUEST_MODE;   ---   ---  kvm_make_request(REQ, vcpu);
>   smp_mb();                        \ /     smp_mb();
>                                     X
>                                    / \
>   if (kvm_request_pending(vcpu))<--   -->  if (vcpu->mode == IN_GUEST_MODE)
> 
> On each side the smp_mb() ensures no reordering of the pair of operations
> that each side has.  I.e. on the LHS the requests LOAD cannot be ordered
> before the mode STORE and on the RHS side the mode LOAD cannot be ordered
> before the requests STORE.  This is why they must be general barriers.
> 
> Now, for extra fun, the cross dependencies arise because we care about
> the cases when we *don't* observe the respective dependency.
> 
> Condition 1:
> 
>   The final requests check in vcpu run, if (kvm_request_pending(vcpu))
> 
>   What we really care about though is !kvm_request_pending(vcpu).  When
>   we observe !kvm_request_pending(vcpu) we know we're safe to enter the
>   guest.  We know that any thread in the process of making a request has
>   yet to check 'if (vcpu->mode == IN_GUEST_MODE)', so if it was just about
>   to set a request, then it doesn't matter, as it will observe mode ==
>   IN_GUEST_MODE afterwards (thanks to the paired smp_mb()) and send the
>   IPI.
> 
> Condition 2:
> 
>   The kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE check we do
>   here in this function, kvm_arch_vcpu_should_kick()
> 
>   What we really care about is (vcpu->mode != IN_GUEST_MODE).  When
>   we observe (vcpu->mode != IN_GUEST_MODE) we know we're safe to not
>   send the IPI.  We're safe because, by not observing IN_GUEST_MODE,
>   we know the VCPU thread has yet to do its final requests check,
>   since, thanks to the paired smp_mb(), we know that order must be
>   enforced.
> 
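The pairing Drew describes above can be sketched in a simplified user-space model (the names and the C11-atomics mapping are illustrative, not the kernel code; `atomic_thread_fence(memory_order_seq_cst)` stands in for smp_mb()):

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { OUTSIDE_GUEST_MODE, IN_GUEST_MODE };

struct vcpu_model {
	_Atomic int mode;
	_Atomic unsigned long requests;
};

/* VCPU thread side: write mode, full barrier, then read requests. */
static bool entry_allowed(struct vcpu_model *v)
{
	atomic_store_explicit(&v->mode, IN_GUEST_MODE, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* the smp_mb() in vcpu run */
	return atomic_load_explicit(&v->requests, memory_order_relaxed) == 0;
}

/* Requester side: write request, full barrier, then read mode. */
static bool needs_kick(struct vcpu_model *v, int req)
{
	atomic_fetch_or_explicit(&v->requests, 1UL << req, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* the smp_mb() in should_kick */
	return atomic_load_explicit(&v->mode, memory_order_relaxed) == IN_GUEST_MODE;
}
```

With both fences in place at least one side must observe the other: a requester that reads mode != IN_GUEST_MODE may safely skip the IPI, because the VCPU thread's later requests read is then guaranteed to see the pending request.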

This feels convincing, but I have a few concerns (which may be mostly
because I'm getting tired, but here goes, for the record):

 - We don't just care about != IN_GUEST_MODE;
   kvm_make_all_cpus_request() checks for !OUTSIDE_GUEST_MODE, but I
   don't think this changes what you said above.

 - (On a related note, it suddenly struck me as weird that
   kvm_make_all_cpus_request() doesn't wake up sleeping VCPUs, but only
   sends an IPI; does this mean that calling this function should be
   followed by a kick() for each VCPU?  Maybe Radim was looking at this
   in his series already.)

 - In the explanation you wrote, you use the term 'we' a lot, but when
   talking about SMP barriers, I think it only makes sense to talk about
   actions and observations between multiple CPUs and we have to be
   specific about which CPU observes or does what with respect to the
   other.  Maybe I'm being a stickler here, but there's something here
   which is making me uneasy.

 - Finally, it feels very hard to prove the correctness of this, and
   equally hard to test it (given how long we've been running with
   apparently racy code).  I would hope that we could abstract some of
   this into architecture-generic things that someone who eats memory
   barriers for breakfast could help us verify, but again, maybe this is
   Radim's series I'm asking for here.

> 
> I'll try to merge what I originally wrote, with your suggestion, and
> some of what I just wrote now.  But, also like you suggest, I'll put
> the bulk of it in the document and then just reference it.

Maybe that will solve all my concerns.

> > 
> > There's also the obvious fact that we're adding this memory barrier
> > inside a function that checks if we should kick a vcpu, and there's no
> > documentation that says that this is always called in association with
> > setting a request, is there?
> 
> You're right, there's nothing forcing this.  Just the undocumented
> "kvm_vcpu_kick() is needed after a request" pattern.  I can try to add
> something to the doc to highlight the importance of kvm_vcpu_kick(),
> which calls kvm_arch_vcpu_should_kick() and is therefore a fairly
> safe place to put an explicit barrier if the architecture requires one.
> 

Or we could add it around the prototype for kvm_make_all_cpus_request()
or in the caller, just that it requires that implementations of the
functions include a barrier.

> > 
> > Finally, I don't understand why this would be a requirement only on ARM?
> 
> At least x86's cmpxchg() always produces the equivalent of a general
> memory barrier before and after the exchange, not just on success, like
> ARM.
> 

ok, I see.

> > 
> > > +	smp_mb();
> > >  	return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
> > >  }
> > >  
> > > @@ -404,7 +416,8 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
> > >  int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
> > >  {
> > >  	return ((!!v->arch.irq_lines || kvm_vgic_vcpu_pending_irq(v))
> > > -		&& !v->arch.power_off && !v->arch.pause);
> > > +		&& !v->arch.power_off
> > > +		&& !test_bit(KVM_REQ_PAUSE, &v->requests));
> > >  }
> > >  
> > >  /* Just ensure a guest exit from a particular CPU */
> > > @@ -535,17 +548,12 @@ bool kvm_arch_intc_initialized(struct kvm *kvm)
> > >  
> > >  void kvm_arm_halt_guest(struct kvm *kvm)
> > >  {
> > > -	int i;
> > > -	struct kvm_vcpu *vcpu;
> > > -
> > > -	kvm_for_each_vcpu(i, vcpu, kvm)
> > > -		vcpu->arch.pause = true;
> > > -	kvm_make_all_cpus_request(kvm, KVM_REQ_VCPU_EXIT);
> > > +	kvm_make_all_cpus_request(kvm, KVM_REQ_PAUSE);
> > >  }
> > >  
> > >  void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu)
> > >  {
> > > -	vcpu->arch.pause = true;
> > > +	set_bit(KVM_REQ_PAUSE, &vcpu->requests);
> > >  	kvm_vcpu_kick(vcpu);
> > >  }
> > >  
> > > @@ -553,7 +561,7 @@ void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu)
> > >  {
> > >  	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
> > >  
> > > -	vcpu->arch.pause = false;
> > > +	clear_bit(KVM_REQ_PAUSE, &vcpu->requests);
> > >  	swake_up(wq);
> > >  }
> > >  
> > > @@ -571,7 +579,7 @@ static void vcpu_sleep(struct kvm_vcpu *vcpu)
> > >  	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
> > >  
> > >  	swait_event_interruptible(*wq, ((!vcpu->arch.power_off) &&
> > > -				       (!vcpu->arch.pause)));
> > > +		(!test_bit(KVM_REQ_PAUSE, &vcpu->requests))));
> > >  }
> > >  
> > >  static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
> > > @@ -624,7 +632,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > >  
> > >  		update_vttbr(vcpu->kvm);
> > >  
> > > -		if (vcpu->arch.power_off || vcpu->arch.pause)
> > > +		if (vcpu->arch.power_off || test_bit(KVM_REQ_PAUSE, &vcpu->requests))
> > >  			vcpu_sleep(vcpu);
> > >  
> > >  		/*
> > > @@ -647,8 +655,18 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > >  			run->exit_reason = KVM_EXIT_INTR;
> > >  		}
> > >  
> > > +		/*
> > > +		 * Indicate we're in guest mode now, before doing a final
> > > +		 * check for pending vcpu requests. The general barrier
> > > +		 * pairs with the one in kvm_arch_vcpu_should_kick().
> > > +		 * Please see the comment there for more details.
> > > +		 */
> > > +		WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);
> > > +		smp_mb();
> > 
> > There are two changes here:
> > 
> > there's a change from a normal write to a WRITE_ONCE and there's also a
> > change to that adds a memory barrier.  I feel like I'd like to know if
> > these are tied together or two separate cleanups.  I also wonder if we
> > could split out more general changes from the pause thing to have a
> > better log of why we changed the run loop?
> > 
> > It looks to me like there could be a separate patch that encapsulated
> > the reads and writes of vcpu->mode into a function that does the
> > WRITE_ONCE and READ_ONCE with a nice comment.
> 
> The thought crossed my mind as well, I guess I should have followed that
> thought through.  Will do.
> 

Cool.

> > 
> > > +
> > >  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
> > > -			vcpu->arch.power_off || vcpu->arch.pause) {
> > > +			vcpu->arch.power_off || kvm_request_pending(vcpu)) {
> > > +			WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
> > >  			local_irq_enable();
> > >  			kvm_pmu_sync_hwstate(vcpu);
> > >  			kvm_timer_sync_hwstate(vcpu);
> > > @@ -664,11 +682,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > >  		 */
> > >  		trace_kvm_entry(*vcpu_pc(vcpu));
> > >  		guest_enter_irqoff();
> > > -		vcpu->mode = IN_GUEST_MODE;
> > >  
> > >  		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
> > >  
> > > -		vcpu->mode = OUTSIDE_GUEST_MODE;
> > > +		WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
> > >  		vcpu->stat.exits++;
> > >  		/*
> > >  		 * Back from guest
> > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > index e7705e7bb07b..6e1271a77e92 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -42,7 +42,7 @@
> > >  
> > >  #define KVM_VCPU_MAX_FEATURES 4
> > >  
> > > -#define KVM_REQ_VCPU_EXIT	8
> > > +#define KVM_REQ_PAUSE		8
> > >  
> > >  int __attribute_const__ kvm_target_cpu(void);
> > >  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> > > @@ -256,9 +256,6 @@ struct kvm_vcpu_arch {
> > >  	/* vcpu power-off state */
> > >  	bool power_off;
> > >  
> > > -	/* Don't run the guest (internal implementation need) */
> > > -	bool pause;
> > > -
> > >  	/* IO related fields */
> > >  	struct kvm_decode mmio_decode;
> > >  
> > > -- 
> > > 2.9.3
> > 

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 7/9] KVM: arm/arm64: PMU: remove request-less vcpu kick
  2017-04-04 18:29     ` Andrew Jones
@ 2017-04-04 19:35       ` Christoffer Dall
  0 siblings, 0 replies; 85+ messages in thread
From: Christoffer Dall @ 2017-04-04 19:35 UTC (permalink / raw)
  To: Andrew Jones; +Cc: marc.zyngier, pbonzini, kvmarm, kvm

On Tue, Apr 04, 2017 at 08:29:18PM +0200, Andrew Jones wrote:
> On Tue, Apr 04, 2017 at 07:46:12PM +0200, Christoffer Dall wrote:
> > On Fri, Mar 31, 2017 at 06:06:56PM +0200, Andrew Jones wrote:
> > > Refactor PMU overflow handling in order to remove the request-less
> > > vcpu kick.  Now, since kvm_vgic_inject_irq() uses vcpu requests,
> > > there should be no chance that a kick sent at just the wrong time
> > > (between the VCPU's call to kvm_pmu_flush_hwstate() and before it
> > > enters guest mode) results in a failure for the guest to see updated
> > > GIC state until its next exit some time later for some other reason.
> > > 
> > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > ---
> > >  virt/kvm/arm/pmu.c | 29 +++++++++++++++--------------
> > >  1 file changed, 15 insertions(+), 14 deletions(-)
> > > 
> > > diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> > > index 69ccce308458..9d725f3afb11 100644
> > > --- a/virt/kvm/arm/pmu.c
> > > +++ b/virt/kvm/arm/pmu.c
> > > @@ -203,6 +203,19 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
> > >  	return reg;
> > >  }
> > >  
> > > +static void kvm_pmu_check_overflow(struct kvm_vcpu *vcpu)
> > > +{
> > > +	struct kvm_pmu *pmu = &vcpu->arch.pmu;
> > > +	bool overflow;
> > > +
> > > +	overflow = !!kvm_pmu_overflow_status(vcpu);
> > > +	if (pmu->irq_level != overflow) {
> > > +		pmu->irq_level = overflow;
> > > +		kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> > > +				    pmu->irq_num, overflow);
> > > +	}
> > > +}
> > > +
> > 
> > If we are changing the way the PMU works to adjust the interrupt
> > signaling whenever the PMU changes its internal state, do we still ahv
> > to call kvm_pmu_update_state() from each flush/sync path now?
> 
> The thought crossed my mind to rework that completely, in order to remove
> that flush/sync, but then I went for the smaller patch for this series.  I
> can take a look at it though.
> 

Actually, now that I read what this code will be doing, the extra
check in flush/sync won't do any work, because it will find that
pmu->irq_level == overflow, so never mind - we can improve that later as
an optimization patch.

Let's focus on getting it right.

Thanks,
-Christoffer
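For reference, the no-op behavior noted above can be sketched in a simplified model (the struct and injection counter are illustrative stand-ins, not kernel code):

```c
#include <stdbool.h>

/* Simplified model of kvm_pmu_check_overflow(): the interrupt line is
 * only toggled when the computed overflow state differs from the cached
 * irq_level, so a redundant call from flush/sync does no work. */

struct pmu_model {
	bool irq_level;
	bool overflow_status;	/* stands in for kvm_pmu_overflow_status() != 0 */
	int injections;		/* counts the modeled kvm_vgic_inject_irq() calls */
};

static void pmu_check_overflow(struct pmu_model *pmu)
{
	bool overflow = pmu->overflow_status;

	if (pmu->irq_level != overflow) {
		pmu->irq_level = overflow;
		pmu->injections++;	/* models kvm_vgic_inject_irq() */
	}
}
```

Calling it twice in a row with unchanged overflow state changes nothing on the second call, which is why the extra flush/sync invocation is harmless.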


> > 
> > >  /**
> > >   * kvm_pmu_overflow_set - set PMU overflow interrupt
> > >   * @vcpu: The vcpu pointer
> > > @@ -210,31 +223,19 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
> > >   */
> > >  void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val)
> > >  {
> > > -	u64 reg;
> > > -
> > >  	if (val == 0)
> > >  		return;
> > >  
> > >  	vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= val;
> > > -	reg = kvm_pmu_overflow_status(vcpu);
> > > -	if (reg != 0)
> > > -		kvm_vcpu_kick(vcpu);
> > > +	kvm_pmu_check_overflow(vcpu);
> > >  }
> > >  
> > >  static void kvm_pmu_update_state(struct kvm_vcpu *vcpu)
> > >  {
> > > -	struct kvm_pmu *pmu = &vcpu->arch.pmu;
> > > -	bool overflow;
> > > -
> > >  	if (!kvm_arm_pmu_v3_ready(vcpu))
> > >  		return;
> > >  
> > > -	overflow = !!kvm_pmu_overflow_status(vcpu);
> > > -	if (pmu->irq_level != overflow) {
> > > -		pmu->irq_level = overflow;
> > > -		kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> > > -				    pmu->irq_num, overflow);
> > > -	}
> > > +	kvm_pmu_check_overflow(vcpu);
> > >  }
> > >  
> > >  /**
> > > -- 
> > > 2.9.3
> > > 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 8/9] KVM: arm/arm64: fix race in kvm_psci_vcpu_on
  2017-03-31 16:06 ` [PATCH v2 8/9] KVM: arm/arm64: fix race in kvm_psci_vcpu_on Andrew Jones
@ 2017-04-04 19:42   ` Christoffer Dall
  2017-04-05  8:35     ` Andrew Jones
  0 siblings, 1 reply; 85+ messages in thread
From: Christoffer Dall @ 2017-04-04 19:42 UTC (permalink / raw)
  To: Andrew Jones; +Cc: Levente Kurusa, kvm, marc.zyngier, pbonzini, kvmarm

On Fri, Mar 31, 2017 at 06:06:57PM +0200, Andrew Jones wrote:
> From: Levente Kurusa <lkurusa@redhat.com>
> 
> When two vcpus issue PSCI_CPU_ON on the same core at the same time,
> then it's possible for them to both enter the target vcpu's setup
> at the same time. This results in unexpected behaviors at best,
> and the potential for some nasty bugs at worst.
> 
> Signed-off-by: Levente Kurusa <lkurusa@redhat.com>
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
>  arch/arm/kvm/psci.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> index f732484abc7a..0204daa899b1 100644
> --- a/arch/arm/kvm/psci.c
> +++ b/arch/arm/kvm/psci.c
> @@ -88,7 +88,8 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
>  	 */
>  	if (!vcpu)
>  		return PSCI_RET_INVALID_PARAMS;
> -	if (!test_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
> +
> +	if (!test_and_clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
>  		if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
>  			return PSCI_RET_ALREADY_ON;
>  		else
> @@ -116,7 +117,6 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
>  	 * the general puspose registers are undefined upon CPU_ON.
>  	 */
>  	vcpu_set_reg(vcpu, 0, context_id);
> -	clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
>  
>  	wq = kvm_arch_vcpu_wq(vcpu);
>  	swake_up(wq);
> -- 
> 2.9.3
> 

Depending on what you end up doing with the requests, if you keep the
bool flag you could just use the kvm->lock mutex instead.

Have you considered if there are any potential races between
kvm_psci_system_off() being called on one VCPU while two other VCPUs are
turning on the same CPU that is being turned off as part of a system-wide
power down as well?

I'm wondering if this means we should take the kvm->lock at a higher
level when handling PSCI events...

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 9/9] KVM: arm/arm64: avoid race by caching MPIDR
  2017-03-31 16:06 ` [PATCH v2 9/9] KVM: arm/arm64: avoid race by caching MPIDR Andrew Jones
@ 2017-04-04 19:44   ` Christoffer Dall
  2017-04-05  8:50     ` Andrew Jones
  0 siblings, 1 reply; 85+ messages in thread
From: Christoffer Dall @ 2017-04-04 19:44 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Fri, Mar 31, 2017 at 06:06:58PM +0200, Andrew Jones wrote:
> Cache the MPIDR in the vcpu structure to fix potential races that
> can arise between vcpu reset and the extraction of the MPIDR from
> the sys-reg array.

I don't understand the race, sorry.

Can you be more specific in where this goes wrong and exactly what this
fixes?

Thanks,
-Christoffer

> 
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
>  arch/arm/include/asm/kvm_emulate.h   |  2 +-
>  arch/arm/include/asm/kvm_host.h      |  3 +++
>  arch/arm/kvm/coproc.c                | 20 ++++++++++++--------
>  arch/arm64/include/asm/kvm_emulate.h |  2 +-
>  arch/arm64/include/asm/kvm_host.h    |  3 +++
>  arch/arm64/kvm/sys_regs.c            | 27 ++++++++++++++-------------
>  6 files changed, 34 insertions(+), 23 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> index 9a8a45aaf19a..1b922de46785 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -213,7 +213,7 @@ static inline u32 kvm_vcpu_hvc_get_imm(struct kvm_vcpu *vcpu)
>  
>  static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
>  {
> -	return vcpu_cp15(vcpu, c0_MPIDR) & MPIDR_HWID_BITMASK;
> +	return vcpu->arch.vmpidr & MPIDR_HWID_BITMASK;
>  }
>  
>  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 0b8a6d6b3cb3..e0f461f0af67 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -151,6 +151,9 @@ struct kvm_vcpu_arch {
>  	/* The CPU type we expose to the VM */
>  	u32 midr;
>  
> +	/* vcpu MPIDR */
> +	u32 vmpidr;
> +
>  	/* HYP trapping configuration */
>  	u32 hcr;
>  
> diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
> index 3e5e4194ef86..c4df7c9c8ddb 100644
> --- a/arch/arm/kvm/coproc.c
> +++ b/arch/arm/kvm/coproc.c
> @@ -101,14 +101,18 @@ int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  static void reset_mpidr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
>  {
> -	/*
> -	 * Compute guest MPIDR. We build a virtual cluster out of the
> -	 * vcpu_id, but we read the 'U' bit from the underlying
> -	 * hardware directly.
> -	 */
> -	vcpu_cp15(vcpu, c0_MPIDR) = ((read_cpuid_mpidr() & MPIDR_SMP_BITMASK) |
> -				     ((vcpu->vcpu_id >> 2) << MPIDR_LEVEL_BITS) |
> -				     (vcpu->vcpu_id & 3));
> +	if (!vcpu->arch.vmpidr) {
> +		/*
> +		 * Compute guest MPIDR. We build a virtual cluster out of the
> +		 * vcpu_id, but we read the 'U' bit from the underlying
> +		 * hardware directly.
> +		 */
> +		u32 mpidr = ((read_cpuid_mpidr() & MPIDR_SMP_BITMASK) |
> +			     ((vcpu->vcpu_id >> 2) << MPIDR_LEVEL_BITS) |
> +			     (vcpu->vcpu_id & 3));
> +		vcpu->arch.vmpidr = mpidr;
> +	}
> +	vcpu_cp15(vcpu, c0_MPIDR) = vcpu->arch.vmpidr;
>  }
>  
>  /* TRM entries A7:4.3.31 A15:4.3.28 - RO WI */
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index f5ea0ba70f07..c138bb15b507 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -242,7 +242,7 @@ static inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vcpu)
>  
>  static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
>  {
> -	return vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
> +	return vcpu->arch.vmpidr_el2 & MPIDR_HWID_BITMASK;
>  }
>  
>  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 7057512b3474..268c10d95a79 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -198,6 +198,9 @@ typedef struct kvm_cpu_context kvm_cpu_context_t;
>  struct kvm_vcpu_arch {
>  	struct kvm_cpu_context ctxt;
>  
> +	/* vcpu MPIDR */
> +	u64 vmpidr_el2;
> +
>  	/* HYP configuration */
>  	u64 hcr_el2;
>  	u32 mdcr_el2;
> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> index 0e26f8c2b56f..517aed6d8016 100644
> --- a/arch/arm64/kvm/sys_regs.c
> +++ b/arch/arm64/kvm/sys_regs.c
> @@ -431,19 +431,20 @@ static void reset_amair_el1(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
>  
>  static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
>  {
> -	u64 mpidr;
> -
> -	/*
> -	 * Map the vcpu_id into the first three affinity level fields of
> -	 * the MPIDR. We limit the number of VCPUs in level 0 due to a
> -	 * limitation to 16 CPUs in that level in the ICC_SGIxR registers
> -	 * of the GICv3 to be able to address each CPU directly when
> -	 * sending IPIs.
> -	 */
> -	mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
> -	mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
> -	mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
> -	vcpu_sys_reg(vcpu, MPIDR_EL1) = (1ULL << 31) | mpidr;
> +	if (!vcpu->arch.vmpidr_el2) {
> +		/*
> +		 * Map the vcpu_id into the first three affinity level fields
> +		 * of the MPIDR. We limit the number of VCPUs in level 0 due to
> +		 * a limitation of 16 CPUs in that level in the ICC_SGIxR
> +		 * registers of the GICv3, which are used to address each CPU
> +		 * directly when sending IPIs.
> +		 */
> +		u64 mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
> +		mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
> +		mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
> +		vcpu->arch.vmpidr_el2 = (1ULL << 31) | mpidr;
> +	}
> +	vcpu_sys_reg(vcpu, MPIDR_EL1) = vcpu->arch.vmpidr_el2;
>  }
>  
>  static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> -- 
> 2.9.3
> 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  2017-04-04 19:04       ` Christoffer Dall
@ 2017-04-04 20:10         ` Paolo Bonzini
  2017-04-05  7:09           ` Christoffer Dall
  0 siblings, 1 reply; 85+ messages in thread
From: Paolo Bonzini @ 2017-04-04 20:10 UTC (permalink / raw)
  To: Christoffer Dall, Andrew Jones; +Cc: marc.zyngier, kvmarm, kvm



On 04/04/2017 21:04, Christoffer Dall wrote:
>  - (On a related note, it suddenly struck me as weird that
>    kvm_make_all_cpus_request() doesn't wake up sleeping VCPUs, but only
>    sends an IPI; does this mean that calling this function should be
>    followed by a kick() for each VCPU?  Maybe Radim was looking at this
>    in his series already.)

Yes, kvm_make_all_cpus_request in x86 is only used for "non urgent"
requests, i.e. things to do before the next guest entry.

>  - In the explanation you wrote, you use the term 'we' a lot, but when
>    talking about SMP barriers, I think it only makes sense to talk about
>    actions and observations between multiple CPUs and we have to be
>    specific about which CPU observes or does what with respect to the
>    other.  Maybe I'm being a stickler here, but there's something here
>    which is making me uneasy.

The write1-mb-if(read2) / write2-mb-if(read1) pattern is pretty common,
so I think it is justified to cut the reasoning on the ordering and just
focus on what the two memory locations and conditions mean.  But I'd
wait for v3, since I'm sure that Drew also understands the
synchronization better.

>  - Finally, it feels very hard to prove the correctness of this, and
>    equally hard to test it (given how long we've been running with
>    apparently racy code).  I would hope that we could abstract some of
>    this into architecture generic things, that someone who eat memory
>    barriers for breakfast could help us verify, but again, maybe this is
>    Radim's series I'm asking for here.

What I can do here is to suggest copying the paradigms from x86, which
is quite battle tested (Windows hammers it really hard).

For QEMU I did use model checking in the past for some similarly hairy
synchronization code, but that is really just "executable documentation"
because the model is not written in C.

Paolo

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  2017-04-04 20:10         ` Paolo Bonzini
@ 2017-04-05  7:09           ` Christoffer Dall
  2017-04-05 11:37             ` Paolo Bonzini
  0 siblings, 1 reply; 85+ messages in thread
From: Christoffer Dall @ 2017-04-05  7:09 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Andrew Jones, kvmarm, kvm, marc.zyngier, rkrcmar

On Tue, Apr 04, 2017 at 10:10:15PM +0200, Paolo Bonzini wrote:
> 
> 
> On 04/04/2017 21:04, Christoffer Dall wrote:
> >  - (On a related note, it suddenly struck me as weird that
> >    kvm_make_all_cpus_request() doesn't wake up sleeping VCPUs, but only
> >    sends an IPI; does this mean that calling this function should be
> >    followed by a kick() for each VCPU?  Maybe Radim was looking at this
> >    in his series already.)
> 
> Yes, kvm_make_all_cpus_request in x86 is only used for "non urgent"
> requests, i.e. things to do before the next guest entry.
> 

Ah, another good thing to document somewhere.

> >  - In the explanation you wrote, you use the term 'we' a lot, but when
> >    talking about SMP barriers, I think it only makes sense to talk about
> >    actions and observations between multiple CPUs and we have to be
> >    specific about which CPU observes or does what with respect to the
> >    other.  Maybe I'm being a stickler here, but there's something here
> >    which is making me uneasy.
> 
> The write1-mb-if(read2) / write2-mb-if(read1) pattern is pretty common,
> so I think it is justified to cut the reasoning on the ordering and just
> focus on what the two memory locations and conditions mean.

ok, but the pattern above was not familiar to me (and I'm pretty sure I'm
not the only fool in the bunch here), so if we can reference something
that explains that this is a known pattern which has been tried and
proven, that would be even better.

> But I'd
> wait for v3, since I'm sure that Drew also understands the
> synchronization better.
> 

Yes, I'm confident that v3 will be great ;)

> >  - Finally, it feels very hard to prove the correctness of this, and
> >    equally hard to test it (given how long we've been running with
> >    apparently racy code).  I would hope that we could abstract some of
> >    this into architecture-generic things that someone who eats memory
> >    barriers for breakfast could help us verify, but again, maybe this is
> >    Radim's series I'm asking for here.
> 
> What I can do here is to suggest copying the paradigms from x86, which
> is quite battle tested (Windows hammers it really hard).

That sounds reasonable, but I think part of the problem was that we
simply didn't understand what the paradigms were (see the
kvm_make_all_cpus_request above as an example), so Drew's action about
documenting what this all is and the constraints of using it is really
important for me to do that.

> 
> For QEMU I did use model checking in the past for some similarly hairy
> synchronization code, but that is really just "executable documentation"
> because the model is not written in C.
> 

I played with using blast on some of the KVM/ARM code a long time ago,
and while I was able to find a bug with it, it was sort of an obvious
bug, and the things I was able to do with it was pretty limited to the
problems I could imagine myself anyhow.  Perhaps this is what you mean
by "executable documentation".  In any case, I feel it starts with
documentation.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 8/9] KVM: arm/arm64: fix race in kvm_psci_vcpu_on
  2017-04-04 19:42   ` Christoffer Dall
@ 2017-04-05  8:35     ` Andrew Jones
  2017-04-05  8:50       ` Christoffer Dall
  0 siblings, 1 reply; 85+ messages in thread
From: Andrew Jones @ 2017-04-05  8:35 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Levente Kurusa, kvm, marc.zyngier, pbonzini, kvmarm

On Tue, Apr 04, 2017 at 09:42:08PM +0200, Christoffer Dall wrote:
> On Fri, Mar 31, 2017 at 06:06:57PM +0200, Andrew Jones wrote:
> > From: Levente Kurusa <lkurusa@redhat.com>
> > 
> > When two vcpus issue PSCI_CPU_ON on the same core at the same time,
> > then it's possible for them to both enter the target vcpu's setup
> > at the same time. This results in unexpected behaviors at best,
> > and the potential for some nasty bugs at worst.
> > 
> > Signed-off-by: Levente Kurusa <lkurusa@redhat.com>
> > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > ---
> >  arch/arm/kvm/psci.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> > index f732484abc7a..0204daa899b1 100644
> > --- a/arch/arm/kvm/psci.c
> > +++ b/arch/arm/kvm/psci.c
> > @@ -88,7 +88,8 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
> >  	 */
> >  	if (!vcpu)
> >  		return PSCI_RET_INVALID_PARAMS;
> > -	if (!test_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
> > +
> > +	if (!test_and_clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
> >  		if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
> >  			return PSCI_RET_ALREADY_ON;
> >  		else
> > @@ -116,7 +117,6 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
> >  	 * the general puspose registers are undefined upon CPU_ON.
> >  	 */
> >  	vcpu_set_reg(vcpu, 0, context_id);
> > -	clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
> >  
> >  	wq = kvm_arch_vcpu_wq(vcpu);
> >  	swake_up(wq);
> > -- 
> > 2.9.3
> > 
> 
> Depending on what you end up doing with the requests, if you keep the
> bool flag you could just use the kvm->lock mutex instead.
> 
> Have you considered if there are any potential races between
> kvm_psci_system_off() being called on one VCPU while two other VCPUs are
> turning on the same CPU that is being turned off as part of system-wide
> power down as well?

Sounds like a nice unit test.  I haven't considered it, but I guess
the kvm_psci_system_off/reset calling VCPU will ultimately "win", as
it'll cause an exit to userspace that initiates a shutdown/reset.
When the VCPUs are restarted then vcpu init should reset the power_off
state correctly.  As long as the race this patch addresses is fixed, then
I'm not sure there should be any risk with the actual system_off/reset
being delayed wrt a vcpu being "on'ed" again, nor with there being more
than one VCPU trying to "on" it at the same time.

> 
> I'm wondering if this means we should take the kvm->lock at a higher
> level when handling PSCI events...

That would simplify our analysis of the PSCI emulation, but I'm not
sure we want to give a guest the power to constantly acquire that
mutex with a barrage of PSCI calls.  Maybe we should create a PSCI
mutex?  In order to avoid holding it too long we may want power_off to
be more than a boolean though, i.e. the PENDING state might also be
a good idea to represent.

Thanks,
drew

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 8/9] KVM: arm/arm64: fix race in kvm_psci_vcpu_on
  2017-04-05  8:35     ` Andrew Jones
@ 2017-04-05  8:50       ` Christoffer Dall
  2017-04-05  9:12         ` Andrew Jones
  0 siblings, 1 reply; 85+ messages in thread
From: Christoffer Dall @ 2017-04-05  8:50 UTC (permalink / raw)
  To: Andrew Jones; +Cc: Levente Kurusa, kvm, marc.zyngier, pbonzini, kvmarm

On Wed, Apr 05, 2017 at 10:35:59AM +0200, Andrew Jones wrote:
> On Tue, Apr 04, 2017 at 09:42:08PM +0200, Christoffer Dall wrote:
> > On Fri, Mar 31, 2017 at 06:06:57PM +0200, Andrew Jones wrote:
> > > From: Levente Kurusa <lkurusa@redhat.com>
> > > 
> > > When two vcpus issue PSCI_CPU_ON on the same core at the same time,
> > > then it's possible for them to both enter the target vcpu's setup
> > > at the same time. This results in unexpected behaviors at best,
> > > and the potential for some nasty bugs at worst.
> > > 
> > > Signed-off-by: Levente Kurusa <lkurusa@redhat.com>
> > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > ---
> > >  arch/arm/kvm/psci.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> > > index f732484abc7a..0204daa899b1 100644
> > > --- a/arch/arm/kvm/psci.c
> > > +++ b/arch/arm/kvm/psci.c
> > > @@ -88,7 +88,8 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
> > >  	 */
> > >  	if (!vcpu)
> > >  		return PSCI_RET_INVALID_PARAMS;
> > > -	if (!test_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
> > > +
> > > +	if (!test_and_clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
> > >  		if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
> > >  			return PSCI_RET_ALREADY_ON;
> > >  		else
> > > @@ -116,7 +117,6 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
> > >  	 * the general puspose registers are undefined upon CPU_ON.
> > >  	 */
> > >  	vcpu_set_reg(vcpu, 0, context_id);
> > > -	clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
> > >  
> > >  	wq = kvm_arch_vcpu_wq(vcpu);
> > >  	swake_up(wq);
> > > -- 
> > > 2.9.3
> > > 
> > 
> > Depending on what you end up doing with the requests, if you keep the
> > bool flag you could just use the kvm->lock mutex instead.
> > 
> > Have you considered if there are any potential races between
> > kvm_psci_system_off() being called on one VCPU while two other VCPUs are
> > turning on the same CPU that is being turned off as part of system-wide
> > power down as well?
> 
> Sounds like a nice unit test.  I haven't considered it, but I guess
> the kvm_psci_system_off/reset calling VCPU will ultimately "win", as
> it'll cause an exit to userspace that initiates a shutdown/reset.
> When the VCPUs are restarted then vcpu init should reset the power_off
> state correctly.  As long as the race this patch addresses is fixed, then
> I'm not sure there should be any risk with the actual system_off/reset
> being delayed wrt a vcpu being "on'ed" again, nor with there being more
> than one VCPU trying to "on" it at the same time.
> 
> > 
> > I'm wondering if this means we should take the kvm->lock at a higher
> > level when handling PSCI events...
> 
> That would simplify our analysis of the PSCI emulation, but I'm not
> sure we want to give a guest the power to constantly acquire that
> mutex with a barrage of PSCI calls.  Maybe we should create a PSCI
> mutex?  In order to avoid holding it too long we may want power_off to
> be more than a boolean though, i.e. the PENDING state might also be
> a good idea to represent.
> 

Hmm, the kvm->lock mutex is per-VM, so if a VM wants to use its CPU
resources by taking its own mutex, I don't really see the problem.

-Christoffer

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 9/9] KVM: arm/arm64: avoid race by caching MPIDR
  2017-04-04 19:44   ` Christoffer Dall
@ 2017-04-05  8:50     ` Andrew Jones
  2017-04-05 11:03       ` Christoffer Dall
  0 siblings, 1 reply; 85+ messages in thread
From: Andrew Jones @ 2017-04-05  8:50 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: marc.zyngier, pbonzini, kvmarm, kvm

On Tue, Apr 04, 2017 at 09:44:39PM +0200, Christoffer Dall wrote:
> On Fri, Mar 31, 2017 at 06:06:58PM +0200, Andrew Jones wrote:
> > Cache the MPIDR in the vcpu structure to fix potential races that
> > can arise between vcpu reset and the extraction of the MPIDR from
> > the sys-reg array.
> 
> I don't understand the race, sorry.
> 
> Can you be more specific in where this goes wrong and exactly what this
> fixes?
> 

At the start of kvm_psci_vcpu_on() we look up the vcpu struct of
the target vcpu by MPIDR.

 vcpu = kvm_mpidr_to_vcpu(kvm, cpu_id);

This necessarily comes before the newly added

 if (!test_and_clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests))

If another vcpu is trying to PSCI_ON the same target vcpu at the same
time, but is further along, i.e. already past the test-and-clear and
even in kvm_reset_vcpu(), then there's a chance it has already called
into kvm_reset_sys_regs(), which does

 /* Catch someone adding a register without putting in reset entry. */
 memset(&vcpu->arch.ctxt.sys_regs, 0x42, sizeof(vcpu->arch.ctxt.sys_regs));

but has not yet called reset_mpidr(). In that case the kvm_mpidr_to_vcpu()
pointed out above will fail to find the vcpu, which results in the PSCI_ON
call returning PSCI_RET_INVALID_PARAMS, as can be seen from this snip
of kvm_psci_vcpu_on()

 static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
 {
        struct kvm *kvm = source_vcpu->kvm;
        struct kvm_vcpu *vcpu = NULL;
        struct swait_queue_head *wq;
        unsigned long cpu_id;
        unsigned long context_id;
        phys_addr_t target_pc;

        cpu_id = vcpu_get_reg(source_vcpu, 1) & MPIDR_HWID_BITMASK;
        if (vcpu_mode_is_32bit(source_vcpu))
                cpu_id &= ~((u32) 0);

        vcpu = kvm_mpidr_to_vcpu(kvm, cpu_id);

        /*
         * Make sure the caller requested a valid CPU and that the CPU is
         * turned off.
         */
        if (!vcpu)
                return PSCI_RET_INVALID_PARAMS;

        if (!test_and_clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
...

Thanks,
drew

> Thanks,
> -Christoffer
> 
> > 
> > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > ---
> >  arch/arm/include/asm/kvm_emulate.h   |  2 +-
> >  arch/arm/include/asm/kvm_host.h      |  3 +++
> >  arch/arm/kvm/coproc.c                | 20 ++++++++++++--------
> >  arch/arm64/include/asm/kvm_emulate.h |  2 +-
> >  arch/arm64/include/asm/kvm_host.h    |  3 +++
> >  arch/arm64/kvm/sys_regs.c            | 27 ++++++++++++++-------------
> >  6 files changed, 34 insertions(+), 23 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> > index 9a8a45aaf19a..1b922de46785 100644
> > --- a/arch/arm/include/asm/kvm_emulate.h
> > +++ b/arch/arm/include/asm/kvm_emulate.h
> > @@ -213,7 +213,7 @@ static inline u32 kvm_vcpu_hvc_get_imm(struct kvm_vcpu *vcpu)
> >  
> >  static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
> >  {
> > -	return vcpu_cp15(vcpu, c0_MPIDR) & MPIDR_HWID_BITMASK;
> > +	return vcpu->arch.vmpidr & MPIDR_HWID_BITMASK;
> >  }
> >  
> >  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
> > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > index 0b8a6d6b3cb3..e0f461f0af67 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -151,6 +151,9 @@ struct kvm_vcpu_arch {
> >  	/* The CPU type we expose to the VM */
> >  	u32 midr;
> >  
> > +	/* vcpu MPIDR */
> > +	u32 vmpidr;
> > +
> >  	/* HYP trapping configuration */
> >  	u32 hcr;
> >  
> > diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
> > index 3e5e4194ef86..c4df7c9c8ddb 100644
> > --- a/arch/arm/kvm/coproc.c
> > +++ b/arch/arm/kvm/coproc.c
> > @@ -101,14 +101,18 @@ int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  
> >  static void reset_mpidr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
> >  {
> > -	/*
> > -	 * Compute guest MPIDR. We build a virtual cluster out of the
> > -	 * vcpu_id, but we read the 'U' bit from the underlying
> > -	 * hardware directly.
> > -	 */
> > -	vcpu_cp15(vcpu, c0_MPIDR) = ((read_cpuid_mpidr() & MPIDR_SMP_BITMASK) |
> > -				     ((vcpu->vcpu_id >> 2) << MPIDR_LEVEL_BITS) |
> > -				     (vcpu->vcpu_id & 3));
> > +	if (!vcpu->arch.vmpidr) {
> > +		/*
> > +		 * Compute guest MPIDR. We build a virtual cluster out of the
> > +		 * vcpu_id, but we read the 'U' bit from the underlying
> > +		 * hardware directly.
> > +		 */
> > +		u32 mpidr = ((read_cpuid_mpidr() & MPIDR_SMP_BITMASK) |
> > +			     ((vcpu->vcpu_id >> 2) << MPIDR_LEVEL_BITS) |
> > +			     (vcpu->vcpu_id & 3));
> > +		vcpu->arch.vmpidr = mpidr;
> > +	}
> > +	vcpu_cp15(vcpu, c0_MPIDR) = vcpu->arch.vmpidr;
> >  }
> >  
> >  /* TRM entries A7:4.3.31 A15:4.3.28 - RO WI */
> > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> > index f5ea0ba70f07..c138bb15b507 100644
> > --- a/arch/arm64/include/asm/kvm_emulate.h
> > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > @@ -242,7 +242,7 @@ static inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vcpu)
> >  
> >  static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
> >  {
> > -	return vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
> > +	return vcpu->arch.vmpidr_el2 & MPIDR_HWID_BITMASK;
> >  }
> >  
> >  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 7057512b3474..268c10d95a79 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -198,6 +198,9 @@ typedef struct kvm_cpu_context kvm_cpu_context_t;
> >  struct kvm_vcpu_arch {
> >  	struct kvm_cpu_context ctxt;
> >  
> > +	/* vcpu MPIDR */
> > +	u64 vmpidr_el2;
> > +
> >  	/* HYP configuration */
> >  	u64 hcr_el2;
> >  	u32 mdcr_el2;
> > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > index 0e26f8c2b56f..517aed6d8016 100644
> > --- a/arch/arm64/kvm/sys_regs.c
> > +++ b/arch/arm64/kvm/sys_regs.c
> > @@ -431,19 +431,20 @@ static void reset_amair_el1(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> >  
> >  static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> >  {
> > -	u64 mpidr;
> > -
> > -	/*
> > -	 * Map the vcpu_id into the first three affinity level fields of
> > -	 * the MPIDR. We limit the number of VCPUs in level 0 due to a
> > -	 * limitation to 16 CPUs in that level in the ICC_SGIxR registers
> > -	 * of the GICv3 to be able to address each CPU directly when
> > -	 * sending IPIs.
> > -	 */
> > -	mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
> > -	mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
> > -	mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
> > -	vcpu_sys_reg(vcpu, MPIDR_EL1) = (1ULL << 31) | mpidr;
> > +	if (!vcpu->arch.vmpidr_el2) {
> > +		/*
> > +		 * Map the vcpu_id into the first three affinity level fields
> > +		 * of the MPIDR. We limit the number of VCPUs in level 0 due to
> > +		 * a limitation of 16 CPUs in that level in the ICC_SGIxR
> > +		 * registers of the GICv3, which are used to address each CPU
> > +		 * directly when sending IPIs.
> > +		 */
> > +		u64 mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
> > +		mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
> > +		mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
> > +		vcpu->arch.vmpidr_el2 = (1ULL << 31) | mpidr;
> > +	}
> > +	vcpu_sys_reg(vcpu, MPIDR_EL1) = vcpu->arch.vmpidr_el2;
> >  }
> >  
> >  static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> > -- 
> > 2.9.3
> > 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 8/9] KVM: arm/arm64: fix race in kvm_psci_vcpu_on
  2017-04-05  8:50       ` Christoffer Dall
@ 2017-04-05  9:12         ` Andrew Jones
  2017-04-05  9:30           ` Christoffer Dall
  0 siblings, 1 reply; 85+ messages in thread
From: Andrew Jones @ 2017-04-05  9:12 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Levente Kurusa, kvm, marc.zyngier, pbonzini, kvmarm

On Wed, Apr 05, 2017 at 10:50:05AM +0200, Christoffer Dall wrote:
> On Wed, Apr 05, 2017 at 10:35:59AM +0200, Andrew Jones wrote:
> > On Tue, Apr 04, 2017 at 09:42:08PM +0200, Christoffer Dall wrote:
> > > On Fri, Mar 31, 2017 at 06:06:57PM +0200, Andrew Jones wrote:
> > > > From: Levente Kurusa <lkurusa@redhat.com>
> > > > 
> > > > When two vcpus issue PSCI_CPU_ON on the same core at the same time,
> > > > then it's possible for them to both enter the target vcpu's setup
> > > > at the same time. This results in unexpected behaviors at best,
> > > > and the potential for some nasty bugs at worst.
> > > > 
> > > > Signed-off-by: Levente Kurusa <lkurusa@redhat.com>
> > > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > > ---
> > > >  arch/arm/kvm/psci.c | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> > > > index f732484abc7a..0204daa899b1 100644
> > > > --- a/arch/arm/kvm/psci.c
> > > > +++ b/arch/arm/kvm/psci.c
> > > > @@ -88,7 +88,8 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
> > > >  	 */
> > > >  	if (!vcpu)
> > > >  		return PSCI_RET_INVALID_PARAMS;
> > > > -	if (!test_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
> > > > +
> > > > +	if (!test_and_clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
> > > >  		if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
> > > >  			return PSCI_RET_ALREADY_ON;
> > > >  		else
> > > > @@ -116,7 +117,6 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
> > > >  	 * the general puspose registers are undefined upon CPU_ON.
> > > >  	 */
> > > >  	vcpu_set_reg(vcpu, 0, context_id);
> > > > -	clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
> > > >  
> > > >  	wq = kvm_arch_vcpu_wq(vcpu);
> > > >  	swake_up(wq);
> > > > -- 
> > > > 2.9.3
> > > > 
> > > 
> > > Depending on what you end up doing with the requests, if you keep the
> > > bool flag you could just use the kvm->lock mutex instead.
> > > 
> > > Have you considered if there are any potential races between
> > > kvm_psci_system_off() being called on one VCPU while two other VCPUs are
> > > turning on the same CPU that is being turned off as part of system-wide
> > > power down as well?
> > 
> > Sounds like a nice unit test.  I haven't considered it, but I guess
> > the kvm_psci_system_off/reset calling VCPU will ultimately "win", as
> > it'll cause an exit to userspace that initiates a shutdown/reset.
> > When the VCPUs are restarted then vcpu init should reset the power_off
> > state correctly.  As long as the race this patch addresses is fixed, then
> > I'm not sure there should be any risk with the actual system_off/reset
> > being delayed wrt a vcpu being "on'ed" again, nor with there being more
> > than one VCPU trying to "on" it at the same time.
> > 
> > > 
> > > I'm wondering if this means we should take the kvm->lock at a higher
> > > level when handling PSCI events...
> > 
> > That would simplify our analysis of the PSCI emulation, but I'm not
> > sure we want to give a guest the power to constantly acquire that
> > mutex with a barrage of PSCI calls.  Maybe we should create a PSCI
> > mutex?  In order to avoid holding it too long we may want power_off to
> > be more than a boolean though, i.e. the PENDING state might also be
> > a good idea to represent.
> > 
> 
> Hmm, the kvm->lock mutex is per-VM, so if a VM wants to use its CPU
> resources by taking its own mutex, I don't really see the problem.

I was worried about management paths that lead to a need for that
lock. For example, I see x86's kvm_free_vcpus(), called from
kvm_arch_destroy_vm(), acquires it. A quick grep of ARM code doesn't
reveal anything though.

Thanks,
drew

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 8/9] KVM: arm/arm64: fix race in kvm_psci_vcpu_on
  2017-04-05  9:12         ` Andrew Jones
@ 2017-04-05  9:30           ` Christoffer Dall
  0 siblings, 0 replies; 85+ messages in thread
From: Christoffer Dall @ 2017-04-05  9:30 UTC (permalink / raw)
  To: Andrew Jones; +Cc: Levente Kurusa, kvm, marc.zyngier, pbonzini, kvmarm

On Wed, Apr 05, 2017 at 11:12:12AM +0200, Andrew Jones wrote:
> On Wed, Apr 05, 2017 at 10:50:05AM +0200, Christoffer Dall wrote:
> > On Wed, Apr 05, 2017 at 10:35:59AM +0200, Andrew Jones wrote:
> > > On Tue, Apr 04, 2017 at 09:42:08PM +0200, Christoffer Dall wrote:
> > > > On Fri, Mar 31, 2017 at 06:06:57PM +0200, Andrew Jones wrote:
> > > > > From: Levente Kurusa <lkurusa@redhat.com>
> > > > > 
> > > > > When two vcpus issue PSCI_CPU_ON on the same core at the same time,
> > > > > then it's possible for them to both enter the target vcpu's setup
> > > > > at the same time. This results in unexpected behaviors at best,
> > > > > and the potential for some nasty bugs at worst.
> > > > > 
> > > > > Signed-off-by: Levente Kurusa <lkurusa@redhat.com>
> > > > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > > > ---
> > > > >  arch/arm/kvm/psci.c | 4 ++--
> > > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> > > > > index f732484abc7a..0204daa899b1 100644
> > > > > --- a/arch/arm/kvm/psci.c
> > > > > +++ b/arch/arm/kvm/psci.c
> > > > > @@ -88,7 +88,8 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
> > > > >  	 */
> > > > >  	if (!vcpu)
> > > > >  		return PSCI_RET_INVALID_PARAMS;
> > > > > -	if (!test_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
> > > > > +
> > > > > +	if (!test_and_clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
> > > > >  		if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1)
> > > > >  			return PSCI_RET_ALREADY_ON;
> > > > >  		else
> > > > > @@ -116,7 +117,6 @@ static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
> > > > >  	 * the general puspose registers are undefined upon CPU_ON.
> > > > >  	 */
> > > > >  	vcpu_set_reg(vcpu, 0, context_id);
> > > > > -	clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests);
> > > > >  
> > > > >  	wq = kvm_arch_vcpu_wq(vcpu);
> > > > >  	swake_up(wq);
> > > > > -- 
> > > > > 2.9.3
> > > > > 
> > > > 
> > > > Depending on what you end up doing with the requests, if you keep the
> > > > bool flag you could just use the kvm->lock mutex instead.
> > > > 
> > > > Have you considered if there are any potential races between
> > > > kvm_psci_system_off() being called on one VCPU while two other VCPUs are
> > > > turning on the same CPU that is being turned off as part of system-wide
> > > > power down as well?
> > > 
> > > Sounds like a nice unit test.  I haven't considered it, but I guess
> > > the kvm_psci_system_off/reset calling VCPU will ultimately "win", as
> > > it'll cause an exit to userspace that initiates a shutdown/reset.
> > > When the VCPUs are restarted then vcpu init should reset the power_off
> > > state correctly.  As long as the race this patch addresses is fixed, then
> > > I'm not sure there should be any risk with the actual system_off/reset
> > > being delayed wrt a vcpu being "on'ed" again, nor with there being more
> > > than one VCPU trying to "on" it at the same time.
> > > 
> > > > 
> > > > I'm wondering if this means we should take the kvm->lock at a higher
> > > > level when handling PSCI events...
> > > 
> > > That would simplify our analysis of the PSCI emulation, but I'm not
> > > sure we want to give a guest the power to constantly acquire that
> > > mutex with a barrage of PSCI calls.  Maybe we should create a PSCI
> > > mutex?  In order to avoid holding it too long we may want power_off to
> > > be more than a boolean though, i.e. the PENDING state might also be
> > > a good idea to represent.
> > > 
> > 
> > Hmm, the kvm->lock mutex is per-VM, so if a VM wants to use its CPU
> > resources by taking its own mutex, I don't really see the problem.
> 
> I was worried about management paths that lead to a need for that
> lock. For example, I see x86's kvm_free_vcpus(), called from
> kvm_arch_destroy_vm(), acquires it. A quick grep of ARM code doesn't
> reveal anything though.
> 
Even in that case, PSCI is guaranteed to make progress, right?  So I
still don't understand the challenge.

In any case, I'll have a look over this patch again when you respin.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 9/9] KVM: arm/arm64: avoid race by caching MPIDR
  2017-04-05  8:50     ` Andrew Jones
@ 2017-04-05 11:03       ` Christoffer Dall
  2017-04-05 11:14         ` Andrew Jones
  0 siblings, 1 reply; 85+ messages in thread
From: Christoffer Dall @ 2017-04-05 11:03 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Wed, Apr 05, 2017 at 10:50:05AM +0200, Andrew Jones wrote:
> On Tue, Apr 04, 2017 at 09:44:39PM +0200, Christoffer Dall wrote:
> > On Fri, Mar 31, 2017 at 06:06:58PM +0200, Andrew Jones wrote:
> > > Cache the MPIDR in the vcpu structure to fix potential races that
> > > can arise between vcpu reset and the extraction of the MPIDR from
> > > the sys-reg array.
> > 
> > I don't understand the race, sorry.
> > 
> > Can you be more specific in where this goes wrong and exactly what this
> > fixes?
> > 
> 
> At the start of kvm_psci_vcpu_on() we look up the vcpu struct of
> the target vcpu by MPIDR.
> 
>  vcpu = kvm_mpidr_to_vcpu(kvm, cpu_id);
> 
> This necessarily comes before the newly added
> 
>  if (!test_and_clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests))
> 
> If another vcpu is trying to PSCI_ON the same target vcpu at the same
> time, but is further along, i.e. already past the test-and-clear and
> even in kvm_reset_vcpu(), then there's a chance it has already called
> into kvm_reset_sys_regs(), which does
> 
>  /* Catch someone adding a register without putting in reset entry. */
>  memset(&vcpu->arch.ctxt.sys_regs, 0x42, sizeof(vcpu->arch.ctxt.sys_regs));

Ah right, I didn't remember that we did this.

> 
> but has not yet called reset_mpidr(). In that case the kvm_mpidr_to_vcpu()
> pointed out above will fail to find the vcpu, which results in the PSCI_ON
> call returning PSCI_RET_INVALID_PARAMS, as can be seen from this snip
> of kvm_psci_vcpu_on()
> 
>  static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
>  {
>         struct kvm *kvm = source_vcpu->kvm;
>         struct kvm_vcpu *vcpu = NULL;
>         struct swait_queue_head *wq;
>         unsigned long cpu_id;
>         unsigned long context_id;
>         phys_addr_t target_pc;
> 
>         cpu_id = vcpu_get_reg(source_vcpu, 1) & MPIDR_HWID_BITMASK;
>         if (vcpu_mode_is_32bit(source_vcpu))
>                 cpu_id &= ~((u32) 0);
> 
>         vcpu = kvm_mpidr_to_vcpu(kvm, cpu_id);
> 
>         /*
>          * Make sure the caller requested a valid CPU and that the CPU is
>          * turned off.
>          */
>         if (!vcpu)
>                 return PSCI_RET_INVALID_PARAMS;
> 
>         if (!test_and_clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
> ...
> 

Thanks for the explanation.  Could you add this to the commit message as
you respin:

  ...arise between vcpu reset, which fills the entire sys_regs array
  with a temporary value including the MPIDR register, and looking up
  the VCPU based on the MPIDR value.

With that:

Reviewed-by: Christoffer Dall <cdall@linaro.org>

> 
> > Thanks,
> > -Christoffer
> > 
> > > 
> > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > ---
> > >  arch/arm/include/asm/kvm_emulate.h   |  2 +-
> > >  arch/arm/include/asm/kvm_host.h      |  3 +++
> > >  arch/arm/kvm/coproc.c                | 20 ++++++++++++--------
> > >  arch/arm64/include/asm/kvm_emulate.h |  2 +-
> > >  arch/arm64/include/asm/kvm_host.h    |  3 +++
> > >  arch/arm64/kvm/sys_regs.c            | 27 ++++++++++++++-------------
> > >  6 files changed, 34 insertions(+), 23 deletions(-)
> > > 
> > > diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> > > index 9a8a45aaf19a..1b922de46785 100644
> > > --- a/arch/arm/include/asm/kvm_emulate.h
> > > +++ b/arch/arm/include/asm/kvm_emulate.h
> > > @@ -213,7 +213,7 @@ static inline u32 kvm_vcpu_hvc_get_imm(struct kvm_vcpu *vcpu)
> > >  
> > >  static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
> > >  {
> > > -	return vcpu_cp15(vcpu, c0_MPIDR) & MPIDR_HWID_BITMASK;
> > > +	return vcpu->arch.vmpidr & MPIDR_HWID_BITMASK;
> > >  }
> > >  
> > >  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
> > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > > index 0b8a6d6b3cb3..e0f461f0af67 100644
> > > --- a/arch/arm/include/asm/kvm_host.h
> > > +++ b/arch/arm/include/asm/kvm_host.h
> > > @@ -151,6 +151,9 @@ struct kvm_vcpu_arch {
> > >  	/* The CPU type we expose to the VM */
> > >  	u32 midr;
> > >  
> > > +	/* vcpu MPIDR */
> > > +	u32 vmpidr;
> > > +
> > >  	/* HYP trapping configuration */
> > >  	u32 hcr;
> > >  
> > > diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
> > > index 3e5e4194ef86..c4df7c9c8ddb 100644
> > > --- a/arch/arm/kvm/coproc.c
> > > +++ b/arch/arm/kvm/coproc.c
> > > @@ -101,14 +101,18 @@ int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > >  
> > >  static void reset_mpidr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
> > >  {
> > > -	/*
> > > -	 * Compute guest MPIDR. We build a virtual cluster out of the
> > > -	 * vcpu_id, but we read the 'U' bit from the underlying
> > > -	 * hardware directly.
> > > -	 */
> > > -	vcpu_cp15(vcpu, c0_MPIDR) = ((read_cpuid_mpidr() & MPIDR_SMP_BITMASK) |
> > > -				     ((vcpu->vcpu_id >> 2) << MPIDR_LEVEL_BITS) |
> > > -				     (vcpu->vcpu_id & 3));
> > > +	if (!vcpu->arch.vmpidr) {
> > > +		/*
> > > +		 * Compute guest MPIDR. We build a virtual cluster out of the
> > > +		 * vcpu_id, but we read the 'U' bit from the underlying
> > > +		 * hardware directly.
> > > +		 */
> > > +		u32 mpidr = ((read_cpuid_mpidr() & MPIDR_SMP_BITMASK) |
> > > +			     ((vcpu->vcpu_id >> 2) << MPIDR_LEVEL_BITS) |
> > > +			     (vcpu->vcpu_id & 3));
> > > +		vcpu->arch.vmpidr = mpidr;
> > > +	}
> > > +	vcpu_cp15(vcpu, c0_MPIDR) = vcpu->arch.vmpidr;
> > >  }
> > >  
> > >  /* TRM entries A7:4.3.31 A15:4.3.28 - RO WI */
> > > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> > > index f5ea0ba70f07..c138bb15b507 100644
> > > --- a/arch/arm64/include/asm/kvm_emulate.h
> > > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > > @@ -242,7 +242,7 @@ static inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vcpu)
> > >  
> > >  static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
> > >  {
> > > -	return vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
> > > +	return vcpu->arch.vmpidr_el2 & MPIDR_HWID_BITMASK;
> > >  }
> > >  
> > >  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
> > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > index 7057512b3474..268c10d95a79 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -198,6 +198,9 @@ typedef struct kvm_cpu_context kvm_cpu_context_t;
> > >  struct kvm_vcpu_arch {
> > >  	struct kvm_cpu_context ctxt;
> > >  
> > > +	/* vcpu MPIDR */
> > > +	u64 vmpidr_el2;
> > > +
> > >  	/* HYP configuration */
> > >  	u64 hcr_el2;
> > >  	u32 mdcr_el2;
> > > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > > index 0e26f8c2b56f..517aed6d8016 100644
> > > --- a/arch/arm64/kvm/sys_regs.c
> > > +++ b/arch/arm64/kvm/sys_regs.c
> > > @@ -431,19 +431,20 @@ static void reset_amair_el1(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> > >  
> > >  static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> > >  {
> > > -	u64 mpidr;
> > > -
> > > -	/*
> > > -	 * Map the vcpu_id into the first three affinity level fields of
> > > -	 * the MPIDR. We limit the number of VCPUs in level 0 due to a
> > > -	 * limitation to 16 CPUs in that level in the ICC_SGIxR registers
> > > -	 * of the GICv3 to be able to address each CPU directly when
> > > -	 * sending IPIs.
> > > -	 */
> > > -	mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
> > > -	mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
> > > -	mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
> > > -	vcpu_sys_reg(vcpu, MPIDR_EL1) = (1ULL << 31) | mpidr;
> > > +	if (!vcpu->arch.vmpidr_el2) {
> > > +		/*
> > > +		 * Map the vcpu_id into the first three affinity level fields
> > > +		 * of the MPIDR. We limit the number of VCPUs in level 0 due to
> > > +		 * a limitation of 16 CPUs in that level in the ICC_SGIxR
> > > +		 * registers of the GICv3, which are used to address each CPU
> > > +		 * directly when sending IPIs.
> > > +		 */
> > > +		u64 mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
> > > +		mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
> > > +		mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
> > > +		vcpu->arch.vmpidr_el2 = (1ULL << 31) | mpidr;
> > > +	}
> > > +	vcpu_sys_reg(vcpu, MPIDR_EL1) = vcpu->arch.vmpidr_el2;
> > >  }
> > >  
> > >  static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> > > -- 
> > > 2.9.3
> > > 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 9/9] KVM: arm/arm64: avoid race by caching MPIDR
  2017-04-05 11:03       ` Christoffer Dall
@ 2017-04-05 11:14         ` Andrew Jones
  0 siblings, 0 replies; 85+ messages in thread
From: Andrew Jones @ 2017-04-05 11:14 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: marc.zyngier, pbonzini, kvmarm, kvm

On Wed, Apr 05, 2017 at 01:03:36PM +0200, Christoffer Dall wrote:
> On Wed, Apr 05, 2017 at 10:50:05AM +0200, Andrew Jones wrote:
> > On Tue, Apr 04, 2017 at 09:44:39PM +0200, Christoffer Dall wrote:
> > > On Fri, Mar 31, 2017 at 06:06:58PM +0200, Andrew Jones wrote:
> > > > Cache the MPIDR in the vcpu structure to fix potential races that
> > > > can arise between vcpu reset and the extraction of the MPIDR from
> > > > the sys-reg array.
> > > 
> > > I don't understand the race, sorry.
> > > 
> > > Can you be more specific in where this goes wrong and exactly what this
> > > fixes?
> > > 
> > 
> > At the start of kvm_psci_vcpu_on() we look up the vcpu struct of
> > the target vcpu by MPIDR.
> > 
> >  vcpu = kvm_mpidr_to_vcpu(kvm, cpu_id);
> > 
> > This necessarily comes before the newly added
> > 
> >  if (!test_and_clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests))
> > 
> > If another vcpu is trying to PSCI_ON the same target vcpu at the same
> > time, but is further along, i.e. already past the test-and-clear and
> > even in kvm_reset_vcpu(), then there's a chance it has already called
> > into kvm_reset_sys_regs(), which does
> > 
> >  /* Catch someone adding a register without putting in reset entry. */
> >  memset(&vcpu->arch.ctxt.sys_regs, 0x42, sizeof(vcpu->arch.ctxt.sys_regs));
> 
> Ah right, I didn't remember that we did this.
> 
> > 
> > but has not yet called reset_mpidr(). In that case the kvm_mpidr_to_vcpu()
> > pointed out above will fail to find the vcpu, which results in the PSCI_ON
> > call returning PSCI_RET_INVALID_PARAMS, as can be seen from this snip
> > of kvm_psci_vcpu_on()
> > 
> >  static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
> >  {
> >         struct kvm *kvm = source_vcpu->kvm;
> >         struct kvm_vcpu *vcpu = NULL;
> >         struct swait_queue_head *wq;
> >         unsigned long cpu_id;
> >         unsigned long context_id;
> >         phys_addr_t target_pc;
> > 
> >         cpu_id = vcpu_get_reg(source_vcpu, 1) & MPIDR_HWID_BITMASK;
> >         if (vcpu_mode_is_32bit(source_vcpu))
> >                 cpu_id &= ~((u32) 0);
> > 
> >         vcpu = kvm_mpidr_to_vcpu(kvm, cpu_id);
> > 
> >         /*
> >          * Make sure the caller requested a valid CPU and that the CPU is
> >          * turned off.
> >          */
> >         if (!vcpu)
> >                 return PSCI_RET_INVALID_PARAMS;
> > 
> >         if (!test_and_clear_bit(KVM_REQ_POWER_OFF, &vcpu->requests)) {
> > ...
> > 
> 
> Thanks for the explanation.  Could you add this to the commit message as
> you respin:
> 
>   ...arise between vcpu reset, which fills the entire sys_regs array
>   with a temporary value including the MPIDR register, and looking up
>   the VCPU based on the MPIDR value.

Will do.

> 
> With that:
> 
> Reviewed-by: Christoffer Dall <cdall@linaro.org>

Thanks,
drew

> 
> > 
> > > Thanks,
> > > -Christoffer
> > > 
> > > > 
> > > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > > ---
> > > >  arch/arm/include/asm/kvm_emulate.h   |  2 +-
> > > >  arch/arm/include/asm/kvm_host.h      |  3 +++
> > > >  arch/arm/kvm/coproc.c                | 20 ++++++++++++--------
> > > >  arch/arm64/include/asm/kvm_emulate.h |  2 +-
> > > >  arch/arm64/include/asm/kvm_host.h    |  3 +++
> > > >  arch/arm64/kvm/sys_regs.c            | 27 ++++++++++++++-------------
> > > >  6 files changed, 34 insertions(+), 23 deletions(-)
> > > > 
> > > > diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
> > > > index 9a8a45aaf19a..1b922de46785 100644
> > > > --- a/arch/arm/include/asm/kvm_emulate.h
> > > > +++ b/arch/arm/include/asm/kvm_emulate.h
> > > > @@ -213,7 +213,7 @@ static inline u32 kvm_vcpu_hvc_get_imm(struct kvm_vcpu *vcpu)
> > > >  
> > > >  static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
> > > >  {
> > > > -	return vcpu_cp15(vcpu, c0_MPIDR) & MPIDR_HWID_BITMASK;
> > > > +	return vcpu->arch.vmpidr & MPIDR_HWID_BITMASK;
> > > >  }
> > > >  
> > > >  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
> > > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > > > index 0b8a6d6b3cb3..e0f461f0af67 100644
> > > > --- a/arch/arm/include/asm/kvm_host.h
> > > > +++ b/arch/arm/include/asm/kvm_host.h
> > > > @@ -151,6 +151,9 @@ struct kvm_vcpu_arch {
> > > >  	/* The CPU type we expose to the VM */
> > > >  	u32 midr;
> > > >  
> > > > +	/* vcpu MPIDR */
> > > > +	u32 vmpidr;
> > > > +
> > > >  	/* HYP trapping configuration */
> > > >  	u32 hcr;
> > > >  
> > > > diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
> > > > index 3e5e4194ef86..c4df7c9c8ddb 100644
> > > > --- a/arch/arm/kvm/coproc.c
> > > > +++ b/arch/arm/kvm/coproc.c
> > > > @@ -101,14 +101,18 @@ int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > > >  
> > > >  static void reset_mpidr(struct kvm_vcpu *vcpu, const struct coproc_reg *r)
> > > >  {
> > > > -	/*
> > > > -	 * Compute guest MPIDR. We build a virtual cluster out of the
> > > > -	 * vcpu_id, but we read the 'U' bit from the underlying
> > > > -	 * hardware directly.
> > > > -	 */
> > > > -	vcpu_cp15(vcpu, c0_MPIDR) = ((read_cpuid_mpidr() & MPIDR_SMP_BITMASK) |
> > > > -				     ((vcpu->vcpu_id >> 2) << MPIDR_LEVEL_BITS) |
> > > > -				     (vcpu->vcpu_id & 3));
> > > > +	if (!vcpu->arch.vmpidr) {
> > > > +		/*
> > > > +		 * Compute guest MPIDR. We build a virtual cluster out of the
> > > > +		 * vcpu_id, but we read the 'U' bit from the underlying
> > > > +		 * hardware directly.
> > > > +		 */
> > > > +		u32 mpidr = ((read_cpuid_mpidr() & MPIDR_SMP_BITMASK) |
> > > > +			     ((vcpu->vcpu_id >> 2) << MPIDR_LEVEL_BITS) |
> > > > +			     (vcpu->vcpu_id & 3));
> > > > +		vcpu->arch.vmpidr = mpidr;
> > > > +	}
> > > > +	vcpu_cp15(vcpu, c0_MPIDR) = vcpu->arch.vmpidr;
> > > >  }
> > > >  
> > > >  /* TRM entries A7:4.3.31 A15:4.3.28 - RO WI */
> > > > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> > > > index f5ea0ba70f07..c138bb15b507 100644
> > > > --- a/arch/arm64/include/asm/kvm_emulate.h
> > > > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > > > @@ -242,7 +242,7 @@ static inline u8 kvm_vcpu_trap_get_fault_type(const struct kvm_vcpu *vcpu)
> > > >  
> > > >  static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu)
> > > >  {
> > > > -	return vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;
> > > > +	return vcpu->arch.vmpidr_el2 & MPIDR_HWID_BITMASK;
> > > >  }
> > > >  
> > > >  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
> > > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > > index 7057512b3474..268c10d95a79 100644
> > > > --- a/arch/arm64/include/asm/kvm_host.h
> > > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > > @@ -198,6 +198,9 @@ typedef struct kvm_cpu_context kvm_cpu_context_t;
> > > >  struct kvm_vcpu_arch {
> > > >  	struct kvm_cpu_context ctxt;
> > > >  
> > > > +	/* vcpu MPIDR */
> > > > +	u64 vmpidr_el2;
> > > > +
> > > >  	/* HYP configuration */
> > > >  	u64 hcr_el2;
> > > >  	u32 mdcr_el2;
> > > > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> > > > index 0e26f8c2b56f..517aed6d8016 100644
> > > > --- a/arch/arm64/kvm/sys_regs.c
> > > > +++ b/arch/arm64/kvm/sys_regs.c
> > > > @@ -431,19 +431,20 @@ static void reset_amair_el1(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> > > >  
> > > >  static void reset_mpidr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> > > >  {
> > > > -	u64 mpidr;
> > > > -
> > > > -	/*
> > > > -	 * Map the vcpu_id into the first three affinity level fields of
> > > > -	 * the MPIDR. We limit the number of VCPUs in level 0 due to a
> > > > -	 * limitation to 16 CPUs in that level in the ICC_SGIxR registers
> > > > -	 * of the GICv3 to be able to address each CPU directly when
> > > > -	 * sending IPIs.
> > > > -	 */
> > > > -	mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
> > > > -	mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
> > > > -	mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
> > > > -	vcpu_sys_reg(vcpu, MPIDR_EL1) = (1ULL << 31) | mpidr;
> > > > +	if (!vcpu->arch.vmpidr_el2) {
> > > > +		/*
> > > > +		 * Map the vcpu_id into the first three affinity level fields
> > > > +		 * of the MPIDR. We limit the number of VCPUs in level 0 due to
> > > > +		 * a limitation of 16 CPUs in that level in the ICC_SGIxR
> > > > +		 * registers of the GICv3, which are used to address each CPU
> > > > +		 * directly when sending IPIs.
> > > > +		 */
> > > > +		u64 mpidr = (vcpu->vcpu_id & 0x0f) << MPIDR_LEVEL_SHIFT(0);
> > > > +		mpidr |= ((vcpu->vcpu_id >> 4) & 0xff) << MPIDR_LEVEL_SHIFT(1);
> > > > +		mpidr |= ((vcpu->vcpu_id >> 12) & 0xff) << MPIDR_LEVEL_SHIFT(2);
> > > > +		vcpu->arch.vmpidr_el2 = (1ULL << 31) | mpidr;
> > > > +	}
> > > > +	vcpu_sys_reg(vcpu, MPIDR_EL1) = vcpu->arch.vmpidr_el2;
> > > >  }
> > > >  
> > > >  static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r)
> > > > -- 
> > > > 2.9.3
> > > > 

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  2017-04-05  7:09           ` Christoffer Dall
@ 2017-04-05 11:37             ` Paolo Bonzini
  2017-04-06 14:14               ` Christoffer Dall
  0 siblings, 1 reply; 85+ messages in thread
From: Paolo Bonzini @ 2017-04-05 11:37 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: marc.zyngier, kvmarm, kvm

[-- Attachment #1: Type: text/plain, Size: 5661 bytes --]



On 05/04/2017 09:09, Christoffer Dall wrote:
>>>  - In the explanation you wrote, you use the term 'we' a lot, but when
>>>    talking about SMP barriers, I think it only makes sense to talk about
>>>    actions and observations between multiple CPUs and we have to be
>>>    specific about which CPU observes or does what with respect to the
>>>    other.  Maybe I'm being a stickler here, but there something here
>>>    which is making me uneasy.
>> The write1-mb-if(read2) / write2-mb-if(read1) pattern is pretty common,
>> so I think it is justified to cut the ordering on the reasoning and just
>> focus on what the two memory locations and conditions mean.
> ok, but the pattern above was not common to me (and I'm pretty sure I'm
> not the only fool in the bunch here), so if we can reference something
> that explains that this is a known pattern which has been tried and
> proven, that would be even better.

I found https://lwn.net/Articles/573436/ which shows this example:

  CPU 0					CPU 1
  ---------------------			----------------------
  WRITE_ONCE(x, 1);			WRITE_ONCE(y, 1);
  smp_mb();				smp_mb();
  r2 = READ_ONCE(y);			r4 = READ_ONCE(x);

And says that it is a bug if r2 == 0 && r4 == 0.  This is exactly what 
happens in KVM:

  CPU 0					CPU 1
  ---------------------			----------------------
  vcpu->mode = IN_GUEST_MODE;		kvm_make_request(REQ, vcpu);
  smp_mb();				smp_mb();
  r2 = kvm_request_pending(vcpu)	r4 = (vcpu->mode == IN_GUEST_MODE)
  if (r2)				if (r4)
	abort entry				kick();

If CPU 0 sees no request (r2 == 0) and CPU 1 doesn't kick (r4 == 0),
there would be a bug.  But why can't this happen?

- if no request is pending at the time of the read into r2, CPU 1 must
not have executed kvm_make_request yet.  On CPU 0, kvm_request_pending
happens after vcpu->mode is set to IN_GUEST_MODE, so CPU 1's later read
of vcpu->mode will see IN_GUEST_MODE and kick.

- if no kick happens on CPU 1, CPU 0 must not have set vcpu->mode yet.
On CPU 1, vcpu->mode is read after setting the request bit, so CPU 0's
later read will see the request bit and abort the guest entry.
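For the curious, the same pairing can be reproduced in user space with
C11 atomics.  The names below are stand-ins for the KVM fields, not the
real code; the seq_cst fences play the role of smp_mb():

```c
#include <pthread.h>
#include <stdatomic.h>

/* Stand-ins for vcpu->mode and vcpu->requests (hypothetical names). */
static atomic_int in_guest_mode;
static atomic_int request_pending;
static int r2, r4;

/* Models CPU 0: enter guest mode, then check for pending requests. */
static void *entry_side(void *unused)
{
    (void)unused;
    atomic_store_explicit(&in_guest_mode, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);        /* smp_mb() */
    r2 = atomic_load_explicit(&request_pending, memory_order_relaxed);
    return NULL;
}

/* Models CPU 1: make a request, then check the vcpu's mode. */
static void *request_side(void *unused)
{
    (void)unused;
    atomic_store_explicit(&request_pending, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);        /* smp_mb() */
    r4 = atomic_load_explicit(&in_guest_mode, memory_order_relaxed);
    return NULL;
}

/* Returns 1 iff the invariant held: at least one side saw the other. */
int run_once(void)
{
    pthread_t a, b;

    atomic_store(&in_guest_mode, 0);
    atomic_store(&request_pending, 0);
    r2 = r4 = 0;
    pthread_create(&a, NULL, entry_side, NULL);
    pthread_create(&b, NULL, request_side, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return !(r2 == 0 && r4 == 0);   /* r2 == r4 == 0 would be the bug */
}
```

With both fences in place the "r2 == 0 && r4 == 0" outcome is forbidden
on every run; drop either fence and the model (and real hardware) can
produce it.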

>>>  - Finally, it feels very hard to prove the correctness of this, and
>>>    equally hard to test it (given how long we've been running with
>>>    apparently racy code).  I would hope that we could abstract some of
>>>    this into architecture generic things, that someone who eat memory
>>>    barriers for breakfast could help us verify, but again, maybe this is
>>>    Radim's series I'm asking for here.
>>
>> What I can do here is to suggest copying the paradigms from x86, which
>> is quite battle tested (Windows hammers it really hard).
>
> That sounds reasonable, but I think part of the problem was that we
> simply didn't understand what the paradigms were (see the
> kvm_make_all_cpus_request above as an example), so Drew's task of
> documenting what this all is, and the constraints of using it, is really
> important to me.

Yes, totally agreed on that.

>> For QEMU I did use model checking in the past for some similarly hairy
>> synchronization code, but that is really just "executable documentation"
>> because the model is not written in C.
>>
> I played with using blast on some of the KVM/ARM code a long time ago,
> and while I was able to find a bug with it, it was sort of an obvious
> bug, and the things I was able to do with it was pretty limited to the
> problems I could imagine myself anyhow.  Perhaps this is what you mean
> with executable documentation.

I prepared three examples of a spin model for KVM vCPU kicking, and
the outcome was actually pretty surprising: the mode check seems not
to be necessary.

I haven't covered all x86 cases so I'm not going to remove it right
ahead, but for ARM it really seems like EXITING_GUEST_MODE is nothing
but an optimization of consecutive kvm_vcpu_kicks.

All three models can use C preprocessor #defines to inject bugs:

- kvm-arm-pause.promela: the "paused" mechanism; the model proves that
  the "paused" test in the interrupt-disabled region is necessary

- kvm-req.promela: the requests mechanism; the model proves that
  the requests check in the interrupt-disabled region is necessary

- kvm-x86-pi.promela: the x86 posted interrupt mechanism (simplified
  a bit); the model proves that KVM must disable interrupts before
  checking for interrupts injected while outside guest mode
  (commit b95234c84004, "kvm: x86: do not use KVM_REQ_EVENT for APICv
  interrupt injection", 2017-02-15)

So it seems like there are no races after all in KVM/ARM code, though
the code can still be cleaned up.  And I have been convinced of the wrong
thing all the time. :)

But why is KVM/ARM using KVM_REQ_VCPU_EXIT
just fine without checking for requests (kvm-req.promela)?  Because,
as mentioned earlier in the thread, KVM/ARM is using kvm_make_all_cpus_request
simply to kick all VCPUs.  The paused variable _is_ checked after disabling
interrupts, and that is fine.

After this experiment, I think I like Drew's KVM_REQ_PAUSE more than I did
yesterday.  However, yet another alternative is to leave pause/power_off as
they are, while taking some inspiration from his patch to do some cleanups:

1) change the "if"

                if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
                        vcpu->arch.power_off || vcpu->arch.pause) {

to test kvm_requests_pending instead of pause/power_off

2) clear KVM_REQ_VCPU_EXIT before the other "if":

                if (vcpu->arch.power_off || vcpu->arch.pause)
                        vcpu_sleep(vcpu);


In any case, the no-wakeup behavior of kvm_make_all_cpus_request suits
either use of requests (KVM_REQ_PAUSE or "fixed" KVM_REQ_VCPU_EXIT).
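A self-contained sketch of what (1) and (2) could look like, with
hypothetical stand-in names rather than the actual kvm/arm structures:

```c
#include <stdbool.h>

/* Hypothetical request bits standing in for pause/power_off; not the
 * actual kvm/arm definitions. */
#define REQ_PAUSE     (1UL << 0)
#define REQ_POWER_OFF (1UL << 1)

/* Minimal stand-in for struct kvm_vcpu: only the requests word. */
struct vcpu_reqs {
    unsigned long requests;
};

/* (1) replaces: ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
 *               vcpu->arch.power_off || vcpu->arch.pause */
static bool should_abort_entry(struct vcpu_reqs *vcpu, int ret,
                               bool need_new_vmid_gen)
{
    return ret <= 0 || need_new_vmid_gen || vcpu->requests != 0;
}

/* (2) replaces: if (vcpu->arch.power_off || vcpu->arch.pause)
 *                       vcpu_sleep(vcpu); */
static bool should_sleep(struct vcpu_reqs *vcpu)
{
    return vcpu->requests & (REQ_PAUSE | REQ_POWER_OFF);
}
```

The point of the sketch is only that the entry-abort condition collapses
into a single pending-requests test once pause/power_off are request
bits.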

Paolo

[-- Attachment #2: kvm-arm-pause.promela --]
[-- Type: text/plain, Size: 2286 bytes --]

/* To run the model checker:
 *
 *      spin -a kvm-arm-pause.promela
 *      gcc -O2 pan.c
 *      ./a.out -a -f
 *
 * Remove the tests using -DREMOVE_MODE_TEST, -DREMOVE_PAUSED_TEST
 * right after -a.  The mode test is not necessary, the paused test is.
 */
#define OUTSIDE_GUEST_MODE      0
#define IN_GUEST_MODE           1
#define EXITING_GUEST_MODE      2

bool kick;
bool paused;
int vcpu_mode = OUTSIDE_GUEST_MODE;

active proctype vcpu_run()
{
    do
        :: true -> {
            /* In paused state, sleep with interrupts on */
            if
                :: !paused -> skip;
            fi;

            /* IPIs are eaten until interrupts are turned off.  */
            kick = 0;

            /* Interrupts are now off. */
            vcpu_mode = IN_GUEST_MODE;

            if
#ifndef REMOVE_MODE_TEST
                :: vcpu_mode != IN_GUEST_MODE -> skip;
#endif
#ifndef REMOVE_PAUSED_TEST
                :: paused -> skip;
#endif
                :: else -> {
                    do
                        /* Stay in guest mode until an IPI comes */
                        :: kick -> break;
                    od;
                }
            fi;
            vcpu_mode = OUTSIDE_GUEST_MODE;

            /* Turn on interrupts */
        }
    od
}

active proctype vcpu_kick()
{
    int old;

    do
        :: true -> {
            paused = 1;
            /* cmpxchg */
            atomic {
                old = vcpu_mode;
                if
                    :: vcpu_mode == IN_GUEST_MODE ->
                        vcpu_mode = EXITING_GUEST_MODE;
                    :: else -> skip;
                fi;
            }

            if
                :: old == IN_GUEST_MODE -> kick = 1;
                :: else -> skip;
            fi;

            if
               :: vcpu_mode == OUTSIDE_GUEST_MODE -> paused = 0;
            fi; 
        }
    od;
}

never {
    do
       /* After an arbitrarily long prefix */
       :: 1 -> skip;

       /* if we get a pause request */
       :: paused -> break;
    od; 

accept:
    /* we must eventually leave guest mode (this condition is reversed!) */
    do
       :: vcpu_mode != OUTSIDE_GUEST_MODE
    od; 
}

[-- Attachment #3: kvm-req.promela --]
[-- Type: text/plain, Size: 2358 bytes --]

/* To run the model checker:
 *
 *      spin -a kvm-req.promela
 *      gcc -O2 pan.c
 *      ./a.out -a -f
 *
 * Remove the tests using -DREMOVE_MODE_TEST, -DREMOVE_REQ_TEST
 * right after -a.  The mode test is not necessary, the vcpu_req test is.
 */
#define OUTSIDE_GUEST_MODE      0
#define IN_GUEST_MODE           1
#define EXITING_GUEST_MODE      2

bool kick;
int vcpu_req;
int vcpu_mode = OUTSIDE_GUEST_MODE;

active proctype vcpu_run()
{
    do
        :: true -> {
            /* Requests are processed with interrupts on */
            vcpu_req = 0;

            /* IPIs are eaten until interrupts are turned off.  */
            kick = 0;

            /* Interrupts are now off. */
            vcpu_mode = IN_GUEST_MODE;

            if
#ifndef REMOVE_MODE_TEST
                :: vcpu_mode != IN_GUEST_MODE -> skip;
#endif
#ifndef REMOVE_REQ_TEST
                :: vcpu_req -> skip;
#endif
                :: else -> {
                    do
                        /* Stay in guest mode until an IPI comes */
                        :: kick -> break;
                    od;
                }
            fi;
            vcpu_mode = OUTSIDE_GUEST_MODE;

            /* Turn on interrupts */
        }
    od
}

active proctype vcpu_kick()
{
    int old;

    do
        :: true -> {
            vcpu_req = 1;
            if
                :: old == 0 -> {
                    /* cmpxchg */
                    atomic {
                        old = vcpu_mode;
                        if
                            :: vcpu_mode == IN_GUEST_MODE ->
                                vcpu_mode = EXITING_GUEST_MODE;
                            :: else -> skip;
                        fi;
                    }

                    if
                        :: old == IN_GUEST_MODE -> kick = 1;
                        :: else -> skip;
                    fi;
                }
                :: else -> skip;
            fi;
        }
    od;
}

never {
    do
       /* After an arbitrarily long prefix */
       :: 1 -> skip;

       /* we get in guest mode */
       :: vcpu_mode == IN_GUEST_MODE -> break;
    od; 

accept:
    /* and never leave it (this condition is reversed!) */
    do
       :: vcpu_mode != OUTSIDE_GUEST_MODE
    od; 
}

[-- Attachment #4: kvm-x86-pi.promela --]
[-- Type: text/plain, Size: 3000 bytes --]

/* To run the model checker:
 *
 *      spin -a kvm-x86-pi.promela
 *      gcc -O2 pan.c
 *      ./a.out -a -f
 *
 * Remove the test using -DREMOVE_MODE_TEST, move the PIR->IRR sync
 * before local_irq_disable() with SYNC_WITH_INTERRUPTS_ENABLED.  The
 * mode test is not necessary, while sync_pir_to_irr must be placed
 * after interrupts are disabled.
 */
#define OUTSIDE_GUEST_MODE      0
#define IN_GUEST_MODE           1
#define EXITING_GUEST_MODE      2

bool kick;
bool posted_interrupt;
int vcpu_pir;
int vcpu_mode = OUTSIDE_GUEST_MODE;

active proctype vcpu_run()
{
    do
        :: true -> {
#ifdef SYNC_WITH_INTERRUPTS_ENABLED
            /* Guest interrupts are injected with interrupts off */
            vcpu_pir = 0;
#endif

            /* Both kinds of IPI are eaten until interrupts are turned off.  */
            atomic {
                kick = 0;
                posted_interrupt = 0;
            }

            /* Interrupts are now off. */
            vcpu_mode = IN_GUEST_MODE;

#ifndef SYNC_WITH_INTERRUPTS_ENABLED
            /* Guest interrupts are injected with interrupts off */
            vcpu_pir = 0;
#endif

            if
#ifndef REMOVE_MODE_TEST
                :: vcpu_mode != IN_GUEST_MODE -> skip;
#endif

                :: else -> {
                    do
                        /* Stay in guest mode until an IPI comes */
                        :: kick -> break;

                        /* The processor handles the posted interrupt IPI */
                        :: posted_interrupt -> vcpu_pir = 0;
                    od;
                }
            fi;
            vcpu_mode = OUTSIDE_GUEST_MODE;

            /* Turn on interrupts */
        }
    od
}

active proctype vcpu_posted_interrupt()
{
    int old;

    do
        :: vcpu_pir == 0 -> {
            vcpu_pir = 1;
            if
                :: vcpu_mode == IN_GUEST_MODE ->
                    /* If in guest mode, we can send a posted interrupt IPI */
                    posted_interrupt = 1;

                :: else -> {
                    /* Else, do a kvm_vcpu_kick.  */
                    atomic {
                        old = vcpu_mode;
                        if
                            :: vcpu_mode == IN_GUEST_MODE ->
                                vcpu_mode = EXITING_GUEST_MODE;
                            :: else -> skip;
                        fi;
                    }

                    if
                        :: old == IN_GUEST_MODE -> kick = 1;
                        :: else -> skip;
                    fi;
                }
            fi;
        }
    od;
}

never {
    do
       /* After an arbitrarily long prefix */
       :: 1 -> skip;

       /* if we get an interrupt */
       :: vcpu_pir -> break;
    od; 

accept:
    /* we must eventually inject it (this condition is reversed!) */
    do
       :: vcpu_pir
    od; 
}

[-- Attachment #5: Type: text/plain, Size: 151 bytes --]

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 1/9] KVM: add kvm_request_pending
  2017-04-04 16:41     ` Andrew Jones
@ 2017-04-05 13:10       ` Radim Krčmář
  2017-04-05 17:39         ` Christoffer Dall
  0 siblings, 1 reply; 85+ messages in thread
From: Radim Krčmář @ 2017-04-05 13:10 UTC (permalink / raw)
  To: Andrew Jones; +Cc: marc.zyngier, Christoffer Dall, kvmarm, kvm, pbonzini

2017-04-04 18:41+0200, Andrew Jones:
> On Tue, Apr 04, 2017 at 05:30:14PM +0200, Christoffer Dall wrote:
>> On Fri, Mar 31, 2017 at 06:06:50PM +0200, Andrew Jones wrote:
>> > From: Radim Krčmář <rkrcmar@redhat.com>
>> > 
>> > A first step in vcpu->requests encapsulation.
>> 
>> Could we have a note here on why we need to access vcpu->requests using
>> READ_ONCE now?
> 
> Sure, maybe we should put the note as a comment above the read in
> kvm_request_pending().  Something like
> 
>  /*
>   * vcpu->requests reads may appear in sequences that have strict
>   * data or control dependencies.  Use READ_ONCE() to ensure the
>   * compiler does not do anything that breaks the required ordering.
>   */
> 
> Radim?

Uses of vcpu->requests should already have barriers that take care of
the ordering.  I think the main reason for READ_ONCE() is to tell
programmers that requests are special, but predictable.

READ_ONCE() is not necessary in any use I'm aware of, but there is no
harm in telling the compiler that vcpu->requests are what we think they
are ...

 /*
  * vcpu->requests are a lockless synchronization mechanism, where
  * memory barriers are necessary for correct behavior, see
  * Documentation/virtual/kvm/vcpu-requests.rst.
  *
  * READ_ONCE() is not necessary for correctness, but simplifies
  * reasoning by constricting the generated code.
  */

I considered READ_ONCE() to be self-documenting. :)
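As a user-space illustration of the point, here is a simplified
READ_ONCE() (modeled on the kernel's definition in <linux/compiler.h>,
not copied from it) together with the helper this patch adds;
struct vcpu_sketch is a hypothetical stand-in for struct kvm_vcpu:

```c
#include <stdbool.h>

/* Simplified from the kernel's READ_ONCE(): the volatile-qualified
 * access forces the compiler to emit exactly one load of the given
 * width, ruling out refetching, fusing, or tearing the value. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for struct kvm_vcpu; only the field the
 * accessor touches. */
struct vcpu_sketch {
    unsigned long requests;
};

/* The accessor patch 1/9 introduces, as described in this thread:
 * callers test for any pending request through one single load. */
static inline bool kvm_request_pending(struct vcpu_sketch *vcpu)
{
    return READ_ONCE(vcpu->requests) != 0;
}
```

The macro does nothing for ordering between CPUs; memory barriers
elsewhere still provide that.  It only pins down what the compiler may
do with the single load.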

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 2/9] KVM: Add documentation for VCPU requests
  2017-04-04 17:23       ` Christoffer Dall
  2017-04-04 17:36         ` Paolo Bonzini
@ 2017-04-05 14:11         ` Radim Krčmář
  2017-04-05 17:45           ` Christoffer Dall
  1 sibling, 1 reply; 85+ messages in thread
From: Radim Krčmář @ 2017-04-05 14:11 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: marc.zyngier, pbonzini, kvmarm, kvm

2017-04-04 19:23+0200, Christoffer Dall:
> On Tue, Apr 04, 2017 at 07:06:00PM +0200, Andrew Jones wrote:
>> On Tue, Apr 04, 2017 at 05:24:03PM +0200, Christoffer Dall wrote:
>> > On Fri, Mar 31, 2017 at 06:06:51PM +0200, Andrew Jones wrote:
>> > > +and will definitely see the request, or is outside guest mode, but has yet
>> > > +to do its final request check, and therefore when it does, it will see the
>> > > +request, then things will work.  However, the transition from outside to
>> > > +inside guest mode, after the last request check has been made, opens a
>> > > +window where a request could be made, but the VCPU would not see it until it
>> > > +exits guest mode some time later.  See the table below.
>> > 
>> > This text, and the table below, only deals with the details of entering
>> > the guest.  Should we talk about kvm_vcpu_exiting_guest_mode() and
>> > anything related to exiting the guest?
>> 
>> I think all !IN_GUEST_MODE should behave the same, so I was avoiding
>> the use of EXITING_GUEST_MODE and OUTSIDE_GUEST_MODE, which wouldn't be
>> hard to address, but then I'd also have to address
>> READING_SHADOW_PAGE_TABLES, which may complicate the document more than
>> necessary.  I'm not sure we need to address a VCPU exiting guest mode,
>> other than making sure it's clear that a VCPU that exits must check
>> requests before it enters again.
> 
> But the problem is that kvm_make_all_cpus_request() only sends IPIs to
> CPUs where the mode was different from OUTSIDE_GUEST_MODE, so there it's
> about !OUTSIDE_GUEST_MODE rather than !IN_GUEST_MODE, so there's some
> subtlety here which I feel like it's dangerous to paper over.

Right, that needs fixing in the code.

guest_mode is just an optimization that allows us to skip sending the
IPI when the VCPU is known to handle the request as soon as possible.

  IN_GUEST_MODE: we must force VM exit or the request could never be
    handled
  EXITING_GUEST_MODE: another request already forces the VM exit and
    we're just waiting for the VCPU to notice our request
  OUTSIDE_GUEST_MODE: KVM is going to notice our request without any
    intervention
  READING_SHADOW_PAGE_TABLES: same as OUTSIDE_GUEST_MODE -- rename to
    unwieldy OUTSIDE_GUEST_MODE_READING_SHADOW_PAGE_TABLES?

The kick is needed only in IN_GUEST_MODE and a wake up is needed in case
where the guest is halted OUTSIDE_GUEST_MODE ...
Hm, maybe we should add a halt state too?
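The policy above could be summarized in code roughly as follows
(hypothetical helpers, not the actual kvm_make_all_cpus_request()
logic):

```c
/* The four vcpu modes enumerated above. */
enum vcpu_mode {
    OUTSIDE_GUEST_MODE,
    IN_GUEST_MODE,
    EXITING_GUEST_MODE,
    READING_SHADOW_PAGE_TABLES,
};

/* Only a vcpu running in guest mode needs an IPI to notice a new
 * request: EXITING_GUEST_MODE is already on its way out, and both
 * "outside" modes recheck requests before the next guest entry. */
int request_needs_kick(enum vcpu_mode mode)
{
    return mode == IN_GUEST_MODE;
}

/* A halted vcpu sits OUTSIDE_GUEST_MODE on a waitqueue, so it needs
 * a wake-up rather than a kick (hence the halt-state question). */
int request_needs_wakeup(enum vcpu_mode mode, int halted)
{
    return mode == OUTSIDE_GUEST_MODE && halted;
}
```

This is only a restatement of the table above, to make the asymmetry
between kick and wake-up explicit.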

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 1/9] KVM: add kvm_request_pending
  2017-04-05 13:10       ` Radim Krčmář
@ 2017-04-05 17:39         ` Christoffer Dall
  2017-04-05 18:30           ` Paolo Bonzini
  2017-04-05 20:20           ` Radim Krčmář
  0 siblings, 2 replies; 85+ messages in thread
From: Christoffer Dall @ 2017-04-05 17:39 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: marc.zyngier, pbonzini, kvmarm, kvm

On Wed, Apr 05, 2017 at 03:10:50PM +0200, Radim Krčmář wrote:
> 2017-04-04 18:41+0200, Andrew Jones:
> > On Tue, Apr 04, 2017 at 05:30:14PM +0200, Christoffer Dall wrote:
> >> On Fri, Mar 31, 2017 at 06:06:50PM +0200, Andrew Jones wrote:
> >> > From: Radim Krčmář <rkrcmar@redhat.com>
> >> > 
> >> > A first step in vcpu->requests encapsulation.
> >> 
> >> Could we have a note here on why we need to access vcpu->requests using
> >> READ_ONCE now?
> > 
> > Sure, maybe we should put the note as a comment above the read in
> > kvm_request_pending().  Something like
> > 
> >  /*
> >   * vcpu->requests reads may appear in sequences that have strict
> >   * data or control dependencies.  Use READ_ONCE() to ensure the
> >   * compiler does not do anything that breaks the required ordering.
> >   */
> > 
> > Radim?
> 
> Uses of vcpu->requests should already have barriers that take care of
> the ordering.  I think the main reason for READ_ONCE() is to tell
> programmers that requests are special, but predictable.

I don't know what to do with "special, but predictable", unfortunately.
In fact, I don't even think I know what you mean.

> 
> READ_ONCE() is not necessary in any use I'm aware of, but there is no
> harm in telling the compiler that vcpu->requests are what we think they
> are ...

Hmmm, I'm equally lost.

> 
>  /*
>   * vcpu->requests are a lockless synchronization mechanism, where

are requests a synchronization mechanism?  I think of them more as a
cross-thread communication protocol.

>   * memory barriers are necessary for correct behavior, see
>   * Documentation/virtual/kvm/vcpu-requests.rst.
>   *
>   * READ_ONCE() is not necessary for correctness, but simplifies
>   * reasoning by constricting the generated code.
>   */
> 
> I considered READ_ONCE() to be self-documenting. :)

I realize that I'm probably unusually slow in this whole area, but using
READ_ONCE() where it is unnecessary doesn't help my reasoning; instead it
makes me wonder which part of this I didn't understand, so I don't agree
with the statement that it simplifies reasoning.

Really, if there is no reason to use it, I don't think we should use it.
To me, READ_ONCE() indicates that there's some flow in the code where
it's essential that the compiler doesn't generate multiple loads, but
that we only see a momentary single-read snapshot of the value, and this
doesn't seem to be the case.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 2/9] KVM: Add documentation for VCPU requests
  2017-04-05 14:11         ` Radim Krčmář
@ 2017-04-05 17:45           ` Christoffer Dall
  2017-04-05 18:29             ` Paolo Bonzini
  0 siblings, 1 reply; 85+ messages in thread
From: Christoffer Dall @ 2017-04-05 17:45 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: marc.zyngier, pbonzini, kvmarm, kvm

On Wed, Apr 05, 2017 at 04:11:40PM +0200, Radim Krčmář wrote:
> 2017-04-04 19:23+0200, Christoffer Dall:
> > On Tue, Apr 04, 2017 at 07:06:00PM +0200, Andrew Jones wrote:
> >> On Tue, Apr 04, 2017 at 05:24:03PM +0200, Christoffer Dall wrote:
> >> > On Fri, Mar 31, 2017 at 06:06:51PM +0200, Andrew Jones wrote:
> >> > > +and will definitely see the request, or is outside guest mode, but has yet
> >> > > +to do its final request check, and therefore when it does, it will see the
> >> > > +request, then things will work.  However, the transition from outside to
> >> > > +inside guest mode, after the last request check has been made, opens a
> >> > > +window where a request could be made, but the VCPU would not see it until it
> >> > > +exits guest mode some time later.  See the table below.
> >> > 
> >> > This text, and the table below, only deals with the details of entering
> >> > the guest.  Should we talk about kvm_vcpu_exiting_guest_mode() and
> >> > anything related to exiting the guest?
> >> 
> >> I think all !IN_GUEST_MODE should behave the same, so I was avoiding
> >> the use of EXITING_GUEST_MODE and OUTSIDE_GUEST_MODE, which wouldn't be
> >> hard to address, but then I'd also have to address
> >> READING_SHADOW_PAGE_TABLES, which may complicate the document more than
> >> necessary.  I'm not sure we need to address a VCPU exiting guest mode,
> >> other than making sure it's clear that a VCPU that exits must check
> >> requests before it enters again.
> > 
> > But the problem is that kvm_make_all_cpus_request() only sends IPIs to
> > CPUs where the mode was different from OUTSIDE_GUEST_MODE, so there it's
> > about !OUTSIDE_GUEST_MODE rather than !IN_GUEST_MODE, so there's some
> > subtlety here which I feel like it's dangerous to paper over.
> 
> Right, that needs fixing in the code.

Really?  I thought Paolo said that this is the intended behavior and
semantics; non-urgent requests that should just be serviced before the
next guest entry.

Now I'm confused again.  What did I miss?

> 
> guest_mode is just an optimization that allows us to skip sending the
> IPI when the VCPU is known to handle the request as soon as possible.
> 
>   IN_GUEST_MODE: we must force VM exit or the request could never be
>     handled
>   EXITING_GUEST_MODE: another request already forces the VM exit and
>     we're just waiting for the VCPU to notice our request
>   OUTSIDE_GUEST_MODE: KVM is going to notice our request without any
>     intervention
>   READING_SHADOW_PAGE_TABLES: same as OUTSIDE_GUEST_MODE -- rename to
>     unwieldy OUTSIDE_GUEST_MODE_READING_SHADOW_PAGE_TABLES?

Again, I thought Paolo was arguing that EXITING_GUEST_MODE makes the
whole thing work because you check that after checking requests?

> 
> The kick is needed only in IN_GUEST_MODE and a wake up is needed in case
> where the guest is halted OUTSIDE_GUEST_MODE ...

> Hm, maybe we should add a halt state too?

Wouldn't that be swait_active(&vcpu->wq) ?   You could add a wrapper
though.

What I think you need is a way to distinguish the semantics of calling
kvm_make_all_cpus_request(), perhaps by adding a 'bool wake_up'
parameter.

I also feel like it would be more reliable or easier to understand if
kvm_make_all_cpus_request() called kvm_vcpu_kick() somehow, but there
may be such an established and understood use of the differences between
the two by other architectures that it's worse to introduce the churn of
changing it.  I don't know.

Thanks,
-Christoffer


* Re: [PATCH v2 2/9] KVM: Add documentation for VCPU requests
  2017-04-05 17:45           ` Christoffer Dall
@ 2017-04-05 18:29             ` Paolo Bonzini
  2017-04-05 20:46               ` Radim Krčmář
  2017-04-06 14:27               ` Christoffer Dall
  0 siblings, 2 replies; 85+ messages in thread
From: Paolo Bonzini @ 2017-04-05 18:29 UTC (permalink / raw)
  To: Christoffer Dall, Radim Krčmář
  Cc: Andrew Jones, kvmarm, kvm, marc.zyngier



On 05/04/2017 19:45, Christoffer Dall wrote:
>>> But the problem is that kvm_make_all_cpus_request() only sends IPIs to
>>> CPUs where the mode was different from OUTSIDE_GUEST_MODE, so there it's
>>> about !OUTSIDE_GUEST_MODE rather than !IN_GUEST_MODE, so there's some
>>> subtlety here which I feel like it's dangerous to paper over.
>> Right, that needs fixing in the code.
> Really?  I thought Paolo said that this is the intended behavior and
> semantics; non-urgent requests that should just be serviced before the
> next guest entry.

Indeed, that's right...

>> guest_mode is just an optimization that allows us to skip sending the
>> IPI when the VCPU is known to handle the request as soon as possible.
>>
>>   IN_GUEST_MODE: we must force VM exit or the request could never be
>>     handled
>>   EXITING_GUEST_MODE: another request already forces the VM exit and
>>     we're just waiting for the VCPU to notice our request
>>   OUTSIDE_GUEST_MODE: KVM is going to notice our request without any
>>     intervention
>>   READING_SHADOW_PAGE_TABLES: same as OUTSIDE_GUEST_MODE -- rename to
>>>     unwieldy OUTSIDE_GUEST_MODE_READING_SHADOW_PAGE_TABLES?
> Again, I thought Paolo was arguing that EXITING_GUEST_MODE makes the
> whole thing work because you check that after checking requests?

... but apparently I was wrong here, see my email from this morning.

>> The kick is needed only in IN_GUEST_MODE and a wake up is needed in case
>> where the guest is halted OUTSIDE_GUEST_MODE ...
>> Hm, maybe we should add a halt state too?
> Wouldn't that be swait_active(&vcpu->wq) ?   You could add a wrapper
> though.

Yes.  I think the wrapper is probably unnecessary.

> What I think you need is a way to distinguish the semantics of calling
> kvm_make_all_cpus_request(), perhaps by adding a 'bool wake_up'
> parameter.

That would be fine.

Paolo


* Re: [PATCH v2 1/9] KVM: add kvm_request_pending
  2017-04-05 17:39         ` Christoffer Dall
@ 2017-04-05 18:30           ` Paolo Bonzini
  2017-04-05 20:20           ` Radim Krčmář
  1 sibling, 0 replies; 85+ messages in thread
From: Paolo Bonzini @ 2017-04-05 18:30 UTC (permalink / raw)
  To: Christoffer Dall, Radim Krčmář; +Cc: marc.zyngier, kvmarm, kvm



On 05/04/2017 19:39, Christoffer Dall wrote:
>> Uses of vcpu->requests should already have barriers that take care of
>> the ordering.  I think the main reason for READ_ONCE() is to tell
>> programmers that requests are special, but predictable.
> 
> I don't know what to do with "special, but predictable", unfortunately.
> In fact, I don't even think I know what you mean.

I don't think it's special, but predictable.  It's atomic, but with
relaxed ordering.

Paolo


* Re: [PATCH v2 1/9] KVM: add kvm_request_pending
  2017-04-05 17:39         ` Christoffer Dall
  2017-04-05 18:30           ` Paolo Bonzini
@ 2017-04-05 20:20           ` Radim Krčmář
  2017-04-06 12:02             ` Andrew Jones
  2017-04-06 14:25             ` Christoffer Dall
  1 sibling, 2 replies; 85+ messages in thread
From: Radim Krčmář @ 2017-04-05 20:20 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: marc.zyngier, pbonzini, kvmarm, kvm

2017-04-05 19:39+0200, Christoffer Dall:
> On Wed, Apr 05, 2017 at 03:10:50PM +0200, Radim Krčmář wrote:
>> 2017-04-04 18:41+0200, Andrew Jones:
>> > On Tue, Apr 04, 2017 at 05:30:14PM +0200, Christoffer Dall wrote:
>> >> On Fri, Mar 31, 2017 at 06:06:50PM +0200, Andrew Jones wrote:
>> >> > From: Radim Krčmář <rkrcmar@redhat.com>
>> >> > 
>> >> > A first step in vcpu->requests encapsulation.
>> >> 
>> >> Could we have a note here on why we need to access vcpu->requests using
>> >> READ_ONCE now?
>> > 
>> > Sure, maybe we should put the note as a comment above the read in
>> > kvm_request_pending().  Something like
>> > 
>> >  /*
>> >   * vcpu->requests reads may appear in sequences that have strict
>> >   * data or control dependencies.  Use READ_ONCE() to ensure the
>> >   * compiler does not do anything that breaks the required ordering.
>> >   */
>> > 
>> > Radim?
>> 
>> Uses of vcpu->requests should already have barriers that take care of
>> the ordering.  I think the main reason for READ_ONCE() is to tell
>> programmers that requests are special, but predictable.
> 
> I don't know what to do with "special, but predictable", unfortunately.
> In fact, I don't even think I know what you mean.

With "special" to stand for the idea that vcpu->requests can change
outside of the current execution thread.  Letting the programmer assume
additional guarantees makes the generated code and resulting behavior
more predictable.

>> READ_ONCE() is not necessary in any use I'm aware of, but there is no
>> harm in telling the compiler that vcpu->requests are what we think they
>> are ...
> 
> Hmmm, I'm equally lost.

vcpu->requests are volatile, so we need to assume that they can change
at any moment when using them.

I would prefer if vcpu->requests were of an atomic type and READ_ONCE()
is about as close as we can get without a major overhaul.

>> 
>>  /*
>>   * vcpu->requests are a lockless synchronization mechanism, where
> 
> is the requests a synchronization mechanism?  I think of it more as a
> cross-thread communication protocol.

Partly, synchronization is too restrictive and communication seems too
generic, but probably still better.  No idea how to briefly describe the
part of vcpu->requests that prevents VM entry and that setting a request
kicks VM out of guest mode.

x86 uses KVM_REQ_MCLOCK_INPROGRESS for synchronization between cores and
the use in this series looked very similar.

>>   * memory barriers are necessary for correct behavior, see
>>   * Documentation/virtual/kvm/vcpu-requests.rst.
>>   *
>>   * READ_ONCE() is not necessary for correctness, but simplifies
>>   * reasoning by constricting the generated code.
>>   */
>> 
>> I considered READ_ONCE() to be self-documenting. :)
> 
> I realize that I'm probably unusually slow in this whole area, but using
> READ_ONCE() where unnecessary doesn't help my reasoning, but makes me
> wonder which part of this I didn't understand, so I don't seem to agree
> with the statement that it simplifies reasoning.

No, I think it is a matter of approach.  When I see a READ_ONCE()
without a comment, I think that the programmer was aware that this
memory can change at any time and was defensive about it.

I consider this use to simplify future development:
We think now that READ_ONCE() is not needed, but vcpu->requests is still
volatile and future changes in code might make READ_ONCE() necessary.
Preemptively putting READ_ONCE() there saves us thinking or hard-to-find
bugs.

> Really, if there is no reason to use it, I don't think we should use it.

I am leaning towards READ_ONCE() as the default for implicitly volatile
memory, but commenting why we didn't have to use READ_ONCE() sounds good
too.

> To me, READ_ONCE() indicates that there's some flow in the code where
> it's essential that the compiler doesn't generate multiple loads, but
> that we only see a momentary single-read snapshot of the value, and this
> doesn't seem to be the case.

The compiler can also squash multiple reads together, which is more
dangerous in this case as we would not notice a new request.  Avoiding
READ_ONCE() requires a better knowledge of the compiler algorithms that
prove which variable can be optimized.

The difference is really minor and I agree that the comment is bad.
The only comment I'm happy with is nothing, though ... even "READ_ONCE()
is not necessary" is wrong as that might change without us noticing.


* Re: [PATCH v2 2/9] KVM: Add documentation for VCPU requests
  2017-04-05 18:29             ` Paolo Bonzini
@ 2017-04-05 20:46               ` Radim Krčmář
  2017-04-06 14:29                 ` Christoffer Dall
  2017-04-06 14:27               ` Christoffer Dall
  1 sibling, 1 reply; 85+ messages in thread
From: Radim Krčmář @ 2017-04-05 20:46 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: marc.zyngier, Christoffer Dall, kvmarm, kvm

2017-04-05 20:29+0200, Paolo Bonzini:
> On 05/04/2017 19:45, Christoffer Dall wrote:
>>>> But the problem is that kvm_make_all_cpus_request() only sends IPIs to
>>>> CPUs where the mode was different from OUTSIDE_GUEST_MODE, so there it's
>>>> about !OUTSIDE_GUEST_MODE rather than !IN_GUEST_MODE, so there's some
>>>> subtlety here which I feel like it's dangerous to paper over.
>>> Right, that needs fixing in the code.
>> Really?  I thought Paolo said that this is the intended behavior and
>> semantics; non-urgent requests that should just be serviced before the
>> next guest entry.
> 
> Indeed, that's right...
> 
>>> guest_mode is just an optimization that allows us to skip sending the
>>> IPI when the VCPU is known to handle the request as soon as possible.
>>>
>>>   IN_GUEST_MODE: we must force VM exit or the request could never be
>>>     handled
>>>   EXITING_GUEST_MODE: another request already forces the VM exit and
>>>     we're just waiting for the VCPU to notice our request
>>>   OUTSIDE_GUEST_MODE: KVM is going to notice our request without any
>>>     intervention
>>>   READING_SHADOW_PAGE_TABLES: same as OUTSIDE_GUEST_MODE -- rename to
>>>     unwieldy OUTSIDE_GUEST_MODE_READING_SHADOW_PAGE_TABLES?
>> Again, I thought Paolo was arguing that EXITING_GUEST_MODE makes the
>> whole thing work because you check that after checking requests?
> 
> ... but apparently I was wrong here, see my email from this morning.

Ok, I'll prepare a patch that uses kvm_arch_vcpu_should_kick() in
kvm_make_all_cpus_request().

>>> The kick is needed only in IN_GUEST_MODE and a wake up is needed in case
>>> where the guest is halted OUTSIDE_GUEST_MODE ...
>>> Hm, maybe we should add a halt state too?
>> Wouldn't that be swait_active(&vcpu->wq) ?   You could add a wrapper
>> though.
> 
> Yes.  I think the wrapper is probably unnecessary.

The state would only make sense if it were different -- there is a time
when the VCPU is halted, but swait_active(&vcpu->wq) returns false.
Probably doesn't have any good use now, though.

>> What I think you need is a way to distinguish the semantics of calling
>> kvm_make_all_cpus_request(), perhaps by adding a 'bool wake_up'
>> parameter.
> 
> That would be fine.

I think the wakeup is more tied to the request type than to the place
calling it, so we could say which requests are "lazy" and don't do
wakeup and have kvm_make_all_cpus_request() without an extra argument.

In the end, I would like if kvm_make_all_cpus_request() was an optimized
variant of

  kvm_for_each_vcpu(i, vcpu, kvm) {
    kvm_make_request(vcpu, request);
    kvm_vcpu_kick(vcpu);
  }


* Re: [PATCH v2 2/9] KVM: Add documentation for VCPU requests
  2017-03-31 16:06 ` [PATCH v2 2/9] KVM: Add documentation for VCPU requests Andrew Jones
  2017-04-04 15:24   ` Christoffer Dall
@ 2017-04-06 10:18   ` Christian Borntraeger
  2017-04-06 12:08     ` Andrew Jones
  2017-04-06 12:29     ` Radim Krčmář
  1 sibling, 2 replies; 85+ messages in thread
From: Christian Borntraeger @ 2017-04-06 10:18 UTC (permalink / raw)
  To: Andrew Jones, kvmarm, kvm; +Cc: marc.zyngier, cdall, pbonzini

On 03/31/2017 06:06 PM, Andrew Jones wrote:
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
>  Documentation/virtual/kvm/vcpu-requests.rst | 114 ++++++++++++++++++++++++++++
>  1 file changed, 114 insertions(+)
>  create mode 100644 Documentation/virtual/kvm/vcpu-requests.rst
> 
> diff --git a/Documentation/virtual/kvm/vcpu-requests.rst b/Documentation/virtual/kvm/vcpu-requests.rst
> new file mode 100644
> index 000000000000..ea4a966d5c8a
> --- /dev/null
> +++ b/Documentation/virtual/kvm/vcpu-requests.rst
> @@ -0,0 +1,114 @@
> +=================
> +KVM VCPU Requests
> +=================
> +
> +Overview
> +========
> +
> +KVM supports an internal API enabling threads to request a VCPU thread to
> +perform some activity.  For example, a thread may request a VCPU to flush
> +its TLB with a VCPU request.  The API consists of only four calls::
> +
> +  /* Check if VCPU @vcpu has request @req pending. Clears the request. */
> +  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);
> +
> +  /* Check if any requests are pending for VCPU @vcpu. */
> +  bool kvm_request_pending(struct kvm_vcpu *vcpu);
> +
> +  /* Make request @req of VCPU @vcpu. */
> +  void kvm_make_request(int req, struct kvm_vcpu *vcpu);
> +
> +  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
> +  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
> +
> +Typically a requester wants the VCPU to perform the activity as soon
> +as possible after making the request.  This means most requests,
> +kvm_make_request() calls, are followed by a call to kvm_vcpu_kick(),
> +and kvm_make_all_cpus_request() has the kicking of all VCPUs built
> +into it.
> +
> +VCPU Kicks
> +----------
> +
> +A VCPU kick does one of three things:
> +
> + 1) wakes a sleeping VCPU (which sleeps outside guest mode).
> + 2) sends an IPI to a VCPU currently in guest mode, in order to bring it
> +    out.
> + 3) nothing, when the VCPU is already outside guest mode and not sleeping.
> +
> +VCPU Request Internals
> +======================
> +
> +VCPU requests are simply bit indices of the vcpu->requests bitmap.  This
> +means general bitops[1], e.g. clear_bit(KVM_REQ_UNHALT, &vcpu->requests),
> +may also be used.  The first 8 bits are reserved for architecture
> +independent requests, all additional bits are available for architecture
> +dependent requests.
> +
> +VCPU Requests with Associated State
> +===================================
> +
> +Requesters that want the requested VCPU to handle new state need to ensure
> +the state is observable to the requested VCPU thread's CPU at the time the
> +CPU observes the request.  This means a write memory barrier should be
> +inserted between the preparation of the state and the write of the VCPU
> +request bitmap.  Additionally, on the requested VCPU thread's side, a
> +corresponding read barrier should be issued after reading the request bit
> +and before proceeding to use the state associated with it.  See the kernel
> +memory barrier documentation [2].
> +
> +VCPU Requests and Guest Mode
> +============================

FWIW, s390 does not implement guest mode.  Maybe add some words noting that
not all architectures implement it?  Or do we expect Radim's rework soon?


* Re: [PATCH v2 1/9] KVM: add kvm_request_pending
  2017-04-05 20:20           ` Radim Krčmář
@ 2017-04-06 12:02             ` Andrew Jones
  2017-04-06 14:37               ` Christoffer Dall
  2017-04-06 14:25             ` Christoffer Dall
  1 sibling, 1 reply; 85+ messages in thread
From: Andrew Jones @ 2017-04-06 12:02 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Christoffer Dall, kvmarm, kvm, marc.zyngier, pbonzini

On Wed, Apr 05, 2017 at 10:20:17PM +0200, Radim Krčmář wrote:
> 2017-04-05 19:39+0200, Christoffer Dall:
> > On Wed, Apr 05, 2017 at 03:10:50PM +0200, Radim Krčmář wrote:
> >> 2017-04-04 18:41+0200, Andrew Jones:
> >> > On Tue, Apr 04, 2017 at 05:30:14PM +0200, Christoffer Dall wrote:
> >> >> On Fri, Mar 31, 2017 at 06:06:50PM +0200, Andrew Jones wrote:
> >> >> > From: Radim Krčmář <rkrcmar@redhat.com>
> >> >> > 
> >> >> > A first step in vcpu->requests encapsulation.
> >> >> 
> >> >> Could we have a note here on why we need to access vcpu->requests using
> >> >> READ_ONCE now?
> >> > 
> >> > Sure, maybe we should put the note as a comment above the read in
> >> > kvm_request_pending().  Something like
> >> > 
> >> >  /*
> >> >   * vcpu->requests reads may appear in sequences that have strict
> >> >   * data or control dependencies.  Use READ_ONCE() to ensure the
> >> >   * compiler does not do anything that breaks the required ordering.
> >> >   */
> >> > 
> >> > Radim?
> >> 
> >> Uses of vcpu->requests should already have barriers that take care of
> >> the ordering.  I think the main reason for READ_ONCE() is to tell
> >> programmers that requests are special, but predictable.
> > 
> > I don't know what to do with "special, but predictable", unfortunately.
> > In fact, I don't even think I know what you mean.
> 
> With "special" to stand for the idea that vcpu->requests can change
> outside of the current execution thread.  Letting the programmer assume
> additional guarantees makes the generated code and resulting behavior
> more predictable.
> 
> >> READ_ONCE() is not necessary in any use I'm aware of, but there is no
> >> harm in telling the compiler that vcpu->requests are what we think they
> >> are ...
> > 
> > Hmmm, I'm equally lost.
> 
> vcpu->requests are volatile, so we need to assume that they can change
> at any moment when using them.
> 
> I would prefer if vcpu->requests were of an atomic type and READ_ONCE()
> is about as close as we can get without a major overhaul.
> 
> >> 
> >>  /*
> >>   * vcpu->requests are a lockless synchronization mechanism, where
> > 
> > is the requests a synchronization mechanism?  I think of it more as a
> > cross-thread communication protocol.
> 
> Partly, synchronization is too restrictive and communication seems too
> generic, but probably still better.  No idea how to briefly describe the
> part of vcpu->requests that prevents VM entry and that setting a request
> kicks VM out of guest mode.
> 
> x86 uses KVM_REQ_MCLOCK_INPROGRESS for synchronization between cores and
> the use in this series looked very similar.
> 
> >>   * memory barriers are necessary for correct behavior, see
> >>   * Documentation/virtual/kvm/vcpu-requests.rst.
> >>   *
> >>   * READ_ONCE() is not necessary for correctness, but simplifies
> >>   * reasoning by constricting the generated code.
> >>   */
> >> 
> >> I considered READ_ONCE() to be self-documenting. :)
> > 
> > I realize that I'm probably unusually slow in this whole area, but using
> > READ_ONCE() where unnecessary doesn't help my reasoning, but makes me
> > wonder which part of this I didn't understand, so I don't seem to agree
> > with the statement that it simplifies reasoning.
> 
> No, I think it is a matter of approach.  When I see a READ_ONCE()
> without a comment, I think that the programmer was aware that this
> memory can change at any time and was defensive about it.
> 
> I consider this use to simplify future development:
> We think now that READ_ONCE() is not needed, but vcpu->requests is still
> volatile and future changes in code might make READ_ONCE() necessary.
> Preemptively putting READ_ONCE() there saves us thinking or hard-to-find
> bugs.
> 
> > Really, if there is no reason to use it, I don't think we should use it.
> 
> I am leaning towards READ_ONCE() as the default for implicitly volatile
> memory, but commenting why we didn't have to use READ_ONCE() sounds good
> too.
> 
> > To me, READ_ONCE() indicates that there's some flow in the code where
> > it's essential that the compiler doesn't generate multiple loads, but
> > that we only see a momentary single-read snapshot of the value, and this
> > doesn't seem to be the case.
> 
> The compiler can also squash multiple reads together, which is more
> dangerous in this case as we would not notice a new request.  Avoiding
> READ_ONCE() requires a better knowledge of the compiler algorithms that
> prove which variable can be optimized.
> 
> The difference is really minor and I agree that the comment is bad.
> The only comment I'm happy with is nothing, though ... even "READ_ONCE()
> is not necessary" is wrong as that might change without us noticing.

FWIW, I first suggested using READ_ONCE() for the freshness argument,
but then started to believe there were even more reasons after reading
the comment above it in include/linux/compiler.h.  The last paragraph
of that comment says there are two major use cases for it.  I think the
first one maps to the freshness argument.  The second one is

 (2) Ensuring that the compiler does not  fold, spindle, or otherwise
     mutilate accesses that either do not require ordering or that
     interact with an explicit memory barrier or atomic instruction that
     provides the required ordering.

Documentation/memory-barriers.txt seemed to agree with that, as it's
full of READ/WRITE_ONCE's, and statements saying they are not optional.
However, reading it closer (how many times have I tried to read it
closer?), I can't see any pattern where they're required that we need
to be concerned about.  I think the future-proofing / freshness argument
still stands though, and I also like that it flags the variable as
"special".

I think I actually prefer no comment now too, but the commit message
should get a sentence or two explaining why it got thrown in.

Thanks,
drew


* Re: [PATCH v2 2/9] KVM: Add documentation for VCPU requests
  2017-04-06 10:18   ` Christian Borntraeger
@ 2017-04-06 12:08     ` Andrew Jones
  2017-04-06 12:29     ` Radim Krčmář
  1 sibling, 0 replies; 85+ messages in thread
From: Andrew Jones @ 2017-04-06 12:08 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: kvmarm, kvm, cdall, marc.zyngier, pbonzini, rkrcmar

On Thu, Apr 06, 2017 at 12:18:02PM +0200, Christian Borntraeger wrote:
> On 03/31/2017 06:06 PM, Andrew Jones wrote:
> > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > ---
> >  Documentation/virtual/kvm/vcpu-requests.rst | 114 ++++++++++++++++++++++++++++
> >  1 file changed, 114 insertions(+)
> >  create mode 100644 Documentation/virtual/kvm/vcpu-requests.rst
> > 
> > diff --git a/Documentation/virtual/kvm/vcpu-requests.rst b/Documentation/virtual/kvm/vcpu-requests.rst
> > new file mode 100644
> > index 000000000000..ea4a966d5c8a
> > --- /dev/null
> > +++ b/Documentation/virtual/kvm/vcpu-requests.rst
> > @@ -0,0 +1,114 @@
> > +=================
> > +KVM VCPU Requests
> > +=================
> > +
> > +Overview
> > +========
> > +
> > +KVM supports an internal API enabling threads to request a VCPU thread to
> > +perform some activity.  For example, a thread may request a VCPU to flush
> > +its TLB with a VCPU request.  The API consists of only four calls::
> > +
> > +  /* Check if VCPU @vcpu has request @req pending. Clears the request. */
> > +  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);
> > +
> > +  /* Check if any requests are pending for VCPU @vcpu. */
> > +  bool kvm_request_pending(struct kvm_vcpu *vcpu);
> > +
> > +  /* Make request @req of VCPU @vcpu. */
> > +  void kvm_make_request(int req, struct kvm_vcpu *vcpu);
> > +
> > +  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
> > +  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
> > +
> > +Typically a requester wants the VCPU to perform the activity as soon
> > +as possible after making the request.  This means most requests,
> > +kvm_make_request() calls, are followed by a call to kvm_vcpu_kick(),
> > +and kvm_make_all_cpus_request() has the kicking of all VCPUs built
> > +into it.
> > +
> > +VCPU Kicks
> > +----------
> > +
> > +A VCPU kick does one of three things:
> > +
> > + 1) wakes a sleeping VCPU (which sleeps outside guest mode).
> > + 2) sends an IPI to a VCPU currently in guest mode, in order to bring it
> > +    out.
> > + 3) nothing, when the VCPU is already outside guest mode and not sleeping.
> > +
> > +VCPU Request Internals
> > +======================
> > +
> > +VCPU requests are simply bit indices of the vcpu->requests bitmap.  This
> > +means general bitops[1], e.g. clear_bit(KVM_REQ_UNHALT, &vcpu->requests),
> > +may also be used.  The first 8 bits are reserved for architecture
> > +independent requests, all additional bits are available for architecture
> > +dependent requests.
> > +
> > +VCPU Requests with Associated State
> > +===================================
> > +
> > +Requesters that want the requested VCPU to handle new state need to ensure
> > +the state is observable to the requested VCPU thread's CPU at the time the
> > +CPU observes the request.  This means a write memory barrier should be
> > +inserted between the preparation of the state and the write of the VCPU
> > +request bitmap.  Additionally, on the requested VCPU thread's side, a
> > +corresponding read barrier should be issued after reading the request bit
> > +and before proceeding to use the state associated with it.  See the kernel
> > +memory barrier documentation [2].
> > +
> > +VCPU Requests and Guest Mode
> > +============================
> 
> FWIW, s390 does not implement guest mode.  Maybe add some words noting that
> not all architectures implement it?  Or do we expect Radim's rework soon?
> 
>

OK, I'll try to word this in such a way to point out that this is an
arch-specific thing.

Thanks,
drew 


* Re: [PATCH v2 2/9] KVM: Add documentation for VCPU requests
  2017-04-06 10:18   ` Christian Borntraeger
  2017-04-06 12:08     ` Andrew Jones
@ 2017-04-06 12:29     ` Radim Krčmář
  1 sibling, 0 replies; 85+ messages in thread
From: Radim Krčmář @ 2017-04-06 12:29 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Andrew Jones, kvmarm, kvm, cdall, marc.zyngier, pbonzini

2017-04-06 12:18+0200, Christian Borntraeger:
> On 03/31/2017 06:06 PM, Andrew Jones wrote:
>> Signed-off-by: Andrew Jones <drjones@redhat.com>
>> +VCPU Requests and Guest Mode
>> +============================
> 
> FWIW, s390 does not implement guest mode.  Maybe add some words noting that
> not all architectures implement it?  Or do we expect Radim's rework soon?

Yes, mentioning that it is an optional optimization sounds good.
I won't be adding guest->mode to s390 (at least in the beginning).
This means that kvm_arch_vcpu_should_kick() should return true.

In kvm_make_all_cpus_request(), there is

  kvm_vcpu_exiting_guest_mode(vcpu) != OUTSIDE_GUEST_MODE

that ought to be replaced with kvm_arch_vcpu_should_kick(), and it
should return true if the arch doesn't implement guest->mode.
But on s390, the condition currently always returns false ... is that
IPI-less behavior of kvm_make_all_cpus_request intended on s390?

Thanks.


* Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  2017-04-05 11:37             ` Paolo Bonzini
@ 2017-04-06 14:14               ` Christoffer Dall
  2017-04-07 11:47                 ` Paolo Bonzini
  0 siblings, 1 reply; 85+ messages in thread
From: Christoffer Dall @ 2017-04-06 14:14 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: marc.zyngier, kvmarm, kvm

On Wed, Apr 05, 2017 at 01:37:10PM +0200, Paolo Bonzini wrote:
> 
> 
> On 05/04/2017 09:09, Christoffer Dall wrote:
> >>>  - In the explanation you wrote, you use the term 'we' a lot, but when
> >>>    talking about SMP barriers, I think it only makes sense to talk about
> >>>    actions and observations between multiple CPUs and we have to be
> >>>    specific about which CPU observes or does what with respect to the
> >>>    other.  Maybe I'm being a stickler here, but there something here
> >>>    which is making me uneasy.
> >> The write1-mb-if(read2) / write2-mb-if(read1) pattern is pretty common,
> >> so I think it is justified to cut the ordering on the reasoning and just
> >> focus on what the two memory locations and conditions mean.
> > ok, but the pattern above was not common to me (and I'm pretty sure I'm
> > not the only fool in the bunch here), so if we can reference something
> > that explains that this is a known pattern which has been tried and
> > proven, that would be even better.
> 
> I found https://lwn.net/Articles/573436/ which shows this example:
> 
>   CPU 0					CPU 1
>   ---------------------			----------------------
>   WRITE_ONCE(x, 1);			WRITE_ONCE(y, 1);
>   smp_mb();				smp_mb();
>   r2 = READ_ONCE(y);			r4 = READ_ONCE(x);
> 
> And says that it is a bug if r2 == 0 && r4 == 0.  This is exactly what 
> happens in KVM:
> 
>   CPU 0					CPU 1
>   ---------------------			----------------------
>   vcpu->mode = IN_GUEST_MODE;		kvm_make_request(REQ, vcpu);
>   smp_mb();				smp_mb();
>   r2 = kvm_request_pending(vcpu)	r4 = (vcpu->mode == IN_GUEST_MODE)
>   if (r2)				if (r4)
> 	abort entry				kick();
> 
> If r2 sees no request and r4 doesn't kick there would be a bug.
> But why can't this happen?
> 
> - if no request is pending at the time of the read to r2, CPU 1 must
> not have executed kvm_make_request yet.  In CPU 0, kvm_request_pending
> must happen after vcpu->mode is set to IN_GUEST_MODE, therefore CPU 1
> will read IN_GUEST_MODE and kick.
> 
> - if no kick happens in CPU 1, CPU 0 must not have set vcpu->mode yet.
> In CPU 1, vcpu->mode is read after setting the request bit, therefore
> CPU 0 will see the request bit and abort the guest entry.
> 
> >>>  - Finally, it feels very hard to prove the correctness of this, and
> >>>    equally hard to test it (given how long we've been running with
> >>>    apparently racy code).  I would hope that we could abstract some of
> >>>    this into architecture generic things, that someone who eats memory
> >>>    barriers for breakfast could help us verify, but again, maybe this is
> >>>    Radim's series I'm asking for here.
> >>
> >> What I can do here is to suggest copying the paradigms from x86, which
> >> is quite battle tested (Windows hammers it really hard).
> >
> > That sounds reasonable, but I think part of the problem was that we
> > simply didn't understand what the paradigms were (see the
> > kvm_make_all_cpus_request above as an example), so Drew's action item of
> > documenting what this all is, and the constraints on using it, is really
> > important to me.
> 
> Yes, totally agreed on that.
> 
> >> For QEMU I did use model checking in the past for some similarly hairy
> >> synchronization code, but that is really just "executable documentation"
> >> because the model is not written in C.
> >>
> > I played with using blast on some of the KVM/ARM code a long time ago,
> > and while I was able to find a bug with it, it was sort of an obvious
> > bug, and the things I was able to do with it was pretty limited to the
> > problems I could imagine myself anyhow.  Perhaps this is what you mean
> > with executable documentation.
> 
> I prepared three examples of a spin model for KVM vCPU kicking, and
> the outcome was actually pretty surprising: the mode check seems not
> to be necessary.
> 
> I haven't covered all x86 cases so I'm not going to remove it right
> ahead, but for ARM it really seems like EXITING_GUEST_MODE is nothing
> but an optimization of consecutive kvm_vcpu_kicks.
> 
> All three models can use C preprocessor #defines to inject bugs:
> 
> - kvm-arm-pause.promela: the "paused" mechanism; the model proves that
>   the "paused" test in the interrupt-disabled region is necessary
> 
> - kvm-req.promela: the requests mechanism; the model proves that
>   the requests check in the interrupt-disabled region is necessary
> 
> - kvm-x86-pi.promela: the x86 posted interrupt mechanism (simplified
>   a bit); the model proves that KVM must disable interrupts before
>   checking for interrupts injected while outside guest mode
>   (commit b95234c84004, "kvm: x86: do not use KVM_REQ_EVENT for APICv
>   interrupt injection", 2017-02-15)
> 
> So it seems like there are no races after all in KVM/ARM code

No races once Drew's fix has been applied, i.e. setting vcpu->mode =
IN_GUEST_MODE before checking the pause flag, correct?  (I think that's
what the spin model below is modeling).

(Currently, we have a window between checking the pause flag for the
last time and setting mode = IN_GUEST_MODE, where we would lose IPIs and
not check any variables.)


> , though
> the code can still be cleaned up.  And I have been convinced of the wrong
> thing all the time. :)
> 
> But why is KVM/ARM using KVM_REQ_VCPU_EXIT
> just fine without checking for requests (kvm-req.promela)?  Because,
> as mentioned earlier in the thread, KVM/ARM is using kvm_make_all_vcpus_request
> simply to kick all VCPUs.  The paused variable _is_ checked after disabling
> interrupts, and that is fine.
> 
> After this experiment, I think I like Drew's KVM_REQ_PAUSE more than I did
> yesterday.  However, yet another alternative is to leave pause/power_off as
> they are, while taking some inspiration from his patch to do some cleanups:
> 
> 1) change the "if"
> 
>                 if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
>                         vcpu->arch.power_off || vcpu->arch.pause) {
> 
> to test kvm_requests_pending instead of pause/power_off
> 
> 2) clear KVM_REQ_VCPU_EXIT before the other "if":
> 
>                 if (vcpu->arch.power_off || vcpu->arch.pause)
>                         vcpu_sleep(vcpu);
> 

I like using requests purely as requests from one thread to the VCPU
thread, not as a way to maintain specific state about a VCPU.

The benefit of Drew's approach is that since these pieces of state are
boolean, you can have just a single check in the critical path in the
run loop instead of having to access multiple fields.

I think I'll let Drew decide at this point what he prefers.

> 
> In any case, the no-wakeup behavior of kvm_make_all_vcpus_request suits
> either use of requests (KVM_REQ_PAUSE or "fixed" KVM_REQ_VCPU_EXIT).
> 

Agreed.

Thanks,
-Christoffer

> /* To run the model checker:
>  *
>  *      spin -a kvm-arm-pause.promela
>  *      gcc -O2 pan.c
>  *      ./a.out -a -f
>  *
>  * Remove the tests using -DREMOVE_MODE_TEST, -DREMOVE_PAUSED_TEST
>  * right after -a.  The mode test is not necessary, the paused test is.
>  */
> #define OUTSIDE_GUEST_MODE      0
> #define IN_GUEST_MODE           1
> #define EXITING_GUEST_MODE      2
> 
> bool kick;
> bool paused;
> int vcpu_mode = OUTSIDE_GUEST_MODE;
> 
> active proctype vcpu_run()
> {
>     do
>         :: true -> {
>             /* In paused state, sleep with interrupts on */
>             if
>                 :: !paused -> skip;
>             fi;
> 
>             /* IPIs are eaten until interrupts are turned off.  */
>             kick = 0;
> 
>             /* Interrupts are now off. */
>             vcpu_mode = IN_GUEST_MODE;
> 
>             if
> #ifndef REMOVE_MODE_TEST
>                 :: vcpu_mode != IN_GUEST_MODE -> skip;
> #endif
> #ifndef REMOVE_PAUSED_TEST
>                 :: paused -> skip;
> #endif
>                 :: else -> {
>                     do
>                         /* Stay in guest mode until an IPI comes */
>                         :: kick -> break;
>                     od;
>                 }
>             fi;
>             vcpu_mode = OUTSIDE_GUEST_MODE;
> 
>             /* Turn on interrupts */
>         }
>     od
> }
> 
> active proctype vcpu_kick()
> {
>     int old;
> 
>     do
>         :: true -> {
>             paused = 1;
>             /* cmpxchg */
>             atomic {
>                 old = vcpu_mode;
>                 if
>                     :: vcpu_mode == IN_GUEST_MODE ->
>                         vcpu_mode = EXITING_GUEST_MODE;
>                     :: else -> skip;
>                 fi;
>             }
> 
>             if
>                 :: old == IN_GUEST_MODE -> kick = 1;
>                 :: else -> skip;
>             fi;
> 
>             if
>                :: vcpu_mode == OUTSIDE_GUEST_MODE -> paused = 0;
>             fi; 
>         }
>     od;
> }
> 
> never {
>     do
>        /* After an arbitrarily long prefix */
>        :: 1 -> skip;
> 
>        /* if we get a pause request */
>        :: paused -> break;
>     od; 
> 
> accept:
>     /* we must eventually leave guest mode (this condition is reversed!) */
>     do
>        :: vcpu_mode != OUTSIDE_GUEST_MODE
>     od; 
> }

> /* To run the model checker:
>  *
>  *      spin -a kvm-req.promela
>  *      gcc -O2 pan.c
>  *      ./a.out -a -f
>  *
>  * Remove the tests using -DREMOVE_MODE_TEST, -DREMOVE_REQ_TEST
>  * right after -a.  The mode test is not necessary, the vcpu_req test is.
>  */
> #define OUTSIDE_GUEST_MODE      0
> #define IN_GUEST_MODE           1
> #define EXITING_GUEST_MODE      2
> 
> bool kick;
> int vcpu_req;
> int vcpu_mode = OUTSIDE_GUEST_MODE;
> 
> active proctype vcpu_run()
> {
>     do
>         :: true -> {
>             /* Requests are processed with interrupts on */
>             vcpu_req = 0;
> 
>             /* IPIs are eaten until interrupts are turned off.  */
>             kick = 0;
> 
>             /* Interrupts are now off. */
>             vcpu_mode = IN_GUEST_MODE;
> 
>             if
> #ifndef REMOVE_MODE_TEST
>                 :: vcpu_mode != IN_GUEST_MODE -> skip;
> #endif
> #ifndef REMOVE_REQ_TEST
>                 :: vcpu_req -> skip;
> #endif
>                 :: else -> {
>                     do
>                         /* Stay in guest mode until an IPI comes */
>                         :: kick -> break;
>                     od;
>                 }
>             fi;
>             vcpu_mode = OUTSIDE_GUEST_MODE;
> 
>             /* Turn on interrupts */
>         }
>     od
> }
> 
> active proctype vcpu_kick()
> {
>     int old;
> 
>     do
>         :: true -> {
>             vcpu_req = 1;
>             if
>                 :: old == 0 -> {
>                     /* cmpxchg */
>                     atomic {
>                         old = vcpu_mode;
>                         if
>                             :: vcpu_mode == IN_GUEST_MODE ->
>                                 vcpu_mode = EXITING_GUEST_MODE;
>                             :: else -> skip;
>                         fi;
>                     }
> 
>                     if
>                         :: old == IN_GUEST_MODE -> kick = 1;
>                         :: else -> skip;
>                     fi;
>                 }
>                 :: else -> skip;
>             fi;
>         }
>     od;
> }
> 
> never {
>     do
>        /* After an arbitrarily long prefix */
>        :: 1 -> skip;
> 
>        /* we get in guest mode */
>        :: vcpu_mode == IN_GUEST_MODE -> break;
>     od; 
> 
> accept:
>     /* and never leave it (this condition is reversed!) */
>     do
>        :: vcpu_mode != OUTSIDE_GUEST_MODE
>     od; 
> }

> /* To run the model checker:
>  *
>  *      spin -a kvm-x86-pi.promela
>  *      gcc -O2 pan.c
>  *      ./a.out -a -f
>  *
>  * Remove the test using -DREMOVE_MODE_TEST, move the PIR->IRR sync
>  * before local_irq_disable() with SYNC_WITH_INTERRUPTS_ENABLED.  The
>  * mode test is not necessary, while sync_pir_to_irr must be placed
>  * after interrupts are disabled.
>  */
> #define OUTSIDE_GUEST_MODE      0
> #define IN_GUEST_MODE           1
> #define EXITING_GUEST_MODE      2
> 
> bool kick;
> bool posted_interrupt;
> int vcpu_pir;
> int vcpu_mode = OUTSIDE_GUEST_MODE;
> 
> active proctype vcpu_run()
> {
>     do
>         :: true -> {
> #ifdef SYNC_WITH_INTERRUPTS_ENABLED
>             /* Guest interrupts are injected with interrupts off */
>             vcpu_pir = 0;
> #endif
> 
>             /* Both kinds of IPI are eaten until interrupts are turned off.  */
>             atomic {
>                 kick = 0;
>                 posted_interrupt = 0;
>             }
> 
>             /* Interrupts are now off. */
>             vcpu_mode = IN_GUEST_MODE;
> 
> #ifndef SYNC_WITH_INTERRUPTS_ENABLED
>             /* Guest interrupts are injected with interrupts off */
>             vcpu_pir = 0;
> #endif
> 
>             if
> #ifndef REMOVE_MODE_TEST
>                 :: vcpu_mode != IN_GUEST_MODE -> skip;
> #endif
> 
>                 :: else -> {
>                     do
>                         /* Stay in guest mode until an IPI comes */
>                         :: kick -> break;
> 
>                         /* The processor handles the posted interrupt IPI */
>                         :: posted_interrupt -> vcpu_pir = 0;
>                     od;
>                 }
>             fi;
>             vcpu_mode = OUTSIDE_GUEST_MODE;
> 
>             /* Turn on interrupts */
>         }
>     od
> }
> 
> active proctype vcpu_posted_interrupt()
> {
>     int old;
> 
>     do
>         :: vcpu_pir == 0 -> {
>             vcpu_pir = 1;
>             if
>                 :: vcpu_mode == IN_GUEST_MODE ->
>                     /* If in guest mode, we can send a posted interrupt IPI */
>                     posted_interrupt = 1;
> 
>                 :: else -> {
>                     /* Else, do a kvm_vcpu_kick.  */
>                     atomic {
>                         old = vcpu_mode;
>                         if
>                             :: vcpu_mode == IN_GUEST_MODE ->
>                                 vcpu_mode = EXITING_GUEST_MODE;
>                             :: else -> skip;
>                         fi;
>                     }
> 
>                     if
>                         :: old == IN_GUEST_MODE -> kick = 1;
>                         :: else -> skip;
>                     fi;
>                 }
>             fi;
>         }
>     od;
> }
> 
> never {
>     do
>        /* After an arbitrarily long prefix */
>        :: 1 -> skip;
> 
>        /* if we get an interrupt */
>        :: vcpu_pir -> break;
>     od; 
> 
> accept:
>     /* we must eventually inject it (this condition is reversed!) */
>     do
>        :: vcpu_pir
>     od; 
> }

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 1/9] KVM: add kvm_request_pending
  2017-04-05 20:20           ` Radim Krčmář
  2017-04-06 12:02             ` Andrew Jones
@ 2017-04-06 14:25             ` Christoffer Dall
  2017-04-07 13:15               ` Radim Krčmář
  1 sibling, 1 reply; 85+ messages in thread
From: Christoffer Dall @ 2017-04-06 14:25 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Andrew Jones, kvmarm, kvm, marc.zyngier, pbonzini

On Wed, Apr 05, 2017 at 10:20:17PM +0200, Radim Krčmář wrote:
> 2017-04-05 19:39+0200, Christoffer Dall:
> > On Wed, Apr 05, 2017 at 03:10:50PM +0200, Radim Krčmář wrote:
> >> 2017-04-04 18:41+0200, Andrew Jones:
> >> > On Tue, Apr 04, 2017 at 05:30:14PM +0200, Christoffer Dall wrote:
> >> >> On Fri, Mar 31, 2017 at 06:06:50PM +0200, Andrew Jones wrote:
> >> >> > From: Radim Krčmář <rkrcmar@redhat.com>
> >> >> > 
> >> >> > A first step in vcpu->requests encapsulation.
> >> >> 
> >> >> Could we have a note here on why we need to access vcpu->requests using
> >> >> READ_ONCE now?
> >> > 
> >> > Sure, maybe we should put the note as a comment above the read in
> >> > kvm_request_pending().  Something like
> >> > 
> >> >  /*
> >> >   * vcpu->requests reads may appear in sequences that have strict
> >> >   * data or control dependencies.  Use READ_ONCE() to ensure the
> >> >   * compiler does not do anything that breaks the required ordering.
> >> >   */
> >> > 
> >> > Radim?
> >> 
> >> Uses of vcpu->requests should already have barriers that take care of
> >> the ordering.  I think the main reason for READ_ONCE() is to tell
> >> programmers that requests are special, but predictable.
> > 
> > I don't know what to do with "special, but predictable", unfortunately.
> > In fact, I don't even think I know what you mean.
> 
> With "special" to stand for the idea that vcpu->requests can change
> outside of the current execution thread.  Letting the programmer assume
> additional guarantees makes the generated code and resulting behavior
> more predictable.
> 
> >> READ_ONCE() is not necessary in any use I'm aware of, but there is no
> >> harm in telling the compiler that vcpu->requests are what we think they
> >> are ...
> > 
> > Hmmm, I'm equally lost.
> 
> vcpu->requests are volatile, so we need to assume that they can change
> at any moment when using them.
> 
> I would prefer if vcpu->requests were of an atomic type and READ_ONCE()
> is about as close as we can get without a major overhaul.
> 

I finally see your point of conveying how things work using READ_ONCE().

If there's really no harm in letting the compiler read this as it wishes
(within the boundaries already placed by our use of compiler and memory
barriers), then I think we should just document that instead of relying
on how people would interpret READ_ONCE, but it's up to you - I think
I'm beginning to understand regardless.

> >> 
> >>  /*
> >>   * vcpu->requests are a lockless synchronization mechanism, where
> > 
> > is the requests a synchronization mechanism?  I think of it more as a
> > cross-thread communication protocol.
> 
> Partly, synchronization is too restrictive and communication seems too
> generic, but probably still better.  No idea how to shortly describe the
> part of vcpu->requests that prevents VM entry and that setting a request
> kicks VM out of guest mode.

heh, neither do I.

> 
> x86 uses KVM_REQ_MCLOCK_INPROGRESS for synchronization between cores and
> the use in this series looked very similar.
> 
> >>   * memory barriers are necessary for correct behavior, see
> >>   * Documentation/virtual/kvm/vcpu-requests.rst.
> >>   *
> >>   * READ_ONCE() is not necessary for correctness, but simplifies
> >>   * reasoning by constricting the generated code.
> >>   */
> >> 
> >> I considered READ_ONCE() to be self-documenting. :)
> > 
> > I realize that I'm probably unusually slow in this whole area, but using
> > READ_ONCE() where unnecessary doesn't help my reasoning, but makes me
> > wonder which part of this I didn't understand, so I don't seem to agree
> > with the statement that it simplifies reasoning.
> 
> No, I think it is a matter of approach.  When I see a READ_ONCE()
> without a comment, I think that the programmer was aware that this
> memory can change at any time and was defensive about it.

I think it means that you have to read it exactly once at the exact flow
in the code where it's placed.

> 
> I consider this use to simplify future development:
> We think now that READ_ONCE() is not needed, but vcpu->requests is still
> volatile and future changes in code might make READ_ONCE() necessary.
> Preemptively putting READ_ONCE() there saves us thinking or hard-to-find
> bugs.
> 

I'm always a bit sceptical about such reasoning as I think without a
complete understanding of what needs to change when doing changes, we're
likely to get it wrong anyway.

> > Really, if there is no reason to use it, I don't think we should use it.
> 
> I am leaning towards READ_ONCE() as the default for implicitly volatile
> memory, but commenting why we didn't have to use READ_ONCE() sounds good
> too.
> 

Sure, I can live with both solutions :)

> > To me, READ_ONCE() indicates that there's some flow in the code where
> > it's essential that the compiler doesn't generate multiple loads, but
> > that we only see a momentary single-read snapshot of the value, and this
> > doesn't seem to be the case.
> 
> The compiler can also squash multiple reads together, which is more
> dangerous in this case as we would not notice a new request.  Avoiding
> READ_ONCE() requires a better knowledge of the compiler algorithms that
> prove which variable can be optimized.

Isn't that covered by the memory barriers that imply compiler barriers
that we (will) have between checking the mode and the requests variable?

> 
> The difference is really minor and I agree that the comment is bad.
> The only comment I'm happy with is nothing, though ... even "READ_ONCE()
> is not necessary" is wrong as that might change without us noticing.

"READ_ONCE() is not necessary" while actually using READ_ONCE() is a
terrible comment because it makes readers just doubt the correctness of
the code.

Regardless of whether or not we end up using READ_ONCE(), I think we
should document exactly what the requirements are for accessing this
variable at this time, i.e. any assumption about preceding barriers or
other flows of events that we rely on.

Thanks,
-Christoffer


* Re: [PATCH v2 2/9] KVM: Add documentation for VCPU requests
  2017-04-05 18:29             ` Paolo Bonzini
  2017-04-05 20:46               ` Radim Krčmář
@ 2017-04-06 14:27               ` Christoffer Dall
  1 sibling, 0 replies; 85+ messages in thread
From: Christoffer Dall @ 2017-04-06 14:27 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Radim Krčmář, Andrew Jones, kvmarm, kvm, marc.zyngier

On Wed, Apr 05, 2017 at 08:29:12PM +0200, Paolo Bonzini wrote:
> 
> 
> On 05/04/2017 19:45, Christoffer Dall wrote:
> >>> But the problem is that kvm_make_all_cpus_request() only sends IPIs to
> >>> CPUs where the mode was different from OUTSIDE_GUEST_MODE, so there it's
> >>> about !OUTSIDE_GUEST_MODE rather than !IN_GUEST_MODE, so there's some
> >>> subtlety here which I feel like it's dangerous to paper over.
> >> Right, that needs fixing in the code.
> > Really?  I thought Paolo said that this is the intended behavior and
> > semantics; non-urgent requests that should just be serviced before the
> > next guest entry.
> 
> Indeed, that's right...
> 
> >> guest_mode is just an optimization that allows us to skip sending the
> >> IPI when the VCPU is known to handle the request as soon as possible.
> >>
> >>   IN_GUEST_MODE: we must force VM exit or the request could never be
> >>     handled
> >>   EXITING_GUEST_MODE: another request already forces the VM exit and
> >>     we're just waiting for the VCPU to notice our request
> >>   OUTSIDE_GUEST_MODE: KVM is going to notice our request without any
> >>     intervention
> >>   READING_SHADOW_PAGE_TABLES: same as OUTSIDE_GUEST_MODE -- rename to
> >>     unwieldly OUTSIDE_GUEST_MODE_READING_SHADOW_PAGE_TABLES?
> > Again, I thought Paolo was arguing that EXITING_GUEST_MODE makes the
> > whole thing work because you check that after checking requests?
> 
> ... but apparently I was wrong here, see my email from this morning.
> 
I now managed to understand that e-mail, so it feels like we're
converging in our understanding, which I hope is a good thing - meaning
that we're converging to a correct understanding :)

-Christoffer


* Re: [PATCH v2 2/9] KVM: Add documentation for VCPU requests
  2017-04-05 20:46               ` Radim Krčmář
@ 2017-04-06 14:29                 ` Christoffer Dall
  2017-04-07 11:44                   ` Paolo Bonzini
  0 siblings, 1 reply; 85+ messages in thread
From: Christoffer Dall @ 2017-04-06 14:29 UTC (permalink / raw)
  To: Radim Krčmář; +Cc: marc.zyngier, Paolo Bonzini, kvmarm, kvm

On Wed, Apr 05, 2017 at 10:46:07PM +0200, Radim Krčmář wrote:
> 2017-04-05 20:29+0200, Paolo Bonzini:
> > On 05/04/2017 19:45, Christoffer Dall wrote:
> >>>> But the problem is that kvm_make_all_cpus_request() only sends IPIs to
> >>>> CPUs where the mode was different from OUTSIDE_GUEST_MODE, so there it's
> >>>> about !OUTSIDE_GUEST_MODE rather than !IN_GUEST_MODE, so there's some
> >>>> subtlety here which I feel like it's dangerous to paper over.
> >>> Right, that needs fixing in the code.
> >> Really?  I thought Paolo said that this is the intended behavior and
> >> semantics; non-urgent requests that should just be serviced before the
> >> next guest entry.
> > 
> > Indeed, that's right...
> > 
> >>> guest_mode is just an optimization that allows us to skip sending the
> >>> IPI when the VCPU is known to handle the request as soon as possible.
> >>>
> >>>   IN_GUEST_MODE: we must force VM exit or the request could never be
> >>>     handled
> >>>   EXITING_GUEST_MODE: another request already forces the VM exit and
> >>>     we're just waiting for the VCPU to notice our request
> >>>   OUTSIDE_GUEST_MODE: KVM is going to notice our request without any
> >>>     intervention
> >>>   READING_SHADOW_PAGE_TABLES: same as OUTSIDE_GUEST_MODE -- rename to
> >>>     unwieldly OUTSIDE_GUEST_MODE_READING_SHADOW_PAGE_TABLES?
> >> Again, I thought Paolo was arguing that EXITING_GUEST_MODE makes the
> >> whole thing work because you check that after checking requests?
> > 
> > ... but apparently I was wrong here, see my email from this morning.
> 
> Ok, I'll prepare a patch that uses kvm_arch_vcpu_should_kick() in
> kvm_make_all_cpus_request().
> 
> >>> The kick is needed only in IN_GUEST_MODE and a wake up is needed in case
> >>> where the guest is halted OUTSIDE_GUEST_MODE ...
> >>> Hm, maybe we should add a halt state too?
> >> Wouldn't that be swait_active(&vcpu->wq) ?   You could add a wrapper
> >> though.
> > 
> > Yes.  I think the wrapper is probably unnecessary.
> 
> The state would only make sense if it were different -- there is a time
> when the VCPU is halted, but swait_active(&vcpu->wq) returns false.
> Probably doesn't have any good use now, though.
> 
> >> What I think you need is a way to distinguish the semantics of calling
> >> kvm_make_all_cpus_request(), perhaps by adding a 'bool wake_up'
> >> parameter.
> > 
> > That would be fine.
> 
> I think the wakeup is more tied to the request type than to the place
> calling it, 

agreed

> so we could say which requests are "lazy" and don't do
> wakeup and have kvm_make_all_cpus_request() without an extra argument.

Sounds good to me.

We could encode the lazy/non-lazy thing in a bit in the request number
that gets masked off before clearing/checking the request bit, but
perhaps there are nicer solutions.

> 
> In the end, I would like if kvm_make_all_cpus_request() was an optimized
> variant of
> 
>   kvm_for_each_vcpu(i, vcpu, kvm) {
>     kvm_make_request(vcpu, request);
>     kvm_vcpu_kick(vcpu);
>   }

Yes, I would like this too.

-Christoffer
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


* Re: [PATCH v2 1/9] KVM: add kvm_request_pending
  2017-04-06 12:02             ` Andrew Jones
@ 2017-04-06 14:37               ` Christoffer Dall
  2017-04-06 15:08                 ` Andrew Jones
  0 siblings, 1 reply; 85+ messages in thread
From: Christoffer Dall @ 2017-04-06 14:37 UTC (permalink / raw)
  To: Andrew Jones; +Cc: marc.zyngier, pbonzini, kvmarm, kvm

Hi Drew,

On Thu, Apr 06, 2017 at 02:02:12PM +0200, Andrew Jones wrote:
> On Wed, Apr 05, 2017 at 10:20:17PM +0200, Radim Krčmář wrote:
> > 2017-04-05 19:39+0200, Christoffer Dall:
> > > On Wed, Apr 05, 2017 at 03:10:50PM +0200, Radim Krčmář wrote:
> > >> 2017-04-04 18:41+0200, Andrew Jones:
> > >> > On Tue, Apr 04, 2017 at 05:30:14PM +0200, Christoffer Dall wrote:
> > >> >> On Fri, Mar 31, 2017 at 06:06:50PM +0200, Andrew Jones wrote:
> > >> >> > From: Radim Krčmář <rkrcmar@redhat.com>
> > >> >> > 
> > >> >> > A first step in vcpu->requests encapsulation.
> > >> >> 
> > >> >> Could we have a note here on why we need to access vcpu->requests using
> > >> >> READ_ONCE now?
> > >> > 
> > >> > Sure, maybe we should put the note as a comment above the read in
> > >> > kvm_request_pending().  Something like
> > >> > 
> > >> >  /*
> > >> >   * vcpu->requests reads may appear in sequences that have strict
> > >> >   * data or control dependencies.  Use READ_ONCE() to ensure the
> > >> >   * compiler does not do anything that breaks the required ordering.
> > >> >   */
> > >> > 
> > >> > Radim?
> > >> 
> > >> Uses of vcpu->requests should already have barriers that take care of
> > >> the ordering.  I think the main reason for READ_ONCE() is to tell
> > >> programmers that requests are special, but predictable.
> > > 
> > > I don't know what to do with "special, but predictable", unfortunately.
> > > In fact, I don't even think I know what you mean.
> > 
> > With "special" to stand for the idea that vcpu->requests can change
> > outside of the current execution thread.  Letting the programmer assume
> > additional guarantees makes the generated code and resulting behavior
> > more predictable.
> > 
> > >> READ_ONCE() is not necessary in any use I'm aware of, but there is no
> > >> harm in telling the compiler that vcpu->requests are what we think they
> > >> are ...
> > > 
> > > Hmmm, I'm equally lost.
> > 
> > vcpu->requests are volatile, so we need to assume that they can change
> > at any moment when using them.
> > 
> > I would prefer if vcpu->requests were of an atomic type and READ_ONCE()
> > is about as close as we can get without a major overhaul.
> > 
> > >> 
> > >>  /*
> > >>   * vcpu->requests are a lockless synchronization mechanism, where
> > > 
> > > is the requests a synchronization mechanism?  I think of it more as a
> > > cross-thread communication protocol.
> > 
> > Partly, synchronization is too restrictive and communication seems too
> > generic, but probably still better.  No idea how to shortly describe the
> > part of vcpu->requests that prevents VM entry and that setting a request
> > kicks VM out of guest mode.
> > 
> > x86 uses KVM_REQ_MCLOCK_INPROGRESS for synchronization between cores and
> > the use in this series looked very similar.
> > 
> > >>   * memory barriers are necessary for correct behavior, see
> > >>   * Documentation/virtual/kvm/vcpu-requests.rst.
> > >>   *
> > >>   * READ_ONCE() is not necessary for correctness, but simplifies
> > >>   * reasoning by constricting the generated code.
> > >>   */
> > >> 
> > >> I considered READ_ONCE() to be self-documenting. :)
> > > 
> > > I realize that I'm probably unusually slow in this whole area, but using
> > > READ_ONCE() where unnecessary doesn't help my reasoning, but makes me
> > > wonder which part of this I didn't understand, so I don't seem to agree
> > > with the statement that it simplifies reasoning.
> > 
> > No, I think it is a matter of approach.  When I see a READ_ONCE()
> > without a comment, I think that the programmer was aware that this
> > memory can change at any time and was defensive about it.
> > 
> > I consider this use to simplify future development:
> > We think now that READ_ONCE() is not needed, but vcpu->requests is still
> > volatile and future changes in code might make READ_ONCE() necessary.
> > Preemptively putting READ_ONCE() there saves us thinking or hard-to-find
> > bugs.
> > 
> > > Really, if there is no reason to use it, I don't think we should use it.
> > 
> > I am leaning towards READ_ONCE() as the default for implicitly volatile
> > memory, but commenting why we didn't have to use READ_ONCE() sounds good
> > too.
> > 
> > > To me, READ_ONCE() indicates that there's some flow in the code where
> > > it's essential that the compiler doesn't generate multiple loads, but
> > > that we only see a momentary single-read snapshot of the value, and this
> > > doesn't seem to be the case.
> > 
> > The compiler can also squash multiple reads together, which is more
> > dangerous in this case as we would not notice a new request.  Avoiding
> > READ_ONCE() requires a better knowledge of the compiler algorithms that
> > prove which variable can be optimized.
> > 
> > The difference is really minor and I agree that the comment is bad.
> > The only comment I'm happy with is nothing, though ... even "READ_ONCE()
> > is not necessary" is wrong as that might change without us noticing.
> 
> FWIW, I first suggested using READ_ONCE() for the freshness argument,

What is the 'freshness argument'?

> but then started to believe there were even more reasons after reading
> the comment above it in include/linux/compiler.h.  The last paragraph
> of that comment says there are two major use cases for it.  I think the
> first one maps to the freshness argument.  The second one is
> 
>  (2) Ensuring that the compiler does not  fold, spindle, or otherwise
>      mutilate accesses that either do not require ordering or that
>      interact with an explicit memory barrier or atomic instruction that
>      provides the required ordering.
> 
> Documentation/memory-barriers.txt seemed to agree with that, as it's
> full of READ/WRITE_ONCE's, and statements saying they are not optional.
> However, reading it closer (how many times have I tried to read it
> closer?), I can't see any pattern where they're required that we need
> to be concerned about.  I think the future-proofing / freshness argument
> still stands though, and I also like that it flags the variable as
> "special".
> 
> I think I actually prefer no comment now too, but the commit message
> should get a sentence or two explaining why it got thrown in.
> 

After reading that comment in detail again, I tend to be more open to
the argument that READ_ONCE() actually makes it more clear what is going
on.

All I care about in this case is that we're spending ages here trying
to really understand the flow, and unless we document exactly what
happens (including whether or not READ_ONCE is strictly required),
someone is going to spend that same time all over again, and come to
partially flawed conclusions again.  So documenting the details, including
which primitives are absolutely required and which are not, is important
IMHO, but doing so in the commit message is just fine.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 85+ messages in thread

* Re: [PATCH v2 1/9] KVM: add kvm_request_pending
  2017-04-06 14:37               ` Christoffer Dall
@ 2017-04-06 15:08                 ` Andrew Jones
  2017-04-07 15:33                   ` Paolo Bonzini
  0 siblings, 1 reply; 85+ messages in thread
From: Andrew Jones @ 2017-04-06 15:08 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Radim Krčmář, kvmarm, kvm, marc.zyngier, pbonzini

On Thu, Apr 06, 2017 at 04:37:51PM +0200, Christoffer Dall wrote:
> > FWIW, I first suggested using READ_ONCE() for the freshness argument,
> 
> What is the 'freshness argument' ?

My own made-up lingo to state that each time the variable is accessed it
must be loaded anew, taken care of by the volatile use in READ_ONCE.  As
vcpu->requests can be written by other threads, I prefer READ_ONCE
being used to read it, as it allows me to avoid spending energy convincing
myself that the compiler would have emitted a load at that point anyway.

The writes to vcpu->requests always go through bitops, so they're already
also emitting fresh loads before doing their stores.
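For illustration, here is a minimal userspace model of that freshness
property (the macro is a simplified stand-in for the kernel's READ_ONCE()
in include/linux/compiler.h, and the variable names are invented):

```c
#include <assert.h>

/* simplified stand-in for the kernel's READ_ONCE() */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

static unsigned long requests;	/* models vcpu->requests */

/* the volatile cast forces a real load every time this is called;
 * a plain read could legally be cached in a register across calls */
static unsigned long read_requests(void)
{
	return READ_ONCE(requests);
}
```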

Thanks,
drew

* Re: [PATCH v2 2/9] KVM: Add documentation for VCPU requests
  2017-04-06 14:29                 ` Christoffer Dall
@ 2017-04-07 11:44                   ` Paolo Bonzini
  0 siblings, 0 replies; 85+ messages in thread
From: Paolo Bonzini @ 2017-04-07 11:44 UTC (permalink / raw)
  To: Christoffer Dall, Radim Krčmář; +Cc: marc.zyngier, kvmarm, kvm



On 06/04/2017 22:29, Christoffer Dall wrote:
> We could encode the lazy/non-lazy thing in a bit in the request number
> that gets masked off before clearing/checking the request bit, but
> perhaps there are nicer solutions.

Actually, this is the first "nice" solution that has been proposed.

Thanks,

Paolo

* Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  2017-04-06 14:14               ` Christoffer Dall
@ 2017-04-07 11:47                 ` Paolo Bonzini
  2017-04-08  8:35                   ` Christoffer Dall
  0 siblings, 1 reply; 85+ messages in thread
From: Paolo Bonzini @ 2017-04-07 11:47 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: marc.zyngier, kvmarm, kvm



On 06/04/2017 22:14, Christoffer Dall wrote:
>> So it seems like there are no races after all in KVM/ARM code
>
> No races after Drew's fix has been applied to set vcpu->mode =
> IN_GUEST_MODE, before checking the pause flag, correct?  (I think that's
> what the spin model below is modeling).

Yes.  All of them include the vcpu->mode = IN_GUEST_MODE assignment and
(implicitly because spin is sequentially consistent) the memory barrier.

>> After this experiment, I think I like Drew's KVM_REQ_PAUSE more than I did
>> yesterday.  However, yet another alternative is to leave pause/power_off as
>> they are, while taking some inspiration from his patch to do some cleanups:
>>
>> 1) change the "if"
>>
>>                 if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
>>                         vcpu->arch.power_off || vcpu->arch.pause) {
>>
>> to test kvm_requests_pending instead of pause/power_off
>>
>> 2) clear KVM_REQ_VCPU_EXIT before the other "if":
>>
>>                 if (vcpu->arch.power_off || vcpu->arch.pause)
>>                         vcpu_sleep(vcpu);
>
> I like using requests as only requests from one thread to the VCPU
> thread, and not to maintain specific state about a VCPU.
> 
> The benefit of Drew's approach is that since these pieces of state are
> boolean, you can have just a single check in the critical path in the
> run loop instead of having to access multiple fields.

I think you'd still need two checks for KVM_REQ_PAUSE/KVM_REQ_POWEROFF:
one to check whether to sleep, and one to check whether to abort the
vmentry.

Pause and power_off could be merged into a single bitmask if necessary, too.
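To make the two checks concrete, a sketch of the shape described above
(the bit numbers and helper names are invented for illustration, not
taken from the kernel):

```c
#include <assert.h>

/* illustrative request bits, not the kernel's values */
#define KVM_REQ_PAUSE		0
#define KVM_REQ_POWEROFF	1

static unsigned long requests;	/* models vcpu->requests */

static int request_test(int nr)
{
	return (requests >> nr) & 1;
}

/* check 1: decide whether the vcpu should sleep */
static int should_sleep(void)
{
	return request_test(KVM_REQ_PAUSE) || request_test(KVM_REQ_POWEROFF);
}

/* check 2: decide whether to abort the vmentry */
static int should_abort_entry(int ret)
{
	return ret <= 0 || requests != 0;	/* kvm_request_pending() analogue */
}
```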

Paolo

* Re: [PATCH v2 1/9] KVM: add kvm_request_pending
  2017-04-06 14:25             ` Christoffer Dall
@ 2017-04-07 13:15               ` Radim Krčmář
  2017-04-08 18:23                 ` Christoffer Dall
  0 siblings, 1 reply; 85+ messages in thread
From: Radim Krčmář @ 2017-04-07 13:15 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: marc.zyngier, pbonzini, kvmarm, kvm

2017-04-06 16:25+0200, Christoffer Dall:
> On Wed, Apr 05, 2017 at 10:20:17PM +0200, Radim Krčmář wrote:
>> 2017-04-05 19:39+0200, Christoffer Dall:
>> > On Wed, Apr 05, 2017 at 03:10:50PM +0200, Radim Krčmář wrote:
>> x86 uses KVM_REQ_MCLOCK_INPROGRESS for synchronization between cores and
>> the use in this series looked very similar.
>> 
>> >>   * memory barriers are necessary for correct behavior, see
>> >>   * Documentation/virtual/kvm/vcpu-requests.rst.
>> >>   *
>> >>   * READ_ONCE() is not necessary for correctness, but simplifies
>> >>   * reasoning by constricting the generated code.
>> >>   */
>> >> 
>> >> I considered READ_ONCE() to be self-documenting. :)
>> > 
>> > I realize that I'm probably unusually slow in this whole area, but using
>> > READ_ONCE() where unnecessary doesn't help my reasoning, but makes me
>> > wonder which part of this I didn't understand, so I don't seem to agree
>> > with the statement that it simplifies reasoning.
>> 
>> No, I think it is a matter of approach.  When I see a READ_ONCE()
>> without a comment, I think that the programmer was aware that this
>> memory can change at any time and was defensive about it.
> 
> I think it means that you have to read it exactly once at the exact flow
> in the code where it's placed.

The compiler can still reorder surrounding non-volatile code, but
reading exactly once is the subset of meaning that READ_ONCE() should
have.  Not assigning it any more meaning sounds good.

>> I consider this use to simplify future development:
>> We think now that READ_ONCE() is not needed, but vcpu->requests is still
>> volatile and future changes in code might make READ_ONCE() necessary.
>> Preemptively putting READ_ONCE() there saves us thinking or hard-to-find
>> bugs.
>> 
> 
> I'm always a bit sceptical about such reasoning as I think without a
> complete understanding of what needs to change when doing changes, we're
> likely to get it wrong anyway.

I think we cannot achieve and maintain a complete understanding, so
getting things wrong is just a matter of time.

It is almost impossible to break ordering of vcpu->requests, though.

>> > To me, READ_ONCE() indicates that there's some flow in the code where
>> > it's essential that the compiler doesn't generate multiple loads, but
>> > that we only see a momentary single-read snapshot of the value, and this
>> > doesn't seem to be the case.
>> 
>> The compiler can also squash multiple reads together, which is more
>> dangerous in this case as we would not notice new requests.  Avoiding
>> READ_ONCE() requires better knowledge of the compiler algorithms that
>> prove which variables can be optimized.
> 
> Isn't that covered by the memory barriers that imply compiler barriers
> that we (will) have between checking the mode and the requests variable?

It is, asm volatile ("" ::: "memory") is enough.

The minimal conditions that would require an explicit barrier:
 1) not having vcpu->mode, because it cannot work without memory
    barriers
 2) the instruction that disables interrupts doesn't have a "memory"
    constraint  (the smp_rmb in between is not necessary here)

And of course, there would have to be no functions containing a
compiler barrier, or whose bodies remain unknown, between disabling
interrupts and checking requests ...
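For illustration, a single-threaded userspace model of the sequence
being discussed -- store vcpu->mode, compiler barrier, then check
requests; barrier() below is exactly the asm volatile ("" ::: "memory")
mentioned above, and all names are simplified stand-ins, not kernel
definitions:

```c
#include <assert.h>

#define barrier()	__asm__ __volatile__("" ::: "memory")

enum { OUTSIDE_GUEST_MODE, IN_GUEST_MODE };

static int mode;		/* models vcpu->mode */
static unsigned long requests;	/* models vcpu->requests */

/* returns 1 if the vmentry may proceed, 0 if it must be aborted */
static int try_enter_guest(void)
{
	mode = IN_GUEST_MODE;
	barrier();	/* keeps the mode store before the requests load;
			 * on SMP a real smp_mb() would also be needed */
	if (requests)	/* kvm_request_pending() analogue */
		return 0;
	return 1;
}
```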

>> The difference is really minor and I agree that the comment is bad.
>> The only comment I'm happy with is nothing, though ... even "READ_ONCE()
>> is not necessary" is wrong as that might change without us noticing.
> 
> "READ_ONCE() is not necessary" while actually using READ_ONCE() is a
> terrible comment because it makes readers just doubt the correctness of
> the code.
> 
> Regardless of whether or not we end up using READ_ONCE(), I think we
> should document exactly what the requirements are for accessing this
> variable at this time, i.e. any assumption about preceding barriers or
> other flows of events that we rely on.

Makes sense.  My pitch at the documentation after dropping READ_ONCE():

  /*
   *  The return value of kvm_request_pending() is implicitly volatile
   *  and must be protected from reordering by the caller.
   */
* Re: [PATCH v2 1/9] KVM: add kvm_request_pending
  2017-04-06 15:08                 ` Andrew Jones
@ 2017-04-07 15:33                   ` Paolo Bonzini
  2017-04-08 18:19                     ` Christoffer Dall
  0 siblings, 1 reply; 85+ messages in thread
From: Paolo Bonzini @ 2017-04-07 15:33 UTC (permalink / raw)
  To: Andrew Jones, Christoffer Dall
  Cc: Radim Krčmář, kvmarm, kvm, marc.zyngier



On 06/04/2017 23:08, Andrew Jones wrote:
> My own made-up lingo to state that each time the variable is accessed it
> must be loaded anew, taken care of by the volatile use in READ_ONCE.  As
> vcpu->requests can be written by other threads, I prefer READ_ONCE
> being used to read it, as it allows me to avoid spending energy convincing
> myself that the compiler would have emitted a load at that point anyway.

Also, READ_ONCE without a barrier is really fishy unless it's

  while (READ_ONCE(x) != 2) {
      ...
  }

or similar, and WRITE_ONCE doesn't even have this exception.  So
annotating variables accessed by multiple threads with
READ_ONCE/WRITE_ONCE is generally a good idea.
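A userspace model of that shape, with the writer side annotated by
WRITE_ONCE() as suggested (simplified macros; single-threaded, so the
loop is made to terminate by calling the writer inline):

```c
#include <assert.h>

/* simplified stand-ins for the kernel macros */
#define READ_ONCE(x)	 (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

static int x;

/* in real code another thread would do this */
static void writer(void)
{
	WRITE_ONCE(x, 2);
}

static int wait_for_two(void)
{
	int spins = 0;

	while (READ_ONCE(x) != 2) {	/* each iteration is a fresh load */
		writer();		/* single-threaded stand-in */
		spins++;
	}
	return spins;
}
```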

Paolo

* Re: [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request
  2017-04-07 11:47                 ` Paolo Bonzini
@ 2017-04-08  8:35                   ` Christoffer Dall
  0 siblings, 0 replies; 85+ messages in thread
From: Christoffer Dall @ 2017-04-08  8:35 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: marc.zyngier, Christoffer Dall, kvmarm, kvm

On Fri, Apr 07, 2017 at 07:47:29PM +0800, Paolo Bonzini wrote:
> 
> 
> On 06/04/2017 22:14, Christoffer Dall wrote:
> >> So it seems like there are no races after all in KVM/ARM code
> >
> > No races after Drew's fix has been applied to set vcpu->mode =
> > IN_GUEST_MODE, before checking the pause flag, correct?  (I think that's
> > what the spin model below is modeling).
> 
> Yes.  All of them include the vcpu->mode = IN_GUEST_MODE assignment and
> (implicitly because spin is sequentially consistent) the memory barrier.
> 
> >> After this experiment, I think I like Drew's KVM_REQ_PAUSE more than I did
> >> yesterday.  However, yet another alternative is to leave pause/power_off as
> >> they are, while taking some inspiration from his patch to do some cleanups:
> >>
> >> 1) change the "if"
> >>
> >>                 if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
> >>                         vcpu->arch.power_off || vcpu->arch.pause) {
> >>
> >> to test kvm_requests_pending instead of pause/power_off
> >>
> >> 2) clear KVM_REQ_VCPU_EXIT before the other "if":
> >>
> >>                 if (vcpu->arch.power_off || vcpu->arch.pause)
> >>                         vcpu_sleep(vcpu);
> >
> > I like using requests as only requests from one thread to the VCPU
> > thread, and not to maintain specific state about a VCPU.
> > 
> > The benefit of Drew's approach is that since these pieces of state are
> > boolean, you can have just a single check in the critical path in the
> > run loop instead of having to access multiple fields.
> 
> I think you'd still need two checks for KVM_REQ_PAUSE/KVM_REQ_POWEROFF:
> one to check whether to sleep, and one to check whether to abort the
> vmentry.
> 
> Pause and power_off could be merged into a single bitmask if necessary, too.

True, a sort of flags field.

I think at this point, I'll let Drew decide what looks cleanest when he
writes up the next revision and review those.

Thanks for the thorough help and checking!
-Christoffer

* Re: [PATCH v2 1/9] KVM: add kvm_request_pending
  2017-04-07 15:33                   ` Paolo Bonzini
@ 2017-04-08 18:19                     ` Christoffer Dall
  0 siblings, 0 replies; 85+ messages in thread
From: Christoffer Dall @ 2017-04-08 18:19 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Andrew Jones, Christoffer Dall, Radim Krčmář,
	kvmarm, kvm, marc.zyngier

On Fri, Apr 07, 2017 at 11:33:33PM +0800, Paolo Bonzini wrote:
> 
> 
> On 06/04/2017 23:08, Andrew Jones wrote:
> > My own made-up lingo to state that each time the variable is accessed it
> > must be loaded anew, taken care of by the volatile use in READ_ONCE.  As
> > vcpu->requests can be written by other threads, I prefer READ_ONCE
> > being used to read it, as it allows me to avoid spending energy convincing
> > myself that the compiler would have emitted a load at that point anyway.
> 
> Also, READ_ONCE without a barrier is really fishy unless it's
> 
>   while (READ_ONCE(x) != 2) {
>       ...
>   }
> 
> or similar, and WRITE_ONCE doesn't even have this exception.  So
> annotating variables accessed by multiple threads with
> READ_ONCE/WRITE_ONCE is generally a good idea.
> 

I'm sorry, I'm confused.  You're saying that it's fishy to use
READ_ONCE() without also having a barrier, but does that imply that if
you have a barrier, you should also have READ_ONCE() ?

In any case, as I hope I've made clear, I'm perfectly fine with having
READ_ONCE(), as long as we make a best effort attempt at describing why
we added it, so even I can understand that later - in the code directly
or in the commit message.

Thanks,
-Christoffer

* Re: [PATCH v2 1/9] KVM: add kvm_request_pending
  2017-04-07 13:15               ` Radim Krčmář
@ 2017-04-08 18:23                 ` Christoffer Dall
  2017-04-08 19:32                   ` Paolo Bonzini
  0 siblings, 1 reply; 85+ messages in thread
From: Christoffer Dall @ 2017-04-08 18:23 UTC (permalink / raw)
  To: Radim Krčmář
  Cc: Christoffer Dall, Andrew Jones, kvmarm, kvm, marc.zyngier, pbonzini

On Fri, Apr 07, 2017 at 03:15:37PM +0200, Radim Krčmář wrote:
> 2017-04-06 16:25+0200, Christoffer Dall:
> > On Wed, Apr 05, 2017 at 10:20:17PM +0200, Radim Krčmář wrote:
> >> 2017-04-05 19:39+0200, Christoffer Dall:
> >> > On Wed, Apr 05, 2017 at 03:10:50PM +0200, Radim Krčmář wrote:
> >> x86 uses KVM_REQ_MCLOCK_INPROGRESS for synchronization between cores and
> >> the use in this series looked very similar.
> >> 
> >> >>   * memory barriers are necessary for correct behavior, see
> >> >>   * Documentation/virtual/kvm/vcpu-requests.rst.
> >> >>   *
> >> >>   * READ_ONCE() is not necessary for correctness, but simplifies
> >> >>   * reasoning by constricting the generated code.
> >> >>   */
> >> >> 
> >> >> I considered READ_ONCE() to be self-documenting. :)
> >> > 
> >> > I realize that I'm probably unusually slow in this whole area, but using
> >> > READ_ONCE() where unnecessary doesn't help my reasoning, but makes me
> >> > wonder which part of this I didn't understand, so I don't seem to agree
> >> > with the statement that it simplifies reasoning.
> >> 
> >> No, I think it is a matter of approach.  When I see a READ_ONCE()
> >> without a comment, I think that the programmer was aware that this
> >> memory can change at any time and was defensive about it.
> > 
> > I think it means that you have to read it exactly once at the exact flow
> > in the code where it's placed.
> 
> The compiler can still reorder surrounding non-volatile code, but
> reading exactly once is the subset of meaning that READ_ONCE() should
> have.  Not assigning it any more meaning sounds good.
> 
> >> I consider this use to simplify future development:
> >> We think now that READ_ONCE() is not needed, but vcpu->requests is still
> >> volatile and future changes in code might make READ_ONCE() necessary.
> >> Preemptively putting READ_ONCE() there saves us thinking or hard-to-find
> >> bugs.
> >> 
> > 
> > I'm always a bit sceptical about such reasoning as I think without a
> > complete understanding of what needs to change when doing changes, we're
> > likely to get it wrong anyway.
> 
> I think we cannot achieve and maintain a complete understanding, so
> getting things wrong is just a matter of time.
> 
> It is almost impossible to break ordering of vcpu->requests, though.
> 
> >> > To me, READ_ONCE() indicates that there's some flow in the code where
> >> > it's essential that the compiler doesn't generate multiple loads, but
> >> > that we only see a momentary single-read snapshot of the value, and this
> >> > doesn't seem to be the case.
> >> 
> >> The compiler can also squash multiple reads together, which is more
> >> dangerous in this case as we would not notice new requests.  Avoiding
> >> READ_ONCE() requires better knowledge of the compiler algorithms that
> >> prove which variables can be optimized.
> > 
> > Isn't that covered by the memory barriers that imply compiler barriers
> > that we (will) have between checking the mode and the requests variable?
> 
> It is, asm volatile ("" ::: "memory") is enough.
> 
> The minimal conditions that would require an explicit barrier:
>  1) not having vcpu->mode, because it cannot work without memory
>     barriers
>  2) the instruction that disables interrupts doesn't have a "memory"
>     constraint  (the smp_rmb in between is not necessary here)
> 
> And of course, there would have to be no functions containing a
> compiler barrier, or whose bodies remain unknown, between disabling
> interrupts and checking requests ...
> 
> >> The difference is really minor and I agree that the comment is bad.
> >> The only comment I'm happy with is nothing, though ... even "READ_ONCE()
> >> is not necessary" is wrong as that might change without us noticing.
> > 
> > "READ_ONCE() is not necessary" while actually using READ_ONCE() is a
> > terrible comment because it makes readers just doubt the correctness of
> > the code.
> > 
> > Regardless of whether or not we end up using READ_ONCE(), I think we
> > should document exactly what the requirements are for accessing this
> > variable at this time, i.e. any assumption about preceding barriers or
> > other flows of events that we rely on.
> 
> Makes sense.  My pitch at the documentation after dropping READ_ONCE():

I'm confused again, I thought you wanted to keep READ_ONCE().

> 
>   /*
>    *  The return value of kvm_request_pending() is implicitly volatile

why is that, actually?

>    *  and must be protected from reordering by the caller.
>    */

Can we be specific about what that means?  (e.g. must be preceded by a
full smp_mb() - or whatever the case is).

Perhaps we should just let Drew respin at this point, in case he's
confident about the right path, and then pick up from there?

Thanks,
-Christoffer

* Re: [PATCH v2 1/9] KVM: add kvm_request_pending
  2017-04-08 18:23                 ` Christoffer Dall
@ 2017-04-08 19:32                   ` Paolo Bonzini
  2017-04-11 21:06                     ` Radim Krčmář
  0 siblings, 1 reply; 85+ messages in thread
From: Paolo Bonzini @ 2017-04-08 19:32 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Radim Krčmář,
	Christoffer Dall, Andrew Jones, kvmarm, kvm, marc.zyngier


> > Makes sense.  My pitch at the documentation after dropping READ_ONCE():
> 
> I'm confused again, I thought you wanted to keep READ_ONCE().
> 
> > 
> >   /*
> >    *  The return value of kvm_request_pending() is implicitly volatile
> 
> why is that, actually?
> 
> >    *  and must be protected from reordering by the caller.
> >    */
> 
> Can we be specific about what that means?  (e.g. must be preceded by a
> full smp_mb() - or whatever the case is).

You can play devil's advocate both ways and argue that READ_ONCE is
better, or that it is unnecessary hence worse.  You can write good
comments in either case.  That's how I read Radim's message.  But
I think we all agree on keeping it in the end.

> Perhaps we should just let Drew respin at this point, in case he's
> confident about the right path, and then pick up from there?

I agree.

In any case, the memory barrier is the important part, but
adding READ_ONCE is self-documenting and I prefer to have it.

Paolo

* Re: [PATCH v2 1/9] KVM: add kvm_request_pending
  2017-04-08 19:32                   ` Paolo Bonzini
@ 2017-04-11 21:06                     ` Radim Krčmář
  0 siblings, 0 replies; 85+ messages in thread
From: Radim Krčmář @ 2017-04-11 21:06 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Christoffer Dall, kvm, marc.zyngier, kvmarm

2017-04-08 15:32-0400, Paolo Bonzini:
> > > Makes sense.  My pitch at the documentation after dropping READ_ONCE():
> > 
> > I'm confused again, I thought you wanted to keep READ_ONCE().
> > 
> > > 
> > >   /*
> > >    *  The return value of kvm_request_pending() is implicitly volatile
> > 
> > why is that, actually?

"that" is the return value?
  kvm_request_pending() is 'inline static' so the compiler can prove that
  the function only returns vcpu->requests and that the surrounding code
  doesn't change it, so various optimizations are possible.

Or the implicitly volatile bit?
  We know that vcpu->requests can change at any time without action of the
  execution thread, which makes it volatile, but we don't tell that to the
  compiler, hence implicit.
  
  READ_ONCE(vcpu->requests) is
  
    *(volatile unsigned long *)&vcpu->requests
  
  and makes it explicitly volatile.
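Side by side, the two variants being compared (the struct and macro
below are simplified models for illustration, not the kernel
definitions):

```c
#include <assert.h>

#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

struct vcpu {
	unsigned long requests;	/* written by other threads via bitops */
};

/* implicitly volatile: the compiler sees an ordinary load that it may
 * cache, merge, or move around */
static inline unsigned long request_pending_plain(struct vcpu *vcpu)
{
	return vcpu->requests;
}

/* explicitly volatile: exactly one load, emitted at exactly this point */
static inline unsigned long request_pending_once(struct vcpu *vcpu)
{
	return READ_ONCE(vcpu->requests);
}
```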

> > >    *  and must be protected from reordering by the caller.
> > >    */
> > 
> > Can we be specific about what that means?  (e.g. must be preceded by a
> > full smp_mb() - or whatever the case is).
> 
> You can play devil's advocate both ways and argue that READ_ONCE is
> better, or that it is unnecessary hence worse.  You can write good
> comments in either case.  That's how I read Radim's message.  But
> I think we all agree on keeping it in the end.

Exactly, I am in favor of READ_ONCE() as I like to keep the
compiler informed; I just wanted to cover all options, because they are
not that different ... minute details are the hardest to decide. :)

end of thread, other threads:[~2017-04-11 21:06 UTC | newest]

Thread overview: 85+ messages
2017-03-31 16:06 [PATCH v2 0/9] KVM: arm/arm64: race fixes and vcpu requests Andrew Jones
2017-03-31 16:06 ` [PATCH v2 1/9] KVM: add kvm_request_pending Andrew Jones
2017-04-04 15:30   ` Christoffer Dall
2017-04-04 16:41     ` Andrew Jones
2017-04-05 13:10       ` Radim Krčmář
2017-04-05 17:39         ` Christoffer Dall
2017-04-05 18:30           ` Paolo Bonzini
2017-04-05 20:20           ` Radim Krčmář
2017-04-06 12:02             ` Andrew Jones
2017-04-06 14:37               ` Christoffer Dall
2017-04-06 15:08                 ` Andrew Jones
2017-04-07 15:33                   ` Paolo Bonzini
2017-04-08 18:19                     ` Christoffer Dall
2017-04-06 14:25             ` Christoffer Dall
2017-04-07 13:15               ` Radim Krčmář
2017-04-08 18:23                 ` Christoffer Dall
2017-04-08 19:32                   ` Paolo Bonzini
2017-04-11 21:06                     ` Radim Krčmář
2017-03-31 16:06 ` [PATCH v2 2/9] KVM: Add documentation for VCPU requests Andrew Jones
2017-04-04 15:24   ` Christoffer Dall
2017-04-04 17:06     ` Andrew Jones
2017-04-04 17:23       ` Christoffer Dall
2017-04-04 17:36         ` Paolo Bonzini
2017-04-05 14:11         ` Radim Krčmář
2017-04-05 17:45           ` Christoffer Dall
2017-04-05 18:29             ` Paolo Bonzini
2017-04-05 20:46               ` Radim Krčmář
2017-04-06 14:29                 ` Christoffer Dall
2017-04-07 11:44                   ` Paolo Bonzini
2017-04-06 14:27               ` Christoffer Dall
2017-04-06 10:18   ` Christian Borntraeger
2017-04-06 12:08     ` Andrew Jones
2017-04-06 12:29     ` Radim Krčmář
2017-03-31 16:06 ` [PATCH v2 3/9] KVM: arm/arm64: prepare to use vcpu requests Andrew Jones
2017-04-04 15:34   ` Christoffer Dall
2017-04-04 17:06     ` Andrew Jones
2017-03-31 16:06 ` [PATCH v2 4/9] KVM: arm/arm64: replace vcpu->arch.pause with a vcpu request Andrew Jones
2017-04-04 13:39   ` Marc Zyngier
2017-04-04 14:47     ` Andrew Jones
2017-04-04 14:51       ` Paolo Bonzini
2017-04-04 15:05         ` Marc Zyngier
2017-04-04 17:07         ` Andrew Jones
2017-04-04 16:04   ` Christoffer Dall
2017-04-04 16:24     ` Paolo Bonzini
2017-04-04 17:19       ` Christoffer Dall
2017-04-04 17:35         ` Paolo Bonzini
2017-04-04 17:57           ` Christoffer Dall
2017-04-04 18:15             ` Paolo Bonzini
2017-04-04 18:38               ` Christoffer Dall
2017-04-04 18:18           ` Andrew Jones
2017-04-04 18:59             ` Paolo Bonzini
2017-04-04 17:57     ` Andrew Jones
2017-04-04 19:04       ` Christoffer Dall
2017-04-04 20:10         ` Paolo Bonzini
2017-04-05  7:09           ` Christoffer Dall
2017-04-05 11:37             ` Paolo Bonzini
2017-04-06 14:14               ` Christoffer Dall
2017-04-07 11:47                 ` Paolo Bonzini
2017-04-08  8:35                   ` Christoffer Dall
2017-03-31 16:06 ` [PATCH v2 5/9] KVM: arm/arm64: replace vcpu->arch.power_off " Andrew Jones
2017-04-04 17:37   ` Christoffer Dall
2017-03-31 16:06 ` [PATCH v2 6/9] KVM: arm/arm64: use a vcpu request on irq injection Andrew Jones
2017-04-04 17:42   ` Christoffer Dall
2017-04-04 18:27     ` Andrew Jones
2017-04-04 18:59     ` Paolo Bonzini
2017-04-04 18:51   ` Paolo Bonzini
2017-03-31 16:06 ` [PATCH v2 7/9] KVM: arm/arm64: PMU: remove request-less vcpu kick Andrew Jones
2017-04-04 17:46   ` Christoffer Dall
2017-04-04 18:29     ` Andrew Jones
2017-04-04 19:35       ` Christoffer Dall
2017-03-31 16:06 ` [PATCH v2 8/9] KVM: arm/arm64: fix race in kvm_psci_vcpu_on Andrew Jones
2017-04-04 19:42   ` Christoffer Dall
2017-04-05  8:35     ` Andrew Jones
2017-04-05  8:50       ` Christoffer Dall
2017-04-05  9:12         ` Andrew Jones
2017-04-05  9:30           ` Christoffer Dall
2017-03-31 16:06 ` [PATCH v2 9/9] KVM: arm/arm64: avoid race by caching MPIDR Andrew Jones
2017-04-04 19:44   ` Christoffer Dall
2017-04-05  8:50     ` Andrew Jones
2017-04-05 11:03       ` Christoffer Dall
2017-04-05 11:14         ` Andrew Jones
2017-04-03 15:28 ` [PATCH v2 0/9] KVM: arm/arm64: race fixes and vcpu requests Christoffer Dall
2017-04-03 17:11   ` Paolo Bonzini
2017-04-04  7:27   ` Andrew Jones
2017-04-04 16:05     ` Christoffer Dall
