[PATCH v3 00/10] KVM: arm/arm64: race fixes and vcpu requests

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v3 00/10] KVM: arm/arm64: race fixes and vcpu requests
@ 2017-05-03 16:06 Andrew Jones
  2017-05-03 16:06 ` [PATCH v3 01/10] KVM: add kvm_request_pending Andrew Jones
                   ` (9 more replies)
  0 siblings, 10 replies; 43+ messages in thread
From: Andrew Jones @ 2017-05-03 16:06 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: marc.zyngier, cdall, pbonzini

This series fixes some hard to produce races by introducing the use of
vcpu requests. I've tested the series on a Mustang and a ThunderX and
compile-tested the ARM bits.

Patch 2/10 adds documentation, as, at least for me, understanding vcpu
request interplay with vcpu kicks and vcpu mode and the memory barriers
that interplay implies, is exhausting.

v3:
  - The easier to reproduce PSCI races previously fixed with this
    series were fixed independently with a different approach suggested
    by Christoffer[1].
  - Based on Radim's latest vcpu request rework series[2]
  - Rewrote the documentation adding much more information and
    incorporating Christoffer's comments from v2.
  - Reworked the approach to controlling pause and power_off by
    requests.
  - Clear request for pending irq injection in the correct place, as
    pointed out by Christoffer.
  - Rebase required some simple changes to the PMU code patch.

v2:
  - No longer based on Radim's vcpu request API rework[3], except for
    including "add kvm_request_pending" as patch 1/10 [drew]
  - Added vcpu request documentation [drew]
  - Dropped the introduction of user settable MPIDRs [Christoffer]
  - Added vcpu requests to all request-less vcpu kicks [Christoffer]

[1] https://www.spinics.net/lists/arm-kernel/msg577630.html
[2] http://www.spinics.net/lists/kvm/msg148890.html
[3] https://www.spinics.net/lists/kvm/msg145588.html


Andrew Jones (9):
  KVM: Add documentation for VCPU requests
  KVM: arm/arm64: prepare to use vcpu requests
  KVM: arm/arm64: use vcpu request in kvm_arm_halt_vcpu
  KVM: arm/arm64: don't clear exit request from caller
  KVM: arm/arm64: use vcpu requests for power_off
  KVM: arm/arm64: optimize VCPU RUN
  KVM: arm/arm64: change exit request to sleep request
  KVM: arm/arm64: use vcpu requests for irq injection
  KVM: arm/arm64: PMU: remove request-less vcpu kick

Radim Krčmář (1):
  KVM: add kvm_request_pending

 Documentation/virtual/kvm/vcpu-requests.rst | 269 ++++++++++++++++++++++++++++
 arch/arm/include/asm/kvm_host.h             |   3 +-
 arch/arm/kvm/arm.c                          |  46 ++++-
 arch/arm/kvm/handle_exit.c                  |   1 +
 arch/arm/kvm/psci.c                         |   8 +-
 arch/arm64/include/asm/kvm_host.h           |   3 +-
 arch/arm64/kvm/handle_exit.c                |   1 +
 arch/mips/kvm/trap_emul.c                   |   2 +-
 arch/powerpc/kvm/booke.c                    |   2 +-
 arch/powerpc/kvm/powerpc.c                  |   5 +-
 arch/s390/kvm/kvm-s390.c                    |   2 +-
 arch/x86/kvm/x86.c                          |   4 +-
 include/linux/kvm_host.h                    |   5 +
 virt/kvm/arm/arch_timer.c                   |   1 +
 virt/kvm/arm/pmu.c                          |  40 ++---
 virt/kvm/arm/vgic/vgic.c                    |   9 +-
 16 files changed, 357 insertions(+), 44 deletions(-)
 create mode 100644 Documentation/virtual/kvm/vcpu-requests.rst

-- 
2.9.3

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply	[flat|nested] 43+ messages in thread

* [PATCH v3 01/10] KVM: add kvm_request_pending
  2017-05-03 16:06 [PATCH v3 00/10] KVM: arm/arm64: race fixes and vcpu requests Andrew Jones
@ 2017-05-03 16:06 ` Andrew Jones
  2017-05-03 16:06 ` [PATCH v3 02/10] KVM: Add documentation for VCPU requests Andrew Jones
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 43+ messages in thread
From: Andrew Jones @ 2017-05-03 16:06 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: marc.zyngier, cdall, pbonzini

From: Radim Krčmář <rkrcmar@redhat.com>

A first step in vcpu->requests encapsulation.  Additionally, we now
use READ_ONCE() when accessing vcpu->requests, which ensures we
always load vcpu->requests when it's accessed.  This is important as
other threads can change it any time.  Also, READ_ONCE() documents
that vcpu->requests is used with other threads, likely requiring
memory barriers, which it does.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
[ Documented the new use of READ_ONCE(). ]
Signed-off-by: Andrew Jones <drjones@redhat.com>
---
 arch/mips/kvm/trap_emul.c  | 2 +-
 arch/powerpc/kvm/booke.c   | 2 +-
 arch/powerpc/kvm/powerpc.c | 5 ++---
 arch/s390/kvm/kvm-s390.c   | 2 +-
 arch/x86/kvm/x86.c         | 4 ++--
 include/linux/kvm_host.h   | 5 +++++
 6 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/mips/kvm/trap_emul.c b/arch/mips/kvm/trap_emul.c
index b1fa53b252ea..9ac8b1d62643 100644
--- a/arch/mips/kvm/trap_emul.c
+++ b/arch/mips/kvm/trap_emul.c
@@ -1029,7 +1029,7 @@ static void kvm_trap_emul_check_requests(struct kvm_vcpu *vcpu, int cpu,
 	struct mm_struct *mm;
 	int i;
 
-	if (likely(!vcpu->requests))
+	if (likely(!kvm_request_pending(vcpu)))
 		return;
 
 	if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu)) {
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index ab968f60d14c..c427202b3cbb 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -682,7 +682,7 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
 
 	kvmppc_core_check_exceptions(vcpu);
 
-	if (vcpu->requests) {
+	if (kvm_request_pending(vcpu)) {
 		/* Exception delivery raised request; start over */
 		return 1;
 	}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 378ed9cee1f9..0ce1a219984e 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -52,8 +52,7 @@ EXPORT_SYMBOL_GPL(kvmppc_pr_ops);
 
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
-	return !!(v->arch.pending_exceptions) ||
-	       v->requests;
+	return !!(v->arch.pending_exceptions) || kvm_request_pending(v);
 }
 
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
@@ -105,7 +104,7 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
 		 */
 		smp_mb();
 
-		if (vcpu->requests) {
+		if (kvm_request_pending(vcpu)) {
 			/* Make sure we process requests preemptable */
 			local_irq_enable();
 			trace_kvm_check_requests(vcpu);
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 9ab89f73612a..66c33eb53707 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2396,7 +2396,7 @@ static int kvm_s390_handle_requests(struct kvm_vcpu *vcpu)
 {
 retry:
 	kvm_s390_vcpu_request_handled(vcpu);
-	if (!vcpu->requests)
+	if (!kvm_request_pending(vcpu))
 		return 0;
 	/*
 	 * We use MMU_RELOAD just to re-arm the ipte notifier for the
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 073175e14083..92e643af848e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6720,7 +6720,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	bool req_immediate_exit = false;
 
-	if (vcpu->requests) {
+	if (kvm_request_pending(vcpu)) {
 		if (kvm_check_request(KVM_REQ_MMU_RELOAD, vcpu))
 			kvm_mmu_unload(vcpu);
 		if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu))
@@ -6884,7 +6884,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 			kvm_x86_ops->sync_pir_to_irr(vcpu);
 	}
 
-	if (vcpu->mode == EXITING_GUEST_MODE || vcpu->requests
+	if (vcpu->mode == EXITING_GUEST_MODE || kvm_request_pending(vcpu)
 	    || need_resched() || signal_pending(current)) {
 		vcpu->mode = OUTSIDE_GUEST_MODE;
 		smp_wmb();
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 94b12154e67c..b1f7d40a4ca4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1111,6 +1111,11 @@ static inline void kvm_make_request(int req, struct kvm_vcpu *vcpu)
 	set_bit(req & KVM_REQUEST_MASK, &vcpu->requests);
 }
 
+static inline bool kvm_request_pending(struct kvm_vcpu *vcpu)
+{
+	return READ_ONCE(vcpu->requests);
+}
+
 static inline bool kvm_test_request(int req, struct kvm_vcpu *vcpu)
 {
 	return test_bit(req & KVM_REQUEST_MASK, &vcpu->requests);
-- 
2.9.3

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v3 02/10] KVM: Add documentation for VCPU requests
  2017-05-03 16:06 [PATCH v3 00/10] KVM: arm/arm64: race fixes and vcpu requests Andrew Jones
  2017-05-03 16:06 ` [PATCH v3 01/10] KVM: add kvm_request_pending Andrew Jones
@ 2017-05-03 16:06 ` Andrew Jones
  2017-05-04 11:27   ` Paolo Bonzini
  2017-05-03 16:06 ` [PATCH v3 03/10] KVM: arm/arm64: prepare to use vcpu requests Andrew Jones
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 43+ messages in thread
From: Andrew Jones @ 2017-05-03 16:06 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: marc.zyngier, cdall, pbonzini

Signed-off-by: Andrew Jones <drjones@redhat.com>
---
 Documentation/virtual/kvm/vcpu-requests.rst | 269 ++++++++++++++++++++++++++++
 1 file changed, 269 insertions(+)
 create mode 100644 Documentation/virtual/kvm/vcpu-requests.rst

diff --git a/Documentation/virtual/kvm/vcpu-requests.rst b/Documentation/virtual/kvm/vcpu-requests.rst
new file mode 100644
index 000000000000..d74616d7999a
--- /dev/null
+++ b/Documentation/virtual/kvm/vcpu-requests.rst
@@ -0,0 +1,269 @@
+=================
+KVM VCPU Requests
+=================
+
+Overview
+========
+
+KVM supports an internal API enabling threads to request a VCPU thread to
+perform some activity.  For example, a thread may request a VCPU to flush
+its TLB with a VCPU request.  The API consists of the following functions::
+
+  /* Check if any requests are pending for VCPU @vcpu. */
+  bool kvm_request_pending(struct kvm_vcpu *vcpu);
+
+  /* Check if VCPU @vcpu has request @req pending. */
+  bool kvm_test_request(int req, struct kvm_vcpu *vcpu);
+
+  /* Clear request @req for VCPU @vcpu. */
+  void kvm_clear_request(int req, struct kvm_vcpu *vcpu);
+
+  /*
+   * Check if VCPU @vcpu has request @req pending. When the request is
+   * pending it will be cleared and a memory barrier, which pairs with
+   * another in kvm_make_request(), will be issued.
+   */
+  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);
+
+  /*
+   * Make request @req of VCPU @vcpu. Issues a memory barrier, which pairs
+   * with another in kvm_check_request(), prior to setting the request.
+   */
+  void kvm_make_request(int req, struct kvm_vcpu *vcpu);
+
+  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
+  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
+
+Typically a requester wants the VCPU to perform the activity as soon
+as possible after making the request.  This means most requests
+(kvm_make_request() calls) are followed by a call to kvm_vcpu_kick(),
+and kvm_make_all_cpus_request() has the kicking of all VCPUs built
+into it.
+
+VCPU Kicks
+----------
+
+The goal of a VCPU kick is to bring a VCPU thread out of guest mode in
+order to perform some KVM maintenance.  To do so, an IPI is sent, forcing
+a guest mode exit.  However, a VCPU thread may not be in guest mode at the
+time of the kick.  Therefore, depending on the mode and state of the VCPU
+thread, there are two other actions a kick may take.  All three actions
+are listed below:
+
+1) Send an IPI.  This forces a guest mode exit.
+2) Waking a sleeping VCPU.  Sleeping VCPUs are VCPU threads outside guest
+   mode that wait on waitqueues.  Waking them removes the threads from
+   the waitqueues, allowing the threads to run again.  This behavior
+   may be suppressed, see KVM_REQUEST_NO_WAKEUP below.
+3) Nothing.  When the VCPU is not in guest mode and the VCPU thread is not
+   sleeping, then there is nothing to do.
+
+VCPU Mode
+---------
+
+VCPUs have a mode state, vcpu->mode, that is used to track whether the
+guest is running in guest mode or not, as well as some specific
+outside guest mode states.  The architecture may use vcpu->mode to ensure
+VCPU requests are seen by VCPUs (see "Ensuring Requests Are Seen"), as
+well as to avoid sending unnecessary IPIs (see "IPI Reduction"), and even
+to ensure IPI acknowledgements are waited upon (see "Waiting for
+Acknowledgements").  The following modes are defined:
+
+OUTSIDE_GUEST_MODE
+
+  The VCPU thread is outside guest mode.
+
+IN_GUEST_MODE
+
+  The VCPU thread is in guest mode.
+
+EXITING_GUEST_MODE
+
+  The VCPU thread is transitioning from IN_GUEST_MODE to
+  OUTSIDE_GUEST_MODE.
+
+READING_SHADOW_PAGE_TABLES
+
+  The VCPU thread is outside guest mode and wants certain VCPU requests,
+  namely KVM_REQ_TLB_FLUSH, to be delayed until it's done reading the
+  page tables.
+
+VCPU Request Internals
+======================
+
+VCPU requests are simply bit indices of the vcpu->requests bitmap.  This
+means general bitops, like those documented in [atomic-ops]_ could also be
+used, e.g. ::
+
+  clear_bit(KVM_REQ_UNHALT & KVM_REQUEST_MASK, &vcpu->requests);
+
+However, VCPU request users should refrain from doing so, as it would
+break the abstraction.  The first 8 bits are reserved for architecture
+independent requests, all additional bits are available for architecture
+dependent requests.
+
+Architecture Independent Requests
+---------------------------------
+
+KVM_REQ_TLB_FLUSH
+
+  KVM's common MMU notifier may need to flush all of a guest's TLB
+  entries, calling kvm_flush_remote_tlbs() to do so.  Architectures that
+  choose to use the common kvm_flush_remote_tlbs() implementation will
+  need to handle this VCPU request.
+
+KVM_REQ_MMU_RELOAD
+
+  When shadow page tables are used and memory slots are removed it's
+  necessary to inform each VCPU to completely refresh the tables.  This
+  request is used for that.
+
+KVM_REQ_PENDING_TIMER
+
+  This request may be made from a timer handler run on the host on behalf
+  of a VCPU.  It informs the VCPU thread to inject a timer interrupt.
+
+KVM_REQ_UNHALT
+
+  This request may be made from the KVM common function kvm_vcpu_block(),
+  which is used to emulate an instruction that causes a CPU to halt until
+  one of an architectural specific set of events and/or interrupts is
+  received (determined by checking kvm_arch_vcpu_runnable()).  When that
+  event or interrupt arrives kvm_vcpu_block() makes the request.  This is
+  in contrast to when kvm_vcpu_block() returns due to any other reason,
+  such as a pending signal, which does not indicate the VCPU's halt
+  emulation should stop, and therefore does not make the request.
+
+KVM_REQUEST_MASK
+----------------
+
+VCPU requests should be masked by KVM_REQUEST_MASK before using them with
+bitops.  This is because only the lower 8 bits are used to represent the
+request's number.  The upper bits are reserved, and may be used as flags.
+
+VCPU Request Flags
+------------------
+
+KVM_REQUEST_NO_WAKEUP
+
+  This flag is applied to a request that does not need immediate
+  attention.  When a request does not need immediate attention, and the
+  VCPU's thread is outside guest mode sleeping, then the thread is not
+  awaken by a kick.
+
+KVM_REQUEST_WAIT
+
+  When requests with this flag are made with kvm_make_all_cpus_request(),
+  then the caller will wait for each VCPU to acknowledge the IPI before
+  proceeding.
+
+VCPU Requests with Associated State
+===================================
+
+Requesters that want the receiving VCPU to handle new state need to ensure
+the newly written state is observable to the receiving VCPU thread's CPU
+by the time it observes the request.  This means a write memory barrier
+must be inserted after writing the new state and before setting the VCPU
+request bit.  Additionally, on the receiving VCPU thread's side, a
+corresponding read barrier must be inserted after reading the request bit
+and before proceeding to read the new state associated with it.  See
+scenario 3, Message and Flag, of [lwn-mb]_ and the kernel documentation
+[memory-barriers]_.
+
+The pair of functions, kvm_check_request() and kvm_make_request(), provide
+the memory barriers, allowing this requirement to be handled internally by
+the API.
+
+Ensuring Requests Are Seen
+==========================
+
+When making requests to VCPUs, we want to avoid the receiving VCPU
+executing in guest mode for an arbitrary long time without handling the
+request.  We can be sure this won't happen as long as we ensure the VCPU
+thread checks kvm_request_pending() before entering guest mode and that a
+kick will send an IPI when necessary.  Extra care must be taken to cover
+the period after the VCPU thread's last kvm_request_pending() check and
+before it has entered guest mode, as kick IPIs will only trigger VCPU run
+loops for VCPU threads that are in guest mode or at least have already
+disabled interrupts in order to prepare to enter guest mode.  This means
+that an optimized implementation (see "IPI Reduction") must be certain
+when it's safe to not send the IPI.  One solution, which all architectures
+except s390 apply, is to set vcpu->mode to IN_GUEST_MODE prior to the last
+kvm_request_pending() check and to rely on memory barrier guarantees.
+
+With memory barriers we can exclude the possibility of a VCPU thread
+observing !kvm_request_pending() on its last check and then not receiving
+an IPI for the next request made of it, even if the request is made
+immediately after the check.  This is done by way of the Dekker memory
+barrier pattern (scenario 10 of [lwn-mb]_).  As the Dekker pattern
+requires two variables, this solution pairs vcpu->mode with
+vcpu->requests.  Substituting them into the pattern gives::
+
+  CPU1                                    CPU2
+  =================                       =================
+  local_irq_disable();
+  WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);  kvm_make_request(REQ, vcpu);
+  smp_mb();                               smp_mb();
+  if (kvm_request_pending(vcpu)) {        if (READ_ONCE(vcpu->mode) ==
+                                              IN_GUEST_MODE) {
+      ...abort guest entry...                 ...send IPI...
+  }                                       }
+
+As stated above, the IPI is only useful for VCPU threads in guest mode or
+that have already disabled interrupts.  This is why this specific case of
+the Dekker pattern has been extended to disable interrupts before setting
+vcpu->mode to IN_GUEST_MODE.  WRITE_ONCE() and READ_ONCE() are used to
+pedantically implement the memory barrier pattern, guaranteeing the
+compiler doesn't interfere with vcpu->mode's carefully planned accesses.
+
+IPI Reduction
+-------------
+
+As only one IPI is needed to get a VCPU to check for any/all requests,
+then they may be coalesced.  This is easily done by having the first IPI
+sending kick also change the VCPU mode to something !IN_GUEST_MODE.  The
+transitional state, EXITING_GUEST_MODE, is used for this purpose.
+
+Waiting for Acknowledgements
+----------------------------
+
+Some requests, those with the KVM_REQUEST_WAIT flag set, require IPIs to
+be sent, and the acknowledgements to be waited upon, even when the target
+VCPU threads are in modes other than IN_GUEST_MODE.  For example, one case
+is when a target VCPU thread is in READING_SHADOW_PAGE_TABLES mode, which
+is set after disabling interrupts.  For these cases, the "should send an
+IPI" condition becomes READ_ONCE(vcpu->mode) != OUTSIDE_GUEST_MODE.
+
+Request-less VCPU Kicks
+-----------------------
+
+As the determination of whether or not to send an IPI depends on the
+two-variable Dekker memory barrier pattern, then it's clear that
+request-less VCPU kicks are almost never correct.  Without the assurance
+that a non-IPI generating kick will still result in an action by the
+receiving VCPU, as the final kvm_request_pending() check does for
+request-accompanying kicks, then the kick may not do anything useful at
+all.  If, for instance, a request-less kick was made to a VCPU that was
+just about to set its mode to IN_GUEST_MODE, meaning no IPI is sent, then
+the VCPU thread may continue its entry without actually having done
+whatever it was the kick was meant to initiate.
+
+Additional Considerations
+=========================
+
+Sleeping VCPUs
+--------------
+
+VCPU threads may need to consider requests before and/or after calling
+functions that may put them to sleep, e.g. kvm_vcpu_block().  Whether they
+do or not, and, if they do, which requests need consideration, is
+architecture dependent.  kvm_vcpu_block() calls kvm_arch_vcpu_runnable()
+to check if it should awaken.  One reason to do so is to provide
+architectures a function where requests may be checked if necessary.
+
+References
+==========
+
+.. [atomic-ops] Documentation/core-api/atomic_ops.rst
+.. [memory-barriers] Documentation/memory-barriers.txt
+.. [lwn-mb] https://lwn.net/Articles/573436/
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v3 03/10] KVM: arm/arm64: prepare to use vcpu requests
  2017-05-03 16:06 [PATCH v3 00/10] KVM: arm/arm64: race fixes and vcpu requests Andrew Jones
  2017-05-03 16:06 ` [PATCH v3 01/10] KVM: add kvm_request_pending Andrew Jones
  2017-05-03 16:06 ` [PATCH v3 02/10] KVM: Add documentation for VCPU requests Andrew Jones
@ 2017-05-03 16:06 ` Andrew Jones
  2017-05-03 16:06 ` [PATCH v3 04/10] KVM: arm/arm64: use vcpu request in kvm_arm_halt_vcpu Andrew Jones
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 43+ messages in thread
From: Andrew Jones @ 2017-05-03 16:06 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: marc.zyngier, cdall, pbonzini

Make sure we don't leave vcpu requests we don't intend to
handle later set in the request bitmap. If we don't clear
them, then kvm_request_pending() may return true when we
don't want it to. Also, make sure we set vcpu->mode at the
appropriate time (before the last requests check) and with
the appropriate barriers. See
Documentation/virtual/kvm/vcpu-requests.rst

Signed-off-by: Andrew Jones <drjones@redhat.com>
---
 arch/arm/kvm/arm.c           | 14 ++++++++++++--
 arch/arm/kvm/handle_exit.c   |  1 +
 arch/arm/kvm/psci.c          |  1 +
 arch/arm64/kvm/handle_exit.c |  1 +
 4 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 7941699a766d..47f6c7fdca96 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -552,6 +552,7 @@ void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu)
 {
 	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
 
+	kvm_clear_request(KVM_REQ_VCPU_EXIT, vcpu);
 	vcpu->arch.pause = false;
 	swake_up(wq);
 }
@@ -653,8 +654,18 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 			run->exit_reason = KVM_EXIT_INTR;
 		}
 
+		/*
+		 * Ensure we set mode to IN_GUEST_MODE after we disable
+		 * interrupts and before the final VCPU requests check.
+		 * See the comment in kvm_vcpu_exiting_guest_mode() and
+		 * Documentation/virtual/kvm/vcpu-requests.rst
+		 */
+		smp_store_mb(vcpu->mode, IN_GUEST_MODE);
+
 		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
-			vcpu->arch.power_off || vcpu->arch.pause) {
+		    kvm_request_pending(vcpu) ||
+		    vcpu->arch.power_off || vcpu->arch.pause) {
+			vcpu->mode = OUTSIDE_GUEST_MODE;
 			local_irq_enable();
 			kvm_pmu_sync_hwstate(vcpu);
 			kvm_timer_sync_hwstate(vcpu);
@@ -670,7 +681,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		 */
 		trace_kvm_entry(*vcpu_pc(vcpu));
 		guest_enter_irqoff();
-		vcpu->mode = IN_GUEST_MODE;
 
 		ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
 
diff --git a/arch/arm/kvm/handle_exit.c b/arch/arm/kvm/handle_exit.c
index 5fd7968cdae9..a2b4f7b82356 100644
--- a/arch/arm/kvm/handle_exit.c
+++ b/arch/arm/kvm/handle_exit.c
@@ -72,6 +72,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		trace_kvm_wfx(*vcpu_pc(vcpu), false);
 		vcpu->stat.wfi_exit_stat++;
 		kvm_vcpu_block(vcpu);
+		kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 	}
 
 	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
index a08d7a93aebb..f68be2cc6256 100644
--- a/arch/arm/kvm/psci.c
+++ b/arch/arm/kvm/psci.c
@@ -57,6 +57,7 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
 	 * for KVM will preserve the register state.
 	 */
 	kvm_vcpu_block(vcpu);
+	kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 
 	return PSCI_RET_SUCCESS;
 }
diff --git a/arch/arm64/kvm/handle_exit.c b/arch/arm64/kvm/handle_exit.c
index fa1b18e364fc..17d8a1677a0b 100644
--- a/arch/arm64/kvm/handle_exit.c
+++ b/arch/arm64/kvm/handle_exit.c
@@ -89,6 +89,7 @@ static int kvm_handle_wfx(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		trace_kvm_wfx_arm64(*vcpu_pc(vcpu), false);
 		vcpu->stat.wfi_exit_stat++;
 		kvm_vcpu_block(vcpu);
+		kvm_clear_request(KVM_REQ_UNHALT, vcpu);
 	}
 
 	kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v3 04/10] KVM: arm/arm64: use vcpu request in kvm_arm_halt_vcpu
  2017-05-03 16:06 [PATCH v3 00/10] KVM: arm/arm64: race fixes and vcpu requests Andrew Jones
                   ` (2 preceding siblings ...)
  2017-05-03 16:06 ` [PATCH v3 03/10] KVM: arm/arm64: prepare to use vcpu requests Andrew Jones
@ 2017-05-03 16:06 ` Andrew Jones
  2017-05-06 18:08   ` Christoffer Dall
  2017-05-03 16:06 ` [PATCH v3 05/10] KVM: arm/arm64: don't clear exit request from caller Andrew Jones
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 43+ messages in thread
From: Andrew Jones @ 2017-05-03 16:06 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: marc.zyngier, cdall, pbonzini

VCPU halting/resuming is partially implemented with VCPU requests.
When kvm_arm_halt_guest() is called all VCPUs get the EXIT request,
telling them to exit guest mode and look at the state of 'pause',
which will be true, telling them to sleep.  As ARM's VCPU RUN
implements the memory barrier pattern described in "Ensuring Requests
Are Seen" of Documentation/virtual/kvm/vcpu-requests.rst, there's
no way for a VCPU halted by kvm_arm_halt_guest() to miss the pause
state change.  However, before this patch, a single VCPU halted with
kvm_arm_halt_vcpu() did not get a request, opening a tiny race window.
This patch adds the request, closing the race window and also allowing
us to remove the final check of pause in VCPU RUN, as the final check
for requests is sufficient.

Signed-off-by: Andrew Jones <drjones@redhat.com>

---

I have two questions about the halting/resuming.

Question 1:

Do we even need kvm_arm_halt_vcpu()/kvm_arm_resume_vcpu()? It should
only be necessary if one VCPU can activate or inactivate the private
IRQs of another VCPU, right?  That doesn't seem like something that
should be possible, but I'm GIC-illiterate...

Question 2:

It's not clear to me if we have another problem with halting/resuming
or not.  If it's possible for VCPU1 and VCPU2 to race in
vgic_mmio_write_s/cactive(), then the following scenario could occur,
leading to VCPU3 being in guest mode when it should not be.  Does the
hardware prohibit more than one VCPU entering trap handlers that lead
to these functions at the same time?  If not, then I guess pause needs
to be a counter instead of a boolean.

 VCPU1                 VCPU2                  VCPU3
 -----                 -----                  -----
                       VCPU3->pause = true;
                       halt(VCPU3);
                                              if (pause)
                                                sleep();
 VCPU3->pause = true;
 halt(VCPU3);
                       VCPU3->pause = false;
                       resume(VCPU3);
                                              ...wake up...
                                              if (!pause)
                                                Enter guest mode. Bad!
 VCPU3->pause = false;
 resume(VCPU3);

(Yes, the "Bad!" is there to both identify something we don't want
 occurring and to make fun of Trump's tweeting style.)
---
 arch/arm/kvm/arm.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 47f6c7fdca96..9174ed13135a 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -545,6 +545,7 @@ void kvm_arm_halt_guest(struct kvm *kvm)
 void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.pause = true;
+	kvm_make_request(KVM_REQ_VCPU_EXIT, vcpu);
 	kvm_vcpu_kick(vcpu);
 }
 
@@ -664,7 +665,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
 		    kvm_request_pending(vcpu) ||
-		    vcpu->arch.power_off || vcpu->arch.pause) {
+		    vcpu->arch.power_off) {
 			vcpu->mode = OUTSIDE_GUEST_MODE;
 			local_irq_enable();
 			kvm_pmu_sync_hwstate(vcpu);
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v3 05/10] KVM: arm/arm64: don't clear exit request from caller
  2017-05-03 16:06 [PATCH v3 00/10] KVM: arm/arm64: race fixes and vcpu requests Andrew Jones
                   ` (3 preceding siblings ...)
  2017-05-03 16:06 ` [PATCH v3 04/10] KVM: arm/arm64: use vcpu request in kvm_arm_halt_vcpu Andrew Jones
@ 2017-05-03 16:06 ` Andrew Jones
  2017-05-06 18:12   ` Christoffer Dall
  2017-05-03 16:06 ` [PATCH v3 06/10] KVM: arm/arm64: use vcpu requests for power_off Andrew Jones
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 43+ messages in thread
From: Andrew Jones @ 2017-05-03 16:06 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: marc.zyngier, cdall, pbonzini

VCPU requests that the receiver should handle should only be cleared
by the receiver. Not only does this properly implement the protocol,
but also avoids bugs where one VCPU clears another VCPU's request,
before the receiving VCPU has had a chance to see it.  ARM VCPUs
currently only handle one request, EXIT, and handling it is achieved
by checking pause to see if the VCPU should sleep.

Signed-off-by: Andrew Jones <drjones@redhat.com>
---
 arch/arm/kvm/arm.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 9174ed13135a..7be0d9b0c63a 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -553,7 +553,6 @@ void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu)
 {
 	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
 
-	kvm_clear_request(KVM_REQ_VCPU_EXIT, vcpu);
 	vcpu->arch.pause = false;
 	swake_up(wq);
 }
@@ -625,7 +624,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 		update_vttbr(vcpu->kvm);
 
-		if (vcpu->arch.power_off || vcpu->arch.pause)
+		if (kvm_request_pending(vcpu)) {
+			if (kvm_check_request(KVM_REQ_VCPU_EXIT, vcpu)) {
+				if (vcpu->arch.pause)
+					vcpu_sleep(vcpu);
+			}
+		}
+
+		if (vcpu->arch.power_off)
 			vcpu_sleep(vcpu);
 
 		/*
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v3 06/10] KVM: arm/arm64: use vcpu requests for power_off
  2017-05-03 16:06 [PATCH v3 00/10] KVM: arm/arm64: race fixes and vcpu requests Andrew Jones
                   ` (4 preceding siblings ...)
  2017-05-03 16:06 ` [PATCH v3 05/10] KVM: arm/arm64: don't clear exit request from caller Andrew Jones
@ 2017-05-03 16:06 ` Andrew Jones
  2017-05-06 18:17   ` Christoffer Dall
  2017-05-03 16:06 ` [PATCH v3 07/10] KVM: arm/arm64: optimize VCPU RUN Andrew Jones
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 43+ messages in thread
From: Andrew Jones @ 2017-05-03 16:06 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: marc.zyngier, cdall, pbonzini

System shutdown is currently using request-less VCPU kicks. This
leaves open a tiny race window, as it doesn't ensure the state
change to power_off is seen by a VCPU just about to enter guest
mode. VCPU requests, OTOH, are guaranteed to be seen (see "Ensuring
Requests Are Seen" of Documentation/virtual/kvm/vcpu-requests.rst)
This patch applies the EXIT request used by pause to power_off,
closing the race window and also allowing us to remove the final
check of power_off in VCPU RUN, as the final check for requests
is sufficient.

Signed-off-by: Andrew Jones <drjones@redhat.com>
---
 arch/arm/kvm/arm.c  | 3 +--
 arch/arm/kvm/psci.c | 5 ++---
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 7be0d9b0c63a..26d9d4d72853 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -670,8 +670,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		smp_store_mb(vcpu->mode, IN_GUEST_MODE);
 
 		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
-		    kvm_request_pending(vcpu) ||
-		    vcpu->arch.power_off) {
+		    kvm_request_pending(vcpu)) {
 			vcpu->mode = OUTSIDE_GUEST_MODE;
 			local_irq_enable();
 			kvm_pmu_sync_hwstate(vcpu);
diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
index f68be2cc6256..f189d0ad30d5 100644
--- a/arch/arm/kvm/psci.c
+++ b/arch/arm/kvm/psci.c
@@ -179,10 +179,9 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
 	 * after this call is handled and before the VCPUs have been
 	 * re-initialized.
 	 */
-	kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
+	kvm_for_each_vcpu(i, tmp, vcpu->kvm)
 		tmp->arch.power_off = true;
-		kvm_vcpu_kick(tmp);
-	}
+	kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_VCPU_EXIT);
 
 	memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
 	vcpu->run->system_event.type = type;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v3 07/10] KVM: arm/arm64: optimize VCPU RUN
  2017-05-03 16:06 [PATCH v3 00/10] KVM: arm/arm64: race fixes and vcpu requests Andrew Jones
                   ` (5 preceding siblings ...)
  2017-05-03 16:06 ` [PATCH v3 06/10] KVM: arm/arm64: use vcpu requests for power_off Andrew Jones
@ 2017-05-03 16:06 ` Andrew Jones
  2017-05-06 18:27   ` Christoffer Dall
  2017-05-03 16:06 ` [PATCH v3 08/10] KVM: arm/arm64: change exit request to sleep request Andrew Jones
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 43+ messages in thread
From: Andrew Jones @ 2017-05-03 16:06 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: marc.zyngier, cdall, pbonzini

We can make a small optimization by not checking the state of
the power_off field on each run. This is done by treating
power_off like pause, only checking it when we get the EXIT
VCPU request. When a VCPU powers off another VCPU the EXIT
request is already made, so we just need to make sure the
request is also made on self power off. kvm_vcpu_kick() isn't
necessary for these cases, as the VCPU would just be kicking
itself, but we add it anyway as a self kick doesn't cost much,
and it makes the code more future-proof.

Signed-off-by: Andrew Jones <drjones@redhat.com>
---
 arch/arm/kvm/arm.c  | 16 ++++++++++------
 arch/arm/kvm/psci.c |  2 ++
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 26d9d4d72853..24bbc7671d89 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -371,6 +371,13 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 	kvm_timer_vcpu_put(vcpu);
 }
 
+static void vcpu_power_off(struct kvm_vcpu *vcpu)
+{
+	vcpu->arch.power_off = true;
+	kvm_make_request(KVM_REQ_VCPU_EXIT, vcpu);
+	kvm_vcpu_kick(vcpu);
+}
+
 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
 				    struct kvm_mp_state *mp_state)
 {
@@ -390,7 +397,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
 		vcpu->arch.power_off = false;
 		break;
 	case KVM_MP_STATE_STOPPED:
-		vcpu->arch.power_off = true;
+		vcpu_power_off(vcpu);
 		break;
 	default:
 		return -EINVAL;
@@ -626,14 +633,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 		if (kvm_request_pending(vcpu)) {
 			if (kvm_check_request(KVM_REQ_VCPU_EXIT, vcpu)) {
-				if (vcpu->arch.pause)
+				if (vcpu->arch.power_off || vcpu->arch.pause)
 					vcpu_sleep(vcpu);
 			}
 		}
 
-		if (vcpu->arch.power_off)
-			vcpu_sleep(vcpu);
-
 		/*
 		 * Preparing the interrupts to be injected also
 		 * involves poking the GIC, which must be done in a
@@ -903,7 +907,7 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
 	 * Handle the "start in power-off" case.
 	 */
 	if (test_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features))
-		vcpu->arch.power_off = true;
+		vcpu_power_off(vcpu);
 	else
 		vcpu->arch.power_off = false;
 
diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
index f189d0ad30d5..4a436685c552 100644
--- a/arch/arm/kvm/psci.c
+++ b/arch/arm/kvm/psci.c
@@ -65,6 +65,8 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
 static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.power_off = true;
+	kvm_make_request(KVM_REQ_VCPU_EXIT, vcpu);
+	kvm_vcpu_kick(vcpu);
 }
 
 static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v3 08/10] KVM: arm/arm64: change exit request to sleep request
  2017-05-03 16:06 [PATCH v3 00/10] KVM: arm/arm64: race fixes and vcpu requests Andrew Jones
                   ` (6 preceding siblings ...)
  2017-05-03 16:06 ` [PATCH v3 07/10] KVM: arm/arm64: optimize VCPU RUN Andrew Jones
@ 2017-05-03 16:06 ` Andrew Jones
  2017-05-04 11:38   ` Paolo Bonzini
  2017-05-03 16:06 ` [PATCH v3 09/10] KVM: arm/arm64: use vcpu requests for irq injection Andrew Jones
  2017-05-03 16:06 ` [PATCH v3 10/10] KVM: arm/arm64: PMU: remove request-less vcpu kick Andrew Jones
  9 siblings, 1 reply; 43+ messages in thread
From: Andrew Jones @ 2017-05-03 16:06 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: marc.zyngier, cdall, pbonzini

A request called EXIT is too generic. All requests are meant to cause
exits, but different requests have different flags. Let's not make
it difficult to decide if the EXIT request is correct for some case
by just always providing unique requests for each case. This patch
changes EXIT to SLEEP, because that's what the request is asking the
VCPU to do.

We can also remove the 'if (power_off || pause)' condition because it's
likely one of them will be true when the request is pending, and the
same condition will be checked in swait_event_interruptible() before
actually going to sleep anyway.

Signed-off-by: Andrew Jones <drjones@redhat.com>
---
 arch/arm/include/asm/kvm_host.h   |  2 +-
 arch/arm/kvm/arm.c                | 14 ++++++--------
 arch/arm/kvm/psci.c               |  4 ++--
 arch/arm64/include/asm/kvm_host.h |  2 +-
 4 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index a95d8e315507..41669578b3df 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -45,7 +45,7 @@
 #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS
 #endif
 
-#define KVM_REQ_VCPU_EXIT \
+#define KVM_REQ_SLEEP \
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_NO_WAKEUP | KVM_REQUEST_WAIT)
 
 u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 24bbc7671d89..d62e99885434 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -374,7 +374,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 static void vcpu_power_off(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.power_off = true;
-	kvm_make_request(KVM_REQ_VCPU_EXIT, vcpu);
+	kvm_make_request(KVM_REQ_SLEEP, vcpu);
 	kvm_vcpu_kick(vcpu);
 }
 
@@ -546,13 +546,13 @@ void kvm_arm_halt_guest(struct kvm *kvm)
 
 	kvm_for_each_vcpu(i, vcpu, kvm)
 		vcpu->arch.pause = true;
-	kvm_make_all_cpus_request(kvm, KVM_REQ_VCPU_EXIT);
+	kvm_make_all_cpus_request(kvm, KVM_REQ_SLEEP);
 }
 
 void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.pause = true;
-	kvm_make_request(KVM_REQ_VCPU_EXIT, vcpu);
+	kvm_make_request(KVM_REQ_SLEEP, vcpu);
 	kvm_vcpu_kick(vcpu);
 }
 
@@ -573,7 +573,7 @@ void kvm_arm_resume_guest(struct kvm *kvm)
 		kvm_arm_resume_vcpu(vcpu);
 }
 
-static void vcpu_sleep(struct kvm_vcpu *vcpu)
+static void vcpu_req_sleep(struct kvm_vcpu *vcpu)
 {
 	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
 
@@ -632,10 +632,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		update_vttbr(vcpu->kvm);
 
 		if (kvm_request_pending(vcpu)) {
-			if (kvm_check_request(KVM_REQ_VCPU_EXIT, vcpu)) {
-				if (vcpu->arch.power_off || vcpu->arch.pause)
-					vcpu_sleep(vcpu);
-			}
+			if (kvm_check_request(KVM_REQ_SLEEP, vcpu))
+				vcpu_req_sleep(vcpu);
 		}
 
 		/*
diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
index 4a436685c552..f1e363bab5e8 100644
--- a/arch/arm/kvm/psci.c
+++ b/arch/arm/kvm/psci.c
@@ -65,7 +65,7 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
 static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
 {
 	vcpu->arch.power_off = true;
-	kvm_make_request(KVM_REQ_VCPU_EXIT, vcpu);
+	kvm_make_request(KVM_REQ_SLEEP, vcpu);
 	kvm_vcpu_kick(vcpu);
 }
 
@@ -183,7 +183,7 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
 	 */
 	kvm_for_each_vcpu(i, tmp, vcpu->kvm)
 		tmp->arch.power_off = true;
-	kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_VCPU_EXIT);
+	kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);
 
 	memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
 	vcpu->run->system_event.type = type;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 8631d210dde1..04c0f9d37386 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -42,7 +42,7 @@
 
 #define KVM_VCPU_MAX_FEATURES 4
 
-#define KVM_REQ_VCPU_EXIT \
+#define KVM_REQ_SLEEP \
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_NO_WAKEUP | KVM_REQUEST_WAIT)
 
 int __attribute_const__ kvm_target_cpu(void);
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v3 09/10] KVM: arm/arm64: use vcpu requests for irq injection
  2017-05-03 16:06 [PATCH v3 00/10] KVM: arm/arm64: race fixes and vcpu requests Andrew Jones
                   ` (7 preceding siblings ...)
  2017-05-03 16:06 ` [PATCH v3 08/10] KVM: arm/arm64: change exit request to sleep request Andrew Jones
@ 2017-05-03 16:06 ` Andrew Jones
  2017-05-04 11:47   ` Paolo Bonzini
  2017-05-06 18:51   ` Christoffer Dall
  2017-05-03 16:06 ` [PATCH v3 10/10] KVM: arm/arm64: PMU: remove request-less vcpu kick Andrew Jones
  9 siblings, 2 replies; 43+ messages in thread
From: Andrew Jones @ 2017-05-03 16:06 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: marc.zyngier, cdall, pbonzini

Don't use request-less VCPU kicks when injecting IRQs, as a VCPU
kick meant to trigger the interrupt injection could be sent while
the VCPU is outside guest mode, which means no IPI is sent, and
after it has called kvm_vgic_flush_hwstate(), meaning it won't see
the updated GIC state until its next exit some time later for some
other reason.  The receiving VCPU only needs to check this request
in VCPU RUN to handle it.  By checking it, if it's pending, a
memory barrier will be issued that ensures all state is visible.
We still create a vcpu_req_irq_pending() function (which is a nop),
though, in order to allow us to use the standard request checking
pattern.

Signed-off-by: Andrew Jones <drjones@redhat.com>
---
 arch/arm/include/asm/kvm_host.h   |  1 +
 arch/arm/kvm/arm.c                | 12 ++++++++++++
 arch/arm64/include/asm/kvm_host.h |  1 +
 virt/kvm/arm/arch_timer.c         |  1 +
 virt/kvm/arm/vgic/vgic.c          |  9 +++++++--
 5 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 41669578b3df..7bf90aaf2e87 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -47,6 +47,7 @@
 
 #define KVM_REQ_SLEEP \
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_NO_WAKEUP | KVM_REQUEST_WAIT)
+#define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
 
 u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
 int __attribute_const__ kvm_target_cpu(void);
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index d62e99885434..330064475914 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -581,6 +581,15 @@ static void vcpu_req_sleep(struct kvm_vcpu *vcpu)
 				       (!vcpu->arch.pause)));
 }
 
+static void vcpu_req_irq_pending(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Nothing to do here. kvm_check_request() already issued a memory
+	 * barrier that pairs with kvm_make_request(), so all hardware state
+	 * we need to flush should now be visible.
+	 */
+}
+
 static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.target >= 0;
@@ -634,6 +643,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		if (kvm_request_pending(vcpu)) {
 			if (kvm_check_request(KVM_REQ_SLEEP, vcpu))
 				vcpu_req_sleep(vcpu);
+			if (kvm_check_request(KVM_REQ_IRQ_PENDING, vcpu))
+				vcpu_req_irq_pending(vcpu);
 		}
 
 		/*
@@ -777,6 +788,7 @@ static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
 	 * trigger a world-switch round on the running physical CPU to set the
 	 * virtual IRQ/FIQ fields in the HCR appropriately.
 	 */
+	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
 	kvm_vcpu_kick(vcpu);
 
 	return 0;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 04c0f9d37386..2c33fef945fe 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -44,6 +44,7 @@
 
 #define KVM_REQ_SLEEP \
 	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_NO_WAKEUP | KVM_REQUEST_WAIT)
+#define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
 
 int __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
index 5976609ef27c..469b43315c0a 100644
--- a/virt/kvm/arm/arch_timer.c
+++ b/virt/kvm/arm/arch_timer.c
@@ -95,6 +95,7 @@ static void kvm_timer_inject_irq_work(struct work_struct *work)
 	 * If the vcpu is blocked we want to wake it up so that it will see
 	 * the timer has expired when entering the guest.
 	 */
+	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
 	kvm_vcpu_kick(vcpu);
 }
 
diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
index 3d0979c30721..bdd4b3a953b5 100644
--- a/virt/kvm/arm/vgic/vgic.c
+++ b/virt/kvm/arm/vgic/vgic.c
@@ -283,8 +283,10 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
 		 * won't see this one until it exits for some other
 		 * reason.
 		 */
-		if (vcpu)
+		if (vcpu) {
+			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
 			kvm_vcpu_kick(vcpu);
+		}
 		return false;
 	}
 
@@ -330,6 +332,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
 	spin_unlock(&irq->irq_lock);
 	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
 
+	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
 	kvm_vcpu_kick(vcpu);
 
 	return true;
@@ -719,8 +722,10 @@ void vgic_kick_vcpus(struct kvm *kvm)
 	 * a good kick...
 	 */
 	kvm_for_each_vcpu(c, vcpu, kvm) {
-		if (kvm_vgic_vcpu_pending_irq(vcpu))
+		if (kvm_vgic_vcpu_pending_irq(vcpu)) {
+			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
 			kvm_vcpu_kick(vcpu);
+		}
 	}
 }
 
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* [PATCH v3 10/10] KVM: arm/arm64: PMU: remove request-less vcpu kick
  2017-05-03 16:06 [PATCH v3 00/10] KVM: arm/arm64: race fixes and vcpu requests Andrew Jones
                   ` (8 preceding siblings ...)
  2017-05-03 16:06 ` [PATCH v3 09/10] KVM: arm/arm64: use vcpu requests for irq injection Andrew Jones
@ 2017-05-03 16:06 ` Andrew Jones
  2017-05-06 18:55   ` Christoffer Dall
  9 siblings, 1 reply; 43+ messages in thread
From: Andrew Jones @ 2017-05-03 16:06 UTC (permalink / raw)
  To: kvmarm, kvm; +Cc: marc.zyngier, cdall, pbonzini

Refactor PMU overflow handling in order to remove the request-less
vcpu kick.  Now, since kvm_vgic_inject_irq() uses vcpu requests,
there should be no chance that a kick sent at just the wrong time
(between the VCPU's call to kvm_pmu_flush_hwstate() and before it
enters guest mode) results in a failure for the guest to see updated
GIC state until its next exit some time later for some other reason.

Signed-off-by: Andrew Jones <drjones@redhat.com>
---
 virt/kvm/arm/pmu.c | 40 +++++++++++++++++++---------------------
 1 file changed, 19 insertions(+), 21 deletions(-)

diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
index 4b43e7f3b158..2451607dc25e 100644
--- a/virt/kvm/arm/pmu.c
+++ b/virt/kvm/arm/pmu.c
@@ -203,6 +203,23 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
 	return reg;
 }
 
+static void kvm_pmu_check_overflow(struct kvm_vcpu *vcpu)
+{
+	struct kvm_pmu *pmu = &vcpu->arch.pmu;
+	bool overflow = !!kvm_pmu_overflow_status(vcpu);
+
+	if (pmu->irq_level == overflow)
+		return;
+
+	pmu->irq_level = overflow;
+
+	if (likely(irqchip_in_kernel(vcpu->kvm))) {
+		int ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
+					      pmu->irq_num, overflow);
+		WARN_ON(ret);
+	}
+}
+
 /**
  * kvm_pmu_overflow_set - set PMU overflow interrupt
  * @vcpu: The vcpu pointer
@@ -210,37 +227,18 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
  */
 void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val)
 {
-	u64 reg;
-
 	if (val == 0)
 		return;
 
 	vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= val;
-	reg = kvm_pmu_overflow_status(vcpu);
-	if (reg != 0)
-		kvm_vcpu_kick(vcpu);
+	kvm_pmu_check_overflow(vcpu);
 }
 
 static void kvm_pmu_update_state(struct kvm_vcpu *vcpu)
 {
-	struct kvm_pmu *pmu = &vcpu->arch.pmu;
-	bool overflow;
-
 	if (!kvm_arm_pmu_v3_ready(vcpu))
 		return;
-
-	overflow = !!kvm_pmu_overflow_status(vcpu);
-	if (pmu->irq_level == overflow)
-		return;
-
-	pmu->irq_level = overflow;
-
-	if (likely(irqchip_in_kernel(vcpu->kvm))) {
-		int ret;
-		ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
-					  pmu->irq_num, overflow);
-		WARN_ON(ret);
-	}
+	kvm_pmu_check_overflow(vcpu);
 }
 
 bool kvm_pmu_should_notify_user(struct kvm_vcpu *vcpu)
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 02/10] KVM: Add documentation for VCPU requests
  2017-05-03 16:06 ` [PATCH v3 02/10] KVM: Add documentation for VCPU requests Andrew Jones
@ 2017-05-04 11:27   ` Paolo Bonzini
  2017-05-04 12:06     ` Andrew Jones
  0 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2017-05-04 11:27 UTC (permalink / raw)
  To: Andrew Jones, kvmarm, kvm; +Cc: marc.zyngier, cdall



On 03/05/2017 18:06, Andrew Jones wrote:
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
>  Documentation/virtual/kvm/vcpu-requests.rst | 269 ++++++++++++++++++++++++++++

I for one welcome our new reStructuredText overlords. :)

Thanks for the excellent writeup.

>  1 file changed, 269 insertions(+)
>  create mode 100644 Documentation/virtual/kvm/vcpu-requests.rst
> 
> diff --git a/Documentation/virtual/kvm/vcpu-requests.rst b/Documentation/virtual/kvm/vcpu-requests.rst
> new file mode 100644
> index 000000000000..d74616d7999a
> --- /dev/null
> +++ b/Documentation/virtual/kvm/vcpu-requests.rst
> @@ -0,0 +1,269 @@
> +=================
> +KVM VCPU Requests
> +=================
> +
> +Overview
> +========
> +
> +KVM supports an internal API enabling threads to request a VCPU thread to
> +perform some activity.  For example, a thread may request a VCPU to flush
> +its TLB with a VCPU request.  The API consists of the following functions::
> +
> +  /* Check if any requests are pending for VCPU @vcpu. */
> +  bool kvm_request_pending(struct kvm_vcpu *vcpu);
> +
> +  /* Check if VCPU @vcpu has request @req pending. */
> +  bool kvm_test_request(int req, struct kvm_vcpu *vcpu);
> +
> +  /* Clear request @req for VCPU @vcpu. */
> +  void kvm_clear_request(int req, struct kvm_vcpu *vcpu);
> +
> +  /*
> +   * Check if VCPU @vcpu has request @req pending. When the request is
> +   * pending it will be cleared and a memory barrier, which pairs with
> +   * another in kvm_make_request(), will be issued.
> +   */
> +  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);
> +
> +  /*
> +   * Make request @req of VCPU @vcpu. Issues a memory barrier, which pairs
> +   * with another in kvm_check_request(), prior to setting the request.
> +   */
> +  void kvm_make_request(int req, struct kvm_vcpu *vcpu);
> +
> +  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
> +  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
> +
> +Typically a requester wants the VCPU to perform the activity as soon
> +as possible after making the request.  This means most requests
> +(kvm_make_request() calls) are followed by a call to kvm_vcpu_kick(),
> +and kvm_make_all_cpus_request() has the kicking of all VCPUs built
> +into it.
> +
> +VCPU Kicks
> +----------
> +
> +The goal of a VCPU kick is to bring a VCPU thread out of guest mode in
> +order to perform some KVM maintenance.  To do so, an IPI is sent, forcing
> +a guest mode exit.  However, a VCPU thread may not be in guest mode at the
> +time of the kick.  Therefore, depending on the mode and state of the VCPU
> +thread, there are two other actions a kick may take.  All three actions
> +are listed below:
> +
> +1) Send an IPI.  This forces a guest mode exit.
> +2) Waking a sleeping VCPU.  Sleeping VCPUs are VCPU threads outside guest
> +   mode that wait on waitqueues.  Waking them removes the threads from
> +   the waitqueues, allowing the threads to run again.  This behavior
> +   may be suppressed, see KVM_REQUEST_NO_WAKEUP below.
> +3) Nothing.  When the VCPU is not in guest mode and the VCPU thread is not
> +   sleeping, then there is nothing to do.
> +
> +VCPU Mode
> +---------
> +
> +VCPUs have a mode state, vcpu->mode, that is used to track whether the
> +guest is running in guest mode or not, as well as some specific
> +outside guest mode states.  The architecture may use vcpu->mode to ensure
> +VCPU requests are seen by VCPUs (see "Ensuring Requests Are Seen"), as
> +well as to avoid sending unnecessary IPIs (see "IPI Reduction"), and even
> +to ensure IPI acknowledgements are waited upon (see "Waiting for
> +Acknowledgements").  The following modes are defined:
> +
> +OUTSIDE_GUEST_MODE
> +
> +  The VCPU thread is outside guest mode.
> +
> +IN_GUEST_MODE
> +
> +  The VCPU thread is in guest mode.
> +
> +EXITING_GUEST_MODE
> +
> +  The VCPU thread is transitioning from IN_GUEST_MODE to
> +  OUTSIDE_GUEST_MODE.
> +
> +READING_SHADOW_PAGE_TABLES
> +
> +  The VCPU thread is outside guest mode and wants certain VCPU requests,
> +  namely KVM_REQ_TLB_FLUSH, to be delayed until it's done reading the
> +  page tables.

... but it wants the sender of certain VCPU requests, namely
KVM_REQ_TLB_FLUSH to wait until the VCPU thread is done reading the page
tables.

> +VCPU Request Internals
> +======================
> +
> +VCPU requests are simply bit indices of the vcpu->requests bitmap.  This
> +means general bitops, like those documented in [atomic-ops]_ could also be
> +used, e.g. ::
> +
> +  clear_bit(KVM_REQ_UNHALT & KVM_REQUEST_MASK, &vcpu->requests);
> +
> +However, VCPU request users should refrain from doing so, as it would
> +break the abstraction.  The first 8 bits are reserved for architecture
> +independent requests, all additional bits are available for architecture
> +dependent requests.
> +
> +Architecture Independent Requests
> +---------------------------------
> +
> +KVM_REQ_TLB_FLUSH
> +
> +  KVM's common MMU notifier may need to flush all of a guest's TLB
> +  entries, calling kvm_flush_remote_tlbs() to do so.  Architectures that
> +  choose to use the common kvm_flush_remote_tlbs() implementation will
> +  need to handle this VCPU request.
> +
> +KVM_REQ_MMU_RELOAD
> +
> +  When shadow page tables are used and memory slots are removed it's
> +  necessary to inform each VCPU to completely refresh the tables.  This
> +  request is used for that.
> +
> +KVM_REQ_PENDING_TIMER
> +
> +  This request may be made from a timer handler run on the host on behalf
> +  of a VCPU.  It informs the VCPU thread to inject a timer interrupt.
> +
> +KVM_REQ_UNHALT
> +
> +  This request may be made from the KVM common function kvm_vcpu_block(),
> +  which is used to emulate an instruction that causes a CPU to halt until
> +  one of an architectural specific set of events and/or interrupts is
> +  received (determined by checking kvm_arch_vcpu_runnable()).  When that
> +  event or interrupt arrives kvm_vcpu_block() makes the request.  This is
> +  in contrast to when kvm_vcpu_block() returns due to any other reason,
> +  such as a pending signal, which does not indicate the VCPU's halt
> +  emulation should stop, and therefore does not make the request.
> +
> +KVM_REQUEST_MASK
> +----------------
> +
> +VCPU requests should be masked by KVM_REQUEST_MASK before using them with
> +bitops.  This is because only the lower 8 bits are used to represent the
> +request's number.  The upper bits are reserved, and may be used as flags.

The upper bits are used as flags.  Currently only two flags are defined.

> +VCPU Request Flags
> +------------------
> +
> +KVM_REQUEST_NO_WAKEUP
> +
> +  This flag is applied to a request that does not need immediate
> +  attention.  When a request does not need immediate attention, and the
> +  VCPU's thread is outside guest mode sleeping, then the thread is not
> +  awaken by a kick.
> +
> +KVM_REQUEST_WAIT
> +
> +  When requests with this flag are made with kvm_make_all_cpus_request(),
> +  then the caller will wait for each VCPU to acknowledge the IPI before
> +  proceeding.
> +
> +VCPU Requests with Associated State
> +===================================
> +
> +Requesters that want the receiving VCPU to handle new state need to ensure
> +the newly written state is observable to the receiving VCPU thread's CPU
> +by the time it observes the request.  This means a write memory barrier
> +must be inserted after writing the new state and before setting the VCPU
> +request bit.  Additionally, on the receiving VCPU thread's side, a
> +corresponding read barrier must be inserted after reading the request bit
> +and before proceeding to read the new state associated with it.  See
> +scenario 3, Message and Flag, of [lwn-mb]_ and the kernel documentation
> +[memory-barriers]_.
> +
> +The pair of functions, kvm_check_request() and kvm_make_request(), provide
> +the memory barriers, allowing this requirement to be handled internally by
> +the API.
> +
> +Ensuring Requests Are Seen
> +==========================
> +
> +When making requests to VCPUs, we want to avoid the receiving VCPU
> +executing in guest mode for an arbitrary long time without handling the
> +request.  We can be sure this won't happen as long as we ensure the VCPU
> +thread checks kvm_request_pending() before entering guest mode and that a
> +kick will send an IPI when necessary.  Extra care must be taken to cover
> +the period after the VCPU thread's last kvm_request_pending() check and
> +before it has entered guest mode, as kick IPIs will only trigger VCPU run
> +loops for VCPU threads that are in guest mode or at least have already
> +disabled interrupts in order to prepare to enter guest mode.  This means
> +that an optimized implementation (see "IPI Reduction") must be certain
> +when it's safe to not send the IPI.  One solution, which all architectures
> +except s390 apply, is to set vcpu->mode to IN_GUEST_MODE prior to the last
> +kvm_request_pending() check and to rely on memory barrier guarantees.

is to:

- set vcpu->mode to IN_GUEST_MODE between disabling the interrupts and
the last kvm_request_pending() check;

- enable interrupts atomically when entering the guest.

Then at the beginning of the next paragraph: "This solution also
requires memory barriers to be placed carefully in both the sender of
the IPI and the VCPU thread."

Should vcpu->mode and IN_GUEST_MODE use monospaced font?  Likewise
elsewhere in the document.


> +With memory barriers we can exclude the possibility of a VCPU thread
> +observing !kvm_request_pending() on its last check and then not receiving
> +an IPI for the next request made of it, even if the request is made
> +immediately after the check.  This is done by way of the Dekker memory
> +barrier pattern (scenario 10 of [lwn-mb]_).  As the Dekker pattern
> +requires two variables, this solution pairs vcpu->mode with
> +vcpu->requests.  Substituting them into the pattern gives::
> +
> +  CPU1                                    CPU2
> +  =================                       =================
> +  local_irq_disable();
> +  WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);  kvm_make_request(REQ, vcpu);
> +  smp_mb();                               smp_mb();
> +  if (kvm_request_pending(vcpu)) {        if (READ_ONCE(vcpu->mode) ==
> +                                              IN_GUEST_MODE) {
> +      ...abort guest entry...                 ...send IPI...
> +  }                                       }
> +
> +As stated above, the IPI is only useful for VCPU threads in guest mode or
> +that have already disabled interrupts.  This is why this specific case of
> +the Dekker pattern has been extended to disable interrupts before setting
> +vcpu->mode to IN_GUEST_MODE.  WRITE_ONCE() and READ_ONCE() are used to
> +pedantically implement the memory barrier pattern, guaranteeing the
> +compiler doesn't interfere with vcpu->mode's carefully planned accesses.
> +
> +IPI Reduction
> +-------------
> +
> +As only one IPI is needed to get a VCPU to check for any/all requests,
> +then they may be coalesced.  This is easily done by having the first IPI
> +sending kick also change the VCPU mode to something !IN_GUEST_MODE.  The
> +transitional state, EXITING_GUEST_MODE, is used for this purpose.
> +
> +Waiting for Acknowledgements
> +----------------------------
> +
> +Some requests, those with the KVM_REQUEST_WAIT flag set, require IPIs to
> +be sent, and the acknowledgements to be waited upon, even when the target
> +VCPU threads are in modes other than IN_GUEST_MODE.  For example, one case
> +is when a target VCPU thread is in READING_SHADOW_PAGE_TABLES mode, which
> +is set after disabling interrupts.  For these cases, the "should send an
> +IPI" condition becomes READ_ONCE(vcpu->mode) != OUTSIDE_GUEST_MODE.
> +
> +Request-less VCPU Kicks
> +-----------------------
> +
> +As the determination of whether or not to send an IPI depends on the
> +two-variable Dekker memory barrier pattern, then it's clear that
> +request-less VCPU kicks are almost never correct.  Without the assurance
> +that a non-IPI generating kick will still result in an action by the
> +receiving VCPU, as the final kvm_request_pending() check does for
> +request-accompanying kicks, then the kick may not do anything useful at
> +all.  If, for instance, a request-less kick was made to a VCPU that was
> +just about to set its mode to IN_GUEST_MODE, meaning no IPI is sent, then
> +the VCPU thread may continue its entry without actually having done
> +whatever it was the kick was meant to initiate.

One exception is x86's posted interrupt mechanism.  In this case,
however, even the request-less VCPU kick is coupled with the same
local_irq_disable()+smp_mb() pattern described above; the ON bit
(Outstanding Notification) in the posted interrupt descriptor takes the
role of vcpu->requests.  When sending a posted interrupt, PIR.ON is set
before reading vcpu->mode; dually, in the VCPU thread,
vmx_sync_pir_to_irr reads PIR after setting vcpu->mode to IN_GUEST_MODE.

> +Additional Considerations
> +=========================
> +
> +Sleeping VCPUs
> +--------------
> +
> +VCPU threads may need to consider requests before and/or after calling
> +functions that may put them to sleep, e.g. kvm_vcpu_block().  Whether they
> +do or not, and, if they do, which requests need consideration, is
> +architecture dependent.  kvm_vcpu_block() calls kvm_arch_vcpu_runnable()
> +to check if it should awaken.  One reason to do so is to provide
> +architectures a function where requests may be checked if necessary.

What did you have in mind here?

Paolo

> +References
> +==========
> +
> +.. [atomic-ops] Documentation/core-api/atomic_ops.rst
> +.. [memory-barriers] Documentation/memory-barriers.txt
> +.. [lwn-mb] https://lwn.net/Articles/573436/
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 08/10] KVM: arm/arm64: change exit request to sleep request
  2017-05-03 16:06 ` [PATCH v3 08/10] KVM: arm/arm64: change exit request to sleep request Andrew Jones
@ 2017-05-04 11:38   ` Paolo Bonzini
  2017-05-04 12:07     ` Andrew Jones
  0 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2017-05-04 11:38 UTC (permalink / raw)
  To: Andrew Jones, kvmarm, kvm; +Cc: cdall, marc.zyngier, rkrcmar



On 03/05/2017 18:06, Andrew Jones wrote:
> -#define KVM_REQ_VCPU_EXIT \
> +#define KVM_REQ_SLEEP \
>  	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_NO_WAKEUP | KVM_REQUEST_WAIT)


Note that this is still like this in kvm/queue:

#define KVM_REQ_VCPU_EXIT       (8 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)

but I did like the KVM_ARCH_REQ_FLAGS of Radim's series (just not
KVM_REQUEST_NO_WAKEUP_WAIT or whatever it was...).

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 09/10] KVM: arm/arm64: use vcpu requests for irq injection
  2017-05-03 16:06 ` [PATCH v3 09/10] KVM: arm/arm64: use vcpu requests for irq injection Andrew Jones
@ 2017-05-04 11:47   ` Paolo Bonzini
  2017-05-06 18:49     ` Christoffer Dall
  2017-05-06 18:51   ` Christoffer Dall
  1 sibling, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2017-05-04 11:47 UTC (permalink / raw)
  To: Andrew Jones, kvmarm, kvm; +Cc: cdall, marc.zyngier, rkrcmar



On 03/05/2017 18:06, Andrew Jones wrote:
> Don't use request-less VCPU kicks when injecting IRQs, as a VCPU
> kick meant to trigger the interrupt injection could be sent while
> the VCPU is outside guest mode, which means no IPI is sent, and
> after it has called kvm_vgic_flush_hwstate(), meaning it won't see
> the updated GIC state until its next exit some time later for some
> other reason.  The receiving VCPU only needs to check this request
> in VCPU RUN to handle it.  By checking it, if it's pending, a
> memory barrier will be issued that ensures all state is visible.
> We still create a vcpu_req_irq_pending() function (which is a nop),
> though, in order to allow us to use the standard request checking
> pattern.

I wonder if you aren't just papering over this race:

        /*
         * If there are no virtual interrupts active or pending for this
         * VCPU, then there is no work to do and we can bail out without
         * taking any lock.  There is a potential race with someone injecting
         * interrupts to the VCPU, but it is a benign race as the VCPU will
         * either observe the new interrupt before or after doing this check,
         * and introducing additional synchronization mechanism doesn't change
         * this.
         */
        if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
                return;

        spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock);
        vgic_flush_lr_state(vcpu);
        spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);

not being so "benign" after all. :)  Maybe you can remove the if (list_empty()),
and have kvm_arch_vcpu_ioctl_run do this instead:

 		if (kvm_request_pending(vcpu)) {
 			if (kvm_check_request(KVM_REQ_SLEEP, vcpu))
 				vcpu_req_sleep(vcpu);
		}

                preempt_disable();

                kvm_pmu_flush_hwstate(vcpu);
                kvm_timer_flush_hwstate(vcpu);

		if (kvm_check_request(KVM_REQ_IRQ_PENDING, vcpu))
			kvm_vgic_flush_hwstate(vcpu);

?

Paolo

> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
>  arch/arm/include/asm/kvm_host.h   |  1 +
>  arch/arm/kvm/arm.c                | 12 ++++++++++++
>  arch/arm64/include/asm/kvm_host.h |  1 +
>  virt/kvm/arm/arch_timer.c         |  1 +
>  virt/kvm/arm/vgic/vgic.c          |  9 +++++++--
>  5 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 41669578b3df..7bf90aaf2e87 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -47,6 +47,7 @@
>  
>  #define KVM_REQ_SLEEP \
>  	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_NO_WAKEUP | KVM_REQUEST_WAIT)
> +#define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
>  
>  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
>  int __attribute_const__ kvm_target_cpu(void);
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index d62e99885434..330064475914 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -581,6 +581,15 @@ static void vcpu_req_sleep(struct kvm_vcpu *vcpu)
>  				       (!vcpu->arch.pause)));
>  }
>  
> +static void vcpu_req_irq_pending(struct kvm_vcpu *vcpu)
> +{
> +	/*
> +	 * Nothing to do here. kvm_check_request() already issued a memory
> +	 * barrier that pairs with kvm_make_request(), so all hardware state
> +	 * we need to flush should now be visible.
> +	 */
> +}
> +
>  static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
>  {
>  	return vcpu->arch.target >= 0;
> @@ -634,6 +643,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		if (kvm_request_pending(vcpu)) {
>  			if (kvm_check_request(KVM_REQ_SLEEP, vcpu))
>  				vcpu_req_sleep(vcpu);
> +			if (kvm_check_request(KVM_REQ_IRQ_PENDING, vcpu))
> +				vcpu_req_irq_pending(vcpu);
>  		}
>  
>  		/*
> @@ -777,6 +788,7 @@ static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
>  	 * trigger a world-switch round on the running physical CPU to set the
>  	 * virtual IRQ/FIQ fields in the HCR appropriately.
>  	 */
> +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>  	kvm_vcpu_kick(vcpu);
>  
>  	return 0;
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 04c0f9d37386..2c33fef945fe 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -44,6 +44,7 @@
>  
>  #define KVM_REQ_SLEEP \
>  	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_NO_WAKEUP | KVM_REQUEST_WAIT)
> +#define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
>  
>  int __attribute_const__ kvm_target_cpu(void);
>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 5976609ef27c..469b43315c0a 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -95,6 +95,7 @@ static void kvm_timer_inject_irq_work(struct work_struct *work)
>  	 * If the vcpu is blocked we want to wake it up so that it will see
>  	 * the timer has expired when entering the guest.
>  	 */
> +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>  	kvm_vcpu_kick(vcpu);
>  }
>  
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index 3d0979c30721..bdd4b3a953b5 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -283,8 +283,10 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
>  		 * won't see this one until it exits for some other
>  		 * reason.
>  		 */
> -		if (vcpu)
> +		if (vcpu) {
> +			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>  			kvm_vcpu_kick(vcpu);
> +		}
>  		return false;
>  	}
>  
> @@ -330,6 +332,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
>  	spin_unlock(&irq->irq_lock);
>  	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
>  
> +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>  	kvm_vcpu_kick(vcpu);
>  
>  	return true;
> @@ -719,8 +722,10 @@ void vgic_kick_vcpus(struct kvm *kvm)
>  	 * a good kick...
>  	 */
>  	kvm_for_each_vcpu(c, vcpu, kvm) {
> -		if (kvm_vgic_vcpu_pending_irq(vcpu))
> +		if (kvm_vgic_vcpu_pending_irq(vcpu)) {
> +			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>  			kvm_vcpu_kick(vcpu);
> +		}
>  	}
>  }
>  
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 02/10] KVM: Add documentation for VCPU requests
  2017-05-04 11:27   ` Paolo Bonzini
@ 2017-05-04 12:06     ` Andrew Jones
  2017-05-04 12:51       ` Paolo Bonzini
  0 siblings, 1 reply; 43+ messages in thread
From: Andrew Jones @ 2017-05-04 12:06 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: marc.zyngier, cdall, kvmarm, kvm

On Thu, May 04, 2017 at 01:27:35PM +0200, Paolo Bonzini wrote:
> 
> 
> On 03/05/2017 18:06, Andrew Jones wrote:
> > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > ---
> >  Documentation/virtual/kvm/vcpu-requests.rst | 269 ++++++++++++++++++++++++++++
> 
> I for one welcome our new reStructuredText overlords. :)
> 
> Thanks for the excellent writeup.
> 
> >  1 file changed, 269 insertions(+)
> >  create mode 100644 Documentation/virtual/kvm/vcpu-requests.rst
> > 
> > diff --git a/Documentation/virtual/kvm/vcpu-requests.rst b/Documentation/virtual/kvm/vcpu-requests.rst
> > new file mode 100644
> > index 000000000000..d74616d7999a
> > --- /dev/null
> > +++ b/Documentation/virtual/kvm/vcpu-requests.rst
> > @@ -0,0 +1,269 @@
> > +=================
> > +KVM VCPU Requests
> > +=================
> > +
> > +Overview
> > +========
> > +
> > +KVM supports an internal API enabling threads to request a VCPU thread to
> > +perform some activity.  For example, a thread may request a VCPU to flush
> > +its TLB with a VCPU request.  The API consists of the following functions::
> > +
> > +  /* Check if any requests are pending for VCPU @vcpu. */
> > +  bool kvm_request_pending(struct kvm_vcpu *vcpu);
> > +
> > +  /* Check if VCPU @vcpu has request @req pending. */
> > +  bool kvm_test_request(int req, struct kvm_vcpu *vcpu);
> > +
> > +  /* Clear request @req for VCPU @vcpu. */
> > +  void kvm_clear_request(int req, struct kvm_vcpu *vcpu);
> > +
> > +  /*
> > +   * Check if VCPU @vcpu has request @req pending. When the request is
> > +   * pending it will be cleared and a memory barrier, which pairs with
> > +   * another in kvm_make_request(), will be issued.
> > +   */
> > +  bool kvm_check_request(int req, struct kvm_vcpu *vcpu);
> > +
> > +  /*
> > +   * Make request @req of VCPU @vcpu. Issues a memory barrier, which pairs
> > +   * with another in kvm_check_request(), prior to setting the request.
> > +   */
> > +  void kvm_make_request(int req, struct kvm_vcpu *vcpu);
> > +
> > +  /* Make request @req of all VCPUs of the VM with struct kvm @kvm. */
> > +  bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
> > +
> > +Typically a requester wants the VCPU to perform the activity as soon
> > +as possible after making the request.  This means most requests
> > +(kvm_make_request() calls) are followed by a call to kvm_vcpu_kick(),
> > +and kvm_make_all_cpus_request() has the kicking of all VCPUs built
> > +into it.
> > +
> > +VCPU Kicks
> > +----------
> > +
> > +The goal of a VCPU kick is to bring a VCPU thread out of guest mode in
> > +order to perform some KVM maintenance.  To do so, an IPI is sent, forcing
> > +a guest mode exit.  However, a VCPU thread may not be in guest mode at the
> > +time of the kick.  Therefore, depending on the mode and state of the VCPU
> > +thread, there are two other actions a kick may take.  All three actions
> > +are listed below:
> > +
> > +1) Send an IPI.  This forces a guest mode exit.
> > +2) Waking a sleeping VCPU.  Sleeping VCPUs are VCPU threads outside guest
> > +   mode that wait on waitqueues.  Waking them removes the threads from
> > +   the waitqueues, allowing the threads to run again.  This behavior
> > +   may be suppressed, see KVM_REQUEST_NO_WAKEUP below.
> > +3) Nothing.  When the VCPU is not in guest mode and the VCPU thread is not
> > +   sleeping, then there is nothing to do.
> > +
> > +VCPU Mode
> > +---------
> > +
> > +VCPUs have a mode state, vcpu->mode, that is used to track whether the
> > +guest is running in guest mode or not, as well as some specific
> > +outside guest mode states.  The architecture may use vcpu->mode to ensure
> > +VCPU requests are seen by VCPUs (see "Ensuring Requests Are Seen"), as
> > +well as to avoid sending unnecessary IPIs (see "IPI Reduction"), and even
> > +to ensure IPI acknowledgements are waited upon (see "Waiting for
> > +Acknowledgements").  The following modes are defined:
> > +
> > +OUTSIDE_GUEST_MODE
> > +
> > +  The VCPU thread is outside guest mode.
> > +
> > +IN_GUEST_MODE
> > +
> > +  The VCPU thread is in guest mode.
> > +
> > +EXITING_GUEST_MODE
> > +
> > +  The VCPU thread is transitioning from IN_GUEST_MODE to
> > +  OUTSIDE_GUEST_MODE.
> > +
> > +READING_SHADOW_PAGE_TABLES
> > +
> > +  The VCPU thread is outside guest mode and wants certain VCPU requests,
> > +  namely KVM_REQ_TLB_FLUSH, to be delayed until it's done reading the
> > +  page tables.
> 
> ... but it wants the sender of certain VCPU requests, namely
> KVM_REQ_TLB_FLUSH to wait until the VCPU thread is done reading the page
> tables.

fixed

> 
> > +VCPU Request Internals
> > +======================
> > +
> > +VCPU requests are simply bit indices of the vcpu->requests bitmap.  This
> > +means general bitops, like those documented in [atomic-ops]_ could also be
> > +used, e.g. ::
> > +
> > +  clear_bit(KVM_REQ_UNHALT & KVM_REQUEST_MASK, &vcpu->requests);
> > +
> > +However, VCPU request users should refrain from doing so, as it would
> > +break the abstraction.  The first 8 bits are reserved for architecture
> > +independent requests, all additional bits are available for architecture
> > +dependent requests.
> > +
> > +Architecture Independent Requests
> > +---------------------------------
> > +
> > +KVM_REQ_TLB_FLUSH
> > +
> > +  KVM's common MMU notifier may need to flush all of a guest's TLB
> > +  entries, calling kvm_flush_remote_tlbs() to do so.  Architectures that
> > +  choose to use the common kvm_flush_remote_tlbs() implementation will
> > +  need to handle this VCPU request.
> > +
> > +KVM_REQ_MMU_RELOAD
> > +
> > +  When shadow page tables are used and memory slots are removed it's
> > +  necessary to inform each VCPU to completely refresh the tables.  This
> > +  request is used for that.
> > +
> > +KVM_REQ_PENDING_TIMER
> > +
> > +  This request may be made from a timer handler run on the host on behalf
> > +  of a VCPU.  It informs the VCPU thread to inject a timer interrupt.
> > +
> > +KVM_REQ_UNHALT
> > +
> > +  This request may be made from the KVM common function kvm_vcpu_block(),
> > +  which is used to emulate an instruction that causes a CPU to halt until
> > +  one of an architectural specific set of events and/or interrupts is
> > +  received (determined by checking kvm_arch_vcpu_runnable()).  When that
> > +  event or interrupt arrives kvm_vcpu_block() makes the request.  This is
> > +  in contrast to when kvm_vcpu_block() returns due to any other reason,
> > +  such as a pending signal, which does not indicate the VCPU's halt
> > +  emulation should stop, and therefore does not make the request.
> > +
> > +KVM_REQUEST_MASK
> > +----------------
> > +
> > +VCPU requests should be masked by KVM_REQUEST_MASK before using them with
> > +bitops.  This is because only the lower 8 bits are used to represent the
> > +request's number.  The upper bits are reserved, and may be used as flags.
> 
> The upper bits are used as flags.  Currently only two flags are defined.

fixed

> 
> > +VCPU Request Flags
> > +------------------
> > +
> > +KVM_REQUEST_NO_WAKEUP
> > +
> > +  This flag is applied to a request that does not need immediate
> > +  attention.  When a request does not need immediate attention, and the
> > +  VCPU's thread is outside guest mode sleeping, then the thread is not
> > +  awaken by a kick.
> > +
> > +KVM_REQUEST_WAIT
> > +
> > +  When requests with this flag are made with kvm_make_all_cpus_request(),
> > +  then the caller will wait for each VCPU to acknowledge the IPI before
> > +  proceeding.
> > +
> > +VCPU Requests with Associated State
> > +===================================
> > +
> > +Requesters that want the receiving VCPU to handle new state need to ensure
> > +the newly written state is observable to the receiving VCPU thread's CPU
> > +by the time it observes the request.  This means a write memory barrier
> > +must be inserted after writing the new state and before setting the VCPU
> > +request bit.  Additionally, on the receiving VCPU thread's side, a
> > +corresponding read barrier must be inserted after reading the request bit
> > +and before proceeding to read the new state associated with it.  See
> > +scenario 3, Message and Flag, of [lwn-mb]_ and the kernel documentation
> > +[memory-barriers]_.
> > +
> > +The pair of functions, kvm_check_request() and kvm_make_request(), provide
> > +the memory barriers, allowing this requirement to be handled internally by
> > +the API.
> > +
> > +Ensuring Requests Are Seen
> > +==========================
> > +
> > +When making requests to VCPUs, we want to avoid the receiving VCPU
> > +executing in guest mode for an arbitrary long time without handling the
> > +request.  We can be sure this won't happen as long as we ensure the VCPU
> > +thread checks kvm_request_pending() before entering guest mode and that a
> > +kick will send an IPI when necessary.  Extra care must be taken to cover
> > +the period after the VCPU thread's last kvm_request_pending() check and
> > +before it has entered guest mode, as kick IPIs will only trigger VCPU run
> > +loops for VCPU threads that are in guest mode or at least have already
> > +disabled interrupts in order to prepare to enter guest mode.  This means
> > +that an optimized implementation (see "IPI Reduction") must be certain
> > +when it's safe to not send the IPI.  One solution, which all architectures
> > +except s390 apply, is to set vcpu->mode to IN_GUEST_MODE prior to the last
> > +kvm_request_pending() check and to rely on memory barrier guarantees.
> 
> is to:
> 
> - set vcpu->mode to IN_GUEST_MODE between disabling the interrupts and
> the last kvm_request_pending() check;
> 
> - enable interrupts atomically when entering the guest.
> 
> Then at the beginning of the next paragraph: "This solution also
> requires memory barriers to be placed carefully in both the sender of
> the IPI and the VCPU thread."

fixed

> 
> Should vcpu->mode and IN_GUEST_MODE use monospaced font?  Likewise
> elsewhere in the document.

I thought about that, but all the `` that would entail might make the
text viewing a bit too ugly. I like rst because, when its features are
minimally used, it looks nice even as raw text, especially with vim's
syntax highlighting. Anyway, I'll quick add the `` to see how it looks,
but if it's too much, then I guess I'll drop them again.

> 
> 
> > +With memory barriers we can exclude the possibility of a VCPU thread
> > +observing !kvm_request_pending() on its last check and then not receiving
> > +an IPI for the next request made of it, even if the request is made
> > +immediately after the check.  This is done by way of the Dekker memory
> > +barrier pattern (scenario 10 of [lwn-mb]_).  As the Dekker pattern
> > +requires two variables, this solution pairs vcpu->mode with
> > +vcpu->requests.  Substituting them into the pattern gives::
> > +
> > +  CPU1                                    CPU2
> > +  =================                       =================
> > +  local_irq_disable();
> > +  WRITE_ONCE(vcpu->mode, IN_GUEST_MODE);  kvm_make_request(REQ, vcpu);
> > +  smp_mb();                               smp_mb();
> > +  if (kvm_request_pending(vcpu)) {        if (READ_ONCE(vcpu->mode) ==
> > +                                              IN_GUEST_MODE) {
> > +      ...abort guest entry...                 ...send IPI...
> > +  }                                       }
> > +
> > +As stated above, the IPI is only useful for VCPU threads in guest mode or
> > +that have already disabled interrupts.  This is why this specific case of
> > +the Dekker pattern has been extended to disable interrupts before setting
> > +vcpu->mode to IN_GUEST_MODE.  WRITE_ONCE() and READ_ONCE() are used to
> > +pedantically implement the memory barrier pattern, guaranteeing the
> > +compiler doesn't interfere with vcpu->mode's carefully planned accesses.
> > +
> > +IPI Reduction
> > +-------------
> > +
> > +As only one IPI is needed to get a VCPU to check for any/all requests,
> > +then they may be coalesced.  This is easily done by having the first IPI
> > +sending kick also change the VCPU mode to something !IN_GUEST_MODE.  The
> > +transitional state, EXITING_GUEST_MODE, is used for this purpose.
> > +
> > +Waiting for Acknowledgements
> > +----------------------------
> > +
> > +Some requests, those with the KVM_REQUEST_WAIT flag set, require IPIs to
> > +be sent, and the acknowledgements to be waited upon, even when the target
> > +VCPU threads are in modes other than IN_GUEST_MODE.  For example, one case
> > +is when a target VCPU thread is in READING_SHADOW_PAGE_TABLES mode, which
> > +is set after disabling interrupts.  For these cases, the "should send an
> > +IPI" condition becomes READ_ONCE(vcpu->mode) != OUTSIDE_GUEST_MODE.
> > +
> > +Request-less VCPU Kicks
> > +-----------------------
> > +
> > +As the determination of whether or not to send an IPI depends on the
> > +two-variable Dekker memory barrier pattern, then it's clear that
> > +request-less VCPU kicks are almost never correct.  Without the assurance
> > +that a non-IPI generating kick will still result in an action by the
> > +receiving VCPU, as the final kvm_request_pending() check does for
> > +request-accompanying kicks, then the kick may not do anything useful at
> > +all.  If, for instance, a request-less kick was made to a VCPU that was
> > +just about to set its mode to IN_GUEST_MODE, meaning no IPI is sent, then
> > +the VCPU thread may continue its entry without actually having done
> > +whatever it was the kick was meant to initiate.
> 
> One exception is x86's posted interrupt mechanism.  In this case,
> however, even the request-less VCPU kick is coupled with the same
> local_irq_disable()+smp_mb() pattern described above; the ON bit
> (Outstanding Notification) in the posted interrupt descriptor takes the
> role of vcpu->requests.  When sending a posted interrupt, PIR.ON is set
> before reading vcpu->mode; dually, in the VCPU thread,
> vmx_sync_pir_to_irr reads PIR after setting vcpu->mode to IN_GUEST_MODE.

I'll add this paragraph you wrote.

> 
> > +Additional Considerations
> > +=========================
> > +
> > +Sleeping VCPUs
> > +--------------
> > +
> > +VCPU threads may need to consider requests before and/or after calling
> > +functions that may put them to sleep, e.g. kvm_vcpu_block().  Whether they
> > +do or not, and, if they do, which requests need consideration, is
> > +architecture dependent.  kvm_vcpu_block() calls kvm_arch_vcpu_runnable()
> > +to check if it should awaken.  One reason to do so is to provide
> > +architectures a function where requests may be checked if necessary.
> 
> What did you have in mind here?

I was trying to point out vcpu request concerns with respect to sleeping
vcpus, but to stay as general as possible. I can't really think of
anything else to say here, other than to give some hypothetical example.
For a while I was thinking I might check requests (kvm_request_pending())
from the kvm_arch_vcpu_runnable() call for ARM, but then changed my mind
on that - leaving it only checking the pause and power_off booleans.
Anyway, I don't think the above paragraph is "wrong", but if it's
confusing then I can change / remove it as people like. Just let me know
how you'd like it changed :-)

Thanks,
drew

> 
> Paolo
> 
> > +References
> > +==========
> > +
> > +.. [atomic-ops] Documentation/core-api/atomic_ops.rst
> > +.. [memory-barriers] Documentation/memory-barriers.txt
> > +.. [lwn-mb] https://lwn.net/Articles/573436/
> > 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 08/10] KVM: arm/arm64: change exit request to sleep request
  2017-05-04 11:38   ` Paolo Bonzini
@ 2017-05-04 12:07     ` Andrew Jones
  0 siblings, 0 replies; 43+ messages in thread
From: Andrew Jones @ 2017-05-04 12:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: marc.zyngier, cdall, kvmarm, kvm

On Thu, May 04, 2017 at 01:38:13PM +0200, Paolo Bonzini wrote:
> 
> 
> On 03/05/2017 18:06, Andrew Jones wrote:
> > -#define KVM_REQ_VCPU_EXIT \
> > +#define KVM_REQ_SLEEP \
> >  	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_NO_WAKEUP | KVM_REQUEST_WAIT)
> 
> 
> Note that this is still like this in kvm/queue:
> 
> #define KVM_REQ_VCPU_EXIT       (8 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> 
> but I did like the KVM_ARCH_REQ_FLAGS of Radim's series (just not
> KVM_REQUEST_NO_WAKEUP_WAIT or whatever it was...).

Yeah, I forgot to mention in the cover letter that I just picked my
favorite way for Radim to resolve the KVM_REQUEST_NO_WAKEUP_WAIT thing
for my base. I figured if I picked wrong that the conflict resolution
would be trivial enough that it didn't matter anyway.

Thanks,
drew

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 02/10] KVM: Add documentation for VCPU requests
  2017-05-04 12:06     ` Andrew Jones
@ 2017-05-04 12:51       ` Paolo Bonzini
  2017-05-04 13:31         ` Andrew Jones
  0 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2017-05-04 12:51 UTC (permalink / raw)
  To: Andrew Jones; +Cc: marc.zyngier, cdall, kvmarm, kvm



On 04/05/2017 14:06, Andrew Jones wrote:
>>> +VCPU threads may need to consider requests before and/or after calling
>>> +functions that may put them to sleep, e.g. kvm_vcpu_block().  Whether they
>>> +do or not, and, if they do, which requests need consideration, is
>>> +architecture dependent.  kvm_vcpu_block() calls kvm_arch_vcpu_runnable()
>>> +to check if it should awaken.  One reason to do so is to provide
>>> +architectures a function where requests may be checked if necessary.
>> What did you have in mind here?
> I was trying to point out vcpu request concerns with respect to sleeping
> vcpus, but to stay as general as possible. I can't really think of
> anything else to say here, other than to give some hypothetical example.
> For a while I was thinking I might check requests (kvm_request_pending())
> from the kvm_arch_vcpu_runnable() call for ARM, but then changed my mind
> on that - leaving it only checking the pause and power_off booleans.
> Anyway, I don't think the above paragraph is "wrong", but if it's
> confusing then I can change / remove it as people like. Just let me know
> how you'd like it changed :-)

I think the x86 scheme, where you only process requests once you have
decided you'll get IN_GUEST_MODE, is a good one.

That is, they _may_ check some requests in kvm_arch_vcpu_runnable but
not process them.

For ARM this would be:

                if (vcpu->arch.power_off || vcpu->arch.pause) {
                        vcpu_sleep(vcpu);
			ret = 0;
		} else {
			ret = vcpu_enter_guest(vcpu);
		}

where vcpu_enter_guest is basically the "while (ret > 0)" loop in
kvm_arch_vcpu_ioctl_run:


                /*
                 * Check conditions before entering the guest
                 */
                cond_resched();

                update_vttbr(vcpu->kvm);
                preempt_disable();
		...
                if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
                        vcpu->arch.power_off || vcpu->arch.pause) {
                        local_irq_enable();
                        kvm_pmu_sync_hwstate(vcpu);
                        kvm_timer_sync_hwstate(vcpu);
                        kvm_vgic_sync_hwstate(vcpu);
                        preempt_enable();
                        return ret;
                }
		...
                preempt_enable();
                return handle_exit(vcpu, run, ret);

In your case, you don't need to check any request in
kvm_arch_vcpu_runnable, I think.  This split would also solve my review
doubt from "Re: [PATCH v3 05/10] KVM: arm/arm64: don't clear exit
request from caller".

Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 02/10] KVM: Add documentation for VCPU requests
  2017-05-04 12:51       ` Paolo Bonzini
@ 2017-05-04 13:31         ` Andrew Jones
  0 siblings, 0 replies; 43+ messages in thread
From: Andrew Jones @ 2017-05-04 13:31 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kvmarm, kvm, cdall, marc.zyngier, rkrcmar

On Thu, May 04, 2017 at 02:51:39PM +0200, Paolo Bonzini wrote:
> 
> 
> On 04/05/2017 14:06, Andrew Jones wrote:
> >>> +VCPU threads may need to consider requests before and/or after calling
> >>> +functions that may put them to sleep, e.g. kvm_vcpu_block().  Whether they
> >>> +do or not, and, if they do, which requests need consideration, is
> >>> +architecture dependent.  kvm_vcpu_block() calls kvm_arch_vcpu_runnable()
> >>> +to check if it should awaken.  One reason to do so is to provide
> >>> +architectures a function where requests may be checked if necessary.
> >> What did you have in mind here?
> > I was trying to point out vcpu request concerns with respect to sleeping
> > vcpus, but to stay as general as possible. I can't really think of
> > anything else to say here, other than to give some hypothetical example.
> > For a while I was thinking I might check requests (kvm_request_pending())
> > from the kvm_arch_vcpu_runnable() call for ARM, but then changed my mind
> > on that - leaving it only checking the pause and power_off booleans.
> > Anyway, I don't think the above paragraph is "wrong", but if it's
> > confusing then I can change / remove it as people like. Just let me know
> > how you'd like it changed :-)
> 
> I think the x86 scheme, where you only process requests once you have
> decided you'll get IN_GUEST_MODE, is a good one.
> 
> That is, they _may_ check some requests in kvm_arch_vcpu_runnable but
> not process them.

This was my thought too, but checking that there are pending requests
seems like a valid reason to unblock - although only for certain requests.

> 
> For ARM this would be:
> 
>                 if (vcpu->arch.power_off || vcpu->arch.pause) {
>                         vcpu_sleep(vcpu);
> 			ret = 0;
> 		} else {
> 			ret = vcpu_enter_guest(vcpu);
> 		}
> 
> where vcpu_enter_guest is basically the "while (ret > 0)" loop in
> kvm_arch_vcpu_ioctl_run:

I'm not sure this refactoring is necessary, but I can experiment
with it.

> 
> 
>                 /*
>                  * Check conditions before entering the guest
>                  */
>                 cond_resched();
> 
>                 update_vttbr(vcpu->kvm);
>                 preempt_disable();
> 		...
>                 if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
>                         vcpu->arch.power_off || vcpu->arch.pause) {

This needs to check kvm_requests_pending(), like patch 3/10 adds.

>                         local_irq_enable();
>                         kvm_pmu_sync_hwstate(vcpu);
>                         kvm_timer_sync_hwstate(vcpu);
>                         kvm_vgic_sync_hwstate(vcpu);
>                         preempt_enable();
>                         return ret;
>                 }
> 		...
>                 preempt_enable();
>                 return handle_exit(vcpu, run, ret);
> 
> In your case, you don't need to check any request in
> kvm_arch_vcpu_runnable, I think.

Right. That was might final determination as well, so I don't check
any requests there with this series. I still tried writing this paragraph
to capture the general idea though, as I still think it's a valid idea to
want to check for certain pending requests in kvm_arch_vcpu_runnable(),
in order to know if a wakeup is necessary.

> This split would also solve my review
> doubt from "Re: [PATCH v3 05/10] KVM: arm/arm64: don't clear exit

I haven't received your doubt yet. Problem with mail delivery? Or did
you forget to send it :-)

> request from caller".
> 
> Paolo

Thanks,
drew

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 04/10] KVM: arm/arm64: use vcpu request in kvm_arm_halt_vcpu
  2017-05-03 16:06 ` [PATCH v3 04/10] KVM: arm/arm64: use vcpu request in kvm_arm_halt_vcpu Andrew Jones
@ 2017-05-06 18:08   ` Christoffer Dall
  2017-05-09 17:02     ` Andrew Jones
  0 siblings, 1 reply; 43+ messages in thread
From: Christoffer Dall @ 2017-05-06 18:08 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Wed, May 03, 2017 at 06:06:29PM +0200, Andrew Jones wrote:
> VCPU halting/resuming is partially implemented with VCPU requests.
> When kvm_arm_halt_guest() is called all VCPUs get the EXIT request,
> telling them to exit guest mode and look at the state of 'pause',
> which will be true, telling them to sleep.  As ARM's VCPU RUN
> implements the memory barrier pattern described in "Ensuring Requests
> Are Seen" of Documentation/virtual/kvm/vcpu-requests.rst, there's
> no way for a VCPU halted by kvm_arm_halt_guest() to miss the pause
> state change.  However, before this patch, a single VCPU halted with
> kvm_arm_halt_vcpu() did not get a request, opening a tiny race window.
> This patch adds the request, closing the race window and also allowing
> us to remove the final check of pause in VCPU RUN, as the final check
> for requests is sufficient.
> 
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> 
> ---
> 
> I have two questions about the halting/resuming.
> 
> Question 1:
> 
> Do we even need kvm_arm_halt_vcpu()/kvm_arm_resume_vcpu()? It should
> only be necessary if one VCPU can activate or inactivate the private
> IRQs of another VCPU, right?  That doesn't seem like something that
> should be possible, but I'm GIC-illiterate...

True, it shouldn't be possible.  I wonder if we were thinking of
userspace access to the CPU-specific data, but we already ensure that no
VCPUs are running at that time, so I don't think it should be necessary.

> 
> Question 2:
> 
> It's not clear to me if we have another problem with halting/resuming
> or not.  If it's possible for VCPU1 and VCPU2 to race in
> vgic_mmio_write_s/cactive(), then the following scenario could occur,
> leading to VCPU3 being in guest mode when it should not be.  Does the
> hardware prohibit more than one VCPU entering trap handlers that lead
> to these functions at the same time?  If not, then I guess pause needs
> to be a counter instead of a boolean.
> 
>  VCPU1                 VCPU2                  VCPU3
>  -----                 -----                  -----
>                        VCPU3->pause = true;
>                        halt(VCPU3);
>                                               if (pause)
>                                                 sleep();
>  VCPU3->pause = true;
>  halt(VCPU3);
>                        VCPU3->pause = false;
>                        resume(VCPU3);
>                                               ...wake up...
>                                               if (!pause)
>                                                 Enter guest mode. Bad!
>  VCPU3->pause = false;
>  resume(VCPU3);
> 
> (Yes, the "Bad!" is there to both identify something we don't want
>  occurring and to make fun of Trump's tweeting style.)

I think it's bad, and it might be even worse, because it could lead to a
CPU looping forever in the host kernel, since there's no guarantee to
exit from the VM in the other VCPU thread.

But I think simply taking the kvm->lock mutex to serialize the mmio
active change operations should be sufficient.

If we agree on this I can send a patch with your reported by that fixes
that issue, which gets rid of kvm_arm_halt_vcpu and requires you to
modify your first patch to clear the KVM_REQ_VCPU_EXIT flag for each
vcpu in kvm_arm_halt_guest instead and you can fold the remaining change
from this patch into a patch that completely gets rid of the pause flag.

See untested patch draft at the end of this mail.

Thanks,
-Christoffer

> ---
>  arch/arm/kvm/arm.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 47f6c7fdca96..9174ed13135a 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -545,6 +545,7 @@ void kvm_arm_halt_guest(struct kvm *kvm)
>  void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu)
>  {
>  	vcpu->arch.pause = true;
> +	kvm_make_request(KVM_REQ_VCPU_EXIT, vcpu);
>  	kvm_vcpu_kick(vcpu);
>  }
>  
> @@ -664,7 +665,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
>  		    kvm_request_pending(vcpu) ||
> -		    vcpu->arch.power_off || vcpu->arch.pause) {
> +		    vcpu->arch.power_off) {
>  			vcpu->mode = OUTSIDE_GUEST_MODE;
>  			local_irq_enable();
>  			kvm_pmu_sync_hwstate(vcpu);
> -- 
> 2.9.3
> 


Untested draft patch:

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index d488b88..b77a3af 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -234,8 +234,6 @@ struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
 void kvm_arm_halt_guest(struct kvm *kvm);
 void kvm_arm_resume_guest(struct kvm *kvm);
-void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu);
-void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu);
 
 int kvm_arm_copy_coproc_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
 unsigned long kvm_arm_num_coproc_regs(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 578df18..7a38d5a 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -334,8 +334,6 @@ struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
 struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void);
 void kvm_arm_halt_guest(struct kvm *kvm);
 void kvm_arm_resume_guest(struct kvm *kvm);
-void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu);
-void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu);
 
 u64 __kvm_call_hyp(void *hypfn, ...);
 #define kvm_call_hyp(f, ...) __kvm_call_hyp(kvm_ksym_ref(f), ##__VA_ARGS__)
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 7941699..932788a 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -542,27 +542,15 @@ void kvm_arm_halt_guest(struct kvm *kvm)
 	kvm_make_all_cpus_request(kvm, KVM_REQ_VCPU_EXIT);
 }
 
-void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu)
-{
-	vcpu->arch.pause = true;
-	kvm_vcpu_kick(vcpu);
-}
-
-void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu)
-{
-	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
-
-	vcpu->arch.pause = false;
-	swake_up(wq);
-}
-
 void kvm_arm_resume_guest(struct kvm *kvm)
 {
 	int i;
 	struct kvm_vcpu *vcpu;
 
-	kvm_for_each_vcpu(i, vcpu, kvm)
-		kvm_arm_resume_vcpu(vcpu);
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		vcpu->arch.pause = false;
+		swake_up(kvm_arch_vcpu_wq(vcpu));
+	}
 }
 
 static void vcpu_sleep(struct kvm_vcpu *vcpu)
diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
index 2a5db13..c143add 100644
--- a/virt/kvm/arm/vgic/vgic-mmio.c
+++ b/virt/kvm/arm/vgic/vgic-mmio.c
@@ -231,23 +231,21 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
  * be migrated while we don't hold the IRQ locks and we don't want to be
  * chasing moving targets.
  *
- * For private interrupts, we only have to make sure the single and only VCPU
- * that can potentially queue the IRQ is stopped.
+ * For private interrupts we don't have to do anything because userspace
+ * accesses to the VGIC state already require all VCPUs to be stopped, and
+ * only the VCPU itself can modify its private interrupts active state, which
+ * guarantees that the VCPU is not running.
  */
 static void vgic_change_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
 {
-	if (intid < VGIC_NR_PRIVATE_IRQS)
-		kvm_arm_halt_vcpu(vcpu);
-	else
+	if (intid > VGIC_NR_PRIVATE_IRQS)
 		kvm_arm_halt_guest(vcpu->kvm);
 }
 
 /* See vgic_change_active_prepare */
 static void vgic_change_active_finish(struct kvm_vcpu *vcpu, u32 intid)
 {
-	if (intid < VGIC_NR_PRIVATE_IRQS)
-		kvm_arm_resume_vcpu(vcpu);
-	else
+	if (intid > VGIC_NR_PRIVATE_IRQS)
 		kvm_arm_resume_guest(vcpu->kvm);
 }
 
@@ -258,6 +256,7 @@ void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
 	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
 	int i;
 
+	mutex_lock(&vcpu->kvm->lock);
 	vgic_change_active_prepare(vcpu, intid);
 	for_each_set_bit(i, &val, len * 8) {
 		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
@@ -265,6 +264,7 @@ void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
 		vgic_put_irq(vcpu->kvm, irq);
 	}
 	vgic_change_active_finish(vcpu, intid);
+	mutex_unlock(&vcpu->kvm->lock);
 }
 
 void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
@@ -274,6 +274,7 @@ void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
 	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
 	int i;
 
+	mutex_lock(&vcpu->kvm->lock);
 	vgic_change_active_prepare(vcpu, intid);
 	for_each_set_bit(i, &val, len * 8) {
 		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
@@ -281,6 +282,7 @@ void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
 		vgic_put_irq(vcpu->kvm, irq);
 	}
 	vgic_change_active_finish(vcpu, intid);
+	mutex_unlock(&vcpu->kvm->lock);
 }
 
 unsigned long vgic_mmio_read_priority(struct kvm_vcpu *vcpu,

^ permalink raw reply related	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 05/10] KVM: arm/arm64: don't clear exit request from caller
  2017-05-03 16:06 ` [PATCH v3 05/10] KVM: arm/arm64: don't clear exit request from caller Andrew Jones
@ 2017-05-06 18:12   ` Christoffer Dall
  2017-05-09 17:17     ` Andrew Jones
  0 siblings, 1 reply; 43+ messages in thread
From: Christoffer Dall @ 2017-05-06 18:12 UTC (permalink / raw)
  To: Andrew Jones; +Cc: marc.zyngier, pbonzini, kvmarm, kvm

On Wed, May 03, 2017 at 06:06:30PM +0200, Andrew Jones wrote:
> VCPU requests that the receiver should handle should only be cleared
> by the receiver. 

I cannot parse this sentence.

> Not only does this properly implement the protocol,
> but also avoids bugs where one VCPU clears another VCPU's request,
> before the receiving VCPU has had a chance to see it.

Is this an actual race we have currently or just something thay may
happen later.  Im' not sure.

> ARM VCPUs
> currently only handle one request, EXIT, and handling it is achieved
> by checking pause to see if the VCPU should sleep.

This makes sense.  So forget my comment on the previous patch about
getting rid of the pause flag.

> 
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
>  arch/arm/kvm/arm.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 9174ed13135a..7be0d9b0c63a 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -553,7 +553,6 @@ void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu)
>  {
>  	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
>  
> -	kvm_clear_request(KVM_REQ_VCPU_EXIT, vcpu);
>  	vcpu->arch.pause = false;
>  	swake_up(wq);
>  }
> @@ -625,7 +624,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  		update_vttbr(vcpu->kvm);
>  
> -		if (vcpu->arch.power_off || vcpu->arch.pause)
> +		if (kvm_request_pending(vcpu)) {
> +			if (kvm_check_request(KVM_REQ_VCPU_EXIT, vcpu)) {
> +				if (vcpu->arch.pause)
> +					vcpu_sleep(vcpu);
> +			}

Can we factor out this bit to a separate function,
kvm_handle_vcpu_requests() or something like that?

> +		}
> +
> +		if (vcpu->arch.power_off)
>  			vcpu_sleep(vcpu);
>  
>  		/*
> -- 
> 2.9.3
> 
Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 06/10] KVM: arm/arm64: use vcpu requests for power_off
  2017-05-03 16:06 ` [PATCH v3 06/10] KVM: arm/arm64: use vcpu requests for power_off Andrew Jones
@ 2017-05-06 18:17   ` Christoffer Dall
  0 siblings, 0 replies; 43+ messages in thread
From: Christoffer Dall @ 2017-05-06 18:17 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Wed, May 03, 2017 at 06:06:31PM +0200, Andrew Jones wrote:
> System shutdown is currently using request-less VCPU kicks. This
> leaves open a tiny race window, as it doesn't ensure the state
> change to power_off is seen by a VCPU just about to enter guest
> mode. VCPU requests, OTOH, are guaranteed to be seen (see "Ensuring
> Requests Are Seen" of Documentation/virtual/kvm/vcpu-requests.rst)
> This patch applies the EXIT request used by pause to power_off,
> closing the race window and also allowing us to remove the final
> check of power_off in VCPU RUN, as the final check for requests
> is sufficient.
> 
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
>  arch/arm/kvm/arm.c  | 3 +--
>  arch/arm/kvm/psci.c | 5 ++---
>  2 files changed, 3 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 7be0d9b0c63a..26d9d4d72853 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -670,8 +670,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		smp_store_mb(vcpu->mode, IN_GUEST_MODE);
>  
>  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
> -		    kvm_request_pending(vcpu) ||
> -		    vcpu->arch.power_off) {
> +		    kvm_request_pending(vcpu)) {
>  			vcpu->mode = OUTSIDE_GUEST_MODE;
>  			local_irq_enable();
>  			kvm_pmu_sync_hwstate(vcpu);
> diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> index f68be2cc6256..f189d0ad30d5 100644
> --- a/arch/arm/kvm/psci.c
> +++ b/arch/arm/kvm/psci.c
> @@ -179,10 +179,9 @@ static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type)
>  	 * after this call is handled and before the VCPUs have been
>  	 * re-initialized.
>  	 */
> -	kvm_for_each_vcpu(i, tmp, vcpu->kvm) {
> +	kvm_for_each_vcpu(i, tmp, vcpu->kvm)
>  		tmp->arch.power_off = true;
> -		kvm_vcpu_kick(tmp);
> -	}
> +	kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_VCPU_EXIT);
>  
>  	memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event));
>  	vcpu->run->system_event.type = type;
> -- 
> 2.9.3
> 

Reviewed-by: Christoffer Dall <cdall@linaro.org>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 07/10] KVM: arm/arm64: optimize VCPU RUN
  2017-05-03 16:06 ` [PATCH v3 07/10] KVM: arm/arm64: optimize VCPU RUN Andrew Jones
@ 2017-05-06 18:27   ` Christoffer Dall
  2017-05-09 17:40     ` Andrew Jones
  0 siblings, 1 reply; 43+ messages in thread
From: Christoffer Dall @ 2017-05-06 18:27 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Wed, May 03, 2017 at 06:06:32PM +0200, Andrew Jones wrote:

nit: can you make the subject of this patch a bit more specific?

For example:  Optimize checking power_off flag in KVM_RUN

> We can make a small optimization by not checking the state of
> the power_off field on each run. This is done by treating
> power_off like pause, only checking it when we get the EXIT
> VCPU request. When a VCPU powers off another VCPU the EXIT
> request is already made, so we just need to make sure the
> request is also made on self power off. kvm_vcpu_kick() isn't
> necessary for these cases, as the VCPU would just be kicking
> itself, but we add it anyway as a self kick doesn't cost much,
> and it makes the code more future-proof.
> 
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
>  arch/arm/kvm/arm.c  | 16 ++++++++++------
>  arch/arm/kvm/psci.c |  2 ++
>  2 files changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 26d9d4d72853..24bbc7671d89 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -371,6 +371,13 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  	kvm_timer_vcpu_put(vcpu);
>  }
>  
> +static void vcpu_power_off(struct kvm_vcpu *vcpu)
> +{
> +	vcpu->arch.power_off = true;
> +	kvm_make_request(KVM_REQ_VCPU_EXIT, vcpu);
> +	kvm_vcpu_kick(vcpu);
> +}
> +
>  int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
>  				    struct kvm_mp_state *mp_state)
>  {
> @@ -390,7 +397,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
>  		vcpu->arch.power_off = false;
>  		break;
>  	case KVM_MP_STATE_STOPPED:
> -		vcpu->arch.power_off = true;
> +		vcpu_power_off(vcpu);
>  		break;
>  	default:
>  		return -EINVAL;
> @@ -626,14 +633,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  
>  		if (kvm_request_pending(vcpu)) {
>  			if (kvm_check_request(KVM_REQ_VCPU_EXIT, vcpu)) {
> -				if (vcpu->arch.pause)
> +				if (vcpu->arch.power_off || vcpu->arch.pause)
>  					vcpu_sleep(vcpu);
>  			}
>  		}
>  
> -		if (vcpu->arch.power_off)
> -			vcpu_sleep(vcpu);
> -

Hmmm, even though I just gave a reviewed-by on the pause side, I'm not
realizing that I don't think this works.  Because you're now only
checking requests in the vcpu loop, but the vcpu_sleep() function is
implemented using swait_event_interruptible(), which can wake up if you
have a pending signal for example, and then the loop can wrap around and
you can run the VCPU even though you should be paused.  Am I missing
something?

Thanks,
-Christoffer

>  		/*
>  		 * Preparing the interrupts to be injected also
>  		 * involves poking the GIC, which must be done in a
> @@ -903,7 +907,7 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
>  	 * Handle the "start in power-off" case.
>  	 */
>  	if (test_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features))
> -		vcpu->arch.power_off = true;
> +		vcpu_power_off(vcpu);
>  	else
>  		vcpu->arch.power_off = false;
>  
> diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> index f189d0ad30d5..4a436685c552 100644
> --- a/arch/arm/kvm/psci.c
> +++ b/arch/arm/kvm/psci.c
> @@ -65,6 +65,8 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
>  static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
>  {
>  	vcpu->arch.power_off = true;
> +	kvm_make_request(KVM_REQ_VCPU_EXIT, vcpu);
> +	kvm_vcpu_kick(vcpu);
>  }
>  
>  static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
> -- 
> 2.9.3
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 09/10] KVM: arm/arm64: use vcpu requests for irq injection
  2017-05-04 11:47   ` Paolo Bonzini
@ 2017-05-06 18:49     ` Christoffer Dall
  2017-05-08  8:48       ` Paolo Bonzini
  0 siblings, 1 reply; 43+ messages in thread
From: Christoffer Dall @ 2017-05-06 18:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: marc.zyngier, kvmarm, kvm

On Thu, May 04, 2017 at 01:47:41PM +0200, Paolo Bonzini wrote:
> 
> 
> On 03/05/2017 18:06, Andrew Jones wrote:
> > Don't use request-less VCPU kicks when injecting IRQs, as a VCPU
> > kick meant to trigger the interrupt injection could be sent while
> > the VCPU is outside guest mode, which means no IPI is sent, and
> > after it has called kvm_vgic_flush_hwstate(), meaning it won't see
> > the updated GIC state until its next exit some time later for some
> > other reason.  The receiving VCPU only needs to check this request
> > in VCPU RUN to handle it.  By checking it, if it's pending, a
> > memory barrier will be issued that ensures all state is visible.
> > We still create a vcpu_req_irq_pending() function (which is a nop),
> > though, in order to allow us to use the standard request checking
> > pattern.
> 
> I wonder if you aren't just papering over this race:
> 
>         /*
>          * If there are no virtual interrupts active or pending for this
>          * VCPU, then there is no work to do and we can bail out without
>          * taking any lock.  There is a potential race with someone injecting
>          * interrupts to the VCPU, but it is a benign race as the VCPU will
>          * either observe the new interrupt before or after doing this check,
>          * and introducing additional synchronization mechanism doesn't change
>          * this.
>          */
>         if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
>                 return;
> 
>         spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock);
>         vgic_flush_lr_state(vcpu);
>         spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
> 
> not being so "benign" after all. :)  Maybe you can remove the if (list_empty()),
> and have kvm_arch_vcpu_ioctl_run do this instead:

I don't see how removing this shortcut improves anything.  You'd still
have the same window where you could loose an interrupt right after the
spin_unlock.

I think the race that this comment discusses is indeed benign, but the
overall guarantees that our vgic injection relies on is flawed and can
be solved by either doing requests as Drew does here, or moving the
vgic_flush inside a region that has both mode == IN_GUEST_MODE and
interrupts disabled.  Note that for other purposes I'm planning to move
the flush functions inside the interrupts disabled region later anyhow.

I don't see a problem with Drew's patch actually.

Thanks,
-Christoffer

> 
>  		if (kvm_request_pending(vcpu)) {
>  			if (kvm_check_request(KVM_REQ_SLEEP, vcpu))
>  				vcpu_req_sleep(vcpu);
> 		}
> 
>                 preempt_disable();
> 
>                 kvm_pmu_flush_hwstate(vcpu);
>                 kvm_timer_flush_hwstate(vcpu);
> 
> 		if (kvm_check_request(KVM_REQ_IRQ_PENDING, vcpu))
> 			kvm_vgic_flush_hwstate(vcpu);
> 
> ?
> 
> Paolo
> 
> > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > ---
> >  arch/arm/include/asm/kvm_host.h   |  1 +
> >  arch/arm/kvm/arm.c                | 12 ++++++++++++
> >  arch/arm64/include/asm/kvm_host.h |  1 +
> >  virt/kvm/arm/arch_timer.c         |  1 +
> >  virt/kvm/arm/vgic/vgic.c          |  9 +++++++--
> >  5 files changed, 22 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > index 41669578b3df..7bf90aaf2e87 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -47,6 +47,7 @@
> >  
> >  #define KVM_REQ_SLEEP \
> >  	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_NO_WAKEUP | KVM_REQUEST_WAIT)
> > +#define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
> >  
> >  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
> >  int __attribute_const__ kvm_target_cpu(void);
> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > index d62e99885434..330064475914 100644
> > --- a/arch/arm/kvm/arm.c
> > +++ b/arch/arm/kvm/arm.c
> > @@ -581,6 +581,15 @@ static void vcpu_req_sleep(struct kvm_vcpu *vcpu)
> >  				       (!vcpu->arch.pause)));
> >  }
> >  
> > +static void vcpu_req_irq_pending(struct kvm_vcpu *vcpu)
> > +{
> > +	/*
> > +	 * Nothing to do here. kvm_check_request() already issued a memory
> > +	 * barrier that pairs with kvm_make_request(), so all hardware state
> > +	 * we need to flush should now be visible.
> > +	 */
> > +}
> > +
> >  static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
> >  {
> >  	return vcpu->arch.target >= 0;
> > @@ -634,6 +643,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		if (kvm_request_pending(vcpu)) {
> >  			if (kvm_check_request(KVM_REQ_SLEEP, vcpu))
> >  				vcpu_req_sleep(vcpu);
> > +			if (kvm_check_request(KVM_REQ_IRQ_PENDING, vcpu))
> > +				vcpu_req_irq_pending(vcpu);
> >  		}
> >  
> >  		/*
> > @@ -777,6 +788,7 @@ static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
> >  	 * trigger a world-switch round on the running physical CPU to set the
> >  	 * virtual IRQ/FIQ fields in the HCR appropriately.
> >  	 */
> > +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> >  	kvm_vcpu_kick(vcpu);
> >  
> >  	return 0;
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 04c0f9d37386..2c33fef945fe 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -44,6 +44,7 @@
> >  
> >  #define KVM_REQ_SLEEP \
> >  	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_NO_WAKEUP | KVM_REQUEST_WAIT)
> > +#define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
> >  
> >  int __attribute_const__ kvm_target_cpu(void);
> >  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> > index 5976609ef27c..469b43315c0a 100644
> > --- a/virt/kvm/arm/arch_timer.c
> > +++ b/virt/kvm/arm/arch_timer.c
> > @@ -95,6 +95,7 @@ static void kvm_timer_inject_irq_work(struct work_struct *work)
> >  	 * If the vcpu is blocked we want to wake it up so that it will see
> >  	 * the timer has expired when entering the guest.
> >  	 */
> > +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> >  	kvm_vcpu_kick(vcpu);
> >  }
> >  
> > diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> > index 3d0979c30721..bdd4b3a953b5 100644
> > --- a/virt/kvm/arm/vgic/vgic.c
> > +++ b/virt/kvm/arm/vgic/vgic.c
> > @@ -283,8 +283,10 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
> >  		 * won't see this one until it exits for some other
> >  		 * reason.
> >  		 */
> > -		if (vcpu)
> > +		if (vcpu) {
> > +			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> >  			kvm_vcpu_kick(vcpu);
> > +		}
> >  		return false;
> >  	}
> >  
> > @@ -330,6 +332,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
> >  	spin_unlock(&irq->irq_lock);
> >  	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
> >  
> > +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> >  	kvm_vcpu_kick(vcpu);
> >  
> >  	return true;
> > @@ -719,8 +722,10 @@ void vgic_kick_vcpus(struct kvm *kvm)
> >  	 * a good kick...
> >  	 */
> >  	kvm_for_each_vcpu(c, vcpu, kvm) {
> > -		if (kvm_vgic_vcpu_pending_irq(vcpu))
> > +		if (kvm_vgic_vcpu_pending_irq(vcpu)) {
> > +			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> >  			kvm_vcpu_kick(vcpu);
> > +		}
> >  	}
> >  }
> >  
> > 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 09/10] KVM: arm/arm64: use vcpu requests for irq injection
  2017-05-03 16:06 ` [PATCH v3 09/10] KVM: arm/arm64: use vcpu requests for irq injection Andrew Jones
  2017-05-04 11:47   ` Paolo Bonzini
@ 2017-05-06 18:51   ` Christoffer Dall
  2017-05-09 17:53     ` Andrew Jones
  1 sibling, 1 reply; 43+ messages in thread
From: Christoffer Dall @ 2017-05-06 18:51 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

Hi Drew,

On Wed, May 03, 2017 at 06:06:34PM +0200, Andrew Jones wrote:
> Don't use request-less VCPU kicks when injecting IRQs, as a VCPU
> kick meant to trigger the interrupt injection could be sent while
> the VCPU is outside guest mode, which means no IPI is sent, and
> after it has called kvm_vgic_flush_hwstate(), meaning it won't see
> the updated GIC state until its next exit some time later for some
> other reason.  The receiving VCPU only needs to check this request
> in VCPU RUN to handle it.  By checking it, if it's pending, a
> memory barrier will be issued that ensures all state is visible.
> We still create a vcpu_req_irq_pending() function (which is a nop),
> though, in order to allow us to use the standard request checking
> pattern.
> 
> Signed-off-by: Andrew Jones <drjones@redhat.com>
> ---
>  arch/arm/include/asm/kvm_host.h   |  1 +
>  arch/arm/kvm/arm.c                | 12 ++++++++++++
>  arch/arm64/include/asm/kvm_host.h |  1 +
>  virt/kvm/arm/arch_timer.c         |  1 +
>  virt/kvm/arm/vgic/vgic.c          |  9 +++++++--
>  5 files changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 41669578b3df..7bf90aaf2e87 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -47,6 +47,7 @@
>  
>  #define KVM_REQ_SLEEP \
>  	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_NO_WAKEUP | KVM_REQUEST_WAIT)
> +#define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
>  
>  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
>  int __attribute_const__ kvm_target_cpu(void);
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index d62e99885434..330064475914 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -581,6 +581,15 @@ static void vcpu_req_sleep(struct kvm_vcpu *vcpu)
>  				       (!vcpu->arch.pause)));
>  }
>  
> +static void vcpu_req_irq_pending(struct kvm_vcpu *vcpu)
> +{
> +	/*
> +	 * Nothing to do here. kvm_check_request() already issued a memory
> +	 * barrier that pairs with kvm_make_request(), so all hardware state
> +	 * we need to flush should now be visible.
> +	 */
> +}
> +
>  static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
>  {
>  	return vcpu->arch.target >= 0;
> @@ -634,6 +643,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>  		if (kvm_request_pending(vcpu)) {
>  			if (kvm_check_request(KVM_REQ_SLEEP, vcpu))
>  				vcpu_req_sleep(vcpu);
> +			if (kvm_check_request(KVM_REQ_IRQ_PENDING, vcpu))
> +				vcpu_req_irq_pending(vcpu);
>  		}
>  
>  		/*
> @@ -777,6 +788,7 @@ static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
>  	 * trigger a world-switch round on the running physical CPU to set the
>  	 * virtual IRQ/FIQ fields in the HCR appropriately.
>  	 */
> +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>  	kvm_vcpu_kick(vcpu);
>  
>  	return 0;
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 04c0f9d37386..2c33fef945fe 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -44,6 +44,7 @@
>  
>  #define KVM_REQ_SLEEP \
>  	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_NO_WAKEUP | KVM_REQUEST_WAIT)
> +#define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
>  
>  int __attribute_const__ kvm_target_cpu(void);
>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> index 5976609ef27c..469b43315c0a 100644
> --- a/virt/kvm/arm/arch_timer.c
> +++ b/virt/kvm/arm/arch_timer.c
> @@ -95,6 +95,7 @@ static void kvm_timer_inject_irq_work(struct work_struct *work)
>  	 * If the vcpu is blocked we want to wake it up so that it will see
>  	 * the timer has expired when entering the guest.
>  	 */
> +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>  	kvm_vcpu_kick(vcpu);

So I think we just call kvm_vcpu_kick() because it calls
kvm_vcpu_wake_up().  If we have this timer work happening, it means that
the VCPU is blocked, and there won't be a race with executing in the run
loop, right?

So maybe we should just change this kvm_vcpu_kick() to a direct call to
kvm_vcpu_wake_up() to avoid having a request-less kick.

Note that your change will still work, I just think it's unnecessary.

Thanks,
-Christoffer

>  }
>  
> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> index 3d0979c30721..bdd4b3a953b5 100644
> --- a/virt/kvm/arm/vgic/vgic.c
> +++ b/virt/kvm/arm/vgic/vgic.c
> @@ -283,8 +283,10 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
>  		 * won't see this one until it exits for some other
>  		 * reason.
>  		 */
> -		if (vcpu)
> +		if (vcpu) {
> +			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>  			kvm_vcpu_kick(vcpu);
> +		}
>  		return false;
>  	}
>  
> @@ -330,6 +332,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
>  	spin_unlock(&irq->irq_lock);
>  	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
>  
> +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>  	kvm_vcpu_kick(vcpu);
>  
>  	return true;
> @@ -719,8 +722,10 @@ void vgic_kick_vcpus(struct kvm *kvm)
>  	 * a good kick...
>  	 */
>  	kvm_for_each_vcpu(c, vcpu, kvm) {
> -		if (kvm_vgic_vcpu_pending_irq(vcpu))
> +		if (kvm_vgic_vcpu_pending_irq(vcpu)) {
> +			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>  			kvm_vcpu_kick(vcpu);
> +		}
>  	}
>  }
>  
> -- 
> 2.9.3
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 10/10] KVM: arm/arm64: PMU: remove request-less vcpu kick
  2017-05-03 16:06 ` [PATCH v3 10/10] KVM: arm/arm64: PMU: remove request-less vcpu kick Andrew Jones
@ 2017-05-06 18:55   ` Christoffer Dall
  0 siblings, 0 replies; 43+ messages in thread
From: Christoffer Dall @ 2017-05-06 18:55 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Wed, May 03, 2017 at 06:06:35PM +0200, Andrew Jones wrote:
> Refactor PMU overflow handling in order to remove the request-less
> vcpu kick.  Now, since kvm_vgic_inject_irq() uses vcpu requests,
> there should be no chance that a kick sent at just the wrong time
> (between the VCPU's call to kvm_pmu_flush_hwstate() and before it
> enters guest mode) results in a failure for the guest to see updated
> GIC state until its next exit some time later for some other reason.
> 
> Signed-off-by: Andrew Jones <drjones@redhat.com>

Reviewed-by: Christoffer Dall <cdall@linaro.org>

> ---
>  virt/kvm/arm/pmu.c | 40 +++++++++++++++++++---------------------
>  1 file changed, 19 insertions(+), 21 deletions(-)
> 
> diff --git a/virt/kvm/arm/pmu.c b/virt/kvm/arm/pmu.c
> index 4b43e7f3b158..2451607dc25e 100644
> --- a/virt/kvm/arm/pmu.c
> +++ b/virt/kvm/arm/pmu.c
> @@ -203,6 +203,23 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
>  	return reg;
>  }
>  
> +static void kvm_pmu_check_overflow(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_pmu *pmu = &vcpu->arch.pmu;
> +	bool overflow = !!kvm_pmu_overflow_status(vcpu);
> +
> +	if (pmu->irq_level == overflow)
> +		return;
> +
> +	pmu->irq_level = overflow;
> +
> +	if (likely(irqchip_in_kernel(vcpu->kvm))) {
> +		int ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> +					      pmu->irq_num, overflow);
> +		WARN_ON(ret);
> +	}
> +}
> +
>  /**
>   * kvm_pmu_overflow_set - set PMU overflow interrupt
>   * @vcpu: The vcpu pointer
> @@ -210,37 +227,18 @@ static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu)
>   */
>  void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val)
>  {
> -	u64 reg;
> -
>  	if (val == 0)
>  		return;
>  
>  	vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= val;
> -	reg = kvm_pmu_overflow_status(vcpu);
> -	if (reg != 0)
> -		kvm_vcpu_kick(vcpu);
> +	kvm_pmu_check_overflow(vcpu);
>  }
>  
>  static void kvm_pmu_update_state(struct kvm_vcpu *vcpu)
>  {
> -	struct kvm_pmu *pmu = &vcpu->arch.pmu;
> -	bool overflow;
> -
>  	if (!kvm_arm_pmu_v3_ready(vcpu))
>  		return;
> -
> -	overflow = !!kvm_pmu_overflow_status(vcpu);
> -	if (pmu->irq_level == overflow)
> -		return;
> -
> -	pmu->irq_level = overflow;
> -
> -	if (likely(irqchip_in_kernel(vcpu->kvm))) {
> -		int ret;
> -		ret = kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
> -					  pmu->irq_num, overflow);
> -		WARN_ON(ret);
> -	}
> +	kvm_pmu_check_overflow(vcpu);
>  }
>  
>  bool kvm_pmu_should_notify_user(struct kvm_vcpu *vcpu)
> -- 
> 2.9.3
> 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 09/10] KVM: arm/arm64: use vcpu requests for irq injection
  2017-05-06 18:49     ` Christoffer Dall
@ 2017-05-08  8:48       ` Paolo Bonzini
  2017-05-08  8:56         ` Christoffer Dall
  0 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2017-05-08  8:48 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: Andrew Jones, kvmarm, kvm, marc.zyngier, rkrcmar



On 06/05/2017 20:49, Christoffer Dall wrote:
> On Thu, May 04, 2017 at 01:47:41PM +0200, Paolo Bonzini wrote:
>>
>>
>> On 03/05/2017 18:06, Andrew Jones wrote:
>>> Don't use request-less VCPU kicks when injecting IRQs, as a VCPU
>>> kick meant to trigger the interrupt injection could be sent while
>>> the VCPU is outside guest mode, which means no IPI is sent, and
>>> after it has called kvm_vgic_flush_hwstate(), meaning it won't see
>>> the updated GIC state until its next exit some time later for some
>>> other reason.  The receiving VCPU only needs to check this request
>>> in VCPU RUN to handle it.  By checking it, if it's pending, a
>>> memory barrier will be issued that ensures all state is visible.
>>> We still create a vcpu_req_irq_pending() function (which is a nop),
>>> though, in order to allow us to use the standard request checking
>>> pattern.
>>
>> I wonder if you aren't just papering over this race:
>>
>>         /*
>>          * If there are no virtual interrupts active or pending for this
>>          * VCPU, then there is no work to do and we can bail out without
>>          * taking any lock.  There is a potential race with someone injecting
>>          * interrupts to the VCPU, but it is a benign race as the VCPU will
>>          * either observe the new interrupt before or after doing this check,
>>          * and introducing additional synchronization mechanism doesn't change
>>          * this.
>>          */
>>         if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
>>                 return;
>>
>>         spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock);
>>         vgic_flush_lr_state(vcpu);
>>         spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
>>
>> not being so "benign" after all. :)  Maybe you can remove the if (list_empty()),
>> and have kvm_arch_vcpu_ioctl_run do this instead:
> 
> I don't see how removing this shortcut improves anything.  You'd still
> have the same window where you could loose an interrupt right after the
> spin_unlock.

It's not removing it that matters; it's just unnecessary if you add
KVM_REQ_IRQ_PENDING and you key the call to kvm_vgic_flush_hwstate on it.

Paolo

> I think the race that this comment discusses is indeed benign, but the
> overall guarantees that our vgic injection relies on is flawed and can
> be solved by either doing requests as Drew does here, or moving the
> vgic_flush inside a region that has both mode == IN_GUEST_MODE and
> interrupts disabled.  Note that for other purposes I'm planning to move
> the flush functions inside the interrupts disabled region later anyhow.
> 
> I don't see a problem with Drew's patch actually.
> 
> Thanks,
> -Christoffer
> 
>>
>>  		if (kvm_request_pending(vcpu)) {
>>  			if (kvm_check_request(KVM_REQ_SLEEP, vcpu))
>>  				vcpu_req_sleep(vcpu);
>> 		}
>>
>>                 preempt_disable();
>>
>>                 kvm_pmu_flush_hwstate(vcpu);
>>                 kvm_timer_flush_hwstate(vcpu);
>>
>> 		if (kvm_check_request(KVM_REQ_IRQ_PENDING, vcpu))
>> 			kvm_vgic_flush_hwstate(vcpu);
>>
>> ?
>>
>> Paolo
>>
>>> Signed-off-by: Andrew Jones <drjones@redhat.com>
>>> ---
>>>  arch/arm/include/asm/kvm_host.h   |  1 +
>>>  arch/arm/kvm/arm.c                | 12 ++++++++++++
>>>  arch/arm64/include/asm/kvm_host.h |  1 +
>>>  virt/kvm/arm/arch_timer.c         |  1 +
>>>  virt/kvm/arm/vgic/vgic.c          |  9 +++++++--
>>>  5 files changed, 22 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>>> index 41669578b3df..7bf90aaf2e87 100644
>>> --- a/arch/arm/include/asm/kvm_host.h
>>> +++ b/arch/arm/include/asm/kvm_host.h
>>> @@ -47,6 +47,7 @@
>>>  
>>>  #define KVM_REQ_SLEEP \
>>>  	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_NO_WAKEUP | KVM_REQUEST_WAIT)
>>> +#define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
>>>  
>>>  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
>>>  int __attribute_const__ kvm_target_cpu(void);
>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>> index d62e99885434..330064475914 100644
>>> --- a/arch/arm/kvm/arm.c
>>> +++ b/arch/arm/kvm/arm.c
>>> @@ -581,6 +581,15 @@ static void vcpu_req_sleep(struct kvm_vcpu *vcpu)
>>>  				       (!vcpu->arch.pause)));
>>>  }
>>>  
>>> +static void vcpu_req_irq_pending(struct kvm_vcpu *vcpu)
>>> +{
>>> +	/*
>>> +	 * Nothing to do here. kvm_check_request() already issued a memory
>>> +	 * barrier that pairs with kvm_make_request(), so all hardware state
>>> +	 * we need to flush should now be visible.
>>> +	 */
>>> +}
>>> +
>>>  static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
>>>  {
>>>  	return vcpu->arch.target >= 0;
>>> @@ -634,6 +643,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>  		if (kvm_request_pending(vcpu)) {
>>>  			if (kvm_check_request(KVM_REQ_SLEEP, vcpu))
>>>  				vcpu_req_sleep(vcpu);
>>> +			if (kvm_check_request(KVM_REQ_IRQ_PENDING, vcpu))
>>> +				vcpu_req_irq_pending(vcpu);
>>>  		}
>>>  
>>>  		/*
>>> @@ -777,6 +788,7 @@ static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
>>>  	 * trigger a world-switch round on the running physical CPU to set the
>>>  	 * virtual IRQ/FIQ fields in the HCR appropriately.
>>>  	 */
>>> +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>>>  	kvm_vcpu_kick(vcpu);
>>>  
>>>  	return 0;
>>> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
>>> index 04c0f9d37386..2c33fef945fe 100644
>>> --- a/arch/arm64/include/asm/kvm_host.h
>>> +++ b/arch/arm64/include/asm/kvm_host.h
>>> @@ -44,6 +44,7 @@
>>>  
>>>  #define KVM_REQ_SLEEP \
>>>  	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_NO_WAKEUP | KVM_REQUEST_WAIT)
>>> +#define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
>>>  
>>>  int __attribute_const__ kvm_target_cpu(void);
>>>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
>>> diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
>>> index 5976609ef27c..469b43315c0a 100644
>>> --- a/virt/kvm/arm/arch_timer.c
>>> +++ b/virt/kvm/arm/arch_timer.c
>>> @@ -95,6 +95,7 @@ static void kvm_timer_inject_irq_work(struct work_struct *work)
>>>  	 * If the vcpu is blocked we want to wake it up so that it will see
>>>  	 * the timer has expired when entering the guest.
>>>  	 */
>>> +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>>>  	kvm_vcpu_kick(vcpu);
>>>  }
>>>  
>>> diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
>>> index 3d0979c30721..bdd4b3a953b5 100644
>>> --- a/virt/kvm/arm/vgic/vgic.c
>>> +++ b/virt/kvm/arm/vgic/vgic.c
>>> @@ -283,8 +283,10 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
>>>  		 * won't see this one until it exits for some other
>>>  		 * reason.
>>>  		 */
>>> -		if (vcpu)
>>> +		if (vcpu) {
>>> +			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>>>  			kvm_vcpu_kick(vcpu);
>>> +		}
>>>  		return false;
>>>  	}
>>>  
>>> @@ -330,6 +332,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
>>>  	spin_unlock(&irq->irq_lock);
>>>  	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
>>>  
>>> +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>>>  	kvm_vcpu_kick(vcpu);
>>>  
>>>  	return true;
>>> @@ -719,8 +722,10 @@ void vgic_kick_vcpus(struct kvm *kvm)
>>>  	 * a good kick...
>>>  	 */
>>>  	kvm_for_each_vcpu(c, vcpu, kvm) {
>>> -		if (kvm_vgic_vcpu_pending_irq(vcpu))
>>> +		if (kvm_vgic_vcpu_pending_irq(vcpu)) {
>>> +			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
>>>  			kvm_vcpu_kick(vcpu);
>>> +		}
>>>  	}
>>>  }
>>>  
>>>

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 09/10] KVM: arm/arm64: use vcpu requests for irq injection
  2017-05-08  8:48       ` Paolo Bonzini
@ 2017-05-08  8:56         ` Christoffer Dall
  0 siblings, 0 replies; 43+ messages in thread
From: Christoffer Dall @ 2017-05-08  8:56 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: marc.zyngier, kvmarm, kvm

On Mon, May 08, 2017 at 10:48:57AM +0200, Paolo Bonzini wrote:
> 
> 
> On 06/05/2017 20:49, Christoffer Dall wrote:
> > On Thu, May 04, 2017 at 01:47:41PM +0200, Paolo Bonzini wrote:
> >>
> >>
> >> On 03/05/2017 18:06, Andrew Jones wrote:
> >>> Don't use request-less VCPU kicks when injecting IRQs, as a VCPU
> >>> kick meant to trigger the interrupt injection could be sent while
> >>> the VCPU is outside guest mode, which means no IPI is sent, and
> >>> after it has called kvm_vgic_flush_hwstate(), meaning it won't see
> >>> the updated GIC state until its next exit some time later for some
> >>> other reason.  The receiving VCPU only needs to check this request
> >>> in VCPU RUN to handle it.  By checking it, if it's pending, a
> >>> memory barrier will be issued that ensures all state is visible.
> >>> We still create a vcpu_req_irq_pending() function (which is a nop),
> >>> though, in order to allow us to use the standard request checking
> >>> pattern.
> >>
> >> I wonder if you aren't just papering over this race:
> >>
> >>         /*
> >>          * If there are no virtual interrupts active or pending for this
> >>          * VCPU, then there is no work to do and we can bail out without
> >>          * taking any lock.  There is a potential race with someone injecting
> >>          * interrupts to the VCPU, but it is a benign race as the VCPU will
> >>          * either observe the new interrupt before or after doing this check,
> >>          * and introducing additional synchronization mechanism doesn't change
> >>          * this.
> >>          */
> >>         if (list_empty(&vcpu->arch.vgic_cpu.ap_list_head))
> >>                 return;
> >>
> >>         spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock);
> >>         vgic_flush_lr_state(vcpu);
> >>         spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
> >>
> >> not being so "benign" after all. :)  Maybe you can remove the if (list_empty()),
> >> and have kvm_arch_vcpu_ioctl_run do this instead:
> > 
> > I don't see how removing this shortcut improves anything.  You'd still
> > have the same window where you could loose an interrupt right after the
> > spin_unlock.
> 
> It's not removing it that matters; it's just unnecessary if you add
> KVM_REQ_IRQ_PENDING and you key the call to kvm_vgic_flush_hwstate on it.
> 

That doesn't work, because you can have active interrupts in flight long
after someone sent you that request which means you'll have interrupts
on the ap_list that you need to flush.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 04/10] KVM: arm/arm64: use vcpu request in kvm_arm_halt_vcpu
  2017-05-06 18:08   ` Christoffer Dall
@ 2017-05-09 17:02     ` Andrew Jones
  2017-05-10  9:59       ` Christoffer Dall
  2017-05-15 11:14       ` Christoffer Dall
  0 siblings, 2 replies; 43+ messages in thread
From: Andrew Jones @ 2017-05-09 17:02 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Sat, May 06, 2017 at 08:08:09PM +0200, Christoffer Dall wrote:
> On Wed, May 03, 2017 at 06:06:29PM +0200, Andrew Jones wrote:
> > VCPU halting/resuming is partially implemented with VCPU requests.
> > When kvm_arm_halt_guest() is called all VCPUs get the EXIT request,
> > telling them to exit guest mode and look at the state of 'pause',
> > which will be true, telling them to sleep.  As ARM's VCPU RUN
> > implements the memory barrier pattern described in "Ensuring Requests
> > Are Seen" of Documentation/virtual/kvm/vcpu-requests.rst, there's
> > no way for a VCPU halted by kvm_arm_halt_guest() to miss the pause
> > state change.  However, before this patch, a single VCPU halted with
> > kvm_arm_halt_vcpu() did not get a request, opening a tiny race window.
> > This patch adds the request, closing the race window and also allowing
> > us to remove the final check of pause in VCPU RUN, as the final check
> > for requests is sufficient.
> > 
> > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > 
> > ---
> > 
> > I have two questions about the halting/resuming.
> > 
> > Question 1:
> > 
> > Do we even need kvm_arm_halt_vcpu()/kvm_arm_resume_vcpu()? It should
> > only be necessary if one VCPU can activate or inactivate the private
> > IRQs of another VCPU, right?  That doesn't seem like something that
> > should be possible, but I'm GIC-illiterate...
> 
> True, it shouldn't be possible.  I wonder if we were thinking of
> userspace access to the CPU-specific data, but we already ensure that no
> VCPUs are running at that time, so I don't think it should be necessary.
> 
> > 
> > Question 2:
> > 
> > It's not clear to me if we have another problem with halting/resuming
> > or not.  If it's possible for VCPU1 and VCPU2 to race in
> > vgic_mmio_write_s/cactive(), then the following scenario could occur,
> > leading to VCPU3 being in guest mode when it should not be.  Does the
> > hardware prohibit more than one VCPU entering trap handlers that lead
> > to these functions at the same time?  If not, then I guess pause needs
> > to be a counter instead of a boolean.
> > 
> >  VCPU1                 VCPU2                  VCPU3
> >  -----                 -----                  -----
> >                        VCPU3->pause = true;
> >                        halt(VCPU3);
> >                                               if (pause)
> >                                                 sleep();
> >  VCPU3->pause = true;
> >  halt(VCPU3);
> >                        VCPU3->pause = false;
> >                        resume(VCPU3);
> >                                               ...wake up...
> >                                               if (!pause)
> >                                                 Enter guest mode. Bad!
> >  VCPU3->pause = false;
> >  resume(VCPU3);
> > 
> > (Yes, the "Bad!" is there to both identify something we don't want
> >  occurring and to make fun of Trump's tweeting style.)
> 
> I think it's bad, and it might be even worse, because it could lead to a
> CPU looping forever in the host kernel, since there's no guarantee to
> exit from the VM in the other VCPU thread.
> 
> But I think simply taking the kvm->lock mutex to serialize the mmio
> active change operations should be sufficient.
> 
> If we agree on this I can send a patch with your reported by that fixes
> that issue, which gets rid of kvm_arm_halt_vcpu and requires you to
> modify your first patch to clear the KVM_REQ_VCPU_EXIT flag for each
> vcpu in kvm_arm_halt_guest instead and you can fold the remaining change
> from this patch into a patch that completely gets rid of the pause flag.

Yup, seems reasonable to me to lock the kvm mutex on a stop the guest type
action.

> 
> See untested patch draft at the end of this mail.
> 
> Thanks,
> -Christoffer
> 
> > ---
> >  arch/arm/kvm/arm.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > index 47f6c7fdca96..9174ed13135a 100644
> > --- a/arch/arm/kvm/arm.c
> > +++ b/arch/arm/kvm/arm.c
> > @@ -545,6 +545,7 @@ void kvm_arm_halt_guest(struct kvm *kvm)
> >  void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu)
> >  {
> >  	vcpu->arch.pause = true;
> > +	kvm_make_request(KVM_REQ_VCPU_EXIT, vcpu);
> >  	kvm_vcpu_kick(vcpu);
> >  }
> >  
> > @@ -664,7 +665,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  
> >  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
> >  		    kvm_request_pending(vcpu) ||
> > -		    vcpu->arch.power_off || vcpu->arch.pause) {
> > +		    vcpu->arch.power_off) {
> >  			vcpu->mode = OUTSIDE_GUEST_MODE;
> >  			local_irq_enable();
> >  			kvm_pmu_sync_hwstate(vcpu);
> > -- 
> > 2.9.3
> > 
> 
> 
> Untested draft patch:
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index d488b88..b77a3af 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -234,8 +234,6 @@ struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
>  struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
>  void kvm_arm_halt_guest(struct kvm *kvm);
>  void kvm_arm_resume_guest(struct kvm *kvm);
> -void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu);
> -void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu);
>  
>  int kvm_arm_copy_coproc_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
>  unsigned long kvm_arm_num_coproc_regs(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 578df18..7a38d5a 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -334,8 +334,6 @@ struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
>  struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void);
>  void kvm_arm_halt_guest(struct kvm *kvm);
>  void kvm_arm_resume_guest(struct kvm *kvm);
> -void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu);
> -void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu);
>  
>  u64 __kvm_call_hyp(void *hypfn, ...);
>  #define kvm_call_hyp(f, ...) __kvm_call_hyp(kvm_ksym_ref(f), ##__VA_ARGS__)
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 7941699..932788a 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -542,27 +542,15 @@ void kvm_arm_halt_guest(struct kvm *kvm)
>  	kvm_make_all_cpus_request(kvm, KVM_REQ_VCPU_EXIT);
>  }
>  
> -void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu)
> -{
> -	vcpu->arch.pause = true;
> -	kvm_vcpu_kick(vcpu);
> -}
> -
> -void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu)
> -{
> -	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
> -
> -	vcpu->arch.pause = false;
> -	swake_up(wq);
> -}
> -
>  void kvm_arm_resume_guest(struct kvm *kvm)
>  {
>  	int i;
>  	struct kvm_vcpu *vcpu;
>  
> -	kvm_for_each_vcpu(i, vcpu, kvm)
> -		kvm_arm_resume_vcpu(vcpu);
> +	kvm_for_each_vcpu(i, vcpu, kvm) {
> +		vcpu->arch.pause = false;
> +		swake_up(kvm_arch_vcpu_wq(vcpu));
> +	}
>  }
>  
>  static void vcpu_sleep(struct kvm_vcpu *vcpu)
> diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
> index 2a5db13..c143add 100644
> --- a/virt/kvm/arm/vgic/vgic-mmio.c
> +++ b/virt/kvm/arm/vgic/vgic-mmio.c
> @@ -231,23 +231,21 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
>   * be migrated while we don't hold the IRQ locks and we don't want to be
>   * chasing moving targets.
>   *
> - * For private interrupts, we only have to make sure the single and only VCPU
> - * that can potentially queue the IRQ is stopped.
> + * For private interrupts we don't have to do anything because userspace
> + * accesses to the VGIC state already require all VCPUs to be stopped, and
> + * only the VCPU itself can modify its private interrupts active state, which
> + * guarantees that the VCPU is not running.
>   */
>  static void vgic_change_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
>  {
> -	if (intid < VGIC_NR_PRIVATE_IRQS)
> -		kvm_arm_halt_vcpu(vcpu);
> -	else
> +	if (intid > VGIC_NR_PRIVATE_IRQS)
>  		kvm_arm_halt_guest(vcpu->kvm);
>  }
>  
>  /* See vgic_change_active_prepare */
>  static void vgic_change_active_finish(struct kvm_vcpu *vcpu, u32 intid)
>  {
> -	if (intid < VGIC_NR_PRIVATE_IRQS)
> -		kvm_arm_resume_vcpu(vcpu);
> -	else
> +	if (intid > VGIC_NR_PRIVATE_IRQS)
>  		kvm_arm_resume_guest(vcpu->kvm);
>  }
>  
> @@ -258,6 +256,7 @@ void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
>  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
>  	int i;
>  
> +	mutex_lock(&vcpu->kvm->lock);
>  	vgic_change_active_prepare(vcpu, intid);
>  	for_each_set_bit(i, &val, len * 8) {
>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> @@ -265,6 +264,7 @@ void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
>  		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  	vgic_change_active_finish(vcpu, intid);
> +	mutex_unlock(&vcpu->kvm->lock);
>  }
>  
>  void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
> @@ -274,6 +274,7 @@ void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
>  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
>  	int i;
>  
> +	mutex_lock(&vcpu->kvm->lock);
>  	vgic_change_active_prepare(vcpu, intid);
>  	for_each_set_bit(i, &val, len * 8) {
>  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> @@ -281,6 +282,7 @@ void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
>  		vgic_put_irq(vcpu->kvm, irq);
>  	}
>  	vgic_change_active_finish(vcpu, intid);
> +	mutex_unlock(&vcpu->kvm->lock);
>  }
>  
>  unsigned long vgic_mmio_read_priority(struct kvm_vcpu *vcpu,

Looks good to me. How about adding kvm->lock to the locking order comment
at the top of virt/kvm/arm/vgic/vgic.c too. With that, you can add my R-b
on the posting.

I'll rebase this series on your posting.

Thanks,
drew

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 05/10] KVM: arm/arm64: don't clear exit request from caller
  2017-05-06 18:12   ` Christoffer Dall
@ 2017-05-09 17:17     ` Andrew Jones
  2017-05-10  9:55       ` Christoffer Dall
  0 siblings, 1 reply; 43+ messages in thread
From: Andrew Jones @ 2017-05-09 17:17 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Sat, May 06, 2017 at 08:12:56PM +0200, Christoffer Dall wrote:
> On Wed, May 03, 2017 at 06:06:30PM +0200, Andrew Jones wrote:
> > VCPU requests that the receiver should handle should only be cleared
> > by the receiver. 
> 
> I cannot parse this sentence.

I'll try again:

VCPU requests should only be cleared by the receiving VCPUs.  The only
exception is when a request is set as a side-effect.  In these cases
the "requester" threads may clear the requests when it is sure the
receiving VCPUs do not need to see them.

> 
> > Not only does this properly implement the protocol,
> > but also avoids bugs where one VCPU clears another VCPU's request,
> > before the receiving VCPU has had a chance to see it.
> 
> Is this an actual race we have currently or just something thay may
> happen later.  Im' not sure.

Since ARM is just learning to handle VCPU requests, then it's not a bug
now.  Actually, I think I should state this protocol (what I wrote above)
in the document, and then I can just reference that here in this commit
message as the justification for change.

> 
> > ARM VCPUs
> > currently only handle one request, EXIT, and handling it is achieved
> > by checking pause to see if the VCPU should sleep.
> 
> This makes sense.  So forget my comment on the previous patch about
> getting rid of the pause flag.

Forgotten

> 
> > 
> > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > ---
> >  arch/arm/kvm/arm.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > index 9174ed13135a..7be0d9b0c63a 100644
> > --- a/arch/arm/kvm/arm.c
> > +++ b/arch/arm/kvm/arm.c
> > @@ -553,7 +553,6 @@ void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu)
> >  {
> >  	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
> >  
> > -	kvm_clear_request(KVM_REQ_VCPU_EXIT, vcpu);
> >  	vcpu->arch.pause = false;
> >  	swake_up(wq);
> >  }
> > @@ -625,7 +624,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  
> >  		update_vttbr(vcpu->kvm);
> >  
> > -		if (vcpu->arch.power_off || vcpu->arch.pause)
> > +		if (kvm_request_pending(vcpu)) {
> > +			if (kvm_check_request(KVM_REQ_VCPU_EXIT, vcpu)) {
> > +				if (vcpu->arch.pause)
> > +					vcpu_sleep(vcpu);
> > +			}
> 
> Can we factor out this bit to a separate function,
> kvm_handle_vcpu_requests() or something like that?

Later patches make this look a bit better, but a function to bundle all
the request handling up sounds good too. Will do.

> 
> > +		}
> > +
> > +		if (vcpu->arch.power_off)
> >  			vcpu_sleep(vcpu);
> >  
> >  		/*
> > -- 
> > 2.9.3
> > 
> Thanks,
> -Christoffer

Thanks,
drew

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 07/10] KVM: arm/arm64: optimize VCPU RUN
  2017-05-06 18:27   ` Christoffer Dall
@ 2017-05-09 17:40     ` Andrew Jones
  2017-05-09 20:13       ` Christoffer Dall
  0 siblings, 1 reply; 43+ messages in thread
From: Andrew Jones @ 2017-05-09 17:40 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Sat, May 06, 2017 at 08:27:15PM +0200, Christoffer Dall wrote:
> On Wed, May 03, 2017 at 06:06:32PM +0200, Andrew Jones wrote:
> 
> nit: can you make the subject of this patch a bit more specific?
> 
> For example:  Optimize checking power_off flag in KVM_RUN

OK

> 
> > We can make a small optimization by not checking the state of
> > the power_off field on each run. This is done by treating
> > power_off like pause, only checking it when we get the EXIT
> > VCPU request. When a VCPU powers off another VCPU the EXIT
> > request is already made, so we just need to make sure the
> > request is also made on self power off. kvm_vcpu_kick() isn't
> > necessary for these cases, as the VCPU would just be kicking
> > itself, but we add it anyway as a self kick doesn't cost much,
> > and it makes the code more future-proof.
> > 
> > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > ---
> >  arch/arm/kvm/arm.c  | 16 ++++++++++------
> >  arch/arm/kvm/psci.c |  2 ++
> >  2 files changed, 12 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > index 26d9d4d72853..24bbc7671d89 100644
> > --- a/arch/arm/kvm/arm.c
> > +++ b/arch/arm/kvm/arm.c
> > @@ -371,6 +371,13 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> >  	kvm_timer_vcpu_put(vcpu);
> >  }
> >  
> > +static void vcpu_power_off(struct kvm_vcpu *vcpu)
> > +{
> > +	vcpu->arch.power_off = true;
> > +	kvm_make_request(KVM_REQ_VCPU_EXIT, vcpu);
> > +	kvm_vcpu_kick(vcpu);
> > +}
> > +
> >  int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
> >  				    struct kvm_mp_state *mp_state)
> >  {
> > @@ -390,7 +397,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
> >  		vcpu->arch.power_off = false;
> >  		break;
> >  	case KVM_MP_STATE_STOPPED:
> > -		vcpu->arch.power_off = true;
> > +		vcpu_power_off(vcpu);
> >  		break;
> >  	default:
> >  		return -EINVAL;
> > @@ -626,14 +633,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  
> >  		if (kvm_request_pending(vcpu)) {
> >  			if (kvm_check_request(KVM_REQ_VCPU_EXIT, vcpu)) {
> > -				if (vcpu->arch.pause)
> > +				if (vcpu->arch.power_off || vcpu->arch.pause)
> >  					vcpu_sleep(vcpu);
> >  			}
> >  		}
> >  
> > -		if (vcpu->arch.power_off)
> > -			vcpu_sleep(vcpu);
> > -
> 
> Hmmm, even though I just gave a reviewed-by on the pause side, I'm not
> realizing that I don't think this works.  Because you're now only
> checking requests in the vcpu loop, but the vcpu_sleep() function is
> implemented using swait_event_interruptible(), which can wake up if you
> have a pending signal for example, and then the loop can wrap around and
> you can run the VCPU even though you should be paused.  Am I missing
> something?

Hmm, I think I missed something. I missed that swait_event_interruptible()
doesn't check its condition again when awoken by a signal (which, as far
as I can tell, is the only other way we can stop vcpu_sleep() while
power_off and/or pause are true.  Had I noticed that, I could have
addressed it in one of two ways:

 1) Leave power_off and pause in the condition that stops guest entry.
    Easy to see we'll never enter guest mode with one or both set.
    
 2) Add a comment somewhere to explain the subtle dependency vcpu_sleep()
    has on the pending signal check done after its call and before the
    condition that stops guest entry is run. (IOW, I don't think we have
    a bug with this series, but we do have a non-commented subtlety.)

Thanks,
drew

> 
> Thanks,
> -Christoffer
> 
> >  		/*
> >  		 * Preparing the interrupts to be injected also
> >  		 * involves poking the GIC, which must be done in a
> > @@ -903,7 +907,7 @@ static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
> >  	 * Handle the "start in power-off" case.
> >  	 */
> >  	if (test_bit(KVM_ARM_VCPU_POWER_OFF, vcpu->arch.features))
> > -		vcpu->arch.power_off = true;
> > +		vcpu_power_off(vcpu);
> >  	else
> >  		vcpu->arch.power_off = false;
> >  
> > diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> > index f189d0ad30d5..4a436685c552 100644
> > --- a/arch/arm/kvm/psci.c
> > +++ b/arch/arm/kvm/psci.c
> > @@ -65,6 +65,8 @@ static unsigned long kvm_psci_vcpu_suspend(struct kvm_vcpu *vcpu)
> >  static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
> >  {
> >  	vcpu->arch.power_off = true;
> > +	kvm_make_request(KVM_REQ_VCPU_EXIT, vcpu);
> > +	kvm_vcpu_kick(vcpu);
> >  }
> >  
> >  static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
> > -- 
> > 2.9.3
> > 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 09/10] KVM: arm/arm64: use vcpu requests for irq injection
  2017-05-06 18:51   ` Christoffer Dall
@ 2017-05-09 17:53     ` Andrew Jones
  0 siblings, 0 replies; 43+ messages in thread
From: Andrew Jones @ 2017-05-09 17:53 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Sat, May 06, 2017 at 08:51:00PM +0200, Christoffer Dall wrote:
> Hi Drew,
> 
> On Wed, May 03, 2017 at 06:06:34PM +0200, Andrew Jones wrote:
> > Don't use request-less VCPU kicks when injecting IRQs, as a VCPU
> > kick meant to trigger the interrupt injection could be sent while
> > the VCPU is outside guest mode, which means no IPI is sent, and
> > after it has called kvm_vgic_flush_hwstate(), meaning it won't see
> > the updated GIC state until its next exit some time later for some
> > other reason.  The receiving VCPU only needs to check this request
> > in VCPU RUN to handle it.  By checking it, if it's pending, a
> > memory barrier will be issued that ensures all state is visible.
> > We still create a vcpu_req_irq_pending() function (which is a nop),
> > though, in order to allow us to use the standard request checking
> > pattern.
> > 
> > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > ---
> >  arch/arm/include/asm/kvm_host.h   |  1 +
> >  arch/arm/kvm/arm.c                | 12 ++++++++++++
> >  arch/arm64/include/asm/kvm_host.h |  1 +
> >  virt/kvm/arm/arch_timer.c         |  1 +
> >  virt/kvm/arm/vgic/vgic.c          |  9 +++++++--
> >  5 files changed, 22 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > index 41669578b3df..7bf90aaf2e87 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -47,6 +47,7 @@
> >  
> >  #define KVM_REQ_SLEEP \
> >  	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_NO_WAKEUP | KVM_REQUEST_WAIT)
> > +#define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
> >  
> >  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
> >  int __attribute_const__ kvm_target_cpu(void);
> > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > index d62e99885434..330064475914 100644
> > --- a/arch/arm/kvm/arm.c
> > +++ b/arch/arm/kvm/arm.c
> > @@ -581,6 +581,15 @@ static void vcpu_req_sleep(struct kvm_vcpu *vcpu)
> >  				       (!vcpu->arch.pause)));
> >  }
> >  
> > +static void vcpu_req_irq_pending(struct kvm_vcpu *vcpu)
> > +{
> > +	/*
> > +	 * Nothing to do here. kvm_check_request() already issued a memory
> > +	 * barrier that pairs with kvm_make_request(), so all hardware state
> > +	 * we need to flush should now be visible.
> > +	 */
> > +}
> > +
> >  static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
> >  {
> >  	return vcpu->arch.target >= 0;
> > @@ -634,6 +643,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> >  		if (kvm_request_pending(vcpu)) {
> >  			if (kvm_check_request(KVM_REQ_SLEEP, vcpu))
> >  				vcpu_req_sleep(vcpu);
> > +			if (kvm_check_request(KVM_REQ_IRQ_PENDING, vcpu))
> > +				vcpu_req_irq_pending(vcpu);
> >  		}
> >  
> >  		/*
> > @@ -777,6 +788,7 @@ static int vcpu_interrupt_line(struct kvm_vcpu *vcpu, int number, bool level)
> >  	 * trigger a world-switch round on the running physical CPU to set the
> >  	 * virtual IRQ/FIQ fields in the HCR appropriately.
> >  	 */
> > +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> >  	kvm_vcpu_kick(vcpu);
> >  
> >  	return 0;
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 04c0f9d37386..2c33fef945fe 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -44,6 +44,7 @@
> >  
> >  #define KVM_REQ_SLEEP \
> >  	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_NO_WAKEUP | KVM_REQUEST_WAIT)
> > +#define KVM_REQ_IRQ_PENDING	KVM_ARCH_REQ(1)
> >  
> >  int __attribute_const__ kvm_target_cpu(void);
> >  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> > diff --git a/virt/kvm/arm/arch_timer.c b/virt/kvm/arm/arch_timer.c
> > index 5976609ef27c..469b43315c0a 100644
> > --- a/virt/kvm/arm/arch_timer.c
> > +++ b/virt/kvm/arm/arch_timer.c
> > @@ -95,6 +95,7 @@ static void kvm_timer_inject_irq_work(struct work_struct *work)
> >  	 * If the vcpu is blocked we want to wake it up so that it will see
> >  	 * the timer has expired when entering the guest.
> >  	 */
> > +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> >  	kvm_vcpu_kick(vcpu);
> 
> So I think we just call kvm_vcpu_kick() because it calls
> kvm_vcpu_wake_up().  If we have this timer work happening, it means that
> the VCPU is blocked, and there won't be a race with executing in the run
> loop, right?
> 
> So maybe we should just change this kvm_vcpu_kick() to a direct call to
> kvm_vcpu_wake_up() to avoid having a request-less kick.
> 
> Note that your change will still work, I just think it's unnecessary.

Ah, yes.  I like the idea of changing it to a wake up. Will do.

Thanks,
drew


> 
> Thanks,
> -Christoffer
> 
> >  }
> >  
> > diff --git a/virt/kvm/arm/vgic/vgic.c b/virt/kvm/arm/vgic/vgic.c
> > index 3d0979c30721..bdd4b3a953b5 100644
> > --- a/virt/kvm/arm/vgic/vgic.c
> > +++ b/virt/kvm/arm/vgic/vgic.c
> > @@ -283,8 +283,10 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
> >  		 * won't see this one until it exits for some other
> >  		 * reason.
> >  		 */
> > -		if (vcpu)
> > +		if (vcpu) {
> > +			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> >  			kvm_vcpu_kick(vcpu);
> > +		}
> >  		return false;
> >  	}
> >  
> > @@ -330,6 +332,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq)
> >  	spin_unlock(&irq->irq_lock);
> >  	spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
> >  
> > +	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> >  	kvm_vcpu_kick(vcpu);
> >  
> >  	return true;
> > @@ -719,8 +722,10 @@ void vgic_kick_vcpus(struct kvm *kvm)
> >  	 * a good kick...
> >  	 */
> >  	kvm_for_each_vcpu(c, vcpu, kvm) {
> > -		if (kvm_vgic_vcpu_pending_irq(vcpu))
> > +		if (kvm_vgic_vcpu_pending_irq(vcpu)) {
> > +			kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
> >  			kvm_vcpu_kick(vcpu);
> > +		}
> >  	}
> >  }
> >  
> > -- 
> > 2.9.3
> > 

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 07/10] KVM: arm/arm64: optimize VCPU RUN
  2017-05-09 17:40     ` Andrew Jones
@ 2017-05-09 20:13       ` Christoffer Dall
  2017-05-10  6:58         ` Andrew Jones
  0 siblings, 1 reply; 43+ messages in thread
From: Christoffer Dall @ 2017-05-09 20:13 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Christoffer Dall, kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Tue, May 09, 2017 at 07:40:57PM +0200, Andrew Jones wrote:
> On Sat, May 06, 2017 at 08:27:15PM +0200, Christoffer Dall wrote:
> > On Wed, May 03, 2017 at 06:06:32PM +0200, Andrew Jones wrote:
> > 
> > nit: can you make the subject of this patch a bit more specific?
> > 
> > For example:  Optimize checking power_off flag in KVM_RUN
> 
> OK
> 
> > 
> > > We can make a small optimization by not checking the state of
> > > the power_off field on each run. This is done by treating
> > > power_off like pause, only checking it when we get the EXIT
> > > VCPU request. When a VCPU powers off another VCPU the EXIT
> > > request is already made, so we just need to make sure the
> > > request is also made on self power off. kvm_vcpu_kick() isn't
> > > necessary for these cases, as the VCPU would just be kicking
> > > itself, but we add it anyway as a self kick doesn't cost much,
> > > and it makes the code more future-proof.
> > > 
> > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > ---
> > >  arch/arm/kvm/arm.c  | 16 ++++++++++------
> > >  arch/arm/kvm/psci.c |  2 ++
> > >  2 files changed, 12 insertions(+), 6 deletions(-)
> > > 
> > > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > > index 26d9d4d72853..24bbc7671d89 100644
> > > --- a/arch/arm/kvm/arm.c
> > > +++ b/arch/arm/kvm/arm.c
> > > @@ -371,6 +371,13 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> > >  	kvm_timer_vcpu_put(vcpu);
> > >  }
> > >  
> > > +static void vcpu_power_off(struct kvm_vcpu *vcpu)
> > > +{
> > > +	vcpu->arch.power_off = true;
> > > +	kvm_make_request(KVM_REQ_VCPU_EXIT, vcpu);
> > > +	kvm_vcpu_kick(vcpu);
> > > +}
> > > +
> > >  int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
> > >  				    struct kvm_mp_state *mp_state)
> > >  {
> > > @@ -390,7 +397,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
> > >  		vcpu->arch.power_off = false;
> > >  		break;
> > >  	case KVM_MP_STATE_STOPPED:
> > > -		vcpu->arch.power_off = true;
> > > +		vcpu_power_off(vcpu);
> > >  		break;
> > >  	default:
> > >  		return -EINVAL;
> > > @@ -626,14 +633,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > >  
> > >  		if (kvm_request_pending(vcpu)) {
> > >  			if (kvm_check_request(KVM_REQ_VCPU_EXIT, vcpu)) {
> > > -				if (vcpu->arch.pause)
> > > +				if (vcpu->arch.power_off || vcpu->arch.pause)
> > >  					vcpu_sleep(vcpu);
> > >  			}
> > >  		}
> > >  
> > > -		if (vcpu->arch.power_off)
> > > -			vcpu_sleep(vcpu);
> > > -
> > 
> > Hmmm, even though I just gave a reviewed-by on the pause side, I'm not
> > realizing that I don't think this works.  Because you're now only
> > checking requests in the vcpu loop, but the vcpu_sleep() function is
> > implemented using swait_event_interruptible(), which can wake up if you
> > have a pending signal for example, and then the loop can wrap around and
> > you can run the VCPU even though you should be paused.  Am I missing
> > something?
> 
> Hmm, I think I missed something. I missed that swait_event_interruptible()
> doesn't check its condition again when awoken by a signal (which, as far
> as I can tell, is the only other way we can stop vcpu_sleep() while
> power_off and/or pause are true.  Had I noticed that, I could have
> addressed it in one of two ways:
> 
>  1) Leave power_off and pause in the condition that stops guest entry.
>     Easy to see we'll never enter guest mode with one or both set.
>     
>  2) Add a comment somewhere to explain the subtle dependency vcpu_sleep()
>     has on the pending signal check done after its call and before the
>     condition that stops guest entry is run. (IOW, I don't think we have
>     a bug with this series, but we do have a non-commented subtlety.)
> 

But, then it can return to userspace and enter the kernel again, at
which time there will be no pending signal and no pending VCPU requests,
so the VCPU will enter the guest, but the pause flag can still be true
and it shouldn't enter the guest.  So I think there is a bug.

And I think the only nice way to solve it is to not clear the request
until the CPU is really not paused any more.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 07/10] KVM: arm/arm64: optimize VCPU RUN
  2017-05-09 20:13       ` Christoffer Dall
@ 2017-05-10  6:58         ` Andrew Jones
  2017-05-10  8:07           ` Christoffer Dall
  0 siblings, 1 reply; 43+ messages in thread
From: Andrew Jones @ 2017-05-10  6:58 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Christoffer Dall, kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Tue, May 09, 2017 at 01:13:47PM -0700, Christoffer Dall wrote:
> On Tue, May 09, 2017 at 07:40:57PM +0200, Andrew Jones wrote:
> > On Sat, May 06, 2017 at 08:27:15PM +0200, Christoffer Dall wrote:
> > > On Wed, May 03, 2017 at 06:06:32PM +0200, Andrew Jones wrote:
> > > 
> > > nit: can you make the subject of this patch a bit more specific?
> > > 
> > > For example:  Optimize checking power_off flag in KVM_RUN
> > 
> > OK
> > 
> > > 
> > > > We can make a small optimization by not checking the state of
> > > > the power_off field on each run. This is done by treating
> > > > power_off like pause, only checking it when we get the EXIT
> > > > VCPU request. When a VCPU powers off another VCPU the EXIT
> > > > request is already made, so we just need to make sure the
> > > > request is also made on self power off. kvm_vcpu_kick() isn't
> > > > necessary for these cases, as the VCPU would just be kicking
> > > > itself, but we add it anyway as a self kick doesn't cost much,
> > > > and it makes the code more future-proof.
> > > > 
> > > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > > ---
> > > >  arch/arm/kvm/arm.c  | 16 ++++++++++------
> > > >  arch/arm/kvm/psci.c |  2 ++
> > > >  2 files changed, 12 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > > > index 26d9d4d72853..24bbc7671d89 100644
> > > > --- a/arch/arm/kvm/arm.c
> > > > +++ b/arch/arm/kvm/arm.c
> > > > @@ -371,6 +371,13 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> > > >  	kvm_timer_vcpu_put(vcpu);
> > > >  }
> > > >  
> > > > +static void vcpu_power_off(struct kvm_vcpu *vcpu)
> > > > +{
> > > > +	vcpu->arch.power_off = true;
> > > > +	kvm_make_request(KVM_REQ_VCPU_EXIT, vcpu);
> > > > +	kvm_vcpu_kick(vcpu);
> > > > +}
> > > > +
> > > >  int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
> > > >  				    struct kvm_mp_state *mp_state)
> > > >  {
> > > > @@ -390,7 +397,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
> > > >  		vcpu->arch.power_off = false;
> > > >  		break;
> > > >  	case KVM_MP_STATE_STOPPED:
> > > > -		vcpu->arch.power_off = true;
> > > > +		vcpu_power_off(vcpu);
> > > >  		break;
> > > >  	default:
> > > >  		return -EINVAL;
> > > > @@ -626,14 +633,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > > >  
> > > >  		if (kvm_request_pending(vcpu)) {
> > > >  			if (kvm_check_request(KVM_REQ_VCPU_EXIT, vcpu)) {
> > > > -				if (vcpu->arch.pause)
> > > > +				if (vcpu->arch.power_off || vcpu->arch.pause)
> > > >  					vcpu_sleep(vcpu);
> > > >  			}
> > > >  		}
> > > >  
> > > > -		if (vcpu->arch.power_off)
> > > > -			vcpu_sleep(vcpu);
> > > > -
> > > 
> > > Hmmm, even though I just gave a reviewed-by on the pause side, I'm not
> > > realizing that I don't think this works.  Because you're now only
> > > checking requests in the vcpu loop, but the vcpu_sleep() function is
> > > implemented using swait_event_interruptible(), which can wake up if you
> > > have a pending signal for example, and then the loop can wrap around and
> > > you can run the VCPU even though you should be paused.  Am I missing
> > > something?
> > 
> > Hmm, I think I missed something. I missed that swait_event_interruptible()
> > doesn't check its condition again when awoken by a signal (which, as far
> > as I can tell, is the only other way we can stop vcpu_sleep() while
> > power_off and/or pause are true.  Had I noticed that, I could have
> > addressed it in one of two ways:
> > 
> >  1) Leave power_off and pause in the condition that stops guest entry.
> >     Easy to see we'll never enter guest mode with one or both set.
> >     
> >  2) Add a comment somewhere to explain the subtle dependency vcpu_sleep()
> >     has on the pending signal check done after its call and before the
> >     condition that stops guest entry is run. (IOW, I don't think we have
> >     a bug with this series, but we do have a non-commented subtlety.)
> > 
> 
> But, then it can return to userspace and enter the kernel again, at
> which time there will be no pending signal and no pending VCPU requests,
> so the VCPU will enter the guest, but the pause flag can still be true
> and it shouldn't enter the guest.  So I think there is a bug.

Ah, indeed.

> 
> And I think the only nice way to solve it is to not clear the request
> until the CPU is really not paused any more.

This would sort of circle back to the original approach of using the
request bit as the state, but I've already convinced myself that that's
too much abuse of VCPU requests to want to do it. (1) above would also
work and also allow VCPU requests to be used as designed.

To tidy up the repeated 'vcpu->arch.power_off || vcpu->arch.pause'
condition I think I'll just introduce a vcpu_should_sleep() to encapsulate
it.

Thanks,
drew

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 07/10] KVM: arm/arm64: optimize VCPU RUN
  2017-05-10  6:58         ` Andrew Jones
@ 2017-05-10  8:07           ` Christoffer Dall
  2017-05-10  8:20             ` Andrew Jones
  0 siblings, 1 reply; 43+ messages in thread
From: Christoffer Dall @ 2017-05-10  8:07 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Christoffer Dall, kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Wed, May 10, 2017 at 08:58:15AM +0200, Andrew Jones wrote:
> On Tue, May 09, 2017 at 01:13:47PM -0700, Christoffer Dall wrote:
> > On Tue, May 09, 2017 at 07:40:57PM +0200, Andrew Jones wrote:
> > > On Sat, May 06, 2017 at 08:27:15PM +0200, Christoffer Dall wrote:
> > > > On Wed, May 03, 2017 at 06:06:32PM +0200, Andrew Jones wrote:
> > > > 
> > > > nit: can you make the subject of this patch a bit more specific?
> > > > 
> > > > For example:  Optimize checking power_off flag in KVM_RUN
> > > 
> > > OK
> > > 
> > > > 
> > > > > We can make a small optimization by not checking the state of
> > > > > the power_off field on each run. This is done by treating
> > > > > power_off like pause, only checking it when we get the EXIT
> > > > > VCPU request. When a VCPU powers off another VCPU the EXIT
> > > > > request is already made, so we just need to make sure the
> > > > > request is also made on self power off. kvm_vcpu_kick() isn't
> > > > > necessary for these cases, as the VCPU would just be kicking
> > > > > itself, but we add it anyway as a self kick doesn't cost much,
> > > > > and it makes the code more future-proof.
> > > > > 
> > > > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > > > ---
> > > > >  arch/arm/kvm/arm.c  | 16 ++++++++++------
> > > > >  arch/arm/kvm/psci.c |  2 ++
> > > > >  2 files changed, 12 insertions(+), 6 deletions(-)
> > > > > 
> > > > > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > > > > index 26d9d4d72853..24bbc7671d89 100644
> > > > > --- a/arch/arm/kvm/arm.c
> > > > > +++ b/arch/arm/kvm/arm.c
> > > > > @@ -371,6 +371,13 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> > > > >  	kvm_timer_vcpu_put(vcpu);
> > > > >  }
> > > > >  
> > > > > +static void vcpu_power_off(struct kvm_vcpu *vcpu)
> > > > > +{
> > > > > +	vcpu->arch.power_off = true;
> > > > > +	kvm_make_request(KVM_REQ_VCPU_EXIT, vcpu);
> > > > > +	kvm_vcpu_kick(vcpu);
> > > > > +}
> > > > > +
> > > > >  int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
> > > > >  				    struct kvm_mp_state *mp_state)
> > > > >  {
> > > > > @@ -390,7 +397,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
> > > > >  		vcpu->arch.power_off = false;
> > > > >  		break;
> > > > >  	case KVM_MP_STATE_STOPPED:
> > > > > -		vcpu->arch.power_off = true;
> > > > > +		vcpu_power_off(vcpu);
> > > > >  		break;
> > > > >  	default:
> > > > >  		return -EINVAL;
> > > > > @@ -626,14 +633,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > > > >  
> > > > >  		if (kvm_request_pending(vcpu)) {
> > > > >  			if (kvm_check_request(KVM_REQ_VCPU_EXIT, vcpu)) {
> > > > > -				if (vcpu->arch.pause)
> > > > > +				if (vcpu->arch.power_off || vcpu->arch.pause)
> > > > >  					vcpu_sleep(vcpu);
> > > > >  			}
> > > > >  		}
> > > > >  
> > > > > -		if (vcpu->arch.power_off)
> > > > > -			vcpu_sleep(vcpu);
> > > > > -
> > > > 
> > > > Hmmm, even though I just gave a reviewed-by on the pause side, I'm not
> > > > realizing that I don't think this works.  Because you're now only
> > > > checking requests in the vcpu loop, but the vcpu_sleep() function is
> > > > implemented using swait_event_interruptible(), which can wake up if you
> > > > have a pending signal for example, and then the loop can wrap around and
> > > > you can run the VCPU even though you should be paused.  Am I missing
> > > > something?
> > > 
> > > Hmm, I think I missed something. I missed that swait_event_interruptible()
> > > doesn't check its condition again when awoken by a signal (which, as far
> > > as I can tell, is the only other way we can stop vcpu_sleep() while
> > > power_off and/or pause are true.  Had I noticed that, I could have
> > > addressed it in one of two ways:
> > > 
> > >  1) Leave power_off and pause in the condition that stops guest entry.
> > >     Easy to see we'll never enter guest mode with one or both set.
> > >     
> > >  2) Add a comment somewhere to explain the subtle dependency vcpu_sleep()
> > >     has on the pending signal check done after its call and before the
> > >     condition that stops guest entry is run. (IOW, I don't think we have
> > >     a bug with this series, but we do have a non-commented subtlety.)
> > > 
> > 
> > But, then it can return to userspace and enter the kernel again, at
> > which time there will be no pending signal and no pending VCPU requests,
> > so the VCPU will enter the guest, but the pause flag can still be true
> > and it shouldn't enter the guest.  So I think there is a bug.
> 
> Ah, indeed.
> 
> > 
> > And I think the only nice way to solve it is to not clear the request
> > until the CPU is really not paused any more.
> 
> This would sort of circle back to the original approach of using the
> request bit as the state, but I've already convinced myself that that's
> too much abuse of VCPU requests to want to do it. (1) above would also
> work and also allow VCPU requests to be used as designed.
> 
> To tidy up the repeated 'vcpu->arch.power_off || vcpu->arch.pause'
> condition I think I'll just introduce a vcpu_should_sleep() to encapsulate
> it.
> 

Fair enough, but could we keep these two booleans as flags in a single
unsigned long on the vcpu struct then, so that we can do a single
check on them and call out to handle_run_flags or whatever, analogous to
how we handle requests?

The other way to do it would be to set the request on the VCPU itself
when returning from the sleep function if pause is still set...

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 07/10] KVM: arm/arm64: optimize VCPU RUN
  2017-05-10  8:07           ` Christoffer Dall
@ 2017-05-10  8:20             ` Andrew Jones
  2017-05-10  9:06               ` Christoffer Dall
  0 siblings, 1 reply; 43+ messages in thread
From: Andrew Jones @ 2017-05-10  8:20 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: Christoffer Dall, kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Wed, May 10, 2017 at 10:07:34AM +0200, Christoffer Dall wrote:
> On Wed, May 10, 2017 at 08:58:15AM +0200, Andrew Jones wrote:
> > On Tue, May 09, 2017 at 01:13:47PM -0700, Christoffer Dall wrote:
> > > On Tue, May 09, 2017 at 07:40:57PM +0200, Andrew Jones wrote:
> > > > On Sat, May 06, 2017 at 08:27:15PM +0200, Christoffer Dall wrote:
> > > > > On Wed, May 03, 2017 at 06:06:32PM +0200, Andrew Jones wrote:
> > > > > 
> > > > > nit: can you make the subject of this patch a bit more specific?
> > > > > 
> > > > > For example:  Optimize checking power_off flag in KVM_RUN
> > > > 
> > > > OK
> > > > 
> > > > > 
> > > > > > We can make a small optimization by not checking the state of
> > > > > > the power_off field on each run. This is done by treating
> > > > > > power_off like pause, only checking it when we get the EXIT
> > > > > > VCPU request. When a VCPU powers off another VCPU the EXIT
> > > > > > request is already made, so we just need to make sure the
> > > > > > request is also made on self power off. kvm_vcpu_kick() isn't
> > > > > > necessary for these cases, as the VCPU would just be kicking
> > > > > > itself, but we add it anyway as a self kick doesn't cost much,
> > > > > > and it makes the code more future-proof.
> > > > > > 
> > > > > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > > > > ---
> > > > > >  arch/arm/kvm/arm.c  | 16 ++++++++++------
> > > > > >  arch/arm/kvm/psci.c |  2 ++
> > > > > >  2 files changed, 12 insertions(+), 6 deletions(-)
> > > > > > 
> > > > > > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > > > > > index 26d9d4d72853..24bbc7671d89 100644
> > > > > > --- a/arch/arm/kvm/arm.c
> > > > > > +++ b/arch/arm/kvm/arm.c
> > > > > > @@ -371,6 +371,13 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> > > > > >  	kvm_timer_vcpu_put(vcpu);
> > > > > >  }
> > > > > >  
> > > > > > +static void vcpu_power_off(struct kvm_vcpu *vcpu)
> > > > > > +{
> > > > > > +	vcpu->arch.power_off = true;
> > > > > > +	kvm_make_request(KVM_REQ_VCPU_EXIT, vcpu);
> > > > > > +	kvm_vcpu_kick(vcpu);
> > > > > > +}
> > > > > > +
> > > > > >  int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
> > > > > >  				    struct kvm_mp_state *mp_state)
> > > > > >  {
> > > > > > @@ -390,7 +397,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
> > > > > >  		vcpu->arch.power_off = false;
> > > > > >  		break;
> > > > > >  	case KVM_MP_STATE_STOPPED:
> > > > > > -		vcpu->arch.power_off = true;
> > > > > > +		vcpu_power_off(vcpu);
> > > > > >  		break;
> > > > > >  	default:
> > > > > >  		return -EINVAL;
> > > > > > @@ -626,14 +633,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > > > > >  
> > > > > >  		if (kvm_request_pending(vcpu)) {
> > > > > >  			if (kvm_check_request(KVM_REQ_VCPU_EXIT, vcpu)) {
> > > > > > -				if (vcpu->arch.pause)
> > > > > > +				if (vcpu->arch.power_off || vcpu->arch.pause)
> > > > > >  					vcpu_sleep(vcpu);
> > > > > >  			}
> > > > > >  		}
> > > > > >  
> > > > > > -		if (vcpu->arch.power_off)
> > > > > > -			vcpu_sleep(vcpu);
> > > > > > -
> > > > > 
> > > > > Hmmm, even though I just gave a reviewed-by on the pause side, I'm not
> > > > > realizing that I don't think this works.  Because you're now only
> > > > > checking requests in the vcpu loop, but the vcpu_sleep() function is
> > > > > implemented using swait_event_interruptible(), which can wake up if you
> > > > > have a pending signal for example, and then the loop can wrap around and
> > > > > you can run the VCPU even though you should be paused.  Am I missing
> > > > > something?
> > > > 
> > > > Hmm, I think I missed something. I missed that swait_event_interruptible()
> > > > doesn't check its condition again when awoken by a signal (which, as far
> > > > as I can tell, is the only other way we can stop vcpu_sleep() while
> > > > power_off and/or pause are true.  Had I noticed that, I could have
> > > > addressed it in one of two ways:
> > > > 
> > > >  1) Leave power_off and pause in the condition that stops guest entry.
> > > >     Easy to see we'll never enter guest mode with one or both set.
> > > >     
> > > >  2) Add a comment somewhere to explain the subtle dependency vcpu_sleep()
> > > >     has on the pending signal check done after its call and before the
> > > >     condition that stops guest entry is run. (IOW, I don't think we have
> > > >     a bug with this series, but we do have a non-commented subtlety.)
> > > > 
> > > 
> > > But, then it can return to userspace and enter the kernel again, at
> > > which time there will be no pending signal and no pending VCPU requests,
> > > so the VCPU will enter the guest, but the pause flag can still be true
> > > and it shouldn't enter the guest.  So I think there is a bug.
> > 
> > Ah, indeed.
> > 
> > > 
> > > And I think the only nice way to solve it is to not clear the request
> > > until the CPU is really not paused any more.
> > 
> > This would sort of circle back to the original approach of using the
> > request bit as the state, but I've already convinced myself that that's
> > too much abuse of VCPU requests to want to do it. (1) above would also
> > work and also allow VCPU requests to be used as designed.
> > 
> > To tidy up the repeated 'vcpu->arch.power_off || vcpu->arch.pause'
> > condition I think I'll just introduce a vcpu_should_sleep() to encapsulate
> > it.
> > 
> 
> Fair enough, but could we keep these two booleans as flags in a single
> unsigned long on the vcpu struct then, so that we can do a single
> check on them and call out to handle_run_flags or whatever, analogous to
> how we handle requests?

Could do that.

> 
> The other way to do it would be to set the request on the VCPU itself
> when returning from the sleep function if pause is still set...

I like this suggestion more. I'll do that for v4.

Thanks,
drew

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 07/10] KVM: arm/arm64: optimize VCPU RUN
  2017-05-10  8:20             ` Andrew Jones
@ 2017-05-10  9:06               ` Christoffer Dall
  0 siblings, 0 replies; 43+ messages in thread
From: Christoffer Dall @ 2017-05-10  9:06 UTC (permalink / raw)
  To: Andrew Jones
  Cc: Christoffer Dall, kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Wed, May 10, 2017 at 10:20:13AM +0200, Andrew Jones wrote:
> On Wed, May 10, 2017 at 10:07:34AM +0200, Christoffer Dall wrote:
> > On Wed, May 10, 2017 at 08:58:15AM +0200, Andrew Jones wrote:
> > > On Tue, May 09, 2017 at 01:13:47PM -0700, Christoffer Dall wrote:
> > > > On Tue, May 09, 2017 at 07:40:57PM +0200, Andrew Jones wrote:
> > > > > On Sat, May 06, 2017 at 08:27:15PM +0200, Christoffer Dall wrote:
> > > > > > On Wed, May 03, 2017 at 06:06:32PM +0200, Andrew Jones wrote:
> > > > > > 
> > > > > > nit: can you make the subject of this patch a bit more specific?
> > > > > > 
> > > > > > For example:  Optimize checking power_off flag in KVM_RUN
> > > > > 
> > > > > OK
> > > > > 
> > > > > > 
> > > > > > > We can make a small optimization by not checking the state of
> > > > > > > the power_off field on each run. This is done by treating
> > > > > > > power_off like pause, only checking it when we get the EXIT
> > > > > > > VCPU request. When a VCPU powers off another VCPU the EXIT
> > > > > > > request is already made, so we just need to make sure the
> > > > > > > request is also made on self power off. kvm_vcpu_kick() isn't
> > > > > > > necessary for these cases, as the VCPU would just be kicking
> > > > > > > itself, but we add it anyway as a self kick doesn't cost much,
> > > > > > > and it makes the code more future-proof.
> > > > > > > 
> > > > > > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > > > > > ---
> > > > > > >  arch/arm/kvm/arm.c  | 16 ++++++++++------
> > > > > > >  arch/arm/kvm/psci.c |  2 ++
> > > > > > >  2 files changed, 12 insertions(+), 6 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > > > > > > index 26d9d4d72853..24bbc7671d89 100644
> > > > > > > --- a/arch/arm/kvm/arm.c
> > > > > > > +++ b/arch/arm/kvm/arm.c
> > > > > > > @@ -371,6 +371,13 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
> > > > > > >  	kvm_timer_vcpu_put(vcpu);
> > > > > > >  }
> > > > > > >  
> > > > > > > +static void vcpu_power_off(struct kvm_vcpu *vcpu)
> > > > > > > +{
> > > > > > > +	vcpu->arch.power_off = true;
> > > > > > > +	kvm_make_request(KVM_REQ_VCPU_EXIT, vcpu);
> > > > > > > +	kvm_vcpu_kick(vcpu);
> > > > > > > +}
> > > > > > > +
> > > > > > >  int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
> > > > > > >  				    struct kvm_mp_state *mp_state)
> > > > > > >  {
> > > > > > > @@ -390,7 +397,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
> > > > > > >  		vcpu->arch.power_off = false;
> > > > > > >  		break;
> > > > > > >  	case KVM_MP_STATE_STOPPED:
> > > > > > > -		vcpu->arch.power_off = true;
> > > > > > > +		vcpu_power_off(vcpu);
> > > > > > >  		break;
> > > > > > >  	default:
> > > > > > >  		return -EINVAL;
> > > > > > > @@ -626,14 +633,11 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > > > > > >  
> > > > > > >  		if (kvm_request_pending(vcpu)) {
> > > > > > >  			if (kvm_check_request(KVM_REQ_VCPU_EXIT, vcpu)) {
> > > > > > > -				if (vcpu->arch.pause)
> > > > > > > +				if (vcpu->arch.power_off || vcpu->arch.pause)
> > > > > > >  					vcpu_sleep(vcpu);
> > > > > > >  			}
> > > > > > >  		}
> > > > > > >  
> > > > > > > -		if (vcpu->arch.power_off)
> > > > > > > -			vcpu_sleep(vcpu);
> > > > > > > -
> > > > > > 
> > > > > > Hmmm, even though I just gave a reviewed-by on the pause side, I'm not
> > > > > > realizing that I don't think this works.  Because you're now only
> > > > > > checking requests in the vcpu loop, but the vcpu_sleep() function is
> > > > > > implemented using swait_event_interruptible(), which can wake up if you
> > > > > > have a pending signal for example, and then the loop can wrap around and
> > > > > > you can run the VCPU even though you should be paused.  Am I missing
> > > > > > something?
> > > > > 
> > > > > Hmm, I think I missed something. I missed that swait_event_interruptible()
> > > > > doesn't check its condition again when awoken by a signal (which, as far
> > > > > as I can tell, is the only other way we can stop vcpu_sleep() while
> > > > > power_off and/or pause are true.  Had I noticed that, I could have
> > > > > addressed it in one of two ways:
> > > > > 
> > > > >  1) Leave power_off and pause in the condition that stops guest entry.
> > > > >     Easy to see we'll never enter guest mode with one or both set.
> > > > >     
> > > > >  2) Add a comment somewhere to explain the subtle dependency vcpu_sleep()
> > > > >     has on the pending signal check done after its call and before the
> > > > >     condition that stops guest entry is run. (IOW, I don't think we have
> > > > >     a bug with this series, but we do have a non-commented subtlety.)
> > > > > 
> > > > 
> > > > But, then it can return to userspace and enter the kernel again, at
> > > > which time there will be no pending signal and no pending VCPU requests,
> > > > so the VCPU will enter the guest, but the pause flag can still be true
> > > > and it shouldn't enter the guest.  So I think there is a bug.
> > > 
> > > Ah, indeed.
> > > 
> > > > 
> > > > And I think the only nice way to solve it is to not clear the request
> > > > until the CPU is really not paused any more.
> > > 
> > > This would sort of circle back to the original approach of using the
> > > request bit as the state, but I've already convinced myself that that's
> > > too much abuse of VCPU requests to want to do it. (1) above would also
> > > work and also allow VCPU requests to be used as designed.
> > > 
> > > To tidy up the repeated 'vcpu->arch.power_off || vcpu->arch.pause'
> > > condition I think I'll just introduce a vcpu_should_sleep() to encapsulate
> > > it.
> > > 
> > 
> > Fair enough, but could we keep these two booleans as flags in a single
> > unsigned long on the vcpu struct then, so that we can do a single
> > check on them and call out to handle_run_flags or whatever, analogous to
> > how we handle requests?
> 
> Could do that.
> 
> > 
> > The other way to do it would be to set the request on the VCPU itself
> > when returning from the sleep function if pause is still set...
> 
> I like this suggestion more. I'll do that for v4.
> 
pause or power_off that is.

Cool, I'll look forward to seeing how that looks like.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 05/10] KVM: arm/arm64: don't clear exit request from caller
  2017-05-09 17:17     ` Andrew Jones
@ 2017-05-10  9:55       ` Christoffer Dall
  2017-05-10 10:07         ` Andrew Jones
  0 siblings, 1 reply; 43+ messages in thread
From: Christoffer Dall @ 2017-05-10  9:55 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Tue, May 09, 2017 at 07:17:06PM +0200, Andrew Jones wrote:
> On Sat, May 06, 2017 at 08:12:56PM +0200, Christoffer Dall wrote:
> > On Wed, May 03, 2017 at 06:06:30PM +0200, Andrew Jones wrote:
> > > VCPU requests that the receiver should handle should only be cleared
> > > by the receiver. 
> > 
> > I cannot parse this sentence.
> 
> I'll try again:
> 
> VCPU requests should only be cleared by the receiving VCPUs.  The only
> exception is when a request is set as a side-effect.  In these cases
> the "requester" threads may clear the requests when it is sure the
> receiving VCPUs do not need to see them.
> 

I can parse this, and I mostly understand this, except for the part
about side-effects.

> > 
> > > Not only does this properly implement the protocol,
> > > but also avoids bugs where one VCPU clears another VCPU's request,
> > > before the receiving VCPU has had a chance to see it.
> > 
> > Is this an actual race we have currently or just something thay may
> > happen later.  Im' not sure.
> 
> Since ARM is just learning to handle VCPU requests, then it's not a bug
> now.  Actually, I think I should state this protocol (what I wrote above)
> in the document, and then I can just reference that here in this commit
> message as the justification for change.

That might solve the missing piece for me above, yes.

> > 
> > > ARM VCPUs
> > > currently only handle one request, EXIT, and handling it is achieved
> > > by checking pause to see if the VCPU should sleep.
> > 
> > This makes sense.  So forget my comment on the previous patch about
> > getting rid of the pause flag.
> 
> Forgotten
> 
> > 
> > > 
> > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > ---
> > >  arch/arm/kvm/arm.c | 10 ++++++++--
> > >  1 file changed, 8 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > > index 9174ed13135a..7be0d9b0c63a 100644
> > > --- a/arch/arm/kvm/arm.c
> > > +++ b/arch/arm/kvm/arm.c
> > > @@ -553,7 +553,6 @@ void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu)
> > >  {
> > >  	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
> > >  
> > > -	kvm_clear_request(KVM_REQ_VCPU_EXIT, vcpu);
> > >  	vcpu->arch.pause = false;
> > >  	swake_up(wq);
> > >  }
> > > @@ -625,7 +624,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > >  
> > >  		update_vttbr(vcpu->kvm);
> > >  
> > > -		if (vcpu->arch.power_off || vcpu->arch.pause)
> > > +		if (kvm_request_pending(vcpu)) {
> > > +			if (kvm_check_request(KVM_REQ_VCPU_EXIT, vcpu)) {
> > > +				if (vcpu->arch.pause)
> > > +					vcpu_sleep(vcpu);
> > > +			}
> > 
> > Can we factor out this bit to a separate function,
> > kvm_handle_vcpu_requests() or something like that?
> 
> Later patches make this look a bit better, but a function to bundle all
> the request handling up sounds good too. Will do.
> 

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 04/10] KVM: arm/arm64: use vcpu request in kvm_arm_halt_vcpu
  2017-05-09 17:02     ` Andrew Jones
@ 2017-05-10  9:59       ` Christoffer Dall
  2017-05-15 11:14       ` Christoffer Dall
  1 sibling, 0 replies; 43+ messages in thread
From: Christoffer Dall @ 2017-05-10  9:59 UTC (permalink / raw)
  To: Andrew Jones; +Cc: marc.zyngier, pbonzini, kvmarm, kvm

On Tue, May 09, 2017 at 07:02:51PM +0200, Andrew Jones wrote:
> On Sat, May 06, 2017 at 08:08:09PM +0200, Christoffer Dall wrote:
> > On Wed, May 03, 2017 at 06:06:29PM +0200, Andrew Jones wrote:
> > > VCPU halting/resuming is partially implemented with VCPU requests.
> > > When kvm_arm_halt_guest() is called all VCPUs get the EXIT request,
> > > telling them to exit guest mode and look at the state of 'pause',
> > > which will be true, telling them to sleep.  As ARM's VCPU RUN
> > > implements the memory barrier pattern described in "Ensuring Requests
> > > Are Seen" of Documentation/virtual/kvm/vcpu-requests.rst, there's
> > > no way for a VCPU halted by kvm_arm_halt_guest() to miss the pause
> > > state change.  However, before this patch, a single VCPU halted with
> > > kvm_arm_halt_vcpu() did not get a request, opening a tiny race window.
> > > This patch adds the request, closing the race window and also allowing
> > > us to remove the final check of pause in VCPU RUN, as the final check
> > > for requests is sufficient.
> > > 
> > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > 
> > > ---
> > > 
> > > I have two questions about the halting/resuming.
> > > 
> > > Question 1:
> > > 
> > > Do we even need kvm_arm_halt_vcpu()/kvm_arm_resume_vcpu()? It should
> > > only be necessary if one VCPU can activate or inactivate the private
> > > IRQs of another VCPU, right?  That doesn't seem like something that
> > > should be possible, but I'm GIC-illiterate...
> > 
> > True, it shouldn't be possible.  I wonder if we were thinking of
> > userspace access to the CPU-specific data, but we already ensure that no
> > VCPUs are running at that time, so I don't think it should be necessary.
> > 
> > > 
> > > Question 2:
> > > 
> > > It's not clear to me if we have another problem with halting/resuming
> > > or not.  If it's possible for VCPU1 and VCPU2 to race in
> > > vgic_mmio_write_s/cactive(), then the following scenario could occur,
> > > leading to VCPU3 being in guest mode when it should not be.  Does the
> > > hardware prohibit more than one VCPU entering trap handlers that lead
> > > to these functions at the same time?  If not, then I guess pause needs
> > > to be a counter instead of a boolean.
> > > 
> > >  VCPU1                 VCPU2                  VCPU3
> > >  -----                 -----                  -----
> > >                        VCPU3->pause = true;
> > >                        halt(VCPU3);
> > >                                               if (pause)
> > >                                                 sleep();
> > >  VCPU3->pause = true;
> > >  halt(VCPU3);
> > >                        VCPU3->pause = false;
> > >                        resume(VCPU3);
> > >                                               ...wake up...
> > >                                               if (!pause)
> > >                                                 Enter guest mode. Bad!
> > >  VCPU3->pause = false;
> > >  resume(VCPU3);
> > > 
> > > (Yes, the "Bad!" is there to both identify something we don't want
> > >  occurring and to make fun of Trump's tweeting style.)
> > 
> > I think it's bad, and it might be even worse, because it could lead to a
> > CPU looping forever in the host kernel, since there's no guarantee to
> > exit from the VM in the other VCPU thread.
> > 
> > But I think simply taking the kvm->lock mutex to serialize the mmio
> > active change operations should be sufficient.
> > 
> > If we agree on this I can send a patch with your reported by that fixes
> > that issue, which gets rid of kvm_arm_halt_vcpu and requires you to
> > modify your first patch to clear the KVM_REQ_VCPU_EXIT flag for each
> > vcpu in kvm_arm_halt_guest instead and you can fold the remaining change
> > from this patch into a patch that completely gets rid of the pause flag.
> 
> Yup, seems reasonable to me to lock the kvm mutex on a stop the guest type
> action.
> 
> > 
> > See untested patch draft at the end of this mail.
> > 
> > Thanks,
> > -Christoffer
> > 
> > > ---
> > >  arch/arm/kvm/arm.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > > index 47f6c7fdca96..9174ed13135a 100644
> > > --- a/arch/arm/kvm/arm.c
> > > +++ b/arch/arm/kvm/arm.c
> > > @@ -545,6 +545,7 @@ void kvm_arm_halt_guest(struct kvm *kvm)
> > >  void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu)
> > >  {
> > >  	vcpu->arch.pause = true;
> > > +	kvm_make_request(KVM_REQ_VCPU_EXIT, vcpu);
> > >  	kvm_vcpu_kick(vcpu);
> > >  }
> > >  
> > > @@ -664,7 +665,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > >  
> > >  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
> > >  		    kvm_request_pending(vcpu) ||
> > > -		    vcpu->arch.power_off || vcpu->arch.pause) {
> > > +		    vcpu->arch.power_off) {
> > >  			vcpu->mode = OUTSIDE_GUEST_MODE;
> > >  			local_irq_enable();
> > >  			kvm_pmu_sync_hwstate(vcpu);
> > > -- 
> > > 2.9.3
> > > 
> > 
> > 
> > Untested draft patch:
> > 
> > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > index d488b88..b77a3af 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -234,8 +234,6 @@ struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
> >  struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
> >  void kvm_arm_halt_guest(struct kvm *kvm);
> >  void kvm_arm_resume_guest(struct kvm *kvm);
> > -void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu);
> > -void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu);
> >  
> >  int kvm_arm_copy_coproc_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
> >  unsigned long kvm_arm_num_coproc_regs(struct kvm_vcpu *vcpu);
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 578df18..7a38d5a 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -334,8 +334,6 @@ struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
> >  struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void);
> >  void kvm_arm_halt_guest(struct kvm *kvm);
> >  void kvm_arm_resume_guest(struct kvm *kvm);
> > -void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu);
> > -void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu);
> >  
> >  u64 __kvm_call_hyp(void *hypfn, ...);
> >  #define kvm_call_hyp(f, ...) __kvm_call_hyp(kvm_ksym_ref(f), ##__VA_ARGS__)
> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index 7941699..932788a 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -542,27 +542,15 @@ void kvm_arm_halt_guest(struct kvm *kvm)
> >  	kvm_make_all_cpus_request(kvm, KVM_REQ_VCPU_EXIT);
> >  }
> >  
> > -void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu)
> > -{
> > -	vcpu->arch.pause = true;
> > -	kvm_vcpu_kick(vcpu);
> > -}
> > -
> > -void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu)
> > -{
> > -	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
> > -
> > -	vcpu->arch.pause = false;
> > -	swake_up(wq);
> > -}
> > -
> >  void kvm_arm_resume_guest(struct kvm *kvm)
> >  {
> >  	int i;
> >  	struct kvm_vcpu *vcpu;
> >  
> > -	kvm_for_each_vcpu(i, vcpu, kvm)
> > -		kvm_arm_resume_vcpu(vcpu);
> > +	kvm_for_each_vcpu(i, vcpu, kvm) {
> > +		vcpu->arch.pause = false;
> > +		swake_up(kvm_arch_vcpu_wq(vcpu));
> > +	}
> >  }
> >  
> >  static void vcpu_sleep(struct kvm_vcpu *vcpu)
> > diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
> > index 2a5db13..c143add 100644
> > --- a/virt/kvm/arm/vgic/vgic-mmio.c
> > +++ b/virt/kvm/arm/vgic/vgic-mmio.c
> > @@ -231,23 +231,21 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
> >   * be migrated while we don't hold the IRQ locks and we don't want to be
> >   * chasing moving targets.
> >   *
> > - * For private interrupts, we only have to make sure the single and only VCPU
> > - * that can potentially queue the IRQ is stopped.
> > + * For private interrupts we don't have to do anything because userspace
> > + * accesses to the VGIC state already require all VCPUs to be stopped, and
> > + * only the VCPU itself can modify its private interrupts active state, which
> > + * guarantees that the VCPU is not running.
> >   */
> >  static void vgic_change_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
> >  {
> > -	if (intid < VGIC_NR_PRIVATE_IRQS)
> > -		kvm_arm_halt_vcpu(vcpu);
> > -	else
> > +	if (intid > VGIC_NR_PRIVATE_IRQS)
> >  		kvm_arm_halt_guest(vcpu->kvm);
> >  }
> >  
> >  /* See vgic_change_active_prepare */
> >  static void vgic_change_active_finish(struct kvm_vcpu *vcpu, u32 intid)
> >  {
> > -	if (intid < VGIC_NR_PRIVATE_IRQS)
> > -		kvm_arm_resume_vcpu(vcpu);
> > -	else
> > +	if (intid > VGIC_NR_PRIVATE_IRQS)
> >  		kvm_arm_resume_guest(vcpu->kvm);
> >  }
> >  
> > @@ -258,6 +256,7 @@ void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
> >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
> >  	int i;
> >  
> > +	mutex_lock(&vcpu->kvm->lock);
> >  	vgic_change_active_prepare(vcpu, intid);
> >  	for_each_set_bit(i, &val, len * 8) {
> >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> > @@ -265,6 +264,7 @@ void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  	vgic_change_active_finish(vcpu, intid);
> > +	mutex_unlock(&vcpu->kvm->lock);
> >  }
> >  
> >  void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
> > @@ -274,6 +274,7 @@ void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
> >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
> >  	int i;
> >  
> > +	mutex_lock(&vcpu->kvm->lock);
> >  	vgic_change_active_prepare(vcpu, intid);
> >  	for_each_set_bit(i, &val, len * 8) {
> >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> > @@ -281,6 +282,7 @@ void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  	vgic_change_active_finish(vcpu, intid);
> > +	mutex_unlock(&vcpu->kvm->lock);
> >  }
> >  
> >  unsigned long vgic_mmio_read_priority(struct kvm_vcpu *vcpu,
> 
> Looks good to me. How about adding kvm->lock to the locking order comment
> at the top of virt/kvm/arm/vgic/vgic.c too. With that, you can add my R-b
> on the posting.

That's a good point.  That covers the case of the ITS save/restore as
well.

> 
> I'll rebase this series on your posting.
> 

Will send out shortly.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 05/10] KVM: arm/arm64: don't clear exit request from caller
  2017-05-10  9:55       ` Christoffer Dall
@ 2017-05-10 10:07         ` Andrew Jones
  2017-05-10 12:19           ` Christoffer Dall
  0 siblings, 1 reply; 43+ messages in thread
From: Andrew Jones @ 2017-05-10 10:07 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Wed, May 10, 2017 at 11:55:11AM +0200, Christoffer Dall wrote:
> On Tue, May 09, 2017 at 07:17:06PM +0200, Andrew Jones wrote:
> > On Sat, May 06, 2017 at 08:12:56PM +0200, Christoffer Dall wrote:
> > > On Wed, May 03, 2017 at 06:06:30PM +0200, Andrew Jones wrote:
> > > > VCPU requests that the receiver should handle should only be cleared
> > > > by the receiver. 
> > > 
> > > I cannot parse this sentence.
> > 
> > I'll try again:
> > 
> > VCPU requests should only be cleared by the receiving VCPUs.  The only
> > exception is when a request is set as a side-effect.  In these cases
> > the "requester" threads may clear the requests when it is sure the
> > receiving VCPUs do not need to see them.
> > 
> 
> I can parse this, and I mostly understand this, except for the part
> about side-effects.

E.g. kvm_vcpu_block(). This case isn't perfect, because the requester is
also the receiver, but the protocol applies to self-requests too, so it
still counts. Here KVM_REQ_UNHALT may be set as a side-effect of the call,
but on exit from the call, the caller may be sure that the receiver
(itself) doesn't care about the request, and thus can just clear it.

Thanks,
drew

> 
> > > 
> > > > Not only does this properly implement the protocol,
> > > > but also avoids bugs where one VCPU clears another VCPU's request,
> > > > before the receiving VCPU has had a chance to see it.
> > > 
> > > Is this an actual race we have currently or just something thay may
> > > happen later.  Im' not sure.
> > 
> > Since ARM is just learning to handle VCPU requests, then it's not a bug
> > now.  Actually, I think I should state this protocol (what I wrote above)
> > in the document, and then I can just reference that here in this commit
> > message as the justification for change.
> 
> That might solve the missing piece for me above, yes.
> 
> > > 
> > > > ARM VCPUs
> > > > currently only handle one request, EXIT, and handling it is achieved
> > > > by checking pause to see if the VCPU should sleep.
> > > 
> > > This makes sense.  So forget my comment on the previous patch about
> > > getting rid of the pause flag.
> > 
> > Forgotten
> > 
> > > 
> > > > 
> > > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > > ---
> > > >  arch/arm/kvm/arm.c | 10 ++++++++--
> > > >  1 file changed, 8 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > > > index 9174ed13135a..7be0d9b0c63a 100644
> > > > --- a/arch/arm/kvm/arm.c
> > > > +++ b/arch/arm/kvm/arm.c
> > > > @@ -553,7 +553,6 @@ void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu)
> > > >  {
> > > >  	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
> > > >  
> > > > -	kvm_clear_request(KVM_REQ_VCPU_EXIT, vcpu);
> > > >  	vcpu->arch.pause = false;
> > > >  	swake_up(wq);
> > > >  }
> > > > @@ -625,7 +624,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > > >  
> > > >  		update_vttbr(vcpu->kvm);
> > > >  
> > > > -		if (vcpu->arch.power_off || vcpu->arch.pause)
> > > > +		if (kvm_request_pending(vcpu)) {
> > > > +			if (kvm_check_request(KVM_REQ_VCPU_EXIT, vcpu)) {
> > > > +				if (vcpu->arch.pause)
> > > > +					vcpu_sleep(vcpu);
> > > > +			}
> > > 
> > > Can we factor out this bit to a separate function,
> > > kvm_handle_vcpu_requests() or something like that?
> > 
> > Later patches make this look a bit better, but a function to bundle all
> > the request handling up sounds good too. Will do.
> > 
> 
> Thanks,
> -Christoffer

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 05/10] KVM: arm/arm64: don't clear exit request from caller
  2017-05-10 10:07         ` Andrew Jones
@ 2017-05-10 12:19           ` Christoffer Dall
  0 siblings, 0 replies; 43+ messages in thread
From: Christoffer Dall @ 2017-05-10 12:19 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Wed, May 10, 2017 at 12:07:31PM +0200, Andrew Jones wrote:
> On Wed, May 10, 2017 at 11:55:11AM +0200, Christoffer Dall wrote:
> > On Tue, May 09, 2017 at 07:17:06PM +0200, Andrew Jones wrote:
> > > On Sat, May 06, 2017 at 08:12:56PM +0200, Christoffer Dall wrote:
> > > > On Wed, May 03, 2017 at 06:06:30PM +0200, Andrew Jones wrote:
> > > > > VCPU requests that the receiver should handle should only be cleared
> > > > > by the receiver. 
> > > > 
> > > > I cannot parse this sentence.
> > > 
> > > I'll try again:
> > > 
> > > VCPU requests should only be cleared by the receiving VCPUs.  The only
> > > exception is when a request is set as a side-effect.  In these cases
> > > the "requester" threads may clear the requests when it is sure the
> > > receiving VCPUs do not need to see them.
> > > 
> > 
> > I can parse this, and I mostly understand this, except for the part
> > about side-effects.
> 
> E.g. kvm_vcpu_block(). This case isn't perfect, because the requester is
> also the receiver, but the protocol applies to self-requests too, so it
> still counts. Here KVM_REQ_UNHALT may be set as a side-effect of the call,
> but on exit from the call, the caller may be sure that the receiver
> (itself) doesn't care about the request, and thus can just clear it.
> 

I see.  You could mention this as an example if you like.

Thanks,
-Christoffer

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 04/10] KVM: arm/arm64: use vcpu request in kvm_arm_halt_vcpu
  2017-05-09 17:02     ` Andrew Jones
  2017-05-10  9:59       ` Christoffer Dall
@ 2017-05-15 11:14       ` Christoffer Dall
  2017-05-16  2:17         ` Andrew Jones
  1 sibling, 1 reply; 43+ messages in thread
From: Christoffer Dall @ 2017-05-15 11:14 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Tue, May 09, 2017 at 07:02:51PM +0200, Andrew Jones wrote:
> On Sat, May 06, 2017 at 08:08:09PM +0200, Christoffer Dall wrote:
> > On Wed, May 03, 2017 at 06:06:29PM +0200, Andrew Jones wrote:
> > > VCPU halting/resuming is partially implemented with VCPU requests.
> > > When kvm_arm_halt_guest() is called all VCPUs get the EXIT request,
> > > telling them to exit guest mode and look at the state of 'pause',
> > > which will be true, telling them to sleep.  As ARM's VCPU RUN
> > > implements the memory barrier pattern described in "Ensuring Requests
> > > Are Seen" of Documentation/virtual/kvm/vcpu-requests.rst, there's
> > > no way for a VCPU halted by kvm_arm_halt_guest() to miss the pause
> > > state change.  However, before this patch, a single VCPU halted with
> > > kvm_arm_halt_vcpu() did not get a request, opening a tiny race window.
> > > This patch adds the request, closing the race window and also allowing
> > > us to remove the final check of pause in VCPU RUN, as the final check
> > > for requests is sufficient.
> > > 
> > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > 
> > > ---
> > > 
> > > I have two questions about the halting/resuming.
> > > 
> > > Question 1:
> > > 
> > > Do we even need kvm_arm_halt_vcpu()/kvm_arm_resume_vcpu()? It should
> > > only be necessary if one VCPU can activate or inactivate the private
> > > IRQs of another VCPU, right?  That doesn't seem like something that
> > > should be possible, but I'm GIC-illiterate...
> > 
> > True, it shouldn't be possible.  I wonder if we were thinking of
> > userspace access to the CPU-specific data, but we already ensure that no
> > VCPUs are running at that time, so I don't think it should be necessary.
> > 
> > > 
> > > Question 2:
> > > 
> > > It's not clear to me if we have another problem with halting/resuming
> > > or not.  If it's possible for VCPU1 and VCPU2 to race in
> > > vgic_mmio_write_s/cactive(), then the following scenario could occur,
> > > leading to VCPU3 being in guest mode when it should not be.  Does the
> > > hardware prohibit more than one VCPU entering trap handlers that lead
> > > to these functions at the same time?  If not, then I guess pause needs
> > > to be a counter instead of a boolean.
> > > 
> > >  VCPU1                 VCPU2                  VCPU3
> > >  -----                 -----                  -----
> > >                        VCPU3->pause = true;
> > >                        halt(VCPU3);
> > >                                               if (pause)
> > >                                                 sleep();
> > >  VCPU3->pause = true;
> > >  halt(VCPU3);
> > >                        VCPU3->pause = false;
> > >                        resume(VCPU3);
> > >                                               ...wake up...
> > >                                               if (!pause)
> > >                                                 Enter guest mode. Bad!
> > >  VCPU3->pause = false;
> > >  resume(VCPU3);
> > > 
> > > (Yes, the "Bad!" is there to both identify something we don't want
> > >  occurring and to make fun of Trump's tweeting style.)
> > 
> > I think it's bad, and it might be even worse, because it could lead to a
> > CPU looping forever in the host kernel, since there's no guarantee to
> > exit from the VM in the other VCPU thread.
> > 
> > But I think simply taking the kvm->lock mutex to serialize the mmio
> > active change operations should be sufficient.
> > 
> > If we agree on this I can send a patch with your reported by that fixes
> > that issue, which gets rid of kvm_arm_halt_vcpu and requires you to
> > modify your first patch to clear the KVM_REQ_VCPU_EXIT flag for each
> > vcpu in kvm_arm_halt_guest instead and you can fold the remaining change
> > from this patch into a patch that completely gets rid of the pause flag.
> 
> Yup, seems reasonable to me to lock the kvm mutex on a stop the guest type
> action.
> 
> > 
> > See untested patch draft at the end of this mail.
> > 
> > Thanks,
> > -Christoffer
> > 
> > > ---
> > >  arch/arm/kvm/arm.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > > index 47f6c7fdca96..9174ed13135a 100644
> > > --- a/arch/arm/kvm/arm.c
> > > +++ b/arch/arm/kvm/arm.c
> > > @@ -545,6 +545,7 @@ void kvm_arm_halt_guest(struct kvm *kvm)
> > >  void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu)
> > >  {
> > >  	vcpu->arch.pause = true;
> > > +	kvm_make_request(KVM_REQ_VCPU_EXIT, vcpu);
> > >  	kvm_vcpu_kick(vcpu);
> > >  }
> > >  
> > > @@ -664,7 +665,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > >  
> > >  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
> > >  		    kvm_request_pending(vcpu) ||
> > > -		    vcpu->arch.power_off || vcpu->arch.pause) {
> > > +		    vcpu->arch.power_off) {
> > >  			vcpu->mode = OUTSIDE_GUEST_MODE;
> > >  			local_irq_enable();
> > >  			kvm_pmu_sync_hwstate(vcpu);
> > > -- 
> > > 2.9.3
> > > 
> > 
> > 
> > Untested draft patch:
> > 
> > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > index d488b88..b77a3af 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -234,8 +234,6 @@ struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
> >  struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
> >  void kvm_arm_halt_guest(struct kvm *kvm);
> >  void kvm_arm_resume_guest(struct kvm *kvm);
> > -void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu);
> > -void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu);
> >  
> >  int kvm_arm_copy_coproc_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
> >  unsigned long kvm_arm_num_coproc_regs(struct kvm_vcpu *vcpu);
> > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > index 578df18..7a38d5a 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -334,8 +334,6 @@ struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
> >  struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void);
> >  void kvm_arm_halt_guest(struct kvm *kvm);
> >  void kvm_arm_resume_guest(struct kvm *kvm);
> > -void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu);
> > -void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu);
> >  
> >  u64 __kvm_call_hyp(void *hypfn, ...);
> >  #define kvm_call_hyp(f, ...) __kvm_call_hyp(kvm_ksym_ref(f), ##__VA_ARGS__)
> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index 7941699..932788a 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -542,27 +542,15 @@ void kvm_arm_halt_guest(struct kvm *kvm)
> >  	kvm_make_all_cpus_request(kvm, KVM_REQ_VCPU_EXIT);
> >  }
> >  
> > -void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu)
> > -{
> > -	vcpu->arch.pause = true;
> > -	kvm_vcpu_kick(vcpu);
> > -}
> > -
> > -void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu)
> > -{
> > -	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
> > -
> > -	vcpu->arch.pause = false;
> > -	swake_up(wq);
> > -}
> > -
> >  void kvm_arm_resume_guest(struct kvm *kvm)
> >  {
> >  	int i;
> >  	struct kvm_vcpu *vcpu;
> >  
> > -	kvm_for_each_vcpu(i, vcpu, kvm)
> > -		kvm_arm_resume_vcpu(vcpu);
> > +	kvm_for_each_vcpu(i, vcpu, kvm) {
> > +		vcpu->arch.pause = false;
> > +		swake_up(kvm_arch_vcpu_wq(vcpu));
> > +	}
> >  }
> >  
> >  static void vcpu_sleep(struct kvm_vcpu *vcpu)
> > diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
> > index 2a5db13..c143add 100644
> > --- a/virt/kvm/arm/vgic/vgic-mmio.c
> > +++ b/virt/kvm/arm/vgic/vgic-mmio.c
> > @@ -231,23 +231,21 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
> >   * be migrated while we don't hold the IRQ locks and we don't want to be
> >   * chasing moving targets.
> >   *
> > - * For private interrupts, we only have to make sure the single and only VCPU
> > - * that can potentially queue the IRQ is stopped.
> > + * For private interrupts we don't have to do anything because userspace
> > + * accesses to the VGIC state already require all VCPUs to be stopped, and
> > + * only the VCPU itself can modify its private interrupts active state, which
> > + * guarantees that the VCPU is not running.
> >   */
> >  static void vgic_change_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
> >  {
> > -	if (intid < VGIC_NR_PRIVATE_IRQS)
> > -		kvm_arm_halt_vcpu(vcpu);
> > -	else
> > +	if (intid > VGIC_NR_PRIVATE_IRQS)
> >  		kvm_arm_halt_guest(vcpu->kvm);
> >  }
> >  
> >  /* See vgic_change_active_prepare */
> >  static void vgic_change_active_finish(struct kvm_vcpu *vcpu, u32 intid)
> >  {
> > -	if (intid < VGIC_NR_PRIVATE_IRQS)
> > -		kvm_arm_resume_vcpu(vcpu);
> > -	else
> > +	if (intid > VGIC_NR_PRIVATE_IRQS)
> >  		kvm_arm_resume_guest(vcpu->kvm);
> >  }
> >  
> > @@ -258,6 +256,7 @@ void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
> >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
> >  	int i;
> >  
> > +	mutex_lock(&vcpu->kvm->lock);
> >  	vgic_change_active_prepare(vcpu, intid);
> >  	for_each_set_bit(i, &val, len * 8) {
> >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> > @@ -265,6 +264,7 @@ void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  	vgic_change_active_finish(vcpu, intid);
> > +	mutex_unlock(&vcpu->kvm->lock);
> >  }
> >  
> >  void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
> > @@ -274,6 +274,7 @@ void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
> >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
> >  	int i;
> >  
> > +	mutex_lock(&vcpu->kvm->lock);
> >  	vgic_change_active_prepare(vcpu, intid);
> >  	for_each_set_bit(i, &val, len * 8) {
> >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> > @@ -281,6 +282,7 @@ void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
> >  		vgic_put_irq(vcpu->kvm, irq);
> >  	}
> >  	vgic_change_active_finish(vcpu, intid);
> > +	mutex_unlock(&vcpu->kvm->lock);
> >  }
> >  
> >  unsigned long vgic_mmio_read_priority(struct kvm_vcpu *vcpu,
> 
> Looks good to me. How about adding kvm->lock to the locking order comment
> at the top of virt/kvm/arm/vgic/vgic.c too. With that, you can add my R-b
> on the posting.
> 
> I'll rebase this series on your posting.
> 

FYI, this patch is now in kvmarm/queue.

-Christoffer

> Thanks,
> drew

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 04/10] KVM: arm/arm64: use vcpu request in kvm_arm_halt_vcpu
  2017-05-15 11:14       ` Christoffer Dall
@ 2017-05-16  2:17         ` Andrew Jones
  2017-05-16 10:06           ` Christoffer Dall
  0 siblings, 1 reply; 43+ messages in thread
From: Andrew Jones @ 2017-05-16  2:17 UTC (permalink / raw)
  To: Christoffer Dall; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Mon, May 15, 2017 at 01:14:42PM +0200, Christoffer Dall wrote:
> On Tue, May 09, 2017 at 07:02:51PM +0200, Andrew Jones wrote:
> > On Sat, May 06, 2017 at 08:08:09PM +0200, Christoffer Dall wrote:
> > > On Wed, May 03, 2017 at 06:06:29PM +0200, Andrew Jones wrote:
> > > > VCPU halting/resuming is partially implemented with VCPU requests.
> > > > When kvm_arm_halt_guest() is called all VCPUs get the EXIT request,
> > > > telling them to exit guest mode and look at the state of 'pause',
> > > > which will be true, telling them to sleep.  As ARM's VCPU RUN
> > > > implements the memory barrier pattern described in "Ensuring Requests
> > > > Are Seen" of Documentation/virtual/kvm/vcpu-requests.rst, there's
> > > > no way for a VCPU halted by kvm_arm_halt_guest() to miss the pause
> > > > state change.  However, before this patch, a single VCPU halted with
> > > > kvm_arm_halt_vcpu() did not get a request, opening a tiny race window.
> > > > This patch adds the request, closing the race window and also allowing
> > > > us to remove the final check of pause in VCPU RUN, as the final check
> > > > for requests is sufficient.
> > > > 
> > > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > > 
> > > > ---
> > > > 
> > > > I have two questions about the halting/resuming.
> > > > 
> > > > Question 1:
> > > > 
> > > > Do we even need kvm_arm_halt_vcpu()/kvm_arm_resume_vcpu()? It should
> > > > only be necessary if one VCPU can activate or inactivate the private
> > > > IRQs of another VCPU, right?  That doesn't seem like something that
> > > > should be possible, but I'm GIC-illiterate...
> > > 
> > > True, it shouldn't be possible.  I wonder if we were thinking of
> > > userspace access to the CPU-specific data, but we already ensure that no
> > > VCPUs are running at that time, so I don't think it should be necessary.
> > > 
> > > > 
> > > > Question 2:
> > > > 
> > > > It's not clear to me if we have another problem with halting/resuming
> > > > or not.  If it's possible for VCPU1 and VCPU2 to race in
> > > > vgic_mmio_write_s/cactive(), then the following scenario could occur,
> > > > leading to VCPU3 being in guest mode when it should not be.  Does the
> > > > hardware prohibit more than one VCPU entering trap handlers that lead
> > > > to these functions at the same time?  If not, then I guess pause needs
> > > > to be a counter instead of a boolean.
> > > > 
> > > >  VCPU1                 VCPU2                  VCPU3
> > > >  -----                 -----                  -----
> > > >                        VCPU3->pause = true;
> > > >                        halt(VCPU3);
> > > >                                               if (pause)
> > > >                                                 sleep();
> > > >  VCPU3->pause = true;
> > > >  halt(VCPU3);
> > > >                        VCPU3->pause = false;
> > > >                        resume(VCPU3);
> > > >                                               ...wake up...
> > > >                                               if (!pause)
> > > >                                                 Enter guest mode. Bad!
> > > >  VCPU3->pause = false;
> > > >  resume(VCPU3);
> > > > 
> > > > (Yes, the "Bad!" is there to both identify something we don't want
> > > >  occurring and to make fun of Trump's tweeting style.)
> > > 
> > > I think it's bad, and it might be even worse, because it could lead to a
> > > CPU looping forever in the host kernel, since there's no guarantee to
> > > exit from the VM in the other VCPU thread.
> > > 
> > > But I think simply taking the kvm->lock mutex to serialize the mmio
> > > active change operations should be sufficient.
> > > 
> > > If we agree on this I can send a patch with your reported by that fixes
> > > that issue, which gets rid of kvm_arm_halt_vcpu and requires you to
> > > modify your first patch to clear the KVM_REQ_VCPU_EXIT flag for each
> > > vcpu in kvm_arm_halt_guest instead and you can fold the remaining change
> > > from this patch into a patch that completely gets rid of the pause flag.
> > 
> > Yup, seems reasonable to me to lock the kvm mutex on a stop the guest type
> > action.
> > 
> > > 
> > > See untested patch draft at the end of this mail.
> > > 
> > > Thanks,
> > > -Christoffer
> > > 
> > > > ---
> > > >  arch/arm/kvm/arm.c | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > > > index 47f6c7fdca96..9174ed13135a 100644
> > > > --- a/arch/arm/kvm/arm.c
> > > > +++ b/arch/arm/kvm/arm.c
> > > > @@ -545,6 +545,7 @@ void kvm_arm_halt_guest(struct kvm *kvm)
> > > >  void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu)
> > > >  {
> > > >  	vcpu->arch.pause = true;
> > > > +	kvm_make_request(KVM_REQ_VCPU_EXIT, vcpu);
> > > >  	kvm_vcpu_kick(vcpu);
> > > >  }
> > > >  
> > > > @@ -664,7 +665,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > > >  
> > > >  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
> > > >  		    kvm_request_pending(vcpu) ||
> > > > -		    vcpu->arch.power_off || vcpu->arch.pause) {
> > > > +		    vcpu->arch.power_off) {
> > > >  			vcpu->mode = OUTSIDE_GUEST_MODE;
> > > >  			local_irq_enable();
> > > >  			kvm_pmu_sync_hwstate(vcpu);
> > > > -- 
> > > > 2.9.3
> > > > 
> > > 
> > > 
> > > Untested draft patch:
> > > 
> > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > > index d488b88..b77a3af 100644
> > > --- a/arch/arm/include/asm/kvm_host.h
> > > +++ b/arch/arm/include/asm/kvm_host.h
> > > @@ -234,8 +234,6 @@ struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
> > >  struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
> > >  void kvm_arm_halt_guest(struct kvm *kvm);
> > >  void kvm_arm_resume_guest(struct kvm *kvm);
> > > -void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu);
> > > -void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu);
> > >  
> > >  int kvm_arm_copy_coproc_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
> > >  unsigned long kvm_arm_num_coproc_regs(struct kvm_vcpu *vcpu);
> > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > index 578df18..7a38d5a 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -334,8 +334,6 @@ struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
> > >  struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void);
> > >  void kvm_arm_halt_guest(struct kvm *kvm);
> > >  void kvm_arm_resume_guest(struct kvm *kvm);
> > > -void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu);
> > > -void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu);
> > >  
> > >  u64 __kvm_call_hyp(void *hypfn, ...);
> > >  #define kvm_call_hyp(f, ...) __kvm_call_hyp(kvm_ksym_ref(f), ##__VA_ARGS__)
> > > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > > index 7941699..932788a 100644
> > > --- a/virt/kvm/arm/arm.c
> > > +++ b/virt/kvm/arm/arm.c
> > > @@ -542,27 +542,15 @@ void kvm_arm_halt_guest(struct kvm *kvm)
> > >  	kvm_make_all_cpus_request(kvm, KVM_REQ_VCPU_EXIT);
> > >  }
> > >  
> > > -void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu)
> > > -{
> > > -	vcpu->arch.pause = true;
> > > -	kvm_vcpu_kick(vcpu);
> > > -}
> > > -
> > > -void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu)
> > > -{
> > > -	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
> > > -
> > > -	vcpu->arch.pause = false;
> > > -	swake_up(wq);
> > > -}
> > > -
> > >  void kvm_arm_resume_guest(struct kvm *kvm)
> > >  {
> > >  	int i;
> > >  	struct kvm_vcpu *vcpu;
> > >  
> > > -	kvm_for_each_vcpu(i, vcpu, kvm)
> > > -		kvm_arm_resume_vcpu(vcpu);
> > > +	kvm_for_each_vcpu(i, vcpu, kvm) {
> > > +		vcpu->arch.pause = false;
> > > +		swake_up(kvm_arch_vcpu_wq(vcpu));
> > > +	}
> > >  }
> > >  
> > >  static void vcpu_sleep(struct kvm_vcpu *vcpu)
> > > diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
> > > index 2a5db13..c143add 100644
> > > --- a/virt/kvm/arm/vgic/vgic-mmio.c
> > > +++ b/virt/kvm/arm/vgic/vgic-mmio.c
> > > @@ -231,23 +231,21 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
> > >   * be migrated while we don't hold the IRQ locks and we don't want to be
> > >   * chasing moving targets.
> > >   *
> > > - * For private interrupts, we only have to make sure the single and only VCPU
> > > - * that can potentially queue the IRQ is stopped.
> > > + * For private interrupts we don't have to do anything because userspace
> > > + * accesses to the VGIC state already require all VCPUs to be stopped, and
> > > + * only the VCPU itself can modify its private interrupts active state, which
> > > + * guarantees that the VCPU is not running.
> > >   */
> > >  static void vgic_change_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
> > >  {
> > > -	if (intid < VGIC_NR_PRIVATE_IRQS)
> > > -		kvm_arm_halt_vcpu(vcpu);
> > > -	else
> > > +	if (intid > VGIC_NR_PRIVATE_IRQS)
> > >  		kvm_arm_halt_guest(vcpu->kvm);
> > >  }
> > >  
> > >  /* See vgic_change_active_prepare */
> > >  static void vgic_change_active_finish(struct kvm_vcpu *vcpu, u32 intid)
> > >  {
> > > -	if (intid < VGIC_NR_PRIVATE_IRQS)
> > > -		kvm_arm_resume_vcpu(vcpu);
> > > -	else
> > > +	if (intid > VGIC_NR_PRIVATE_IRQS)
> > >  		kvm_arm_resume_guest(vcpu->kvm);
> > >  }
> > >  
> > > @@ -258,6 +256,7 @@ void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
> > >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
> > >  	int i;
> > >  
> > > +	mutex_lock(&vcpu->kvm->lock);
> > >  	vgic_change_active_prepare(vcpu, intid);
> > >  	for_each_set_bit(i, &val, len * 8) {
> > >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> > > @@ -265,6 +264,7 @@ void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
> > >  		vgic_put_irq(vcpu->kvm, irq);
> > >  	}
> > >  	vgic_change_active_finish(vcpu, intid);
> > > +	mutex_unlock(&vcpu->kvm->lock);
> > >  }
> > >  
> > >  void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
> > > @@ -274,6 +274,7 @@ void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
> > >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
> > >  	int i;
> > >  
> > > +	mutex_lock(&vcpu->kvm->lock);
> > >  	vgic_change_active_prepare(vcpu, intid);
> > >  	for_each_set_bit(i, &val, len * 8) {
> > >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> > > @@ -281,6 +282,7 @@ void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
> > >  		vgic_put_irq(vcpu->kvm, irq);
> > >  	}
> > >  	vgic_change_active_finish(vcpu, intid);
> > > +	mutex_unlock(&vcpu->kvm->lock);
> > >  }
> > >  
> > >  unsigned long vgic_mmio_read_priority(struct kvm_vcpu *vcpu,
> > 
> > Looks good to me. How about adding kvm->lock to the locking order comment
> > at the top of virt/kvm/arm/vgic/vgic.c too. With that, you can add my R-b
> > on the posting.
> > 
> > I'll rebase this series on your posting.
> > 
> 
> FYI, this patch is now in kvmarm/queue.

Just rebased the vcpu request series on it and tested. Bad news. This
patch immediately hangs the guest for me. Letting it sit for a couple
minutes gets the logs

[  243.691000] INFO: task qemu-kvm:1710 blocked for more than 120 seconds.
[  243.697591]       Not tainted 4.12.0-rc1+ #3
[  243.701860] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.709653] qemu-kvm        D    0  1710      1 0x00000200
[  243.715132] Call trace:
[  243.717575] [<ffff0000080857d8>] __switch_to+0x64/0x70
[  243.722707] [<ffff0000087b1dd4>] __schedule+0x31c/0x854
[  243.727909] [<ffff0000087b2340>] schedule+0x34/0x8c
[  243.732778] [<ffff0000087b26f8>] schedule_preempt_disabled+0x14/0x1c
[  243.739105] [<ffff0000087b3730>] __mutex_lock.isra.8+0x170/0x49c
[  243.745098] [<ffff0000087b3a80>] __mutex_lock_slowpath+0x24/0x30
[  243.751092] [<ffff0000087b3acc>] mutex_lock+0x40/0x4c
[  243.756123] [<ffff0000080baa2c>] vgic_mmio_write_cactive+0x40/0x14c
[  243.762370] [<ffff0000080bb694>] vgic_uaccess+0xd0/0x104
[  243.767662] [<ffff0000080bc100>] vgic_v2_dist_uaccess+0x70/0x94
[  243.773568] [<ffff0000080bdc48>] vgic_v2_attr_regs_access.isra.6+0x108/0x110
[  243.780587] [<ffff0000080bddec>] vgic_v2_set_attr+0xc4/0xd4
[  243.786148] [<ffff0000080a1028>] kvm_device_ioctl_attr+0x7c/0xc8
[  243.792143] [<ffff0000080a10f8>] kvm_device_ioctl+0x84/0xd4
[  243.797692] [<ffff00000829025c>] do_vfs_ioctl+0xcc/0x7b4
[  243.802988] [<ffff0000082909d4>] SyS_ioctl+0x90/0xa4
[  243.807933] [<ffff0000080834a0>] __sys_trace_return+0x0/0x4

I don't have kdump or support for live crash setup right now to see what
else is holding the lock. Unfortunately I don't have time to set it up
either, as I'll be out of the office from now through the rest of the
week.

I'll still post v4 of the vcpu request series now. I've smoke tested
it with the vgic_mmio_write_cactive/sactive locking removed.

Thanks,
drew

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: [PATCH v3 04/10] KVM: arm/arm64: use vcpu request in kvm_arm_halt_vcpu
  2017-05-16  2:17         ` Andrew Jones
@ 2017-05-16 10:06           ` Christoffer Dall
  0 siblings, 0 replies; 43+ messages in thread
From: Christoffer Dall @ 2017-05-16 10:06 UTC (permalink / raw)
  To: Andrew Jones; +Cc: kvmarm, kvm, marc.zyngier, pbonzini, rkrcmar

On Tue, May 16, 2017 at 04:17:33AM +0200, Andrew Jones wrote:
> On Mon, May 15, 2017 at 01:14:42PM +0200, Christoffer Dall wrote:
> > On Tue, May 09, 2017 at 07:02:51PM +0200, Andrew Jones wrote:
> > > On Sat, May 06, 2017 at 08:08:09PM +0200, Christoffer Dall wrote:
> > > > On Wed, May 03, 2017 at 06:06:29PM +0200, Andrew Jones wrote:
> > > > > VCPU halting/resuming is partially implemented with VCPU requests.
> > > > > When kvm_arm_halt_guest() is called all VCPUs get the EXIT request,
> > > > > telling them to exit guest mode and look at the state of 'pause',
> > > > > which will be true, telling them to sleep.  As ARM's VCPU RUN
> > > > > implements the memory barrier pattern described in "Ensuring Requests
> > > > > Are Seen" of Documentation/virtual/kvm/vcpu-requests.rst, there's
> > > > > no way for a VCPU halted by kvm_arm_halt_guest() to miss the pause
> > > > > state change.  However, before this patch, a single VCPU halted with
> > > > > kvm_arm_halt_vcpu() did not get a request, opening a tiny race window.
> > > > > This patch adds the request, closing the race window and also allowing
> > > > > us to remove the final check of pause in VCPU RUN, as the final check
> > > > > for requests is sufficient.
> > > > > 
> > > > > Signed-off-by: Andrew Jones <drjones@redhat.com>
> > > > > 
> > > > > ---
> > > > > 
> > > > > I have two questions about the halting/resuming.
> > > > > 
> > > > > Question 1:
> > > > > 
> > > > > Do we even need kvm_arm_halt_vcpu()/kvm_arm_resume_vcpu()? It should
> > > > > only be necessary if one VCPU can activate or inactivate the private
> > > > > IRQs of another VCPU, right?  That doesn't seem like something that
> > > > > should be possible, but I'm GIC-illiterate...
> > > > 
> > > > True, it shouldn't be possible.  I wonder if we were thinking of
> > > > userspace access to the CPU-specific data, but we already ensure that no
> > > > VCPUs are running at that time, so I don't think it should be necessary.
> > > > 
> > > > > 
> > > > > Question 2:
> > > > > 
> > > > > It's not clear to me if we have another problem with halting/resuming
> > > > > or not.  If it's possible for VCPU1 and VCPU2 to race in
> > > > > vgic_mmio_write_s/cactive(), then the following scenario could occur,
> > > > > leading to VCPU3 being in guest mode when it should not be.  Does the
> > > > > hardware prohibit more than one VCPU entering trap handlers that lead
> > > > > to these functions at the same time?  If not, then I guess pause needs
> > > > > to be a counter instead of a boolean.
> > > > > 
> > > > >  VCPU1                 VCPU2                  VCPU3
> > > > >  -----                 -----                  -----
> > > > >                        VCPU3->pause = true;
> > > > >                        halt(VCPU3);
> > > > >                                               if (pause)
> > > > >                                                 sleep();
> > > > >  VCPU3->pause = true;
> > > > >  halt(VCPU3);
> > > > >                        VCPU3->pause = false;
> > > > >                        resume(VCPU3);
> > > > >                                               ...wake up...
> > > > >                                               if (!pause)
> > > > >                                                 Enter guest mode. Bad!
> > > > >  VCPU3->pause = false;
> > > > >  resume(VCPU3);
> > > > > 
> > > > > (Yes, the "Bad!" is there to both identify something we don't want
> > > > >  occurring and to make fun of Trump's tweeting style.)
> > > > 
> > > > I think it's bad, and it might be even worse, because it could lead to a
> > > > CPU looping forever in the host kernel, since there's no guarantee to
> > > > exit from the VM in the other VCPU thread.
> > > > 
> > > > But I think simply taking the kvm->lock mutex to serialize the mmio
> > > > active change operations should be sufficient.
> > > > 
> > > > If we agree on this I can send a patch with your reported by that fixes
> > > > that issue, which gets rid of kvm_arm_halt_vcpu and requires you to
> > > > modify your first patch to clear the KVM_REQ_VCPU_EXIT flag for each
> > > > vcpu in kvm_arm_halt_guest instead and you can fold the remaining change
> > > > from this patch into a patch that completely gets rid of the pause flag.
> > > 
> > > Yup, seems reasonable to me to lock the kvm mutex on a stop the guest type
> > > action.
> > > 
> > > > 
> > > > See untested patch draft at the end of this mail.
> > > > 
> > > > Thanks,
> > > > -Christoffer
> > > > 
> > > > > ---
> > > > >  arch/arm/kvm/arm.c | 3 ++-
> > > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> > > > > index 47f6c7fdca96..9174ed13135a 100644
> > > > > --- a/arch/arm/kvm/arm.c
> > > > > +++ b/arch/arm/kvm/arm.c
> > > > > @@ -545,6 +545,7 @@ void kvm_arm_halt_guest(struct kvm *kvm)
> > > > >  void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu)
> > > > >  {
> > > > >  	vcpu->arch.pause = true;
> > > > > +	kvm_make_request(KVM_REQ_VCPU_EXIT, vcpu);
> > > > >  	kvm_vcpu_kick(vcpu);
> > > > >  }
> > > > >  
> > > > > @@ -664,7 +665,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
> > > > >  
> > > > >  		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) ||
> > > > >  		    kvm_request_pending(vcpu) ||
> > > > > -		    vcpu->arch.power_off || vcpu->arch.pause) {
> > > > > +		    vcpu->arch.power_off) {
> > > > >  			vcpu->mode = OUTSIDE_GUEST_MODE;
> > > > >  			local_irq_enable();
> > > > >  			kvm_pmu_sync_hwstate(vcpu);
> > > > > -- 
> > > > > 2.9.3
> > > > > 
> > > > 
> > > > 
> > > > Untested draft patch:
> > > > 
> > > > diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> > > > index d488b88..b77a3af 100644
> > > > --- a/arch/arm/include/asm/kvm_host.h
> > > > +++ b/arch/arm/include/asm/kvm_host.h
> > > > @@ -234,8 +234,6 @@ struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
> > > >  struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
> > > >  void kvm_arm_halt_guest(struct kvm *kvm);
> > > >  void kvm_arm_resume_guest(struct kvm *kvm);
> > > > -void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu);
> > > > -void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu);
> > > >  
> > > >  int kvm_arm_copy_coproc_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
> > > >  unsigned long kvm_arm_num_coproc_regs(struct kvm_vcpu *vcpu);
> > > > diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> > > > index 578df18..7a38d5a 100644
> > > > --- a/arch/arm64/include/asm/kvm_host.h
> > > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > > @@ -334,8 +334,6 @@ struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
> > > >  struct kvm_vcpu * __percpu *kvm_get_running_vcpus(void);
> > > >  void kvm_arm_halt_guest(struct kvm *kvm);
> > > >  void kvm_arm_resume_guest(struct kvm *kvm);
> > > > -void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu);
> > > > -void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu);
> > > >  
> > > >  u64 __kvm_call_hyp(void *hypfn, ...);
> > > >  #define kvm_call_hyp(f, ...) __kvm_call_hyp(kvm_ksym_ref(f), ##__VA_ARGS__)
> > > > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > > > index 7941699..932788a 100644
> > > > --- a/virt/kvm/arm/arm.c
> > > > +++ b/virt/kvm/arm/arm.c
> > > > @@ -542,27 +542,15 @@ void kvm_arm_halt_guest(struct kvm *kvm)
> > > >  	kvm_make_all_cpus_request(kvm, KVM_REQ_VCPU_EXIT);
> > > >  }
> > > >  
> > > > -void kvm_arm_halt_vcpu(struct kvm_vcpu *vcpu)
> > > > -{
> > > > -	vcpu->arch.pause = true;
> > > > -	kvm_vcpu_kick(vcpu);
> > > > -}
> > > > -
> > > > -void kvm_arm_resume_vcpu(struct kvm_vcpu *vcpu)
> > > > -{
> > > > -	struct swait_queue_head *wq = kvm_arch_vcpu_wq(vcpu);
> > > > -
> > > > -	vcpu->arch.pause = false;
> > > > -	swake_up(wq);
> > > > -}
> > > > -
> > > >  void kvm_arm_resume_guest(struct kvm *kvm)
> > > >  {
> > > >  	int i;
> > > >  	struct kvm_vcpu *vcpu;
> > > >  
> > > > -	kvm_for_each_vcpu(i, vcpu, kvm)
> > > > -		kvm_arm_resume_vcpu(vcpu);
> > > > +	kvm_for_each_vcpu(i, vcpu, kvm) {
> > > > +		vcpu->arch.pause = false;
> > > > +		swake_up(kvm_arch_vcpu_wq(vcpu));
> > > > +	}
> > > >  }
> > > >  
> > > >  static void vcpu_sleep(struct kvm_vcpu *vcpu)
> > > > diff --git a/virt/kvm/arm/vgic/vgic-mmio.c b/virt/kvm/arm/vgic/vgic-mmio.c
> > > > index 2a5db13..c143add 100644
> > > > --- a/virt/kvm/arm/vgic/vgic-mmio.c
> > > > +++ b/virt/kvm/arm/vgic/vgic-mmio.c
> > > > @@ -231,23 +231,21 @@ static void vgic_mmio_change_active(struct kvm_vcpu *vcpu, struct vgic_irq *irq,
> > > >   * be migrated while we don't hold the IRQ locks and we don't want to be
> > > >   * chasing moving targets.
> > > >   *
> > > > - * For private interrupts, we only have to make sure the single and only VCPU
> > > > - * that can potentially queue the IRQ is stopped.
> > > > + * For private interrupts we don't have to do anything because userspace
> > > > + * accesses to the VGIC state already require all VCPUs to be stopped, and
> > > > + * only the VCPU itself can modify its private interrupts active state, which
> > > > + * guarantees that the VCPU is not running.
> > > >   */
> > > >  static void vgic_change_active_prepare(struct kvm_vcpu *vcpu, u32 intid)
> > > >  {
> > > > -	if (intid < VGIC_NR_PRIVATE_IRQS)
> > > > -		kvm_arm_halt_vcpu(vcpu);
> > > > -	else
> > > > +	if (intid > VGIC_NR_PRIVATE_IRQS)
> > > >  		kvm_arm_halt_guest(vcpu->kvm);
> > > >  }
> > > >  
> > > >  /* See vgic_change_active_prepare */
> > > >  static void vgic_change_active_finish(struct kvm_vcpu *vcpu, u32 intid)
> > > >  {
> > > > -	if (intid < VGIC_NR_PRIVATE_IRQS)
> > > > -		kvm_arm_resume_vcpu(vcpu);
> > > > -	else
> > > > +	if (intid > VGIC_NR_PRIVATE_IRQS)
> > > >  		kvm_arm_resume_guest(vcpu->kvm);
> > > >  }
> > > >  
> > > > @@ -258,6 +256,7 @@ void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
> > > >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
> > > >  	int i;
> > > >  
> > > > +	mutex_lock(&vcpu->kvm->lock);
> > > >  	vgic_change_active_prepare(vcpu, intid);
> > > >  	for_each_set_bit(i, &val, len * 8) {
> > > >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> > > > @@ -265,6 +264,7 @@ void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,
> > > >  		vgic_put_irq(vcpu->kvm, irq);
> > > >  	}
> > > >  	vgic_change_active_finish(vcpu, intid);
> > > > +	mutex_unlock(&vcpu->kvm->lock);
> > > >  }
> > > >  
> > > >  void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
> > > > @@ -274,6 +274,7 @@ void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
> > > >  	u32 intid = VGIC_ADDR_TO_INTID(addr, 1);
> > > >  	int i;
> > > >  
> > > > +	mutex_lock(&vcpu->kvm->lock);
> > > >  	vgic_change_active_prepare(vcpu, intid);
> > > >  	for_each_set_bit(i, &val, len * 8) {
> > > >  		struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i);
> > > > @@ -281,6 +282,7 @@ void vgic_mmio_write_sactive(struct kvm_vcpu *vcpu,
> > > >  		vgic_put_irq(vcpu->kvm, irq);
> > > >  	}
> > > >  	vgic_change_active_finish(vcpu, intid);
> > > > +	mutex_unlock(&vcpu->kvm->lock);
> > > >  }
> > > >  
> > > >  unsigned long vgic_mmio_read_priority(struct kvm_vcpu *vcpu,
> > > 
> > > Looks good to me. How about adding kvm->lock to the locking order comment
> > > at the top of virt/kvm/arm/vgic/vgic.c too. With that, you can add my R-b
> > > on the posting.
> > > 
> > > I'll rebase this series on your posting.
> > > 
> > 
> > FYI, this patch is now in kvmarm/queue.
> 
> Just rebased the vcpu request series on it and tested. Bad news. This
> patch immediately hangs the guest for me. Letting it sit for a couple
> minutes gets the logs
> 
> [  243.691000] INFO: task qemu-kvm:1710 blocked for more than 120 seconds.
> [  243.697591]       Not tainted 4.12.0-rc1+ #3
> [  243.701860] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  243.709653] qemu-kvm        D    0  1710      1 0x00000200
> [  243.715132] Call trace:
> [  243.717575] [<ffff0000080857d8>] __switch_to+0x64/0x70
> [  243.722707] [<ffff0000087b1dd4>] __schedule+0x31c/0x854
> [  243.727909] [<ffff0000087b2340>] schedule+0x34/0x8c
> [  243.732778] [<ffff0000087b26f8>] schedule_preempt_disabled+0x14/0x1c
> [  243.739105] [<ffff0000087b3730>] __mutex_lock.isra.8+0x170/0x49c
> [  243.745098] [<ffff0000087b3a80>] __mutex_lock_slowpath+0x24/0x30
> [  243.751092] [<ffff0000087b3acc>] mutex_lock+0x40/0x4c
> [  243.756123] [<ffff0000080baa2c>] vgic_mmio_write_cactive+0x40/0x14c
> [  243.762370] [<ffff0000080bb694>] vgic_uaccess+0xd0/0x104
> [  243.767662] [<ffff0000080bc100>] vgic_v2_dist_uaccess+0x70/0x94
> [  243.773568] [<ffff0000080bdc48>] vgic_v2_attr_regs_access.isra.6+0x108/0x110
> [  243.780587] [<ffff0000080bddec>] vgic_v2_set_attr+0xc4/0xd4
> [  243.786148] [<ffff0000080a1028>] kvm_device_ioctl_attr+0x7c/0xc8
> [  243.792143] [<ffff0000080a10f8>] kvm_device_ioctl+0x84/0xd4
> [  243.797692] [<ffff00000829025c>] do_vfs_ioctl+0xcc/0x7b4
> [  243.802988] [<ffff0000082909d4>] SyS_ioctl+0x90/0xa4
> [  243.807933] [<ffff0000080834a0>] __sys_trace_return+0x0/0x4
> 
> I don't have kdump or support for live crash setup right now to see what
> else is holding the lock. Unfortunately I don't have time to set it up
> either, as I'll be out of the office from now through the rest of the
> week.
> 
> I'll still post v4 of the vcpu request series now. I've smoke tested
> it with the vgic_mmio_write_cactive/sactive locking removed.
> 
Duh, we were already holding the mutex when coming from the uaccess
patch, and my testing was utterly flawed because I had forgotten that I
had configured my grub boot default for a specific kernel, and thus I
didn't actually test this patch when I thought I did.

I've sent a small v2 series for that patch and removed it from
kvmarm/queue.

Thanks for the bug report.

-Christoffer

^ permalink raw reply	[flat|nested] 43+ messages in thread

end of thread, other threads:[~2017-05-16 10:06 UTC | newest]

Thread overview: 43+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-03 16:06 [PATCH v3 00/10] KVM: arm/arm64: race fixes and vcpu requests Andrew Jones
2017-05-03 16:06 ` [PATCH v3 01/10] KVM: add kvm_request_pending Andrew Jones
2017-05-03 16:06 ` [PATCH v3 02/10] KVM: Add documentation for VCPU requests Andrew Jones
2017-05-04 11:27   ` Paolo Bonzini
2017-05-04 12:06     ` Andrew Jones
2017-05-04 12:51       ` Paolo Bonzini
2017-05-04 13:31         ` Andrew Jones
2017-05-03 16:06 ` [PATCH v3 03/10] KVM: arm/arm64: prepare to use vcpu requests Andrew Jones
2017-05-03 16:06 ` [PATCH v3 04/10] KVM: arm/arm64: use vcpu request in kvm_arm_halt_vcpu Andrew Jones
2017-05-06 18:08   ` Christoffer Dall
2017-05-09 17:02     ` Andrew Jones
2017-05-10  9:59       ` Christoffer Dall
2017-05-15 11:14       ` Christoffer Dall
2017-05-16  2:17         ` Andrew Jones
2017-05-16 10:06           ` Christoffer Dall
2017-05-03 16:06 ` [PATCH v3 05/10] KVM: arm/arm64: don't clear exit request from caller Andrew Jones
2017-05-06 18:12   ` Christoffer Dall
2017-05-09 17:17     ` Andrew Jones
2017-05-10  9:55       ` Christoffer Dall
2017-05-10 10:07         ` Andrew Jones
2017-05-10 12:19           ` Christoffer Dall
2017-05-03 16:06 ` [PATCH v3 06/10] KVM: arm/arm64: use vcpu requests for power_off Andrew Jones
2017-05-06 18:17   ` Christoffer Dall
2017-05-03 16:06 ` [PATCH v3 07/10] KVM: arm/arm64: optimize VCPU RUN Andrew Jones
2017-05-06 18:27   ` Christoffer Dall
2017-05-09 17:40     ` Andrew Jones
2017-05-09 20:13       ` Christoffer Dall
2017-05-10  6:58         ` Andrew Jones
2017-05-10  8:07           ` Christoffer Dall
2017-05-10  8:20             ` Andrew Jones
2017-05-10  9:06               ` Christoffer Dall
2017-05-03 16:06 ` [PATCH v3 08/10] KVM: arm/arm64: change exit request to sleep request Andrew Jones
2017-05-04 11:38   ` Paolo Bonzini
2017-05-04 12:07     ` Andrew Jones
2017-05-03 16:06 ` [PATCH v3 09/10] KVM: arm/arm64: use vcpu requests for irq injection Andrew Jones
2017-05-04 11:47   ` Paolo Bonzini
2017-05-06 18:49     ` Christoffer Dall
2017-05-08  8:48       ` Paolo Bonzini
2017-05-08  8:56         ` Christoffer Dall
2017-05-06 18:51   ` Christoffer Dall
2017-05-09 17:53     ` Andrew Jones
2017-05-03 16:06 ` [PATCH v3 10/10] KVM: arm/arm64: PMU: remove request-less vcpu kick Andrew Jones
2017-05-06 18:55   ` Christoffer Dall

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.