* [PATCH 00/23] Bug fixes and improvements for HV KVM
@ 2015-03-20  9:39 ` Paul Mackerras
  0 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

This is my current patch queue for HV KVM on PPC.  This series is
based on the "queue" branch of the KVM tree, i.e. roughly v4.0-rc3
plus a set of recent KVM changes which don't intersect with the
changes in this series.  On top of that, in my testing I have some
patches which are not KVM-related but are needed to boot and run a
recent upstream kernel successfully:

tick/broadcast-hrtimer : Fix suspicious RCU usage in idle loop
tick/hotplug: Handover time related duties before cpu offline
powerpc/powernv: Check image loaded or not before calling flash
powerpc/powernv: Fixes for hypervisor doorbell handling
powerpc/powernv: Fix return value from power7_nap() et al.
powerpc: Export __spin_yield

These patches have been posted by their authors and are on their way
upstream via various trees.  They are not included in this series.

The first three patches are bug fixes that should go into v4.0 if
possible.  The remainder are intended for the 4.1 merge window.

The patch "powerpc: Export __spin_yield" is a prerequisite for patch
9/23 of this series ("KVM: PPC: Book3S HV: Convert ICS mutex lock to
spin lock").  It is on its way upstream through the linuxppc-dev
mailing list.

The patch "powerpc/powernv: Fixes for hypervisor doorbell handling" is
needed for correct operation with patch 20/23, "KVM: PPC: Book3S HV:
Use msgsnd for signalling threads".  It is also on its way upstream
through the linuxppc-dev list.  I am expecting both of these
prerequisite patches to go into 4.0.

Finally, the last patch in this series converts some of the assembly
code in book3s_hv_rmhandlers.S into C.  I intend to continue this
trend.

Paul.

 Documentation/virtual/kvm/api.txt        |  17 +
 arch/powerpc/include/asm/archrandom.h    |  11 +-
 arch/powerpc/include/asm/kvm_book3s_64.h |  18 ++
 arch/powerpc/include/asm/kvm_host.h      |  45 ++-
 arch/powerpc/include/asm/kvm_ppc.h       |   2 +
 arch/powerpc/include/asm/time.h          |   3 +
 arch/powerpc/kernel/asm-offsets.c        |  19 +-
 arch/powerpc/kernel/time.c               |   6 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 189 +++++++++--
 arch/powerpc/kvm/book3s_hv.c             | 413 +++++++++++++++++-------
 arch/powerpc/kvm/book3s_hv_builtin.c     |  98 +++++-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |  25 +-
 arch/powerpc/kvm/book3s_hv_rm_xics.c     | 239 ++++++++++++--
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 524 +++++++++++++++++++++++--------
 arch/powerpc/kvm/book3s_xics.c           | 114 +++++--
 arch/powerpc/kvm/book3s_xics.h           |  13 +-
 arch/powerpc/kvm/powerpc.c               |   3 +
 arch/powerpc/platforms/powernv/rng.c     |  29 ++
 include/uapi/linux/kvm.h                 |   1 +
 virt/kvm/kvm_main.c                      |   1 +
 20 files changed, 1401 insertions(+), 369 deletions(-)

* [PATCH 01/23] KVM: PPC: Book3S HV: Fix spinlock/mutex ordering issue in kvmppc_set_lpcr()
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20  9:39   ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

Currently, kvmppc_set_lpcr() has a spinlock around the whole function,
and inside that does mutex_lock(&kvm->lock).  It is not permitted to
take a mutex while holding a spinlock, because the mutex_lock might
call schedule().  In addition, this causes lockdep to warn about a
lock ordering issue:

======================================================
[ INFO: possible circular locking dependency detected ]
3.18.0-kvm-04645-gdfea862-dirty #131 Not tainted
-------------------------------------------------------
qemu-system-ppc/8179 is trying to acquire lock:
 (&kvm->lock){+.+.+.}, at: [<d00000000ecc1f54>] .kvmppc_set_lpcr+0xf4/0x1c0 [kvm_hv]

but task is already holding lock:
 (&(&vcore->lock)->rlock){+.+...}, at: [<d00000000ecc1ea0>] .kvmppc_set_lpcr+0x40/0x1c0 [kvm_hv]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #1 (&(&vcore->lock)->rlock){+.+...}:
       [<c000000000b3c120>] .mutex_lock_nested+0x80/0x570
       [<d00000000ecc7a14>] .kvmppc_vcpu_run_hv+0xc4/0xe40 [kvm_hv]
       [<d00000000eb9f5cc>] .kvmppc_vcpu_run+0x2c/0x40 [kvm]
       [<d00000000eb9cb24>] .kvm_arch_vcpu_ioctl_run+0x54/0x160 [kvm]
       [<d00000000eb94478>] .kvm_vcpu_ioctl+0x4a8/0x7b0 [kvm]
       [<c00000000026cbb4>] .do_vfs_ioctl+0x444/0x770
       [<c00000000026cfa4>] .SyS_ioctl+0xc4/0xe0
       [<c000000000009264>] syscall_exit+0x0/0x98

-> #0 (&kvm->lock){+.+.+.}:
       [<c0000000000ff28c>] .lock_acquire+0xcc/0x1a0
       [<c000000000b3c120>] .mutex_lock_nested+0x80/0x570
       [<d00000000ecc1f54>] .kvmppc_set_lpcr+0xf4/0x1c0 [kvm_hv]
       [<d00000000ecc510c>] .kvmppc_set_one_reg_hv+0x4dc/0x990 [kvm_hv]
       [<d00000000eb9f234>] .kvmppc_set_one_reg+0x44/0x330 [kvm]
       [<d00000000eb9c9dc>] .kvm_vcpu_ioctl_set_one_reg+0x5c/0x150 [kvm]
       [<d00000000eb9ced4>] .kvm_arch_vcpu_ioctl+0x214/0x2c0 [kvm]
       [<d00000000eb940b0>] .kvm_vcpu_ioctl+0xe0/0x7b0 [kvm]
       [<c00000000026cbb4>] .do_vfs_ioctl+0x444/0x770
       [<c00000000026cfa4>] .SyS_ioctl+0xc4/0xe0
       [<c000000000009264>] syscall_exit+0x0/0x98

other info that might help us debug this:

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&(&vcore->lock)->rlock);
                               lock(&kvm->lock);
                               lock(&(&vcore->lock)->rlock);
  lock(&kvm->lock);

 *** DEADLOCK ***

2 locks held by qemu-system-ppc/8179:
 #0:  (&vcpu->mutex){+.+.+.}, at: [<d00000000eb93f18>] .vcpu_load+0x28/0x90 [kvm]
 #1:  (&(&vcore->lock)->rlock){+.+...}, at: [<d00000000ecc1ea0>] .kvmppc_set_lpcr+0x40/0x1c0 [kvm_hv]

stack backtrace:
CPU: 4 PID: 8179 Comm: qemu-system-ppc Not tainted 3.18.0-kvm-04645-gdfea862-dirty #131
Call Trace:
[c000001a66c0f310] [c000000000b486ac] .dump_stack+0x88/0xb4 (unreliable)
[c000001a66c0f390] [c0000000000f8bec] .print_circular_bug+0x27c/0x3d0
[c000001a66c0f440] [c0000000000fe9e8] .__lock_acquire+0x2028/0x2190
[c000001a66c0f5d0] [c0000000000ff28c] .lock_acquire+0xcc/0x1a0
[c000001a66c0f6a0] [c000000000b3c120] .mutex_lock_nested+0x80/0x570
[c000001a66c0f7c0] [d00000000ecc1f54] .kvmppc_set_lpcr+0xf4/0x1c0 [kvm_hv]
[c000001a66c0f860] [d00000000ecc510c] .kvmppc_set_one_reg_hv+0x4dc/0x990 [kvm_hv]
[c000001a66c0f8d0] [d00000000eb9f234] .kvmppc_set_one_reg+0x44/0x330 [kvm]
[c000001a66c0f960] [d00000000eb9c9dc] .kvm_vcpu_ioctl_set_one_reg+0x5c/0x150 [kvm]
[c000001a66c0f9f0] [d00000000eb9ced4] .kvm_arch_vcpu_ioctl+0x214/0x2c0 [kvm]
[c000001a66c0faf0] [d00000000eb940b0] .kvm_vcpu_ioctl+0xe0/0x7b0 [kvm]
[c000001a66c0fcb0] [c00000000026cbb4] .do_vfs_ioctl+0x444/0x770
[c000001a66c0fd90] [c00000000026cfa4] .SyS_ioctl+0xc4/0xe0
[c000001a66c0fe30] [c000000000009264] syscall_exit+0x0/0x98

This fixes it by moving the mutex_lock()/mutex_unlock() pair outside
the spin-locked region.
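
Reduced to a minimal sketch (illustrative only, not the actual kernel
code), the rule the patch restores is: take the sleeping lock first,
then the spinlock, matching the kvm->lock -> vcore->lock order used on
the vcpu-run path.

/* Illustrative ordering sketch, not code from this patch */
static void lpcr_update_lock_order(struct kvm *kvm, struct kvmppc_vcore *vc)
{
        mutex_lock(&kvm->lock);         /* may sleep: take it first */
        spin_lock(&vc->lock);           /* then the spinlock */
        /* ... update vc->lpcr and each vcpu's intr_msr ... */
        spin_unlock(&vc->lock);
        mutex_unlock(&kvm->lock);
}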

Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index de4018a..b273193 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -942,20 +942,20 @@ static int kvm_arch_vcpu_ioctl_set_sregs_hv(struct kvm_vcpu *vcpu,
 static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr,
 		bool preserve_top32)
 {
+	struct kvm *kvm = vcpu->kvm;
 	struct kvmppc_vcore *vc = vcpu->arch.vcore;
 	u64 mask;
 
+	mutex_lock(&kvm->lock);
 	spin_lock(&vc->lock);
 	/*
 	 * If ILE (interrupt little-endian) has changed, update the
 	 * MSR_LE bit in the intr_msr for each vcpu in this vcore.
 	 */
 	if ((new_lpcr & LPCR_ILE) != (vc->lpcr & LPCR_ILE)) {
-		struct kvm *kvm = vcpu->kvm;
 		struct kvm_vcpu *vcpu;
 		int i;
 
-		mutex_lock(&kvm->lock);
 		kvm_for_each_vcpu(i, vcpu, kvm) {
 			if (vcpu->arch.vcore != vc)
 				continue;
@@ -964,7 +964,6 @@ static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr,
 			else
 				vcpu->arch.intr_msr &= ~MSR_LE;
 		}
-		mutex_unlock(&kvm->lock);
 	}
 
 	/*
@@ -981,6 +980,7 @@ static void kvmppc_set_lpcr(struct kvm_vcpu *vcpu, u64 new_lpcr,
 		mask &= 0xFFFFFFFF;
 	vc->lpcr = (vc->lpcr & ~mask) | (new_lpcr & mask);
 	spin_unlock(&vc->lock);
+	mutex_unlock(&kvm->lock);
 }
 
 static int kvmppc_get_one_reg_hv(struct kvm_vcpu *vcpu, u64 id,
-- 
2.1.4

* [PATCH 02/23] KVM: PPC: Book3S HV: Endian fix for accessing VPA yield count
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20  9:39   ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

The VPA (virtual processor area) is defined by PAPR and is therefore
big-endian, so we need a be32_to_cpu when reading it in
kvmppc_get_yield_count().  Without this, H_CONFER always fails on a
little-endian host, causing SMP guests to waste time spinning on
spinlocks.
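
For reference, the accessor pattern for PAPR-defined big-endian
structures on a little-endian host, as a hypothetical fragment rather
than code from this patch:

        /* Every VPA field is big-endian, so convert explicitly: */
        u32 yield_count = be32_to_cpu(lppaca->yield_count);     /* load */
        lppaca->yield_count = cpu_to_be32(yield_count);         /* store */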

Cc: stable@vger.kernel.org # v3.19
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index b273193..de74756 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -636,7 +636,7 @@ static int kvmppc_get_yield_count(struct kvm_vcpu *vcpu)
 	spin_lock(&vcpu->arch.vpa_update_lock);
 	lppaca = (struct lppaca *)vcpu->arch.vpa.pinned_addr;
 	if (lppaca)
-		yield_count = lppaca->yield_count;
+		yield_count = be32_to_cpu(lppaca->yield_count);
 	spin_unlock(&vcpu->arch.vpa_update_lock);
 	return yield_count;
 }
-- 
2.1.4


* [PATCH 03/23] KVM: PPC: Book3S HV: Fix instruction emulation
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20  9:39   ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

Commit 4a157d61b48c ("KVM: PPC: Book3S HV: Fix endianness of
instruction obtained from HEIR register") had the side effect that
we no longer reset vcpu->arch.last_inst to -1 on guest exit in
the cases where the instruction is not fetched from the guest.
This means that if instruction emulation turns out to be required
in those cases, the host will emulate the wrong instruction, since
vcpu->arch.last_inst will contain the last instruction that was
emulated.

This fixes it by making sure that vcpu->arch.last_inst is reset
to -1 in those cases.
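
The consumer side of this convention, as an illustrative sketch (the
fetch helper named here is hypothetical, not a real kernel function):

        u32 inst;

        /* last_inst is only trustworthy if the exit path filled it in;
         * the KVM_INST_FETCH_FAILED sentinel forces a fresh fetch. */
        if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
                inst = fetch_inst_from_guest(vcpu);     /* hypothetical */
        else
                inst = vcpu->arch.last_inst;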

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index bb94e6f..6cbf163 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1005,6 +1005,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	/* Save HEIR (HV emulation assist reg) in emul_inst
 	   if this is an HEI (HV emulation interrupt, e40) */
 	li	r3,KVM_INST_FETCH_FAILED
+	stw	r3,VCPU_LAST_INST(r9)
 	cmpwi	r12,BOOK3S_INTERRUPT_H_EMUL_ASSIST
 	bne	11f
 	mfspr	r3,SPRN_HEIR
-- 
2.1.4

* [PATCH 04/23] KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation.
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20  9:39   ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf, Michael Ellerman

From: Michael Ellerman <michael@ellerman.id.au>

Some PowerNV systems include a hardware random-number generator.
This HWRNG is present on POWER7+ and POWER8 chips and is capable of
generating one 64-bit random number every microsecond.  The random
numbers are produced by sampling a set of 64 unstable high-frequency
oscillators and are almost completely entropic.

PAPR defines an H_RANDOM hypercall which guests can use to obtain one
64-bit random sample from the HWRNG.  This adds a real-mode
implementation of the H_RANDOM hypercall.  This hypercall was
implemented in real mode because the latency of reading the HWRNG is
generally small compared to the latency of a guest exit and entry for
all the threads in the same virtual core.

Userspace can detect the presence of the HWRNG and the H_RANDOM
implementation by querying the KVM_CAP_PPC_HWRNG capability.  The
in-kernel H_RANDOM handler is only invoked for a guest's H_RANDOM
hypercall if userspace has first enabled it using the
KVM_CAP_PPC_ENABLE_HCALL capability.
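
By way of illustration only (this is not code from the series), the
userspace side might look roughly like the sketch below, assuming vm_fd
is an open KVM VM file descriptor (the capability can equally be
checked on the system fd) and using H_RANDOM's PAPR hcall number 0x300:

#include <linux/kvm.h>
#include <sys/ioctl.h>

#define H_RANDOM        0x300   /* PAPR hcall number, not in the uapi headers */

/* Probe for the HWRNG-backed handler and enable it for this VM. */
static int enable_kernel_h_random(int vm_fd)
{
        struct kvm_enable_cap cap = { };

        if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_HWRNG) <= 0)
                return -1;              /* no in-kernel H_RANDOM available */

        cap.cap = KVM_CAP_PPC_ENABLE_HCALL;
        cap.args[0] = H_RANDOM;         /* hcall to handle in the kernel */
        cap.args[1] = 1;                /* 1 = enable, 0 = disable */
        return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}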

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 Documentation/virtual/kvm/api.txt       |  17 +++++
 arch/powerpc/include/asm/archrandom.h   |  11 ++-
 arch/powerpc/include/asm/kvm_ppc.h      |   2 +
 arch/powerpc/kvm/book3s_hv_builtin.c    |  15 +++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 115 ++++++++++++++++++++++++++++++++
 arch/powerpc/kvm/powerpc.c              |   3 +
 arch/powerpc/platforms/powernv/rng.c    |  29 ++++++++
 include/uapi/linux/kvm.h                |   1 +
 8 files changed, 191 insertions(+), 2 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index b112efc..ce10b48 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3248,3 +3248,20 @@ All other orders will be handled completely in user space.
 Only privileged operation exceptions will be checked for in the kernel (or even
 in the hardware prior to interception). If this capability is not enabled, the
 old way of handling SIGP orders is used (partially in kernel and user space).
+
+
+8. Other capabilities.
+----------------------
+
+This section lists capabilities that give information about other
+features of the KVM implementation.
+
+8.1 KVM_CAP_PPC_HWRNG
+
+Architectures: ppc
+
+This capability, if KVM_CHECK_EXTENSION indicates that it is
+available, means that the kernel has an implementation of the
+H_RANDOM hypercall backed by a hardware random-number generator.
+If present, the kernel H_RANDOM handler can be enabled for guest use
+with the KVM_CAP_PPC_ENABLE_HCALL capability.
diff --git a/arch/powerpc/include/asm/archrandom.h b/arch/powerpc/include/asm/archrandom.h
index bde5311..0cc6eed 100644
--- a/arch/powerpc/include/asm/archrandom.h
+++ b/arch/powerpc/include/asm/archrandom.h
@@ -30,8 +30,6 @@ static inline int arch_has_random(void)
 	return !!ppc_md.get_random_long;
 }
 
-int powernv_get_random_long(unsigned long *v);
-
 static inline int arch_get_random_seed_long(unsigned long *v)
 {
 	return 0;
@@ -47,4 +45,13 @@ static inline int arch_has_random_seed(void)
 
 #endif /* CONFIG_ARCH_RANDOM */
 
+#ifdef CONFIG_PPC_POWERNV
+int powernv_hwrng_present(void);
+int powernv_get_random_long(unsigned long *v);
+int powernv_get_random_real_mode(unsigned long *v);
+#else
+static inline int powernv_hwrng_present(void) { return 0; }
+static inline int powernv_get_random_real_mode(unsigned long *v) { return 0; }
+#endif
+
 #endif /* _ASM_POWERPC_ARCHRANDOM_H */
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 46bf652..b8475da 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -302,6 +302,8 @@ static inline bool is_kvmppc_hv_enabled(struct kvm *kvm)
 	return kvm->arch.kvm_ops == kvmppc_hv_ops;
 }
 
+extern int kvmppc_hwrng_present(void);
+
 /*
  * Cuts out inst bits with ordering according to spec.
  * That means the leftmost bit is zero. All given bits are included.
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 1f083ff..1954a1c 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -21,6 +21,7 @@
 #include <asm/cputable.h>
 #include <asm/kvm_ppc.h>
 #include <asm/kvm_book3s.h>
+#include <asm/archrandom.h>
 
 #define KVM_CMA_CHUNK_ORDER	18
 
@@ -169,3 +170,17 @@ int kvmppc_hcall_impl_hv_realmode(unsigned long cmd)
 	return 0;
 }
 EXPORT_SYMBOL_GPL(kvmppc_hcall_impl_hv_realmode);
+
+int kvmppc_hwrng_present(void)
+{
+	return powernv_hwrng_present();
+}
+EXPORT_SYMBOL_GPL(kvmppc_hwrng_present);
+
+long kvmppc_h_random(struct kvm_vcpu *vcpu)
+{
+	if (powernv_get_random_real_mode(&vcpu->arch.gpr[4]))
+		return H_SUCCESS;
+
+	return H_HARDWARE;
+}
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 6cbf163..0814ca1 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1839,6 +1839,121 @@ hcall_real_table:
 	.long	0		/* 0x12c */
 	.long	0		/* 0x130 */
 	.long	DOTSYM(kvmppc_h_set_xdabr) - hcall_real_table
+	.long	0		/* 0x138 */
+	.long	0		/* 0x13c */
+	.long	0		/* 0x140 */
+	.long	0		/* 0x144 */
+	.long	0		/* 0x148 */
+	.long	0		/* 0x14c */
+	.long	0		/* 0x150 */
+	.long	0		/* 0x154 */
+	.long	0		/* 0x158 */
+	.long	0		/* 0x15c */
+	.long	0		/* 0x160 */
+	.long	0		/* 0x164 */
+	.long	0		/* 0x168 */
+	.long	0		/* 0x16c */
+	.long	0		/* 0x170 */
+	.long	0		/* 0x174 */
+	.long	0		/* 0x178 */
+	.long	0		/* 0x17c */
+	.long	0		/* 0x180 */
+	.long	0		/* 0x184 */
+	.long	0		/* 0x188 */
+	.long	0		/* 0x18c */
+	.long	0		/* 0x190 */
+	.long	0		/* 0x194 */
+	.long	0		/* 0x198 */
+	.long	0		/* 0x19c */
+	.long	0		/* 0x1a0 */
+	.long	0		/* 0x1a4 */
+	.long	0		/* 0x1a8 */
+	.long	0		/* 0x1ac */
+	.long	0		/* 0x1b0 */
+	.long	0		/* 0x1b4 */
+	.long	0		/* 0x1b8 */
+	.long	0		/* 0x1bc */
+	.long	0		/* 0x1c0 */
+	.long	0		/* 0x1c4 */
+	.long	0		/* 0x1c8 */
+	.long	0		/* 0x1cc */
+	.long	0		/* 0x1d0 */
+	.long	0		/* 0x1d4 */
+	.long	0		/* 0x1d8 */
+	.long	0		/* 0x1dc */
+	.long	0		/* 0x1e0 */
+	.long	0		/* 0x1e4 */
+	.long	0		/* 0x1e8 */
+	.long	0		/* 0x1ec */
+	.long	0		/* 0x1f0 */
+	.long	0		/* 0x1f4 */
+	.long	0		/* 0x1f8 */
+	.long	0		/* 0x1fc */
+	.long	0		/* 0x200 */
+	.long	0		/* 0x204 */
+	.long	0		/* 0x208 */
+	.long	0		/* 0x20c */
+	.long	0		/* 0x210 */
+	.long	0		/* 0x214 */
+	.long	0		/* 0x218 */
+	.long	0		/* 0x21c */
+	.long	0		/* 0x220 */
+	.long	0		/* 0x224 */
+	.long	0		/* 0x228 */
+	.long	0		/* 0x22c */
+	.long	0		/* 0x230 */
+	.long	0		/* 0x234 */
+	.long	0		/* 0x238 */
+	.long	0		/* 0x23c */
+	.long	0		/* 0x240 */
+	.long	0		/* 0x244 */
+	.long	0		/* 0x248 */
+	.long	0		/* 0x24c */
+	.long	0		/* 0x250 */
+	.long	0		/* 0x254 */
+	.long	0		/* 0x258 */
+	.long	0		/* 0x25c */
+	.long	0		/* 0x260 */
+	.long	0		/* 0x264 */
+	.long	0		/* 0x268 */
+	.long	0		/* 0x26c */
+	.long	0		/* 0x270 */
+	.long	0		/* 0x274 */
+	.long	0		/* 0x278 */
+	.long	0		/* 0x27c */
+	.long	0		/* 0x280 */
+	.long	0		/* 0x284 */
+	.long	0		/* 0x288 */
+	.long	0		/* 0x28c */
+	.long	0		/* 0x290 */
+	.long	0		/* 0x294 */
+	.long	0		/* 0x298 */
+	.long	0		/* 0x29c */
+	.long	0		/* 0x2a0 */
+	.long	0		/* 0x2a4 */
+	.long	0		/* 0x2a8 */
+	.long	0		/* 0x2ac */
+	.long	0		/* 0x2b0 */
+	.long	0		/* 0x2b4 */
+	.long	0		/* 0x2b8 */
+	.long	0		/* 0x2bc */
+	.long	0		/* 0x2c0 */
+	.long	0		/* 0x2c4 */
+	.long	0		/* 0x2c8 */
+	.long	0		/* 0x2cc */
+	.long	0		/* 0x2d0 */
+	.long	0		/* 0x2d4 */
+	.long	0		/* 0x2d8 */
+	.long	0		/* 0x2dc */
+	.long	0		/* 0x2e0 */
+	.long	0		/* 0x2e4 */
+	.long	0		/* 0x2e8 */
+	.long	0		/* 0x2ec */
+	.long	0		/* 0x2f0 */
+	.long	0		/* 0x2f4 */
+	.long	0		/* 0x2f8 */
+	.long	0		/* 0x2fc */
+	.long	DOTSYM(kvmppc_h_random) - hcall_real_table
 	.globl	hcall_real_table_end
 hcall_real_table_end:
 
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 27c0fac..c645cf6 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -529,6 +529,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_PPC_RMA:
 		r = 0;
 		break;
+	case KVM_CAP_PPC_HWRNG:
+		r = kvmppc_hwrng_present();
+		break;
 #endif
 	case KVM_CAP_SYNC_MMU:
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
diff --git a/arch/powerpc/platforms/powernv/rng.c b/arch/powerpc/platforms/powernv/rng.c
index 80db439..6eb808f 100644
--- a/arch/powerpc/platforms/powernv/rng.c
+++ b/arch/powerpc/platforms/powernv/rng.c
@@ -24,12 +24,22 @@
 
 struct powernv_rng {
 	void __iomem *regs;
+	void __iomem *regs_real;
 	unsigned long mask;
 };
 
 static DEFINE_PER_CPU(struct powernv_rng *, powernv_rng);
 
 
+int powernv_hwrng_present(void)
+{
+	struct powernv_rng *rng;
+
+	rng = get_cpu_var(powernv_rng);
+	put_cpu_var(rng);
+	return rng != NULL;
+}
+
 static unsigned long rng_whiten(struct powernv_rng *rng, unsigned long val)
 {
 	unsigned long parity;
@@ -46,6 +56,17 @@ static unsigned long rng_whiten(struct powernv_rng *rng, unsigned long val)
 	return val;
 }
 
+int powernv_get_random_real_mode(unsigned long *v)
+{
+	struct powernv_rng *rng;
+
+	rng = raw_cpu_read(powernv_rng);
+
+	*v = rng_whiten(rng, in_rm64(rng->regs_real));
+
+	return 1;
+}
+
 int powernv_get_random_long(unsigned long *v)
 {
 	struct powernv_rng *rng;
@@ -80,12 +101,20 @@ static __init void rng_init_per_cpu(struct powernv_rng *rng,
 static __init int rng_create(struct device_node *dn)
 {
 	struct powernv_rng *rng;
+	struct resource res;
 	unsigned long val;
 
 	rng = kzalloc(sizeof(*rng), GFP_KERNEL);
 	if (!rng)
 		return -ENOMEM;
 
+	if (of_address_to_resource(dn, 0, &res)) {
+		kfree(rng);
+		return -ENXIO;
+	}
+
+	rng->regs_real = (void __iomem *)res.start;
+
 	rng->regs = of_iomap(dn, 0);
 	if (!rng->regs) {
 		kfree(rng);
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 8055706..896ec84 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -760,6 +760,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_ENABLE_HCALL 104
 #define KVM_CAP_CHECK_EXTENSION_VM 105
 #define KVM_CAP_S390_USER_SIGP 106
+#define KVM_CAP_PPC_HWRNG 107
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.1.4

* [PATCH 05/23] KVM: PPC: Book3S HV: Remove RMA-related variables from code
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20  9:39   ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

We don't support real-mode areas now that 970 support is removed.
Remove the remaining details of rma from the code.  Also rename
rma_setup_done to hpte_setup_done to better reflect the changes.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h |  3 +--
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 28 ++++++++++++++--------------
 arch/powerpc/kvm/book3s_hv.c        | 10 +++++-----
 3 files changed, 20 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 8ef0512..015773f 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -228,9 +228,8 @@ struct kvm_arch {
 	int tlbie_lock;
 	unsigned long lpcr;
 	unsigned long rmor;
-	struct kvm_rma_info *rma;
 	unsigned long vrma_slb_v;
-	int rma_setup_done;
+	int hpte_setup_done;
 	u32 hpt_order;
 	atomic_t vcpus_running;
 	u32 online_vcores;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 534acb3..dbf1271 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -116,12 +116,12 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
 	long order;
 
 	mutex_lock(&kvm->lock);
-	if (kvm->arch.rma_setup_done) {
-		kvm->arch.rma_setup_done = 0;
-		/* order rma_setup_done vs. vcpus_running */
+	if (kvm->arch.hpte_setup_done) {
+		kvm->arch.hpte_setup_done = 0;
+		/* order hpte_setup_done vs. vcpus_running */
 		smp_mb();
 		if (atomic_read(&kvm->arch.vcpus_running)) {
-			kvm->arch.rma_setup_done = 1;
+			kvm->arch.hpte_setup_done = 1;
 			goto out;
 		}
 	}
@@ -1339,20 +1339,20 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 	unsigned long tmp[2];
 	ssize_t nb;
 	long int err, ret;
-	int rma_setup;
+	int hpte_setup;
 
 	if (!access_ok(VERIFY_READ, buf, count))
 		return -EFAULT;
 
 	/* lock out vcpus from running while we're doing this */
 	mutex_lock(&kvm->lock);
-	rma_setup = kvm->arch.rma_setup_done;
-	if (rma_setup) {
-		kvm->arch.rma_setup_done = 0;	/* temporarily */
-		/* order rma_setup_done vs. vcpus_running */
+	hpte_setup = kvm->arch.hpte_setup_done;
+	if (hpte_setup) {
+		kvm->arch.hpte_setup_done = 0;	/* temporarily */
+		/* order hpte_setup_done vs. vcpus_running */
 		smp_mb();
 		if (atomic_read(&kvm->arch.vcpus_running)) {
-			kvm->arch.rma_setup_done = 1;
+			kvm->arch.hpte_setup_done = 1;
 			mutex_unlock(&kvm->lock);
 			return -EBUSY;
 		}
@@ -1405,7 +1405,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 				       "r=%lx\n", ret, i, v, r);
 				goto out;
 			}
-			if (!rma_setup && is_vrma_hpte(v)) {
+			if (!hpte_setup && is_vrma_hpte(v)) {
 				unsigned long psize = hpte_base_page_size(v, r);
 				unsigned long senc = slb_pgsize_encoding(psize);
 				unsigned long lpcr;
@@ -1414,7 +1414,7 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 					(VRMA_VSID << SLB_VSID_SHIFT_1T);
 				lpcr = senc << (LPCR_VRMASD_SH - 4);
 				kvmppc_update_lpcr(kvm, lpcr, LPCR_VRMASD);
-				rma_setup = 1;
+				hpte_setup = 1;
 			}
 			++i;
 			hptp += 2;
@@ -1430,9 +1430,9 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,
 	}
 
  out:
-	/* Order HPTE updates vs. rma_setup_done */
+	/* Order HPTE updates vs. hpte_setup_done */
 	smp_wmb();
-	kvm->arch.rma_setup_done = rma_setup;
+	kvm->arch.hpte_setup_done = hpte_setup;
 	mutex_unlock(&kvm->lock);
 
 	if (err)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index de74756..7b7102a 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2032,11 +2032,11 @@ static int kvmppc_vcpu_run_hv(struct kvm_run *run, struct kvm_vcpu *vcpu)
 	}
 
 	atomic_inc(&vcpu->kvm->arch.vcpus_running);
-	/* Order vcpus_running vs. rma_setup_done, see kvmppc_alloc_reset_hpt */
+	/* Order vcpus_running vs. hpte_setup_done, see kvmppc_alloc_reset_hpt */
 	smp_mb();
 
 	/* On the first time here, set up HTAB and VRMA */
-	if (!vcpu->kvm->arch.rma_setup_done) {
+	if (!vcpu->kvm->arch.hpte_setup_done) {
 		r = kvmppc_hv_setup_htab_rma(vcpu);
 		if (r)
 			goto out;
@@ -2238,7 +2238,7 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
 	int srcu_idx;
 
 	mutex_lock(&kvm->lock);
-	if (kvm->arch.rma_setup_done)
+	if (kvm->arch.hpte_setup_done)
 		goto out;	/* another vcpu beat us to it */
 
 	/* Allocate hashed page table (if not done already) and reset it */
@@ -2289,9 +2289,9 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
 
 	kvmppc_update_lpcr(kvm, lpcr, LPCR_VRMASD);
 
-	/* Order updates to kvm->arch.lpcr etc. vs. rma_setup_done */
+	/* Order updates to kvm->arch.lpcr etc. vs. hpte_setup_done */
 	smp_wmb();
-	kvm->arch.rma_setup_done = 1;
+	kvm->arch.hpte_setup_done = 1;
 	err = 0;
  out_srcu:
 	srcu_read_unlock(&kvm->srcu, srcu_idx);
-- 
2.1.4


* [PATCH 06/23] KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20  9:39   ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

This adds helper routines for locking and unlocking HPTEs, and uses
them in the rest of the code.  We don't change any locking rules in
this patch.
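
For illustration, the intended pairing of the new helpers with the
existing try_lock_hpte() (a sketch of the usage pattern, not code taken
from this patch):

        unsigned long v;

        while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
                cpu_relax();
        v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
        /* ... examine or update the HPTE ... */
        unlock_hpte(hptep, v);  /* release barrier, then store without HVLOCK */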

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s_64.h | 14 ++++++++++++++
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 25 ++++++++++---------------
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      | 25 +++++++++----------------
 3 files changed, 33 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 2d81e20..0789a0f 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -85,6 +85,20 @@ static inline long try_lock_hpte(__be64 *hpte, unsigned long bits)
 	return old == 0;
 }
 
+static inline void unlock_hpte(__be64 *hpte, unsigned long hpte_v)
+{
+	hpte_v &= ~HPTE_V_HVLOCK;
+	asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
+	hpte[0] = cpu_to_be64(hpte_v);
+}
+
+/* Without barrier */
+static inline void __unlock_hpte(__be64 *hpte, unsigned long hpte_v)
+{
+	hpte_v &= ~HPTE_V_HVLOCK;
+	hpte[0] = cpu_to_be64(hpte_v);
+}
+
 static inline int __hpte_actual_psize(unsigned int lp, int psize)
 {
 	int i, shift;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index dbf1271..6c6825a 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -338,9 +338,7 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
 	v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
 	gr = kvm->arch.revmap[index].guest_rpte;
 
-	/* Unlock the HPTE */
-	asm volatile("lwsync" : : : "memory");
-	hptep[0] = cpu_to_be64(v);
+	unlock_hpte(hptep, v);
 	preempt_enable();
 
 	gpte->eaddr = eaddr;
@@ -469,8 +467,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	hpte[0] = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
 	hpte[1] = be64_to_cpu(hptep[1]);
 	hpte[2] = r = rev->guest_rpte;
-	asm volatile("lwsync" : : : "memory");
-	hptep[0] = cpu_to_be64(hpte[0]);
+	unlock_hpte(hptep, hpte[0]);
 	preempt_enable();
 
 	if (hpte[0] != vcpu->arch.pgfault_hpte[0] ||
@@ -621,7 +618,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 
 	hptep[1] = cpu_to_be64(r);
 	eieio();
-	hptep[0] = cpu_to_be64(hpte[0]);
+	__unlock_hpte(hptep, hpte[0]);
 	asm volatile("ptesync" : : : "memory");
 	preempt_enable();
 	if (page && hpte_is_writable(r))
@@ -642,7 +639,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	return ret;
 
  out_unlock:
-	hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+	__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	preempt_enable();
 	goto out_put;
 }
@@ -771,7 +768,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			}
 		}
 		unlock_rmap(rmapp);
-		hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+		__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	}
 	return 0;
 }
@@ -857,7 +854,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			}
 			ret = 1;
 		}
-		hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+		__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
@@ -974,8 +971,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 
 		/* Now check and modify the HPTE */
 		if (!(hptep[0] & cpu_to_be64(HPTE_V_VALID))) {
-			/* unlock and continue */
-			hptep[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+			__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
 			continue;
 		}
 
@@ -996,9 +992,9 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 				npages_dirty = n;
 			eieio();
 		}
-		v &= ~(HPTE_V_ABSENT | HPTE_V_HVLOCK);
+		v &= ~HPTE_V_ABSENT;
 		v |= HPTE_V_VALID;
-		hptep[0] = cpu_to_be64(v);
+		__unlock_hpte(hptep, v);
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
@@ -1218,8 +1214,7 @@ static long record_hpte(unsigned long flags, __be64 *hptp,
 			r &= ~HPTE_GR_MODIFIED;
 			revp->guest_rpte = r;
 		}
-		asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
-		hptp[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+		unlock_hpte(hptp, be64_to_cpu(hptp[0]));
 		preempt_enable();
 		if (!(valid == want_valid && (first_pass || dirty)))
 			ok = 0;
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 625407e..f6bf0b1 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -150,12 +150,6 @@ static pte_t lookup_linux_pte_and_update(pgd_t *pgdir, unsigned long hva,
 	return kvmppc_read_update_linux_pte(ptep, writing, hugepage_shift);
 }
 
-static inline void unlock_hpte(__be64 *hpte, unsigned long hpte_v)
-{
-	asm volatile(PPC_RELEASE_BARRIER "" : : : "memory");
-	hpte[0] = cpu_to_be64(hpte_v);
-}
-
 long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 		       long pte_index, unsigned long pteh, unsigned long ptel,
 		       pgd_t *pgdir, bool realmode, unsigned long *pte_idx_ret)
@@ -271,10 +265,10 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 				u64 pte;
 				while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
 					cpu_relax();
-				pte = be64_to_cpu(*hpte);
+				pte = be64_to_cpu(hpte[0]);
 				if (!(pte & (HPTE_V_VALID | HPTE_V_ABSENT)))
 					break;
-				*hpte &= ~cpu_to_be64(HPTE_V_HVLOCK);
+				__unlock_hpte(hpte, pte);
 				hpte += 2;
 			}
 			if (i == 8)
@@ -290,9 +284,9 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 
 			while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
 				cpu_relax();
-			pte = be64_to_cpu(*hpte);
+			pte = be64_to_cpu(hpte[0]);
 			if (pte & (HPTE_V_VALID | HPTE_V_ABSENT)) {
-				*hpte &= ~cpu_to_be64(HPTE_V_HVLOCK);
+				__unlock_hpte(hpte, pte);
 				return H_PTEG_FULL;
 			}
 		}
@@ -331,7 +325,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
 
 	/* Write the first HPTE dword, unlocking the HPTE and making it valid */
 	eieio();
-	hpte[0] = cpu_to_be64(pteh);
+	__unlock_hpte(hpte, pteh);
 	asm volatile("ptesync" : : : "memory");
 
 	*pte_idx_ret = pte_index;
@@ -412,7 +406,7 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
 	if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
 	    ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn) ||
 	    ((flags & H_ANDCOND) && (pte & avpn) != 0)) {
-		hpte[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+		__unlock_hpte(hpte, pte);
 		return H_NOT_FOUND;
 	}
 
@@ -548,7 +542,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
 				be64_to_cpu(hp[0]), be64_to_cpu(hp[1]));
 			rcbits = rev->guest_rpte & (HPTE_R_R|HPTE_R_C);
 			args[j] |= rcbits << (56 - 5);
-			hp[0] = 0;
+			__unlock_hpte(hp, 0);
 		}
 	}
 
@@ -574,7 +568,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
 	pte = be64_to_cpu(hpte[0]);
 	if ((pte & (HPTE_V_ABSENT | HPTE_V_VALID)) == 0 ||
 	    ((flags & H_AVPN) && (pte & ~0x7fUL) != avpn)) {
-		hpte[0] &= ~cpu_to_be64(HPTE_V_HVLOCK);
+		__unlock_hpte(hpte, pte);
 		return H_NOT_FOUND;
 	}
 
@@ -755,8 +749,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
 				/* Return with the HPTE still locked */
 				return (hash << 3) + (i >> 1);
 
-			/* Unlock and move on */
-			hpte[i] = cpu_to_be64(v);
+			__unlock_hpte(&hpte[i], v);
 		}
 
 		if (val & HPTE_V_SECONDARY)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20  9:39   ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf, Bharata B Rao

From: Bharata B Rao <bharata@linux.vnet.ibm.com>

Since KVM isn't equipped to handle closure of a vcpu fd from userspace
(QEMU) correctly, certain workarounds have to be employed to allow reuse
of a vcpu array slot in KVM during cpu hotplug/unplug from the guest.
One such proposed workaround is to park the vcpu fd in userspace during
cpu unplug and reuse it later during the next hotplug.

More details can be found here:
KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html

In order to support this workaround with PowerPC KVM, don't create or
initialize the ICP if the vCPU is found to be already associated with one.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_xics.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index a4a8d9f..ead3a35 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -1313,8 +1313,13 @@ int kvmppc_xics_connect_vcpu(struct kvm_device *dev, struct kvm_vcpu *vcpu,
 		return -EPERM;
 	if (xics->kvm != vcpu->kvm)
 		return -EPERM;
-	if (vcpu->arch.irq_type)
-		return -EBUSY;
+
+	/*
+	 * If irq_type is already set, don't reinitialize but
+	 * return success allowing this vcpu to be reused.
+	 */
+	if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT)
+		return 0;
 
 	r = kvmppc_xics_create_icp(vcpu, xcpu);
 	if (!r)
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 08/23] KVM: PPC: Book3S HV: Add guest->host real mode completion counters
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20  9:39   ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf, Suresh E. Warrier

From: "Suresh E. Warrier" <warrier@linux.vnet.ibm.com>

Add counters to track the number of times we switch from guest real mode
to host virtual mode during an interrupt-related hypercall because the
hypercall requires actions that cannot be completed in real mode. This
will help when making optimizations that reduce guest-host transitions.

It is safe to use an ordinary increment rather than an atomic operation
because there is one ICP per virtual CPU and kvmppc_xics_rm_complete()
only works on the ICP for the current VCPU.
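
As a minimal sketch of that single-writer rule (illustrative only;
note_rm_exit_kick() is a hypothetical helper, not code from this patch):
each counter is bumped only from the owning vCPU's completion path, so a
plain increment cannot race, and the debugfs reader merely sums
possibly-stale values.

	/* Only the vCPU that owns this ICP ever runs its completion path. */
	static void note_rm_exit_kick(struct kvmppc_icp *icp)
	{
		icp->n_rm_kick_vcpu++;	/* single writer, so no atomic op needed */
	}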

The counters are displayed as part of the ICS and ICP state provided by
/sys/kernel/debug/powerpc/kvm* for each VM.

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_xics.c | 31 +++++++++++++++++++++++++++----
 arch/powerpc/kvm/book3s_xics.h |  6 ++++++
 2 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index ead3a35..48f0bda 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -802,14 +802,22 @@ static noinline int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall)
 	XICS_DBG("XICS_RM: H_%x completing, act: %x state: %lx tgt: %p\n",
 		 hcall, icp->rm_action, icp->rm_dbgstate.raw, icp->rm_dbgtgt);
 
-	if (icp->rm_action & XICS_RM_KICK_VCPU)
+	if (icp->rm_action & XICS_RM_KICK_VCPU) {
+		icp->n_rm_kick_vcpu++;
 		kvmppc_fast_vcpu_kick(icp->rm_kick_target);
-	if (icp->rm_action & XICS_RM_CHECK_RESEND)
+	}
+	if (icp->rm_action & XICS_RM_CHECK_RESEND) {
+		icp->n_rm_check_resend++;
 		icp_check_resend(xics, icp->rm_resend_icp);
-	if (icp->rm_action & XICS_RM_REJECT)
+	}
+	if (icp->rm_action & XICS_RM_REJECT) {
+		icp->n_rm_reject++;
 		icp_deliver_irq(xics, icp, icp->rm_reject);
-	if (icp->rm_action & XICS_RM_NOTIFY_EOI)
+	}
+	if (icp->rm_action & XICS_RM_NOTIFY_EOI) {
+		icp->n_rm_notify_eoi++;
 		kvm_notify_acked_irq(vcpu->kvm, 0, icp->rm_eoied_irq);
+	}
 
 	icp->rm_action = 0;
 
@@ -872,10 +880,17 @@ static int xics_debug_show(struct seq_file *m, void *private)
 	struct kvm *kvm = xics->kvm;
 	struct kvm_vcpu *vcpu;
 	int icsid, i;
+	unsigned long t_rm_kick_vcpu, t_rm_check_resend;
+	unsigned long t_rm_reject, t_rm_notify_eoi;
 
 	if (!kvm)
 		return 0;
 
+	t_rm_kick_vcpu = 0;
+	t_rm_notify_eoi = 0;
+	t_rm_check_resend = 0;
+	t_rm_reject = 0;
+
 	seq_printf(m, "=========\nICP state\n=========\n");
 
 	kvm_for_each_vcpu(i, vcpu, kvm) {
@@ -890,8 +905,16 @@ static int xics_debug_show(struct seq_file *m, void *private)
 			   icp->server_num, state.xisr,
 			   state.pending_pri, state.cppr, state.mfrr,
 			   state.out_ee, state.need_resend);
+		t_rm_kick_vcpu += icp->n_rm_kick_vcpu;
+		t_rm_notify_eoi += icp->n_rm_notify_eoi;
+		t_rm_check_resend += icp->n_rm_check_resend;
+		t_rm_reject += icp->n_rm_reject;
 	}
 
+	seq_puts(m, "ICP Guest Real Mode exit totals: ");
+	seq_printf(m, "\tkick_vcpu=%lu check_resend=%lu reject=%lu notify_eoi=%lu\n",
+			t_rm_kick_vcpu, t_rm_check_resend,
+			t_rm_reject, t_rm_notify_eoi);
 	for (icsid = 0; icsid <= KVMPPC_XICS_MAX_ICS_ID; icsid++) {
 		struct kvmppc_ics *ics = xics->ics[icsid];
 
diff --git a/arch/powerpc/kvm/book3s_xics.h b/arch/powerpc/kvm/book3s_xics.h
index 73f0f27..de970ec 100644
--- a/arch/powerpc/kvm/book3s_xics.h
+++ b/arch/powerpc/kvm/book3s_xics.h
@@ -78,6 +78,12 @@ struct kvmppc_icp {
 	u32  rm_reject;
 	u32  rm_eoied_irq;
 
+	/* Counters for each reason we exited real mode */
+	unsigned long n_rm_kick_vcpu;
+	unsigned long n_rm_check_resend;
+	unsigned long n_rm_reject;
+	unsigned long n_rm_notify_eoi;
+
 	/* Debug stuff for real mode */
 	union kvmppc_icp_state rm_dbgstate;
 	struct kvm_vcpu *rm_dbgtgt;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 09/23] KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20  9:39   ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf, Suresh Warrier

From: Suresh Warrier <warrier@linux.vnet.ibm.com>

Replaces the ICS mutex lock with a spin lock since we will be porting
these routines to real mode. Note that we need to disable interrupts
before we take the lock in anticipation of the fact that on the guest
side, we are running in the context of a hard irq and interrupts are
disabled (EE bit off) when the lock is acquired. Again, because we
will be acquiring the lock in hypervisor real mode, we need to use
an arch_spinlock_t instead of a normal spinlock here as we want to
avoid running any lockdep code (which may not be safe to execute in
real mode).
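
A minimal sketch of the resulting locking pattern (illustrative only;
example_update_ics() is hypothetical, the actual conversions are in the
diff below):

	static void example_update_ics(struct kvmppc_ics *ics)
	{
		unsigned long flags;

		/* Interrupts off around the raw arch_spinlock_t: no lockdep
		 * code runs, and this mirrors the real-mode path where EE
		 * is already disabled. */
		local_irq_save(flags);
		arch_spin_lock(&ics->lock);
		/* ... update ics->irq_state[] under the lock ... */
		arch_spin_unlock(&ics->lock);
		local_irq_restore(flags);
	}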

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_xics.c | 68 +++++++++++++++++++++++++++++-------------
 arch/powerpc/kvm/book3s_xics.h |  2 +-
 2 files changed, 48 insertions(+), 22 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 48f0bda..56ed9b4 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -20,6 +20,7 @@
 #include <asm/xics.h>
 #include <asm/debug.h>
 #include <asm/time.h>
+#include <asm/spinlock.h>
 
 #include <linux/debugfs.h>
 #include <linux/seq_file.h>
@@ -39,7 +40,7 @@
  * LOCKING
  * =======
  *
- * Each ICS has a mutex protecting the information about the IRQ
+ * Each ICS has a spin lock protecting the information about the IRQ
  * sources and avoiding simultaneous deliveries if the same interrupt.
  *
  * ICP operations are done via a single compare & swap transaction
@@ -109,7 +110,10 @@ static void ics_check_resend(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
 {
 	int i;
 
-	mutex_lock(&ics->lock);
+	unsigned long flags;
+
+	local_irq_save(flags);
+	arch_spin_lock(&ics->lock);
 
 	for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
 		struct ics_irq_state *state = &ics->irq_state[i];
@@ -120,12 +124,15 @@ static void ics_check_resend(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
 		XICS_DBG("resend %#x prio %#x\n", state->number,
 			      state->priority);
 
-		mutex_unlock(&ics->lock);
+		arch_spin_unlock(&ics->lock);
+		local_irq_restore(flags);
 		icp_deliver_irq(xics, icp, state->number);
-		mutex_lock(&ics->lock);
+		local_irq_save(flags);
+		arch_spin_lock(&ics->lock);
 	}
 
-	mutex_unlock(&ics->lock);
+	arch_spin_unlock(&ics->lock);
+	local_irq_restore(flags);
 }
 
 static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
@@ -133,8 +140,10 @@ static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
 		       u32 server, u32 priority, u32 saved_priority)
 {
 	bool deliver;
+	unsigned long flags;
 
-	mutex_lock(&ics->lock);
+	local_irq_save(flags);
+	arch_spin_lock(&ics->lock);
 
 	state->server = server;
 	state->priority = priority;
@@ -145,7 +154,8 @@ static bool write_xive(struct kvmppc_xics *xics, struct kvmppc_ics *ics,
 		deliver = true;
 	}
 
-	mutex_unlock(&ics->lock);
+	arch_spin_unlock(&ics->lock);
+	local_irq_restore(flags);
 
 	return deliver;
 }
@@ -186,6 +196,7 @@ int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, u32 *priority)
 	struct kvmppc_ics *ics;
 	struct ics_irq_state *state;
 	u16 src;
+	unsigned long flags;
 
 	if (!xics)
 		return -ENODEV;
@@ -195,10 +206,12 @@ int kvmppc_xics_get_xive(struct kvm *kvm, u32 irq, u32 *server, u32 *priority)
 		return -EINVAL;
 	state = &ics->irq_state[src];
 
-	mutex_lock(&ics->lock);
+	local_irq_save(flags);
+	arch_spin_lock(&ics->lock);
 	*server = state->server;
 	*priority = state->priority;
-	mutex_unlock(&ics->lock);
+	arch_spin_unlock(&ics->lock);
+	local_irq_restore(flags);
 
 	return 0;
 }
@@ -365,6 +378,7 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 	struct kvmppc_ics *ics;
 	u32 reject;
 	u16 src;
+	unsigned long flags;
 
 	/*
 	 * This is used both for initial delivery of an interrupt and
@@ -391,7 +405,8 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 	state = &ics->irq_state[src];
 
 	/* Get a lock on the ICS */
-	mutex_lock(&ics->lock);
+	local_irq_save(flags);
+	arch_spin_lock(&ics->lock);
 
 	/* Get our server */
 	if (!icp || state->server != icp->server_num) {
@@ -434,7 +449,7 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 	 *
 	 * Note that if successful, the new delivery might have itself
 	 * rejected an interrupt that was "delivered" before we took the
-	 * icp mutex.
+	 * ics spin lock.
 	 *
 	 * In this case we do the whole sequence all over again for the
 	 * new guy. We cannot assume that the rejected interrupt is less
@@ -448,7 +463,8 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 		 * Delivery was successful, did we reject somebody else ?
 		 */
 		if (reject && reject != XICS_IPI) {
-			mutex_unlock(&ics->lock);
+			arch_spin_unlock(&ics->lock);
+			local_irq_restore(flags);
 			new_irq = reject;
 			goto again;
 		}
@@ -468,12 +484,14 @@ static void icp_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 		 */
 		smp_mb();
 		if (!icp->state.need_resend) {
-			mutex_unlock(&ics->lock);
+			arch_spin_unlock(&ics->lock);
+			local_irq_restore(flags);
 			goto again;
 		}
 	}
  out:
-	mutex_unlock(&ics->lock);
+	arch_spin_unlock(&ics->lock);
+	local_irq_restore(flags);
 }
 
 static void icp_down_cppr(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
@@ -880,6 +898,7 @@ static int xics_debug_show(struct seq_file *m, void *private)
 	struct kvm *kvm = xics->kvm;
 	struct kvm_vcpu *vcpu;
 	int icsid, i;
+	unsigned long flags;
 	unsigned long t_rm_kick_vcpu, t_rm_check_resend;
 	unsigned long t_rm_reject, t_rm_notify_eoi;
 
@@ -924,7 +943,8 @@ static int xics_debug_show(struct seq_file *m, void *private)
 		seq_printf(m, "=========\nICS state for ICS 0x%x\n=========\n",
 			   icsid);
 
-		mutex_lock(&ics->lock);
+		local_irq_save(flags);
+		arch_spin_lock(&ics->lock);
 
 		for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
 			struct ics_irq_state *irq = &ics->irq_state[i];
@@ -935,7 +955,8 @@ static int xics_debug_show(struct seq_file *m, void *private)
 				   irq->resend, irq->masked_pending);
 
 		}
-		mutex_unlock(&ics->lock);
+		arch_spin_unlock(&ics->lock);
+		local_irq_restore(flags);
 	}
 	return 0;
 }
@@ -988,7 +1009,6 @@ static struct kvmppc_ics *kvmppc_xics_create_ics(struct kvm *kvm,
 	if (!ics)
 		goto out;
 
-	mutex_init(&ics->lock);
 	ics->icsid = icsid;
 
 	for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
@@ -1130,13 +1150,15 @@ static int xics_get_source(struct kvmppc_xics *xics, long irq, u64 addr)
 	u64 __user *ubufp = (u64 __user *) addr;
 	u16 idx;
 	u64 val, prio;
+	unsigned long flags;
 
 	ics = kvmppc_xics_find_ics(xics, irq, &idx);
 	if (!ics)
 		return -ENOENT;
 
 	irqp = &ics->irq_state[idx];
-	mutex_lock(&ics->lock);
+	local_irq_save(flags);
+	arch_spin_lock(&ics->lock);
 	ret = -ENOENT;
 	if (irqp->exists) {
 		val = irqp->server;
@@ -1152,7 +1174,8 @@ static int xics_get_source(struct kvmppc_xics *xics, long irq, u64 addr)
 			val |= KVM_XICS_PENDING;
 		ret = 0;
 	}
-	mutex_unlock(&ics->lock);
+	arch_spin_unlock(&ics->lock);
+	local_irq_restore(flags);
 
 	if (!ret && put_user(val, ubufp))
 		ret = -EFAULT;
@@ -1169,6 +1192,7 @@ static int xics_set_source(struct kvmppc_xics *xics, long irq, u64 addr)
 	u64 val;
 	u8 prio;
 	u32 server;
+	unsigned long flags;
 
 	if (irq < KVMPPC_XICS_FIRST_IRQ || irq >= KVMPPC_XICS_NR_IRQS)
 		return -ENOENT;
@@ -1189,7 +1213,8 @@ static int xics_set_source(struct kvmppc_xics *xics, long irq, u64 addr)
 	    kvmppc_xics_find_server(xics->kvm, server) == NULL)
 		return -EINVAL;
 
-	mutex_lock(&ics->lock);
+	local_irq_save(flags);
+	arch_spin_lock(&ics->lock);
 	irqp->server = server;
 	irqp->saved_priority = prio;
 	if (val & KVM_XICS_MASKED)
@@ -1201,7 +1226,8 @@ static int xics_set_source(struct kvmppc_xics *xics, long irq, u64 addr)
 	if ((val & KVM_XICS_PENDING) && (val & KVM_XICS_LEVEL_SENSITIVE))
 		irqp->asserted = 1;
 	irqp->exists = 1;
-	mutex_unlock(&ics->lock);
+	arch_spin_unlock(&ics->lock);
+	local_irq_restore(flags);
 
 	if (val & KVM_XICS_PENDING)
 		icp_deliver_irq(xics, NULL, irqp->number);
diff --git a/arch/powerpc/kvm/book3s_xics.h b/arch/powerpc/kvm/book3s_xics.h
index de970ec..055424c 100644
--- a/arch/powerpc/kvm/book3s_xics.h
+++ b/arch/powerpc/kvm/book3s_xics.h
@@ -90,7 +90,7 @@ struct kvmppc_icp {
 };
 
 struct kvmppc_ics {
-	struct mutex lock;
+	arch_spinlock_t lock;
 	u16 icsid;
 	struct ics_irq_state irq_state[KVMPPC_XICS_IRQ_PER_ICS];
 };
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 10/23] KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20  9:39   ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf, Suresh Warrier

From: Suresh Warrier <warrier@linux.vnet.ibm.com>

Interrupt-based hypercalls return H_TOO_HARD to inform KVM that it needs
to switch to the host to complete the rest of the hypercall function in
virtual mode. This patch ports the virtual mode ICS/ICP reject and resend
functions to be runnable in hypervisor real mode, thus avoiding the need
to switch to the host to execute these functions in virtual mode. However,
the hypercalls continue to return H_TOO_HARD for vcpu_wakeup and notify
events - these events cannot be handled in real mode and will still need
a switch to host virtual mode.
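
As a rough sketch of the resulting structure (condensed from the tail of
the H_IPI/H_CPPR handlers as changed below; variable names as in those
functions), rejects and resends are now finished in real mode and only
work that genuinely needs the host leads to H_TOO_HARD:

	/* Handle reject and resend without leaving real mode. */
	if (reject && reject != XICS_IPI)
		icp_rm_deliver_irq(xics, icp, reject);
	if (resend)
		icp_rm_check_resend(xics, icp);
	/* H_TOO_HARD only if rm_action was set (e.g. vcpu wakeup, irq ack). */
	return check_too_hard(xics, icp);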

There are sufficient differences between the real mode code and the
virtual mode code for the ICS/ICP resend and reject functions that
for now the code has been duplicated instead of sharing common code.
In the future, we can look at creating common functions.

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 225 ++++++++++++++++++++++++++++++++---
 1 file changed, 211 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 7c22997..73bbe92 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -23,12 +23,39 @@
 
 #define DEBUG_PASSUP
 
+static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
+			    u32 new_irq);
+
 static inline void rm_writeb(unsigned long paddr, u8 val)
 {
 	__asm__ __volatile__("sync; stbcix %0,0,%1"
 		: : "r" (val), "r" (paddr) : "memory");
 }
 
+/* -- ICS routines -- */
+static void ics_rm_check_resend(struct kvmppc_xics *xics,
+				struct kvmppc_ics *ics, struct kvmppc_icp *icp)
+{
+	int i;
+
+	arch_spin_lock(&ics->lock);
+
+	for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
+		struct ics_irq_state *state = &ics->irq_state[i];
+
+		if (!state->resend)
+			continue;
+
+		arch_spin_unlock(&ics->lock);
+		icp_rm_deliver_irq(xics, icp, state->number);
+		arch_spin_lock(&ics->lock);
+	}
+
+	arch_spin_unlock(&ics->lock);
+}
+
+/* -- ICP routines -- */
+
 static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
 				struct kvm_vcpu *this_vcpu)
 {
@@ -116,6 +143,178 @@ static inline int check_too_hard(struct kvmppc_xics *xics,
 	return (xics->real_mode_dbg || icp->rm_action) ? H_TOO_HARD : H_SUCCESS;
 }
 
+static void icp_rm_check_resend(struct kvmppc_xics *xics,
+			     struct kvmppc_icp *icp)
+{
+	u32 icsid;
+
+	/* Order this load with the test for need_resend in the caller */
+	smp_rmb();
+	for_each_set_bit(icsid, icp->resend_map, xics->max_icsid + 1) {
+		struct kvmppc_ics *ics = xics->ics[icsid];
+
+		if (!test_and_clear_bit(icsid, icp->resend_map))
+			continue;
+		if (!ics)
+			continue;
+		ics_rm_check_resend(xics, ics, icp);
+	}
+}
+
+static bool icp_rm_try_to_deliver(struct kvmppc_icp *icp, u32 irq, u8 priority,
+			       u32 *reject)
+{
+	union kvmppc_icp_state old_state, new_state;
+	bool success;
+
+	do {
+		old_state = new_state = READ_ONCE(icp->state);
+
+		*reject = 0;
+
+		/* See if we can deliver */
+		success = new_state.cppr > priority &&
+			new_state.mfrr > priority &&
+			new_state.pending_pri > priority;
+
+		/*
+		 * If we can, check for a rejection and perform the
+		 * delivery
+		 */
+		if (success) {
+			*reject = new_state.xisr;
+			new_state.xisr = irq;
+			new_state.pending_pri = priority;
+		} else {
+			/*
+			 * If we failed to deliver we set need_resend
+			 * so a subsequent CPPR state change causes us
+			 * to try a new delivery.
+			 */
+			new_state.need_resend = true;
+		}
+
+	} while (!icp_rm_try_update(icp, old_state, new_state));
+
+	return success;
+}
+
+static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
+			    u32 new_irq)
+{
+	struct ics_irq_state *state;
+	struct kvmppc_ics *ics;
+	u32 reject;
+	u16 src;
+
+	/*
+	 * This is used both for initial delivery of an interrupt and
+	 * for subsequent rejection.
+	 *
+	 * Rejection can be racy vs. resends. We have evaluated the
+	 * rejection in an atomic ICP transaction which is now complete,
+	 * so potentially the ICP can already accept the interrupt again.
+	 *
+	 * So we need to retry the delivery. Essentially the reject path
+	 * boils down to a failed delivery. Always.
+	 *
+	 * Now the interrupt could also have moved to a different target,
+	 * thus we may need to re-do the ICP lookup as well
+	 */
+
+ again:
+	/* Get the ICS state and lock it */
+	ics = kvmppc_xics_find_ics(xics, new_irq, &src);
+	if (!ics) {
+		/* Unsafe increment, but this does not need to be accurate */
+		return;
+	}
+	state = &ics->irq_state[src];
+
+	/* Get a lock on the ICS */
+	arch_spin_lock(&ics->lock);
+
+	/* Get our server */
+	if (!icp || state->server != icp->server_num) {
+		icp = kvmppc_xics_find_server(xics->kvm, state->server);
+		if (!icp) {
+			/* Unsafe increment again*/
+			goto out;
+		}
+	}
+
+	/* Clear the resend bit of that interrupt */
+	state->resend = 0;
+
+	/*
+	 * If masked, bail out
+	 *
+	 * Note: PAPR doesn't mention anything about masked pending
+	 * when doing a resend, only when doing a delivery.
+	 *
+	 * However that would have the effect of losing a masked
+	 * interrupt that was rejected and isn't consistent with
+	 * the whole masked_pending business which is about not
+	 * losing interrupts that occur while masked.
+	 *
+	 * I don't differentiate normal deliveries and resends, this
+	 * implementation will differ from PAPR and not lose such
+	 * interrupts.
+	 */
+	if (state->priority == MASKED) {
+		state->masked_pending = 1;
+		goto out;
+	}
+
+	/*
+	 * Try the delivery, this will set the need_resend flag
+	 * in the ICP as part of the atomic transaction if the
+	 * delivery is not possible.
+	 *
+	 * Note that if successful, the new delivery might have itself
+	 * rejected an interrupt that was "delivered" before we took the
+	 * ics spin lock.
+	 *
+	 * In this case we do the whole sequence all over again for the
+	 * new guy. We cannot assume that the rejected interrupt is less
+	 * favored than the new one, and thus doesn't need to be delivered,
+	 * because by the time we exit icp_rm_try_to_deliver() the target
+	 * processor may well have already consumed & completed it, and thus
+	 * the rejected interrupt might actually be already acceptable.
+	 */
+	if (icp_rm_try_to_deliver(icp, new_irq, state->priority, &reject)) {
+		/*
+		 * Delivery was successful, did we reject somebody else ?
+		 */
+		if (reject && reject != XICS_IPI) {
+			arch_spin_unlock(&ics->lock);
+			new_irq = reject;
+			goto again;
+		}
+	} else {
+		/*
+		 * We failed to deliver the interrupt we need to set the
+		 * resend map bit and mark the ICS state as needing a resend
+		 */
+		set_bit(ics->icsid, icp->resend_map);
+		state->resend = 1;
+
+		/*
+		 * If the need_resend flag got cleared in the ICP some time
+		 * between icp_rm_try_to_deliver() atomic update and now, then
+		 * we know it might have missed the resend_map bit. So we
+		 * retry
+		 */
+		smp_mb();
+		if (!icp->state.need_resend) {
+			arch_spin_unlock(&ics->lock);
+			goto again;
+		}
+	}
+ out:
+	arch_spin_unlock(&ics->lock);
+}
+
 static void icp_rm_down_cppr(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 			     u8 new_cppr)
 {
@@ -184,8 +383,7 @@ static void icp_rm_down_cppr(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 	 * separately here as well.
 	 */
 	if (resend) {
-		icp->rm_action |= XICS_RM_CHECK_RESEND;
-		icp->rm_resend_icp = icp;
+		icp_rm_check_resend(xics, icp);
 	}
 }
 
@@ -300,16 +498,14 @@ int kvmppc_rm_h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
 		}
 	} while (!icp_rm_try_update(icp, old_state, new_state));
 
-	/* Pass rejects to virtual mode */
+	/* Handle reject in real mode */
 	if (reject && reject != XICS_IPI) {
-		this_icp->rm_action |= XICS_RM_REJECT;
-		this_icp->rm_reject = reject;
+		icp_rm_deliver_irq(xics, icp, reject);
 	}
 
-	/* Pass resends to virtual mode */
+	/* Handle resends in real mode */
 	if (resend) {
-		this_icp->rm_action |= XICS_RM_CHECK_RESEND;
-		this_icp->rm_resend_icp = icp;
+		icp_rm_check_resend(xics, icp);
 	}
 
 	return check_too_hard(xics, this_icp);
@@ -365,10 +561,12 @@ int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr)
 
 	} while (!icp_rm_try_update(icp, old_state, new_state));
 
-	/* Pass rejects to virtual mode */
+	/*
+	 * Check for rejects. They are handled by doing a new delivery
+	 * attempt (see comments in icp_rm_deliver_irq).
+	 */
 	if (reject && reject != XICS_IPI) {
-		icp->rm_action |= XICS_RM_REJECT;
-		icp->rm_reject = reject;
+		icp_rm_deliver_irq(xics, icp, reject);
 	}
  bail:
 	return check_too_hard(xics, icp);
@@ -416,10 +614,9 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
 		goto bail;
 	state = &ics->irq_state[src];
 
-	/* Still asserted, resend it, we make it look like a reject */
+	/* Still asserted, resend it */
 	if (state->asserted) {
-		icp->rm_action |= XICS_RM_REJECT;
-		icp->rm_reject = irq;
+		icp_rm_deliver_irq(xics, icp, irq);
 	}
 
 	if (!hlist_empty(&vcpu->kvm->irq_ack_notifier_list)) {
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 10/23] KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode
@ 2015-03-20  9:39   ` Paul Mackerras
  0 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf, Suresh Warrier

From: Suresh Warrier <warrier@linux.vnet.ibm.com>

Interrupt-based hypercalls return H_TOO_HARD to inform KVM that it needs
to switch to the host to complete the rest of hypercall function in
virtual mode. This patch ports the virtual mode ICS/ICP reject and resend
functions to be runnable in hypervisor real mode, thus avoiding the need
to switch to the host to execute these functions in virtual mode. However,
the hypercalls continue to return H_TOO_HARD for vcpu_wakeup and notify
events - these events cannot be done in real mode and they will still need
a switch to host virtual mode.

There are sufficient differences between the real mode code and the
virtual mode code for the ICS/ICP resend and reject functions that
for now the code has been duplicated instead of sharing common code.
In the future, we can look at creating common functions.

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rm_xics.c | 225 ++++++++++++++++++++++++++++++++---
 1 file changed, 211 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 7c22997..73bbe92 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -23,12 +23,39 @@
 
 #define DEBUG_PASSUP
 
+static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
+			    u32 new_irq);
+
 static inline void rm_writeb(unsigned long paddr, u8 val)
 {
 	__asm__ __volatile__("sync; stbcix %0,0,%1"
 		: : "r" (val), "r" (paddr) : "memory");
 }
 
+/* -- ICS routines -- */
+static void ics_rm_check_resend(struct kvmppc_xics *xics,
+				struct kvmppc_ics *ics, struct kvmppc_icp *icp)
+{
+	int i;
+
+	arch_spin_lock(&ics->lock);
+
+	for (i = 0; i < KVMPPC_XICS_IRQ_PER_ICS; i++) {
+		struct ics_irq_state *state = &ics->irq_state[i];
+
+		if (!state->resend)
+			continue;
+
+		arch_spin_unlock(&ics->lock);
+		icp_rm_deliver_irq(xics, icp, state->number);
+		arch_spin_lock(&ics->lock);
+	}
+
+	arch_spin_unlock(&ics->lock);
+}
+
+/* -- ICP routines -- */
+
 static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
 				struct kvm_vcpu *this_vcpu)
 {
@@ -116,6 +143,178 @@ static inline int check_too_hard(struct kvmppc_xics *xics,
 	return (xics->real_mode_dbg || icp->rm_action) ? H_TOO_HARD : H_SUCCESS;
 }
 
+static void icp_rm_check_resend(struct kvmppc_xics *xics,
+			     struct kvmppc_icp *icp)
+{
+	u32 icsid;
+
+	/* Order this load with the test for need_resend in the caller */
+	smp_rmb();
+	for_each_set_bit(icsid, icp->resend_map, xics->max_icsid + 1) {
+		struct kvmppc_ics *ics = xics->ics[icsid];
+
+		if (!test_and_clear_bit(icsid, icp->resend_map))
+			continue;
+		if (!ics)
+			continue;
+		ics_rm_check_resend(xics, ics, icp);
+	}
+}
+
+static bool icp_rm_try_to_deliver(struct kvmppc_icp *icp, u32 irq, u8 priority,
+			       u32 *reject)
+{
+	union kvmppc_icp_state old_state, new_state;
+	bool success;
+
+	do {
+		old_state = new_state = READ_ONCE(icp->state);
+
+		*reject = 0;
+
+		/* See if we can deliver */
+		success = new_state.cppr > priority &&
+			new_state.mfrr > priority &&
+			new_state.pending_pri > priority;
+
+		/*
+		 * If we can, check for a rejection and perform the
+		 * delivery
+		 */
+		if (success) {
+			*reject = new_state.xisr;
+			new_state.xisr = irq;
+			new_state.pending_pri = priority;
+		} else {
+			/*
+			 * If we failed to deliver we set need_resend
+			 * so a subsequent CPPR state change causes us
+			 * to try a new delivery.
+			 */
+			new_state.need_resend = true;
+		}
+
+	} while (!icp_rm_try_update(icp, old_state, new_state));
+
+	return success;
+}
+
+static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
+			    u32 new_irq)
+{
+	struct ics_irq_state *state;
+	struct kvmppc_ics *ics;
+	u32 reject;
+	u16 src;
+
+	/*
+	 * This is used both for initial delivery of an interrupt and
+	 * for subsequent rejection.
+	 *
+	 * Rejection can be racy vs. resends. We have evaluated the
+	 * rejection in an atomic ICP transaction which is now complete,
+	 * so potentially the ICP can already accept the interrupt again.
+	 *
+	 * So we need to retry the delivery. Essentially the reject path
+	 * boils down to a failed delivery. Always.
+	 *
+	 * Now the interrupt could also have moved to a different target,
+	 * thus we may need to re-do the ICP lookup as well
+	 */
+
+ again:
+	/* Get the ICS state and lock it */
+	ics = kvmppc_xics_find_ics(xics, new_irq, &src);
+	if (!ics) {
+		/* Unsafe increment, but this does not need to be accurate */
+		return;
+	}
+	state = &ics->irq_state[src];
+
+	/* Get a lock on the ICS */
+	arch_spin_lock(&ics->lock);
+
+	/* Get our server */
+	if (!icp || state->server != icp->server_num) {
+		icp = kvmppc_xics_find_server(xics->kvm, state->server);
+		if (!icp) {
+			/* Unsafe increment again*/
+			goto out;
+		}
+	}
+
+	/* Clear the resend bit of that interrupt */
+	state->resend = 0;
+
+	/*
+	 * If masked, bail out
+	 *
+	 * Note: PAPR doesn't mention anything about masked pending
+	 * when doing a resend, only when doing a delivery.
+	 *
+	 * However that would have the effect of losing a masked
+	 * interrupt that was rejected and isn't consistent with
+	 * the whole masked_pending business which is about not
+	 * losing interrupts that occur while masked.
+	 *
+	 * I don't differentiate normal deliveries and resends, this
+	 * implementation will differ from PAPR and not lose such
+	 * interrupts.
+	 */
+	if (state->priority == MASKED) {
+		state->masked_pending = 1;
+		goto out;
+	}
+
+	/*
+	 * Try the delivery, this will set the need_resend flag
+	 * in the ICP as part of the atomic transaction if the
+	 * delivery is not possible.
+	 *
+	 * Note that if successful, the new delivery might have itself
+	 * rejected an interrupt that was "delivered" before we took the
+	 * ics spin lock.
+	 *
+	 * In this case we do the whole sequence all over again for the
+	 * new guy. We cannot assume that the rejected interrupt is less
+	 * favored than the new one, and thus doesn't need to be delivered,
+	 * because by the time we exit icp_rm_try_to_deliver() the target
+	 * processor may well have already consumed & completed it, and thus
+	 * the rejected interrupt might actually be already acceptable.
+	 */
+	if (icp_rm_try_to_deliver(icp, new_irq, state->priority, &reject)) {
+		/*
+		 * Delivery was successful, did we reject somebody else ?
+		 */
+		if (reject && reject != XICS_IPI) {
+			arch_spin_unlock(&ics->lock);
+			new_irq = reject;
+			goto again;
+		}
+	} else {
+		/*
+		 * We failed to deliver the interrupt we need to set the
+		 * resend map bit and mark the ICS state as needing a resend
+		 */
+		set_bit(ics->icsid, icp->resend_map);
+		state->resend = 1;
+
+		/*
+		 * If the need_resend flag got cleared in the ICP some time
+		 * between icp_rm_try_to_deliver() atomic update and now, then
+		 * we know it might have missed the resend_map bit. So we
+		 * retry
+		 */
+		smp_mb();
+		if (!icp->state.need_resend) {
+			arch_spin_unlock(&ics->lock);
+			goto again;
+		}
+	}
+ out:
+	arch_spin_unlock(&ics->lock);
+}
+
 static void icp_rm_down_cppr(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 			     u8 new_cppr)
 {
@@ -184,8 +383,7 @@ static void icp_rm_down_cppr(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 	 * separately here as well.
 	 */
 	if (resend) {
-		icp->rm_action |= XICS_RM_CHECK_RESEND;
-		icp->rm_resend_icp = icp;
+		icp_rm_check_resend(xics, icp);
 	}
 }
 
@@ -300,16 +498,14 @@ int kvmppc_rm_h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
 		}
 	} while (!icp_rm_try_update(icp, old_state, new_state));
 
-	/* Pass rejects to virtual mode */
+	/* Handle reject in real mode */
 	if (reject && reject != XICS_IPI) {
-		this_icp->rm_action |= XICS_RM_REJECT;
-		this_icp->rm_reject = reject;
+		icp_rm_deliver_irq(xics, icp, reject);
 	}
 
-	/* Pass resends to virtual mode */
+	/* Handle resends in real mode */
 	if (resend) {
-		this_icp->rm_action |= XICS_RM_CHECK_RESEND;
-		this_icp->rm_resend_icp = icp;
+		icp_rm_check_resend(xics, icp);
 	}
 
 	return check_too_hard(xics, this_icp);
@@ -365,10 +561,12 @@ int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr)
 
 	} while (!icp_rm_try_update(icp, old_state, new_state));
 
-	/* Pass rejects to virtual mode */
+	/*
+	 * Check for rejects. They are handled by doing a new delivery
+	 * attempt (see comments in icp_rm_deliver_irq).
+	 */
 	if (reject && reject != XICS_IPI) {
-		icp->rm_action |= XICS_RM_REJECT;
-		icp->rm_reject = reject;
+		icp_rm_deliver_irq(xics, icp, reject);
 	}
  bail:
 	return check_too_hard(xics, icp);
@@ -416,10 +614,9 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
 		goto bail;
 	state = &ics->irq_state[src];
 
-	/* Still asserted, resend it, we make it look like a reject */
+	/* Still asserted, resend it */
 	if (state->asserted) {
-		icp->rm_action |= XICS_RM_REJECT;
-		icp->rm_reject = irq;
+		icp_rm_deliver_irq(xics, icp, irq);
 	}
 
 	if (!hlist_empty(&vcpu->kvm->irq_ack_notifier_list)) {
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 11/23] KVM: PPC: Book3S HV: Add ICP real mode counters
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20  9:39   ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf, Suresh Warrier

From: Suresh Warrier <warrier@linux.vnet.ibm.com>

Add two counters to count how often we generate real-mode ICS resend
and reject events. The counters provide some performance statistics
that could be used in the future to consider if the real mode functions
need further optimizing. The counters are displayed as part of IPC and
ICP state provided by /sys/debug/kernel/powerpc/kvm* for each VM.

Also added two counters that count (approximately) how many times we
don't find an ICP or ICS we're looking for. These are not currently
exposed through sysfs, but can be useful when debugging crashes.

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rm_xics.c |  7 +++++++
 arch/powerpc/kvm/book3s_xics.c       | 10 ++++++++--
 arch/powerpc/kvm/book3s_xics.h       |  5 +++++
 3 files changed, 20 insertions(+), 2 deletions(-)
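
As a rough illustration of how these new per-ICP counters feed the debugfs
summary line (a stand-alone sketch with made-up numbers and a toy struct,
not the kernel code):

#include <stdio.h>

/* Toy per-vcpu counters mirroring the two fields this patch adds */
struct toy_icp_stats {
	unsigned long n_check_resend;
	unsigned long n_reject;
};

int main(void)
{
	struct toy_icp_stats vcpus[] = { { 12, 3 }, { 40, 7 }, { 5, 0 } };
	unsigned long t_check_resend = 0, t_reject = 0;
	unsigned int i;

	/* Same accumulation shape as xics_debug_show(): sum over all ICPs */
	for (i = 0; i < sizeof(vcpus) / sizeof(vcpus[0]); i++) {
		t_check_resend += vcpus[i].n_check_resend;
		t_reject += vcpus[i].n_reject;
	}

	printf("ICP Real Mode totals: check_resend=%lu reject=%lu\n",
	       t_check_resend, t_reject);
	return 0;
}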

diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 73bbe92..6dded8c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -227,6 +227,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 	ics = kvmppc_xics_find_ics(xics, new_irq, &src);
 	if (!ics) {
 		/* Unsafe increment, but this does not need to be accurate */
+		xics->err_noics++;
 		return;
 	}
 	state = &ics->irq_state[src];
@@ -239,6 +240,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 		icp = kvmppc_xics_find_server(xics->kvm, state->server);
 		if (!icp) {
 			/* Unsafe increment again*/
+			xics->err_noicp++;
 			goto out;
 		}
 	}
@@ -383,6 +385,7 @@ static void icp_rm_down_cppr(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 	 * separately here as well.
 	 */
 	if (resend) {
+		icp->n_check_resend++;
 		icp_rm_check_resend(xics, icp);
 	}
 }
@@ -500,11 +503,13 @@ int kvmppc_rm_h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
 
 	/* Handle reject in real mode */
 	if (reject && reject != XICS_IPI) {
+		this_icp->n_reject++;
 		icp_rm_deliver_irq(xics, icp, reject);
 	}
 
 	/* Handle resends in real mode */
 	if (resend) {
+		this_icp->n_check_resend++;
 		icp_rm_check_resend(xics, icp);
 	}
 
@@ -566,6 +571,7 @@ int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr)
 	 * attempt (see comments in icp_rm_deliver_irq).
 	 */
 	if (reject && reject != XICS_IPI) {
+		icp->n_reject++;
 		icp_rm_deliver_irq(xics, icp, reject);
 	}
  bail:
@@ -616,6 +622,7 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
 
 	/* Still asserted, resend it */
 	if (state->asserted) {
+		icp->n_reject++;
 		icp_rm_deliver_irq(xics, icp, irq);
 	}
 
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 56ed9b4..eb2569a 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -901,6 +901,7 @@ static int xics_debug_show(struct seq_file *m, void *private)
 	unsigned long flags;
 	unsigned long t_rm_kick_vcpu, t_rm_check_resend;
 	unsigned long t_rm_reject, t_rm_notify_eoi;
+	unsigned long t_reject, t_check_resend;
 
 	if (!kvm)
 		return 0;
@@ -909,6 +910,8 @@ static int xics_debug_show(struct seq_file *m, void *private)
 	t_rm_notify_eoi = 0;
 	t_rm_check_resend = 0;
 	t_rm_reject = 0;
+	t_check_resend = 0;
+	t_reject = 0;
 
 	seq_printf(m, "=========\nICP state\n=========\n");
 
@@ -928,12 +931,15 @@ static int xics_debug_show(struct seq_file *m, void *private)
 		t_rm_notify_eoi += icp->n_rm_notify_eoi;
 		t_rm_check_resend += icp->n_rm_check_resend;
 		t_rm_reject += icp->n_rm_reject;
+		t_check_resend += icp->n_check_resend;
+		t_reject += icp->n_reject;
 	}
 
-	seq_puts(m, "ICP Guest Real Mode exit totals: ");
-	seq_printf(m, "\tkick_vcpu=%lu check_resend=%lu reject=%lu notify_eoi=%lu\n",
+	seq_printf(m, "ICP Guest->Host totals: kick_vcpu=%lu check_resend=%lu reject=%lu notify_eoi=%lu\n",
 			t_rm_kick_vcpu, t_rm_check_resend,
 			t_rm_reject, t_rm_notify_eoi);
+	seq_printf(m, "ICP Real Mode totals: check_resend=%lu resend=%lu\n",
+			t_check_resend, t_reject);
 	for (icsid = 0; icsid <= KVMPPC_XICS_MAX_ICS_ID; icsid++) {
 		struct kvmppc_ics *ics = xics->ics[icsid];
 
diff --git a/arch/powerpc/kvm/book3s_xics.h b/arch/powerpc/kvm/book3s_xics.h
index 055424c..56ea44f 100644
--- a/arch/powerpc/kvm/book3s_xics.h
+++ b/arch/powerpc/kvm/book3s_xics.h
@@ -83,6 +83,9 @@ struct kvmppc_icp {
 	unsigned long n_rm_check_resend;
 	unsigned long n_rm_reject;
 	unsigned long n_rm_notify_eoi;
+	/* Counters for handling ICP processing in real mode */
+	unsigned long n_check_resend;
+	unsigned long n_reject;
 
 	/* Debug stuff for real mode */
 	union kvmppc_icp_state rm_dbgstate;
@@ -102,6 +105,8 @@ struct kvmppc_xics {
 	u32 max_icsid;
 	bool real_mode;
 	bool real_mode_dbg;
+	u32 err_noics;
+	u32 err_noicp;
 	struct kvmppc_ics *ics[KVMPPC_XICS_MAX_ICS_ID + 1];
 };
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 11/23] KVM: PPC: Book3S HV: Add ICP real mode counters
@ 2015-03-20  9:39   ` Paul Mackerras
  0 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf, Suresh Warrier

From: Suresh Warrier <warrier@linux.vnet.ibm.com>

Add two counters to count how often we generate real-mode ICS resend
and reject events. The counters provide some performance statistics
that could be used in the future to consider if the real mode functions
need further optimizing. The counters are displayed as part of IPC and
ICP state provided by /sys/debug/kernel/powerpc/kvm* for each VM.

Also added two counters that count (approximately) how many times we
don't find an ICP or ICS we're looking for. These are not currently
exposed through sysfs, but can be useful when debugging crashes.

Signed-off-by: Suresh Warrier <warrier@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rm_xics.c |  7 +++++++
 arch/powerpc/kvm/book3s_xics.c       | 10 ++++++++--
 arch/powerpc/kvm/book3s_xics.h       |  5 +++++
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 73bbe92..6dded8c 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -227,6 +227,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 	ics = kvmppc_xics_find_ics(xics, new_irq, &src);
 	if (!ics) {
 		/* Unsafe increment, but this does not need to be accurate */
+		xics->err_noics++;
 		return;
 	}
 	state = &ics->irq_state[src];
@@ -239,6 +240,7 @@ static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 		icp = kvmppc_xics_find_server(xics->kvm, state->server);
 		if (!icp) {
 			/* Unsafe increment again*/
+			xics->err_noicp++;
 			goto out;
 		}
 	}
@@ -383,6 +385,7 @@ static void icp_rm_down_cppr(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 	 * separately here as well.
 	 */
 	if (resend) {
+		icp->n_check_resend++;
 		icp_rm_check_resend(xics, icp);
 	}
 }
@@ -500,11 +503,13 @@ int kvmppc_rm_h_ipi(struct kvm_vcpu *vcpu, unsigned long server,
 
 	/* Handle reject in real mode */
 	if (reject && reject != XICS_IPI) {
+		this_icp->n_reject++;
 		icp_rm_deliver_irq(xics, icp, reject);
 	}
 
 	/* Handle resends in real mode */
 	if (resend) {
+		this_icp->n_check_resend++;
 		icp_rm_check_resend(xics, icp);
 	}
 
@@ -566,6 +571,7 @@ int kvmppc_rm_h_cppr(struct kvm_vcpu *vcpu, unsigned long cppr)
 	 * attempt (see comments in icp_rm_deliver_irq).
 	 */
 	if (reject && reject != XICS_IPI) {
+		icp->n_reject++;
 		icp_rm_deliver_irq(xics, icp, reject);
 	}
  bail:
@@ -616,6 +622,7 @@ int kvmppc_rm_h_eoi(struct kvm_vcpu *vcpu, unsigned long xirr)
 
 	/* Still asserted, resend it */
 	if (state->asserted) {
+		icp->n_reject++;
 		icp_rm_deliver_irq(xics, icp, irq);
 	}
 
diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index 56ed9b4..eb2569a 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -901,6 +901,7 @@ static int xics_debug_show(struct seq_file *m, void *private)
 	unsigned long flags;
 	unsigned long t_rm_kick_vcpu, t_rm_check_resend;
 	unsigned long t_rm_reject, t_rm_notify_eoi;
+	unsigned long t_reject, t_check_resend;
 
 	if (!kvm)
 		return 0;
@@ -909,6 +910,8 @@ static int xics_debug_show(struct seq_file *m, void *private)
 	t_rm_notify_eoi = 0;
 	t_rm_check_resend = 0;
 	t_rm_reject = 0;
+	t_check_resend = 0;
+	t_reject = 0;
 
 	seq_printf(m, "=====\nICP state\n=====\n");
 
@@ -928,12 +931,15 @@ static int xics_debug_show(struct seq_file *m, void *private)
 		t_rm_notify_eoi += icp->n_rm_notify_eoi;
 		t_rm_check_resend += icp->n_rm_check_resend;
 		t_rm_reject += icp->n_rm_reject;
+		t_check_resend += icp->n_check_resend;
+		t_reject += icp->n_reject;
 	}
 
-	seq_puts(m, "ICP Guest Real Mode exit totals: ");
-	seq_printf(m, "\tkick_vcpu=%lu check_resend=%lu reject=%lu notify_eoi=%lu\n",
+	seq_printf(m, "ICP Guest->Host totals: kick_vcpu=%lu check_resend=%lu reject=%lu notify_eoi=%lu\n",
 			t_rm_kick_vcpu, t_rm_check_resend,
 			t_rm_reject, t_rm_notify_eoi);
+	seq_printf(m, "ICP Real Mode totals: check_resend=%lu resend=%lu\n",
+			t_check_resend, t_reject);
 	for (icsid = 0; icsid <= KVMPPC_XICS_MAX_ICS_ID; icsid++) {
 		struct kvmppc_ics *ics = xics->ics[icsid];
 
diff --git a/arch/powerpc/kvm/book3s_xics.h b/arch/powerpc/kvm/book3s_xics.h
index 055424c..56ea44f 100644
--- a/arch/powerpc/kvm/book3s_xics.h
+++ b/arch/powerpc/kvm/book3s_xics.h
@@ -83,6 +83,9 @@ struct kvmppc_icp {
 	unsigned long n_rm_check_resend;
 	unsigned long n_rm_reject;
 	unsigned long n_rm_notify_eoi;
+	/* Counters for handling ICP processing in real mode */
+	unsigned long n_check_resend;
+	unsigned long n_reject;
 
 	/* Debug stuff for real mode */
 	union kvmppc_icp_state rm_dbgstate;
@@ -102,6 +105,8 @@ struct kvmppc_xics {
 	u32 max_icsid;
 	bool real_mode;
 	bool real_mode_dbg;
+	u32 err_noics;
+	u32 err_noicp;
 	struct kvmppc_ics *ics[KVMPPC_XICS_MAX_ICS_ID + 1];
 };
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 12/23] KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20  9:39   ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

This creates a debugfs directory for each HV guest (assuming debugfs
is enabled in the kernel config), and within that directory, a file
by which the contents of the guest's HPT (hashed page table) can be
read.  The directory is named vmnnnn, where nnnn is the PID of the
process that created the guest.  The file is named "htab".  This is
intended to help in debugging problems in the host's management
of guest memory.

The contents of the file consist of a series of lines like this:

  3f48 4000d032bf003505 0000000bd7ff1196 00000003b5c71196

The first field is the index of the entry in the HPT, and the second and
third fields are the two doublewords of the HPT entry, so the third field
contains the real page number that is mapped by the entry if the entry's
valid bit is set.  The fourth field is the guest's view of the second
doubleword of the entry, so it contains the guest physical address.  (The
format of the second through fourth fields is described in the Power ISA
and also in arch/powerpc/include/asm/mmu-hash64.h.)

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   2 +
 arch/powerpc/include/asm/kvm_host.h      |   2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 136 +++++++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_hv.c             |  12 +++
 virt/kvm/kvm_main.c                      |   1 +
 5 files changed, 153 insertions(+)
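
For illustration, a minimal stand-alone C sketch of how a consumer might
parse one line of the "htab" format described above (the labels in the
output are my own shorthand, not taken from the patch):

#include <stdio.h>
#include <inttypes.h>

int main(void)
{
	/* <index> <HPTE dword 0> <HPTE dword 1> <guest view of dword 1> */
	const char *line =
		"3f48 4000d032bf003505 0000000bd7ff1196 00000003b5c71196";
	unsigned long index;
	uint64_t v, hr, gr;

	if (sscanf(line, "%lx %" SCNx64 " %" SCNx64 " %" SCNx64,
		   &index, &v, &hr, &gr) != 4) {
		fprintf(stderr, "unexpected line format\n");
		return 1;
	}

	printf("HPT slot 0x%lx\n", index);
	printf("  dword 0:              0x%016" PRIx64 "\n", v);
	printf("  dword 1 (host view):  0x%016" PRIx64 "\n", hr);
	printf("  dword 1 (guest view): 0x%016" PRIx64 "\n", gr);
	return 0;
}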

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 0789a0f..869c53f 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -436,6 +436,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct kvm *kvm)
 	return rcu_dereference_raw_notrace(kvm->memslots);
 }
 
+extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 015773f..f1d0bbc 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -238,6 +238,8 @@ struct kvm_arch {
 	atomic_t hpte_mod_interest;
 	cpumask_t need_tlb_flush;
 	int hpt_cma_alloc;
+	struct dentry *debugfs_dir;
+	struct dentry *htab_dentry;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 	struct mutex hpt_mutex;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 6c6825a..d6fe308 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -27,6 +27,7 @@
 #include <linux/srcu.h>
 #include <linux/anon_inodes.h>
 #include <linux/file.h>
+#include <linux/debugfs.h>
 
 #include <asm/tlbflush.h>
 #include <asm/kvm_ppc.h>
@@ -1490,6 +1491,141 @@ int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct kvm_get_htab_fd *ghf)
 	return ret;
 }
 
+struct debugfs_htab_state {
+	struct kvm	*kvm;
+	struct mutex	mutex;
+	unsigned long	hpt_index;
+	int		chars_left;
+	int		buf_index;
+	char		buf[64];
+};
+
+static int debugfs_htab_open(struct inode *inode, struct file *file)
+{
+	struct kvm *kvm = inode->i_private;
+	struct debugfs_htab_state *p;
+
+	p = kzalloc(sizeof(*p), GFP_KERNEL);
+	if (!p)
+		return -ENOMEM;
+
+	kvm_get_kvm(kvm);
+	p->kvm = kvm;
+	mutex_init(&p->mutex);
+	file->private_data = p;
+
+	return nonseekable_open(inode, file);
+}
+
+static int debugfs_htab_release(struct inode *inode, struct file *file)
+{
+	struct debugfs_htab_state *p = file->private_data;
+
+	kvm_put_kvm(p->kvm);
+	kfree(p);
+	return 0;
+}
+
+static ssize_t debugfs_htab_read(struct file *file, char __user *buf,
+				 size_t len, loff_t *ppos)
+{
+	struct debugfs_htab_state *p = file->private_data;
+	ssize_t ret, r;
+	unsigned long i, n;
+	unsigned long v, hr, gr;
+	struct kvm *kvm;
+	__be64 *hptp;
+
+	ret = mutex_lock_interruptible(&p->mutex);
+	if (ret)
+		return ret;
+
+	if (p->chars_left) {
+		n = p->chars_left;
+		if (n > len)
+			n = len;
+		r = copy_to_user(buf, p->buf + p->buf_index, n);
+		n -= r;
+		p->chars_left -= n;
+		p->buf_index += n;
+		buf += n;
+		len -= n;
+		ret = n;
+		if (r) {
+			if (!n)
+				ret = -EFAULT;
+			goto out;
+		}
+	}
+
+	kvm = p->kvm;
+	i = p->hpt_index;
+	hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE));
+	for (; len != 0 && i < kvm->arch.hpt_npte; ++i, hptp += 2) {
+		if (!(be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_ABSENT)))
+			continue;
+
+		/* lock the HPTE so it's stable and read it */
+		preempt_disable();
+		while (!try_lock_hpte(hptp, HPTE_V_HVLOCK))
+			cpu_relax();
+		v = be64_to_cpu(hptp[0]) & ~HPTE_V_HVLOCK;
+		hr = be64_to_cpu(hptp[1]);
+		gr = kvm->arch.revmap[i].guest_rpte;
+		unlock_hpte(hptp, v);
+		preempt_enable();
+
+		if (!(v & (HPTE_V_VALID | HPTE_V_ABSENT)))
+			continue;
+
+		n = scnprintf(p->buf, sizeof(p->buf),
+			      "%6lx %.16lx %.16lx %.16lx\n",
+			      i, v, hr, gr);
+		p->chars_left = n;
+		if (n > len)
+			n = len;
+		r = copy_to_user(buf, p->buf, n);
+		n -= r;
+		p->chars_left -= n;
+		p->buf_index = n;
+		buf += n;
+		len -= n;
+		ret += n;
+		if (r) {
+			if (!ret)
+				ret = -EFAULT;
+			goto out;
+		}
+	}
+	p->hpt_index = i;
+
+ out:
+	mutex_unlock(&p->mutex);
+	return ret;
+}
+
+ssize_t debugfs_htab_write(struct file *file, const char __user *buf,
+			   size_t len, loff_t *ppos)
+{
+	return -EACCES;
+}
+
+static const struct file_operations debugfs_htab_fops = {
+	.owner	 = THIS_MODULE,
+	.open	 = debugfs_htab_open,
+	.release = debugfs_htab_release,
+	.read	 = debugfs_htab_read,
+	.write	 = debugfs_htab_write,
+	.llseek	 = generic_file_llseek,
+};
+
+void kvmppc_mmu_debugfs_init(struct kvm *kvm)
+{
+	kvm->arch.htab_dentry = debugfs_create_file("htab", 0400,
+						    kvm->arch.debugfs_dir, kvm,
+						    &debugfs_htab_fops);
+}
+
 void kvmppc_mmu_book3s_hv_init(struct kvm_vcpu *vcpu)
 {
 	struct kvmppc_mmu *mmu = &vcpu->arch.mmu;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 7b7102a..6631b53 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -32,6 +32,7 @@
 #include <linux/page-flags.h>
 #include <linux/srcu.h>
 #include <linux/miscdevice.h>
+#include <linux/debugfs.h>
 
 #include <asm/reg.h>
 #include <asm/cputable.h>
@@ -2307,6 +2308,7 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
 static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 {
 	unsigned long lpcr, lpid;
+	char buf[32];
 
 	/* Allocate the guest's logical partition ID */
 
@@ -2347,6 +2349,14 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 	 */
 	kvm_hv_vm_activated();
 
+	/*
+	 * Create a debugfs directory for the VM
+	 */
+	sprintf(buf, "vm%d", current->pid);
+	kvm->arch.debugfs_dir = debugfs_create_dir(buf, kvm_debugfs_dir);
+	if (!IS_ERR_OR_NULL(kvm->arch.debugfs_dir))
+		kvmppc_mmu_debugfs_init(kvm);
+
 	return 0;
 }
 
@@ -2367,6 +2377,8 @@ static void kvmppc_free_vcores(struct kvm *kvm)
 
 static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
 {
+	debugfs_remove_recursive(kvm->arch.debugfs_dir);
+
 	kvm_hv_vm_deactivated();
 
 	kvmppc_free_vcores(kvm);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 49900fc..2b7278b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -89,6 +89,7 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_cache);
 static __read_mostly struct preempt_ops kvm_preempt_ops;
 
 struct dentry *kvm_debugfs_dir;
+EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
 
 static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
 			   unsigned long arg);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 12/23] KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT
@ 2015-03-20  9:39   ` Paul Mackerras
  0 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

This creates a debugfs directory for each HV guest (assuming debugfs
is enabled in the kernel config), and within that directory, a file
by which the contents of the guest's HPT (hashed page table) can be
read.  The directory is named vmnnnn, where nnnn is the PID of the
process that created the guest.  The file is named "htab".  This is
intended to help in debugging problems in the host's management
of guest memory.

The contents of the file consist of a series of lines like this:

  3f48 4000d032bf003505 0000000bd7ff1196 00000003b5c71196

The first field is the index of the entry in the HPT, and the second and
third fields are the two doublewords of the HPT entry, so the third field
contains the real page number that is mapped by the entry if the entry's
valid bit is set.  The fourth field is the guest's view of the second
doubleword of the entry, so it contains the guest physical address.  (The
format of the second through fourth fields is described in the Power ISA
and also in arch/powerpc/include/asm/mmu-hash64.h.)

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s_64.h |   2 +
 arch/powerpc/include/asm/kvm_host.h      |   2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c      | 136 +++++++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_hv.c             |  12 +++
 virt/kvm/kvm_main.c                      |   1 +
 5 files changed, 153 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 0789a0f..869c53f 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -436,6 +436,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct kvm *kvm)
 	return rcu_dereference_raw_notrace(kvm->memslots);
 }
 
+extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 015773f..f1d0bbc 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -238,6 +238,8 @@ struct kvm_arch {
 	atomic_t hpte_mod_interest;
 	cpumask_t need_tlb_flush;
 	int hpt_cma_alloc;
+	struct dentry *debugfs_dir;
+	struct dentry *htab_dentry;
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
 	struct mutex hpt_mutex;
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 6c6825a..d6fe308 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -27,6 +27,7 @@
 #include <linux/srcu.h>
 #include <linux/anon_inodes.h>
 #include <linux/file.h>
+#include <linux/debugfs.h>
 
 #include <asm/tlbflush.h>
 #include <asm/kvm_ppc.h>
@@ -1490,6 +1491,141 @@ int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct kvm_get_htab_fd *ghf)
 	return ret;
 }
 
+struct debugfs_htab_state {
+	struct kvm	*kvm;
+	struct mutex	mutex;
+	unsigned long	hpt_index;
+	int		chars_left;
+	int		buf_index;
+	char		buf[64];
+};
+
+static int debugfs_htab_open(struct inode *inode, struct file *file)
+{
+	struct kvm *kvm = inode->i_private;
+	struct debugfs_htab_state *p;
+
+	p = kzalloc(sizeof(*p), GFP_KERNEL);
+	if (!p)
+		return -ENOMEM;
+
+	kvm_get_kvm(kvm);
+	p->kvm = kvm;
+	mutex_init(&p->mutex);
+	file->private_data = p;
+
+	return nonseekable_open(inode, file);
+}
+
+static int debugfs_htab_release(struct inode *inode, struct file *file)
+{
+	struct debugfs_htab_state *p = file->private_data;
+
+	kvm_put_kvm(p->kvm);
+	kfree(p);
+	return 0;
+}
+
+static ssize_t debugfs_htab_read(struct file *file, char __user *buf,
+				 size_t len, loff_t *ppos)
+{
+	struct debugfs_htab_state *p = file->private_data;
+	ssize_t ret, r;
+	unsigned long i, n;
+	unsigned long v, hr, gr;
+	struct kvm *kvm;
+	__be64 *hptp;
+
+	ret = mutex_lock_interruptible(&p->mutex);
+	if (ret)
+		return ret;
+
+	if (p->chars_left) {
+		n = p->chars_left;
+		if (n > len)
+			n = len;
+		r = copy_to_user(buf, p->buf + p->buf_index, n);
+		n -= r;
+		p->chars_left -= n;
+		p->buf_index += n;
+		buf += n;
+		len -= n;
+		ret = n;
+		if (r) {
+			if (!n)
+				ret = -EFAULT;
+			goto out;
+		}
+	}
+
+	kvm = p->kvm;
+	i = p->hpt_index;
+	hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE));
+	for (; len != 0 && i < kvm->arch.hpt_npte; ++i, hptp += 2) {
+		if (!(be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_ABSENT)))
+			continue;
+
+		/* lock the HPTE so it's stable and read it */
+		preempt_disable();
+		while (!try_lock_hpte(hptp, HPTE_V_HVLOCK))
+			cpu_relax();
+		v = be64_to_cpu(hptp[0]) & ~HPTE_V_HVLOCK;
+		hr = be64_to_cpu(hptp[1]);
+		gr = kvm->arch.revmap[i].guest_rpte;
+		unlock_hpte(hptp, v);
+		preempt_enable();
+
+		if (!(v & (HPTE_V_VALID | HPTE_V_ABSENT)))
+			continue;
+
+		n = scnprintf(p->buf, sizeof(p->buf),
+			      "%6lx %.16lx %.16lx %.16lx\n",
+			      i, v, hr, gr);
+		p->chars_left = n;
+		if (n > len)
+			n = len;
+		r = copy_to_user(buf, p->buf, n);
+		n -= r;
+		p->chars_left -= n;
+		p->buf_index = n;
+		buf += n;
+		len -= n;
+		ret += n;
+		if (r) {
+			if (!ret)
+				ret = -EFAULT;
+			goto out;
+		}
+	}
+	p->hpt_index = i;
+
+ out:
+	mutex_unlock(&p->mutex);
+	return ret;
+}
+
+ssize_t debugfs_htab_write(struct file *file, const char __user *buf,
+			   size_t len, loff_t *ppos)
+{
+	return -EACCES;
+}
+
+static const struct file_operations debugfs_htab_fops = {
+	.owner	 = THIS_MODULE,
+	.open	 = debugfs_htab_open,
+	.release = debugfs_htab_release,
+	.read	 = debugfs_htab_read,
+	.write	 = debugfs_htab_write,
+	.llseek	 = generic_file_llseek,
+};
+
+void kvmppc_mmu_debugfs_init(struct kvm *kvm)
+{
+	kvm->arch.htab_dentry = debugfs_create_file("htab", 0400,
+						    kvm->arch.debugfs_dir, kvm,
+						    &debugfs_htab_fops);
+}
+
 void kvmppc_mmu_book3s_hv_init(struct kvm_vcpu *vcpu)
 {
 	struct kvmppc_mmu *mmu = &vcpu->arch.mmu;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 7b7102a..6631b53 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -32,6 +32,7 @@
 #include <linux/page-flags.h>
 #include <linux/srcu.h>
 #include <linux/miscdevice.h>
+#include <linux/debugfs.h>
 
 #include <asm/reg.h>
 #include <asm/cputable.h>
@@ -2307,6 +2308,7 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
 static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 {
 	unsigned long lpcr, lpid;
+	char buf[32];
 
 	/* Allocate the guest's logical partition ID */
 
@@ -2347,6 +2349,14 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 	 */
 	kvm_hv_vm_activated();
 
+	/*
+	 * Create a debugfs directory for the VM
+	 */
+	sprintf(buf, "vm%d", current->pid);
+	kvm->arch.debugfs_dir = debugfs_create_dir(buf, kvm_debugfs_dir);
+	if (!IS_ERR_OR_NULL(kvm->arch.debugfs_dir))
+		kvmppc_mmu_debugfs_init(kvm);
+
 	return 0;
 }
 
@@ -2367,6 +2377,8 @@ static void kvmppc_free_vcores(struct kvm *kvm)
 
 static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
 {
+	debugfs_remove_recursive(kvm->arch.debugfs_dir);
+
 	kvm_hv_vm_deactivated();
 
 	kvmppc_free_vcores(kvm);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 49900fc..2b7278b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -89,6 +89,7 @@ EXPORT_SYMBOL_GPL(kvm_vcpu_cache);
 static __read_mostly struct preempt_ops kvm_preempt_ops;
 
 struct dentry *kvm_debugfs_dir;
+EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
 
 static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
 			   unsigned long arg);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20  9:39   ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

This reads the timebase at various points in the real-mode guest
entry/exit code and uses that to accumulate total, minimum and
maximum time spent in those parts of the code.  Currently these
times are accumulated per vcpu in 5 parts of the code:

* rm_entry - time taken from the start of kvmppc_hv_entry() until
  just before entering the guest.
* rm_intr - time from when we take a hypervisor interrupt in the
  guest until we either re-enter the guest or decide to exit to the
  host.  This includes time spent handling hcalls in real mode.
* rm_exit - time from when we decide to exit the guest until the
  return from kvmppc_hv_entry().
* guest - time spent in the guest
* cede - time spent napping in real mode due to an H_CEDE hcall
  while other threads in the same vcore are active.

These times are exposed in debugfs in a directory per vcpu that
contains a file called "timings".  This file contains one line for
each of the 5 timings above, with the name followed by a colon and
4 numbers, which are the count (number of times the code has been
executed), the total time, the minimum time, and the maximum time,
all in nanoseconds.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h     |  19 +++++
 arch/powerpc/include/asm/time.h         |   3 +
 arch/powerpc/kernel/asm-offsets.c       |  11 +++
 arch/powerpc/kernel/time.c              |   6 ++
 arch/powerpc/kvm/book3s_hv.c            | 135 ++++++++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 105 ++++++++++++++++++++++++-
 6 files changed, 276 insertions(+), 3 deletions(-)
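
For illustration, here is a simplified stand-alone C sketch of the
seqcount-style snapshot rule the debugfs reader uses on these accumulators
(an odd seqcount means an update is in progress, and the count must be
unchanged across the copy).  The names are toy ones, and the memory
barriers the kernel reader needs (smp_rmb()) are omitted here:

#include <stdio.h>
#include <stdint.h>

struct toy_acc {
	uint64_t seqcount;	/* 2 * number of samples; odd while updating */
	uint64_t tb_total;
	uint64_t tb_min;
	uint64_t tb_max;
};

/* Returns 0 on a stable snapshot, -1 if it kept racing ("stuck") */
static int toy_snapshot(const volatile struct toy_acc *acc, struct toy_acc *out)
{
	int loops;

	for (loops = 0; loops < 1000; loops++) {
		uint64_t count = acc->seqcount;

		if (!(count & 1)) {			/* not mid-update */
			out->seqcount = count;
			out->tb_total = acc->tb_total;
			out->tb_min = acc->tb_min;
			out->tb_max = acc->tb_max;
			if (count == acc->seqcount)	/* unchanged: stable */
				return 0;
		}
	}
	return -1;
}

int main(void)
{
	struct toy_acc guest = { 8, 4000, 100, 2000 };	/* made-up values */
	struct toy_acc snap;

	if (toy_snapshot(&guest, &snap) == 0)
		printf("guest: %llu samples, total=%llu min=%llu max=%llu ticks\n",
		       (unsigned long long)(snap.seqcount / 2),
		       (unsigned long long)snap.tb_total,
		       (unsigned long long)snap.tb_min,
		       (unsigned long long)snap.tb_max);
	return 0;
}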

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index f1d0bbc..286c0ce 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -369,6 +369,14 @@ struct kvmppc_slb {
 	u8 base_page_size;	/* MMU_PAGE_xxx */
 };
 
+/* Struct used to accumulate timing information in HV real mode code */
+struct kvmhv_tb_accumulator {
+	u64	seqcount;	/* used to synchronize access, also count * 2 */
+	u64	tb_total;	/* total time in timebase ticks */
+	u64	tb_min;		/* min time */
+	u64	tb_max;		/* max time */
+};
+
 # ifdef CONFIG_PPC_FSL_BOOK3E
 #define KVMPPC_BOOKE_IAC_NUM	2
 #define KVMPPC_BOOKE_DAC_NUM	2
@@ -656,6 +664,17 @@ struct kvm_vcpu_arch {
 	u64 busy_preempt;
 
 	u32 emul_inst;
+
+	struct kvmhv_tb_accumulator *cur_activity;	/* What we're timing */
+	u64	cur_tb_start;			/* when it started */
+	struct kvmhv_tb_accumulator rm_entry;	/* real-mode entry code */
+	struct kvmhv_tb_accumulator rm_intr;	/* real-mode intr handling */
+	struct kvmhv_tb_accumulator rm_exit;	/* real-mode exit code */
+	struct kvmhv_tb_accumulator guest_time;	/* guest execution */
+	struct kvmhv_tb_accumulator cede_time;	/* time napping inside guest */
+
+	struct dentry *debugfs_dir;
+	struct dentry *debugfs_timings;
 #endif
 };
 
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 03cbada..10fc784 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -211,5 +211,8 @@ extern void secondary_cpu_time_init(void);
 
 DECLARE_PER_CPU(u64, decrementers_next_tb);
 
+/* Convert timebase ticks to nanoseconds */
+unsigned long long tb_to_ns(unsigned long long tb_ticks);
+
 #endif /* __KERNEL__ */
 #endif /* __POWERPC_TIME_H */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 4717859..ec9f59c 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -458,6 +458,17 @@ int main(void)
 	DEFINE(VCPU_SPRG1, offsetof(struct kvm_vcpu, arch.shregs.sprg1));
 	DEFINE(VCPU_SPRG2, offsetof(struct kvm_vcpu, arch.shregs.sprg2));
 	DEFINE(VCPU_SPRG3, offsetof(struct kvm_vcpu, arch.shregs.sprg3));
+	DEFINE(VCPU_TB_RMENTRY, offsetof(struct kvm_vcpu, arch.rm_entry));
+	DEFINE(VCPU_TB_RMINTR, offsetof(struct kvm_vcpu, arch.rm_intr));
+	DEFINE(VCPU_TB_RMEXIT, offsetof(struct kvm_vcpu, arch.rm_exit));
+	DEFINE(VCPU_TB_GUEST, offsetof(struct kvm_vcpu, arch.guest_time));
+	DEFINE(VCPU_TB_CEDE, offsetof(struct kvm_vcpu, arch.cede_time));
+	DEFINE(VCPU_CUR_ACTIVITY, offsetof(struct kvm_vcpu, arch.cur_activity));
+	DEFINE(VCPU_ACTIVITY_START, offsetof(struct kvm_vcpu, arch.cur_tb_start));
+	DEFINE(TAS_SEQCOUNT, offsetof(struct kvmhv_tb_accumulator, seqcount));
+	DEFINE(TAS_TOTAL, offsetof(struct kvmhv_tb_accumulator, tb_total));
+	DEFINE(TAS_MIN, offsetof(struct kvmhv_tb_accumulator, tb_min));
+	DEFINE(TAS_MAX, offsetof(struct kvmhv_tb_accumulator, tb_max));
 #endif
 	DEFINE(VCPU_SHARED_SPRG3, offsetof(struct kvm_vcpu_arch_shared, sprg3));
 	DEFINE(VCPU_SHARED_SPRG4, offsetof(struct kvm_vcpu_arch_shared, sprg4));
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 2d7b33f..56f4484 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -608,6 +608,12 @@ void arch_suspend_enable_irqs(void)
 }
 #endif
 
+unsigned long long tb_to_ns(unsigned long long ticks)
+{
+	return mulhdu(ticks, tb_to_ns_scale) << tb_to_ns_shift;
+}
+EXPORT_SYMBOL_GPL(tb_to_ns);
+
 /*
  * Scheduler clock - returns current time in nanosec units.
  *
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6631b53..8517c33 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1411,6 +1411,127 @@ static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
 	return vcore;
 }
 
+static struct debugfs_timings_element {
+	const char *name;
+	size_t offset;
+} timings[] = {
+	{"rm_entry",	offsetof(struct kvm_vcpu, arch.rm_entry)},
+	{"rm_intr",	offsetof(struct kvm_vcpu, arch.rm_intr)},
+	{"rm_exit",	offsetof(struct kvm_vcpu, arch.rm_exit)},
+	{"guest",	offsetof(struct kvm_vcpu, arch.guest_time)},
+	{"cede",	offsetof(struct kvm_vcpu, arch.cede_time)},
+};
+
+#define N_TIMINGS	(sizeof(timings) / sizeof(timings[0]))
+
+struct debugfs_timings_state {
+	struct kvm_vcpu	*vcpu;
+	unsigned int	buflen;
+	char		buf[N_TIMINGS * 100];
+};
+
+static int debugfs_timings_open(struct inode *inode, struct file *file)
+{
+	struct kvm_vcpu *vcpu = inode->i_private;
+	struct debugfs_timings_state *p;
+
+	p = kzalloc(sizeof(*p), GFP_KERNEL);
+	if (!p)
+		return -ENOMEM;
+
+	kvm_get_kvm(vcpu->kvm);
+	p->vcpu = vcpu;
+	file->private_data = p;
+
+	return nonseekable_open(inode, file);
+}
+
+static int debugfs_timings_release(struct inode *inode, struct file *file)
+{
+	struct debugfs_timings_state *p = file->private_data;
+
+	kvm_put_kvm(p->vcpu->kvm);
+	kfree(p);
+	return 0;
+}
+
+static ssize_t debugfs_timings_read(struct file *file, char __user *buf,
+				    size_t len, loff_t *ppos)
+{
+	struct debugfs_timings_state *p = file->private_data;
+	struct kvm_vcpu *vcpu = p->vcpu;
+	char *s;
+	struct kvmhv_tb_accumulator tb;
+	u64 count;
+	loff_t pos;
+	ssize_t n;
+	int i, loops;
+	bool ok;
+
+	if (!p->buflen) {
+		s = p->buf;
+		for (i = 0; i < N_TIMINGS; ++i) {
+			struct kvmhv_tb_accumulator *acc;
+
+			acc = (struct kvmhv_tb_accumulator *)
+				((unsigned long)vcpu + timings[i].offset);
+			ok = false;
+			for (loops = 0; loops < 1000; ++loops) {
+				count = acc->seqcount;
+				if (!(count & 1)) {
+					smp_rmb();
+					tb = *acc;
+					smp_rmb();
+					if (count == acc->seqcount) {
+						ok = true;
+						break;
+					}
+				}
+				udelay(1);
+			}
+			if (!ok)
+				sprintf(s, "%s: stuck\n", timings[i].name);
+			else
+				sprintf(s, "%s: %llu %llu %llu %llu\n",
+					timings[i].name, count / 2,
+					tb_to_ns(tb.tb_total),
+					tb_to_ns(tb.tb_min),
+					tb_to_ns(tb.tb_max));
+			s += strlen(s);
+		}
+		p->buflen = s - p->buf;
+	}
+
+	pos = *ppos;
+	if (pos >= p->buflen)
+		return 0;
+	if (len > p->buflen - pos)
+		len = p->buflen - pos;
+	n = copy_to_user(buf, p->buf + pos, len);
+	if (n) {
+		if (n == len)
+			return -EFAULT;
+		len -= n;
+	}
+	*ppos = pos + len;
+	return len;
+}
+
+static ssize_t debugfs_timings_write(struct file *file, const char __user *buf,
+				     size_t len, loff_t *ppos)
+{
+	return -EACCES;
+}
+
+static const struct file_operations debugfs_timings_ops = {
+	.owner	 = THIS_MODULE,
+	.open	 = debugfs_timings_open,
+	.release = debugfs_timings_release,
+	.read	 = debugfs_timings_read,
+	.write	 = debugfs_timings_write,
+	.llseek	 = generic_file_llseek,
+};
+
 static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
 						   unsigned int id)
 {
@@ -1418,6 +1539,7 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
 	int err = -EINVAL;
 	int core;
 	struct kvmppc_vcore *vcore;
+	char buf[16];
 
 	core = id / threads_per_subcore;
 	if (core >= KVM_MAX_VCORES)
@@ -1480,6 +1602,19 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
 	vcpu->arch.cpu_type = KVM_CPU_3S_64;
 	kvmppc_sanity_check(vcpu);
 
+	/* Create a debugfs directory for the vcpu */
+	sprintf(buf, "vcpu%u", id);
+	if (!IS_ERR_OR_NULL(kvm->arch.debugfs_dir)) {
+		vcpu->arch.debugfs_dir = debugfs_create_dir(buf,
+						kvm->arch.debugfs_dir);
+		if (!IS_ERR_OR_NULL(vcpu->arch.debugfs_dir)) {
+			vcpu->arch.debugfs_timings =
+				debugfs_create_file("timings", 0444,
+						    vcpu->arch.debugfs_dir,
+						    vcpu, &debugfs_timings_ops);
+		}
+	}
+
 	return vcpu;
 
 free_vcpu:
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 0814ca1..d71ae2f 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -225,9 +225,14 @@ kvm_novcpu_wakeup:
 	/* Got an IPI but other vcpus aren't yet exiting, must be a latecomer */
 	ld	r4, HSTATE_KVM_VCPU(r13)
 	cmpdi	r4, 0
-	bne	kvmppc_got_guest
+	beq	kvmppc_primary_no_guest
+
+	addi	r3, r4, VCPU_TB_RMENTRY
+	bl	kvmhv_start_timing
+	b	kvmppc_got_guest
 
 kvm_novcpu_exit:
+	ld	r4, HSTATE_KVM_VCPU(r13)
 	b	hdec_soon
 
 /*
@@ -376,6 +381,12 @@ kvmppc_hv_entry:
 	li	r6, KVM_GUEST_MODE_HOST_HV
 	stb	r6, HSTATE_IN_GUEST(r13)
 
+	/* Store initial timestamp */
+	cmpdi	r4, 0
+	beq	1f
+	addi	r3, r4, VCPU_TB_RMENTRY
+	bl	kvmhv_start_timing
+1:
 	/* Clear out SLB */
 	li	r6,0
 	slbmte	r6,r6
@@ -880,6 +891,10 @@ fast_guest_return:
 	li	r9, KVM_GUEST_MODE_GUEST_HV
 	stb	r9, HSTATE_IN_GUEST(r13)
 
+	/* Accumulate timing */
+	addi	r3, r4, VCPU_TB_GUEST
+	bl	kvmhv_accumulate_time
+
 	/* Enter guest */
 
 BEGIN_FTR_SECTION
@@ -917,6 +932,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	hrfid
 	b	.
 
+secondary_too_late:
+	cmpdi	r4, 0
+	beq	11f
+	addi	r3, r4, VCPU_TB_RMEXIT
+	bl	kvmhv_accumulate_time
+11:	b	kvmhv_switch_to_host
+
+hdec_soon:
+	cmpdi	r4, 0
+	beq	12f
+	addi	r3, r4, VCPU_TB_RMEXIT
+	bl	kvmhv_accumulate_time
+12:	b	kvmhv_do_exit
+
 /******************************************************************************
  *                                                                            *
  *                               Exit code                                    *
@@ -1002,6 +1031,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
 	stw	r12,VCPU_TRAP(r9)
 
+	addi	r3, r9, VCPU_TB_RMINTR
+	mr	r4, r9
+	bl	kvmhv_accumulate_time
+	ld	r5, VCPU_GPR(R5)(r9)
+	ld	r6, VCPU_GPR(R6)(r9)
+	ld	r7, VCPU_GPR(R7)(r9)
+	ld	r8, VCPU_GPR(R8)(r9)
+
 	/* Save HEIR (HV emulation assist reg) in emul_inst
 	   if this is an HEI (HV emulation interrupt, e40) */
 	li	r3,KVM_INST_FETCH_FAILED
@@ -1073,6 +1110,9 @@ guest_exit_cont:		/* r9 = vcpu, r12 = trap, r13 = paca */
 	cmpwi	r12, BOOK3S_INTERRUPT_MACHINE_CHECK
 	beq	machine_check_realmode
 mc_cont:
+	addi	r3, r9, VCPU_TB_RMEXIT
+	mr	r4, r9
+	bl	kvmhv_accumulate_time
 
 	/* Save guest CTRL register, set runlatch to 1 */
 6:	mfspr	r6,SPRN_CTRLF
@@ -1417,7 +1457,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	slbia
 	ptesync
 
-hdec_soon:			/* r12 = trap, r13 = paca */
+kvmhv_do_exit:			/* r12 = trap, r13 = paca */
 	/*
 	 * POWER7/POWER8 guest -> host partition switch code.
 	 * We don't have to lock against tlbies but we do
@@ -1476,7 +1516,7 @@ hdec_soon:			/* r12 = trap, r13 = paca */
 	addi	r6,r6,PACA_SIZE
 	bne	42b
 
-secondary_too_late:
+kvmhv_switch_to_host:
 	/* Secondary threads wait for primary to do partition switch */
 43:	ld	r5,HSTATE_KVM_VCORE(r13)
 	ld	r4,VCORE_KVM(r5)	/* pointer to struct kvm */
@@ -1562,6 +1602,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 1:	addi	r8,r8,16
 	.endr
 
+	/* Finish timing, if we have a vcpu */
+	ld	r4, HSTATE_KVM_VCPU(r13)
+	cmpdi	r4, 0
+	li	r3, 0
+	beq	2f
+	bl	kvmhv_accumulate_time
+2:
 	/* Unset guest mode */
 	li	r0, KVM_GUEST_MODE_NONE
 	stb	r0, HSTATE_IN_GUEST(r13)
@@ -2069,6 +2116,10 @@ _GLOBAL(kvmppc_h_cede)
 	/* save FP state */
 	bl	kvmppc_save_fp
 
+	ld	r4, HSTATE_KVM_VCPU(r13)
+	addi	r3, r4, VCPU_TB_CEDE
+	bl	kvmhv_accumulate_time
+
 	/*
 	 * Take a nap until a decrementer or external or doobell interrupt
 	 * occurs, with PECE1, PECE0 and PECEDP set in LPCR. Also clear the
@@ -2109,6 +2160,9 @@ kvm_end_cede:
 	/* Woken by external or decrementer interrupt */
 	ld	r1, HSTATE_HOST_R1(r13)
 
+	addi	r3, r4, VCPU_TB_RMINTR
+	bl	kvmhv_accumulate_time
+
 	/* load up FP state */
 	bl	kvmppc_load_fp
 
@@ -2429,3 +2483,48 @@ kvmppc_fix_pmao:
 	mtspr	SPRN_PMC6, r3
 	isync
 	blr
+
+/*
+ * Start timing an activity
+ * r3 = pointer to time accumulation struct, r4 = vcpu
+ */
+kvmhv_start_timing:
+	mftb	r5
+	std	r3, VCPU_CUR_ACTIVITY(r4)
+	std	r5, VCPU_ACTIVITY_START(r4)
+	blr
+
+/*
+ * Accumulate time to one activity and start another.
+ * r3 = pointer to new time accumulation struct, r4 = vcpu
+ */
+kvmhv_accumulate_time:
+	ld	r5, VCPU_CUR_ACTIVITY(r4)
+	ld	r6, VCPU_ACTIVITY_START(r4)
+	std	r3, VCPU_CUR_ACTIVITY(r4)
+	mftb	r7
+	std	r7, VCPU_ACTIVITY_START(r4)
+	cmpdi	r5, 0
+	beqlr
+	subf	r3, r6, r7
+	ld	r8, TAS_SEQCOUNT(r5)
+	cmpdi	r8, 0
+	addi	r8, r8, 1
+	std	r8, TAS_SEQCOUNT(r5)
+	lwsync
+	ld	r7, TAS_TOTAL(r5)
+	add	r7, r7, r3
+	std	r7, TAS_TOTAL(r5)
+	ld	r6, TAS_MIN(r5)
+	ld	r7, TAS_MAX(r5)
+	beq	3f
+	cmpd	r3, r6
+	bge	1f
+3:	std	r3, TAS_MIN(r5)
+1:	cmpd	r3, r7
+	ble	2f
+	std	r3, TAS_MAX(r5)
+2:	lwsync
+	addi	r8, r8, 1
+	std	r8, TAS_SEQCOUNT(r5)
+	blr
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
@ 2015-03-20  9:39   ` Paul Mackerras
  0 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

This reads the timebase at various points in the real-mode guest
entry/exit code and uses that to accumulate total, minimum and
maximum time spent in those parts of the code.  Currently these
times are accumulated per vcpu in 5 parts of the code:

* rm_entry - time taken from the start of kvmppc_hv_entry() until
  just before entering the guest.
* rm_intr - time from when we take a hypervisor interrupt in the
  guest until we either re-enter the guest or decide to exit to the
  host.  This includes time spent handling hcalls in real mode.
* rm_exit - time from when we decide to exit the guest until the
  return from kvmppc_hv_entry().
* guest - time spent in the guest
* cede - time spent napping in real mode due to an H_CEDE hcall
  while other threads in the same vcore are active.

These times are exposed in debugfs in a directory per vcpu that
contains a file called "timings".  This file contains one line for
each of the 5 timings above, with the name followed by a colon and
4 numbers, which are the count (number of times the code has been
executed), the total time, the minimum time, and the maximum time,
all in nanoseconds.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h     |  19 +++++
 arch/powerpc/include/asm/time.h         |   3 +
 arch/powerpc/kernel/asm-offsets.c       |  11 +++
 arch/powerpc/kernel/time.c              |   6 ++
 arch/powerpc/kvm/book3s_hv.c            | 135 ++++++++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 105 ++++++++++++++++++++++++-
 6 files changed, 276 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index f1d0bbc..286c0ce 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -369,6 +369,14 @@ struct kvmppc_slb {
 	u8 base_page_size;	/* MMU_PAGE_xxx */
 };
 
+/* Struct used to accumulate timing information in HV real mode code */
+struct kvmhv_tb_accumulator {
+	u64	seqcount;	/* used to synchronize access, also count * 2 */
+	u64	tb_total;	/* total time in timebase ticks */
+	u64	tb_min;		/* min time */
+	u64	tb_max;		/* max time */
+};
+
 # ifdef CONFIG_PPC_FSL_BOOK3E
 #define KVMPPC_BOOKE_IAC_NUM	2
 #define KVMPPC_BOOKE_DAC_NUM	2
@@ -656,6 +664,17 @@ struct kvm_vcpu_arch {
 	u64 busy_preempt;
 
 	u32 emul_inst;
+
+	struct kvmhv_tb_accumulator *cur_activity;	/* What we're timing */
+	u64	cur_tb_start;			/* when it started */
+	struct kvmhv_tb_accumulator rm_entry;	/* real-mode entry code */
+	struct kvmhv_tb_accumulator rm_intr;	/* real-mode intr handling */
+	struct kvmhv_tb_accumulator rm_exit;	/* real-mode exit code */
+	struct kvmhv_tb_accumulator guest_time;	/* guest execution */
+	struct kvmhv_tb_accumulator cede_time;	/* time napping inside guest */
+
+	struct dentry *debugfs_dir;
+	struct dentry *debugfs_timings;
 #endif
 };
 
diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 03cbada..10fc784 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -211,5 +211,8 @@ extern void secondary_cpu_time_init(void);
 
 DECLARE_PER_CPU(u64, decrementers_next_tb);
 
+/* Convert timebase ticks to nanoseconds */
+unsigned long long tb_to_ns(unsigned long long tb_ticks);
+
 #endif /* __KERNEL__ */
 #endif /* __POWERPC_TIME_H */
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 4717859..ec9f59c 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -458,6 +458,17 @@ int main(void)
 	DEFINE(VCPU_SPRG1, offsetof(struct kvm_vcpu, arch.shregs.sprg1));
 	DEFINE(VCPU_SPRG2, offsetof(struct kvm_vcpu, arch.shregs.sprg2));
 	DEFINE(VCPU_SPRG3, offsetof(struct kvm_vcpu, arch.shregs.sprg3));
+	DEFINE(VCPU_TB_RMENTRY, offsetof(struct kvm_vcpu, arch.rm_entry));
+	DEFINE(VCPU_TB_RMINTR, offsetof(struct kvm_vcpu, arch.rm_intr));
+	DEFINE(VCPU_TB_RMEXIT, offsetof(struct kvm_vcpu, arch.rm_exit));
+	DEFINE(VCPU_TB_GUEST, offsetof(struct kvm_vcpu, arch.guest_time));
+	DEFINE(VCPU_TB_CEDE, offsetof(struct kvm_vcpu, arch.cede_time));
+	DEFINE(VCPU_CUR_ACTIVITY, offsetof(struct kvm_vcpu, arch.cur_activity));
+	DEFINE(VCPU_ACTIVITY_START, offsetof(struct kvm_vcpu, arch.cur_tb_start));
+	DEFINE(TAS_SEQCOUNT, offsetof(struct kvmhv_tb_accumulator, seqcount));
+	DEFINE(TAS_TOTAL, offsetof(struct kvmhv_tb_accumulator, tb_total));
+	DEFINE(TAS_MIN, offsetof(struct kvmhv_tb_accumulator, tb_min));
+	DEFINE(TAS_MAX, offsetof(struct kvmhv_tb_accumulator, tb_max));
 #endif
 	DEFINE(VCPU_SHARED_SPRG3, offsetof(struct kvm_vcpu_arch_shared, sprg3));
 	DEFINE(VCPU_SHARED_SPRG4, offsetof(struct kvm_vcpu_arch_shared, sprg4));
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 2d7b33f..56f4484 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -608,6 +608,12 @@ void arch_suspend_enable_irqs(void)
 }
 #endif
 
+unsigned long long tb_to_ns(unsigned long long ticks)
+{
+	return mulhdu(ticks, tb_to_ns_scale) << tb_to_ns_shift;
+}
+EXPORT_SYMBOL_GPL(tb_to_ns);
+
 /*
  * Scheduler clock - returns current time in nanosec units.
  *
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6631b53..8517c33 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1411,6 +1411,127 @@ static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
 	return vcore;
 }
 
+static struct debugfs_timings_element {
+	const char *name;
+	size_t offset;
+} timings[] = {
+	{"rm_entry",	offsetof(struct kvm_vcpu, arch.rm_entry)},
+	{"rm_intr",	offsetof(struct kvm_vcpu, arch.rm_intr)},
+	{"rm_exit",	offsetof(struct kvm_vcpu, arch.rm_exit)},
+	{"guest",	offsetof(struct kvm_vcpu, arch.guest_time)},
+	{"cede",	offsetof(struct kvm_vcpu, arch.cede_time)},
+};
+
+#define N_TIMINGS	(sizeof(timings) / sizeof(timings[0]))
+
+struct debugfs_timings_state {
+	struct kvm_vcpu	*vcpu;
+	unsigned int	buflen;
+	char		buf[N_TIMINGS * 100];
+};
+
+static int debugfs_timings_open(struct inode *inode, struct file *file)
+{
+	struct kvm_vcpu *vcpu = inode->i_private;
+	struct debugfs_timings_state *p;
+
+	p = kzalloc(sizeof(*p), GFP_KERNEL);
+	if (!p)
+		return -ENOMEM;
+
+	kvm_get_kvm(vcpu->kvm);
+	p->vcpu = vcpu;
+	file->private_data = p;
+
+	return nonseekable_open(inode, file);
+}
+
+static int debugfs_timings_release(struct inode *inode, struct file *file)
+{
+	struct debugfs_timings_state *p = file->private_data;
+
+	kvm_put_kvm(p->vcpu->kvm);
+	kfree(p);
+	return 0;
+}
+
+static ssize_t debugfs_timings_read(struct file *file, char __user *buf,
+				    size_t len, loff_t *ppos)
+{
+	struct debugfs_timings_state *p = file->private_data;
+	struct kvm_vcpu *vcpu = p->vcpu;
+	char *s;
+	struct kvmhv_tb_accumulator tb;
+	u64 count;
+	loff_t pos;
+	ssize_t n;
+	int i, loops;
+	bool ok;
+
+	if (!p->buflen) {
+		s = p->buf;
+		for (i = 0; i < N_TIMINGS; ++i) {
+			struct kvmhv_tb_accumulator *acc;
+
+			acc = (struct kvmhv_tb_accumulator *)
+				((unsigned long)vcpu + timings[i].offset);
+			ok = false;
+			for (loops = 0; loops < 1000; ++loops) {
+				count = acc->seqcount;
+				if (!(count & 1)) {
+					smp_rmb();
+					tb = *acc;
+					smp_rmb();
+					if (count == acc->seqcount) {
+						ok = true;
+						break;
+					}
+				}
+				udelay(1);
+			}
+			if (!ok)
+				sprintf(s, "%s: stuck\n", timings[i].name);
+			else
+				sprintf(s, "%s: %llu %llu %llu %llu\n",
+					timings[i].name, count / 2,
+					tb_to_ns(tb.tb_total),
+					tb_to_ns(tb.tb_min),
+					tb_to_ns(tb.tb_max));
+			s += strlen(s);
+		}
+		p->buflen = s - p->buf;
+	}
+
+	pos = *ppos;
+	if (pos >= p->buflen)
+		return 0;
+	if (len > p->buflen - pos)
+		len = p->buflen - pos;
+	n = copy_to_user(buf, p->buf + pos, len);
+	if (n) {
+		if (n == len)
+			return -EFAULT;
+		len -= n;
+	}
+	*ppos = pos + len;
+	return len;
+}
+
+static ssize_t debugfs_timings_write(struct file *file, const char __user *buf,
+				     size_t len, loff_t *ppos)
+{
+	return -EACCES;
+}
+
+static const struct file_operations debugfs_timings_ops = {
+	.owner	 = THIS_MODULE,
+	.open	 = debugfs_timings_open,
+	.release = debugfs_timings_release,
+	.read	 = debugfs_timings_read,
+	.write	 = debugfs_timings_write,
+	.llseek	 = generic_file_llseek,
+};
+
 static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
 						   unsigned int id)
 {
@@ -1418,6 +1539,7 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
 	int err = -EINVAL;
 	int core;
 	struct kvmppc_vcore *vcore;
+	char buf[16];
 
 	core = id / threads_per_subcore;
 	if (core >= KVM_MAX_VCORES)
@@ -1480,6 +1602,19 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
 	vcpu->arch.cpu_type = KVM_CPU_3S_64;
 	kvmppc_sanity_check(vcpu);
 
+	/* Create a debugfs directory for the vcpu */
+	sprintf(buf, "vcpu%u", id);
+	if (!IS_ERR_OR_NULL(kvm->arch.debugfs_dir)) {
+		vcpu->arch.debugfs_dir = debugfs_create_dir(buf,
+						kvm->arch.debugfs_dir);
+		if (!IS_ERR_OR_NULL(vcpu->arch.debugfs_dir)) {
+			vcpu->arch.debugfs_timings =
+				debugfs_create_file("timings", 0444,
+						    vcpu->arch.debugfs_dir,
+						    vcpu, &debugfs_timings_ops);
+		}
+	}
+
 	return vcpu;
 
 free_vcpu:
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 0814ca1..d71ae2f 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -225,9 +225,14 @@ kvm_novcpu_wakeup:
 	/* Got an IPI but other vcpus aren't yet exiting, must be a latecomer */
 	ld	r4, HSTATE_KVM_VCPU(r13)
 	cmpdi	r4, 0
-	bne	kvmppc_got_guest
+	beq	kvmppc_primary_no_guest
+
+	addi	r3, r4, VCPU_TB_RMENTRY
+	bl	kvmhv_start_timing
+	b	kvmppc_got_guest
 
 kvm_novcpu_exit:
+	ld	r4, HSTATE_KVM_VCPU(r13)
 	b	hdec_soon
 
 /*
@@ -376,6 +381,12 @@ kvmppc_hv_entry:
 	li	r6, KVM_GUEST_MODE_HOST_HV
 	stb	r6, HSTATE_IN_GUEST(r13)
 
+	/* Store initial timestamp */
+	cmpdi	r4, 0
+	beq	1f
+	addi	r3, r4, VCPU_TB_RMENTRY
+	bl	kvmhv_start_timing
+1:
 	/* Clear out SLB */
 	li	r6,0
 	slbmte	r6,r6
@@ -880,6 +891,10 @@ fast_guest_return:
 	li	r9, KVM_GUEST_MODE_GUEST_HV
 	stb	r9, HSTATE_IN_GUEST(r13)
 
+	/* Accumulate timing */
+	addi	r3, r4, VCPU_TB_GUEST
+	bl	kvmhv_accumulate_time
+
 	/* Enter guest */
 
 BEGIN_FTR_SECTION
@@ -917,6 +932,20 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	hrfid
 	b	.
 
+secondary_too_late:
+	cmpdi	r4, 0
+	beq	11f
+	addi	r3, r4, VCPU_TB_RMEXIT
+	bl	kvmhv_accumulate_time
+11:	b	kvmhv_switch_to_host
+
+hdec_soon:
+	cmpdi	r4, 0
+	beq	12f
+	addi	r3, r4, VCPU_TB_RMEXIT
+	bl	kvmhv_accumulate_time
+12:	b	kvmhv_do_exit
+
 /******************************************************************************
  *                                                                            *
  *                               Exit code                                    *
@@ -1002,6 +1031,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
 	stw	r12,VCPU_TRAP(r9)
 
+	addi	r3, r9, VCPU_TB_RMINTR
+	mr	r4, r9
+	bl	kvmhv_accumulate_time
+	ld	r5, VCPU_GPR(R5)(r9)
+	ld	r6, VCPU_GPR(R6)(r9)
+	ld	r7, VCPU_GPR(R7)(r9)
+	ld	r8, VCPU_GPR(R8)(r9)
+
 	/* Save HEIR (HV emulation assist reg) in emul_inst
 	   if this is an HEI (HV emulation interrupt, e40) */
 	li	r3,KVM_INST_FETCH_FAILED
@@ -1073,6 +1110,9 @@ guest_exit_cont:		/* r9 = vcpu, r12 = trap, r13 = paca */
 	cmpwi	r12, BOOK3S_INTERRUPT_MACHINE_CHECK
 	beq	machine_check_realmode
 mc_cont:
+	addi	r3, r9, VCPU_TB_RMEXIT
+	mr	r4, r9
+	bl	kvmhv_accumulate_time
 
 	/* Save guest CTRL register, set runlatch to 1 */
 6:	mfspr	r6,SPRN_CTRLF
@@ -1417,7 +1457,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	slbia
 	ptesync
 
-hdec_soon:			/* r12 = trap, r13 = paca */
+kvmhv_do_exit:			/* r12 = trap, r13 = paca */
 	/*
 	 * POWER7/POWER8 guest -> host partition switch code.
 	 * We don't have to lock against tlbies but we do
@@ -1476,7 +1516,7 @@ hdec_soon:			/* r12 = trap, r13 = paca */
 	addi	r6,r6,PACA_SIZE
 	bne	42b
 
-secondary_too_late:
+kvmhv_switch_to_host:
 	/* Secondary threads wait for primary to do partition switch */
 43:	ld	r5,HSTATE_KVM_VCORE(r13)
 	ld	r4,VCORE_KVM(r5)	/* pointer to struct kvm */
@@ -1562,6 +1602,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 1:	addi	r8,r8,16
 	.endr
 
+	/* Finish timing, if we have a vcpu */
+	ld	r4, HSTATE_KVM_VCPU(r13)
+	cmpdi	r4, 0
+	li	r3, 0
+	beq	2f
+	bl	kvmhv_accumulate_time
+2:
 	/* Unset guest mode */
 	li	r0, KVM_GUEST_MODE_NONE
 	stb	r0, HSTATE_IN_GUEST(r13)
@@ -2069,6 +2116,10 @@ _GLOBAL(kvmppc_h_cede)
 	/* save FP state */
 	bl	kvmppc_save_fp
 
+	ld	r4, HSTATE_KVM_VCPU(r13)
+	addi	r3, r4, VCPU_TB_CEDE
+	bl	kvmhv_accumulate_time
+
 	/*
 	 * Take a nap until a decrementer or external or doorbell interrupt
 	 * occurs, with PECE1, PECE0 and PECEDP set in LPCR. Also clear the
@@ -2109,6 +2160,9 @@ kvm_end_cede:
 	/* Woken by external or decrementer interrupt */
 	ld	r1, HSTATE_HOST_R1(r13)
 
+	addi	r3, r4, VCPU_TB_RMINTR
+	bl	kvmhv_accumulate_time
+
 	/* load up FP state */
 	bl	kvmppc_load_fp
 
@@ -2429,3 +2483,48 @@ kvmppc_fix_pmao:
 	mtspr	SPRN_PMC6, r3
 	isync
 	blr
+
+/*
+ * Start timing an activity
+ * r3 = pointer to time accumulation struct, r4 = vcpu
+ */
+kvmhv_start_timing:
+	mftb	r5
+	std	r3, VCPU_CUR_ACTIVITY(r4)
+	std	r5, VCPU_ACTIVITY_START(r4)
+	blr
+
+/*
+ * Accumulate time to one activity and start another.
+ * r3 = pointer to new time accumulation struct, r4 = vcpu
+ */
+kvmhv_accumulate_time:
+	ld	r5, VCPU_CUR_ACTIVITY(r4)
+	ld	r6, VCPU_ACTIVITY_START(r4)
+	std	r3, VCPU_CUR_ACTIVITY(r4)
+	mftb	r7
+	std	r7, VCPU_ACTIVITY_START(r4)
+	cmpdi	r5, 0
+	beqlr
+	subf	r3, r6, r7
+	ld	r8, TAS_SEQCOUNT(r5)
+	cmpdi	r8, 0
+	addi	r8, r8, 1
+	std	r8, TAS_SEQCOUNT(r5)
+	lwsync
+	ld	r7, TAS_TOTAL(r5)
+	add	r7, r7, r3
+	std	r7, TAS_TOTAL(r5)
+	ld	r6, TAS_MIN(r5)
+	ld	r7, TAS_MAX(r5)
+	beq	3f
+	cmpd	r3, r6
+	bge	1f
+3:	std	r3, TAS_MIN(r5)
+1:	cmpd	r3, r7
+	ble	2f
+	std	r3, TAS_MAX(r5)
+2:	lwsync
+	addi	r8, r8, 1
+	std	r8, TAS_SEQCOUNT(r5)
+	blr
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 14/23] KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20  9:39   ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

Previously, if kvmppc_run_core() was running a VCPU that needed a VPA
update (i.e. one of its 3 virtual processor areas needed to be pinned
in memory so the host real mode code can update it on guest entry and
exit), we would drop the vcore lock and do the update there and then.
Future changes will make it inconvenient to drop the lock, so instead
we now remove the VCPU from the list of runnable VCPUs and wake up its
VCPU task.  This will have the effect that the VCPU task will exit
kvmppc_run_vcpu(), go around the do loop in kvmppc_vcpu_run_hv(), and
re-enter kvmppc_run_vcpu(), whereupon it will do the necessary call
to kvmppc_update_vpas() and then rejoin the vcore.

The one complication is that the runner VCPU (whose VCPU task is the
current task) might be one of the ones that gets removed from the
runnable list.  In that case we just return from kvmppc_run_core()
and let the code in kvmppc_run_vcpu() wake up another VCPU task to be
the runner if necessary.

This all means that the VCORE_STARTING state is no longer used, so we
remove it.
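
For illustration only (not part of the patch): the removal happens while
walking the runnable list, which is why the new prepare_threads() in the
diff below uses the _safe list iterator.  A minimal stand-alone sketch of
that unlink-while-iterating idiom, using an ordinary singly-linked list
rather than the kernel's list.h:

#include <stdio.h>

struct node { int remove_me; struct node *next; };

int main(void)
{
	struct node c = { 1, NULL }, b = { 0, &c }, a = { 1, &b };
	struct node *head = &a, **pprev = &head, *cur, *next;

	for (cur = head; cur; cur = next) {
		next = cur->next;	/* the "safe" part: remember the
					 * successor before possibly
					 * unlinking cur */
		if (cur->remove_me)
			*pprev = next;	/* unlink cur, as
					 * kvmppc_remove_runnable() does
					 * for the real list */
		else
			pprev = &cur->next;
	}
	for (cur = head; cur; cur = cur->next)
		printf("kept node, remove_me=%d\n", cur->remove_me);
	return 0;
}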

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h |  5 ++--
 arch/powerpc/kvm/book3s_hv.c        | 56 ++++++++++++++++++++-----------------
 2 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 286c0ce..cee6e55 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -306,9 +306,8 @@ struct kvmppc_vcore {
 /* Values for vcore_state */
 #define VCORE_INACTIVE	0
 #define VCORE_SLEEPING	1
-#define VCORE_STARTING	2
-#define VCORE_RUNNING	3
-#define VCORE_EXITING	4
+#define VCORE_RUNNING	2
+#define VCORE_EXITING	3
 
 /*
  * Struct used to manage memory for a virtual processor area
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8517c33..15598be 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1836,6 +1836,25 @@ static void kvmppc_start_restoring_l2_cache(const struct kvmppc_vcore *vc)
 	mtspr(SPRN_MPPR, mpp_addr | PPC_MPPR_FETCH_WHOLE_TABLE);
 }
 
+static void prepare_threads(struct kvmppc_vcore *vc)
+{
+	struct kvm_vcpu *vcpu, *vnext;
+
+	list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
+				 arch.run_list) {
+		if (signal_pending(vcpu->arch.run_task))
+			vcpu->arch.ret = -EINTR;
+		else if (vcpu->arch.vpa.update_pending ||
+			 vcpu->arch.slb_shadow.update_pending ||
+			 vcpu->arch.dtl.update_pending)
+			vcpu->arch.ret = RESUME_GUEST;
+		else
+			continue;
+		kvmppc_remove_runnable(vc, vcpu);
+		wake_up(&vcpu->arch.cpu_run);
+	}
+}
+
 /*
  * Run a set of guest threads on a physical core.
  * Called with vc->lock held.
@@ -1845,46 +1864,31 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 	struct kvm_vcpu *vcpu, *vnext;
 	long ret;
 	u64 now;
-	int i, need_vpa_update;
+	int i;
 	int srcu_idx;
-	struct kvm_vcpu *vcpus_to_update[threads_per_core];
 
-	/* don't start if any threads have a signal pending */
-	need_vpa_update = 0;
-	list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
-		if (signal_pending(vcpu->arch.run_task))
-			return;
-		if (vcpu->arch.vpa.update_pending ||
-		    vcpu->arch.slb_shadow.update_pending ||
-		    vcpu->arch.dtl.update_pending)
-			vcpus_to_update[need_vpa_update++] = vcpu;
-	}
+	/*
+	 * Remove from the list any threads that have a signal pending
+	 * or need a VPA update done
+	 */
+	prepare_threads(vc);
+
+	/* if the runner is no longer runnable, let the caller pick a new one */
+	if (vc->runner->arch.state != KVMPPC_VCPU_RUNNABLE)
+		return;
 
 	/*
-	 * Initialize *vc, in particular vc->vcore_state, so we can
-	 * drop the vcore lock if necessary.
+	 * Initialize *vc.
 	 */
 	vc->n_woken = 0;
 	vc->nap_count = 0;
 	vc->entry_exit_count = 0;
 	vc->preempt_tb = TB_NIL;
-	vc->vcore_state = VCORE_STARTING;
 	vc->in_guest = 0;
 	vc->napping_threads = 0;
 	vc->conferring_threads = 0;
 
 	/*
-	 * Updating any of the vpas requires calling kvmppc_pin_guest_page,
-	 * which can't be called with any spinlocks held.
-	 */
-	if (need_vpa_update) {
-		spin_unlock(&vc->lock);
-		for (i = 0; i < need_vpa_update; ++i)
-			kvmppc_update_vpas(vcpus_to_update[i]);
-		spin_lock(&vc->lock);
-	}
-
-	/*
 	 * Make sure we are running on primary threads, and that secondary
 	 * threads are offline.  Also check if the number of threads in this
 	 * guest are greater than the current system threads per guest.
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 14/23] KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update
@ 2015-03-20  9:39   ` Paul Mackerras
  0 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

Previously, if kvmppc_run_core() was running a VCPU that needed a VPA
update (i.e. one of its 3 virtual processor areas needed to be pinned
in memory so the host real mode code can update it on guest entry and
exit), we would drop the vcore lock and do the update there and then.
Future changes will make it inconvenient to drop the lock, so instead
we now remove the VCPU from the list of runnable VCPUs and wake up its
VCPU task.  This will have the effect that the VCPU task will exit
kvmppc_run_vcpu(), go around the do loop in kvmppc_vcpu_run_hv(), and
re-enter kvmppc_run_vcpu(), whereupon it will do the necessary call
to kvmppc_update_vpas() and then rejoin the vcore.

The one complication is that the runner VCPU (whose VCPU task is the
current task) might be one of the ones that gets removed from the
runnable list.  In that case we just return from kvmppc_run_core()
and let the code in kvmppc_run_vcpu() wake up another VCPU task to be
the runner if necessary.

This all means that the VCORE_STARTING state is no longer used, so we
remove it.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h |  5 ++--
 arch/powerpc/kvm/book3s_hv.c        | 56 ++++++++++++++++++++-----------------
 2 files changed, 32 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 286c0ce..cee6e55 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -306,9 +306,8 @@ struct kvmppc_vcore {
 /* Values for vcore_state */
 #define VCORE_INACTIVE	0
 #define VCORE_SLEEPING	1
-#define VCORE_STARTING	2
-#define VCORE_RUNNING	3
-#define VCORE_EXITING	4
+#define VCORE_RUNNING	2
+#define VCORE_EXITING	3
 
 /*
  * Struct used to manage memory for a virtual processor area
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8517c33..15598be 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1836,6 +1836,25 @@ static void kvmppc_start_restoring_l2_cache(const struct kvmppc_vcore *vc)
 	mtspr(SPRN_MPPR, mpp_addr | PPC_MPPR_FETCH_WHOLE_TABLE);
 }
 
+static void prepare_threads(struct kvmppc_vcore *vc)
+{
+	struct kvm_vcpu *vcpu, *vnext;
+
+	list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
+				 arch.run_list) {
+		if (signal_pending(vcpu->arch.run_task))
+			vcpu->arch.ret = -EINTR;
+		else if (vcpu->arch.vpa.update_pending ||
+			 vcpu->arch.slb_shadow.update_pending ||
+			 vcpu->arch.dtl.update_pending)
+			vcpu->arch.ret = RESUME_GUEST;
+		else
+			continue;
+		kvmppc_remove_runnable(vc, vcpu);
+		wake_up(&vcpu->arch.cpu_run);
+	}
+}
+
 /*
  * Run a set of guest threads on a physical core.
  * Called with vc->lock held.
@@ -1845,46 +1864,31 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 	struct kvm_vcpu *vcpu, *vnext;
 	long ret;
 	u64 now;
-	int i, need_vpa_update;
+	int i;
 	int srcu_idx;
-	struct kvm_vcpu *vcpus_to_update[threads_per_core];
 
-	/* don't start if any threads have a signal pending */
-	need_vpa_update = 0;
-	list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
-		if (signal_pending(vcpu->arch.run_task))
-			return;
-		if (vcpu->arch.vpa.update_pending ||
-		    vcpu->arch.slb_shadow.update_pending ||
-		    vcpu->arch.dtl.update_pending)
-			vcpus_to_update[need_vpa_update++] = vcpu;
-	}
+	/*
+	 * Remove from the list any threads that have a signal pending
+	 * or need a VPA update done
+	 */
+	prepare_threads(vc);
+
+	/* if the runner is no longer runnable, let the caller pick a new one */
+	if (vc->runner->arch.state != KVMPPC_VCPU_RUNNABLE)
+		return;
 
 	/*
-	 * Initialize *vc, in particular vc->vcore_state, so we can
-	 * drop the vcore lock if necessary.
+	 * Initialize *vc.
 	 */
 	vc->n_woken = 0;
 	vc->nap_count = 0;
 	vc->entry_exit_count = 0;
 	vc->preempt_tb = TB_NIL;
-	vc->vcore_state = VCORE_STARTING;
 	vc->in_guest = 0;
 	vc->napping_threads = 0;
 	vc->conferring_threads = 0;
 
 	/*
-	 * Updating any of the vpas requires calling kvmppc_pin_guest_page,
-	 * which can't be called with any spinlocks held.
-	 */
-	if (need_vpa_update) {
-		spin_unlock(&vc->lock);
-		for (i = 0; i < need_vpa_update; ++i)
-			kvmppc_update_vpas(vcpus_to_update[i]);
-		spin_lock(&vc->lock);
-	}
-
-	/*
 	 * Make sure we are running on primary threads, and that secondary
 	 * threads are offline.  Also check if the number of threads in this
 	 * guest are greater than the current system threads per guest.
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 15/23] KVM: PPC: Book3S HV: Minor cleanups
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20  9:39   ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

* Remove unused kvmppc_vcore::n_busy field.
* Remove setting of RMOR, since it was only used on PPC970 and the
  PPC970 KVM support has been removed.
* Don't use r1 or r2 in setting the runlatch since they are
  conventionally reserved for other things; use r0 instead.
* Streamline the code a little and remove the ext_interrupt_to_host
  label.
* Add some comments about register usage.
* hcall_try_real_mode doesn't need to be global, and can't be
  called from C code anyway.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h     |  2 --
 arch/powerpc/kernel/asm-offsets.c       |  1 -
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 44 ++++++++++++++-------------------
 3 files changed, 19 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index cee6e55..ec4cf37 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -227,7 +227,6 @@ struct kvm_arch {
 	unsigned long host_sdr1;
 	int tlbie_lock;
 	unsigned long lpcr;
-	unsigned long rmor;
 	unsigned long vrma_slb_v;
 	int hpte_setup_done;
 	u32 hpt_order;
@@ -271,7 +270,6 @@ struct kvm_arch {
  */
 struct kvmppc_vcore {
 	int n_runnable;
-	int n_busy;
 	int num_threads;
 	int entry_exit_count;
 	int n_woken;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index ec9f59c..5eda551 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -503,7 +503,6 @@ int main(void)
 	DEFINE(KVM_NEED_FLUSH, offsetof(struct kvm, arch.need_tlb_flush.bits));
 	DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
 	DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
-	DEFINE(KVM_RMOR, offsetof(struct kvm, arch.rmor));
 	DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
 	DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
 	DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index d71ae2f..b2e6718 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -244,9 +244,9 @@ kvm_novcpu_exit:
 kvm_start_guest:
 
 	/* Set runlatch bit the minute you wake up from nap */
-	mfspr	r1, SPRN_CTRLF
-	ori 	r1, r1, 1
-	mtspr	SPRN_CTRLT, r1
+	mfspr	r0, SPRN_CTRLF
+	ori 	r0, r0, 1
+	mtspr	SPRN_CTRLT, r0
 
 	ld	r2,PACATOC(r13)
 
@@ -490,11 +490,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	cmpwi	r0,0
 	beq	20b
 
-	/* Set LPCR and RMOR. */
+	/* Set LPCR. */
 10:	ld	r8,VCORE_LPCR(r5)
 	mtspr	SPRN_LPCR,r8
-	ld	r8,KVM_RMOR(r9)
-	mtspr	SPRN_RMOR,r8
 	isync
 
 	/* Check if HDEC expires soon */
@@ -1065,7 +1063,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	bne	2f
 	mfspr	r3,SPRN_HDEC
 	cmpwi	r3,0
-	bge	ignore_hdec
+	mr	r4,r9
+	bge	fast_guest_return
 2:
 	/* See if this is an hcall we can handle in real mode */
 	cmpwi	r12,BOOK3S_INTERRUPT_SYSCALL
@@ -1073,26 +1072,21 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
 	/* External interrupt ? */
 	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
-	bne+	ext_interrupt_to_host
+	bne+	guest_exit_cont
 
 	/* External interrupt, first check for host_ipi. If this is
 	 * set, we know the host wants us out so let's do it now
 	 */
 	bl	kvmppc_read_intr
 	cmpdi	r3, 0
-	bgt	ext_interrupt_to_host
+	bgt	guest_exit_cont
 
 	/* Check if any CPU is heading out to the host, if so head out too */
 	ld	r5, HSTATE_KVM_VCORE(r13)
 	lwz	r0, VCORE_ENTRY_EXIT(r5)
 	cmpwi	r0, 0x100
-	bge	ext_interrupt_to_host
-
-	/* Return to guest after delivering any pending interrupt */
 	mr	r4, r9
-	b	deliver_guest_interrupt
-
-ext_interrupt_to_host:
+	blt	deliver_guest_interrupt
 
 guest_exit_cont:		/* r9 = vcpu, r12 = trap, r13 = paca */
 	/* Save more register state  */
@@ -1743,8 +1737,10 @@ kvmppc_hisi:
  * Returns to the guest if we handle it, or continues on up to
  * the kernel if we can't (i.e. if we don't have a handler for
  * it, or if the handler returns H_TOO_HARD).
+ *
+ * r5 - r8 contain hcall args,
+ * r9 = vcpu, r10 = pc, r11 = msr, r12 = trap, r13 = paca
  */
-	.globl	hcall_try_real_mode
 hcall_try_real_mode:
 	ld	r3,VCPU_GPR(R3)(r9)
 	andi.	r0,r11,MSR_PR
@@ -2004,10 +2000,6 @@ hcall_real_table:
 	.globl	hcall_real_table_end
 hcall_real_table_end:
 
-ignore_hdec:
-	mr	r4,r9
-	b	fast_guest_return
-
 _GLOBAL(kvmppc_h_set_xdabr)
 	andi.	r0, r5, DABRX_USER | DABRX_KERNEL
 	beq	6f
@@ -2046,7 +2038,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	li	r3, 0
 	blr
 
-_GLOBAL(kvmppc_h_cede)
+_GLOBAL(kvmppc_h_cede)		/* r3 = vcpu pointer, r11 = msr, r13 = paca */
 	ori	r11,r11,MSR_EE
 	std	r11,VCPU_MSR(r3)
 	li	r0,1
@@ -2126,9 +2118,9 @@ _GLOBAL(kvmppc_h_cede)
 	 * runlatch bit before napping.
 	 */
 kvm_do_nap:
-	mfspr	r2, SPRN_CTRLF
-	clrrdi	r2, r2, 1
-	mtspr	SPRN_CTRLT, r2
+	mfspr	r0, SPRN_CTRLF
+	clrrdi	r0, r0, 1
+	mtspr	SPRN_CTRLT, r0
 
 	li	r0,1
 	stb	r0,HSTATE_HWTHREAD_REQ(r13)
@@ -2258,13 +2250,14 @@ machine_check_realmode:
 
 /*
  * Check the reason we woke from nap, and take appropriate action.
- * Returns:
+ * Returns (in r3):
  *	0 if nothing needs to be done
  *	1 if something happened that needs to be handled by the host
  *	-1 if there was a guest wakeup (IPI)
  *
  * Also sets r12 to the interrupt vector for any interrupt that needs
  * to be handled now by the host (0x500 for external interrupt), or zero.
+ * Modifies r0, r6, r7, r8.
  */
 kvmppc_check_wake_reason:
 	mfspr	r6, SPRN_SRR1
@@ -2300,6 +2293,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
  *	0 if no interrupt is pending
  *	1 if an interrupt is pending that needs to be handled by the host
  *	-1 if there was a guest wakeup IPI (which has now been cleared)
+ * Modifies r0, r6, r7, r8, returns value in r3.
  */
 kvmppc_read_intr:
 	/* see if a host IPI is pending */
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 15/23] KVM: PPC: Book3S HV: Minor cleanups
@ 2015-03-20  9:39   ` Paul Mackerras
  0 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

* Remove unused kvmppc_vcore::n_busy field.
* Remove setting of RMOR, since it was only used on PPC970 and the
  PPC970 KVM support has been removed.
* Don't use r1 or r2 in setting the runlatch since they are
  conventionally reserved for other things; use r0 instead.
* Streamline the code a little and remove the ext_interrupt_to_host
  label.
* Add some comments about register usage.
* hcall_try_real_mode doesn't need to be global, and can't be
  called from C code anyway.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h     |  2 --
 arch/powerpc/kernel/asm-offsets.c       |  1 -
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 44 ++++++++++++++-------------------
 3 files changed, 19 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index cee6e55..ec4cf37 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -227,7 +227,6 @@ struct kvm_arch {
 	unsigned long host_sdr1;
 	int tlbie_lock;
 	unsigned long lpcr;
-	unsigned long rmor;
 	unsigned long vrma_slb_v;
 	int hpte_setup_done;
 	u32 hpt_order;
@@ -271,7 +270,6 @@ struct kvm_arch {
  */
 struct kvmppc_vcore {
 	int n_runnable;
-	int n_busy;
 	int num_threads;
 	int entry_exit_count;
 	int n_woken;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index ec9f59c..5eda551 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -503,7 +503,6 @@ int main(void)
 	DEFINE(KVM_NEED_FLUSH, offsetof(struct kvm, arch.need_tlb_flush.bits));
 	DEFINE(KVM_ENABLED_HCALLS, offsetof(struct kvm, arch.enabled_hcalls));
 	DEFINE(KVM_LPCR, offsetof(struct kvm, arch.lpcr));
-	DEFINE(KVM_RMOR, offsetof(struct kvm, arch.rmor));
 	DEFINE(KVM_VRMA_SLB_V, offsetof(struct kvm, arch.vrma_slb_v));
 	DEFINE(VCPU_DSISR, offsetof(struct kvm_vcpu, arch.shregs.dsisr));
 	DEFINE(VCPU_DAR, offsetof(struct kvm_vcpu, arch.shregs.dar));
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index d71ae2f..b2e6718 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -244,9 +244,9 @@ kvm_novcpu_exit:
 kvm_start_guest:
 
 	/* Set runlatch bit the minute you wake up from nap */
-	mfspr	r1, SPRN_CTRLF
-	ori 	r1, r1, 1
-	mtspr	SPRN_CTRLT, r1
+	mfspr	r0, SPRN_CTRLF
+	ori 	r0, r0, 1
+	mtspr	SPRN_CTRLT, r0
 
 	ld	r2,PACATOC(r13)
 
@@ -490,11 +490,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	cmpwi	r0,0
 	beq	20b
 
-	/* Set LPCR and RMOR. */
+	/* Set LPCR. */
 10:	ld	r8,VCORE_LPCR(r5)
 	mtspr	SPRN_LPCR,r8
-	ld	r8,KVM_RMOR(r9)
-	mtspr	SPRN_RMOR,r8
 	isync
 
 	/* Check if HDEC expires soon */
@@ -1065,7 +1063,8 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	bne	2f
 	mfspr	r3,SPRN_HDEC
 	cmpwi	r3,0
-	bge	ignore_hdec
+	mr	r4,r9
+	bge	fast_guest_return
 2:
 	/* See if this is an hcall we can handle in real mode */
 	cmpwi	r12,BOOK3S_INTERRUPT_SYSCALL
@@ -1073,26 +1072,21 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 
 	/* External interrupt ? */
 	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
-	bne+	ext_interrupt_to_host
+	bne+	guest_exit_cont
 
 	/* External interrupt, first check for host_ipi. If this is
 	 * set, we know the host wants us out so let's do it now
 	 */
 	bl	kvmppc_read_intr
 	cmpdi	r3, 0
-	bgt	ext_interrupt_to_host
+	bgt	guest_exit_cont
 
 	/* Check if any CPU is heading out to the host, if so head out too */
 	ld	r5, HSTATE_KVM_VCORE(r13)
 	lwz	r0, VCORE_ENTRY_EXIT(r5)
 	cmpwi	r0, 0x100
-	bge	ext_interrupt_to_host
-
-	/* Return to guest after delivering any pending interrupt */
 	mr	r4, r9
-	b	deliver_guest_interrupt
-
-ext_interrupt_to_host:
+	blt	deliver_guest_interrupt
 
 guest_exit_cont:		/* r9 = vcpu, r12 = trap, r13 = paca */
 	/* Save more register state  */
@@ -1743,8 +1737,10 @@ kvmppc_hisi:
  * Returns to the guest if we handle it, or continues on up to
  * the kernel if we can't (i.e. if we don't have a handler for
  * it, or if the handler returns H_TOO_HARD).
+ *
+ * r5 - r8 contain hcall args,
+ * r9 = vcpu, r10 = pc, r11 = msr, r12 = trap, r13 = paca
  */
-	.globl	hcall_try_real_mode
 hcall_try_real_mode:
 	ld	r3,VCPU_GPR(R3)(r9)
 	andi.	r0,r11,MSR_PR
@@ -2004,10 +2000,6 @@ hcall_real_table:
 	.globl	hcall_real_table_end
 hcall_real_table_end:
 
-ignore_hdec:
-	mr	r4,r9
-	b	fast_guest_return
-
 _GLOBAL(kvmppc_h_set_xdabr)
 	andi.	r0, r5, DABRX_USER | DABRX_KERNEL
 	beq	6f
@@ -2046,7 +2038,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	li	r3, 0
 	blr
 
-_GLOBAL(kvmppc_h_cede)
+_GLOBAL(kvmppc_h_cede)		/* r3 = vcpu pointer, r11 = msr, r13 = paca */
 	ori	r11,r11,MSR_EE
 	std	r11,VCPU_MSR(r3)
 	li	r0,1
@@ -2126,9 +2118,9 @@ _GLOBAL(kvmppc_h_cede)
 	 * runlatch bit before napping.
 	 */
 kvm_do_nap:
-	mfspr	r2, SPRN_CTRLF
-	clrrdi	r2, r2, 1
-	mtspr	SPRN_CTRLT, r2
+	mfspr	r0, SPRN_CTRLF
+	clrrdi	r0, r0, 1
+	mtspr	SPRN_CTRLT, r0
 
 	li	r0,1
 	stb	r0,HSTATE_HWTHREAD_REQ(r13)
@@ -2258,13 +2250,14 @@ machine_check_realmode:
 
 /*
  * Check the reason we woke from nap, and take appropriate action.
- * Returns:
+ * Returns (in r3):
  *	0 if nothing needs to be done
  *	1 if something happened that needs to be handled by the host
  *	-1 if there was a guest wakeup (IPI)
  *
  * Also sets r12 to the interrupt vector for any interrupt that needs
  * to be handled now by the host (0x500 for external interrupt), or zero.
+ * Modifies r0, r6, r7, r8.
  */
 kvmppc_check_wake_reason:
 	mfspr	r6, SPRN_SRR1
@@ -2300,6 +2293,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
  *	0 if no interrupt is pending
  *	1 if an interrupt is pending that needs to be handled by the host
  *	-1 if there was a guest wakeup IPI (which has now been cleared)
+ * Modifies r0, r6, r7, r8, returns value in r3.
  */
 kvmppc_read_intr:
 	/* see if a host IPI is pending */
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 16/23] KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20  9:39   ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

Rather than calling cond_resched() in kvmppc_run_core() before doing
the post-processing for the vcpus that we have just run (that is,
calling kvmppc_handle_exit_hv(), kvmppc_set_timer(), etc.), we now do
that post-processing before calling cond_resched(), and that post-
processing is moved out into its own function, post_guest_process().

The reschedule point is now in kvmppc_run_vcpu() and we define a new
vcore state, VCORE_PREEMPT, to indicate that the vcore's runner
task is runnable but not running.  (Doing the reschedule with the
vcore in VCORE_INACTIVE state would be bad because there are potentially
other vcpus waiting for the runner in kvmppc_wait_for_exec() which
then wouldn't get woken up.)

Also, we make use of the handy cond_resched_lock() function, which
unlocks and relocks vc->lock for us around the reschedule.
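
As a rough userspace analogue (my own sketch, not kernel code) of the
pattern this introduces -- publish a "preempted" state, yield while the
lock is dropped, then retake the lock and restore the state -- here a
pthread mutex and sched_yield() stand in for vc->lock and
cond_resched_lock(); build with -pthread:

#include <pthread.h>
#include <sched.h>
#include <stdio.h>

enum vcore_state { VCORE_INACTIVE, VCORE_PREEMPT, VCORE_RUNNING };

struct vcore {
	pthread_mutex_t lock;
	enum vcore_state state;
};

/* caller holds vc->lock, as kvmppc_run_vcpu() holds it at this point */
static void yield_vcore(struct vcore *vc)
{
	vc->state = VCORE_PREEMPT;
	pthread_mutex_unlock(&vc->lock);
	sched_yield();			/* stands in for the reschedule done
					 * inside cond_resched_lock() */
	pthread_mutex_lock(&vc->lock);
	vc->state = VCORE_INACTIVE;
}

int main(void)
{
	struct vcore vc = { PTHREAD_MUTEX_INITIALIZER, VCORE_INACTIVE };

	pthread_mutex_lock(&vc.lock);
	yield_vcore(&vc);
	printf("vcore state after yield: %d\n", vc.state);
	pthread_mutex_unlock(&vc.lock);
	return 0;
}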

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h |  5 +-
 arch/powerpc/kvm/book3s_hv.c        | 92 +++++++++++++++++++++----------------
 2 files changed, 55 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index ec4cf37..7b327e5 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -304,8 +304,9 @@ struct kvmppc_vcore {
 /* Values for vcore_state */
 #define VCORE_INACTIVE	0
 #define VCORE_SLEEPING	1
-#define VCORE_RUNNING	2
-#define VCORE_EXITING	3
+#define VCORE_PREEMPT	2
+#define VCORE_RUNNING	3
+#define VCORE_EXITING	4
 
 /*
  * Struct used to manage memory for a virtual processor area
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 15598be..bd16b03 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1855,15 +1855,50 @@ static void prepare_threads(struct kvmppc_vcore *vc)
 	}
 }
 
+static void post_guest_process(struct kvmppc_vcore *vc)
+{
+	u64 now;
+	long ret;
+	struct kvm_vcpu *vcpu, *vnext;
+
+	now = get_tb();
+	list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
+				 arch.run_list) {
+		/* cancel pending dec exception if dec is positive */
+		if (now < vcpu->arch.dec_expires &&
+		    kvmppc_core_pending_dec(vcpu))
+			kvmppc_core_dequeue_dec(vcpu);
+
+		trace_kvm_guest_exit(vcpu);
+
+		ret = RESUME_GUEST;
+		if (vcpu->arch.trap)
+			ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu,
+						    vcpu->arch.run_task);
+
+		vcpu->arch.ret = ret;
+		vcpu->arch.trap = 0;
+
+		if (vcpu->arch.ceded) {
+			if (!is_kvmppc_resume_guest(ret))
+				kvmppc_end_cede(vcpu);
+			else
+				kvmppc_set_timer(vcpu);
+		}
+		if (!is_kvmppc_resume_guest(vcpu->arch.ret)) {
+			kvmppc_remove_runnable(vc, vcpu);
+			wake_up(&vcpu->arch.cpu_run);
+		}
+	}
+}
+
 /*
  * Run a set of guest threads on a physical core.
  * Called with vc->lock held.
  */
 static void kvmppc_run_core(struct kvmppc_vcore *vc)
 {
-	struct kvm_vcpu *vcpu, *vnext;
-	long ret;
-	u64 now;
+	struct kvm_vcpu *vcpu;
 	int i;
 	int srcu_idx;
 
@@ -1895,8 +1930,11 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 	 */
 	if ((threads_per_core > 1) &&
 	    ((vc->num_threads > threads_per_subcore) || !on_primary_thread())) {
-		list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
+		list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
 			vcpu->arch.ret = -EBUSY;
+			kvmppc_remove_runnable(vc, vcpu);
+			wake_up(&vcpu->arch.cpu_run);
+		}
 		goto out;
 	}
 
@@ -1952,44 +1990,12 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 	kvm_guest_exit();
 
 	preempt_enable();
-	cond_resched();
 
 	spin_lock(&vc->lock);
-	now = get_tb();
-	list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
-		/* cancel pending dec exception if dec is positive */
-		if (now < vcpu->arch.dec_expires &&
-		    kvmppc_core_pending_dec(vcpu))
-			kvmppc_core_dequeue_dec(vcpu);
-
-		trace_kvm_guest_exit(vcpu);
-
-		ret = RESUME_GUEST;
-		if (vcpu->arch.trap)
-			ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu,
-						    vcpu->arch.run_task);
-
-		vcpu->arch.ret = ret;
-		vcpu->arch.trap = 0;
-
-		if (vcpu->arch.ceded) {
-			if (!is_kvmppc_resume_guest(ret))
-				kvmppc_end_cede(vcpu);
-			else
-				kvmppc_set_timer(vcpu);
-		}
-	}
+	post_guest_process(vc);
 
  out:
 	vc->vcore_state = VCORE_INACTIVE;
-	list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
-				 arch.run_list) {
-		if (!is_kvmppc_resume_guest(vcpu->arch.ret)) {
-			kvmppc_remove_runnable(vc, vcpu);
-			wake_up(&vcpu->arch.cpu_run);
-		}
-	}
-
 	trace_kvmppc_run_core(vc, 1);
 }
 
@@ -2111,7 +2117,6 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 		}
 		if (!vc->n_runnable || vcpu->arch.state != KVMPPC_VCPU_RUNNABLE)
 			break;
-		vc->runner = vcpu;
 		n_ceded = 0;
 		list_for_each_entry(v, &vc->runnable_threads, arch.run_list) {
 			if (!v->arch.pending_exceptions)
@@ -2119,10 +2124,17 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 			else
 				v->arch.ceded = 0;
 		}
-		if (n_ceded == vc->n_runnable)
+		vc->runner = vcpu;
+		if (n_ceded == vc->n_runnable) {
 			kvmppc_vcore_blocked(vc);
-		else
+		} else if (should_resched()) {
+			vc->vcore_state = VCORE_PREEMPT;
+			/* Let something else run */
+			cond_resched_lock(&vc->lock);
+			vc->vcore_state = VCORE_INACTIVE;
+		} else {
 			kvmppc_run_core(vc);
+		}
 		vc->runner = NULL;
 	}
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 16/23] KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu
@ 2015-03-20  9:39   ` Paul Mackerras
  0 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

Rather than calling cond_resched() in kvmppc_run_core() before doing
the post-processing for the vcpus that we have just run (that is,
calling kvmppc_handle_exit_hv(), kvmppc_set_timer(), etc.), we now do
that post-processing before calling cond_resched(), and that post-
processing is moved out into its own function, post_guest_process().

The reschedule point is now in kvmppc_run_vcpu() and we define a new
vcore state, VCORE_PREEMPT, to indicate that the vcore's runner
task is runnable but not running.  (Doing the reschedule with the
vcore in VCORE_INACTIVE state would be bad because there are potentially
other vcpus waiting for the runner in kvmppc_wait_for_exec() which
then wouldn't get woken up.)

Also, we make use of the handy cond_resched_lock() function, which
unlocks and relocks vc->lock for us around the reschedule.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h |  5 +-
 arch/powerpc/kvm/book3s_hv.c        | 92 +++++++++++++++++++++----------------
 2 files changed, 55 insertions(+), 42 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index ec4cf37..7b327e5 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -304,8 +304,9 @@ struct kvmppc_vcore {
 /* Values for vcore_state */
 #define VCORE_INACTIVE	0
 #define VCORE_SLEEPING	1
-#define VCORE_RUNNING	2
-#define VCORE_EXITING	3
+#define VCORE_PREEMPT	2
+#define VCORE_RUNNING	3
+#define VCORE_EXITING	4
 
 /*
  * Struct used to manage memory for a virtual processor area
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 15598be..bd16b03 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1855,15 +1855,50 @@ static void prepare_threads(struct kvmppc_vcore *vc)
 	}
 }
 
+static void post_guest_process(struct kvmppc_vcore *vc)
+{
+	u64 now;
+	long ret;
+	struct kvm_vcpu *vcpu, *vnext;
+
+	now = get_tb();
+	list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
+				 arch.run_list) {
+		/* cancel pending dec exception if dec is positive */
+		if (now < vcpu->arch.dec_expires &&
+		    kvmppc_core_pending_dec(vcpu))
+			kvmppc_core_dequeue_dec(vcpu);
+
+		trace_kvm_guest_exit(vcpu);
+
+		ret = RESUME_GUEST;
+		if (vcpu->arch.trap)
+			ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu,
+						    vcpu->arch.run_task);
+
+		vcpu->arch.ret = ret;
+		vcpu->arch.trap = 0;
+
+		if (vcpu->arch.ceded) {
+			if (!is_kvmppc_resume_guest(ret))
+				kvmppc_end_cede(vcpu);
+			else
+				kvmppc_set_timer(vcpu);
+		}
+		if (!is_kvmppc_resume_guest(vcpu->arch.ret)) {
+			kvmppc_remove_runnable(vc, vcpu);
+			wake_up(&vcpu->arch.cpu_run);
+		}
+	}
+}
+
 /*
  * Run a set of guest threads on a physical core.
  * Called with vc->lock held.
  */
 static void kvmppc_run_core(struct kvmppc_vcore *vc)
 {
-	struct kvm_vcpu *vcpu, *vnext;
-	long ret;
-	u64 now;
+	struct kvm_vcpu *vcpu;
 	int i;
 	int srcu_idx;
 
@@ -1895,8 +1930,11 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 	 */
 	if ((threads_per_core > 1) &&
 	    ((vc->num_threads > threads_per_subcore) || !on_primary_thread())) {
-		list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
+		list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
 			vcpu->arch.ret = -EBUSY;
+			kvmppc_remove_runnable(vc, vcpu);
+			wake_up(&vcpu->arch.cpu_run);
+		}
 		goto out;
 	}
 
@@ -1952,44 +1990,12 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 	kvm_guest_exit();
 
 	preempt_enable();
-	cond_resched();
 
 	spin_lock(&vc->lock);
-	now = get_tb();
-	list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
-		/* cancel pending dec exception if dec is positive */
-		if (now < vcpu->arch.dec_expires &&
-		    kvmppc_core_pending_dec(vcpu))
-			kvmppc_core_dequeue_dec(vcpu);
-
-		trace_kvm_guest_exit(vcpu);
-
-		ret = RESUME_GUEST;
-		if (vcpu->arch.trap)
-			ret = kvmppc_handle_exit_hv(vcpu->arch.kvm_run, vcpu,
-						    vcpu->arch.run_task);
-
-		vcpu->arch.ret = ret;
-		vcpu->arch.trap = 0;
-
-		if (vcpu->arch.ceded) {
-			if (!is_kvmppc_resume_guest(ret))
-				kvmppc_end_cede(vcpu);
-			else
-				kvmppc_set_timer(vcpu);
-		}
-	}
+	post_guest_process(vc);
 
  out:
 	vc->vcore_state = VCORE_INACTIVE;
-	list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
-				 arch.run_list) {
-		if (!is_kvmppc_resume_guest(vcpu->arch.ret)) {
-			kvmppc_remove_runnable(vc, vcpu);
-			wake_up(&vcpu->arch.cpu_run);
-		}
-	}
-
 	trace_kvmppc_run_core(vc, 1);
 }
 
@@ -2111,7 +2117,6 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 		}
 		if (!vc->n_runnable || vcpu->arch.state != KVMPPC_VCPU_RUNNABLE)
 			break;
-		vc->runner = vcpu;
 		n_ceded = 0;
 		list_for_each_entry(v, &vc->runnable_threads, arch.run_list) {
 			if (!v->arch.pending_exceptions)
@@ -2119,10 +2124,17 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 			else
 				v->arch.ceded = 0;
 		}
-		if (n_ceded == vc->n_runnable)
+		vc->runner = vcpu;
+		if (n_ceded == vc->n_runnable) {
 			kvmppc_vcore_blocked(vc);
-		else
+		} else if (should_resched()) {
+			vc->vcore_state = VCORE_PREEMPT;
+			/* Let something else run */
+			cond_resched_lock(&vc->lock);
+			vc->vcore_state = VCORE_INACTIVE;
+		} else {
 			kvmppc_run_core(vc);
+		}
 		vc->runner = NULL;
 	}
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 17/23] KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20  9:39   ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

We can tell when a secondary thread has finished running a guest by
the fact that it clears its kvm_hstate.kvm_vcpu pointer, so there
is no real need for the nap_count field in the kvmppc_vcore struct.
This changes kvmppc_wait_for_nap to poll the kvm_hstate.kvm_vcpu
pointers of the secondary threads rather than polling vc->nap_count.
Besides reducing the size of the kvmppc_vcore struct by 8 bytes,
this also means that we can tell which secondary threads have got
stuck and thus print a more informative error message.
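
Here is a userspace sketch (mine, not the kernel code) of the
publish/consume protocol that makes this polling safe: the primary fills
in the per-thread state and then publishes the vcpu pointer last
(smp_wmb() in the patch), the secondary consumes the pointer before using
those fields (the added lwsync), and clearing the pointer again is what
kvmppc_wait_for_nap() now polls for.  C11 atomics and pthreads stand in
for the paca fields and the real barriers; build with -pthread:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

struct vcpu { int id; };

struct hstate {
	int ptid;				/* written before publishing */
	_Atomic(struct vcpu *) kvm_vcpu;	/* published last, cleared when done */
};

static struct hstate hs;

static void *secondary_thread(void *arg)
{
	struct vcpu *v;

	(void)arg;
	/* acquire pairs with the release store in main() (lwsync/smp_wmb) */
	while (!(v = atomic_load_explicit(&hs.kvm_vcpu, memory_order_acquire)))
		;
	printf("secondary: running vcpu %d as ptid %d\n", v->id, hs.ptid);

	/* done: clear the pointer; this is what kvmppc_wait_for_nap() polls */
	atomic_store_explicit(&hs.kvm_vcpu, NULL, memory_order_release);
	return NULL;
}

int main(void)
{
	static struct vcpu v = { .id = 3 };
	pthread_t t;

	pthread_create(&t, NULL, secondary_thread, NULL);

	hs.ptid = 1;		/* set up the hstate fields first ... */
	atomic_store_explicit(&hs.kvm_vcpu, &v, memory_order_release);

	/* ... then wait for the secondary to finish, as kvmppc_wait_for_nap()
	 * does by polling the kvm_vcpu pointers of the secondary threads */
	while (atomic_load_explicit(&hs.kvm_vcpu, memory_order_acquire))
		;
	pthread_join(t, NULL);
	return 0;
}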

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h     |  2 --
 arch/powerpc/kernel/asm-offsets.c       |  1 -
 arch/powerpc/kvm/book3s_hv.c            | 47 +++++++++++++++++++--------------
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 19 +++++--------
 4 files changed, 34 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 7b327e5..f6d4232 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -272,8 +272,6 @@ struct kvmppc_vcore {
 	int n_runnable;
 	int num_threads;
 	int entry_exit_count;
-	int n_woken;
-	int nap_count;
 	int napping_threads;
 	int first_vcpuid;
 	u16 pcpu;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 5eda551..fa7b57d 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -561,7 +561,6 @@ int main(void)
 	DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort));
 	DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1));
 	DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_count));
-	DEFINE(VCORE_NAP_COUNT, offsetof(struct kvmppc_vcore, nap_count));
 	DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
 	DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, napping_threads));
 	DEFINE(VCORE_KVM, offsetof(struct kvmppc_vcore, kvm));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index bd16b03..03a8bb4 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1702,8 +1702,10 @@ static int kvmppc_grab_hwthread(int cpu)
 	tpaca = &paca[cpu];
 
 	/* Ensure the thread won't go into the kernel if it wakes */
-	tpaca->kvm_hstate.hwthread_req = 1;
 	tpaca->kvm_hstate.kvm_vcpu = NULL;
+	tpaca->kvm_hstate.napping = 0;
+	smp_wmb();
+	tpaca->kvm_hstate.hwthread_req = 1;
 
 	/*
 	 * If the thread is already executing in the kernel (e.g. handling
@@ -1746,35 +1748,43 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
 	}
 	cpu = vc->pcpu + vcpu->arch.ptid;
 	tpaca = &paca[cpu];
-	tpaca->kvm_hstate.kvm_vcpu = vcpu;
 	tpaca->kvm_hstate.kvm_vcore = vc;
 	tpaca->kvm_hstate.ptid = vcpu->arch.ptid;
 	vcpu->cpu = vc->pcpu;
+	/* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */
 	smp_wmb();
+	tpaca->kvm_hstate.kvm_vcpu = vcpu;
 #if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
-	if (cpu != smp_processor_id()) {
+	if (cpu != smp_processor_id())
 		xics_wake_cpu(cpu);
-		if (vcpu->arch.ptid)
-			++vc->n_woken;
-	}
 #endif
 }
 
-static void kvmppc_wait_for_nap(struct kvmppc_vcore *vc)
+static void kvmppc_wait_for_nap(void)
 {
-	int i;
+	int cpu = smp_processor_id();
+	int i, loops;
 
-	HMT_low();
-	i = 0;
-	while (vc->nap_count < vc->n_woken) {
-		if (++i >= 1000000) {
-			pr_err("kvmppc_wait_for_nap timeout %d %d\n",
-			       vc->nap_count, vc->n_woken);
-			break;
+	for (loops = 0; loops < 1000000; ++loops) {
+		/*
+		 * Check if all threads are finished.
+		 * We set the vcpu pointer when starting a thread
+		 * and the thread clears it when finished, so we look
+		 * for any threads that still have a non-NULL vcpu ptr.
+		 */
+		for (i = 1; i < threads_per_subcore; ++i)
+			if (paca[cpu + i].kvm_hstate.kvm_vcpu)
+				break;
+		if (i == threads_per_subcore) {
+			HMT_medium();
+			return;
 		}
-		cpu_relax();
+		HMT_low();
 	}
 	HMT_medium();
+	for (i = 1; i < threads_per_subcore; ++i)
+		if (paca[cpu + i].kvm_hstate.kvm_vcpu)
+			pr_err("KVM: CPU %d seems to be stuck\n", cpu + i);
 }
 
 /*
@@ -1915,8 +1925,6 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 	/*
 	 * Initialize *vc.
 	 */
-	vc->n_woken = 0;
-	vc->nap_count = 0;
 	vc->entry_exit_count = 0;
 	vc->preempt_tb = TB_NIL;
 	vc->in_guest = 0;
@@ -1975,8 +1983,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 	list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
 		vcpu->cpu = -1;
 	/* wait for secondary threads to finish writing their state to memory */
-	if (vc->nap_count < vc->n_woken)
-		kvmppc_wait_for_nap(vc);
+	kvmppc_wait_for_nap();
 	for (i = 0; i < threads_per_subcore; ++i)
 		kvmppc_release_hwthread(vc->pcpu + i);
 	/* prevent other vcpu threads from doing kvmppc_start_thread() now */
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b2e6718..c3b148d 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -291,26 +291,21 @@ kvm_secondary_got_guest:
 	ld	r6, PACA_DSCR(r13)
 	std	r6, HSTATE_DSCR(r13)
 
+	/* Order load of vcore, ptid etc. after load of vcpu */
+	lwsync
 	bl	kvmppc_hv_entry
 
 	/* Back from the guest, go back to nap */
 	/* Clear our vcpu pointer so we don't come back in early */
 	li	r0, 0
-	std	r0, HSTATE_KVM_VCPU(r13)
 	/*
-	 * Make sure we clear HSTATE_KVM_VCPU(r13) before incrementing
-	 * the nap_count, because once the increment to nap_count is
-	 * visible we could be given another vcpu.
+	 * Once we clear HSTATE_KVM_VCPU(r13), the code in
+	 * kvmppc_run_core() is going to assume that all our vcpu
+	 * state is visible in memory.  This lwsync makes sure
+	 * that that is true.
 	 */
 	lwsync
-
-	/* increment the nap count and then go to nap mode */
-	ld	r4, HSTATE_KVM_VCORE(r13)
-	addi	r4, r4, VCORE_NAP_COUNT
-51:	lwarx	r3, 0, r4
-	addi	r3, r3, 1
-	stwcx.	r3, 0, r4
-	bne	51b
+	std	r0, HSTATE_KVM_VCPU(r13)
 
 /*
  * At this point we have finished executing in the guest.
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 17/23] KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken
@ 2015-03-20  9:39   ` Paul Mackerras
  0 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

We can tell when a secondary thread has finished running a guest by
the fact that it clears its kvm_hstate.kvm_vcpu pointer, so there
is no real need for the nap_count field in the kvmppc_vcore struct.
This changes kvmppc_wait_for_nap to poll the kvm_hstate.kvm_vcpu
pointers of the secondary threads rather than polling vc->nap_count.
Besides reducing the size of the kvmppc_vcore struct by 8 bytes,
this also means that we can tell which secondary threads have got
stuck and thus print a more informative error message.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h     |  2 --
 arch/powerpc/kernel/asm-offsets.c       |  1 -
 arch/powerpc/kvm/book3s_hv.c            | 47 +++++++++++++++++++--------------
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 19 +++++--------
 4 files changed, 34 insertions(+), 35 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 7b327e5..f6d4232 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -272,8 +272,6 @@ struct kvmppc_vcore {
 	int n_runnable;
 	int num_threads;
 	int entry_exit_count;
-	int n_woken;
-	int nap_count;
 	int napping_threads;
 	int first_vcpuid;
 	u16 pcpu;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 5eda551..fa7b57d 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -561,7 +561,6 @@ int main(void)
 	DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort));
 	DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1));
 	DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_count));
-	DEFINE(VCORE_NAP_COUNT, offsetof(struct kvmppc_vcore, nap_count));
 	DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
 	DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, napping_threads));
 	DEFINE(VCORE_KVM, offsetof(struct kvmppc_vcore, kvm));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index bd16b03..03a8bb4 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1702,8 +1702,10 @@ static int kvmppc_grab_hwthread(int cpu)
 	tpaca = &paca[cpu];
 
 	/* Ensure the thread won't go into the kernel if it wakes */
-	tpaca->kvm_hstate.hwthread_req = 1;
 	tpaca->kvm_hstate.kvm_vcpu = NULL;
+	tpaca->kvm_hstate.napping = 0;
+	smp_wmb();
+	tpaca->kvm_hstate.hwthread_req = 1;
 
 	/*
 	 * If the thread is already executing in the kernel (e.g. handling
@@ -1746,35 +1748,43 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
 	}
 	cpu = vc->pcpu + vcpu->arch.ptid;
 	tpaca = &paca[cpu];
-	tpaca->kvm_hstate.kvm_vcpu = vcpu;
 	tpaca->kvm_hstate.kvm_vcore = vc;
 	tpaca->kvm_hstate.ptid = vcpu->arch.ptid;
 	vcpu->cpu = vc->pcpu;
+	/* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */
 	smp_wmb();
+	tpaca->kvm_hstate.kvm_vcpu = vcpu;
 #if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
-	if (cpu != smp_processor_id()) {
+	if (cpu != smp_processor_id())
 		xics_wake_cpu(cpu);
-		if (vcpu->arch.ptid)
-			++vc->n_woken;
-	}
 #endif
 }
 
-static void kvmppc_wait_for_nap(struct kvmppc_vcore *vc)
+static void kvmppc_wait_for_nap(void)
 {
-	int i;
+	int cpu = smp_processor_id();
+	int i, loops;
 
-	HMT_low();
-	i = 0;
-	while (vc->nap_count < vc->n_woken) {
-		if (++i >= 1000000) {
-			pr_err("kvmppc_wait_for_nap timeout %d %d\n",
-			       vc->nap_count, vc->n_woken);
-			break;
+	for (loops = 0; loops < 1000000; ++loops) {
+		/*
+		 * Check if all threads are finished.
+		 * We set the vcpu pointer when starting a thread
+		 * and the thread clears it when finished, so we look
+		 * for any threads that still have a non-NULL vcpu ptr.
+		 */
+		for (i = 1; i < threads_per_subcore; ++i)
+			if (paca[cpu + i].kvm_hstate.kvm_vcpu)
+				break;
+		if (i == threads_per_subcore) {
+			HMT_medium();
+			return;
 		}
-		cpu_relax();
+		HMT_low();
 	}
 	HMT_medium();
+	for (i = 1; i < threads_per_subcore; ++i)
+		if (paca[cpu + i].kvm_hstate.kvm_vcpu)
+			pr_err("KVM: CPU %d seems to be stuck\n", cpu + i);
 }
 
 /*
@@ -1915,8 +1925,6 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 	/*
 	 * Initialize *vc.
 	 */
-	vc->n_woken = 0;
-	vc->nap_count = 0;
 	vc->entry_exit_count = 0;
 	vc->preempt_tb = TB_NIL;
 	vc->in_guest = 0;
@@ -1975,8 +1983,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 	list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
 		vcpu->cpu = -1;
 	/* wait for secondary threads to finish writing their state to memory */
-	if (vc->nap_count < vc->n_woken)
-		kvmppc_wait_for_nap(vc);
+	kvmppc_wait_for_nap();
 	for (i = 0; i < threads_per_subcore; ++i)
 		kvmppc_release_hwthread(vc->pcpu + i);
 	/* prevent other vcpu threads from doing kvmppc_start_thread() now */
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b2e6718..c3b148d 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -291,26 +291,21 @@ kvm_secondary_got_guest:
 	ld	r6, PACA_DSCR(r13)
 	std	r6, HSTATE_DSCR(r13)
 
+	/* Order load of vcore, ptid etc. after load of vcpu */
+	lwsync
 	bl	kvmppc_hv_entry
 
 	/* Back from the guest, go back to nap */
 	/* Clear our vcpu pointer so we don't come back in early */
 	li	r0, 0
-	std	r0, HSTATE_KVM_VCPU(r13)
 	/*
-	 * Make sure we clear HSTATE_KVM_VCPU(r13) before incrementing
-	 * the nap_count, because once the increment to nap_count is
-	 * visible we could be given another vcpu.
+	 * Once we clear HSTATE_KVM_VCPU(r13), the code in
+	 * kvmppc_run_core() is going to assume that all our vcpu
+	 * state is visible in memory.  This lwsync makes sure
+	 * that that is true.
 	 */
 	lwsync
-
-	/* increment the nap count and then go to nap mode */
-	ld	r4, HSTATE_KVM_VCORE(r13)
-	addi	r4, r4, VCORE_NAP_COUNT
-51:	lwarx	r3, 0, r4
-	addi	r3, r3, 1
-	stwcx.	r3, 0, r4
-	bne	51b
+	std	r0, HSTATE_KVM_VCPU(r13)
 
 /*
  * At this point we have finished executing in the guest.
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 18/23] KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20  9:39   ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

When running a multi-threaded guest and vcpu 0 in a virtual core
is not running in the guest (i.e. it is busy elsewhere in the host),
thread 0 of the physical core will switch the MMU to the guest and
then go to nap mode in the code at kvm_do_nap.  If the guest sends
an IPI to thread 0 using the msgsndp instruction, that will wake
up thread 0 and cause all the threads in the guest to exit to the
host unnecessarily.  To avoid the unnecessary exit, this arranges
for the PECEDP bit to be cleared in this situation.  When napping
due to a H_CEDE from the guest, we still set PECEDP so that the
thread will wake up on an IPI sent using msgsndp.
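
For readers less familiar with rlwimi: the "rlwimi r5, r3, 0, LPCR_PECEDP"
in the diff below inserts just the PECEDP bit of r3 into r5, leaving the
rest of LPCR untouched, so r3 = 0 clears the bit and r3 = LPCR_PECEDP sets
it.  A small C sketch of that bit-insert follows; the mask value is made
up for illustration only, the real LPCR_PECEDP comes from the kernel
headers:

#include <stdint.h>
#include <stdio.h>

#define PECEDP_MASK	0x10000000u	/* placeholder value, illustration only */

/* rlwimi rt, rs, 0, mask: rt = (rt & ~mask) | (rs & mask) */
static uint32_t rlwimi0(uint32_t rt, uint32_t rs, uint32_t mask)
{
	return (rt & ~mask) | (rs & mask);
}

int main(void)
{
	uint32_t lpcr = 0x0000300fu;	/* made-up LPCR contents */

	/* H_CEDE path: r3 = PECEDP, so the wake-on-doorbell bit is set */
	printf("cede:    %#010x\n", rlwimi0(lpcr, PECEDP_MASK, PECEDP_MASK));

	/* no-vcpu path: r3 = 0, so the bit is cleared */
	printf("no vcpu: %#010x\n", rlwimi0(lpcr, 0, PECEDP_MASK));
	return 0;
}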

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index c3b148d..8afc8a8 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -191,6 +191,7 @@ kvmppc_primary_no_guest:
 	li	r3, NAPPING_NOVCPU
 	stb	r3, HSTATE_NAPPING(r13)
 
+	li	r3, 0		/* Don't wake on privileged (OS) doorbell */
 	b	kvm_do_nap
 
 kvm_novcpu_wakeup:
@@ -2107,10 +2108,13 @@ _GLOBAL(kvmppc_h_cede)		/* r3 = vcpu pointer, r11 = msr, r13 = paca */
 	addi	r3, r4, VCPU_TB_CEDE
 	bl	kvmhv_accumulate_time
 
+	lis	r3, LPCR_PECEDP@h	/* Do wake on privileged doorbell */
+
 	/*
 	 * Take a nap until a decrementer or external or doorbell interrupt
-	 * occurs, with PECE1, PECE0 and PECEDP set in LPCR. Also clear the
-	 * runlatch bit before napping.
+	 * occurs, with PECE1 and PECE0 set in LPCR.
+	 * On POWER8, if we are ceding, also set PECEDP.
+	 * Also clear the runlatch bit before napping.
 	 */
 kvm_do_nap:
 	mfspr	r0, SPRN_CTRLF
@@ -2122,7 +2126,7 @@ kvm_do_nap:
 	mfspr	r5,SPRN_LPCR
 	ori	r5,r5,LPCR_PECE0 | LPCR_PECE1
 BEGIN_FTR_SECTION
-	oris	r5,r5,LPCR_PECEDP@h
+	rlwimi	r5, r3, 0, LPCR_PECEDP
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	mtspr	SPRN_LPCR,r5
 	isync
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 18/23] KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI
@ 2015-03-20  9:39   ` Paul Mackerras
  0 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

When running a multi-threaded guest and vcpu 0 in a virtual core
is not running in the guest (i.e. it is busy elsewhere in the host),
thread 0 of the physical core will switch the MMU to the guest and
then go to nap mode in the code at kvm_do_nap.  If the guest sends
an IPI to thread 0 using the msgsndp instruction, that will wake
up thread 0 and cause all the threads in the guest to exit to the
host unnecessarily.  To avoid the unnecessary exit, this arranges
for the PECEDP bit to be cleared in this situation.  When napping
due to a H_CEDE from the guest, we still set PECEDP so that the
thread will wake up on an IPI sent using msgsndp.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index c3b148d..8afc8a8 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -191,6 +191,7 @@ kvmppc_primary_no_guest:
 	li	r3, NAPPING_NOVCPU
 	stb	r3, HSTATE_NAPPING(r13)
 
+	li	r3, 0		/* Don't wake on privileged (OS) doorbell */
 	b	kvm_do_nap
 
 kvm_novcpu_wakeup:
@@ -2107,10 +2108,13 @@ _GLOBAL(kvmppc_h_cede)		/* r3 = vcpu pointer, r11 = msr, r13 = paca */
 	addi	r3, r4, VCPU_TB_CEDE
 	bl	kvmhv_accumulate_time
 
+	lis	r3, LPCR_PECEDP@h	/* Do wake on privileged doorbell */
+
 	/*
 	 * Take a nap until a decrementer or external or doorbell interrupt
-	 * occurs, with PECE1, PECE0 and PECEDP set in LPCR. Also clear the
-	 * runlatch bit before napping.
+	 * occurs, with PECE1 and PECE0 set in LPCR.
+	 * On POWER8, if we are ceding, also set PECEDP.
+	 * Also clear the runlatch bit before napping.
 	 */
 kvm_do_nap:
 	mfspr	r0, SPRN_CTRLF
@@ -2122,7 +2126,7 @@ kvm_do_nap:
 	mfspr	r5,SPRN_LPCR
 	ori	r5,r5,LPCR_PECE0 | LPCR_PECE1
 BEGIN_FTR_SECTION
-	oris	r5,r5,LPCR_PECEDP@h
+	rlwimi	r5, r3, 0, LPCR_PECEDP
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	mtspr	SPRN_LPCR,r5
 	isync
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 19/23] KVM: PPC: Book3S HV: Use decrementer to wake napping threads
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20  9:39   ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

This arranges for threads that are napping, either because their vcpu
has ceded or because they have no vcpu, to wake up at the end of the
guest's timeslice without having to be poked with an IPI.  We do that by
arranging for the decrementer to contain a value no greater than the
number of timebase ticks remaining until the end of the timeslice.
In the case of a thread with no vcpu, this number is in the hypervisor
decrementer already.  In the case of a ceded vcpu, we use the smaller
of the HDEC value and the DEC value.

Using the DEC like this when ceded means we need to save and restore
the guest decrementer value around the nap.
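
To make the arithmetic concrete, here is a small stand-alone sketch
(mine, with made-up values, not the patch code) of what the assembly
below does: clamp the wakeup to the end of the timeslice by taking
min(DEC, HDEC), remember the guest decrementer expiry as a host timebase
value across the nap, and reload DEC from that on wakeup:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	int32_t dec = 5000000, hdec = 3000000;	/* made-up tick counts */
	uint64_t tb = 1000000000ULL;		/* made-up timebase now */
	int64_t tb_offset = 12345;		/* guest TB = host TB + offset */

	/* DEC <- min(DEC, HDEC) so the nap ends no later than the timeslice */
	int32_t nap_dec = dec < hdec ? dec : hdec;

	/* save the guest DEC expiry, converted to a host timebase value */
	uint64_t dec_expires = tb + (int64_t)dec - tb_offset;

	/* ... nap; some timebase ticks pass ... */
	uint64_t tb_later = tb + 2000000ULL;

	/* on wakeup: convert back to guest TB and reload the decrementer */
	int64_t new_dec = (int64_t)(dec_expires + tb_offset) - (int64_t)tb_later;

	printf("nap DEC=%d, saved expiry=%llu, restored DEC=%lld\n",
	       nap_dec, (unsigned long long)dec_expires, (long long)new_dec);
	return 0;
}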

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 43 +++++++++++++++++++++++++++++++--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 8afc8a8..03a37a0 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -172,6 +172,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
 kvmppc_primary_no_guest:
 	/* We handle this much like a ceded vcpu */
+	/* put the HDEC into the DEC, since HDEC interrupts don't wake us */
+	mfspr	r3, SPRN_HDEC
+	mtspr	SPRN_DEC, r3
 	/* set our bit in napping_threads */
 	ld	r5, HSTATE_KVM_VCORE(r13)
 	lbz	r7, HSTATE_PTID(r13)
@@ -223,6 +226,12 @@ kvm_novcpu_wakeup:
 	cmpdi	r3, 0
 	bge	kvm_novcpu_exit
 
+	/* See if our timeslice has expired (HDEC is negative) */
+	mfspr	r0, SPRN_HDEC
+	li	r12, BOOK3S_INTERRUPT_HV_DECREMENTER
+	cmpwi	r0, 0
+	blt	kvm_novcpu_exit
+
 	/* Got an IPI but other vcpus aren't yet exiting, must be a latecomer */
 	ld	r4, HSTATE_KVM_VCPU(r13)
 	cmpdi	r4, 0
@@ -1478,10 +1487,10 @@ kvmhv_do_exit:			/* r12 = trap, r13 = paca */
 	cmpwi	r3,0x100	/* Are we the first here? */
 	bge	43f
 	cmpwi	r12,BOOK3S_INTERRUPT_HV_DECREMENTER
-	beq	40f
+	beq	43f
 	li	r0,0
 	mtspr	SPRN_HDEC,r0
-40:
+
 	/*
 	 * Send an IPI to any napping threads, since an HDEC interrupt
 	 * doesn't wake CPUs up from nap.
@@ -2104,6 +2113,27 @@ _GLOBAL(kvmppc_h_cede)		/* r3 = vcpu pointer, r11 = msr, r13 = paca */
 	/* save FP state */
 	bl	kvmppc_save_fp
 
+	/*
+	 * Set DEC to the smaller of DEC and HDEC, so that we wake
+	 * no later than the end of our timeslice (HDEC interrupts
+	 * don't wake us from nap).
+	 */
+	mfspr	r3, SPRN_DEC
+	mfspr	r4, SPRN_HDEC
+	mftb	r5
+	cmpw	r3, r4
+	ble	67f
+	mtspr	SPRN_DEC, r4
+67:
+	/* save expiry time of guest decrementer */
+	extsw	r3, r3
+	add	r3, r3, r5
+	ld	r4, HSTATE_KVM_VCPU(r13)
+	ld	r5, HSTATE_KVM_VCORE(r13)
+	ld	r6, VCORE_TB_OFFSET(r5)
+	subf	r3, r6, r3	/* convert to host TB value */
+	std	r3, VCPU_DEC_EXPIRES(r4)
+
 	ld	r4, HSTATE_KVM_VCPU(r13)
 	addi	r3, r4, VCPU_TB_CEDE
 	bl	kvmhv_accumulate_time
@@ -2157,6 +2187,15 @@ kvm_end_cede:
 	/* load up FP state */
 	bl	kvmppc_load_fp
 
+	/* Restore guest decrementer */
+	ld	r3, VCPU_DEC_EXPIRES(r4)
+	ld	r5, HSTATE_KVM_VCORE(r13)
+	ld	r6, VCORE_TB_OFFSET(r5)
+	add	r3, r3, r6	/* convert host TB to guest TB value */
+	mftb	r7
+	subf	r3, r7, r3
+	mtspr	SPRN_DEC, r3
+
 	/* Load NV GPRS */
 	ld	r14, VCPU_GPR(R14)(r4)
 	ld	r15, VCPU_GPR(R15)(r4)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 19/23] KVM: PPC: Book3S HV: Use decrementer to wake napping threads
@ 2015-03-20  9:39   ` Paul Mackerras
  0 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

This arranges for threads that are napping due to their vcpu having
ceded or due to not having a vcpu to wake up at the end of the guest's
timeslice without having to be poked with an IPI.  We do that by
arranging for the decrementer to contain a value no greater than the
number of timebase ticks remaining until the end of the timeslice.
In the case of a thread with no vcpu, this number is in the hypervisor
decrementer already.  In the case of a ceded vcpu, we use the smaller
of the HDEC value and the DEC value.

Using the DEC like this when ceded means we need to save and restore
the guest decrementer value around the nap.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 43 +++++++++++++++++++++++++++++++--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 8afc8a8..03a37a0 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -172,6 +172,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
 kvmppc_primary_no_guest:
 	/* We handle this much like a ceded vcpu */
+	/* put the HDEC into the DEC, since HDEC interrupts don't wake us */
+	mfspr	r3, SPRN_HDEC
+	mtspr	SPRN_DEC, r3
 	/* set our bit in napping_threads */
 	ld	r5, HSTATE_KVM_VCORE(r13)
 	lbz	r7, HSTATE_PTID(r13)
@@ -223,6 +226,12 @@ kvm_novcpu_wakeup:
 	cmpdi	r3, 0
 	bge	kvm_novcpu_exit
 
+	/* See if our timeslice has expired (HDEC is negative) */
+	mfspr	r0, SPRN_HDEC
+	li	r12, BOOK3S_INTERRUPT_HV_DECREMENTER
+	cmpwi	r0, 0
+	blt	kvm_novcpu_exit
+
 	/* Got an IPI but other vcpus aren't yet exiting, must be a latecomer */
 	ld	r4, HSTATE_KVM_VCPU(r13)
 	cmpdi	r4, 0
@@ -1478,10 +1487,10 @@ kvmhv_do_exit:			/* r12 = trap, r13 = paca */
 	cmpwi	r3,0x100	/* Are we the first here? */
 	bge	43f
 	cmpwi	r12,BOOK3S_INTERRUPT_HV_DECREMENTER
-	beq	40f
+	beq	43f
 	li	r0,0
 	mtspr	SPRN_HDEC,r0
-40:
+
 	/*
 	 * Send an IPI to any napping threads, since an HDEC interrupt
 	 * doesn't wake CPUs up from nap.
@@ -2104,6 +2113,27 @@ _GLOBAL(kvmppc_h_cede)		/* r3 = vcpu pointer, r11 = msr, r13 = paca */
 	/* save FP state */
 	bl	kvmppc_save_fp
 
+	/*
+	 * Set DEC to the smaller of DEC and HDEC, so that we wake
+	 * no later than the end of our timeslice (HDEC interrupts
+	 * don't wake us from nap).
+	 */
+	mfspr	r3, SPRN_DEC
+	mfspr	r4, SPRN_HDEC
+	mftb	r5
+	cmpw	r3, r4
+	ble	67f
+	mtspr	SPRN_DEC, r4
+67:
+	/* save expiry time of guest decrementer */
+	extsw	r3, r3
+	add	r3, r3, r5
+	ld	r4, HSTATE_KVM_VCPU(r13)
+	ld	r5, HSTATE_KVM_VCORE(r13)
+	ld	r6, VCORE_TB_OFFSET(r5)
+	subf	r3, r6, r3	/* convert to host TB value */
+	std	r3, VCPU_DEC_EXPIRES(r4)
+
 	ld	r4, HSTATE_KVM_VCPU(r13)
 	addi	r3, r4, VCPU_TB_CEDE
 	bl	kvmhv_accumulate_time
@@ -2157,6 +2187,15 @@ kvm_end_cede:
 	/* load up FP state */
 	bl	kvmppc_load_fp
 
+	/* Restore guest decrementer */
+	ld	r3, VCPU_DEC_EXPIRES(r4)
+	ld	r5, HSTATE_KVM_VCORE(r13)
+	ld	r6, VCORE_TB_OFFSET(r5)
+	add	r3, r3, r6	/* convert host TB to guest TB value */
+	mftb	r7
+	subf	r3, r7, r3
+	mtspr	SPRN_DEC, r3
+
 	/* Load NV GPRS */
 	ld	r14, VCPU_GPR(R14)(r4)
 	ld	r15, VCPU_GPR(R15)(r4)
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 20/23] KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20  9:39   ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

This uses msgsnd where possible for signalling other threads within
the same core on POWER8 systems, rather than IPIs through the XICS
interrupt controller.  This includes waking secondary threads to run
the guest, the interrupts generated by the virtual XICS, and the
interrupts to bring the other threads out of the guest when exiting.

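The core of it is a cheap same-core test plus the doorbell message.  Stripped
of the feature checks and preemption handling, it amounts to something like
the sketch below, where send_msgsnd() and send_xics_ipi() are stand-ins for
the msgsnd inline assembly and xics_wake_cpu() used in the patch:

#include <stdbool.h>

/* On an 8-thread POWER8 core, cpu ids 0-7, 8-15, ... share a core, so the
 * same-core test is just a compare with the low three bits masked off. */
static bool same_core(int cpu, int me)
{
	return (cpu & ~7) == (me & ~7);
}

/* Doorbell within the core, XICS IPI otherwise (illustration only) */
static void kick_thread(int cpu, int me,
			void (*send_msgsnd)(int thread),
			void (*send_xics_ipi)(int cpu))
{
	if (same_core(cpu, me))
		send_msgsnd(cpu & 7);	/* msgsnd is addressed by thread number */
	else
		send_xics_ipi(cpu);
}
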
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kernel/asm-offsets.c       |  4 +++
 arch/powerpc/kvm/book3s_hv.c            | 48 ++++++++++++++++++++++-----------
 arch/powerpc/kvm/book3s_hv_rm_xics.c    | 11 ++++++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 41 ++++++++++++++++++++++++----
 4 files changed, 83 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index fa7b57d..0ce2aa6 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -37,6 +37,7 @@
 #include <asm/thread_info.h>
 #include <asm/rtas.h>
 #include <asm/vdso_datapage.h>
+#include <asm/dbell.h>
 #ifdef CONFIG_PPC64
 #include <asm/paca.h>
 #include <asm/lppaca.h>
@@ -568,6 +569,7 @@ int main(void)
 	DEFINE(VCORE_LPCR, offsetof(struct kvmppc_vcore, lpcr));
 	DEFINE(VCORE_PCR, offsetof(struct kvmppc_vcore, pcr));
 	DEFINE(VCORE_DPDES, offsetof(struct kvmppc_vcore, dpdes));
+	DEFINE(VCORE_PCPU, offsetof(struct kvmppc_vcore, pcpu));
 	DEFINE(VCPU_SLB_E, offsetof(struct kvmppc_slb, orige));
 	DEFINE(VCPU_SLB_V, offsetof(struct kvmppc_slb, origv));
 	DEFINE(VCPU_SLB_SIZE, sizeof(struct kvmppc_slb));
@@ -757,5 +759,7 @@ int main(void)
 			offsetof(struct paca_struct, subcore_sibling_mask));
 #endif
 
+	DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
+
 	return 0;
 }
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 03a8bb4..2c34bae 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -51,6 +51,7 @@
 #include <asm/hvcall.h>
 #include <asm/switch_to.h>
 #include <asm/smp.h>
+#include <asm/dbell.h>
 #include <linux/gfp.h>
 #include <linux/vmalloc.h>
 #include <linux/highmem.h>
@@ -84,9 +85,34 @@ static DECLARE_BITMAP(default_enabled_hcalls, MAX_HCALL_OPCODE/4 + 1);
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
 
+static bool kvmppc_ipi_thread(int cpu)
+{
+	/* On POWER8 for IPIs to threads in the same core, use msgsnd */
+	if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
+		preempt_disable();
+		if ((cpu & ~7) == (smp_processor_id() & ~7)) {
+			unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
+			msg |= cpu & 7;
+			smp_mb();
+			__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
+			preempt_enable();
+			return true;
+		}
+		preempt_enable();
+	}
+
+#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
+	if (cpu >= 0 && cpu < nr_cpu_ids && paca[cpu].kvm_hstate.xics_phys) {
+		xics_wake_cpu(cpu);
+		return true;
+	}
+#endif
+
+	return false;
+}
+
 static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 {
-	int me;
 	int cpu = vcpu->cpu;
 	wait_queue_head_t *wqp;
 
@@ -96,20 +122,12 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 		++vcpu->stat.halt_wakeup;
 	}
 
-	me = get_cpu();
+	if (kvmppc_ipi_thread(cpu + vcpu->arch.ptid))
+		return;
 
 	/* CPU points to the first thread of the core */
-	if (cpu != me && cpu >= 0 && cpu < nr_cpu_ids) {
-#ifdef CONFIG_PPC_ICP_NATIVE
-		int real_cpu = cpu + vcpu->arch.ptid;
-		if (paca[real_cpu].kvm_hstate.xics_phys)
-			xics_wake_cpu(real_cpu);
-		else
-#endif
-		if (cpu_online(cpu))
-			smp_send_reschedule(cpu);
-	}
-	put_cpu();
+	if (cpu >= 0 && cpu < nr_cpu_ids && cpu_online(cpu))
+		smp_send_reschedule(cpu);
 }
 
 /*
@@ -1754,10 +1772,8 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
 	/* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */
 	smp_wmb();
 	tpaca->kvm_hstate.kvm_vcpu = vcpu;
-#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
 	if (cpu != smp_processor_id())
-		xics_wake_cpu(cpu);
-#endif
+		kvmppc_ipi_thread(cpu);
 }
 
 static void kvmppc_wait_for_nap(void)
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 6dded8c..457a8b1 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -18,6 +18,7 @@
 #include <asm/debug.h>
 #include <asm/synch.h>
 #include <asm/ppc-opcode.h>
+#include <asm/dbell.h>
 
 #include "book3s_xics.h"
 
@@ -83,6 +84,16 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
 	/* In SMT cpu will always point to thread 0, we adjust it */
 	cpu += vcpu->arch.ptid;
 
+	/* On POWER8 for IPIs to threads in the same core, use msgsnd */
+	if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
+	    (cpu & ~7) == (raw_smp_processor_id() & ~7)) {
+		unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
+		msg |= cpu & 7;
+		smp_mb();
+		__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
+		return;
+	}
+
 	/* Not too hard, then poke the target */
 	xics_phys = paca[cpu].kvm_hstate.xics_phys;
 	rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 03a37a0..04728ce 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1075,6 +1075,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	cmpwi	r12,BOOK3S_INTERRUPT_SYSCALL
 	beq	hcall_try_real_mode
 
+	/* Hypervisor doorbell - exit only if host IPI flag set */
+	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
+	bne	3f
+	lbz	r0, HSTATE_HOST_IPI(r13)
+	beq	4f
+	b	guest_exit_cont
+3:
 	/* External interrupt ? */
 	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
 	bne+	guest_exit_cont
@@ -1087,7 +1094,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	bgt	guest_exit_cont
 
 	/* Check if any CPU is heading out to the host, if so head out too */
-	ld	r5, HSTATE_KVM_VCORE(r13)
+4:	ld	r5, HSTATE_KVM_VCORE(r13)
 	lwz	r0, VCORE_ENTRY_EXIT(r5)
 	cmpwi	r0, 0x100
 	mr	r4, r9
@@ -1492,7 +1499,7 @@ kvmhv_do_exit:			/* r12 = trap, r13 = paca */
 	mtspr	SPRN_HDEC,r0
 
 	/*
-	 * Send an IPI to any napping threads, since an HDEC interrupt
+	 * Send a message or IPI to any napping threads, since an HDEC interrupt
 	 * doesn't wake CPUs up from nap.
 	 */
 	lwz	r3,VCORE_NAPPING_THREADS(r5)
@@ -1503,7 +1510,22 @@ kvmhv_do_exit:			/* r12 = trap, r13 = paca */
 	beq	43f
 	/* Order entry/exit update vs. IPIs */
 	sync
-	mulli	r4,r4,PACA_SIZE		/* get paca for thread 0 */
+BEGIN_FTR_SECTION
+	b	45f
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
+	/* Use msgsnd on POWER8 */
+	lhz	r6, VCORE_PCPU(r5)
+	clrldi	r6, r6, 64-3
+	oris	r6, r6, (PPC_DBELL_SERVER << (63-36))@h
+42:	andi.	r0, r3, 1
+	beq	44f
+	PPC_MSGSND(6)
+44:	srdi.	r3, r3, 1
+	addi	r6, r6, 1
+	bne	42b
+	b	kvmhv_switch_to_host
+	/* Use IPIs on POWER7 */
+45:	mulli	r4,r4,PACA_SIZE		/* get paca for thread 0 */
 	subf	r6,r4,r13
 42:	andi.	r0,r3,1
 	beq	44f
@@ -2143,7 +2165,7 @@ _GLOBAL(kvmppc_h_cede)		/* r3 = vcpu pointer, r11 = msr, r13 = paca */
 	/*
 	 * Take a nap until a decrementer or external or doobell interrupt
 	 * occurs, with PECE1 and PECE0 set in LPCR.
-	 * On POWER8, if we are ceding, also set PECEDP.
+	 * On POWER8, set PECEDH, and if we are ceding, also set PECEDP.
 	 * Also clear the runlatch bit before napping.
 	 */
 kvm_do_nap:
@@ -2156,6 +2178,7 @@ kvm_do_nap:
 	mfspr	r5,SPRN_LPCR
 	ori	r5,r5,LPCR_PECE0 | LPCR_PECE1
 BEGIN_FTR_SECTION
+	ori	r5, r5, LPCR_PECEDH
 	rlwimi	r5, r3, 0, LPCR_PECEDP
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	mtspr	SPRN_LPCR,r5
@@ -2291,7 +2314,7 @@ machine_check_realmode:
  * Returns (in r3):
  *	0 if nothing needs to be done
  *	1 if something happened that needs to be handled by the host
- *	-1 if there was a guest wakeup (IPI)
+ *	-1 if there was a guest wakeup (IPI or msgsnd)
  *
  * Also sets r12 to the interrupt vector for any interrupt that needs
  * to be handled now by the host (0x500 for external interrupt), or zero.
@@ -2322,7 +2345,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
 	/* hypervisor doorbell */
 3:	li	r12, BOOK3S_INTERRUPT_H_DOORBELL
+	/* see if it's a host IPI */
 	li	r3, 1
+	lbz	r0, HSTATE_HOST_IPI(r13)
+	cmpwi	r0, 0
+	bnelr
+	/* if not, clear it and return -1 */
+	lis	r6, (PPC_DBELL_SERVER << (63-36))@h
+	PPC_MSGCLR(6)
+	li	r3, -1
 	blr
 
 /*
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 20/23] KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8
@ 2015-03-20  9:39   ` Paul Mackerras
  0 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

This uses msgsnd where possible for signalling other threads within
the same core on POWER8 systems, rather than IPIs through the XICS
interrupt controller.  This includes waking secondary threads to run
the guest, the interrupts generated by the virtual XICS, and the
interrupts to bring the other threads out of the guest when exiting.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kernel/asm-offsets.c       |  4 +++
 arch/powerpc/kvm/book3s_hv.c            | 48 ++++++++++++++++++++++-----------
 arch/powerpc/kvm/book3s_hv_rm_xics.c    | 11 ++++++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 41 ++++++++++++++++++++++++----
 4 files changed, 83 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index fa7b57d..0ce2aa6 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -37,6 +37,7 @@
 #include <asm/thread_info.h>
 #include <asm/rtas.h>
 #include <asm/vdso_datapage.h>
+#include <asm/dbell.h>
 #ifdef CONFIG_PPC64
 #include <asm/paca.h>
 #include <asm/lppaca.h>
@@ -568,6 +569,7 @@ int main(void)
 	DEFINE(VCORE_LPCR, offsetof(struct kvmppc_vcore, lpcr));
 	DEFINE(VCORE_PCR, offsetof(struct kvmppc_vcore, pcr));
 	DEFINE(VCORE_DPDES, offsetof(struct kvmppc_vcore, dpdes));
+	DEFINE(VCORE_PCPU, offsetof(struct kvmppc_vcore, pcpu));
 	DEFINE(VCPU_SLB_E, offsetof(struct kvmppc_slb, orige));
 	DEFINE(VCPU_SLB_V, offsetof(struct kvmppc_slb, origv));
 	DEFINE(VCPU_SLB_SIZE, sizeof(struct kvmppc_slb));
@@ -757,5 +759,7 @@ int main(void)
 			offsetof(struct paca_struct, subcore_sibling_mask));
 #endif
 
+	DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
+
 	return 0;
 }
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 03a8bb4..2c34bae 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -51,6 +51,7 @@
 #include <asm/hvcall.h>
 #include <asm/switch_to.h>
 #include <asm/smp.h>
+#include <asm/dbell.h>
 #include <linux/gfp.h>
 #include <linux/vmalloc.h>
 #include <linux/highmem.h>
@@ -84,9 +85,34 @@ static DECLARE_BITMAP(default_enabled_hcalls, MAX_HCALL_OPCODE/4 + 1);
 static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
 
+static bool kvmppc_ipi_thread(int cpu)
+{
+	/* On POWER8 for IPIs to threads in the same core, use msgsnd */
+	if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
+		preempt_disable();
+		if ((cpu & ~7) == (smp_processor_id() & ~7)) {
+			unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
+			msg |= cpu & 7;
+			smp_mb();
+			__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
+			preempt_enable();
+			return true;
+		}
+		preempt_enable();
+	}
+
+#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
+	if (cpu >= 0 && cpu < nr_cpu_ids && paca[cpu].kvm_hstate.xics_phys) {
+		xics_wake_cpu(cpu);
+		return true;
+	}
+#endif
+
+	return false;
+}
+
 static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 {
-	int me;
 	int cpu = vcpu->cpu;
 	wait_queue_head_t *wqp;
 
@@ -96,20 +122,12 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
 		++vcpu->stat.halt_wakeup;
 	}
 
-	me = get_cpu();
+	if (kvmppc_ipi_thread(cpu + vcpu->arch.ptid))
+		return;
 
 	/* CPU points to the first thread of the core */
-	if (cpu != me && cpu >= 0 && cpu < nr_cpu_ids) {
-#ifdef CONFIG_PPC_ICP_NATIVE
-		int real_cpu = cpu + vcpu->arch.ptid;
-		if (paca[real_cpu].kvm_hstate.xics_phys)
-			xics_wake_cpu(real_cpu);
-		else
-#endif
-		if (cpu_online(cpu))
-			smp_send_reschedule(cpu);
-	}
-	put_cpu();
+	if (cpu >= 0 && cpu < nr_cpu_ids && cpu_online(cpu))
+		smp_send_reschedule(cpu);
 }
 
 /*
@@ -1754,10 +1772,8 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
 	/* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */
 	smp_wmb();
 	tpaca->kvm_hstate.kvm_vcpu = vcpu;
-#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
 	if (cpu != smp_processor_id())
-		xics_wake_cpu(cpu);
-#endif
+		kvmppc_ipi_thread(cpu);
 }
 
 static void kvmppc_wait_for_nap(void)
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 6dded8c..457a8b1 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -18,6 +18,7 @@
 #include <asm/debug.h>
 #include <asm/synch.h>
 #include <asm/ppc-opcode.h>
+#include <asm/dbell.h>
 
 #include "book3s_xics.h"
 
@@ -83,6 +84,16 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
 	/* In SMT cpu will always point to thread 0, we adjust it */
 	cpu += vcpu->arch.ptid;
 
+	/* On POWER8 for IPIs to threads in the same core, use msgsnd */
+	if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
+	    (cpu & ~7) == (raw_smp_processor_id() & ~7)) {
+		unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
+		msg |= cpu & 7;
+		smp_mb();
+		__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
+		return;
+	}
+
 	/* Not too hard, then poke the target */
 	xics_phys = paca[cpu].kvm_hstate.xics_phys;
 	rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 03a37a0..04728ce 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1075,6 +1075,13 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	cmpwi	r12,BOOK3S_INTERRUPT_SYSCALL
 	beq	hcall_try_real_mode
 
+	/* Hypervisor doorbell - exit only if host IPI flag set */
+	cmpwi	r12, BOOK3S_INTERRUPT_H_DOORBELL
+	bne	3f
+	lbz	r0, HSTATE_HOST_IPI(r13)
+	beq	4f
+	b	guest_exit_cont
+3:
 	/* External interrupt ? */
 	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
 	bne+	guest_exit_cont
@@ -1087,7 +1094,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	bgt	guest_exit_cont
 
 	/* Check if any CPU is heading out to the host, if so head out too */
-	ld	r5, HSTATE_KVM_VCORE(r13)
+4:	ld	r5, HSTATE_KVM_VCORE(r13)
 	lwz	r0, VCORE_ENTRY_EXIT(r5)
 	cmpwi	r0, 0x100
 	mr	r4, r9
@@ -1492,7 +1499,7 @@ kvmhv_do_exit:			/* r12 = trap, r13 = paca */
 	mtspr	SPRN_HDEC,r0
 
 	/*
-	 * Send an IPI to any napping threads, since an HDEC interrupt
+	 * Send a message or IPI to any napping threads, since an HDEC interrupt
 	 * doesn't wake CPUs up from nap.
 	 */
 	lwz	r3,VCORE_NAPPING_THREADS(r5)
@@ -1503,7 +1510,22 @@ kvmhv_do_exit:			/* r12 = trap, r13 = paca */
 	beq	43f
 	/* Order entry/exit update vs. IPIs */
 	sync
-	mulli	r4,r4,PACA_SIZE		/* get paca for thread 0 */
+BEGIN_FTR_SECTION
+	b	45f
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
+	/* Use msgsnd on POWER8 */
+	lhz	r6, VCORE_PCPU(r5)
+	clrldi	r6, r6, 64-3
+	oris	r6, r6, (PPC_DBELL_SERVER << (63-36))@h
+42:	andi.	r0, r3, 1
+	beq	44f
+	PPC_MSGSND(6)
+44:	srdi.	r3, r3, 1
+	addi	r6, r6, 1
+	bne	42b
+	b	kvmhv_switch_to_host
+	/* Use IPIs on POWER7 */
+45:	mulli	r4,r4,PACA_SIZE		/* get paca for thread 0 */
 	subf	r6,r4,r13
 42:	andi.	r0,r3,1
 	beq	44f
@@ -2143,7 +2165,7 @@ _GLOBAL(kvmppc_h_cede)		/* r3 = vcpu pointer, r11 = msr, r13 = paca */
 	/*
 	 * Take a nap until a decrementer or external or doobell interrupt
 	 * occurs, with PECE1 and PECE0 set in LPCR.
-	 * On POWER8, if we are ceding, also set PECEDP.
+	 * On POWER8, set PECEDH, and if we are ceding, also set PECEDP.
 	 * Also clear the runlatch bit before napping.
 	 */
 kvm_do_nap:
@@ -2156,6 +2178,7 @@ kvm_do_nap:
 	mfspr	r5,SPRN_LPCR
 	ori	r5,r5,LPCR_PECE0 | LPCR_PECE1
 BEGIN_FTR_SECTION
+	ori	r5, r5, LPCR_PECEDH
 	rlwimi	r5, r3, 0, LPCR_PECEDP
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	mtspr	SPRN_LPCR,r5
@@ -2291,7 +2314,7 @@ machine_check_realmode:
  * Returns (in r3):
  *	0 if nothing needs to be done
  *	1 if something happened that needs to be handled by the host
- *	-1 if there was a guest wakeup (IPI)
+ *	-1 if there was a guest wakeup (IPI or msgsnd)
  *
  * Also sets r12 to the interrupt vector for any interrupt that needs
  * to be handled now by the host (0x500 for external interrupt), or zero.
@@ -2322,7 +2345,15 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
 	/* hypervisor doorbell */
 3:	li	r12, BOOK3S_INTERRUPT_H_DOORBELL
+	/* see if it's a host IPI */
 	li	r3, 1
+	lbz	r0, HSTATE_HOST_IPI(r13)
+	cmpwi	r0, 0
+	bnelr
+	/* if not, clear it and return -1 */
+	lis	r6, (PPC_DBELL_SERVER << (63-36))@h
+	PPC_MSGCLR(6)
+	li	r3, -1
 	blr
 
 /*
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 21/23] KVM: PPC: Book3S HV: Streamline guest entry and exit
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20  9:39   ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

On entry to the guest, secondary threads now wait for the primary to
switch the MMU after loading up most of their state, rather than before.
This means that the secondary threads get into the guest sooner, in the
common case where the secondary threads get to kvmppc_hv_entry before
the primary thread.

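In rough C terms the new ordering on a secondary thread is the following (a
sketch only, with stand-in helpers; the real implementation is the assembly
below):

struct vcore_sketch { volatile int in_guest; unsigned long lpcr; };

static void load_guest_state(void) { }		/* stand-in: SLB, SPRs, GPRs, ... */
static void set_lpcr(unsigned long lpcr) { (void)lpcr; }
static void cpu_relax(void) { }

static void secondary_enter(struct vcore_sketch *vc)
{
	load_guest_state();		/* done first now */
	while (!vc->in_guest)		/* then wait for the primary's MMU switch */
		cpu_relax();
	set_lpcr(vc->lpcr);
	/* ...and enter the guest */
}
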
On exit, the first thread out increments the exit count and interrupts
the other threads (to get them out of the guest) before saving most
of its state, rather than after.  That means that the other threads
exit sooner and means that the first thread doesn't spend so much
time waiting for the other threads at the point where the MMU gets
switched back to the host.

This pulls out the code that increments the exit count and interrupts
other threads into a separate function, kvmhv_commence_exit().
This also makes sure that r12 and vcpu->arch.trap are set correctly
in some corner cases.

Statistics from /sys/kernel/debug/kvm/vm*/vcpu*/timings show the
improvement.  Aggregating across vcpus for a guest with 32 vcpus,
8 threads/vcore, running on a POWER8, gives this before the change:

 rm_entry:     avg 3919.3ns (244 - 56492, 742665 samples)
  rm_exit:     avg 4102.5ns (130 - 36272, 704056 samples)
  rm_intr:     avg 1006.0ns (12 - 75040, 2819905 samples)

and this after the change:

 rm_entry:     avg 2979.8ns (258 - 83740, 836403 samples)
  rm_exit:     avg 3992.9ns (12 - 45572, 838034 samples)
  rm_intr:     avg  922.2ns (12 - 66694, 3127066 samples)

showing a substantial reduction in the time spent in the real-mode
guest entry code, and smaller reductions in the real mode guest exit
and interrupt handling times.  (The test was to start the guest and
boot Fedora 20 big-endian to the login prompt.)
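In relative terms that is roughly a 24% reduction in the average rm_entry
time, with reductions of about 3% and 8% for rm_exit and rm_intr.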

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 240 +++++++++++++++++++-------------
 1 file changed, 141 insertions(+), 99 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 04728ce..ff1461d 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -175,6 +175,19 @@ kvmppc_primary_no_guest:
 	/* put the HDEC into the DEC, since HDEC interrupts don't wake us */
 	mfspr	r3, SPRN_HDEC
 	mtspr	SPRN_DEC, r3
+	/*
+	 * Make sure the primary has finished the MMU switch.
+	 * We should never get here on a secondary thread, but
+	 * check it for robustness' sake.
+	 */
+	ld	r5, HSTATE_KVM_VCORE(r13)
+65:	lbz	r0, VCORE_IN_GUEST(r5)
+	cmpwi	r0, 0
+	beq	65b
+	/* Set LPCR. */
+	ld	r8,VCORE_LPCR(r5)
+	mtspr	SPRN_LPCR,r8
+	isync
 	/* set our bit in napping_threads */
 	ld	r5, HSTATE_KVM_VCORE(r13)
 	lbz	r7, HSTATE_PTID(r13)
@@ -206,7 +219,7 @@ kvm_novcpu_wakeup:
 
 	/* check the wake reason */
 	bl	kvmppc_check_wake_reason
-	
+
 	/* see if any other thread is already exiting */
 	lwz	r0, VCORE_ENTRY_EXIT(r5)
 	cmpwi	r0, 0x100
@@ -243,7 +256,12 @@ kvm_novcpu_wakeup:
 
 kvm_novcpu_exit:
 	ld	r4, HSTATE_KVM_VCPU(r13)
-	b	hdec_soon
+	cmpdi	r4, 0
+	beq	13f
+	addi	r3, r4, VCPU_TB_RMEXIT
+	bl	kvmhv_accumulate_time
+13:	bl	kvmhv_commence_exit
+	b	kvmhv_switch_to_host
 
 /*
  * We come in here when wakened from nap mode.
@@ -417,7 +435,7 @@ kvmppc_hv_entry:
 	ld	r9,VCORE_KVM(r5)	/* pointer to struct kvm */
 	lbz	r6,HSTATE_PTID(r13)
 	cmpwi	r6,0
-	bne	20f
+	bne	10f
 	ld	r6,KVM_SDR1(r9)
 	lwz	r7,KVM_LPID(r9)
 	li	r0,LPID_RSVD		/* switch to reserved LPID */
@@ -488,26 +506,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
 	li	r0,1
 	stb	r0,VCORE_IN_GUEST(r5)	/* signal secondaries to continue */
-	b	10f
-
-	/* Secondary threads wait for primary to have done partition switch */
-20:	lbz	r0,VCORE_IN_GUEST(r5)
-	cmpwi	r0,0
-	beq	20b
-
-	/* Set LPCR. */
-10:	ld	r8,VCORE_LPCR(r5)
-	mtspr	SPRN_LPCR,r8
-	isync
-
-	/* Check if HDEC expires soon */
-	mfspr	r3,SPRN_HDEC
-	cmpwi	r3,512		/* 1 microsecond */
-	li	r12,BOOK3S_INTERRUPT_HV_DECREMENTER
-	blt	hdec_soon
 
 	/* Do we have a guest vcpu to run? */
-	cmpdi	r4, 0
+10:	cmpdi	r4, 0
 	beq	kvmppc_primary_no_guest
 kvmppc_got_guest:
 
@@ -832,6 +833,30 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 	clrrdi	r6,r6,1
 	mtspr	SPRN_CTRLT,r6
 4:
+	/* Secondary threads wait for primary to have done partition switch */
+	ld	r5, HSTATE_KVM_VCORE(r13)
+	lbz	r6, HSTATE_PTID(r13)
+	cmpwi	r6, 0
+	beq	21f
+	lbz	r0, VCORE_IN_GUEST(r5)
+	cmpwi	r0, 0
+	bne	21f
+	HMT_LOW
+20:	lbz	r0, VCORE_IN_GUEST(r5)
+	cmpwi	r0, 0
+	beq	20b
+	HMT_MEDIUM
+21:
+	/* Set LPCR. */
+	ld	r8,VCORE_LPCR(r5)
+	mtspr	SPRN_LPCR,r8
+	isync
+
+	/* Check if HDEC expires soon */
+	mfspr	r3, SPRN_HDEC
+	cmpwi	r3, 512		/* 1 microsecond */
+	blt	hdec_soon
+
 	ld	r6, VCPU_CTR(r4)
 	lwz	r7, VCPU_XER(r4)
 
@@ -936,18 +961,21 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	b	.
 
 secondary_too_late:
+	li	r12, 0
 	cmpdi	r4, 0
 	beq	11f
+	stw	r12, VCPU_TRAP(r4)
 	addi	r3, r4, VCPU_TB_RMEXIT
 	bl	kvmhv_accumulate_time
 11:	b	kvmhv_switch_to_host
 
 hdec_soon:
-	cmpdi	r4, 0
-	beq	12f
+	li	r12, BOOK3S_INTERRUPT_HV_DECREMENTER
+	stw	r12, VCPU_TRAP(r4)
+	mr	r9, r4
 	addi	r3, r4, VCPU_TB_RMEXIT
 	bl	kvmhv_accumulate_time
-12:	b	kvmhv_do_exit
+	b	guest_exit_cont
 
 /******************************************************************************
  *                                                                            *
@@ -1108,7 +1136,7 @@ guest_exit_cont:		/* r9 = vcpu, r12 = trap, r13 = paca */
 	stw	r7, VCPU_DSISR(r9)
 	/* don't overwrite fault_dar/fault_dsisr if HDSI */
 	cmpwi	r12,BOOK3S_INTERRUPT_H_DATA_STORAGE
-	beq	6f
+	beq	mc_cont
 	std	r6, VCPU_FAULT_DAR(r9)
 	stw	r7, VCPU_FAULT_DSISR(r9)
 
@@ -1120,8 +1148,11 @@ mc_cont:
 	mr	r4, r9
 	bl	kvmhv_accumulate_time
 
+	/* Increment exit count, poke other threads to exit */
+	bl	kvmhv_commence_exit
+
 	/* Save guest CTRL register, set runlatch to 1 */
-6:	mfspr	r6,SPRN_CTRLF
+	mfspr	r6,SPRN_CTRLF
 	stw	r6,VCPU_CTRL(r9)
 	andi.	r0,r6,1
 	bne	4f
@@ -1463,83 +1494,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	slbia
 	ptesync
 
-kvmhv_do_exit:			/* r12 = trap, r13 = paca */
 	/*
 	 * POWER7/POWER8 guest -> host partition switch code.
 	 * We don't have to lock against tlbies but we do
 	 * have to coordinate the hardware threads.
 	 */
-	/* Increment the threads-exiting-guest count in the 0xff00
-	   bits of vcore->entry_exit_count */
-	ld	r5,HSTATE_KVM_VCORE(r13)
-	addi	r6,r5,VCORE_ENTRY_EXIT
-41:	lwarx	r3,0,r6
-	addi	r0,r3,0x100
-	stwcx.	r0,0,r6
-	bne	41b
-	isync		/* order stwcx. vs. reading napping_threads */
-
-	/*
-	 * At this point we have an interrupt that we have to pass
-	 * up to the kernel or qemu; we can't handle it in real mode.
-	 * Thus we have to do a partition switch, so we have to
-	 * collect the other threads, if we are the first thread
-	 * to take an interrupt.  To do this, we set the HDEC to 0,
-	 * which causes an HDEC interrupt in all threads within 2ns
-	 * because the HDEC register is shared between all 4 threads.
-	 * However, we don't need to bother if this is an HDEC
-	 * interrupt, since the other threads will already be on their
-	 * way here in that case.
-	 */
-	cmpwi	r3,0x100	/* Are we the first here? */
-	bge	43f
-	cmpwi	r12,BOOK3S_INTERRUPT_HV_DECREMENTER
-	beq	43f
-	li	r0,0
-	mtspr	SPRN_HDEC,r0
-
-	/*
-	 * Send a message or IPI to any napping threads, since an HDEC interrupt
-	 * doesn't wake CPUs up from nap.
-	 */
-	lwz	r3,VCORE_NAPPING_THREADS(r5)
-	lbz	r4,HSTATE_PTID(r13)
-	li	r0,1
-	sld	r0,r0,r4
-	andc.	r3,r3,r0		/* no sense IPI'ing ourselves */
-	beq	43f
-	/* Order entry/exit update vs. IPIs */
-	sync
-BEGIN_FTR_SECTION
-	b	45f
-END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
-	/* Use msgsnd on POWER8 */
-	lhz	r6, VCORE_PCPU(r5)
-	clrldi	r6, r6, 64-3
-	oris	r6, r6, (PPC_DBELL_SERVER << (63-36))@h
-42:	andi.	r0, r3, 1
-	beq	44f
-	PPC_MSGSND(6)
-44:	srdi.	r3, r3, 1
-	addi	r6, r6, 1
-	bne	42b
-	b	kvmhv_switch_to_host
-	/* Use IPIs on POWER7 */
-45:	mulli	r4,r4,PACA_SIZE		/* get paca for thread 0 */
-	subf	r6,r4,r13
-42:	andi.	r0,r3,1
-	beq	44f
-	ld	r8,HSTATE_XICS_PHYS(r6)	/* get thread's XICS reg addr */
-	li	r0,IPI_PRIORITY
-	li	r7,XICS_MFRR
-	stbcix	r0,r7,r8		/* trigger the IPI */
-44:	srdi.	r3,r3,1
-	addi	r6,r6,PACA_SIZE
-	bne	42b
-
 kvmhv_switch_to_host:
 	/* Secondary threads wait for primary to do partition switch */
-43:	ld	r5,HSTATE_KVM_VCORE(r13)
+	ld	r5,HSTATE_KVM_VCORE(r13)
 	ld	r4,VCORE_KVM(r5)	/* pointer to struct kvm */
 	lbz	r3,HSTATE_PTID(r13)
 	cmpwi	r3,0
@@ -1639,6 +1601,84 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	mtlr	r0
 	blr
 
+kvmhv_commence_exit:		/* r12 = trap, r13 = paca, doesn't trash r9 */
+	mflr	r0
+	std	r0, PPC_LR_STKOFF(r1)
+	stdu	r1, -PPC_MIN_STKFRM(r1)
+
+	/* Increment the threads-exiting-guest count in the 0xff00
+	   bits of vcore->entry_exit_count */
+	ld	r5,HSTATE_KVM_VCORE(r13)
+	addi	r6,r5,VCORE_ENTRY_EXIT
+41:	lwarx	r3,0,r6
+	addi	r0,r3,0x100
+	stwcx.	r0,0,r6
+	bne	41b
+	isync		/* order stwcx. vs. reading napping_threads */
+
+	/*
+	 * At this point we have an interrupt that we have to pass
+	 * up to the kernel or qemu; we can't handle it in real mode.
+	 * Thus we have to do a partition switch, so we have to
+	 * collect the other threads, if we are the first thread
+	 * to take an interrupt.  To do this, we set the HDEC to 0,
+	 * which causes an HDEC interrupt in all threads within 2ns
+	 * because the HDEC register is shared between all 4 threads.
+	 * However, we don't need to bother if this is an HDEC
+	 * interrupt, since the other threads will already be on their
+	 * way here in that case.
+	 */
+	cmpwi	r3,0x100	/* Are we the first here? */
+	bge	43f
+	cmpwi	r12,BOOK3S_INTERRUPT_HV_DECREMENTER
+	beq	43f
+	li	r0,0
+	mtspr	SPRN_HDEC,r0
+
+	/*
+	 * Send a message or IPI to any napping threads, since an HDEC interrupt
+	 * doesn't wake CPUs up from nap.
+	 */
+	lwz	r3,VCORE_NAPPING_THREADS(r5)
+	lbz	r4,HSTATE_PTID(r13)
+	li	r0,1
+	sld	r0,r0,r4
+	andc.	r3,r3,r0		/* no sense IPI'ing ourselves */
+	beq	43f
+	/* Order entry/exit update vs. IPIs */
+	sync
+BEGIN_FTR_SECTION
+	b	45f
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
+	/* Use msgsnd on POWER8 */
+	lhz	r6, VCORE_PCPU(r5)
+	clrldi	r6, r6, 64-3
+	oris	r6, r6, (PPC_DBELL_SERVER << (63-36))@h
+42:	andi.	r0, r3, 1
+	beq	44f
+	PPC_MSGSND(6)
+44:	srdi.	r3, r3, 1
+	addi	r6, r6, 1
+	bne	42b
+	b	43f
+	/* Use IPIs on POWER7 */
+45:	mulli	r4,r4,PACA_SIZE		/* get paca for thread 0 */
+	subf	r6,r4,r13
+42:	andi.	r0,r3,1
+	beq	44f
+	ld	r8,HSTATE_XICS_PHYS(r6)	/* get thread's XICS reg addr */
+	li	r0,IPI_PRIORITY
+	li	r7,XICS_MFRR
+	stbcix	r0,r7,r8		/* trigger the IPI */
+44:	srdi.	r3,r3,1
+	addi	r6,r6,PACA_SIZE
+	bne	42b
+
+43:	ld	r0, PPC_MIN_STKFRM+PPC_LR_STKOFF(r1)
+	addi	r1, r1, PPC_MIN_STKFRM
+	mtlr	r0
+	blr
+
 /*
  * Check whether an HDSI is an HPTE not found fault or something else.
  * If it is an HPTE not found fault that is due to the guest accessing
@@ -2074,8 +2114,8 @@ _GLOBAL(kvmppc_h_cede)		/* r3 = vcpu pointer, r11 = msr, r13 = paca */
 	lbz	r5,VCPU_PRODDED(r3)
 	cmpwi	r5,0
 	bne	kvm_cede_prodded
-	li	r0,0		/* set trap to 0 to say hcall is handled */
-	stw	r0,VCPU_TRAP(r3)
+	li	r12,0		/* set trap to 0 to say hcall is handled */
+	stw	r12,VCPU_TRAP(r3)
 	li	r0,H_SUCCESS
 	std	r0,VCPU_GPR(R3)(r3)
 
@@ -2279,7 +2319,8 @@ kvm_cede_prodded:
 
 	/* we've ceded but we want to give control to the host */
 kvm_cede_exit:
-	b	hcall_real_fallback
+	ld	r9, HSTATE_KVM_VCPU(r13)
+	b	guest_exit_cont
 
 	/* Try to handle a machine check in real mode */
 machine_check_realmode:
@@ -2417,6 +2458,7 @@ kvmppc_read_intr:
 	bne-	43f
 
 	/* OK, it's an IPI for us */
+	li	r12, 0
 	li	r3, -1
 1:	blr
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 21/23] KVM: PPC: Book3S HV: Streamline guest entry and exit
@ 2015-03-20  9:39   ` Paul Mackerras
  0 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

On entry to the guest, secondary threads now wait for the primary to
switch the MMU after loading up most of their state, rather than before.
This means that the secondary threads get into the guest sooner, in the
common case where the secondary threads get to kvmppc_hv_entry before
the primary thread.

On exit, the first thread out increments the exit count and interrupts
the other threads (to get them out of the guest) before saving most
of its state, rather than after.  That means that the other threads
exit sooner and means that the first thread doesn't spend so much
time waiting for the other threads at the point where the MMU gets
switched back to the host.

This pulls out the code that increments the exit count and interrupts
other threads into a separate function, kvmhv_commence_exit().
This also makes sure that r12 and vcpu->arch.trap are set correctly
in some corner cases.

Statistics from /sys/kernel/debug/kvm/vm*/vcpu*/timings show the
improvement.  Aggregating across vcpus for a guest with 32 vcpus,
8 threads/vcore, running on a POWER8, gives this before the change:

 rm_entry:     avg 3919.3ns (244 - 56492, 742665 samples)
  rm_exit:     avg 4102.5ns (130 - 36272, 704056 samples)
  rm_intr:     avg 1006.0ns (12 - 75040, 2819905 samples)

and this after the change:

 rm_entry:     avg 2979.8ns (258 - 83740, 836403 samples)
  rm_exit:     avg 3992.9ns (12 - 45572, 838034 samples)
  rm_intr:     avg  922.2ns (12 - 66694, 3127066 samples)

showing a substantial reduction in the time spent in the real-mode
guest entry code, and smaller reductions in the real mode guest exit
and interrupt handling times.  (The test was to start the guest and
boot Fedora 20 big-endian to the login prompt.)

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 240 +++++++++++++++++++-------------
 1 file changed, 141 insertions(+), 99 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 04728ce..ff1461d 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -175,6 +175,19 @@ kvmppc_primary_no_guest:
 	/* put the HDEC into the DEC, since HDEC interrupts don't wake us */
 	mfspr	r3, SPRN_HDEC
 	mtspr	SPRN_DEC, r3
+	/*
+	 * Make sure the primary has finished the MMU switch.
+	 * We should never get here on a secondary thread, but
+	 * check it for robustness' sake.
+	 */
+	ld	r5, HSTATE_KVM_VCORE(r13)
+65:	lbz	r0, VCORE_IN_GUEST(r5)
+	cmpwi	r0, 0
+	beq	65b
+	/* Set LPCR. */
+	ld	r8,VCORE_LPCR(r5)
+	mtspr	SPRN_LPCR,r8
+	isync
 	/* set our bit in napping_threads */
 	ld	r5, HSTATE_KVM_VCORE(r13)
 	lbz	r7, HSTATE_PTID(r13)
@@ -206,7 +219,7 @@ kvm_novcpu_wakeup:
 
 	/* check the wake reason */
 	bl	kvmppc_check_wake_reason
-	
+
 	/* see if any other thread is already exiting */
 	lwz	r0, VCORE_ENTRY_EXIT(r5)
 	cmpwi	r0, 0x100
@@ -243,7 +256,12 @@ kvm_novcpu_wakeup:
 
 kvm_novcpu_exit:
 	ld	r4, HSTATE_KVM_VCPU(r13)
-	b	hdec_soon
+	cmpdi	r4, 0
+	beq	13f
+	addi	r3, r4, VCPU_TB_RMEXIT
+	bl	kvmhv_accumulate_time
+13:	bl	kvmhv_commence_exit
+	b	kvmhv_switch_to_host
 
 /*
  * We come in here when wakened from nap mode.
@@ -417,7 +435,7 @@ kvmppc_hv_entry:
 	ld	r9,VCORE_KVM(r5)	/* pointer to struct kvm */
 	lbz	r6,HSTATE_PTID(r13)
 	cmpwi	r6,0
-	bne	20f
+	bne	10f
 	ld	r6,KVM_SDR1(r9)
 	lwz	r7,KVM_LPID(r9)
 	li	r0,LPID_RSVD		/* switch to reserved LPID */
@@ -488,26 +506,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 
 	li	r0,1
 	stb	r0,VCORE_IN_GUEST(r5)	/* signal secondaries to continue */
-	b	10f
-
-	/* Secondary threads wait for primary to have done partition switch */
-20:	lbz	r0,VCORE_IN_GUEST(r5)
-	cmpwi	r0,0
-	beq	20b
-
-	/* Set LPCR. */
-10:	ld	r8,VCORE_LPCR(r5)
-	mtspr	SPRN_LPCR,r8
-	isync
-
-	/* Check if HDEC expires soon */
-	mfspr	r3,SPRN_HDEC
-	cmpwi	r3,512		/* 1 microsecond */
-	li	r12,BOOK3S_INTERRUPT_HV_DECREMENTER
-	blt	hdec_soon
 
 	/* Do we have a guest vcpu to run? */
-	cmpdi	r4, 0
+10:	cmpdi	r4, 0
 	beq	kvmppc_primary_no_guest
 kvmppc_got_guest:
 
@@ -832,6 +833,30 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 	clrrdi	r6,r6,1
 	mtspr	SPRN_CTRLT,r6
 4:
+	/* Secondary threads wait for primary to have done partition switch */
+	ld	r5, HSTATE_KVM_VCORE(r13)
+	lbz	r6, HSTATE_PTID(r13)
+	cmpwi	r6, 0
+	beq	21f
+	lbz	r0, VCORE_IN_GUEST(r5)
+	cmpwi	r0, 0
+	bne	21f
+	HMT_LOW
+20:	lbz	r0, VCORE_IN_GUEST(r5)
+	cmpwi	r0, 0
+	beq	20b
+	HMT_MEDIUM
+21:
+	/* Set LPCR. */
+	ld	r8,VCORE_LPCR(r5)
+	mtspr	SPRN_LPCR,r8
+	isync
+
+	/* Check if HDEC expires soon */
+	mfspr	r3, SPRN_HDEC
+	cmpwi	r3, 512		/* 1 microsecond */
+	blt	hdec_soon
+
 	ld	r6, VCPU_CTR(r4)
 	lwz	r7, VCPU_XER(r4)
 
@@ -936,18 +961,21 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
 	b	.
 
 secondary_too_late:
+	li	r12, 0
 	cmpdi	r4, 0
 	beq	11f
+	stw	r12, VCPU_TRAP(r4)
 	addi	r3, r4, VCPU_TB_RMEXIT
 	bl	kvmhv_accumulate_time
 11:	b	kvmhv_switch_to_host
 
 hdec_soon:
-	cmpdi	r4, 0
-	beq	12f
+	li	r12, BOOK3S_INTERRUPT_HV_DECREMENTER
+	stw	r12, VCPU_TRAP(r4)
+	mr	r9, r4
 	addi	r3, r4, VCPU_TB_RMEXIT
 	bl	kvmhv_accumulate_time
-12:	b	kvmhv_do_exit
+	b	guest_exit_cont
 
 /******************************************************************************
  *                                                                            *
@@ -1108,7 +1136,7 @@ guest_exit_cont:		/* r9 = vcpu, r12 = trap, r13 = paca */
 	stw	r7, VCPU_DSISR(r9)
 	/* don't overwrite fault_dar/fault_dsisr if HDSI */
 	cmpwi	r12,BOOK3S_INTERRUPT_H_DATA_STORAGE
-	beq	6f
+	beq	mc_cont
 	std	r6, VCPU_FAULT_DAR(r9)
 	stw	r7, VCPU_FAULT_DSISR(r9)
 
@@ -1120,8 +1148,11 @@ mc_cont:
 	mr	r4, r9
 	bl	kvmhv_accumulate_time
 
+	/* Increment exit count, poke other threads to exit */
+	bl	kvmhv_commence_exit
+
 	/* Save guest CTRL register, set runlatch to 1 */
-6:	mfspr	r6,SPRN_CTRLF
+	mfspr	r6,SPRN_CTRLF
 	stw	r6,VCPU_CTRL(r9)
 	andi.	r0,r6,1
 	bne	4f
@@ -1463,83 +1494,14 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	slbia
 	ptesync
 
-kvmhv_do_exit:			/* r12 = trap, r13 = paca */
 	/*
 	 * POWER7/POWER8 guest -> host partition switch code.
 	 * We don't have to lock against tlbies but we do
 	 * have to coordinate the hardware threads.
 	 */
-	/* Increment the threads-exiting-guest count in the 0xff00
-	   bits of vcore->entry_exit_count */
-	ld	r5,HSTATE_KVM_VCORE(r13)
-	addi	r6,r5,VCORE_ENTRY_EXIT
-41:	lwarx	r3,0,r6
-	addi	r0,r3,0x100
-	stwcx.	r0,0,r6
-	bne	41b
-	isync		/* order stwcx. vs. reading napping_threads */
-
-	/*
-	 * At this point we have an interrupt that we have to pass
-	 * up to the kernel or qemu; we can't handle it in real mode.
-	 * Thus we have to do a partition switch, so we have to
-	 * collect the other threads, if we are the first thread
-	 * to take an interrupt.  To do this, we set the HDEC to 0,
-	 * which causes an HDEC interrupt in all threads within 2ns
-	 * because the HDEC register is shared between all 4 threads.
-	 * However, we don't need to bother if this is an HDEC
-	 * interrupt, since the other threads will already be on their
-	 * way here in that case.
-	 */
-	cmpwi	r3,0x100	/* Are we the first here? */
-	bge	43f
-	cmpwi	r12,BOOK3S_INTERRUPT_HV_DECREMENTER
-	beq	43f
-	li	r0,0
-	mtspr	SPRN_HDEC,r0
-
-	/*
-	 * Send a message or IPI to any napping threads, since an HDEC interrupt
-	 * doesn't wake CPUs up from nap.
-	 */
-	lwz	r3,VCORE_NAPPING_THREADS(r5)
-	lbz	r4,HSTATE_PTID(r13)
-	li	r0,1
-	sld	r0,r0,r4
-	andc.	r3,r3,r0		/* no sense IPI'ing ourselves */
-	beq	43f
-	/* Order entry/exit update vs. IPIs */
-	sync
-BEGIN_FTR_SECTION
-	b	45f
-END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
-	/* Use msgsnd on POWER8 */
-	lhz	r6, VCORE_PCPU(r5)
-	clrldi	r6, r6, 64-3
-	oris	r6, r6, (PPC_DBELL_SERVER << (63-36))@h
-42:	andi.	r0, r3, 1
-	beq	44f
-	PPC_MSGSND(6)
-44:	srdi.	r3, r3, 1
-	addi	r6, r6, 1
-	bne	42b
-	b	kvmhv_switch_to_host
-	/* Use IPIs on POWER7 */
-45:	mulli	r4,r4,PACA_SIZE		/* get paca for thread 0 */
-	subf	r6,r4,r13
-42:	andi.	r0,r3,1
-	beq	44f
-	ld	r8,HSTATE_XICS_PHYS(r6)	/* get thread's XICS reg addr */
-	li	r0,IPI_PRIORITY
-	li	r7,XICS_MFRR
-	stbcix	r0,r7,r8		/* trigger the IPI */
-44:	srdi.	r3,r3,1
-	addi	r6,r6,PACA_SIZE
-	bne	42b
-
 kvmhv_switch_to_host:
 	/* Secondary threads wait for primary to do partition switch */
-43:	ld	r5,HSTATE_KVM_VCORE(r13)
+	ld	r5,HSTATE_KVM_VCORE(r13)
 	ld	r4,VCORE_KVM(r5)	/* pointer to struct kvm */
 	lbz	r3,HSTATE_PTID(r13)
 	cmpwi	r3,0
@@ -1639,6 +1601,84 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	mtlr	r0
 	blr
 
+kvmhv_commence_exit:		/* r12 = trap, r13 = paca, doesn't trash r9 */
+	mflr	r0
+	std	r0, PPC_LR_STKOFF(r1)
+	stdu	r1, -PPC_MIN_STKFRM(r1)
+
+	/* Increment the threads-exiting-guest count in the 0xff00
+	   bits of vcore->entry_exit_count */
+	ld	r5,HSTATE_KVM_VCORE(r13)
+	addi	r6,r5,VCORE_ENTRY_EXIT
+41:	lwarx	r3,0,r6
+	addi	r0,r3,0x100
+	stwcx.	r0,0,r6
+	bne	41b
+	isync		/* order stwcx. vs. reading napping_threads */
+
+	/*
+	 * At this point we have an interrupt that we have to pass
+	 * up to the kernel or qemu; we can't handle it in real mode.
+	 * Thus we have to do a partition switch, so we have to
+	 * collect the other threads, if we are the first thread
+	 * to take an interrupt.  To do this, we set the HDEC to 0,
+	 * which causes an HDEC interrupt in all threads within 2ns
+	 * because the HDEC register is shared between all 4 threads.
+	 * However, we don't need to bother if this is an HDEC
+	 * interrupt, since the other threads will already be on their
+	 * way here in that case.
+	 */
+	cmpwi	r3,0x100	/* Are we the first here? */
+	bge	43f
+	cmpwi	r12,BOOK3S_INTERRUPT_HV_DECREMENTER
+	beq	43f
+	li	r0,0
+	mtspr	SPRN_HDEC,r0
+
+	/*
+	 * Send a message or IPI to any napping threads, since an HDEC interrupt
+	 * doesn't wake CPUs up from nap.
+	 */
+	lwz	r3,VCORE_NAPPING_THREADS(r5)
+	lbz	r4,HSTATE_PTID(r13)
+	li	r0,1
+	sld	r0,r0,r4
+	andc.	r3,r3,r0		/* no sense IPI'ing ourselves */
+	beq	43f
+	/* Order entry/exit update vs. IPIs */
+	sync
+BEGIN_FTR_SECTION
+	b	45f
+END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
+	/* Use msgsnd on POWER8 */
+	lhz	r6, VCORE_PCPU(r5)
+	clrldi	r6, r6, 64-3
+	oris	r6, r6, (PPC_DBELL_SERVER << (63-36))@h
+42:	andi.	r0, r3, 1
+	beq	44f
+	PPC_MSGSND(6)
+44:	srdi.	r3, r3, 1
+	addi	r6, r6, 1
+	bne	42b
+	b	43f
+	/* Use IPIs on POWER7 */
+45:	mulli	r4,r4,PACA_SIZE		/* get paca for thread 0 */
+	subf	r6,r4,r13
+42:	andi.	r0,r3,1
+	beq	44f
+	ld	r8,HSTATE_XICS_PHYS(r6)	/* get thread's XICS reg addr */
+	li	r0,IPI_PRIORITY
+	li	r7,XICS_MFRR
+	stbcix	r0,r7,r8		/* trigger the IPI */
+44:	srdi.	r3,r3,1
+	addi	r6,r6,PACA_SIZE
+	bne	42b
+
+43:	ld	r0, PPC_MIN_STKFRM+PPC_LR_STKOFF(r1)
+	addi	r1, r1, PPC_MIN_STKFRM
+	mtlr	r0
+	blr
+
 /*
  * Check whether an HDSI is an HPTE not found fault or something else.
  * If it is an HPTE not found fault that is due to the guest accessing
@@ -2074,8 +2114,8 @@ _GLOBAL(kvmppc_h_cede)		/* r3 = vcpu pointer, r11 = msr, r13 = paca */
 	lbz	r5,VCPU_PRODDED(r3)
 	cmpwi	r5,0
 	bne	kvm_cede_prodded
-	li	r0,0		/* set trap to 0 to say hcall is handled */
-	stw	r0,VCPU_TRAP(r3)
+	li	r12,0		/* set trap to 0 to say hcall is handled */
+	stw	r12,VCPU_TRAP(r3)
 	li	r0,H_SUCCESS
 	std	r0,VCPU_GPR(R3)(r3)
 
@@ -2279,7 +2319,8 @@ kvm_cede_prodded:
 
 	/* we've ceded but we want to give control to the host */
 kvm_cede_exit:
-	b	hcall_real_fallback
+	ld	r9, HSTATE_KVM_VCPU(r13)
+	b	guest_exit_cont
 
 	/* Try to handle a machine check in real mode */
 machine_check_realmode:
@@ -2417,6 +2458,7 @@ kvmppc_read_intr:
 	bne-	43f
 
 	/* OK, it's an IPI for us */
+	li	r12, 0
 	li	r3, -1
 1:	blr
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 22/23] KVM: PPC: Book3S HV: Use bitmap of active threads rather than count
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20  9:39   ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

Currently, the entry_exit_count field in the kvmppc_vcore struct
contains two 8-bit counts, one of the threads that have started entering
the guest, and one of the threads that have started exiting the guest.
This changes it to an entry_exit_map field which contains two bitmaps
of 8 bits each.  The advantage of doing this is that it gives us a
bitmap of which threads need to be signalled when exiting the guest.
That means that we no longer need to use the trick of setting the
HDEC to 0 to pull the other threads out of the guest, which led in
some cases to a spurious HDEC interrupt on the next guest entry.

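The lock-free entry can be pictured in C roughly as below (a sketch using GCC
atomic builtins; the kernel does it with an lwarx/stwcx. loop in assembly).
The exit path sets 0x100 << ptid in the same word, which is what
kvmhv_commence_exit then uses to decide which threads to signal.

#include <stdbool.h>

/* Bottom 8 bits: threads that have entered; next 8 bits: threads that have exited */
static bool try_set_entry_bit(int *entry_exit_map, int ptid)
{
	int old = __atomic_load_n(entry_exit_map, __ATOMIC_RELAXED);
	int newv;

	do {
		if (old & 0xff00)		/* somebody is already exiting */
			return false;		/* -> secondary_too_late */
		newv = old | (1 << ptid);	/* set our bit in the entry map */
	} while (!__atomic_compare_exchange_n(entry_exit_map, &old, newv, false,
					      __ATOMIC_ACQUIRE, __ATOMIC_RELAXED));
	return true;
}
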
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h     | 15 ++++----
 arch/powerpc/kernel/asm-offsets.c       |  2 +-
 arch/powerpc/kvm/book3s_hv.c            |  5 ++-
 arch/powerpc/kvm/book3s_hv_builtin.c    | 10 +++---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 63 +++++++++++++++------------------
 5 files changed, 45 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index f6d4232..c2b9551 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -263,15 +263,15 @@ struct kvm_arch {
 
 /*
  * Struct for a virtual core.
- * Note: entry_exit_count combines an entry count in the bottom 8 bits
- * and an exit count in the next 8 bits.  This is so that we can
- * atomically increment the entry count iff the exit count is 0
- * without taking the lock.
+ * Note: entry_exit_map combines a bitmap of threads that have entered
+ * in the bottom 8 bits and a bitmap of threads that have exited in the
+ * next 8 bits.  This is so that we can atomically set the entry bit
+ * iff the exit map is 0 without taking a lock.
  */
 struct kvmppc_vcore {
 	int n_runnable;
 	int num_threads;
-	int entry_exit_count;
+	int entry_exit_map;
 	int napping_threads;
 	int first_vcpuid;
 	u16 pcpu;
@@ -296,8 +296,9 @@ struct kvmppc_vcore {
 	ulong conferring_threads;
 };
 
-#define VCORE_ENTRY_COUNT(vc)	((vc)->entry_exit_count & 0xff)
-#define VCORE_EXIT_COUNT(vc)	((vc)->entry_exit_count >> 8)
+#define VCORE_ENTRY_MAP(vc)	((vc)->entry_exit_map & 0xff)
+#define VCORE_EXIT_MAP(vc)	((vc)->entry_exit_map >> 8)
+#define VCORE_IS_EXITING(vc)	(VCORE_EXIT_MAP(vc) != 0)
 
 /* Values for vcore_state */
 #define VCORE_INACTIVE	0
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 0ce2aa6..ed348e5 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -561,7 +561,7 @@ int main(void)
 	DEFINE(VCPU_ACOP, offsetof(struct kvm_vcpu, arch.acop));
 	DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort));
 	DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1));
-	DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_count));
+	DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_map));
 	DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
 	DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, napping_threads));
 	DEFINE(VCORE_KVM, offsetof(struct kvmppc_vcore, kvm));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2c34bae..9ea0eb5 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1941,7 +1941,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 	/*
 	 * Initialize *vc.
 	 */
-	vc->entry_exit_count = 0;
+	vc->entry_exit_map = 0;
 	vc->preempt_tb = TB_NIL;
 	vc->in_guest = 0;
 	vc->napping_threads = 0;
@@ -2108,8 +2108,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 	 * this thread straight away and have it join in.
 	 */
 	if (!signal_pending(current)) {
-		if (vc->vcore_state == VCORE_RUNNING &&
-		    VCORE_EXIT_COUNT(vc) == 0) {
+		if (vc->vcore_state == VCORE_RUNNING && !VCORE_IS_EXITING(vc)) {
 			kvmppc_create_dtl_entry(vcpu, vc);
 			kvmppc_start_thread(vcpu);
 			trace_kvm_guest_enter(vcpu);
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 1954a1c..2754251 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -115,11 +115,11 @@ long int kvmppc_rm_h_confer(struct kvm_vcpu *vcpu, int target,
 	int rv = H_SUCCESS; /* => don't yield */
 
 	set_bit(vcpu->arch.ptid, &vc->conferring_threads);
-	while ((get_tb() < stop) && (VCORE_EXIT_COUNT(vc) == 0)) {
-		threads_running = VCORE_ENTRY_COUNT(vc);
-		threads_ceded = hweight32(vc->napping_threads);
-		threads_conferring = hweight32(vc->conferring_threads);
-		if (threads_ceded + threads_conferring >= threads_running) {
+	while ((get_tb() < stop) && !VCORE_IS_EXITING(vc)) {
+		threads_running = VCORE_ENTRY_MAP(vc);
+		threads_ceded = vc->napping_threads;
+		threads_conferring = vc->conferring_threads;
+		if ((threads_ceded | threads_conferring) == threads_running) {
 			rv = H_TOO_HARD; /* => do yield */
 			break;
 		}
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index ff1461d..854ddac 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -198,7 +198,7 @@ kvmppc_primary_no_guest:
 	or	r3, r3, r0
 	stwcx.	r3, 0, r6
 	bne	1b
-	/* order napping_threads update vs testing entry_exit_count */
+	/* order napping_threads update vs testing entry_exit_map */
 	isync
 	li	r12, 0
 	lwz	r7, VCORE_ENTRY_EXIT(r5)
@@ -421,19 +421,21 @@ kvmppc_hv_entry:
 	 * We don't have to lock against concurrent tlbies,
 	 * but we do have to coordinate across hardware threads.
 	 */
-	/* Increment entry count iff exit count is zero. */
-	ld	r5,HSTATE_KVM_VCORE(r13)
-	addi	r9,r5,VCORE_ENTRY_EXIT
-21:	lwarx	r3,0,r9
-	cmpwi	r3,0x100		/* any threads starting to exit? */
+	/* Set bit in entry map iff exit map is zero. */
+	ld	r5, HSTATE_KVM_VCORE(r13)
+	li	r7, 1
+	lbz	r6, HSTATE_PTID(r13)
+	sld	r7, r7, r6
+	addi	r9, r5, VCORE_ENTRY_EXIT
+21:	lwarx	r3, 0, r9
+	cmpwi	r3, 0x100		/* any threads starting to exit? */
 	bge	secondary_too_late	/* if so we're too late to the party */
-	addi	r3,r3,1
-	stwcx.	r3,0,r9
+	or	r3, r3, r7
+	stwcx.	r3, 0, r9
 	bne	21b
 
 	/* Primary thread switches to guest partition. */
 	ld	r9,VCORE_KVM(r5)	/* pointer to struct kvm */
-	lbz	r6,HSTATE_PTID(r13)
 	cmpwi	r6,0
 	bne	10f
 	ld	r6,KVM_SDR1(r9)
@@ -1606,13 +1608,16 @@ kvmhv_commence_exit:		/* r12 = trap, r13 = paca, doesn't trash r9 */
 	std	r0, PPC_LR_STKOFF(r1)
 	stdu	r1, -PPC_MIN_STKFRM(r1)
 
-	/* Increment the threads-exiting-guest count in the 0xff00
-	   bits of vcore->entry_exit_count */
-	ld	r5,HSTATE_KVM_VCORE(r13)
-	addi	r6,r5,VCORE_ENTRY_EXIT
-41:	lwarx	r3,0,r6
-	addi	r0,r3,0x100
-	stwcx.	r0,0,r6
+	/* Set our bit in the threads-exiting-guest map in the 0xff00
+	   bits of vcore->entry_exit_map */
+	ld	r5, HSTATE_KVM_VCORE(r13)
+	lbz	r4, HSTATE_PTID(r13)
+	li	r7, 0x100
+	sld	r7, r7, r4
+	addi	r6, r5, VCORE_ENTRY_EXIT
+41:	lwarx	r3, 0, r6
+	or	r0, r3, r7
+	stwcx.	r0, 0, r6
 	bne	41b
 	isync		/* order stwcx. vs. reading napping_threads */
 
@@ -1621,9 +1626,9 @@ kvmhv_commence_exit:		/* r12 = trap, r13 = paca, doesn't trash r9 */
 	 * up to the kernel or qemu; we can't handle it in real mode.
 	 * Thus we have to do a partition switch, so we have to
 	 * collect the other threads, if we are the first thread
-	 * to take an interrupt.  To do this, we set the HDEC to 0,
-	 * which causes an HDEC interrupt in all threads within 2ns
-	 * because the HDEC register is shared between all 4 threads.
+	 * to take an interrupt.  To do this, we send a message or
+	 * IPI to all the threads that have their bit set in the entry
+	 * map in vcore->entry_exit_map (other than ourselves).
 	 * However, we don't need to bother if this is an HDEC
 	 * interrupt, since the other threads will already be on their
 	 * way here in that case.
@@ -1632,18 +1637,9 @@ kvmhv_commence_exit:		/* r12 = trap, r13 = paca, doesn't trash r9 */
 	bge	43f
 	cmpwi	r12,BOOK3S_INTERRUPT_HV_DECREMENTER
 	beq	43f
-	li	r0,0
-	mtspr	SPRN_HDEC,r0
 
-	/*
-	 * Send a message or IPI to any napping threads, since an HDEC interrupt
-	 * doesn't wake CPUs up from nap.
-	 */
-	lwz	r3,VCORE_NAPPING_THREADS(r5)
-	lbz	r4,HSTATE_PTID(r13)
-	li	r0,1
-	sld	r0,r0,r4
-	andc.	r3,r3,r0		/* no sense IPI'ing ourselves */
+	srwi	r0, r7, 8
+	andc.	r3, r3, r0		/* no sense IPI'ing ourselves */
 	beq	43f
 	/* Order entry/exit update vs. IPIs */
 	sync
@@ -2133,12 +2129,11 @@ _GLOBAL(kvmppc_h_cede)		/* r3 = vcpu pointer, r11 = msr, r13 = paca */
 	addi	r6,r5,VCORE_NAPPING_THREADS
 31:	lwarx	r4,0,r6
 	or	r4,r4,r0
-	PPC_POPCNTW(R7,R4)
-	cmpw	r7,r8
-	bge	kvm_cede_exit
+	cmpw	r4,r8
+	beq	kvm_cede_exit
 	stwcx.	r4,0,r6
 	bne	31b
-	/* order napping_threads update vs testing entry_exit_count */
+	/* order napping_threads update vs testing entry_exit_map */
 	isync
 	li	r0,NAPPING_CEDE
 	stb	r0,HSTATE_NAPPING(r13)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 22/23] KVM: PPC: Book3S HV: Use bitmap of active threads rather than count
@ 2015-03-20  9:39   ` Paul Mackerras
  0 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:39 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

Currently, the entry_exit_count field in the kvmppc_vcore struct
contains two 8-bit counts, one of the threads that have started entering
the guest, and one of the threads that have started exiting the guest.
This changes it to an entry_exit_map field which contains two bitmaps
of 8 bits each.  The advantage of doing this is that it gives us a
bitmap of which threads need to be signalled when exiting the guest.
That means that we no longer need to use the trick of setting the
HDEC to 0 to pull the other threads out of the guest, which led in
some cases to a spurious HDEC interrupt on the next guest entry.
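
To make the new layout concrete, here is a small standalone sketch (not
part of the patch; try_enter() and the ENTRY_MAP/EXIT_MAP helpers are
illustrative stand-ins for the real-mode lwarx/stwcx. loop and the
VCORE_ENTRY_MAP/VCORE_EXIT_MAP macros) showing how a thread's entry bit
only gets set while the exit map is still clear:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* entry map in bits 0-7, exit map in bits 8-15 */
    #define ENTRY_MAP(m)  ((m) & 0xff)
    #define EXIT_MAP(m)   ((m) >> 8)

    /* Set this thread's entry bit iff no thread has started to exit. */
    static bool try_enter(atomic_int *map, int ptid)
    {
        int old = atomic_load(map);

        do {
            if (EXIT_MAP(old))      /* someone is already exiting */
                return false;
        } while (!atomic_compare_exchange_weak(map, &old, old | (1 << ptid)));
        return true;
    }

    int main(void)
    {
        atomic_int map = 0;

        printf("thread 2 enters: %d\n", try_enter(&map, 2));    /* 1 */
        atomic_fetch_or(&map, 0x100 << 2);  /* thread 2 starts to exit */
        printf("thread 3 enters: %d\n", try_enter(&map, 3));    /* 0 */
        return 0;
    }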

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_host.h     | 15 ++++----
 arch/powerpc/kernel/asm-offsets.c       |  2 +-
 arch/powerpc/kvm/book3s_hv.c            |  5 ++-
 arch/powerpc/kvm/book3s_hv_builtin.c    | 10 +++---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 63 +++++++++++++++------------------
 5 files changed, 45 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index f6d4232..c2b9551 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -263,15 +263,15 @@ struct kvm_arch {
 
 /*
  * Struct for a virtual core.
- * Note: entry_exit_count combines an entry count in the bottom 8 bits
- * and an exit count in the next 8 bits.  This is so that we can
- * atomically increment the entry count iff the exit count is 0
- * without taking the lock.
+ * Note: entry_exit_map combines a bitmap of threads that have entered
+ * in the bottom 8 bits and a bitmap of threads that have exited in the
+ * next 8 bits.  This is so that we can atomically set the entry bit
+ * iff the exit map is 0 without taking a lock.
  */
 struct kvmppc_vcore {
 	int n_runnable;
 	int num_threads;
-	int entry_exit_count;
+	int entry_exit_map;
 	int napping_threads;
 	int first_vcpuid;
 	u16 pcpu;
@@ -296,8 +296,9 @@ struct kvmppc_vcore {
 	ulong conferring_threads;
 };
 
-#define VCORE_ENTRY_COUNT(vc)	((vc)->entry_exit_count & 0xff)
-#define VCORE_EXIT_COUNT(vc)	((vc)->entry_exit_count >> 8)
+#define VCORE_ENTRY_MAP(vc)	((vc)->entry_exit_map & 0xff)
+#define VCORE_EXIT_MAP(vc)	((vc)->entry_exit_map >> 8)
+#define VCORE_IS_EXITING(vc)	(VCORE_EXIT_MAP(vc) != 0)
 
 /* Values for vcore_state */
 #define VCORE_INACTIVE	0
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 0ce2aa6..ed348e5 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -561,7 +561,7 @@ int main(void)
 	DEFINE(VCPU_ACOP, offsetof(struct kvm_vcpu, arch.acop));
 	DEFINE(VCPU_WORT, offsetof(struct kvm_vcpu, arch.wort));
 	DEFINE(VCPU_SHADOW_SRR1, offsetof(struct kvm_vcpu, arch.shadow_srr1));
-	DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_count));
+	DEFINE(VCORE_ENTRY_EXIT, offsetof(struct kvmppc_vcore, entry_exit_map));
 	DEFINE(VCORE_IN_GUEST, offsetof(struct kvmppc_vcore, in_guest));
 	DEFINE(VCORE_NAPPING_THREADS, offsetof(struct kvmppc_vcore, napping_threads));
 	DEFINE(VCORE_KVM, offsetof(struct kvmppc_vcore, kvm));
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2c34bae..9ea0eb5 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1941,7 +1941,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 	/*
 	 * Initialize *vc.
 	 */
-	vc->entry_exit_count = 0;
+	vc->entry_exit_map = 0;
 	vc->preempt_tb = TB_NIL;
 	vc->in_guest = 0;
 	vc->napping_threads = 0;
@@ -2108,8 +2108,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 	 * this thread straight away and have it join in.
 	 */
 	if (!signal_pending(current)) {
-		if (vc->vcore_state == VCORE_RUNNING &&
-		    VCORE_EXIT_COUNT(vc) == 0) {
+		if (vc->vcore_state == VCORE_RUNNING && !VCORE_IS_EXITING(vc)) {
 			kvmppc_create_dtl_entry(vcpu, vc);
 			kvmppc_start_thread(vcpu);
 			trace_kvm_guest_enter(vcpu);
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 1954a1c..2754251 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -115,11 +115,11 @@ long int kvmppc_rm_h_confer(struct kvm_vcpu *vcpu, int target,
 	int rv = H_SUCCESS; /* => don't yield */
 
 	set_bit(vcpu->arch.ptid, &vc->conferring_threads);
-	while ((get_tb() < stop) && (VCORE_EXIT_COUNT(vc) == 0)) {
-		threads_running = VCORE_ENTRY_COUNT(vc);
-		threads_ceded = hweight32(vc->napping_threads);
-		threads_conferring = hweight32(vc->conferring_threads);
-		if (threads_ceded + threads_conferring >= threads_running) {
+	while ((get_tb() < stop) && !VCORE_IS_EXITING(vc)) {
+		threads_running = VCORE_ENTRY_MAP(vc);
+		threads_ceded = vc->napping_threads;
+		threads_conferring = vc->conferring_threads;
+		if ((threads_ceded | threads_conferring) == threads_running) {
 			rv = H_TOO_HARD; /* => do yield */
 			break;
 		}
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index ff1461d..854ddac 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -198,7 +198,7 @@ kvmppc_primary_no_guest:
 	or	r3, r3, r0
 	stwcx.	r3, 0, r6
 	bne	1b
-	/* order napping_threads update vs testing entry_exit_count */
+	/* order napping_threads update vs testing entry_exit_map */
 	isync
 	li	r12, 0
 	lwz	r7, VCORE_ENTRY_EXIT(r5)
@@ -421,19 +421,21 @@ kvmppc_hv_entry:
 	 * We don't have to lock against concurrent tlbies,
 	 * but we do have to coordinate across hardware threads.
 	 */
-	/* Increment entry count iff exit count is zero. */
-	ld	r5,HSTATE_KVM_VCORE(r13)
-	addi	r9,r5,VCORE_ENTRY_EXIT
-21:	lwarx	r3,0,r9
-	cmpwi	r3,0x100		/* any threads starting to exit? */
+	/* Set bit in entry map iff exit map is zero. */
+	ld	r5, HSTATE_KVM_VCORE(r13)
+	li	r7, 1
+	lbz	r6, HSTATE_PTID(r13)
+	sld	r7, r7, r6
+	addi	r9, r5, VCORE_ENTRY_EXIT
+21:	lwarx	r3, 0, r9
+	cmpwi	r3, 0x100		/* any threads starting to exit? */
 	bge	secondary_too_late	/* if so we're too late to the party */
-	addi	r3,r3,1
-	stwcx.	r3,0,r9
+	or	r3, r3, r7
+	stwcx.	r3, 0, r9
 	bne	21b
 
 	/* Primary thread switches to guest partition. */
 	ld	r9,VCORE_KVM(r5)	/* pointer to struct kvm */
-	lbz	r6,HSTATE_PTID(r13)
 	cmpwi	r6,0
 	bne	10f
 	ld	r6,KVM_SDR1(r9)
@@ -1606,13 +1608,16 @@ kvmhv_commence_exit:		/* r12 = trap, r13 = paca, doesn't trash r9 */
 	std	r0, PPC_LR_STKOFF(r1)
 	stdu	r1, -PPC_MIN_STKFRM(r1)
 
-	/* Increment the threads-exiting-guest count in the 0xff00
-	   bits of vcore->entry_exit_count */
-	ld	r5,HSTATE_KVM_VCORE(r13)
-	addi	r6,r5,VCORE_ENTRY_EXIT
-41:	lwarx	r3,0,r6
-	addi	r0,r3,0x100
-	stwcx.	r0,0,r6
+	/* Set our bit in the threads-exiting-guest map in the 0xff00
+	   bits of vcore->entry_exit_map */
+	ld	r5, HSTATE_KVM_VCORE(r13)
+	lbz	r4, HSTATE_PTID(r13)
+	li	r7, 0x100
+	sld	r7, r7, r4
+	addi	r6, r5, VCORE_ENTRY_EXIT
+41:	lwarx	r3, 0, r6
+	or	r0, r3, r7
+	stwcx.	r0, 0, r6
 	bne	41b
 	isync		/* order stwcx. vs. reading napping_threads */
 
@@ -1621,9 +1626,9 @@ kvmhv_commence_exit:		/* r12 = trap, r13 = paca, doesn't trash r9 */
 	 * up to the kernel or qemu; we can't handle it in real mode.
 	 * Thus we have to do a partition switch, so we have to
 	 * collect the other threads, if we are the first thread
-	 * to take an interrupt.  To do this, we set the HDEC to 0,
-	 * which causes an HDEC interrupt in all threads within 2ns
-	 * because the HDEC register is shared between all 4 threads.
+	 * to take an interrupt.  To do this, we send a message or
+	 * IPI to all the threads that have their bit set in the entry
+	 * map in vcore->entry_exit_map (other than ourselves).
 	 * However, we don't need to bother if this is an HDEC
 	 * interrupt, since the other threads will already be on their
 	 * way here in that case.
@@ -1632,18 +1637,9 @@ kvmhv_commence_exit:		/* r12 = trap, r13 = paca, doesn't trash r9 */
 	bge	43f
 	cmpwi	r12,BOOK3S_INTERRUPT_HV_DECREMENTER
 	beq	43f
-	li	r0,0
-	mtspr	SPRN_HDEC,r0
 
-	/*
-	 * Send a message or IPI to any napping threads, since an HDEC interrupt
-	 * doesn't wake CPUs up from nap.
-	 */
-	lwz	r3,VCORE_NAPPING_THREADS(r5)
-	lbz	r4,HSTATE_PTID(r13)
-	li	r0,1
-	sld	r0,r0,r4
-	andc.	r3,r3,r0		/* no sense IPI'ing ourselves */
+	srwi	r0, r7, 8
+	andc.	r3, r3, r0		/* no sense IPI'ing ourselves */
 	beq	43f
 	/* Order entry/exit update vs. IPIs */
 	sync
@@ -2133,12 +2129,11 @@ _GLOBAL(kvmppc_h_cede)		/* r3 = vcpu pointer, r11 = msr, r13 = paca */
 	addi	r6,r5,VCORE_NAPPING_THREADS
 31:	lwarx	r4,0,r6
 	or	r4,r4,r0
-	PPC_POPCNTW(R7,R4)
-	cmpw	r7,r8
-	bge	kvm_cede_exit
+	cmpw	r4,r8
+	beq	kvm_cede_exit
 	stwcx.	r4,0,r6
 	bne	31b
-	/* order napping_threads update vs testing entry_exit_count */
+	/* order napping_threads update vs testing entry_exit_map */
 	isync
 	li	r0,NAPPING_CEDE
 	stb	r0,HSTATE_NAPPING(r13)
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* [PATCH 23/23] KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20  9:40   ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20  9:40 UTC (permalink / raw)
  To: kvm-ppc, kvm; +Cc: Alexander Graf

This replaces the assembler code for kvmhv_commence_exit() with C code
in book3s_hv_builtin.c.  It also moves the IPI/message sending code
that was in book3s_hv_rm_xics.c into a new kvmhv_rm_send_ipi() function
so it can be used by kvmhv_commence_exit() as well as
icp_rm_set_vcpu_irq().
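
As an aside (the sketch below is not part of the patch), the choice
between msgsnd and an XICS IPI in kvmhv_rm_send_ipi() comes down to
whether the target hardware thread sits in the same 8-thread core as
the sender, i.e. whether the two thread ids agree in all but the low
three bits:

    #include <stdbool.h>
    #include <stdio.h>

    /* Same-core test behind the msgsnd fast path: thread ids that
     * differ only in the low 3 bits share a core. */
    static bool same_core(int a, int b)
    {
        return (a & ~7) == (b & ~7);
    }

    int main(void)
    {
        printf("%d\n", same_core(17, 22));  /* 1: both in core 2 */
        printf("%d\n", same_core(17, 24));  /* 0: different cores */
        return 0;
    }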

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_book3s_64.h |  2 +
 arch/powerpc/kvm/book3s_hv_builtin.c     | 73 ++++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_hv_rm_xics.c     | 22 +--------
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  | 81 ++++----------------------------
 4 files changed, 85 insertions(+), 93 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 869c53f..2b84e48 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -438,6 +438,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct kvm *kvm)
 
 extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
 
+extern void kvmhv_rm_send_ipi(int cpu);
+
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 2754251..dbfc525 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -22,6 +22,8 @@
 #include <asm/kvm_ppc.h>
 #include <asm/kvm_book3s.h>
 #include <asm/archrandom.h>
+#include <asm/dbell.h>
+#include <asm/xics.h>
 
 #define KVM_CMA_CHUNK_ORDER	18
 
@@ -184,3 +186,74 @@ long kvmppc_h_random(struct kvm_vcpu *vcpu)
 
 	return H_HARDWARE;
 }
+
+static inline void rm_writeb(unsigned long paddr, u8 val)
+{
+	__asm__ __volatile__("stbcix %0,0,%1"
+		: : "r" (val), "r" (paddr) : "memory");
+}
+
+/*
+ * Send an interrupt or message to another CPU.
+ * This can only be called in real mode.
+ * The caller needs to include any barrier needed to order writes
+ * to memory vs. the IPI/message.
+ */
+void kvmhv_rm_send_ipi(int cpu)
+{
+	unsigned long xics_phys;
+
+	/* On POWER8 for IPIs to threads in the same core, use msgsnd */
+	if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
+	    (cpu & ~7) == (raw_smp_processor_id() & ~7)) {
+		unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
+		msg |= cpu & 7;
+		__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
+		return;
+	}
+
+	/* Not too hard, then poke the target */
+	xics_phys = paca[cpu].kvm_hstate.xics_phys;
+	rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
+}
+
+/*
+ * The following functions are called from the assembly code
+ * in book3s_hv_rmhandlers.S.
+ */
+static void kvmhv_interrupt_vcore(struct kvmppc_vcore *vc, int active)
+{
+	int cpu = vc->pcpu;
+
+	/* Order setting of exit map vs. msgsnd/IPI */
+	smp_mb();
+	for (; active; active >>= 1, ++cpu)
+		if (active & 1)
+			kvmhv_rm_send_ipi(cpu);
+}
+
+void kvmhv_commence_exit(int trap)
+{
+	struct kvmppc_vcore *vc = local_paca->kvm_hstate.kvm_vcore;
+	int ptid = local_paca->kvm_hstate.ptid;
+	int me, ee;
+
+	/* Set our bit in the threads-exiting-guest map in the 0xff00
+	   bits of vcore->entry_exit_map */
+	me = 0x100 << ptid;
+	do {
+		ee = vc->entry_exit_map;
+	} while (cmpxchg(&vc->entry_exit_map, ee, ee | me) != ee);
+
+	/* Are we the first here? */
+	if ((ee >> 8) != 0)
+		return;
+
+	/*
+	 * Trigger the other threads in this vcore to exit the guest.
+	 * If this is a hypervisor decrementer interrupt then they
+	 * will be already on their way out of the guest.
+	 */
+	if (trap != BOOK3S_INTERRUPT_HV_DECREMENTER)
+		kvmhv_interrupt_vcore(vc, ee & ~(1 << ptid));
+}
diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
index 457a8b1..046ab44 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
@@ -27,12 +27,6 @@
 static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp,
 			    u32 new_irq);
 
-static inline void rm_writeb(unsigned long paddr, u8 val)
-{
-	__asm__ __volatile__("sync; stbcix %0,0,%1"
-		: : "r" (val), "r" (paddr) : "memory");
-}
-
 /* -- ICS routines -- */
 static void ics_rm_check_resend(struct kvmppc_xics *xics,
 				struct kvmppc_ics *ics, struct kvmppc_icp *icp)
@@ -61,7 +55,6 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
 				struct kvm_vcpu *this_vcpu)
 {
 	struct kvmppc_icp *this_icp = this_vcpu->arch.icp;
-	unsigned long xics_phys;
 	int cpu;
 
 	/* Mark the target VCPU as having an interrupt pending */
@@ -84,19 +77,8 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
 	/* In SMT cpu will always point to thread 0, we adjust it */
 	cpu += vcpu->arch.ptid;
 
-	/* On POWER8 for IPIs to threads in the same core, use msgsnd */
-	if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
-	    (cpu & ~7) == (raw_smp_processor_id() & ~7)) {
-		unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
-		msg |= cpu & 7;
-		smp_mb();
-		__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
-		return;
-	}
-
-	/* Not too hard, then poke the target */
-	xics_phys = paca[cpu].kvm_hstate.xics_phys;
-	rm_writeb(xics_phys + XICS_MFRR, IPI_PRIORITY);
+	smp_mb();
+	kvmhv_rm_send_ipi(cpu);
 }
 
 static void icp_rm_clr_vcpu_irq(struct kvm_vcpu *vcpu)
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 854ddac..47aeaf7 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -260,7 +260,11 @@ kvm_novcpu_exit:
 	beq	13f
 	addi	r3, r4, VCPU_TB_RMEXIT
 	bl	kvmhv_accumulate_time
-13:	bl	kvmhv_commence_exit
+13:	mr	r3, r12
+	stw	r12, 112-4(r1)
+	bl	kvmhv_commence_exit
+	nop
+	lwz	r12, 112-4(r1)
 	b	kvmhv_switch_to_host
 
 /*
@@ -1152,6 +1156,9 @@ mc_cont:
 
 	/* Increment exit count, poke other threads to exit */
 	bl	kvmhv_commence_exit
+	nop
+	ld	r9, HSTATE_KVM_VCPU(r13)
+	lwz	r12, VCPU_TRAP(r9)
 
 	/* Save guest CTRL register, set runlatch to 1 */
 	mfspr	r6,SPRN_CTRLF
@@ -1603,78 +1610,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	mtlr	r0
 	blr
 
-kvmhv_commence_exit:		/* r12 = trap, r13 = paca, doesn't trash r9 */
-	mflr	r0
-	std	r0, PPC_LR_STKOFF(r1)
-	stdu	r1, -PPC_MIN_STKFRM(r1)
-
-	/* Set our bit in the threads-exiting-guest map in the 0xff00
-	   bits of vcore->entry_exit_map */
-	ld	r5, HSTATE_KVM_VCORE(r13)
-	lbz	r4, HSTATE_PTID(r13)
-	li	r7, 0x100
-	sld	r7, r7, r4
-	addi	r6, r5, VCORE_ENTRY_EXIT
-41:	lwarx	r3, 0, r6
-	or	r0, r3, r7
-	stwcx.	r0, 0, r6
-	bne	41b
-	isync		/* order stwcx. vs. reading napping_threads */
-
-	/*
-	 * At this point we have an interrupt that we have to pass
-	 * up to the kernel or qemu; we can't handle it in real mode.
-	 * Thus we have to do a partition switch, so we have to
-	 * collect the other threads, if we are the first thread
-	 * to take an interrupt.  To do this, we send a message or
-	 * IPI to all the threads that have their bit set in the entry
-	 * map in vcore->entry_exit_map (other than ourselves).
-	 * However, we don't need to bother if this is an HDEC
-	 * interrupt, since the other threads will already be on their
-	 * way here in that case.
-	 */
-	cmpwi	r3,0x100	/* Are we the first here? */
-	bge	43f
-	cmpwi	r12,BOOK3S_INTERRUPT_HV_DECREMENTER
-	beq	43f
-
-	srwi	r0, r7, 8
-	andc.	r3, r3, r0		/* no sense IPI'ing ourselves */
-	beq	43f
-	/* Order entry/exit update vs. IPIs */
-	sync
-BEGIN_FTR_SECTION
-	b	45f
-END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
-	/* Use msgsnd on POWER8 */
-	lhz	r6, VCORE_PCPU(r5)
-	clrldi	r6, r6, 64-3
-	oris	r6, r6, (PPC_DBELL_SERVER << (63-36))@h
-42:	andi.	r0, r3, 1
-	beq	44f
-	PPC_MSGSND(6)
-44:	srdi.	r3, r3, 1
-	addi	r6, r6, 1
-	bne	42b
-	b	43f
-	/* Use IPIs on POWER7 */
-45:	mulli	r4,r4,PACA_SIZE		/* get paca for thread 0 */
-	subf	r6,r4,r13
-42:	andi.	r0,r3,1
-	beq	44f
-	ld	r8,HSTATE_XICS_PHYS(r6)	/* get thread's XICS reg addr */
-	li	r0,IPI_PRIORITY
-	li	r7,XICS_MFRR
-	stbcix	r0,r7,r8		/* trigger the IPI */
-44:	srdi.	r3,r3,1
-	addi	r6,r6,PACA_SIZE
-	bne	42b
-
-43:	ld	r0, PPC_MIN_STKFRM+PPC_LR_STKOFF(r1)
-	addi	r1, r1, PPC_MIN_STKFRM
-	mtlr	r0
-	blr
-
 /*
  * Check whether an HDSI is an HPTE not found fault or something else.
  * If it is an HPTE not found fault that is due to the guest accessing
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 00/23] Bug fixes and improvements for HV KVM
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20 10:45   ` Alexander Graf
  -1 siblings, 0 replies; 80+ messages in thread
From: Alexander Graf @ 2015-03-20 10:45 UTC (permalink / raw)
  To: Paul Mackerras, kvm-ppc, kvm



On 20.03.15 10:39, Paul Mackerras wrote:
> This is my current patch queue for HV KVM on PPC.  This series is
> based on the "queue" branch of the KVM tree, i.e. roughly v4.0-rc3
> plus a set of recent KVM changes which don't intersect with the
> changes in this series.  On top of that, in my testing I have some
> patches which are not KVM-related but are needed to boot and run a
> recent upstream kernel successfully:
> 
> tick/broadcast-hrtimer : Fix suspicious RCU usage in idle loop
> tick/hotplug: Handover time related duties before cpu offline
> powerpc/powernv: Check image loaded or not before calling flash
> powerpc/powernv: Fixes for hypervisor doorbell handling
> powerpc/powernv: Fix return value from power7_nap() et al.
> powerpc: Export __spin_yield
> 
> These patches have been posted by their authors and are on their way
> upstream via various trees.  They are not included in this series.
> 
> The first three patches are bug fixes that should go into v4.0 if
> possible.

Thanks, applied the first 3 to my for-4.0 branch which is going through
autotest now. If everything runs fine, I'll send it to Paolo for
upstream merge.


Alex

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object
  2015-03-20  9:39   ` Paul Mackerras
@ 2015-03-20 11:01     ` Alexander Graf
  -1 siblings, 0 replies; 80+ messages in thread
From: Alexander Graf @ 2015-03-20 11:01 UTC (permalink / raw)
  To: Paul Mackerras, kvm-ppc, kvm; +Cc: Bharata B Rao



On 20.03.15 10:39, Paul Mackerras wrote:
> From: Bharata B Rao <bharata@linux.vnet.ibm.com>
> 
> Since KVM isn't equipped to handle closure of vcpu fd from userspace (QEMU)
> correctly, certain workarounds have to be employed to allow reuse of
> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
> proposed workaround is to park the vcpu fd in userspace during cpu unplug
> and reuse it later during next hotplug.
> 
> More details can be found here:
> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
> 
> In order to support this workaround with PowerPC KVM, don't create or
> initialize ICP if the vCPU is found to be already associated with an ICP.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> Signed-off-by: Paul Mackerras <paulus@samba.org>

This probably makes some sense, but please make sure that user space has
some way to figure out whether hotplug works at all.

Also Paul, for patches that you pick up from others, I'd prefer if they
send the patches to the ML themselves first and you then pick them up
from there. That way we give everyone the same treatment.


Alex

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
  2015-03-20  9:39   ` Paul Mackerras
@ 2015-03-20 11:15     ` Alexander Graf
  -1 siblings, 0 replies; 80+ messages in thread
From: Alexander Graf @ 2015-03-20 11:15 UTC (permalink / raw)
  To: Paul Mackerras, kvm-ppc, kvm



On 20.03.15 10:39, Paul Mackerras wrote:
> This reads the timebase at various points in the real-mode guest
> entry/exit code and uses that to accumulate total, minimum and
> maximum time spent in those parts of the code.  Currently these
> times are accumulated per vcpu in 5 parts of the code:
> 
> * rm_entry - time taken from the start of kvmppc_hv_entry() until
>   just before entering the guest.
> * rm_intr - time from when we take a hypervisor interrupt in the
>   guest until we either re-enter the guest or decide to exit to the
>   host.  This includes time spent handling hcalls in real mode.
> * rm_exit - time from when we decide to exit the guest until the
>   return from kvmppc_hv_entry().
> * guest - time spent in the guest
> * cede - time spent napping in real mode due to an H_CEDE hcall
>   while other threads in the same vcore are active.
> 
> These times are exposed in debugfs in a directory per vcpu that
> contains a file called "timings".  This file contains one line for
> each of the 5 timings above, with the name followed by a colon and
> 4 numbers, which are the count (number of times the code has been
> executed), the total time, the minimum time, and the maximum time,
> all in nanoseconds.
> 
> Signed-off-by: Paul Mackerras <paulus@samba.org>

Have you measured the additional overhead this brings?
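
As an aside, the format described above is easy to consume from
userspace.  Purely as a sketch (the sample line below is made up, and
the exact debugfs path depends on where the per-vcpu directory ends
up), one line could be parsed like this:

    #include <stdio.h>

    int main(void)
    {
        /* made-up sample in the documented "name: count total min max" form */
        const char *line = "rm_entry: 123 45678 90 1234";
        char name[32];
        unsigned long long count, total, min, max;

        if (sscanf(line, "%31[^:]: %llu %llu %llu %llu",
                   name, &count, &total, &min, &max) == 5)
            printf("%s: avg %llu ns over %llu calls\n",
                   name, count ? total / count : 0, count);
        return 0;
    }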

> ---
>  arch/powerpc/include/asm/kvm_host.h     |  19 +++++
>  arch/powerpc/include/asm/time.h         |   3 +
>  arch/powerpc/kernel/asm-offsets.c       |  11 +++
>  arch/powerpc/kernel/time.c              |   6 ++
>  arch/powerpc/kvm/book3s_hv.c            | 135 ++++++++++++++++++++++++++++++++
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 105 ++++++++++++++++++++++++-
>  6 files changed, 276 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index f1d0bbc..286c0ce 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -369,6 +369,14 @@ struct kvmppc_slb {
>  	u8 base_page_size;	/* MMU_PAGE_xxx */
>  };
>  
> +/* Struct used to accumulate timing information in HV real mode code */
> +struct kvmhv_tb_accumulator {
> +	u64	seqcount;	/* used to synchronize access, also count * 2 */
> +	u64	tb_total;	/* total time in timebase ticks */
> +	u64	tb_min;		/* min time */
> +	u64	tb_max;		/* max time */
> +};
> +
>  # ifdef CONFIG_PPC_FSL_BOOK3E
>  #define KVMPPC_BOOKE_IAC_NUM	2
>  #define KVMPPC_BOOKE_DAC_NUM	2
> @@ -656,6 +664,17 @@ struct kvm_vcpu_arch {
>  	u64 busy_preempt;
>  
>  	u32 emul_inst;
> +
> +	struct kvmhv_tb_accumulator *cur_activity;	/* What we're timing */
> +	u64	cur_tb_start;			/* when it started */
> +	struct kvmhv_tb_accumulator rm_entry;	/* real-mode entry code */
> +	struct kvmhv_tb_accumulator rm_intr;	/* real-mode intr handling */
> +	struct kvmhv_tb_accumulator rm_exit;	/* real-mode exit code */
> +	struct kvmhv_tb_accumulator guest_time;	/* guest execution */
> +	struct kvmhv_tb_accumulator cede_time;	/* time napping inside guest */
> +
> +	struct dentry *debugfs_dir;
> +	struct dentry *debugfs_timings;
>  #endif
>  };
>  
> diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
> index 03cbada..10fc784 100644
> --- a/arch/powerpc/include/asm/time.h
> +++ b/arch/powerpc/include/asm/time.h
> @@ -211,5 +211,8 @@ extern void secondary_cpu_time_init(void);
>  
>  DECLARE_PER_CPU(u64, decrementers_next_tb);
>  
> +/* Convert timebase ticks to nanoseconds */
> +unsigned long long tb_to_ns(unsigned long long tb_ticks);
> +
>  #endif /* __KERNEL__ */
>  #endif /* __POWERPC_TIME_H */
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index 4717859..ec9f59c 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -458,6 +458,17 @@ int main(void)
>  	DEFINE(VCPU_SPRG1, offsetof(struct kvm_vcpu, arch.shregs.sprg1));
>  	DEFINE(VCPU_SPRG2, offsetof(struct kvm_vcpu, arch.shregs.sprg2));
>  	DEFINE(VCPU_SPRG3, offsetof(struct kvm_vcpu, arch.shregs.sprg3));
> +	DEFINE(VCPU_TB_RMENTRY, offsetof(struct kvm_vcpu, arch.rm_entry));
> +	DEFINE(VCPU_TB_RMINTR, offsetof(struct kvm_vcpu, arch.rm_intr));
> +	DEFINE(VCPU_TB_RMEXIT, offsetof(struct kvm_vcpu, arch.rm_exit));
> +	DEFINE(VCPU_TB_GUEST, offsetof(struct kvm_vcpu, arch.guest_time));
> +	DEFINE(VCPU_TB_CEDE, offsetof(struct kvm_vcpu, arch.cede_time));
> +	DEFINE(VCPU_CUR_ACTIVITY, offsetof(struct kvm_vcpu, arch.cur_activity));
> +	DEFINE(VCPU_ACTIVITY_START, offsetof(struct kvm_vcpu, arch.cur_tb_start));
> +	DEFINE(TAS_SEQCOUNT, offsetof(struct kvmhv_tb_accumulator, seqcount));
> +	DEFINE(TAS_TOTAL, offsetof(struct kvmhv_tb_accumulator, tb_total));
> +	DEFINE(TAS_MIN, offsetof(struct kvmhv_tb_accumulator, tb_min));
> +	DEFINE(TAS_MAX, offsetof(struct kvmhv_tb_accumulator, tb_max));
>  #endif
>  	DEFINE(VCPU_SHARED_SPRG3, offsetof(struct kvm_vcpu_arch_shared, sprg3));
>  	DEFINE(VCPU_SHARED_SPRG4, offsetof(struct kvm_vcpu_arch_shared, sprg4));
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index 2d7b33f..56f4484 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -608,6 +608,12 @@ void arch_suspend_enable_irqs(void)
>  }
>  #endif
>  
> +unsigned long long tb_to_ns(unsigned long long ticks)
> +{
> +	return mulhdu(ticks, tb_to_ns_scale) << tb_to_ns_shift;
> +}
> +EXPORT_SYMBOL_GPL(tb_to_ns);
> +
>  /*
>   * Scheduler clock - returns current time in nanosec units.
>   *
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 6631b53..8517c33 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1411,6 +1411,127 @@ static struct kvmppc_vcore *kvmppc_vcore_create(struct kvm *kvm, int core)
>  	return vcore;
>  }
>  
> +static struct debugfs_timings_element {
> +	const char *name;
> +	size_t offset;
> +} timings[] = {
> +	{"rm_entry",	offsetof(struct kvm_vcpu, arch.rm_entry)},
> +	{"rm_intr",	offsetof(struct kvm_vcpu, arch.rm_intr)},
> +	{"rm_exit",	offsetof(struct kvm_vcpu, arch.rm_exit)},
> +	{"guest",	offsetof(struct kvm_vcpu, arch.guest_time)},
> +	{"cede",	offsetof(struct kvm_vcpu, arch.cede_time)},
> +};
> +
> +#define N_TIMINGS	(sizeof(timings) / sizeof(timings[0]))
> +
> +struct debugfs_timings_state {
> +	struct kvm_vcpu	*vcpu;
> +	unsigned int	buflen;
> +	char		buf[N_TIMINGS * 100];
> +};
> +
> +static int debugfs_timings_open(struct inode *inode, struct file *file)
> +{
> +	struct kvm_vcpu *vcpu = inode->i_private;
> +	struct debugfs_timings_state *p;
> +
> +	p = kzalloc(sizeof(*p), GFP_KERNEL);
> +	if (!p)
> +		return -ENOMEM;
> +
> +	kvm_get_kvm(vcpu->kvm);
> +	p->vcpu = vcpu;
> +	file->private_data = p;
> +
> +	return nonseekable_open(inode, file);
> +}
> +
> +static int debugfs_timings_release(struct inode *inode, struct file *file)
> +{
> +	struct debugfs_timings_state *p = file->private_data;
> +
> +	kvm_put_kvm(p->vcpu->kvm);
> +	kfree(p);
> +	return 0;
> +}
> +
> +static ssize_t debugfs_timings_read(struct file *file, char __user *buf,
> +				    size_t len, loff_t *ppos)
> +{
> +	struct debugfs_timings_state *p = file->private_data;
> +	struct kvm_vcpu *vcpu = p->vcpu;
> +	char *s;
> +	struct kvmhv_tb_accumulator tb;
> +	u64 count;
> +	loff_t pos;
> +	ssize_t n;
> +	int i, loops;
> +	bool ok;
> +
> +	if (!p->buflen) {
> +		s = p->buf;
> +		for (i = 0; i < N_TIMINGS; ++i) {
> +			struct kvmhv_tb_accumulator *acc;
> +
> +			acc = (struct kvmhv_tb_accumulator *)
> +				((unsigned long)vcpu + timings[i].offset);
> +			ok = false;
> +			for (loops = 0; loops < 1000; ++loops) {
> +				count = acc->seqcount;
> +				if (!(count & 1)) {
> +					smp_rmb();
> +					tb = *acc;
> +					smp_rmb();
> +					if (count == acc->seqcount) {
> +						ok = true;
> +						break;
> +					}
> +				}
> +				udelay(1);
> +			}
> +			if (!ok)
> +				sprintf(s, "%s: stuck\n", timings[i].name);
> +			else
> +				sprintf(s, "%s: %llu %llu %llu %llu\n",
> +					timings[i].name, count / 2,
> +					tb_to_ns(tb.tb_total),
> +					tb_to_ns(tb.tb_min),
> +					tb_to_ns(tb.tb_max));
> +			s += strlen(s);
> +		}
> +		p->buflen = s - p->buf;
> +	}
> +
> +	pos = *ppos;
> +	if (pos >= p->buflen)
> +		return 0;
> +	if (len > p->buflen - pos)
> +		len = p->buflen - pos;
> +	n = copy_to_user(buf, p->buf + pos, len);
> +	if (n) {
> +		if (n == len)
> +			return -EFAULT;
> +		len -= n;
> +	}
> +	*ppos = pos + len;
> +	return len;
> +}
> +
> +static ssize_t debugfs_timings_write(struct file *file, const char __user *buf,
> +				     size_t len, loff_t *ppos)
> +{
> +	return -EACCES;
> +}
> +
> +static const struct file_operations debugfs_timings_ops = {
> +	.owner	 = THIS_MODULE,
> +	.open	 = debugfs_timings_open,
> +	.release = debugfs_timings_release,
> +	.read	 = debugfs_timings_read,
> +	.write	 = debugfs_timings_write,
> +	.llseek	 = generic_file_llseek,
> +};
> +
>  static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
>  						   unsigned int id)
>  {
> @@ -1418,6 +1539,7 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
>  	int err = -EINVAL;
>  	int core;
>  	struct kvmppc_vcore *vcore;
> +	char buf[16];
>  
>  	core = id / threads_per_subcore;
>  	if (core >= KVM_MAX_VCORES)
> @@ -1480,6 +1602,19 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
>  	vcpu->arch.cpu_type = KVM_CPU_3S_64;
>  	kvmppc_sanity_check(vcpu);
>  
> +	/* Create a debugfs directory for the vcpu */
> +	sprintf(buf, "vcpu%u", id);

Please make sure to use snprintf and friends rather than assume that you
happened to choose the buffer correctly sized.
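
Something along the lines of

    snprintf(buf, sizeof(buf), "vcpu%u", id);

would keep the bound tied to the actual buffer size.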


Alex

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 12/23] KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT
  2015-03-20  9:39   ` Paul Mackerras
@ 2015-03-20 11:20     ` Alexander Graf
  -1 siblings, 0 replies; 80+ messages in thread
From: Alexander Graf @ 2015-03-20 11:20 UTC (permalink / raw)
  To: Paul Mackerras, kvm-ppc, kvm



On 20.03.15 10:39, Paul Mackerras wrote:
> This creates a debugfs directory for each HV guest (assuming debugfs
> is enabled in the kernel config), and within that directory, a file
> by which the contents of the guest's HPT (hashed page table) can be
> read.  The directory is named vmnnnn, where nnnn is the PID of the
> process that created the guest.  The file is named "htab".  This is
> intended to help in debugging problems in the host's management
> of guest memory.
> 
> The contents of the file consist of a series of lines like this:
> 
>   3f48 4000d032bf003505 0000000bd7ff1196 00000003b5c71196
> 
> The first field is the index of the entry in the HPT, the second and
> third are the HPT entry, so the third entry contains the real page
> number that is mapped by the entry if the entry's valid bit is set.
> The fourth field is the guest's view of the second doubleword of the
> entry, so it contains the guest physical address.  (The format of the
> second through fourth fields is described in the Power ISA and also
> in arch/powerpc/include/asm/mmu-hash64.h.)
> 
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> ---
>  arch/powerpc/include/asm/kvm_book3s_64.h |   2 +
>  arch/powerpc/include/asm/kvm_host.h      |   2 +
>  arch/powerpc/kvm/book3s_64_mmu_hv.c      | 136 +++++++++++++++++++++++++++++++
>  arch/powerpc/kvm/book3s_hv.c             |  12 +++
>  virt/kvm/kvm_main.c                      |   1 +
>  5 files changed, 153 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
> index 0789a0f..869c53f 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_64.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
> @@ -436,6 +436,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct kvm *kvm)
>  	return rcu_dereference_raw_notrace(kvm->memslots);
>  }
>  
> +extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
> +
>  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
>  
>  #endif /* __ASM_KVM_BOOK3S_64_H__ */
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 015773f..f1d0bbc 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -238,6 +238,8 @@ struct kvm_arch {
>  	atomic_t hpte_mod_interest;
>  	cpumask_t need_tlb_flush;
>  	int hpt_cma_alloc;
> +	struct dentry *debugfs_dir;
> +	struct dentry *htab_dentry;
>  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
>  #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
>  	struct mutex hpt_mutex;
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 6c6825a..d6fe308 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -27,6 +27,7 @@
>  #include <linux/srcu.h>
>  #include <linux/anon_inodes.h>
>  #include <linux/file.h>
> +#include <linux/debugfs.h>
>  
>  #include <asm/tlbflush.h>
>  #include <asm/kvm_ppc.h>
> @@ -1490,6 +1491,141 @@ int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct kvm_get_htab_fd *ghf)
>  	return ret;
>  }
>  
> +struct debugfs_htab_state {
> +	struct kvm	*kvm;
> +	struct mutex	mutex;
> +	unsigned long	hpt_index;
> +	int		chars_left;
> +	int		buf_index;
> +	char		buf[64];
> +};
> +
> +static int debugfs_htab_open(struct inode *inode, struct file *file)
> +{
> +	struct kvm *kvm = inode->i_private;
> +	struct debugfs_htab_state *p;
> +
> +	p = kzalloc(sizeof(*p), GFP_KERNEL);
> +	if (!p)
> +		return -ENOMEM;
> +
> +	kvm_get_kvm(kvm);
> +	p->kvm = kvm;
> +	mutex_init(&p->mutex);
> +	file->private_data = p;
> +
> +	return nonseekable_open(inode, file);
> +}
> +
> +static int debugfs_htab_release(struct inode *inode, struct file *file)
> +{
> +	struct debugfs_htab_state *p = file->private_data;
> +
> +	kvm_put_kvm(p->kvm);
> +	kfree(p);
> +	return 0;
> +}
> +
> +static ssize_t debugfs_htab_read(struct file *file, char __user *buf,
> +				 size_t len, loff_t *ppos)
> +{
> +	struct debugfs_htab_state *p = file->private_data;
> +	ssize_t ret, r;
> +	unsigned long i, n;
> +	unsigned long v, hr, gr;
> +	struct kvm *kvm;
> +	__be64 *hptp;
> +
> +	ret = mutex_lock_interruptible(&p->mutex);
> +	if (ret)
> +		return ret;
> +
> +	if (p->chars_left) {
> +		n = p->chars_left;
> +		if (n > len)
> +			n = len;
> +		r = copy_to_user(buf, p->buf + p->buf_index, n);
> +		n -= r;
> +		p->chars_left -= n;
> +		p->buf_index += n;
> +		buf += n;
> +		len -= n;
> +		ret = n;
> +		if (r) {
> +			if (!n)
> +				ret = -EFAULT;
> +			goto out;
> +		}
> +	}
> +
> +	kvm = p->kvm;
> +	i = p->hpt_index;
> +	hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE));
> +	for (; len != 0 && i < kvm->arch.hpt_npte; ++i, hptp += 2) {
> +		if (!(be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_ABSENT)))
> +			continue;
> +
> +		/* lock the HPTE so it's stable and read it */
> +		preempt_disable();
> +		while (!try_lock_hpte(hptp, HPTE_V_HVLOCK))
> +			cpu_relax();
> +		v = be64_to_cpu(hptp[0]) & ~HPTE_V_HVLOCK;
> +		hr = be64_to_cpu(hptp[1]);
> +		gr = kvm->arch.revmap[i].guest_rpte;
> +		unlock_hpte(hptp, v);
> +		preempt_enable();
> +
> +		if (!(v & (HPTE_V_VALID | HPTE_V_ABSENT)))
> +			continue;
> +
> +		n = scnprintf(p->buf, sizeof(p->buf),
> +			      "%6lx %.16lx %.16lx %.16lx\n",
> +			      i, v, hr, gr);
> +		p->chars_left = n;
> +		if (n > len)
> +			n = len;
> +		r = copy_to_user(buf, p->buf, n);
> +		n -= r;
> +		p->chars_left -= n;
> +		p->buf_index = n;
> +		buf += n;
> +		len -= n;
> +		ret += n;
> +		if (r) {
> +			if (!ret)
> +				ret = -EFAULT;
> +			goto out;
> +		}
> +	}
> +	p->hpt_index = i;
> +
> + out:
> +	mutex_unlock(&p->mutex);
> +	return ret;
> +}
> +
> +ssize_t debugfs_htab_write(struct file *file, const char __user *buf,
> +			   size_t len, loff_t *ppos)
> +{
> +	return -EACCES;
> +}
> +
> +static const struct file_operations debugfs_htab_fops = {
> +	.owner	 = THIS_MODULE,
> +	.open	 = debugfs_htab_open,
> +	.release = debugfs_htab_release,
> +	.read	 = debugfs_htab_read,
> +	.write	 = debugfs_htab_write,
> +	.llseek	 = generic_file_llseek,
> +};
> +
> +void kvmppc_mmu_debugfs_init(struct kvm *kvm)
> +{
> +	kvm->arch.htab_dentry = debugfs_create_file("htab", 0400,
> +						    kvm->arch.debugfs_dir, kvm,
> +						    &debugfs_htab_fops);
> +}
> +
>  void kvmppc_mmu_book3s_hv_init(struct kvm_vcpu *vcpu)
>  {
>  	struct kvmppc_mmu *mmu = &vcpu->arch.mmu;
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 7b7102a..6631b53 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -32,6 +32,7 @@
>  #include <linux/page-flags.h>
>  #include <linux/srcu.h>
>  #include <linux/miscdevice.h>
> +#include <linux/debugfs.h>
>  
>  #include <asm/reg.h>
>  #include <asm/cputable.h>
> @@ -2307,6 +2308,7 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
>  static int kvmppc_core_init_vm_hv(struct kvm *kvm)
>  {
>  	unsigned long lpcr, lpid;
> +	char buf[32];
>  
>  	/* Allocate the guest's logical partition ID */
>  
> @@ -2347,6 +2349,14 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
>  	 */
>  	kvm_hv_vm_activated();
>  
> +	/*
> +	 * Create a debugfs directory for the VM
> +	 */
> +	sprintf(buf, "vm%d", current->pid);

The same comment stands here. I don't see how you could possibly overrun
32 bytes with vm%d, but I'd feel a lot safer if this was an snprintf.
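
(For illustration, a minimal sketch of the bounded variant being asked for,
reusing the same 32-byte buf from the patch; the output is unchanged, but any
future format change can only truncate rather than overflow:)

	snprintf(buf, sizeof(buf), "vm%d", current->pid);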


Alex

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 12/23] KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT
@ 2015-03-20 11:20     ` Alexander Graf
  0 siblings, 0 replies; 80+ messages in thread
From: Alexander Graf @ 2015-03-20 11:20 UTC (permalink / raw)
  To: Paul Mackerras, kvm-ppc, kvm



On 20.03.15 10:39, Paul Mackerras wrote:
> This creates a debugfs directory for each HV guest (assuming debugfs
> is enabled in the kernel config), and within that directory, a file
> by which the contents of the guest's HPT (hashed page table) can be
> read.  The directory is named vmnnnn, where nnnn is the PID of the
> process that created the guest.  The file is named "htab".  This is
> intended to help in debugging problems in the host's management
> of guest memory.
> 
> The contents of the file consist of a series of lines like this:
> 
>   3f48 4000d032bf003505 0000000bd7ff1196 00000003b5c71196
> 
> The first field is the index of the entry in the HPT, the second and
> third are the HPT entry, so the third field contains the real page
> number that is mapped by the entry if the entry's valid bit is set.
> The fourth field is the guest's view of the second doubleword of the
> entry, so it contains the guest physical address.  (The format of the
> second through fourth fields is described in the Power ISA and also
> in arch/powerpc/include/asm/mmu-hash64.h.)
> 
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> ---
>  arch/powerpc/include/asm/kvm_book3s_64.h |   2 +
>  arch/powerpc/include/asm/kvm_host.h      |   2 +
>  arch/powerpc/kvm/book3s_64_mmu_hv.c      | 136 +++++++++++++++++++++++++++++++
>  arch/powerpc/kvm/book3s_hv.c             |  12 +++
>  virt/kvm/kvm_main.c                      |   1 +
>  5 files changed, 153 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
> index 0789a0f..869c53f 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_64.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_64.h
> @@ -436,6 +436,8 @@ static inline struct kvm_memslots *kvm_memslots_raw(struct kvm *kvm)
>  	return rcu_dereference_raw_notrace(kvm->memslots);
>  }
>  
> +extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);
> +
>  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
>  
>  #endif /* __ASM_KVM_BOOK3S_64_H__ */
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 015773f..f1d0bbc 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -238,6 +238,8 @@ struct kvm_arch {
>  	atomic_t hpte_mod_interest;
>  	cpumask_t need_tlb_flush;
>  	int hpt_cma_alloc;
> +	struct dentry *debugfs_dir;
> +	struct dentry *htab_dentry;
>  #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
>  #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
>  	struct mutex hpt_mutex;
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> index 6c6825a..d6fe308 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
> @@ -27,6 +27,7 @@
>  #include <linux/srcu.h>
>  #include <linux/anon_inodes.h>
>  #include <linux/file.h>
> +#include <linux/debugfs.h>
>  
>  #include <asm/tlbflush.h>
>  #include <asm/kvm_ppc.h>
> @@ -1490,6 +1491,141 @@ int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct kvm_get_htab_fd *ghf)
>  	return ret;
>  }
>  
> +struct debugfs_htab_state {
> +	struct kvm	*kvm;
> +	struct mutex	mutex;
> +	unsigned long	hpt_index;
> +	int		chars_left;
> +	int		buf_index;
> +	char		buf[64];
> +};
> +
> +static int debugfs_htab_open(struct inode *inode, struct file *file)
> +{
> +	struct kvm *kvm = inode->i_private;
> +	struct debugfs_htab_state *p;
> +
> +	p = kzalloc(sizeof(*p), GFP_KERNEL);
> +	if (!p)
> +		return -ENOMEM;
> +
> +	kvm_get_kvm(kvm);
> +	p->kvm = kvm;
> +	mutex_init(&p->mutex);
> +	file->private_data = p;
> +
> +	return nonseekable_open(inode, file);
> +}
> +
> +static int debugfs_htab_release(struct inode *inode, struct file *file)
> +{
> +	struct debugfs_htab_state *p = file->private_data;
> +
> +	kvm_put_kvm(p->kvm);
> +	kfree(p);
> +	return 0;
> +}
> +
> +static ssize_t debugfs_htab_read(struct file *file, char __user *buf,
> +				 size_t len, loff_t *ppos)
> +{
> +	struct debugfs_htab_state *p = file->private_data;
> +	ssize_t ret, r;
> +	unsigned long i, n;
> +	unsigned long v, hr, gr;
> +	struct kvm *kvm;
> +	__be64 *hptp;
> +
> +	ret = mutex_lock_interruptible(&p->mutex);
> +	if (ret)
> +		return ret;
> +
> +	if (p->chars_left) {
> +		n = p->chars_left;
> +		if (n > len)
> +			n = len;
> +		r = copy_to_user(buf, p->buf + p->buf_index, n);
> +		n -= r;
> +		p->chars_left -= n;
> +		p->buf_index += n;
> +		buf += n;
> +		len -= n;
> +		ret = n;
> +		if (r) {
> +			if (!n)
> +				ret = -EFAULT;
> +			goto out;
> +		}
> +	}
> +
> +	kvm = p->kvm;
> +	i = p->hpt_index;
> +	hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE));
> +	for (; len != 0 && i < kvm->arch.hpt_npte; ++i, hptp += 2) {
> +		if (!(be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_ABSENT)))
> +			continue;
> +
> +		/* lock the HPTE so it's stable and read it */
> +		preempt_disable();
> +		while (!try_lock_hpte(hptp, HPTE_V_HVLOCK))
> +			cpu_relax();
> +		v = be64_to_cpu(hptp[0]) & ~HPTE_V_HVLOCK;
> +		hr = be64_to_cpu(hptp[1]);
> +		gr = kvm->arch.revmap[i].guest_rpte;
> +		unlock_hpte(hptp, v);
> +		preempt_enable();
> +
> +		if (!(v & (HPTE_V_VALID | HPTE_V_ABSENT)))
> +			continue;
> +
> +		n = scnprintf(p->buf, sizeof(p->buf),
> +			      "%6lx %.16lx %.16lx %.16lx\n",
> +			      i, v, hr, gr);
> +		p->chars_left = n;
> +		if (n > len)
> +			n = len;
> +		r = copy_to_user(buf, p->buf, n);
> +		n -= r;
> +		p->chars_left -= n;
> +		p->buf_index = n;
> +		buf += n;
> +		len -= n;
> +		ret += n;
> +		if (r) {
> +			if (!ret)
> +				ret = -EFAULT;
> +			goto out;
> +		}
> +	}
> +	p->hpt_index = i;
> +
> + out:
> +	mutex_unlock(&p->mutex);
> +	return ret;
> +}
> +
> +ssize_t debugfs_htab_write(struct file *file, const char __user *buf,
> +			   size_t len, loff_t *ppos)
> +{
> +	return -EACCES;
> +}
> +
> +static const struct file_operations debugfs_htab_fops = {
> +	.owner	 = THIS_MODULE,
> +	.open	 = debugfs_htab_open,
> +	.release = debugfs_htab_release,
> +	.read	 = debugfs_htab_read,
> +	.write	 = debugfs_htab_write,
> +	.llseek	 = generic_file_llseek,
> +};
> +
> +void kvmppc_mmu_debugfs_init(struct kvm *kvm)
> +{
> +	kvm->arch.htab_dentry = debugfs_create_file("htab", 0400,
> +						    kvm->arch.debugfs_dir, kvm,
> +						    &debugfs_htab_fops);
> +}
> +
>  void kvmppc_mmu_book3s_hv_init(struct kvm_vcpu *vcpu)
>  {
>  	struct kvmppc_mmu *mmu = &vcpu->arch.mmu;
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 7b7102a..6631b53 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -32,6 +32,7 @@
>  #include <linux/page-flags.h>
>  #include <linux/srcu.h>
>  #include <linux/miscdevice.h>
> +#include <linux/debugfs.h>
>  
>  #include <asm/reg.h>
>  #include <asm/cputable.h>
> @@ -2307,6 +2308,7 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
>  static int kvmppc_core_init_vm_hv(struct kvm *kvm)
>  {
>  	unsigned long lpcr, lpid;
> +	char buf[32];
>  
>  	/* Allocate the guest's logical partition ID */
>  
> @@ -2347,6 +2349,14 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
>  	 */
>  	kvm_hv_vm_activated();
>  
> +	/*
> +	 * Create a debugfs directory for the VM
> +	 */
> +	sprintf(buf, "vm%d", current->pid);

The same comment stands here. I don't see how you could possibly overrun
32 bytes with vm%d, but I'd feel a lot safer if this was an snprintf.


Alex

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
  2015-03-20 11:15     ` Alexander Graf
@ 2015-03-20 11:25       ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20 11:25 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm-ppc, kvm

On Fri, Mar 20, 2015 at 12:15:15PM +0100, Alexander Graf wrote:
> 
> 
> On 20.03.15 10:39, Paul Mackerras wrote:
> > This reads the timebase at various points in the real-mode guest
> > entry/exit code and uses that to accumulate total, minimum and
> > maximum time spent in those parts of the code.  Currently these
> > times are accumulated per vcpu in 5 parts of the code:
> > 
> > * rm_entry - time taken from the start of kvmppc_hv_entry() until
> >   just before entering the guest.
> > * rm_intr - time from when we take a hypervisor interrupt in the
> >   guest until we either re-enter the guest or decide to exit to the
> >   host.  This includes time spent handling hcalls in real mode.
> > * rm_exit - time from when we decide to exit the guest until the
> >   return from kvmppc_hv_entry().
> > * guest - time spent in the guest
> > * cede - time spent napping in real mode due to an H_CEDE hcall
> >   while other threads in the same vcore are active.
> > 
> > These times are exposed in debugfs in a directory per vcpu that
> > contains a file called "timings".  This file contains one line for
> > each of the 5 timings above, with the name followed by a colon and
> > 4 numbers, which are the count (number of times the code has been
> > executed), the total time, the minimum time, and the maximum time,
> > all in nanoseconds.
> > 
> > Signed-off-by: Paul Mackerras <paulus@samba.org>
> 
> Have you measured the additional overhead this brings?

I haven't - in fact I did this patch so I could measure the overhead
or improvement from other changes I did, but it doesn't measure its
own overhead, of course.  I guess I need a workload that does a
defined number of guest entries and exits and measure how fast it runs
with and without the patch (maybe something like H_SET_MODE in a
loop).  I'll figure something out and post the results.  
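
(For illustration only, a guest-side sketch of that kind of loop, assuming the
pseries plpar_hcall wrappers; the H_SET_MODE arguments are placeholders, since
even a call that fails with an error still takes the guest exit path being
measured:)

	unsigned long i, tb0, tb1;
	const unsigned long iters = 1000000;

	tb0 = mftb();
	for (i = 0; i < iters; ++i)
		plpar_hcall_norets(H_SET_MODE, 0, 0, 0, 0);
	tb1 = mftb();
	pr_info("%lu timebase ticks per hcall exit\n", (tb1 - tb0) / iters);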

Paul.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
@ 2015-03-20 11:25       ` Paul Mackerras
  0 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20 11:25 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm-ppc, kvm

On Fri, Mar 20, 2015 at 12:15:15PM +0100, Alexander Graf wrote:
> 
> 
> On 20.03.15 10:39, Paul Mackerras wrote:
> > This reads the timebase at various points in the real-mode guest
> > entry/exit code and uses that to accumulate total, minimum and
> > maximum time spent in those parts of the code.  Currently these
> > times are accumulated per vcpu in 5 parts of the code:
> > 
> > * rm_entry - time taken from the start of kvmppc_hv_entry() until
> >   just before entering the guest.
> > * rm_intr - time from when we take a hypervisor interrupt in the
> >   guest until we either re-enter the guest or decide to exit to the
> >   host.  This includes time spent handling hcalls in real mode.
> > * rm_exit - time from when we decide to exit the guest until the
> >   return from kvmppc_hv_entry().
> > * guest - time spent in the guest
> > * cede - time spent napping in real mode due to an H_CEDE hcall
> >   while other threads in the same vcore are active.
> > 
> > These times are exposed in debugfs in a directory per vcpu that
> > contains a file called "timings".  This file contains one line for
> > each of the 5 timings above, with the name followed by a colon and
> > 4 numbers, which are the count (number of times the code has been
> > executed), the total time, the minimum time, and the maximum time,
> > all in nanoseconds.
> > 
> > Signed-off-by: Paul Mackerras <paulus@samba.org>
> 
> Have you measured the additional overhead this brings?

I haven't - in fact I did this patch so I could measure the overhead
or improvement from other changes I did, but it doesn't measure its
own overhead, of course.  I guess I need a workload that does a
defined number of guest entries and exits and measure how fast it runs
with and without the patch (maybe something like H_SET_MODE in a
loop).  I'll figure something out and post the results.  

Paul.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object
  2015-03-20 11:01     ` Alexander Graf
@ 2015-03-20 11:26       ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20 11:26 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm-ppc, kvm, Bharata B Rao

On Fri, Mar 20, 2015 at 12:01:32PM +0100, Alexander Graf wrote:
> 
> 
> On 20.03.15 10:39, Paul Mackerras wrote:
> > From: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > 
> > Since KVM isn't equipped to handle closure of vcpu fd from userspace (QEMU)
> > correctly, certain workarounds have to be employed to allow reuse of
> > vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
> > proposed workaround is to park the vcpu fd in userspace during cpu unplug
> > and reuse it later during next hotplug.
> > 
> > More details can be found here:
> > KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
> > QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
> > 
> > In order to support this workaround with PowerPC KVM, don't create or
> > initialize ICP if the vCPU is found to be already associated with an ICP.
> > 
> > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > Signed-off-by: Paul Mackerras <paulus@samba.org>
> 
> This probably makes some sense, but please make sure that user space has
> some way to figure out whether hotplug works at all.

Bharata is working on the qemu side of all this, so I assume he has
that covered.

> Also Paul, for patches that you pick up from others, I'd prefer if they
> send the patches to the ML themselves first and you pick them up from
> there then. That way we give everyone the same treatment.

Fair enough.  In fact Bharata did post the patch but he sent it to
linuxppc-dev@ozlabs.org not the KVM lists.

Paul.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object
@ 2015-03-20 11:26       ` Paul Mackerras
  0 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-20 11:26 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm-ppc, kvm, Bharata B Rao

On Fri, Mar 20, 2015 at 12:01:32PM +0100, Alexander Graf wrote:
> 
> 
> On 20.03.15 10:39, Paul Mackerras wrote:
> > From: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > 
> > Since KVM isn't equipped to handle closure of vcpu fd from userspace (QEMU)
> > correctly, certain workarounds have to be employed to allow reuse of
> > vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
> > proposed workaround is to park the vcpu fd in userspace during cpu unplug
> > and reuse it later during next hotplug.
> > 
> > More details can be found here:
> > KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
> > QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
> > 
> > In order to support this workaround with PowerPC KVM, don't create or
> > initialize ICP if the vCPU is found to be already associated with an ICP.
> > 
> > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> > Signed-off-by: Paul Mackerras <paulus@samba.org>
> 
> This probably makes some sense, but please make sure that user space has
> some way to figure out whether hotplug works at all.

Bharata is working on the qemu side of all this, so I assume he has
that covered.

> Also Paul, for patches that you pick up from others, I'd prefer if they
> send the patches to the ML themselves first and you pick them up from
> there then. That way we give everyone the same treatment.

Fair enough.  In fact Bharata did post the patch but he sent it to
linuxppc-dev@ozlabs.org not the KVM lists.

Paul.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 20/23] KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8
  2015-03-20  9:39   ` Paul Mackerras
@ 2015-03-20 11:28     ` Alexander Graf
  -1 siblings, 0 replies; 80+ messages in thread
From: Alexander Graf @ 2015-03-20 11:28 UTC (permalink / raw)
  To: Paul Mackerras, kvm-ppc, kvm



On 20.03.15 10:39, Paul Mackerras wrote:
> This uses msgsnd where possible for signalling other threads within
> the same core on POWER8 systems, rather than IPIs through the XICS
> interrupt controller.  This includes waking secondary threads to run
> the guest, the interrupts generated by the virtual XICS, and the
> interrupts to bring the other threads out of the guest when exiting.
> 
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> ---
>  arch/powerpc/kernel/asm-offsets.c       |  4 +++
>  arch/powerpc/kvm/book3s_hv.c            | 48 ++++++++++++++++++++++-----------
>  arch/powerpc/kvm/book3s_hv_rm_xics.c    | 11 ++++++++
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 41 ++++++++++++++++++++++++----
>  4 files changed, 83 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index fa7b57d..0ce2aa6 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -37,6 +37,7 @@
>  #include <asm/thread_info.h>
>  #include <asm/rtas.h>
>  #include <asm/vdso_datapage.h>
> +#include <asm/dbell.h>
>  #ifdef CONFIG_PPC64
>  #include <asm/paca.h>
>  #include <asm/lppaca.h>
> @@ -568,6 +569,7 @@ int main(void)
>  	DEFINE(VCORE_LPCR, offsetof(struct kvmppc_vcore, lpcr));
>  	DEFINE(VCORE_PCR, offsetof(struct kvmppc_vcore, pcr));
>  	DEFINE(VCORE_DPDES, offsetof(struct kvmppc_vcore, dpdes));
> +	DEFINE(VCORE_PCPU, offsetof(struct kvmppc_vcore, pcpu));
>  	DEFINE(VCPU_SLB_E, offsetof(struct kvmppc_slb, orige));
>  	DEFINE(VCPU_SLB_V, offsetof(struct kvmppc_slb, origv));
>  	DEFINE(VCPU_SLB_SIZE, sizeof(struct kvmppc_slb));
> @@ -757,5 +759,7 @@ int main(void)
>  			offsetof(struct paca_struct, subcore_sibling_mask));
>  #endif
>  
> +	DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
> +
>  	return 0;
>  }
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 03a8bb4..2c34bae 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -51,6 +51,7 @@
>  #include <asm/hvcall.h>
>  #include <asm/switch_to.h>
>  #include <asm/smp.h>
> +#include <asm/dbell.h>
>  #include <linux/gfp.h>
>  #include <linux/vmalloc.h>
>  #include <linux/highmem.h>
> @@ -84,9 +85,34 @@ static DECLARE_BITMAP(default_enabled_hcalls, MAX_HCALL_OPCODE/4 + 1);
>  static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
>  static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
>  
> +static bool kvmppc_ipi_thread(int cpu)
> +{
> +	/* On POWER8 for IPIs to threads in the same core, use msgsnd */
> +	if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
> +		preempt_disable();
> +		if ((cpu & ~7) == (smp_processor_id() & ~7)) {
> +			unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
> +			msg |= cpu & 7;
> +			smp_mb();
> +			__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
> +			preempt_enable();
> +			return true;
> +		}
> +		preempt_enable();
> +	}
> +
> +#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
> +	if (cpu >= 0 && cpu < nr_cpu_ids && paca[cpu].kvm_hstate.xics_phys) {
> +		xics_wake_cpu(cpu);
> +		return true;
> +	}
> +#endif
> +
> +	return false;
> +}
> +
>  static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
>  {
> -	int me;
>  	int cpu = vcpu->cpu;
>  	wait_queue_head_t *wqp;
>  
> @@ -96,20 +122,12 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
>  		++vcpu->stat.halt_wakeup;
>  	}
>  
> -	me = get_cpu();
> +	if (kvmppc_ipi_thread(cpu + vcpu->arch.ptid))
> +		return;
>  
>  	/* CPU points to the first thread of the core */
> -	if (cpu != me && cpu >= 0 && cpu < nr_cpu_ids) {
> -#ifdef CONFIG_PPC_ICP_NATIVE
> -		int real_cpu = cpu + vcpu->arch.ptid;
> -		if (paca[real_cpu].kvm_hstate.xics_phys)
> -			xics_wake_cpu(real_cpu);
> -		else
> -#endif
> -		if (cpu_online(cpu))
> -			smp_send_reschedule(cpu);
> -	}
> -	put_cpu();
> +	if (cpu >= 0 && cpu < nr_cpu_ids && cpu_online(cpu))
> +		smp_send_reschedule(cpu);
>  }
>  
>  /*
> @@ -1754,10 +1772,8 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
>  	/* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */
>  	smp_wmb();
>  	tpaca->kvm_hstate.kvm_vcpu = vcpu;
> -#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
>  	if (cpu != smp_processor_id())
> -		xics_wake_cpu(cpu);
> -#endif
> +		kvmppc_ipi_thread(cpu);
>  }
>  
>  static void kvmppc_wait_for_nap(void)
> diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
> index 6dded8c..457a8b1 100644
> --- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
> +++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
> @@ -18,6 +18,7 @@
>  #include <asm/debug.h>
>  #include <asm/synch.h>
>  #include <asm/ppc-opcode.h>
> +#include <asm/dbell.h>
>  
>  #include "book3s_xics.h"
>  
> @@ -83,6 +84,16 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
>  	/* In SMT cpu will always point to thread 0, we adjust it */
>  	cpu += vcpu->arch.ptid;
>  
> +	/* On POWER8 for IPIs to threads in the same core, use msgsnd */
> +	if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
> +	    (cpu & ~7) == (raw_smp_processor_id() & ~7)) {

Can we somehow encapsulate the secret knowledge that 8 threads mean one
core?
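
(For illustration, one way the core boundary could be derived from
threads_per_core instead of the literal 8, assuming the helpers in
arch/powerpc/include/asm/cputhreads.h are usable in this context; this is a
sketch, not the actual follow-up patch:)

	/* Sketch: compare first-thread-of-core IDs rather than masking with ~7 */
	if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
	    cpu_first_thread_sibling(cpu) ==
	    cpu_first_thread_sibling(raw_smp_processor_id())) {
		unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER) |
				    cpu_thread_in_core(cpu);
		smp_mb();
		__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
	}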


Alex

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 20/23] KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8
@ 2015-03-20 11:28     ` Alexander Graf
  0 siblings, 0 replies; 80+ messages in thread
From: Alexander Graf @ 2015-03-20 11:28 UTC (permalink / raw)
  To: Paul Mackerras, kvm-ppc, kvm



On 20.03.15 10:39, Paul Mackerras wrote:
> This uses msgsnd where possible for signalling other threads within
> the same core on POWER8 systems, rather than IPIs through the XICS
> interrupt controller.  This includes waking secondary threads to run
> the guest, the interrupts generated by the virtual XICS, and the
> interrupts to bring the other threads out of the guest when exiting.
> 
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> ---
>  arch/powerpc/kernel/asm-offsets.c       |  4 +++
>  arch/powerpc/kvm/book3s_hv.c            | 48 ++++++++++++++++++++++-----------
>  arch/powerpc/kvm/book3s_hv_rm_xics.c    | 11 ++++++++
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 41 ++++++++++++++++++++++++----
>  4 files changed, 83 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index fa7b57d..0ce2aa6 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -37,6 +37,7 @@
>  #include <asm/thread_info.h>
>  #include <asm/rtas.h>
>  #include <asm/vdso_datapage.h>
> +#include <asm/dbell.h>
>  #ifdef CONFIG_PPC64
>  #include <asm/paca.h>
>  #include <asm/lppaca.h>
> @@ -568,6 +569,7 @@ int main(void)
>  	DEFINE(VCORE_LPCR, offsetof(struct kvmppc_vcore, lpcr));
>  	DEFINE(VCORE_PCR, offsetof(struct kvmppc_vcore, pcr));
>  	DEFINE(VCORE_DPDES, offsetof(struct kvmppc_vcore, dpdes));
> +	DEFINE(VCORE_PCPU, offsetof(struct kvmppc_vcore, pcpu));
>  	DEFINE(VCPU_SLB_E, offsetof(struct kvmppc_slb, orige));
>  	DEFINE(VCPU_SLB_V, offsetof(struct kvmppc_slb, origv));
>  	DEFINE(VCPU_SLB_SIZE, sizeof(struct kvmppc_slb));
> @@ -757,5 +759,7 @@ int main(void)
>  			offsetof(struct paca_struct, subcore_sibling_mask));
>  #endif
>  
> +	DEFINE(PPC_DBELL_SERVER, PPC_DBELL_SERVER);
> +
>  	return 0;
>  }
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 03a8bb4..2c34bae 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -51,6 +51,7 @@
>  #include <asm/hvcall.h>
>  #include <asm/switch_to.h>
>  #include <asm/smp.h>
> +#include <asm/dbell.h>
>  #include <linux/gfp.h>
>  #include <linux/vmalloc.h>
>  #include <linux/highmem.h>
> @@ -84,9 +85,34 @@ static DECLARE_BITMAP(default_enabled_hcalls, MAX_HCALL_OPCODE/4 + 1);
>  static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
>  static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
>  
> +static bool kvmppc_ipi_thread(int cpu)
> +{
> +	/* On POWER8 for IPIs to threads in the same core, use msgsnd */
> +	if (cpu_has_feature(CPU_FTR_ARCH_207S)) {
> +		preempt_disable();
> +		if ((cpu & ~7) == (smp_processor_id() & ~7)) {
> +			unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER);
> +			msg |= cpu & 7;
> +			smp_mb();
> +			__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
> +			preempt_enable();
> +			return true;
> +		}
> +		preempt_enable();
> +	}
> +
> +#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
> +	if (cpu >= 0 && cpu < nr_cpu_ids && paca[cpu].kvm_hstate.xics_phys) {
> +		xics_wake_cpu(cpu);
> +		return true;
> +	}
> +#endif
> +
> +	return false;
> +}
> +
>  static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
>  {
> -	int me;
>  	int cpu = vcpu->cpu;
>  	wait_queue_head_t *wqp;
>  
> @@ -96,20 +122,12 @@ static void kvmppc_fast_vcpu_kick_hv(struct kvm_vcpu *vcpu)
>  		++vcpu->stat.halt_wakeup;
>  	}
>  
> -	me = get_cpu();
> +	if (kvmppc_ipi_thread(cpu + vcpu->arch.ptid))
> +		return;
>  
>  	/* CPU points to the first thread of the core */
> -	if (cpu != me && cpu >= 0 && cpu < nr_cpu_ids) {
> -#ifdef CONFIG_PPC_ICP_NATIVE
> -		int real_cpu = cpu + vcpu->arch.ptid;
> -		if (paca[real_cpu].kvm_hstate.xics_phys)
> -			xics_wake_cpu(real_cpu);
> -		else
> -#endif
> -		if (cpu_online(cpu))
> -			smp_send_reschedule(cpu);
> -	}
> -	put_cpu();
> +	if (cpu >= 0 && cpu < nr_cpu_ids && cpu_online(cpu))
> +		smp_send_reschedule(cpu);
>  }
>  
>  /*
> @@ -1754,10 +1772,8 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
>  	/* Order stores to hstate.kvm_vcore etc. before store to kvm_vcpu */
>  	smp_wmb();
>  	tpaca->kvm_hstate.kvm_vcpu = vcpu;
> -#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
>  	if (cpu != smp_processor_id())
> -		xics_wake_cpu(cpu);
> -#endif
> +		kvmppc_ipi_thread(cpu);
>  }
>  
>  static void kvmppc_wait_for_nap(void)
> diff --git a/arch/powerpc/kvm/book3s_hv_rm_xics.c b/arch/powerpc/kvm/book3s_hv_rm_xics.c
> index 6dded8c..457a8b1 100644
> --- a/arch/powerpc/kvm/book3s_hv_rm_xics.c
> +++ b/arch/powerpc/kvm/book3s_hv_rm_xics.c
> @@ -18,6 +18,7 @@
>  #include <asm/debug.h>
>  #include <asm/synch.h>
>  #include <asm/ppc-opcode.h>
> +#include <asm/dbell.h>
>  
>  #include "book3s_xics.h"
>  
> @@ -83,6 +84,16 @@ static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu,
>  	/* In SMT cpu will always point to thread 0, we adjust it */
>  	cpu += vcpu->arch.ptid;
>  
> +	/* On POWER8 for IPIs to threads in the same core, use msgsnd */
> +	if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
> +	    (cpu & ~7) == (raw_smp_processor_id() & ~7)) {

Can we somehow encapsulate the secret knowledge that 8 threads mean one
core?


Alex

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object
  2015-03-20 11:26       ` Paul Mackerras
@ 2015-03-20 11:34         ` Alexander Graf
  -1 siblings, 0 replies; 80+ messages in thread
From: Alexander Graf @ 2015-03-20 11:34 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: kvm-ppc, kvm, Bharata B Rao



On 20.03.15 12:26, Paul Mackerras wrote:
> On Fri, Mar 20, 2015 at 12:01:32PM +0100, Alexander Graf wrote:
>>
>>
>> On 20.03.15 10:39, Paul Mackerras wrote:
>>> From: Bharata B Rao <bharata@linux.vnet.ibm.com>
>>>
>>> Since KVM isn't equipped to handle closure of vcpu fd from userspace (QEMU)
>>> correctly, certain workarounds have to be employed to allow reuse of
>>> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
>>> proposed workaround is to park the vcpu fd in userspace during cpu unplug
>>> and reuse it later during next hotplug.
>>>
>>> More details can be found here:
>>> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
>>> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
>>>
>>> In order to support this workaround with PowerPC KVM, don't create or
>>> initialize ICP if the vCPU is found to be already associated with an ICP.
>>>
>>> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
>>> Signed-off-by: Paul Mackerras <paulus@samba.org>
>>
>> This probably makes some sense, but please make sure that user space has
>> some way to figure out whether hotplug works at all.
> 
> Bharata is working on the qemu side of all this, so I assume he has
> that covered.

Well, so far the kernel doesn't expose anything he can query, so I
suppose he just blindly assumes that older host kernels will randomly
break and nobody cares. I'd rather prefer to see a CAP exposed that qemu
can check on.
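
(For illustration, the user-space side of such a check might look roughly like
this; kvm_check_extension() is the existing QEMU helper, and the capability
name anticipates the patch posted later in this thread:)

    if (!kvm_check_extension(kvm_state, KVM_CAP_SPAPR_REUSE_VCPU)) {
        /* Sketch: host kernel cannot cope with parked/reused vcpu fds,
         * so do not advertise CPU hot unplug to the guest. */
        error_report("KVM: vCPU fd reuse not supported by this host kernel");
    }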

> 
>> Also Paul, for patches that you pick up from others, I'd prefer if they
>> send the patches to the ML themselves first and you pick them up from
>> there then. That way we give everyone the same treatment.
> 
> Fair enough.  In fact Bharata did post the patch but he sent it to
> linuxppc-dev@ozlabs.org not the KVM lists.

Please make sure you only take patches into your queue that made it to
at least kvm@vger, preferably kvm-ppc@vger as well. If you see related
patches on other mailing lists, just ask the respective people to resend
with proper ML exposure.


Alex

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object
@ 2015-03-20 11:34         ` Alexander Graf
  0 siblings, 0 replies; 80+ messages in thread
From: Alexander Graf @ 2015-03-20 11:34 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: kvm-ppc, kvm, Bharata B Rao



On 20.03.15 12:26, Paul Mackerras wrote:
> On Fri, Mar 20, 2015 at 12:01:32PM +0100, Alexander Graf wrote:
>>
>>
>> On 20.03.15 10:39, Paul Mackerras wrote:
>>> From: Bharata B Rao <bharata@linux.vnet.ibm.com>
>>>
>>> Since KVM isn't equipped to handle closure of vcpu fd from userspace (QEMU)
>>> correctly, certain workarounds have to be employed to allow reuse of
>>> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
>>> proposed workaround is to park the vcpu fd in userspace during cpu unplug
>>> and reuse it later during next hotplug.
>>>
>>> More details can be found here:
>>> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
>>> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
>>>
>>> In order to support this workaround with PowerPC KVM, don't create or
>>> initialize ICP if the vCPU is found to be already associated with an ICP.
>>>
>>> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
>>> Signed-off-by: Paul Mackerras <paulus@samba.org>
>>
>> This probably makes some sense, but please make sure that user space has
>> some way to figure out whether hotplug works at all.
> 
> Bharata is working on the qemu side of all this, so I assume he has
> that covered.

Well, so far the kernel doesn't expose anything he can query, so I
suppose he just blindly assumes that older host kernels will randomly
break and nobody cares. I'd rather prefer to see a CAP exposed that qemu
can check on.

> 
>> Also Paul, for patches that you pick up from others, I'd prefer if they
>> send the patches to the ML themselves first and you pick them up from
>> there then. That way we give everyone the same treatment.
> 
> Fair enough.  In fact Bharata did post the patch but he sent it to
> linuxppc-dev@ozlabs.org not the KVM lists.

Please make sure you only take patches into your queue that made it to
at least kvm@vger, preferably kvm-ppc@vger as well. If you see related
patches on other mailing lists, just ask the respective people to resend
with proper ML exposure.


Alex

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
  2015-03-20 11:25       ` Paul Mackerras
@ 2015-03-20 11:35         ` Alexander Graf
  -1 siblings, 0 replies; 80+ messages in thread
From: Alexander Graf @ 2015-03-20 11:35 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: kvm-ppc, kvm



On 20.03.15 12:25, Paul Mackerras wrote:
> On Fri, Mar 20, 2015 at 12:15:15PM +0100, Alexander Graf wrote:
>>
>>
>> On 20.03.15 10:39, Paul Mackerras wrote:
>>> This reads the timebase at various points in the real-mode guest
>>> entry/exit code and uses that to accumulate total, minimum and
>>> maximum time spent in those parts of the code.  Currently these
>>> times are accumulated per vcpu in 5 parts of the code:
>>>
>>> * rm_entry - time taken from the start of kvmppc_hv_entry() until
>>>   just before entering the guest.
>>> * rm_intr - time from when we take a hypervisor interrupt in the
>>>   guest until we either re-enter the guest or decide to exit to the
>>>   host.  This includes time spent handling hcalls in real mode.
>>> * rm_exit - time from when we decide to exit the guest until the
>>>   return from kvmppc_hv_entry().
>>> * guest - time spent in the guest
>>> * cede - time spent napping in real mode due to an H_CEDE hcall
>>>   while other threads in the same vcore are active.
>>>
>>> These times are exposed in debugfs in a directory per vcpu that
>>> contains a file called "timings".  This file contains one line for
>>> each of the 5 timings above, with the name followed by a colon and
>>> 4 numbers, which are the count (number of times the code has been
>>> executed), the total time, the minimum time, and the maximum time,
>>> all in nanoseconds.
>>>
>>> Signed-off-by: Paul Mackerras <paulus@samba.org>
>>
>> Have you measured the additional overhead this brings?
> 
> I haven't - in fact I did this patch so I could measure the overhead
> or improvement from other changes I did, but it doesn't measure its
> own overhead, of course.  I guess I need a workload that does a
> defined number of guest entries and exits and measure how fast it runs
> with and without the patch (maybe something like H_SET_MODE in a
> loop).  I'll figure something out and post the results.  

Yeah, just measure the number of exits you can handle for a simple
hcall. If there is measurable overhead, it's probably a good idea to
move the statistics gathering into #ifdef paths for DEBUGFS or maybe
even a separate EXIT_TIMING config option as we have it for booke.
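
(For illustration, a sketch of how the accumulation could be compiled out
behind such an option; the config symbol and the accumulator type below are
placeholders, not names taken from the posted patch:)

struct kvmhv_tb_acc {			/* placeholder accumulator */
	u64 count, total, min, max;
};

#ifdef CONFIG_KVM_BOOK3S_HV_EXIT_TIMING	/* hypothetical Kconfig symbol */
static inline void kvmhv_acc_time(struct kvmhv_tb_acc *a, u64 dt)
{
	a->count++;
	a->total += dt;
	if (!a->min || dt < a->min)
		a->min = dt;
	if (dt > a->max)
		a->max = dt;
}
#else
static inline void kvmhv_acc_time(struct kvmhv_tb_acc *a, u64 dt) { }
#endif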


Alex

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
@ 2015-03-20 11:35         ` Alexander Graf
  0 siblings, 0 replies; 80+ messages in thread
From: Alexander Graf @ 2015-03-20 11:35 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: kvm-ppc, kvm



On 20.03.15 12:25, Paul Mackerras wrote:
> On Fri, Mar 20, 2015 at 12:15:15PM +0100, Alexander Graf wrote:
>>
>>
>> On 20.03.15 10:39, Paul Mackerras wrote:
>>> This reads the timebase at various points in the real-mode guest
>>> entry/exit code and uses that to accumulate total, minimum and
>>> maximum time spent in those parts of the code.  Currently these
>>> times are accumulated per vcpu in 5 parts of the code:
>>>
>>> * rm_entry - time taken from the start of kvmppc_hv_entry() until
>>>   just before entering the guest.
>>> * rm_intr - time from when we take a hypervisor interrupt in the
>>>   guest until we either re-enter the guest or decide to exit to the
>>>   host.  This includes time spent handling hcalls in real mode.
>>> * rm_exit - time from when we decide to exit the guest until the
>>>   return from kvmppc_hv_entry().
>>> * guest - time spent in the guest
>>> * cede - time spent napping in real mode due to an H_CEDE hcall
>>>   while other threads in the same vcore are active.
>>>
>>> These times are exposed in debugfs in a directory per vcpu that
>>> contains a file called "timings".  This file contains one line for
>>> each of the 5 timings above, with the name followed by a colon and
>>> 4 numbers, which are the count (number of times the code has been
>>> executed), the total time, the minimum time, and the maximum time,
>>> all in nanoseconds.
>>>
>>> Signed-off-by: Paul Mackerras <paulus@samba.org>
>>
>> Have you measured the additional overhead this brings?
> 
> I haven't - in fact I did this patch so I could measure the overhead
> or improvement from other changes I did, but it doesn't measure its
> own overhead, of course.  I guess I need a workload that does a
> defined number of guest entries and exits and measure how fast it runs
> with and without the patch (maybe something like H_SET_MODE in a
> loop).  I'll figure something out and post the results.  

Yeah, just measure the number of exits you can handle for a simple
hcall. If there is measurable overhead, it's probably a good idea to
move the statistics gathering into #ifdef paths for DEBUGFS or maybe
even a separate EXIT_TIMING config option as we have it for booke.


Alex

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 00/23] Bug fixes and improvements for HV KVM
  2015-03-20  9:39 ` Paul Mackerras
@ 2015-03-20 11:36   ` Alexander Graf
  -1 siblings, 0 replies; 80+ messages in thread
From: Alexander Graf @ 2015-03-20 11:36 UTC (permalink / raw)
  To: Paul Mackerras, kvm-ppc, kvm



On 20.03.15 10:39, Paul Mackerras wrote:
> This is my current patch queue for HV KVM on PPC.  This series is
> based on the "queue" branch of the KVM tree, i.e. roughly v4.0-rc3
> plus a set of recent KVM changes which don't intersect with the
> changes in this series.  On top of that, in my testing I have some
> patches which are not KVM-related but are needed to boot and run a
> recent upstream kernel successfully:
> 
> tick/broadcast-hrtimer : Fix suspicious RCU usage in idle loop
> tick/hotplug: Handover time related duties before cpu offline
> powerpc/powernv: Check image loaded or not before calling flash
> powerpc/powernv: Fixes for hypervisor doorbell handling
> powerpc/powernv: Fix return value from power7_nap() et al.
> powerpc: Export __spin_yield
> 
> These patches have been posted by their authors and are on their way
> upstream via various trees.  They are not included in this series.
> 
> The first three patches are bug fixes that should go into v4.0 if
> possible.  The remainder are intended for the 4.1 merge window.
> 
> The patch "powerpc: Export __spin_yield" is a prerequisite for patch
> 9/23 of this series ("KVM: PPC: Book3S HV: Convert ICS mutex lock to
> spin lock").  It is on its way upstream through the linuxppc-dev
> mailing list.
> 
> The patch "powerpc/powernv: Fixes for hypervisor doorbell handling" is
> needed for correct operation with patch 20/23, "KVM: PPC: Book3S HV:
> Use msgsnd for signalling threads".  It is also on its way upstream
> through the linuxppc-dev list.  I am expecting both of these
> prerequisite patches to go into 4.0.
> 
> Finally, the last patch in this series converts some of the assembly
> code in book3s_hv_rmhandlers.S into C.  I intend to continue this
> trend.

Thanks, applied patches 4-11 to kvm-ppc-queue.


Alex

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 00/23] Bug fixes and improvements for HV KVM
@ 2015-03-20 11:36   ` Alexander Graf
  0 siblings, 0 replies; 80+ messages in thread
From: Alexander Graf @ 2015-03-20 11:36 UTC (permalink / raw)
  To: Paul Mackerras, kvm-ppc, kvm



On 20.03.15 10:39, Paul Mackerras wrote:
> This is my current patch queue for HV KVM on PPC.  This series is
> based on the "queue" branch of the KVM tree, i.e. roughly v4.0-rc3
> plus a set of recent KVM changes which don't intersect with the
> changes in this series.  On top of that, in my testing I have some
> patches which are not KVM-related but are needed to boot and run a
> recent upstream kernel successfully:
> 
> tick/broadcast-hrtimer : Fix suspicious RCU usage in idle loop
> tick/hotplug: Handover time related duties before cpu offline
> powerpc/powernv: Check image loaded or not before calling flash
> powerpc/powernv: Fixes for hypervisor doorbell handling
> powerpc/powernv: Fix return value from power7_nap() et al.
> powerpc: Export __spin_yield
> 
> These patches have been posted by their authors and are on their way
> upstream via various trees.  They are not included in this series.
> 
> The first three patches are bug fixes that should go into v4.0 if
> possible.  The remainder are intended for the 4.1 merge window.
> 
> The patch "powerpc: Export __spin_yield" is a prerequisite for patch
> 9/23 of this series ("KVM: PPC: Book3S HV: Convert ICS mutex lock to
> spin lock").  It is on its way upstream through the linuxppc-dev
> mailing list.
> 
> The patch "powerpc/powernv: Fixes for hypervisor doorbell handling" is
> needed for correct operation with patch 20/23, "KVM: PPC: Book3S HV:
> Use msgsnd for signalling threads".  It is also on its way upstream
> through the linuxppc-dev list.  I am expecting both of these
> prerequisite patches to go into 4.0.
> 
> Finally, the last patch in this series converts some of the assembly
> code in book3s_hv_rmhandlers.S into C.  I intend to continue this
> trend.

Thanks, applied patches 4-11 to kvm-ppc-queue.


Alex

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object
  2015-03-20 11:34         ` Alexander Graf
@ 2015-03-20 15:51           ` Bharata B Rao
  -1 siblings, 0 replies; 80+ messages in thread
From: Bharata B Rao @ 2015-03-20 15:51 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Paul Mackerras, kvm-ppc, kvm

On Fri, Mar 20, 2015 at 12:34:18PM +0100, Alexander Graf wrote:
> 
> 
> On 20.03.15 12:26, Paul Mackerras wrote:
> > On Fri, Mar 20, 2015 at 12:01:32PM +0100, Alexander Graf wrote:
> >>
> >>
> >> On 20.03.15 10:39, Paul Mackerras wrote:
> >>> From: Bharata B Rao <bharata@linux.vnet.ibm.com>
> >>>
> >>> Since KVM isn't equipped to handle closure of vcpu fd from userspace (QEMU)
> >>> correctly, certain workarounds have to be employed to allow reuse of
> >>> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
> >>> proposed workaround is to park the vcpu fd in userspace during cpu unplug
> >>> and reuse it later during next hotplug.
> >>>
> >>> More details can be found here:
> >>> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
> >>> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
> >>>
> >>> In order to support this workaround with PowerPC KVM, don't create or
> >>> initialize ICP if the vCPU is found to be already associated with an ICP.
> >>>
> >>> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> >>> Signed-off-by: Paul Mackerras <paulus@samba.org>
> >>
> >> This probably makes some sense, but please make sure that user space has
> >> some way to figure out whether hotplug works at all.
> > 
> > Bharata is working on the qemu side of all this, so I assume he has
> > that covered.
> 
> Well, so far the kernel doesn't expose anything he can query, so I
> suppose he just blindly assumes that older host kernels will randomly
> break and nobody cares. I'd rather prefer to see a CAP exposed that qemu
> can check on.

I see that you have already taken this into your tree. I have an updated
patch to expose a CAP. If the below patch looks ok, then let me know how
you would prefer to take this patch in.

Regards,
Bharata.

KVM: PPC: BOOK3S: Allow reuse of vCPU object

From: Bharata B Rao <bharata@linux.vnet.ibm.com>

Since KVM isn't equipped to handle closure of vcpu fd from userspace (QEMU)
correctly, certain workarounds have to be employed to allow reuse of
vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
proposed workaround is to park the vcpu fd in userspace during cpu unplug
and reuse it later during next hotplug.

More details can be found here:
KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html

In order to support this workaround with PowerPC KVM, don't create or
initialize ICP if the vCPU is found to be already associated with an ICP.
User space (QEMU) can reuse the vCPU after checking for the availability
of KVM_CAP_SPAPR_REUSE_VCPU capability.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_xics.c |    9 +++++++--
 arch/powerpc/kvm/powerpc.c     |   12 ++++++++++++
 include/uapi/linux/kvm.h       |    1 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index a4a8d9f..ead3a35 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -1313,8 +1313,13 @@ int kvmppc_xics_connect_vcpu(struct kvm_device *dev, struct kvm_vcpu *vcpu,
 		return -EPERM;
 	if (xics->kvm != vcpu->kvm)
 		return -EPERM;
-	if (vcpu->arch.irq_type)
-		return -EBUSY;
+
+	/*
+	 * If irq_type is already set, don't reinitialize but
+	 * return success allowing this vcpu to be reused.
+	 */
+	if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT)
+		return 0;
 
 	r = kvmppc_xics_create_icp(vcpu, xcpu);
 	if (!r)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 27c0fac..5b7007c 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -564,6 +564,18 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = 1;
 		break;
 #endif
+	case KVM_CAP_SPAPR_REUSE_VCPU:
+		/*
+		 * Kernel currently doesn't support closing of vCPU fd from
+		 * user space (QEMU) correctly. Hence the option available
+		 * is to park the vCPU fd in user space whenever a guest
+		 * CPU is hot removed and reuse the same later when another
+		 * guest CPU is hotplugged. This capability determines whether
+		 * it is safe to assume that parking of vCPU fd and reuse from
+		 * user space works for sPAPR guests.
+		 */
+		r = 1;
+		break;
 	default:
 		r = 0;
 		break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 8055706..8464755 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -760,6 +760,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_ENABLE_HCALL 104
 #define KVM_CAP_CHECK_EXTENSION_VM 105
 #define KVM_CAP_S390_USER_SIGP 106
+#define KVM_CAP_SPAPR_REUSE_VCPU 107
 
 #ifdef KVM_CAP_IRQ_ROUTING
 

^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object
@ 2015-03-20 15:51           ` Bharata B Rao
  0 siblings, 0 replies; 80+ messages in thread
From: Bharata B Rao @ 2015-03-20 15:51 UTC (permalink / raw)
  To: Alexander Graf; +Cc: Paul Mackerras, kvm-ppc, kvm

On Fri, Mar 20, 2015 at 12:34:18PM +0100, Alexander Graf wrote:
> 
> 
> On 20.03.15 12:26, Paul Mackerras wrote:
> > On Fri, Mar 20, 2015 at 12:01:32PM +0100, Alexander Graf wrote:
> >>
> >>
> >> On 20.03.15 10:39, Paul Mackerras wrote:
> >>> From: Bharata B Rao <bharata@linux.vnet.ibm.com>
> >>>
> >>> Since KVM isn't equipped to handle closure of vcpu fd from userspace (QEMU)
> >>> correctly, certain workarounds have to be employed to allow reuse of
> >>> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
> >>> proposed workaround is to park the vcpu fd in userspace during cpu unplug
> >>> and reuse it later during next hotplug.
> >>>
> >>> More details can be found here:
> >>> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
> >>> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
> >>>
> >>> In order to support this workaround with PowerPC KVM, don't create or
> >>> initialize ICP if the vCPU is found to be already associated with an ICP.
> >>>
> >>> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> >>> Signed-off-by: Paul Mackerras <paulus@samba.org>
> >>
> >> This probably makes some sense, but please make sure that user space has
> >> some way to figure out whether hotplug works at all.
> > 
> > Bharata is working on the qemu side of all this, so I assume he has
> > that covered.
> 
> Well, so far the kernel doesn't expose anything he can query, so I
> suppose he just blindly assumes that older host kernels will randomly
> break and nobody cares. I'd rather prefer to see a CAP exposed that qemu
> can check on.

I see that you have already taken this into your tree. I have an updated
patch to expose a CAP. If the below patch looks ok, then let me know how
you would prefer to take this patch in.

Regards,
Bharata.

KVM: PPC: BOOK3S: Allow reuse of vCPU object

From: Bharata B Rao <bharata@linux.vnet.ibm.com>

Since KVM isn't equipped to handle closure of vcpu fd from userspace (QEMU)
correctly, certain workarounds have to be employed to allow reuse of
vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
proposed workaround is to park the vcpu fd in userspace during cpu unplug
and reuse it later during next hotplug.

More details can be found here:
KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html

In order to support this workaround with PowerPC KVM, don't create or
initialize ICP if the vCPU is found to be already associated with an ICP.
User space (QEMU) can reuse the vCPU after checking for the availability
of KVM_CAP_SPAPR_REUSE_VCPU capability.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_xics.c |    9 +++++++--
 arch/powerpc/kvm/powerpc.c     |   12 ++++++++++++
 include/uapi/linux/kvm.h       |    1 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index a4a8d9f..ead3a35 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -1313,8 +1313,13 @@ int kvmppc_xics_connect_vcpu(struct kvm_device *dev, struct kvm_vcpu *vcpu,
 		return -EPERM;
 	if (xics->kvm != vcpu->kvm)
 		return -EPERM;
-	if (vcpu->arch.irq_type)
-		return -EBUSY;
+
+	/*
+	 * If irq_type is already set, don't reinitialize but
+	 * return success allowing this vcpu to be reused.
+	 */
+	if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT)
+		return 0;
 
 	r = kvmppc_xics_create_icp(vcpu, xcpu);
 	if (!r)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 27c0fac..5b7007c 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -564,6 +564,18 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = 1;
 		break;
 #endif
+	case KVM_CAP_SPAPR_REUSE_VCPU:
+		/*
+		 * Kernel currently doesn't support closing of vCPU fd from
+		 * user space (QEMU) correctly. Hence the option available
+		 * is to park the vCPU fd in user space whenever a guest
+		 * CPU is hot removed and reuse the same later when another
+		 * guest CPU is hotplugged. This capability determines whether
+		 * it is safe to assume that parking of vCPU fd and reuse from
+		 * user space works for sPAPR guests.
+		 */
+		r = 1;
+		break;
 	default:
 		r = 0;
 		break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 8055706..8464755 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -760,6 +760,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_ENABLE_HCALL 104
 #define KVM_CAP_CHECK_EXTENSION_VM 105
 #define KVM_CAP_S390_USER_SIGP 106
+#define KVM_CAP_SPAPR_REUSE_VCPU 107
 
 #ifdef KVM_CAP_IRQ_ROUTING
 


^ permalink raw reply related	[flat|nested] 80+ messages in thread

* Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object
  2015-03-20 15:51           ` Bharata B Rao
@ 2015-03-21 14:58             ` Alexander Graf
  -1 siblings, 0 replies; 80+ messages in thread
From: Alexander Graf @ 2015-03-21 14:58 UTC (permalink / raw)
  To: bharata; +Cc: Paul Mackerras, kvm-ppc, kvm



On 20.03.15 16:51, Bharata B Rao wrote:
> On Fri, Mar 20, 2015 at 12:34:18PM +0100, Alexander Graf wrote:
>>
>>
>> On 20.03.15 12:26, Paul Mackerras wrote:
>>> On Fri, Mar 20, 2015 at 12:01:32PM +0100, Alexander Graf wrote:
>>>>
>>>>
>>>> On 20.03.15 10:39, Paul Mackerras wrote:
>>>>> From: Bharata B Rao <bharata@linux.vnet.ibm.com>
>>>>>
>>>>> Since KVM isn't equipped to handle closure of vcpu fd from userspace (QEMU)
>>>>> correctly, certain workarounds have to be employed to allow reuse of
>>>>> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
>>>>> proposed workaround is to park the vcpu fd in userspace during cpu unplug
>>>>> and reuse it later during next hotplug.
>>>>>
>>>>> More details can be found here:
>>>>> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
>>>>> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
>>>>>
>>>>> In order to support this workaround with PowerPC KVM, don't create or
>>>>> initialize ICP if the vCPU is found to be already associated with an ICP.
>>>>>
>>>>> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
>>>>> Signed-off-by: Paul Mackerras <paulus@samba.org>
>>>>
>>>> This probably makes some sense, but please make sure that user space has
>>>> some way to figure out whether hotplug works at all.
>>>
>>> Bharata is working on the qemu side of all this, so I assume he has
>>> that covered.
>>
>> Well, so far the kernel doesn't expose anything he can query, so I
>> suppose he just blindly assumes that older host kernels will randomly
>> break and nobody cares. I'd rather prefer to see a CAP exposed that qemu
>> can check on.
> 
> I see that you have already taken this into your tree. I have an updated
> patch to expose a CAP. If the below patch looks ok, then let me know how
> you would prefer to take this patch in.
> 
> Regards,
> Bharata.
> 
> KVM: PPC: BOOK3S: Allow reuse of vCPU object
> 
> From: Bharata B Rao <bharata@linux.vnet.ibm.com>
> 
> Since KVM isn't equipped to handle closure of vcpu fd from userspace (QEMU)
> correctly, certain workarounds have to be employed to allow reuse of
> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
> proposed workaround is to park the vcpu fd in userspace during cpu unplug
> and reuse it later during next hotplug.
> 
> More details can be found here:
> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
> 
> In order to support this workaround with PowerPC KVM, don't create or
> initialize ICP if the vCPU is found to be already associated with an ICP.
> User space (QEMU) can reuse the vCPU after checking for the availability
> of KVM_CAP_SPAPR_REUSE_VCPU capability.
> 
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kvm/book3s_xics.c |    9 +++++++--
>  arch/powerpc/kvm/powerpc.c     |   12 ++++++++++++
>  include/uapi/linux/kvm.h       |    1 +
>  3 files changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
> index a4a8d9f..ead3a35 100644
> --- a/arch/powerpc/kvm/book3s_xics.c
> +++ b/arch/powerpc/kvm/book3s_xics.c
> @@ -1313,8 +1313,13 @@ int kvmppc_xics_connect_vcpu(struct kvm_device *dev, struct kvm_vcpu *vcpu,
>  		return -EPERM;
>  	if (xics->kvm != vcpu->kvm)
>  		return -EPERM;
> -	if (vcpu->arch.irq_type)
> -		return -EBUSY;
> +
> +	/*
> +	 * If irq_type is already set, don't reinitialize but
> +	 * return success allowing this vcpu to be reused.
> +	 */
> +	if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT)
> +		return 0;
>  
>  	r = kvmppc_xics_create_icp(vcpu, xcpu);
>  	if (!r)
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 27c0fac..5b7007c 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -564,6 +564,18 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  		r = 1;
>  		break;
>  #endif
> +	case KVM_CAP_SPAPR_REUSE_VCPU:
> +		/*
> +		 * Kernel currently doesn't support closing of vCPU fd from
> +		 * user space (QEMU) correctly. Hence the option available
> +		 * is to park the vCPU fd in user space whenever a guest
> +		 * CPU is hot removed and reuse the same later when another
> +		 * guest CPU is hotplugged. This capability determines whether
> +		 * it is safe to assume that parking of vCPU fd and reuse from
> +		 * user space works for sPAPR guests.

I don't see how the code you're changing here has anything to do with
parking vcpus. It's all about being able to call connect on an already
connected vcpu and not erroring out. Please reflect this in the cap name
and description.

You also need to update Documentation/virtual/kvm/api.txt.

Furthermore, thinking about this a bit more, I might still miss the
exact case why you need this. Why is QEMU issuing a connect again? Could
it maybe just not do it?

Also please just send a patch on top of this one for the CAP. It doesn't
hurt to separate them. Unless we end up concluding that you don't need
it at all, in which case I'll just remove your patch from my queue again ;).


Alex

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code
  2015-03-20 11:35         ` Alexander Graf
@ 2015-03-22 22:57           ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-22 22:57 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm-ppc, kvm

On Fri, Mar 20, 2015 at 12:35:43PM +0100, Alexander Graf wrote:
> 
> 
> On 20.03.15 12:25, Paul Mackerras wrote:
> > On Fri, Mar 20, 2015 at 12:15:15PM +0100, Alexander Graf wrote:
> >> Have you measured the additional overhead this brings?
> > 
> > I haven't - in fact I did this patch so I could measure the overhead
> > or improvement from other changes I did, but it doesn't measure its
> > own overhead, of course.  I guess I need a workload that does a
> > defined number of guest entries and exits and measure how fast it runs
> > with and without the patch (maybe something like H_SET_MODE in a
> > loop).  I'll figure something out and post the results.  
> 
> Yeah, just measure the number of exits you can handle for a simple
> hcall. If there is measurable overhead, it's probably a good idea to
> move the statistics gathering into #ifdef paths for DEBUGFS or maybe
> even a separate EXIT_TIMING config option as we have it for booke.

For a 1-vcpu guest on POWER8, it adds 29ns to the time for an hcall that
is handled in real mode (H_SET_DABR), which is 25%.  It adds 43ns to
the time for an hcall that is handled in the host kernel in virtual
mode (H_PROD), which is 1.2%.
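
Reading those percentages back as absolute baseline times (derived
arithmetic, not numbers stated above):

	29 ns / 0.25  ~= 116 ns   baseline for the real-mode hcall (H_SET_DABR)
	43 ns / 0.012 ~= 3.6 us   baseline for the virtual-mode hcall (H_PROD)

so the added cost is a few tens of nanoseconds either way; it just
dominates on the short real-mode path.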

I'll add a config option.
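
A sketch of what such an option could look like, modelled on the existing
booke KVM_EXIT_TIMING entry (the symbol name, dependencies and help text
here are assumptions, not taken from a posted patch):

	config KVM_BOOK3S_HV_EXIT_TIMING
		bool "Detailed timing for hypervisor real-mode code"
		depends on KVM_BOOK3S_HV_POSSIBLE && DEBUG_FS
		---help---
		  Collect per-vcpu timing for the real-mode guest entry, exit
		  and interrupt handling code and report it via debugfs.  Adds
		  a small overhead to every guest exit, so say N unless you
		  are profiling KVM.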

Paul.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 20/23] KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8
  2015-03-20 11:28     ` Alexander Graf
@ 2015-03-23  0:44       ` Paul Mackerras
  -1 siblings, 0 replies; 80+ messages in thread
From: Paul Mackerras @ 2015-03-23  0:44 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm-ppc, kvm

On Fri, Mar 20, 2015 at 12:28:25PM +0100, Alexander Graf wrote:
> 
> 
> On 20.03.15 10:39, Paul Mackerras wrote:
> > +	/* On POWER8 for IPIs to threads in the same core, use msgsnd */
> > +	if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
> > +	    (cpu & ~7) == (raw_smp_processor_id() & ~7)) {
> 
> Can we somehow encapsulate the secret knowledge that 8 threads mean one
> core?

Looks like I want:

	if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
	    cpu_first_thread_sibling(cpu) ==
	    cpu_first_thread_sibling(raw_smp_processor_id())) {
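
For context, a rough sketch of how that check could sit in a core-local
IPI helper (the function name is made up for illustration; the doorbell
macros and the XICS fallback are assumptions based on the existing
powerpc code, not text from this thread):

	/* sketch only -- assumes <asm/dbell.h>, <asm/ppc-opcode.h>,
	 * <asm/cputhreads.h> and <asm/xics.h>; caller is not preemptible
	 */
	static void kvmppc_ipi_thread_sketch(int cpu)
	{
		/* POWER8: threads in the same core can be woken with msgsnd */
		if (cpu_has_feature(CPU_FTR_ARCH_207S) &&
		    cpu_first_thread_sibling(cpu) ==
		    cpu_first_thread_sibling(raw_smp_processor_id())) {
			unsigned long msg = PPC_DBELL_TYPE(PPC_DBELL_SERVER) |
					    cpu_thread_in_core(cpu);

			smp_mb();	/* order prior stores before the doorbell */
			__asm__ __volatile__ (PPC_MSGSND(%0) : : "r" (msg));
			return;
		}
		/* off-core (or pre-POWER8): fall back to the XICS-based IPI */
		xics_wake_cpu(cpu);
	}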

Paul.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object
  2015-03-21 14:58             ` Alexander Graf
@ 2015-03-23  7:51               ` Bharata B Rao
  -1 siblings, 0 replies; 80+ messages in thread
From: Bharata B Rao @ 2015-03-23  7:50 UTC (permalink / raw)
  To: Alexander Graf; +Cc: bharata, Paul Mackerras, kvm-ppc, kvm

On Sat, Mar 21, 2015 at 8:28 PM, Alexander Graf <agraf@suse.de> wrote:
>
>
> On 20.03.15 16:51, Bharata B Rao wrote:
>> On Fri, Mar 20, 2015 at 12:34:18PM +0100, Alexander Graf wrote:
>>>
>>>
>>> On 20.03.15 12:26, Paul Mackerras wrote:
>>>> On Fri, Mar 20, 2015 at 12:01:32PM +0100, Alexander Graf wrote:
>>>>>
>>>>>
>>>>> On 20.03.15 10:39, Paul Mackerras wrote:
>>>>>> From: Bharata B Rao <bharata@linux.vnet.ibm.com>
>>>>>>
>>>>>> Since KVM isn't equipped to handle closure of vcpu fd from userspace (QEMU)
>>>>>> correctly, certain workarounds have to be employed to allow reuse of
>>>>>> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
>>>>>> proposed workaround is to park the vcpu fd in userspace during cpu unplug
>>>>>> and reuse it later during next hotplug.
>>>>>>
>>>>>> More details can be found here:
>>>>>> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
>>>>>> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
>>>>>>
>>>>>> In order to support this workaround with PowerPC KVM, don't create or
>>>>>> initialize ICP if the vCPU is found to be already associated with an ICP.
>>>>>>
>>>>>> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
>>>>>> Signed-off-by: Paul Mackerras <paulus@samba.org>
>>>>>
>>>>> This probably makes some sense, but please make sure that user space has
>>>>> some way to figure out whether hotplug works at all.
>>>>
>>>> Bharata is working on the qemu side of all this, so I assume he has
>>>> that covered.
>>>
>>> Well, so far the kernel doesn't expose anything he can query, so I
>>> suppose he just blindly assumes that older host kernels will randomly
>>> break and nobody cares. I'd rather prefer to see a CAP exposed that qemu
>>> can check on.
>>
>> I see that you have already taken this into your tree. I have an updated
>> patch to expose a CAP. If the below patch looks ok, then let me know how
>> you would prefer to take this patch in.
>>
>> Regards,
>> Bharata.
>>
>> KVM: PPC: BOOK3S: Allow reuse of vCPU object
>>
>> From: Bharata B Rao <bharata@linux.vnet.ibm.com>
>>
>> Since KVM isn't equipped to handle closure of vcpu fd from userspace (QEMU)
>> correctly, certain workarounds have to be employed to allow reuse of
>> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
>> proposed workaround is to park the vcpu fd in userspace during cpu unplug
>> and reuse it later during next hotplug.
>>
>> More details can be found here:
>> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
>> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
>>
>> In order to support this workaround with PowerPC KVM, don't create or
>> initialize ICP if the vCPU is found to be already associated with an ICP.
>> User space (QEMU) can reuse the vCPU after checking for the availability
>> of KVM_CAP_SPAPR_REUSE_VCPU capability.
>>
>> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/kvm/book3s_xics.c |    9 +++++++--
>>  arch/powerpc/kvm/powerpc.c     |   12 ++++++++++++
>>  include/uapi/linux/kvm.h       |    1 +
>>  3 files changed, 20 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
>> index a4a8d9f..ead3a35 100644
>> --- a/arch/powerpc/kvm/book3s_xics.c
>> +++ b/arch/powerpc/kvm/book3s_xics.c
>> @@ -1313,8 +1313,13 @@ int kvmppc_xics_connect_vcpu(struct kvm_device *dev, struct kvm_vcpu *vcpu,
>>               return -EPERM;
>>       if (xics->kvm != vcpu->kvm)
>>               return -EPERM;
>> -     if (vcpu->arch.irq_type)
>> -             return -EBUSY;
>> +
>> +     /*
>> +      * If irq_type is already set, don't reinitialize but
>> +      * return success allowing this vcpu to be reused.
>> +      */
>> +     if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT)
>> +             return 0;
>>
>>       r = kvmppc_xics_create_icp(vcpu, xcpu);
>>       if (!r)
>> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
>> index 27c0fac..5b7007c 100644
>> --- a/arch/powerpc/kvm/powerpc.c
>> +++ b/arch/powerpc/kvm/powerpc.c
>> @@ -564,6 +564,18 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>               r = 1;
>>               break;
>>  #endif
>> +     case KVM_CAP_SPAPR_REUSE_VCPU:
>> +             /*
>> +              * Kernel currently doesn't support closing of vCPU fd from
>> +              * user space (QEMU) correctly. Hence the option available
>> +              * is to park the vCPU fd in user space whenever a guest
>> +              * CPU is hot removed and reuse the same later when another
>> +              * guest CPU is hotplugged. This capability determines whether
>> +              * it is safe to assume that parking of vCPU fd and reuse from
>> +              * user space works for sPAPR guests.
>
> I don't see how the code you're changing here has anything to do with
> parking vcpus. It's all about being able to call connect on an already
> connected vcpu and not erroring out. Please reflect this in the cap name
> and description.
>
> You also need to update Documentation/virtual/kvm/api.txt.
>
> Furthermore, thinking about this a bit more, I might still miss the
> exact case why you need this. Why is QEMU issuing a connect again? Could
> it maybe just not do it?

Thinking more, I see that I can handle this entirely in user space. So
no need for this patch.

Sorry for the noise :(

Regards,
Bharata.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object
  2015-03-23  7:51               ` Bharata B Rao
@ 2015-03-23  8:31                 ` Alexander Graf
  -1 siblings, 0 replies; 80+ messages in thread
From: Alexander Graf @ 2015-03-23  8:31 UTC (permalink / raw)
  To: Bharata B Rao; +Cc: bharata, Paul Mackerras, kvm-ppc, kvm



On 23.03.15 08:50, Bharata B Rao wrote:
> On Sat, Mar 21, 2015 at 8:28 PM, Alexander Graf <agraf@suse.de> wrote:
>>
>>
>> On 20.03.15 16:51, Bharata B Rao wrote:
>>> On Fri, Mar 20, 2015 at 12:34:18PM +0100, Alexander Graf wrote:
>>>>
>>>>
>>>> On 20.03.15 12:26, Paul Mackerras wrote:
>>>>> On Fri, Mar 20, 2015 at 12:01:32PM +0100, Alexander Graf wrote:
>>>>>>
>>>>>>
>>>>>> On 20.03.15 10:39, Paul Mackerras wrote:
>>>>>>> From: Bharata B Rao <bharata@linux.vnet.ibm.com>
>>>>>>>
>>>>>>> Since KVM isn't equipped to handle closure of vcpu fd from userspace (QEMU)
>>>>>>> correctly, certain workarounds have to be employed to allow reuse of
>>>>>>> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
>>>>>>> proposed workaround is to park the vcpu fd in userspace during cpu unplug
>>>>>>> and reuse it later during next hotplug.
>>>>>>>
>>>>>>> More details can be found here:
>>>>>>> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
>>>>>>> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
>>>>>>>
>>>>>>> In order to support this workaround with PowerPC KVM, don't create or
>>>>>>> initialize ICP if the vCPU is found to be already associated with an ICP.
>>>>>>>
>>>>>>> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
>>>>>>> Signed-off-by: Paul Mackerras <paulus@samba.org>
>>>>>>
>>>>>> This probably makes some sense, but please make sure that user space has
>>>>>> some way to figure out whether hotplug works at all.
>>>>>
>>>>> Bharata is working on the qemu side of all this, so I assume he has
>>>>> that covered.
>>>>
>>>> Well, so far the kernel doesn't expose anything he can query, so I
>>>> suppose he just blindly assumes that older host kernels will randomly
>>>> break and nobody cares. I'd rather prefer to see a CAP exposed that qemu
>>>> can check on.
>>>
>>> I see that you have already taken this into your tree. I have an updated
>>> patch to expose a CAP. If the below patch looks ok, then let me know how
>>> you would prefer to take this patch in.
>>>
>>> Regards,
>>> Bharata.
>>>
>>> KVM: PPC: BOOK3S: Allow reuse of vCPU object
>>>
>>> From: Bharata B Rao <bharata@linux.vnet.ibm.com>
>>>
>>> Since KVM isn't equipped to handle closure of vcpu fd from userspace (QEMU)
>>> correctly, certain workarounds have to be employed to allow reuse of
>>> vcpu array slot in KVM during cpu hot plug/unplug from guest. One such
>>> proposed workaround is to park the vcpu fd in userspace during cpu unplug
>>> and reuse it later during next hotplug.
>>>
>>> More details can be found here:
>>> KVM: https://www.mail-archive.com/kvm@vger.kernel.org/msg102839.html
>>> QEMU: http://lists.gnu.org/archive/html/qemu-devel/2014-12/msg00859.html
>>>
>>> In order to support this workaround with PowerPC KVM, don't create or
>>> initialize ICP if the vCPU is found to be already associated with an ICP.
>>> User space (QEMU) can reuse the vCPU after checking for the availability
>>> of KVM_CAP_SPAPR_REUSE_VCPU capability.
>>>
>>> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
>>> ---
>>>  arch/powerpc/kvm/book3s_xics.c |    9 +++++++--
>>>  arch/powerpc/kvm/powerpc.c     |   12 ++++++++++++
>>>  include/uapi/linux/kvm.h       |    1 +
>>>  3 files changed, 20 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
>>> index a4a8d9f..ead3a35 100644
>>> --- a/arch/powerpc/kvm/book3s_xics.c
>>> +++ b/arch/powerpc/kvm/book3s_xics.c
>>> @@ -1313,8 +1313,13 @@ int kvmppc_xics_connect_vcpu(struct kvm_device *dev, struct kvm_vcpu *vcpu,
>>>               return -EPERM;
>>>       if (xics->kvm != vcpu->kvm)
>>>               return -EPERM;
>>> -     if (vcpu->arch.irq_type)
>>> -             return -EBUSY;
>>> +
>>> +     /*
>>> +      * If irq_type is already set, don't reinitialize but
>>> +      * return success allowing this vcpu to be reused.
>>> +      */
>>> +     if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT)
>>> +             return 0;
>>>
>>>       r = kvmppc_xics_create_icp(vcpu, xcpu);
>>>       if (!r)
>>> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
>>> index 27c0fac..5b7007c 100644
>>> --- a/arch/powerpc/kvm/powerpc.c
>>> +++ b/arch/powerpc/kvm/powerpc.c
>>> @@ -564,6 +564,18 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>>               r = 1;
>>>               break;
>>>  #endif
>>> +     case KVM_CAP_SPAPR_REUSE_VCPU:
>>> +             /*
>>> +              * Kernel currently doesn't support closing of vCPU fd from
>>> +              * user space (QEMU) correctly. Hence the option available
>>> +              * is to park the vCPU fd in user space whenever a guest
>>> +              * CPU is hot removed and reuse the same later when another
>>> +              * guest CPU is hotplugged. This capability determines whether
>>> +              * it is safe to assume that parking of vCPU fd and reuse from
>>> +              * user space works for sPAPR guests.
>>
>> I don't see how the code you're changing here has anything to do with
>> parking vcpus. It's all about being able to call connect on an already
>> connected vcpu and not erroring out. Please reflect this in the cap name
>> and description.
>>
>> You also need to update Documentation/virtual/kvm/api.txt.
>>
>> Furthermore, thinking about this a bit more, I might still miss the
>> exact case why you need this. Why is QEMU issuing a connect again? Could
>> it maybe just not do it?
> 
> Thinking more, I see that I can handle this entirely in user space. So
> no need for this patch.
> 
> Sorry for the noise :(

Perfect, removed this patch from my queue again :).


Alex

^ permalink raw reply	[flat|nested] 80+ messages in thread

end of thread, other threads:[~2015-03-23  8:31 UTC | newest]

Thread overview: 80+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-20  9:39 [PATCH 00/23] Bug fixes and improvements for HV KVM Paul Mackerras
2015-03-20  9:39 ` Paul Mackerras
2015-03-20  9:39 ` [PATCH 01/23] KVM: PPC: Book3S HV: Fix spinlock/mutex ordering issue in kvmppc_set_lpcr() Paul Mackerras
2015-03-20  9:39   ` Paul Mackerras
2015-03-20  9:39 ` [PATCH 02/23] KVM: PPC: Book3S HV: Endian fix for accessing VPA yield count Paul Mackerras
2015-03-20  9:39   ` Paul Mackerras
2015-03-20  9:39 ` [PATCH 03/23] KVM: PPC: Book3S HV: Fix instruction emulation Paul Mackerras
2015-03-20  9:39   ` Paul Mackerras
2015-03-20  9:39 ` [PATCH 04/23] KVM: PPC: Book3S HV: Add fast real-mode H_RANDOM implementation Paul Mackerras
2015-03-20  9:39   ` Paul Mackerras
2015-03-20  9:39 ` [PATCH 05/23] KVM: PPC: Book3S HV: Remove RMA-related variables from code Paul Mackerras
2015-03-20  9:39   ` Paul Mackerras
2015-03-20  9:39 ` [PATCH 06/23] KVM: PPC: Book3S HV: Add helpers for lock/unlock hpte Paul Mackerras
2015-03-20  9:39   ` Paul Mackerras
2015-03-20  9:39 ` [PATCH 07/23] KVM: PPC: Book3S: Allow reuse of vCPU object Paul Mackerras
2015-03-20  9:39   ` Paul Mackerras
2015-03-20 11:01   ` Alexander Graf
2015-03-20 11:01     ` Alexander Graf
2015-03-20 11:26     ` Paul Mackerras
2015-03-20 11:26       ` Paul Mackerras
2015-03-20 11:34       ` Alexander Graf
2015-03-20 11:34         ` Alexander Graf
2015-03-20 15:51         ` Bharata B Rao
2015-03-20 15:51           ` Bharata B Rao
2015-03-21 14:58           ` Alexander Graf
2015-03-21 14:58             ` Alexander Graf
2015-03-23  7:50             ` Bharata B Rao
2015-03-23  7:51               ` Bharata B Rao
2015-03-23  8:31               ` Alexander Graf
2015-03-23  8:31                 ` Alexander Graf
2015-03-20  9:39 ` [PATCH 08/23] KVM: PPC: Book3S HV: Add guest->host real mode completion counters Paul Mackerras
2015-03-20  9:39   ` Paul Mackerras
2015-03-20  9:39 ` [PATCH 09/23] KVM: PPC: Book3S HV: Convert ICS mutex lock to spin lock Paul Mackerras
2015-03-20  9:39   ` Paul Mackerras
2015-03-20  9:39 ` [PATCH 10/23] KVM: PPC: Book3S HV: Move virtual mode ICP functions to real-mode Paul Mackerras
2015-03-20  9:39   ` Paul Mackerras
2015-03-20  9:39 ` [PATCH 11/23] KVM: PPC: Book3S HV: Add ICP real mode counters Paul Mackerras
2015-03-20  9:39   ` Paul Mackerras
2015-03-20  9:39 ` [PATCH 12/23] KVM: PPC: Book3S HV: Create debugfs file for each guest's HPT Paul Mackerras
2015-03-20  9:39   ` Paul Mackerras
2015-03-20 11:20   ` Alexander Graf
2015-03-20 11:20     ` Alexander Graf
2015-03-20  9:39 ` [PATCH 13/23] KVM: PPC: Book3S HV: Accumulate timing information for real-mode code Paul Mackerras
2015-03-20  9:39   ` Paul Mackerras
2015-03-20 11:15   ` Alexander Graf
2015-03-20 11:15     ` Alexander Graf
2015-03-20 11:25     ` Paul Mackerras
2015-03-20 11:25       ` Paul Mackerras
2015-03-20 11:35       ` Alexander Graf
2015-03-20 11:35         ` Alexander Graf
2015-03-22 22:57         ` Paul Mackerras
2015-03-22 22:57           ` Paul Mackerras
2015-03-20  9:39 ` [PATCH 14/23] KVM: PPC: Book3S HV: Simplify handling of VCPUs that need a VPA update Paul Mackerras
2015-03-20  9:39   ` Paul Mackerras
2015-03-20  9:39 ` [PATCH 15/23] KVM: PPC: Book3S HV: Minor cleanups Paul Mackerras
2015-03-20  9:39   ` Paul Mackerras
2015-03-20  9:39 ` [PATCH 16/23] KVM: PPC: Book3S HV: Move vcore preemption point up into kvmppc_run_vcpu Paul Mackerras
2015-03-20  9:39   ` Paul Mackerras
2015-03-20  9:39 ` [PATCH 17/23] KVM: PPC: Book3S HV: Get rid of vcore nap_count and n_woken Paul Mackerras
2015-03-20  9:39   ` Paul Mackerras
2015-03-20  9:39 ` [PATCH 18/23] KVM: PPC: Book3S HV: Don't wake thread with no vcpu on guest IPI Paul Mackerras
2015-03-20  9:39   ` Paul Mackerras
2015-03-20  9:39 ` [PATCH 19/23] KVM: PPC: Book3S HV: Use decrementer to wake napping threads Paul Mackerras
2015-03-20  9:39   ` Paul Mackerras
2015-03-20  9:39 ` [PATCH 20/23] KVM: PPC: Book3S HV: Use msgsnd for signalling threads on POWER8 Paul Mackerras
2015-03-20  9:39   ` Paul Mackerras
2015-03-20 11:28   ` Alexander Graf
2015-03-20 11:28     ` Alexander Graf
2015-03-23  0:44     ` Paul Mackerras
2015-03-23  0:44       ` Paul Mackerras
2015-03-20  9:39 ` [PATCH 21/23] KVM: PPC: Book3S HV: Streamline guest entry and exit Paul Mackerras
2015-03-20  9:39   ` Paul Mackerras
2015-03-20  9:39 ` [PATCH 22/23] KVM: PPC: Book3S HV: Use bitmap of active threads rather than count Paul Mackerras
2015-03-20  9:39   ` Paul Mackerras
2015-03-20  9:40 ` [PATCH 23/23] KVM: PPC: Book3S HV: Translate kvmhv_commence_exit to C Paul Mackerras
2015-03-20  9:40   ` Paul Mackerras
2015-03-20 10:45 ` [PATCH 00/23] Bug fixes and improvements for HV KVM Alexander Graf
2015-03-20 10:45   ` Alexander Graf
2015-03-20 11:36 ` Alexander Graf
2015-03-20 11:36   ` Alexander Graf
