* [PATCH v2 0/6] Implement split core for POWER8
@ 2014-05-23  8:15 ` Michael Neuling
  0 siblings, 0 replies; 39+ messages in thread
From: Michael Neuling @ 2014-05-23  8:15 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Alexander Graf
  Cc: Paul Mackerras, linuxppc-dev, kvm, kvm-ppc, Michael Ellerman

This patch series implements split core mode on POWER8.  This enables up to 4
subcores per core, each of which can independently run guests (per-guest SPRs
like SDR1, LPIDR etc. are replicated per subcore).  There is much more
documentation on this feature in the code and commit messages.

Most of this code is in the powernv platform, but there are a couple of KVM
specific patches too.

The patch series was authored by mpe and me, with a few bug fixes from others.

v2:
  There are some minor updates based on review comments, and I've added the
  Acks from Paulus and Alex for the KVM code.


* [PATCH v2 1/6] KVM: PPC: Book3S HV: Rework the secondary inhibit code
  2014-05-23  8:15 ` Michael Neuling
@ 2014-05-23  8:15   ` Michael Neuling
  -1 siblings, 0 replies; 39+ messages in thread
From: Michael Neuling @ 2014-05-23  8:15 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Alexander Graf
  Cc: Paul Mackerras, linuxppc-dev, kvm, kvm-ppc, Michael Ellerman,
	Michael Neuling

From: Michael Ellerman <mpe@ellerman.id.au>

As part of the support for split core on POWER8, we want to be able to
block splitting of the core while KVM VMs are active.

The logic to do that would be exactly the same as the code we currently
have for inhibiting onlining of secondaries.

Instead of adding an identical mechanism to block split core, rework the
secondary inhibit code to be an "HV KVM is active" check. We can then use
that in both the cpu hotplug code and the upcoming split core code.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Alexander Graf <agraf@suse.de>
Acked-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/kvm_ppc.h   |  7 +++++++
 arch/powerpc/include/asm/smp.h       |  8 --------
 arch/powerpc/kernel/smp.c            | 34 +++-------------------------------
 arch/powerpc/kvm/book3s_hv.c         |  8 ++++----
 arch/powerpc/kvm/book3s_hv_builtin.c | 31 +++++++++++++++++++++++++++++++
 5 files changed, 45 insertions(+), 43 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 4096f16..2c8e399 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -337,6 +337,10 @@ static inline void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu)
 	vcpu->kvm->arch.kvm_ops->fast_vcpu_kick(vcpu);
 }
 
+extern void kvm_hv_vm_activated(void);
+extern void kvm_hv_vm_deactivated(void);
+extern bool kvm_hv_mode_active(void);
+
 #else
 static inline void __init kvm_cma_reserve(void)
 {}
@@ -356,6 +360,9 @@ static inline void kvmppc_fast_vcpu_kick(struct kvm_vcpu *vcpu)
 {
 	kvm_vcpu_kick(vcpu);
 }
+
+static inline bool kvm_hv_mode_active(void)		{ return false; }
+
 #endif
 
 #ifdef CONFIG_KVM_XICS
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index ff51046..5a6614a 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -68,14 +68,6 @@ void generic_mach_cpu_die(void);
 void generic_set_cpu_dead(unsigned int cpu);
 void generic_set_cpu_up(unsigned int cpu);
 int generic_check_cpu_restart(unsigned int cpu);
-
-extern void inhibit_secondary_onlining(void);
-extern void uninhibit_secondary_onlining(void);
-
-#else /* HOTPLUG_CPU */
-static inline void inhibit_secondary_onlining(void) {}
-static inline void uninhibit_secondary_onlining(void) {}
-
 #endif
 
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index e2a4232..6edae3d 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -36,6 +36,7 @@
 #include <linux/atomic.h>
 #include <asm/irq.h>
 #include <asm/hw_irq.h>
+#include <asm/kvm_ppc.h>
 #include <asm/page.h>
 #include <asm/pgtable.h>
 #include <asm/prom.h>
@@ -457,38 +458,9 @@ int generic_check_cpu_restart(unsigned int cpu)
 	return per_cpu(cpu_state, cpu) == CPU_UP_PREPARE;
 }
 
-static atomic_t secondary_inhibit_count;
-
-/*
- * Don't allow secondary CPU threads to come online
- */
-void inhibit_secondary_onlining(void)
-{
-	/*
-	 * This makes secondary_inhibit_count stable during cpu
-	 * online/offline operations.
-	 */
-	get_online_cpus();
-
-	atomic_inc(&secondary_inhibit_count);
-	put_online_cpus();
-}
-EXPORT_SYMBOL_GPL(inhibit_secondary_onlining);
-
-/*
- * Allow secondary CPU threads to come online again
- */
-void uninhibit_secondary_onlining(void)
-{
-	get_online_cpus();
-	atomic_dec(&secondary_inhibit_count);
-	put_online_cpus();
-}
-EXPORT_SYMBOL_GPL(uninhibit_secondary_onlining);
-
-static int secondaries_inhibited(void)
+static bool secondaries_inhibited(void)
 {
-	return atomic_read(&secondary_inhibit_count);
+	return kvm_hv_mode_active();
 }
 
 #else /* HOTPLUG_CPU */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8227dba..d7b74f8 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -2317,10 +2317,10 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 	spin_lock_init(&kvm->arch.slot_phys_lock);
 
 	/*
-	 * Don't allow secondary CPU threads to come online
-	 * while any KVM VMs exist.
+	 * Track that we now have a HV mode VM active. This blocks secondary
+	 * CPU threads from coming online.
 	 */
-	inhibit_secondary_onlining();
+	kvm_hv_vm_activated();
 
 	return 0;
 }
@@ -2336,7 +2336,7 @@ static void kvmppc_free_vcores(struct kvm *kvm)
 
 static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)
 {
-	uninhibit_secondary_onlining();
+	kvm_hv_vm_deactivated();
 
 	kvmppc_free_vcores(kvm);
 	if (kvm->arch.rma) {
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 8cd0dae..7cde8a6 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -6,6 +6,7 @@
  * published by the Free Software Foundation.
  */
 
+#include <linux/cpu.h>
 #include <linux/kvm_host.h>
 #include <linux/preempt.h>
 #include <linux/export.h>
@@ -181,3 +182,33 @@ void __init kvm_cma_reserve(void)
 		kvm_cma_declare_contiguous(selected_size, align_size);
 	}
 }
+
+/*
+ * When running HV mode KVM we need to block certain operations while KVM VMs
+ * exist in the system. We use a counter of VMs to track this.
+ *
+ * One of the operations we need to block is onlining of secondaries, so we
+ * protect hv_vm_count with get/put_online_cpus().
+ */
+static atomic_t hv_vm_count;
+
+void kvm_hv_vm_activated(void)
+{
+	get_online_cpus();
+	atomic_inc(&hv_vm_count);
+	put_online_cpus();
+}
+EXPORT_SYMBOL_GPL(kvm_hv_vm_activated);
+
+void kvm_hv_vm_deactivated(void)
+{
+	get_online_cpus();
+	atomic_dec(&hv_vm_count);
+	put_online_cpus();
+}
+EXPORT_SYMBOL_GPL(kvm_hv_vm_deactivated);
+
+bool kvm_hv_mode_active(void)
+{
+	return atomic_read(&hv_vm_count) != 0;
+}
-- 
1.9.1


* [PATCH v2 2/6] powerpc/powernv: Make it possible to skip the IRQHAPPENED check in power7_nap()
  2014-05-23  8:15 ` Michael Neuling
@ 2014-05-23  8:15   ` Michael Neuling
  -1 siblings, 0 replies; 39+ messages in thread
From: Michael Neuling @ 2014-05-23  8:15 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Alexander Graf
  Cc: Paul Mackerras, linuxppc-dev, kvm, kvm-ppc, Michael Ellerman,
	Michael Neuling

From: Michael Ellerman <mpe@ellerman.id.au>

To support split core we need to be able to force all secondaries into
nap, so the core can detect they are idle and do an unsplit.

Currently power7_nap() will return without napping if there is an irq
pending. We want to ignore the pending irq and nap anyway; we will deal
with the interrupt later.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/include/asm/processor.h | 2 +-
 arch/powerpc/kernel/idle_power7.S    | 9 +++++++++
 arch/powerpc/platforms/powernv/smp.c | 2 +-
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index d660dc3..6d59072 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -449,7 +449,7 @@ extern unsigned long cpuidle_disable;
 enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
 
 extern int powersave_nap;	/* set if nap mode can be used in idle loop */
-extern void power7_nap(void);
+extern void power7_nap(int check_irq);
 extern void power7_sleep(void);
 extern void flush_instruction_cache(void);
 extern void hard_reset_now(void);
diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S
index dca6e16..2480256 100644
--- a/arch/powerpc/kernel/idle_power7.S
+++ b/arch/powerpc/kernel/idle_power7.S
@@ -39,6 +39,10 @@
  * Pass requested state in r3:
  * 	0 - nap
  * 	1 - sleep
+ *
+ * To check IRQ_HAPPENED in r4
+ * 	0 - don't check
+ * 	1 - check
  */
 _GLOBAL(power7_powersave_common)
 	/* Use r3 to pass state nap/sleep/winkle */
@@ -71,6 +75,8 @@ _GLOBAL(power7_powersave_common)
 	lbz	r0,PACAIRQHAPPENED(r13)
 	cmpwi	cr0,r0,0
 	beq	1f
+	cmpwi	cr0,r4,0
+	beq	1f
 	addi	r1,r1,INT_FRAME_SIZE
 	ld	r0,16(r1)
 	mtlr	r0
@@ -114,15 +120,18 @@ _GLOBAL(power7_idle)
 	lwz	r4,ADDROFF(powersave_nap)(r3)
 	cmpwi	0,r4,0
 	beqlr
+	li	r3, 1
 	/* fall through */
 
 _GLOBAL(power7_nap)
+	mr	r4,r3
 	li	r3,0
 	b	power7_powersave_common
 	/* No return */
 
 _GLOBAL(power7_sleep)
 	li	r3,1
+	li	r4,0
 	b	power7_powersave_common
 	/* No return */
 
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index 1601a1e..65faf99 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -159,7 +159,7 @@ static void pnv_smp_cpu_kill_self(void)
 	mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~(u64)LPCR_PECE1);
 	while (!generic_check_cpu_restart(cpu)) {
 		ppc64_runlatch_off();
-		power7_nap();
+		power7_nap(1);
 		ppc64_runlatch_on();
 		if (!generic_check_cpu_restart(cpu)) {
 			DBG("CPU%d Unexpected exit while offline !\n", cpu);
-- 
1.9.1



* [PATCH v2 3/6] powerpc: Add threads_per_subcore
  2014-05-23  8:15 ` Michael Neuling
@ 2014-05-23  8:15   ` Michael Neuling
  -1 siblings, 0 replies; 39+ messages in thread
From: Michael Neuling @ 2014-05-23  8:15 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Alexander Graf
  Cc: Paul Mackerras, linuxppc-dev, kvm, kvm-ppc, Michael Ellerman,
	Michael Neuling

From: Michael Ellerman <mpe@ellerman.id.au>

On POWER8 we have a new concept of a subcore. This is what happens when
you take a regular core and split it. A subcore is a grouping of two or
four SMT threads, as well as a handful of SPRs, which allows the subcore
to appear as if it were a core from the point of view of a guest.

Unlike threads_per_core which is fixed at boot, threads_per_subcore can
change while the system is running. Most code will not want to use
threads_per_subcore.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/include/asm/cputhreads.h | 7 +++++++
 arch/powerpc/kernel/setup-common.c    | 4 +++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/cputhreads.h b/arch/powerpc/include/asm/cputhreads.h
index ac3eedb..2bf8e93 100644
--- a/arch/powerpc/include/asm/cputhreads.h
+++ b/arch/powerpc/include/asm/cputhreads.h
@@ -18,10 +18,12 @@
 
 #ifdef CONFIG_SMP
 extern int threads_per_core;
+extern int threads_per_subcore;
 extern int threads_shift;
 extern cpumask_t threads_core_mask;
 #else
 #define threads_per_core	1
+#define threads_per_subcore	1
 #define threads_shift		0
 #define threads_core_mask	(CPU_MASK_CPU0)
 #endif
@@ -74,6 +76,11 @@ static inline int cpu_thread_in_core(int cpu)
 	return cpu & (threads_per_core - 1);
 }
 
+static inline int cpu_thread_in_subcore(int cpu)
+{
+	return cpu & (threads_per_subcore - 1);
+}
+
 static inline int cpu_first_thread_sibling(int cpu)
 {
 	return cpu & ~(threads_per_core - 1);
diff --git a/arch/powerpc/kernel/setup-common.c b/arch/powerpc/kernel/setup-common.c
index 3cf25c8..aa0f5ed 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -390,9 +390,10 @@ void __init check_for_initrd(void)
 
 #ifdef CONFIG_SMP
 
-int threads_per_core, threads_shift;
+int threads_per_core, threads_per_subcore, threads_shift;
 cpumask_t threads_core_mask;
 EXPORT_SYMBOL_GPL(threads_per_core);
+EXPORT_SYMBOL_GPL(threads_per_subcore);
 EXPORT_SYMBOL_GPL(threads_shift);
 EXPORT_SYMBOL_GPL(threads_core_mask);
 
@@ -401,6 +402,7 @@ static void __init cpu_init_thread_core_maps(int tpc)
 	int i;
 
 	threads_per_core = tpc;
+	threads_per_subcore = tpc;
 	cpumask_clear(&threads_core_mask);
 
 	/* This implementation only supports power of 2 number of threads
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 4/6] powerpc: Check cpu_thread_in_subcore() in __cpu_up()
  2014-05-23  8:15 ` Michael Neuling
  (?)
@ 2014-05-23  8:15   ` Michael Neuling
  -1 siblings, 0 replies; 39+ messages in thread
From: Michael Neuling @ 2014-05-23  8:15 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Alexander Graf
  Cc: Paul Mackerras, linuxppc-dev, kvm, kvm-ppc, Michael Ellerman,
	Michael Neuling

From: Michael Ellerman <mpe@ellerman.id.au>

To support split core we need to change the check in __cpu_up() that
determines if a cpu is allowed to come online.

Currently we refuse to online cpus which are not the primary thread
within their core.

On POWER8 with split core support this check needs to instead refuse to
online cpus which are not the primary thread within their *sub* core.

On POWER7 and other systems that do not support split core,
threads_per_subcore == threads_per_core and so the check is equivalent.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Neuling <mikey@neuling.org>
---
 arch/powerpc/kernel/smp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 6edae3d..b5222c4 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -489,7 +489,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *tidle)
 	 * Don't allow secondary threads to come online if inhibited
 	 */
 	if (threads_per_core > 1 && secondaries_inhibited() &&
-	    cpu % threads_per_core != 0)
+	    cpu_thread_in_subcore(cpu))
 		return -EBUSY;
 
 	if (smp_ops == NULL ||
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 5/6] KVM: PPC: Book3S HV: Use threads_per_subcore in KVM
  2014-05-23  8:15 ` Michael Neuling
  (?)
@ 2014-05-23  8:15   ` Michael Neuling
  -1 siblings, 0 replies; 39+ messages in thread
From: Michael Neuling @ 2014-05-23  8:15 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Alexander Graf
  Cc: Paul Mackerras, linuxppc-dev, kvm, kvm-ppc, Michael Ellerman,
	Michael Neuling

From: Michael Ellerman <mpe@ellerman.id.au>

To support split core on POWER8 we need to modify various parts of the
KVM code to use threads_per_subcore instead of threads_per_core. On
systems that do not support split core, threads_per_subcore ==
threads_per_core and these changes are a no-op.

We use threads_per_subcore as the value reported by KVM_CAP_PPC_SMT.
This communicates to userspace that guests can only be created with
a value of threads_per_core that is less than or equal to the current
threads_per_subcore. This ensures that guests can only be created with a
thread configuration that we are able to run given the current split
core mode.

Although threads_per_subcore can change during the life of the system,
the commit that enables that will ensure that threads_per_subcore does
not change during the life of a KVM VM.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Acked-by: Alexander Graf <agraf@suse.de>
Acked-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_hv.c | 26 ++++++++++++++++----------
 arch/powerpc/kvm/powerpc.c   |  2 +-
 2 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index d7b74f8..5e86f28 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1266,7 +1266,7 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
 	int core;
 	struct kvmppc_vcore *vcore;
 
-	core = id / threads_per_core;
+	core = id / threads_per_subcore;
 	if (core >= KVM_MAX_VCORES)
 		goto out;
 
@@ -1305,7 +1305,7 @@ static struct kvm_vcpu *kvmppc_core_vcpu_create_hv(struct kvm *kvm,
 			init_waitqueue_head(&vcore->wq);
 			vcore->preempt_tb = TB_NIL;
 			vcore->lpcr = kvm->arch.lpcr;
-			vcore->first_vcpuid = core * threads_per_core;
+			vcore->first_vcpuid = core * threads_per_subcore;
 			vcore->kvm = kvm;
 		}
 		kvm->arch.vcores[core] = vcore;
@@ -1495,16 +1495,19 @@ static void kvmppc_wait_for_nap(struct kvmppc_vcore *vc)
 static int on_primary_thread(void)
 {
 	int cpu = smp_processor_id();
-	int thr = cpu_thread_in_core(cpu);
+	int thr;
 
-	if (thr)
+	/* Are we on a primary subcore? */
+	if (cpu_thread_in_subcore(cpu))
 		return 0;
-	while (++thr < threads_per_core)
+
+	thr = 0;
+	while (++thr < threads_per_subcore)
 		if (cpu_online(cpu + thr))
 			return 0;
 
 	/* Grab all hw threads so they can't go into the kernel */
-	for (thr = 1; thr < threads_per_core; ++thr) {
+	for (thr = 1; thr < threads_per_subcore; ++thr) {
 		if (kvmppc_grab_hwthread(cpu + thr)) {
 			/* Couldn't grab one; let the others go */
 			do {
@@ -1563,15 +1566,18 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 	}
 
 	/*
-	 * Make sure we are running on thread 0, and that
-	 * secondary threads are offline.
+	 * Make sure we are running on primary threads, and that secondary
+	 * threads are offline.  Also check if the number of threads in this
+	 * guest is greater than the current system threads per guest.
 	 */
-	if (threads_per_core > 1 && !on_primary_thread()) {
+	if ((threads_per_core > 1) &&
+	    ((vc->num_threads > threads_per_subcore) || !on_primary_thread())) {
 		list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
 			vcpu->arch.ret = -EBUSY;
 		goto out;
 	}
 
+
 	vc->pcpu = smp_processor_id();
 	list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
 		kvmppc_start_thread(vcpu);
@@ -1599,7 +1605,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
 	/* wait for secondary threads to finish writing their state to memory */
 	if (vc->nap_count < vc->n_woken)
 		kvmppc_wait_for_nap(vc);
-	for (i = 0; i < threads_per_core; ++i)
+	for (i = 0; i < threads_per_subcore; ++i)
 		kvmppc_release_hwthread(vc->pcpu + i);
 	/* prevent other vcpu threads from doing kvmppc_start_thread() now */
 	vc->vcore_state = VCORE_EXITING;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 3cf541a..27919a8 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -384,7 +384,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 	case KVM_CAP_PPC_SMT:
 		if (hv_enabled)
-			r = threads_per_core;
+			r = threads_per_subcore;
 		else
 			r = 0;
 		break;
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 6/6] powerpc/powernv: Add support for POWER8 split core on powernv
  2014-05-23  8:15 ` Michael Neuling
  (?)
@ 2014-05-23  8:15   ` Michael Neuling
  -1 siblings, 0 replies; 39+ messages in thread
From: Michael Neuling @ 2014-05-23  8:15 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Alexander Graf
  Cc: Paul Mackerras, linuxppc-dev, kvm, kvm-ppc, Michael Ellerman,
	Michael Neuling, Srivatsa S. Bhat, Mahesh Salgaonkar

From: Michael Ellerman <mpe@ellerman.id.au>

Upcoming POWER8 chips support a concept called split core. This is where the
core can be split into subcores that, although not full cores, are able to
appear as full cores to a guest.

The splitting & unsplitting procedure is mildly complicated, and explained at
length in the comments within the patch.

One notable detail is that when splitting or unsplitting we need to pull
offline cpus out of their offline state to do work as part of the procedure.

The interface for changing the split mode is via a sysfs file, eg:

 $ echo 2 > /sys/devices/system/cpu/subcores_per_core

Currently supported values are '1', '2' and '4', indicating respectively that
the core should be unsplit, split in half, or split in quarters. These modes
correspond to threads_per_subcore values of 8, 4 and 2.

We do not allow changing the split mode while KVM VMs are active. This is to
prevent the value changing while userspace is configuring the VM, and also to
prevent the mode being changed in such a way that existing guests are unable to
be run.

CPU hotplug fixes by Srivatsa.  max_cpus fixes by Mahesh.  cpuset fixes by
benh.  Fix for irq race by paulus.  The rest by mikey and mpe.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/include/asm/reg.h               |   9 +
 arch/powerpc/platforms/powernv/Makefile      |   2 +-
 arch/powerpc/platforms/powernv/powernv.h     |   2 +
 arch/powerpc/platforms/powernv/smp.c         |  18 +-
 arch/powerpc/platforms/powernv/subcore-asm.S |  95 +++++++
 arch/powerpc/platforms/powernv/subcore.c     | 392 +++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/subcore.h     |  18 ++
 7 files changed, 527 insertions(+), 9 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/subcore-asm.S
 create mode 100644 arch/powerpc/platforms/powernv/subcore.c
 create mode 100644 arch/powerpc/platforms/powernv/subcore.h

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 29de015..2cd799b 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -225,6 +225,7 @@
 #define   CTRL_TE	0x00c00000	/* thread enable */
 #define   CTRL_RUNLATCH	0x1
 #define SPRN_DAWR	0xB4
+#define SPRN_RPR	0xBA	/* Relative Priority Register */
 #define SPRN_CIABR	0xBB
 #define   CIABR_PRIV		0x3
 #define   CIABR_PRIV_USER	1
@@ -273,8 +274,10 @@
 #define SPRN_HSRR1	0x13B	/* Hypervisor Save/Restore 1 */
 #define SPRN_IC		0x350	/* Virtual Instruction Count */
 #define SPRN_VTB	0x351	/* Virtual Time Base */
+#define SPRN_LDBAR	0x352	/* LD Base Address Register */
 #define SPRN_PMICR	0x354   /* Power Management Idle Control Reg */
 #define SPRN_PMSR	0x355   /* Power Management Status Reg */
+#define SPRN_PMMAR	0x356	/* Power Management Memory Activity Register */
 #define SPRN_PMCR	0x374	/* Power Management Control Register */
 
 /* HFSCR and FSCR bit numbers are the same */
@@ -434,6 +437,12 @@
 #define HID0_BTCD	(1<<1)		/* Branch target cache disable */
 #define HID0_NOPDST	(1<<1)		/* No-op dst, dstt, etc. instr. */
 #define HID0_NOPTI	(1<<0)		/* No-op dcbt and dcbst instr. */
+/* POWER8 HID0 bits */
+#define HID0_POWER8_4LPARMODE	__MASK(61)
+#define HID0_POWER8_2LPARMODE	__MASK(57)
+#define HID0_POWER8_1TO2LPAR	__MASK(52)
+#define HID0_POWER8_1TO4LPAR	__MASK(51)
+#define HID0_POWER8_DYNLPARDIS	__MASK(48)
 
 #define SPRN_HID1	0x3F1		/* Hardware Implementation Register 1 */
 #ifdef CONFIG_6xx
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 63cebb9..4ad0d34 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,7 +1,7 @@
 obj-y			+= setup.o opal-takeover.o opal-wrappers.o opal.o opal-async.o
 obj-y			+= opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y			+= rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
-obj-y			+= opal-msglog.o
+obj-y			+= opal-msglog.o subcore.o subcore-asm.o
 
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o
diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h
index 0051e10..75501bf 100644
--- a/arch/powerpc/platforms/powernv/powernv.h
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -25,4 +25,6 @@ static inline int pnv_pci_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
 
 extern void pnv_lpc_init(void);
 
+bool cpu_core_split_required(void);
+
 #endif /* _POWERNV_H */
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index 65faf99..0062a43 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -161,15 +161,17 @@ static void pnv_smp_cpu_kill_self(void)
 		ppc64_runlatch_off();
 		power7_nap(1);
 		ppc64_runlatch_on();
-		if (!generic_check_cpu_restart(cpu)) {
+
+		/* Reenable IRQs briefly to clear the IPI that woke us */
+		local_irq_enable();
+		local_irq_disable();
+		mb();
+
+		if (cpu_core_split_required())
+			continue;
+
+		if (!generic_check_cpu_restart(cpu))
 			DBG("CPU%d Unexpected exit while offline !\n", cpu);
-			/* We may be getting an IPI, so we re-enable
-			 * interrupts to process it, it will be ignored
-			 * since we aren't online (hopefully)
-			 */
-			local_irq_enable();
-			local_irq_disable();
-		}
 	}
 	mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) | LPCR_PECE1);
 	DBG("CPU%d coming online...\n", cpu);
diff --git a/arch/powerpc/platforms/powernv/subcore-asm.S b/arch/powerpc/platforms/powernv/subcore-asm.S
new file mode 100644
index 0000000..39bb24a
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/subcore-asm.S
@@ -0,0 +1,95 @@
+/*
+ * Copyright 2013, Michael (Ellerman|Neuling), IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <asm/asm-offsets.h>
+#include <asm/ppc_asm.h>
+#include <asm/reg.h>
+
+#include "subcore.h"
+
+
+_GLOBAL(split_core_secondary_loop)
+	/*
+	 * r3 = u8 *state, used throughout the routine
+	 * r4 = temp
+	 * r5 = temp
+	 * ..
+	 * r12 = MSR
+	 */
+	mfmsr	r12
+
+	/* Disable interrupts so SRR0/1 don't get trashed */
+	li	r4,0
+	ori	r4,r4,MSR_EE|MSR_SE|MSR_BE|MSR_RI
+	andc	r4,r12,r4
+	sync
+	mtmsrd	r4
+
+	/* Switch to real mode and leave interrupts off */
+	li	r5, MSR_IR|MSR_DR
+	andc	r5, r4, r5
+
+	LOAD_REG_ADDR(r4, real_mode)
+
+	mtspr	SPRN_SRR0,r4
+	mtspr	SPRN_SRR1,r5
+	rfid
+	b	.	/* prevent speculative execution */
+
+real_mode:
+	/* Grab values from unsplit SPRs */
+	mfspr	r6,  SPRN_LDBAR
+	mfspr	r7,  SPRN_PMMAR
+	mfspr	r8,  SPRN_PMCR
+	mfspr	r9,  SPRN_RPR
+	mfspr	r10, SPRN_SDR1
+
+	/* Order reading the SPRs vs telling the primary we are ready to split */
+	sync
+
+	/* Tell thread 0 we are in real mode */
+	li	r4, SYNC_STEP_REAL_MODE
+	stb	r4, 0(r3)
+
+	li	r5, (HID0_POWER8_4LPARMODE | HID0_POWER8_2LPARMODE)@highest
+	sldi	r5, r5, 48
+
+	/* Loop until we see the split happen in HID0 */
+1:	mfspr	r4, SPRN_HID0
+	and.	r4, r4, r5
+	beq	1b
+
+	/*
+	 * We only need to initialise the below regs once for each subcore,
+	 * but it's simpler and harmless to do it on each thread.
+	 */
+
+	/* Make sure various SPRs have sane values */
+	li	r4, 0
+	mtspr	SPRN_LPID, r4
+	mtspr	SPRN_PCR, r4
+	mtspr	SPRN_HDEC, r4
+
+	/* Restore SPR values now we are split */
+	mtspr	SPRN_LDBAR, r6
+	mtspr	SPRN_PMMAR, r7
+	mtspr	SPRN_PMCR, r8
+	mtspr	SPRN_RPR, r9
+	mtspr	SPRN_SDR1, r10
+
+	LOAD_REG_ADDR(r5, virtual_mode)
+
+	/* Get out of real mode */
+	mtspr	SPRN_SRR0,r5
+	mtspr	SPRN_SRR1,r12
+	rfid
+	b	.	/* prevent speculative execution */
+
+virtual_mode:
+	blr
diff --git a/arch/powerpc/platforms/powernv/subcore.c b/arch/powerpc/platforms/powernv/subcore.c
new file mode 100644
index 0000000..894ecb3
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/subcore.c
@@ -0,0 +1,392 @@
+/*
+ * Copyright 2013, Michael (Ellerman|Neuling), IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt)	"powernv: " fmt
+
+#include <linux/kernel.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/device.h>
+#include <linux/gfp.h>
+#include <linux/smp.h>
+#include <linux/stop_machine.h>
+
+#include <asm/cputhreads.h>
+#include <asm/kvm_ppc.h>
+#include <asm/machdep.h>
+#include <asm/opal.h>
+#include <asm/smp.h>
+
+#include "subcore.h"
+
+
+/*
+ * Split/unsplit procedure:
+ *
+ * A core can be in one of three states: unsplit, 2-way split, or 4-way split.
+ *
+ * The mapping to subcores_per_core is simple:
+ *
+ *  State       | subcores_per_core
+ *  ------------|------------------
+ *  Unsplit     |        1
+ *  2-way split |        2
+ *  4-way split |        4
+ *
+ * The core is split along thread boundaries; the mapping between subcores and
+ * threads is as follows:
+ *
+ *  Unsplit:
+ *          ----------------------------
+ *  Subcore |            0             |
+ *          ----------------------------
+ *  Thread  |  0  1  2  3  4  5  6  7  |
+ *          ----------------------------
+ *
+ *  2-way split:
+ *          -------------------------------------
+ *  Subcore |        0        |        1        |
+ *          -------------------------------------
+ *  Thread  |  0   1   2   3  |  4   5   6   7  |
+ *          -------------------------------------
+ *
+ *  4-way split:
+ *          -----------------------------------------
+ *  Subcore |    0    |    1    |    2    |    3    |
+ *          -----------------------------------------
+ *  Thread  |  0   1  |  2   3  |  4   5  |  6   7  |
+ *          -----------------------------------------
+ *
+ *
+ * Transitions
+ * -----------
+ *
+ * It is not possible to transition directly between the two split states; the
+ * core must first be unsplit. The legal transitions are:
+ *
+ *  -----------          ---------------
+ *  |         |  <---->  | 2-way split |
+ *  |         |          ---------------
+ *  | Unsplit |
+ *  |         |          ---------------
+ *  |         |  <---->  | 4-way split |
+ *  -----------          ---------------
+ *
+ * Unsplitting
+ * -----------
+ *
+ * Unsplitting is the simpler procedure. It requires thread 0 to request the
+ * unsplit while all other threads NAP.
+ *
+ * Thread 0 clears HID0_POWER8_DYNLPARDIS (Dynamic LPAR Disable). This tells
+ * the hardware that if all threads except 0 are napping, the hardware should
+ * unsplit the core.
+ *
+ * Non-zero threads are sent to a NAP loop; they don't exit the loop until they
+ * see the core unsplit.
+ *
+ * Thread 0 spins waiting for the hardware to see all the other threads napping
+ * and perform the unsplit.
+ *
+ * Once thread 0 sees the unsplit, it IPIs the secondary threads to wake them
+ * out of NAP. They will then see the core unsplit and exit the NAP loop.
+ *
+ * Splitting
+ * ---------
+ *
+ * The basic splitting procedure is fairly straightforward. However, it is
+ * complicated by the fact that after the split occurs, the newly created
+ * subcores are not in a fully initialised state.
+ *
+ * Most notably the subcores do not have the correct value for SDR1, which
+ * means they must not be running in virtual mode when the split occurs. The
+ * subcores have separate timebase SPRs, but these are pre-synchronised by
+ * OPAL.
+ *
+ * To begin with secondary threads are sent to an assembly routine. There they
+ * switch to real mode, so they are immune to the uninitialised SDR1 value.
+ * Once in real mode they indicate that they are in real mode, and spin waiting
+ * to see the core split.
+ *
+ * Thread 0 waits to see that all secondaries are in real mode, and then begins
+ * the splitting procedure. It first sets HID0_POWER8_DYNLPARDIS, which
+ * prevents the hardware from unsplitting. Then it sets the appropriate HID bit
+ * to request the split, and spins waiting to see that the split has happened.
+ *
+ * Concurrently the secondaries will notice the split. When they do they set up
+ * their SPRs, notably SDR1, and then they can return to virtual mode and exit
+ * the procedure.
+ */
+
+/* Initialised at boot by subcore_init() */
+static int subcores_per_core;
+
+/*
+ * Used to communicate to offline cpus that we want them to pop out of the
+ * offline loop and do a split or unsplit.
+ *
+ * 0 - no split happening
+ * 1 - unsplit in progress
+ * 2 - split to 2 in progress
+ * 4 - split to 4 in progress
+ */
+static int new_split_mode;
+
+static cpumask_var_t cpu_offline_mask;
+
+struct split_state {
+	u8 step;
+	u8 master;
+};
+
+static DEFINE_PER_CPU(struct split_state, split_state);
+
+static void wait_for_sync_step(int step)
+{
+	int i, cpu = smp_processor_id();
+
+	for (i = cpu + 1; i < cpu + threads_per_core; i++)
+		while (per_cpu(split_state, i).step < step)
+			barrier();
+
+	/* Order the wait loop vs any subsequent loads/stores. */
+	mb();
+}
+
+static void unsplit_core(void)
+{
+	u64 hid0, mask;
+	int i, cpu;
+
+	mask = HID0_POWER8_2LPARMODE | HID0_POWER8_4LPARMODE;
+
+	cpu = smp_processor_id();
+	if (cpu_thread_in_core(cpu) != 0) {
+		while (mfspr(SPRN_HID0) & mask)
+			power7_nap(0);
+
+		per_cpu(split_state, cpu).step = SYNC_STEP_UNSPLIT;
+		return;
+	}
+
+	hid0 = mfspr(SPRN_HID0);
+	hid0 &= ~HID0_POWER8_DYNLPARDIS;
+	mtspr(SPRN_HID0, hid0);
+
+	while (mfspr(SPRN_HID0) & mask)
+		cpu_relax();
+
+	/* Wake secondaries out of NAP */
+	for (i = cpu + 1; i < cpu + threads_per_core; i++)
+		smp_send_reschedule(i);
+
+	wait_for_sync_step(SYNC_STEP_UNSPLIT);
+}
+
+static void split_core(int new_mode)
+{
+	struct {  u64 value; u64 mask; } split_parms[2] = {
+		{ HID0_POWER8_1TO2LPAR, HID0_POWER8_2LPARMODE },
+		{ HID0_POWER8_1TO4LPAR, HID0_POWER8_4LPARMODE }
+	};
+	int i, cpu;
+	u64 hid0;
+
+	/* Convert new_mode (2 or 4) into an index into our parms array */
+	i = (new_mode >> 1) - 1;
+	BUG_ON(i < 0 || i > 1);
+
+	cpu = smp_processor_id();
+	if (cpu_thread_in_core(cpu) != 0) {
+		split_core_secondary_loop(&per_cpu(split_state, cpu).step);
+		return;
+	}
+
+	wait_for_sync_step(SYNC_STEP_REAL_MODE);
+
+	/* Write new mode */
+	hid0  = mfspr(SPRN_HID0);
+	hid0 |= HID0_POWER8_DYNLPARDIS | split_parms[i].value;
+	mtspr(SPRN_HID0, hid0);
+
+	/* Wait for it to happen */
+	while (!(mfspr(SPRN_HID0) & split_parms[i].mask))
+		cpu_relax();
+}
+
+static void cpu_do_split(int new_mode)
+{
+	/*
+	 * At boot subcores_per_core will be 0, so we will always unsplit at
+	 * boot. In the usual case where the core is already unsplit it's a
+	 * nop, and this just ensures the kernel's notion of the mode is
+	 * consistent with the hardware.
+	 */
+	if (subcores_per_core != 1)
+		unsplit_core();
+
+	if (new_mode != 1)
+		split_core(new_mode);
+
+	mb();
+	per_cpu(split_state, smp_processor_id()).step = SYNC_STEP_FINISHED;
+}
+
+bool cpu_core_split_required(void)
+{
+	smp_rmb();
+
+	if (!new_split_mode)
+		return false;
+
+	cpu_do_split(new_split_mode);
+
+	return true;
+}
+
+static int cpu_update_split_mode(void *data)
+{
+	int cpu, new_mode = *(int *)data;
+
+	if (this_cpu_ptr(&split_state)->master) {
+		new_split_mode = new_mode;
+		smp_wmb();
+
+		cpumask_andnot(cpu_offline_mask, cpu_present_mask,
+			       cpu_online_mask);
+
+		/* This should work even though the cpu is offline */
+		for_each_cpu(cpu, cpu_offline_mask)
+			smp_send_reschedule(cpu);
+	}
+
+	cpu_do_split(new_mode);
+
+	if (this_cpu_ptr(&split_state)->master) {
+		/* Wait for all cpus to finish before we touch subcores_per_core */
+		for_each_present_cpu(cpu) {
+			if (cpu >= setup_max_cpus)
+				break;
+
+			while (per_cpu(split_state, cpu).step < SYNC_STEP_FINISHED)
+				barrier();
+		}
+
+		new_split_mode = 0;
+
+		/* Make the new mode public */
+		subcores_per_core = new_mode;
+		threads_per_subcore = threads_per_core / subcores_per_core;
+
+		/* Make sure the new mode is written before we exit */
+		mb();
+	}
+
+	return 0;
+}
+
+static int set_subcores_per_core(int new_mode)
+{
+	struct split_state *state;
+	int cpu;
+
+	if (kvm_hv_mode_active()) {
+		pr_err("Unable to change split core mode while KVM active.\n");
+		return -EBUSY;
+	}
+
+	/*
+	 * We are only called at boot, or from the sysfs write. If that ever
+	 * changes we'll need a lock here.
+	 */
+	BUG_ON(new_mode < 1 || new_mode > 4 || new_mode == 3);
+
+	for_each_present_cpu(cpu) {
+		state = &per_cpu(split_state, cpu);
+		state->step = SYNC_STEP_INITIAL;
+		state->master = 0;
+	}
+
+	get_online_cpus();
+
+	/* This cpu will update the globals before exiting stop machine */
+	this_cpu_ptr(&split_state)->master = 1;
+
+	/* Ensure state is consistent before we call the other cpus */
+	mb();
+
+	stop_machine(cpu_update_split_mode, &new_mode, cpu_online_mask);
+
+	put_online_cpus();
+
+	return 0;
+}
+
+static ssize_t __used store_subcores_per_core(struct device *dev,
+		struct device_attribute *attr, const char *buf,
+		size_t count)
+{
+	unsigned long val;
+	int rc;
+
+	/* We are serialised by the attribute lock */
+
+	rc = sscanf(buf, "%lx", &val);
+	if (rc != 1)
+		return -EINVAL;
+
+	switch (val) {
+	case 1:
+	case 2:
+	case 4:
+		if (subcores_per_core == val)
+			/* Nothing to do */
+			goto out;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	rc = set_subcores_per_core(val);
+	if (rc)
+		return rc;
+
+out:
+	return count;
+}
+
+static ssize_t show_subcores_per_core(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%x\n", subcores_per_core);
+}
+
+static DEVICE_ATTR(subcores_per_core, 0644,
+		show_subcores_per_core, store_subcores_per_core);
+
+static int subcore_init(void)
+{
+	if (!cpu_has_feature(CPU_FTR_ARCH_207S))
+		return 0;
+
+	/*
+	 * We need all threads in a core to be present to split/unsplit, so
+	 * continue only if max_cpus is a multiple of threads_per_core.
+	 */
+	if (setup_max_cpus % threads_per_core)
+		return 0;
+
+	BUG_ON(!alloc_cpumask_var(&cpu_offline_mask, GFP_KERNEL));
+
+	set_subcores_per_core(1);
+
+	return device_create_file(cpu_subsys.dev_root,
+				  &dev_attr_subcores_per_core);
+}
+machine_device_initcall(powernv, subcore_init);
diff --git a/arch/powerpc/platforms/powernv/subcore.h b/arch/powerpc/platforms/powernv/subcore.h
new file mode 100644
index 0000000..148abc9
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/subcore.h
@@ -0,0 +1,18 @@
+/*
+ * Copyright 2013, Michael Ellerman, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+/* These are ordered and tested with <= */
+#define SYNC_STEP_INITIAL	0
+#define SYNC_STEP_UNSPLIT	1	/* Set by secondary when it sees unsplit */
+#define SYNC_STEP_REAL_MODE	2	/* Set by secondary when in real mode  */
+#define SYNC_STEP_FINISHED	3	/* Set by secondary when split/unsplit is done */
+
+#ifndef __ASSEMBLY__
+void split_core_secondary_loop(u8 *state);
+#endif
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 6/6] powerpc/powernv: Add support for POWER8 split core on powernv
@ 2014-05-23  8:15   ` Michael Neuling
  0 siblings, 0 replies; 39+ messages in thread
From: Michael Neuling @ 2014-05-23  8:15 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Alexander Graf
  Cc: Michael Neuling, kvm, kvm-ppc, Paul Mackerras, Srivatsa S. Bhat,
	Mahesh Salgaonkar, linuxppc-dev

From: Michael Ellerman <mpe@ellerman.id.au>

Upcoming POWER8 chips support a concept called split core. This is where the
core can be split into subcores that, although not full cores, are able to
appear as full cores to a guest.

The splitting & unsplitting procedure is mildly complicated, and explained at
length in the comments within the patch.

One notable detail is that when splitting or unsplitting we need to pull
offline cpus out of their offline state to do work as part of the procedure.

The interface for changing the split mode is via a sysfs file, eg:

 $ echo 2 > /sys/devices/system/cpu/subcores_per_core

Currently supported values are '1', '2' and '4', indicating respectively that
the core should be unsplit, split in half, or split in quarters. These modes
correspond to threads_per_subcore values of 8, 4 and 2.

We do not allow changing the split mode while KVM VMs are active. This is to
prevent the value changing while userspace is configuring the VM, and also to
prevent the mode being changed in such a way that existing guests are unable to
be run.

CPU hotplug fixes by Srivatsa.  max_cpus fixes by Mahesh.  cpuset fixes by
benh.  Fix for irq race by paulus.  The rest by mikey and mpe.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/include/asm/reg.h               |   9 +
 arch/powerpc/platforms/powernv/Makefile      |   2 +-
 arch/powerpc/platforms/powernv/powernv.h     |   2 +
 arch/powerpc/platforms/powernv/smp.c         |  18 +-
 arch/powerpc/platforms/powernv/subcore-asm.S |  95 +++++++
 arch/powerpc/platforms/powernv/subcore.c     | 392 +++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/subcore.h     |  18 ++
 7 files changed, 527 insertions(+), 9 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/subcore-asm.S
 create mode 100644 arch/powerpc/platforms/powernv/subcore.c
 create mode 100644 arch/powerpc/platforms/powernv/subcore.h

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 29de015..2cd799b 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -225,6 +225,7 @@
 #define   CTRL_TE	0x00c00000	/* thread enable */
 #define   CTRL_RUNLATCH	0x1
 #define SPRN_DAWR	0xB4
+#define SPRN_RPR	0xBA	/* Relative Priority Register */
 #define SPRN_CIABR	0xBB
 #define   CIABR_PRIV		0x3
 #define   CIABR_PRIV_USER	1
@@ -273,8 +274,10 @@
 #define SPRN_HSRR1	0x13B	/* Hypervisor Save/Restore 1 */
 #define SPRN_IC		0x350	/* Virtual Instruction Count */
 #define SPRN_VTB	0x351	/* Virtual Time Base */
+#define SPRN_LDBAR	0x352	/* LD Base Address Register */
 #define SPRN_PMICR	0x354   /* Power Management Idle Control Reg */
 #define SPRN_PMSR	0x355   /* Power Management Status Reg */
+#define SPRN_PMMAR	0x356	/* Power Management Memory Activity Register */
 #define SPRN_PMCR	0x374	/* Power Management Control Register */
 
+#define SYNC_STEP_REAL_MODE	2	/* Set by secondary when in real mode  */
+#define SYNC_STEP_FINISHED	3	/* Set by secondary when split/unsplit is done */
+
+#ifndef __ASSEMBLY__
+void split_core_secondary_loop(u8 *state);
+#endif
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH v2 6/6] powerpc/powernv: Add support for POWER8 split core on powernv
@ 2014-05-23  8:15   ` Michael Neuling
  0 siblings, 0 replies; 39+ messages in thread
From: Michael Neuling @ 2014-05-23  8:15 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Alexander Graf
  Cc: Paul Mackerras, linuxppc-dev, kvm, kvm-ppc, Michael Ellerman,
	Michael Neuling, Srivatsa S. Bhat, Mahesh Salgaonkar

From: Michael Ellerman <mpe@ellerman.id.au>

Upcoming POWER8 chips support a concept called split core. This is where the
core can be split into subcores that, although not full cores, are able to
appear as full cores to a guest.

The splitting & unsplitting procedure is mildly complicated, and explained at
length in the comments within the patch.

One notable detail is that when splitting or unsplitting we need to pull
offline cpus out of their offline state to do work as part of the procedure.

The interface for changing the split mode is via a sysfs file, eg:

 $ echo 2 > /sys/devices/system/cpu/subcores_per_core

Currently supported values are '1', '2' and '4', indicating respectively that
the core should be unsplit, split in half, and split in quarters. These modes
correspond to threads_per_subcore values of 8, 4 and 2.

We do not allow changing the split mode while KVM VMs are active. This is to
prevent the value changing while userspace is configuring the VM, and also to
prevent the mode being changed in such a way that existing guests are unable to
be run.

CPU hotplug fixes by Srivatsa.  max_cpus fixes by Mahesh.  cpuset fixes by
benh.  Fix for irq race by paulus.  The rest by mikey and mpe.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
 arch/powerpc/include/asm/reg.h               |   9 +
 arch/powerpc/platforms/powernv/Makefile      |   2 +-
 arch/powerpc/platforms/powernv/powernv.h     |   2 +
 arch/powerpc/platforms/powernv/smp.c         |  18 +-
 arch/powerpc/platforms/powernv/subcore-asm.S |  95 +++++++
 arch/powerpc/platforms/powernv/subcore.c     | 392 +++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/subcore.h     |  18 ++
 7 files changed, 527 insertions(+), 9 deletions(-)
 create mode 100644 arch/powerpc/platforms/powernv/subcore-asm.S
 create mode 100644 arch/powerpc/platforms/powernv/subcore.c
 create mode 100644 arch/powerpc/platforms/powernv/subcore.h

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index 29de015..2cd799b 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -225,6 +225,7 @@
 #define   CTRL_TE	0x00c00000	/* thread enable */
 #define   CTRL_RUNLATCH	0x1
 #define SPRN_DAWR	0xB4
+#define SPRN_RPR	0xBA	/* Relative Priority Register */
 #define SPRN_CIABR	0xBB
 #define   CIABR_PRIV		0x3
 #define   CIABR_PRIV_USER	1
@@ -273,8 +274,10 @@
 #define SPRN_HSRR1	0x13B	/* Hypervisor Save/Restore 1 */
 #define SPRN_IC		0x350	/* Virtual Instruction Count */
 #define SPRN_VTB	0x351	/* Virtual Time Base */
+#define SPRN_LDBAR	0x352	/* LD Base Address Register */
 #define SPRN_PMICR	0x354   /* Power Management Idle Control Reg */
 #define SPRN_PMSR	0x355   /* Power Management Status Reg */
+#define SPRN_PMMAR	0x356	/* Power Management Memory Activity Register */
 #define SPRN_PMCR	0x374	/* Power Management Control Register */
 
 /* HFSCR and FSCR bit numbers are the same */
@@ -434,6 +437,12 @@
 #define HID0_BTCD	(1<<1)		/* Branch target cache disable */
 #define HID0_NOPDST	(1<<1)		/* No-op dst, dstt, etc. instr. */
 #define HID0_NOPTI	(1<<0)		/* No-op dcbt and dcbst instr. */
+/* POWER8 HID0 bits */
+#define HID0_POWER8_4LPARMODE	__MASK(61)
+#define HID0_POWER8_2LPARMODE	__MASK(57)
+#define HID0_POWER8_1TO2LPAR	__MASK(52)
+#define HID0_POWER8_1TO4LPAR	__MASK(51)
+#define HID0_POWER8_DYNLPARDIS	__MASK(48)
 
 #define SPRN_HID1	0x3F1		/* Hardware Implementation Register 1 */
 #ifdef CONFIG_6xx
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 63cebb9..4ad0d34 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,7 +1,7 @@
 obj-y			+= setup.o opal-takeover.o opal-wrappers.o opal.o opal-async.o
 obj-y			+= opal-rtc.o opal-nvram.o opal-lpc.o opal-flash.o
 obj-y			+= rng.o opal-elog.o opal-dump.o opal-sysparam.o opal-sensor.o
-obj-y			+= opal-msglog.o
+obj-y			+= opal-msglog.o subcore.o subcore-asm.o
 
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o
diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h
index 0051e10..75501bf 100644
--- a/arch/powerpc/platforms/powernv/powernv.h
+++ b/arch/powerpc/platforms/powernv/powernv.h
@@ -25,4 +25,6 @@ static inline int pnv_pci_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
 
 extern void pnv_lpc_init(void);
 
+bool cpu_core_split_required(void);
+
 #endif /* _POWERNV_H */
diff --git a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c
index 65faf99..0062a43 100644
--- a/arch/powerpc/platforms/powernv/smp.c
+++ b/arch/powerpc/platforms/powernv/smp.c
@@ -161,15 +161,17 @@ static void pnv_smp_cpu_kill_self(void)
 		ppc64_runlatch_off();
 		power7_nap(1);
 		ppc64_runlatch_on();
-		if (!generic_check_cpu_restart(cpu)) {
+
+		/* Reenable IRQs briefly to clear the IPI that woke us */
+		local_irq_enable();
+		local_irq_disable();
+		mb();
+
+		if (cpu_core_split_required())
+			continue;
+
+		if (!generic_check_cpu_restart(cpu))
 			DBG("CPU%d Unexpected exit while offline !\n", cpu);
-			/* We may be getting an IPI, so we re-enable
-			 * interrupts to process it, it will be ignored
-			 * since we aren't online (hopefully)
-			 */
-			local_irq_enable();
-			local_irq_disable();
-		}
 	}
 	mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) | LPCR_PECE1);
 	DBG("CPU%d coming online...\n", cpu);
diff --git a/arch/powerpc/platforms/powernv/subcore-asm.S b/arch/powerpc/platforms/powernv/subcore-asm.S
new file mode 100644
index 0000000..39bb24a
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/subcore-asm.S
@@ -0,0 +1,95 @@
+/*
+ * Copyright 2013, Michael (Ellerman|Neuling), IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <asm/asm-offsets.h>
+#include <asm/ppc_asm.h>
+#include <asm/reg.h>
+
+#include "subcore.h"
+
+
+_GLOBAL(split_core_secondary_loop)
+	/*
+	 * r3 = u8 *state, used throughout the routine
+	 * r4 = temp
+	 * r5 = temp
+	 * ..
+	 * r12 = MSR
+	 */
+	mfmsr	r12
+
+	/* Disable interrupts so SRR0/1 don't get trashed */
+	li	r4,0
+	ori	r4,r4,MSR_EE|MSR_SE|MSR_BE|MSR_RI
+	andc	r4,r12,r4
+	sync
+	mtmsrd	r4
+
+	/* Switch to real mode and leave interrupts off */
+	li	r5, MSR_IR|MSR_DR
+	andc	r5, r4, r5
+
+	LOAD_REG_ADDR(r4, real_mode)
+
+	mtspr	SPRN_SRR0,r4
+	mtspr	SPRN_SRR1,r5
+	rfid
+	b	.	/* prevent speculative execution */
+
+real_mode:
+	/* Grab values from unsplit SPRs */
+	mfspr	r6,  SPRN_LDBAR
+	mfspr	r7,  SPRN_PMMAR
+	mfspr	r8,  SPRN_PMCR
+	mfspr	r9,  SPRN_RPR
+	mfspr	r10, SPRN_SDR1
+
+	/* Order reading the SPRs vs telling the primary we are ready to split */
+	sync
+
+	/* Tell thread 0 we are in real mode */
+	li	r4, SYNC_STEP_REAL_MODE
+	stb	r4, 0(r3)
+
+	li	r5, (HID0_POWER8_4LPARMODE | HID0_POWER8_2LPARMODE)@highest
+	sldi	r5, r5, 48
+
+	/* Loop until we see the split happen in HID0 */
+1:	mfspr	r4, SPRN_HID0
+	and.	r4, r4, r5
+	beq	1b
+
+	/*
+	 * We only need to initialise the below regs once for each subcore,
+	 * but it's simpler and harmless to do it on each thread.
+	 */
+
+	/* Make sure various SPRS have sane values */
+	li	r4, 0
+	mtspr	SPRN_LPID, r4
+	mtspr	SPRN_PCR, r4
+	mtspr	SPRN_HDEC, r4
+
+	/* Restore SPR values now we are split */
+	mtspr	SPRN_LDBAR, r6
+	mtspr	SPRN_PMMAR, r7
+	mtspr	SPRN_PMCR, r8
+	mtspr	SPRN_RPR, r9
+	mtspr	SPRN_SDR1, r10
+
+	LOAD_REG_ADDR(r5, virtual_mode)
+
+	/* Get out of real mode */
+	mtspr	SPRN_SRR0,r5
+	mtspr	SPRN_SRR1,r12
+	rfid
+	b	.	/* prevent speculative execution */
+
+virtual_mode:
+	blr
diff --git a/arch/powerpc/platforms/powernv/subcore.c b/arch/powerpc/platforms/powernv/subcore.c
new file mode 100644
index 0000000..894ecb3
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/subcore.c
@@ -0,0 +1,392 @@
+/*
+ * Copyright 2013, Michael (Ellerman|Neuling), IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt)	"powernv: " fmt
+
+#include <linux/kernel.h>
+#include <linux/cpu.h>
+#include <linux/cpumask.h>
+#include <linux/device.h>
+#include <linux/gfp.h>
+#include <linux/smp.h>
+#include <linux/stop_machine.h>
+
+#include <asm/cputhreads.h>
+#include <asm/kvm_ppc.h>
+#include <asm/machdep.h>
+#include <asm/opal.h>
+#include <asm/smp.h>
+
+#include "subcore.h"
+
+
+/*
+ * Split/unsplit procedure:
+ *
+ * A core can be in one of three states, unsplit, 2-way split, and 4-way split.
+ *
+ * The mapping to subcores_per_core is simple:
+ *
+ *  State       | subcores_per_core
+ *  ------------|------------------
+ *  Unsplit     |        1
+ *  2-way split |        2
+ *  4-way split |        4
+ *
+ * The core is split along thread boundaries, the mapping between subcores and
+ * threads is as follows:
+ *
+ *  Unsplit:
+ *          ----------------------------
+ *  Subcore |            0             |
+ *          ----------------------------
+ *  Thread  |  0  1  2  3  4  5  6  7  |
+ *          ----------------------------
+ *
+ *  2-way split:
+ *          -------------------------------------
+ *  Subcore |        0        |        1        |
+ *          -------------------------------------
+ *  Thread  |  0   1   2   3  |  4   5   6   7  |
+ *          -------------------------------------
+ *
+ *  4-way split:
+ *          -----------------------------------------
+ *  Subcore |    0    |    1    |    2    |    3    |
+ *          -----------------------------------------
+ *  Thread  |  0   1  |  2   3  |  4   5  |  6   7  |
+ *          -----------------------------------------
+ *
+ *
+ * Transitions
+ * -----------
+ *
+ * It is not possible to transition between either of the split states, the
+ * core must first be unsplit. The legal transitions are:
+ *
+ *  -----------          ---------------
+ *  |         |  <---->  | 2-way split |
+ *  |         |          ---------------
+ *  | Unsplit |
+ *  |         |          ---------------
+ *  |         |  <---->  | 4-way split |
+ *  -----------          ---------------
+ *
+ * Unsplitting
+ * -----------
+ *
+ * Unsplitting is the simpler procedure. It requires thread 0 to request the
+ * unsplit while all other threads NAP.
+ *
+ * Thread 0 clears HID0_POWER8_DYNLPARDIS (Dynamic LPAR Disable). This tells
+ * the hardware that if all threads except 0 are napping, the hardware should
+ * unsplit the core.
+ *
+ * Non-zero threads are sent to a NAP loop, they don't exit the loop until they
+ * see the core unsplit.
+ *
+ * Thread 0 spins waiting for the hardware to see all the other threads napping
+ * and perform the unsplit.
+ *
+ * Once thread 0 sees the unsplit, it IPIs the secondary threads to wake them
+ * out of NAP. They will then see the core unsplit and exit the NAP loop.
+ *
+ * Splitting
+ * ---------
+ *
+ * The basic splitting procedure is fairly straightforward. However it is
+ * complicated by the fact that after the split occurs, the newly created
+ * subcores are not in a fully initialised state.
+ *
+ * Most notably the subcores do not have the correct value for SDR1, which
+ * means they must not be running in virtual mode when the split occurs. The
+ * subcores have separate timebase SPRs, but these are pre-synchronised by
+ * OPAL.
+ *
+ * To begin with secondary threads are sent to an assembly routine. There they
+ * switch to real mode, so they are immune to the uninitialised SDR1 value.
+ * Once in real mode they indicate that they are in real mode, and spin waiting
+ * to see the core split.
+ *
+ * Thread 0 waits to see that all secondaries are in real mode, and then begins
+ * the splitting procedure. It firstly sets HID0_POWER8_DYNLPARDIS, which
+ * prevents the hardware from unsplitting. Then it sets the appropriate HID bit
+ * to request the split, and spins waiting to see that the split has happened.
+ *
+ * Concurrently the secondaries will notice the split. When they do they set up
+ * their SPRs, notably SDR1, and then they can return to virtual mode and exit
+ * the procedure.
+ */
+
+/* Initialised at boot by subcore_init() */
+static int subcores_per_core;
+
+/*
+ * Used to communicate to offline cpus that we want them to pop out of the
+ * offline loop and do a split or unsplit.
+ *
+ * 0 - no split happening
+ * 1 - unsplit in progress
+ * 2 - split to 2 in progress
+ * 4 - split to 4 in progress
+ */
+static int new_split_mode;
+
+static cpumask_var_t cpu_offline_mask;
+
+struct split_state {
+	u8 step;
+	u8 master;
+};
+
+static DEFINE_PER_CPU(struct split_state, split_state);
+
+static void wait_for_sync_step(int step)
+{
+	int i, cpu = smp_processor_id();
+
+	for (i = cpu + 1; i < cpu + threads_per_core; i++)
+		while (per_cpu(split_state, i).step < step)
+			barrier();
+
+	/* Order the wait loop vs any subsequent loads/stores. */
+	mb();
+}
+
+static void unsplit_core(void)
+{
+	u64 hid0, mask;
+	int i, cpu;
+
+	mask = HID0_POWER8_2LPARMODE | HID0_POWER8_4LPARMODE;
+
+	cpu = smp_processor_id();
+	if (cpu_thread_in_core(cpu) != 0) {
+		while (mfspr(SPRN_HID0) & mask)
+			power7_nap(0);
+
+		per_cpu(split_state, cpu).step = SYNC_STEP_UNSPLIT;
+		return;
+	}
+
+	hid0 = mfspr(SPRN_HID0);
+	hid0 &= ~HID0_POWER8_DYNLPARDIS;
+	mtspr(SPRN_HID0, hid0);
+
+	while (mfspr(SPRN_HID0) & mask)
+		cpu_relax();
+
+	/* Wake secondaries out of NAP */
+	for (i = cpu + 1; i < cpu + threads_per_core; i++)
+		smp_send_reschedule(i);
+
+	wait_for_sync_step(SYNC_STEP_UNSPLIT);
+}
+
+static void split_core(int new_mode)
+{
+	struct {  u64 value; u64 mask; } split_parms[2] = {
+		{ HID0_POWER8_1TO2LPAR, HID0_POWER8_2LPARMODE },
+		{ HID0_POWER8_1TO4LPAR, HID0_POWER8_4LPARMODE }
+	};
+	int i, cpu;
+	u64 hid0;
+
+	/* Convert new_mode (2 or 4) into an index into our parms array */
+	i = (new_mode >> 1) - 1;
+	BUG_ON(i < 0 || i > 1);
+
+	cpu = smp_processor_id();
+	if (cpu_thread_in_core(cpu) != 0) {
+		split_core_secondary_loop(&per_cpu(split_state, cpu).step);
+		return;
+	}
+
+	wait_for_sync_step(SYNC_STEP_REAL_MODE);
+
+	/* Write new mode */
+	hid0  = mfspr(SPRN_HID0);
+	hid0 |= HID0_POWER8_DYNLPARDIS | split_parms[i].value;
+	mtspr(SPRN_HID0, hid0);
+
+	/* Wait for it to happen */
+	while (!(mfspr(SPRN_HID0) & split_parms[i].mask))
+		cpu_relax();
+}
+
+static void cpu_do_split(int new_mode)
+{
+	/*
+	 * At boot subcores_per_core will be 0, so we will always unsplit at
+	 * boot. In the usual case where the core is already unsplit it's a
+	 * nop, and this just ensures the kernel's notion of the mode is
+	 * consistent with the hardware.
+	 */
+	if (subcores_per_core != 1)
+		unsplit_core();
+
+	if (new_mode != 1)
+		split_core(new_mode);
+
+	mb();
+	per_cpu(split_state, smp_processor_id()).step = SYNC_STEP_FINISHED;
+}
+
+bool cpu_core_split_required(void)
+{
+	smp_rmb();
+
+	if (!new_split_mode)
+		return false;
+
+	cpu_do_split(new_split_mode);
+
+	return true;
+}
+
+static int cpu_update_split_mode(void *data)
+{
+	int cpu, new_mode = *(int *)data;
+
+	if (this_cpu_ptr(&split_state)->master) {
+		new_split_mode = new_mode;
+		smp_wmb();
+
+		cpumask_andnot(cpu_offline_mask, cpu_present_mask,
+			       cpu_online_mask);
+
+		/* This should work even though the cpu is offline */
+		for_each_cpu(cpu, cpu_offline_mask)
+			smp_send_reschedule(cpu);
+	}
+
+	cpu_do_split(new_mode);
+
+	if (this_cpu_ptr(&split_state)->master) {
+		/* Wait for all cpus to finish before we touch subcores_per_core */
+		for_each_present_cpu(cpu) {
+			if (cpu >= setup_max_cpus)
+				break;
+
+			while (per_cpu(split_state, cpu).step < SYNC_STEP_FINISHED)
+				barrier();
+		}
+
+		new_split_mode = 0;
+
+		/* Make the new mode public */
+		subcores_per_core = new_mode;
+		threads_per_subcore = threads_per_core / subcores_per_core;
+
+		/* Make sure the new mode is written before we exit */
+		mb();
+	}
+
+	return 0;
+}
+
+static int set_subcores_per_core(int new_mode)
+{
+	struct split_state *state;
+	int cpu;
+
+	if (kvm_hv_mode_active()) {
+		pr_err("Unable to change split core mode while KVM active.\n");
+		return -EBUSY;
+	}
+
+	/*
+	 * We are only called at boot, or from the sysfs write. If that ever
+	 * changes we'll need a lock here.
+	 */
+	BUG_ON(new_mode < 1 || new_mode > 4 || new_mode == 3);
+
+	for_each_present_cpu(cpu) {
+		state = &per_cpu(split_state, cpu);
+		state->step = SYNC_STEP_INITIAL;
+		state->master = 0;
+	}
+
+	get_online_cpus();
+
+	/* This cpu will update the globals before exiting stop machine */
+	this_cpu_ptr(&split_state)->master = 1;
+
+	/* Ensure state is consistent before we call the other cpus */
+	mb();
+
+	stop_machine(cpu_update_split_mode, &new_mode, cpu_online_mask);
+
+	put_online_cpus();
+
+	return 0;
+}
+
+static ssize_t __used store_subcores_per_core(struct device *dev,
+		struct device_attribute *attr, const char *buf,
+		size_t count)
+{
+	unsigned long val;
+	int rc;
+
+	/* We are serialised by the attribute lock */
+
+	rc = sscanf(buf, "%lx", &val);
+	if (rc != 1)
+		return -EINVAL;
+
+	switch (val) {
+	case 1:
+	case 2:
+	case 4:
+		if (subcores_per_core == val)
+			/* Nothing to do */
+			goto out;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	rc = set_subcores_per_core(val);
+	if (rc)
+		return rc;
+
+out:
+	return count;
+}
+
+static ssize_t show_subcores_per_core(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%x\n", subcores_per_core);
+}
+
+static DEVICE_ATTR(subcores_per_core, 0644,
+		show_subcores_per_core, store_subcores_per_core);
+
+static int subcore_init(void)
+{
+	if (!cpu_has_feature(CPU_FTR_ARCH_207S))
+		return 0;
+
+	/*
+	 * We need all threads in a core to be present to split/unsplit so
+	 * continue only if max_cpus is aligned to threads_per_core.
+	 */
+	if (setup_max_cpus % threads_per_core)
+		return 0;
+
+	BUG_ON(!alloc_cpumask_var(&cpu_offline_mask, GFP_KERNEL));
+
+	set_subcores_per_core(1);
+
+	return device_create_file(cpu_subsys.dev_root,
+				  &dev_attr_subcores_per_core);
+}
+machine_device_initcall(powernv, subcore_init);
diff --git a/arch/powerpc/platforms/powernv/subcore.h b/arch/powerpc/platforms/powernv/subcore.h
new file mode 100644
index 0000000..148abc9
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/subcore.h
@@ -0,0 +1,18 @@
+/*
+ * Copyright 2013, Michael Ellerman, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+/* These are ordered and tested with <= */
+#define SYNC_STEP_INITIAL	0
+#define SYNC_STEP_UNSPLIT	1	/* Set by secondary when it sees unsplit */
+#define SYNC_STEP_REAL_MODE	2	/* Set by secondary when in real mode  */
+#define SYNC_STEP_FINISHED	3	/* Set by secondary when split/unsplit is done */
+
+#ifndef __ASSEMBLY__
+void split_core_secondary_loop(u8 *state);
+#endif
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/6] Implement split core for POWER8
  2014-05-23  8:15 ` Michael Neuling
  (?)
@ 2014-05-23  9:53   ` Alexander Graf
  -1 siblings, 0 replies; 39+ messages in thread
From: Alexander Graf @ 2014-05-23  9:53 UTC (permalink / raw)
  To: Michael Neuling, Benjamin Herrenschmidt
  Cc: Paul Mackerras, linuxppc-dev, kvm, kvm-ppc, Michael Ellerman


On 23.05.14 10:15, Michael Neuling wrote:
> This patch series implements split core mode on POWER8.  This enables up to 4
> subcores per core which can each independently run guests (per guest SPRs like
> SDR1, LPIDR etc are replicated per subcore).  Lots more documentation on this
> feature in the code and commit messages.
>
> Most of this code is in the powernv platform but there's a couple of KVM
> specific patches too.
>
> Patch series authored by mpe and me with a few bug fixes from others.
>
> v2:
>    There are some minor updates based on comments and I've added the Acks by
>    Paulus and Alex for the KVM code.

I don't see changelogs inside the individual patches. Please make sure 
to always mention what changed from one version to the next in a 
particular patch, so that I have the chance to check whether that change 
was good :).

Also, is there any performance penalty associated with split core mode? 
If not, could we just always default to split-by-4 on POWER8 bare metal?


Alex

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/6] Implement split core for POWER8
  2014-05-23  9:53   ` Alexander Graf
  (?)
@ 2014-05-23 10:00     ` Michael Neuling
  -1 siblings, 0 replies; 39+ messages in thread
From: Michael Neuling @ 2014-05-23 10:00 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, kvm,
	kvm-ppc, Michael Ellerman

On Fri, 2014-05-23 at 11:53 +0200, Alexander Graf wrote:
> On 23.05.14 10:15, Michael Neuling wrote:
> > This patch series implements split core mode on POWER8.  This enables up to 4
> > subcores per core which can each independently run guests (per guest SPRs like
> > SDR1, LPIDR etc are replicated per subcore).  Lots more documentation on this
> > feature in the code and commit messages.
> >
> > Most of this code is in the powernv platform but there's a couple of KVM
> > specific patches too.
> >
> > Patch series authored by mpe and me with a few bug fixes from others.
> >
> > v2:
> >    There are some minor updates based on comments and I've added the Acks by
> >    Paulus and Alex for the KVM code.
> 
> I don't see changelogs inside the individual patches. Please make sure 
> to always mention what changed from one version to the next in a 
> particular patch, so that I have the chance to check whether that change 
> was good :).

Sure, that was a bit sloppy.

The last patch was the only one that changed.  I changed the sysfs
file permissions from 600 to 644 so that users can read it more easily,
as requested by Joel.

The other change was to fix the possibility of a race when coming out of
nap and checking if we need to split.  This fix was from paulus (worked
out offline).

> Also, is there any performance penalty associated with split core mode? 
> If not, could we just always default to split-by-4 on POWER8 bare metal?

Yeah, there is a performance hit.  When you are split (i.e.
subcores_per_core = 2 or 4), the core is stuck in SMT8 mode.  So if you
only have 1 thread active (others napped), you won't get the benefit of
ST mode in the core (more register renames per HW thread, more FXUs,
more FPUs etc).

Mikey

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/6] Implement split core for POWER8
  2014-05-23 10:00     ` Michael Neuling
  (?)
@ 2014-05-23 10:05       ` Alexander Graf
  -1 siblings, 0 replies; 39+ messages in thread
From: Alexander Graf @ 2014-05-23 10:05 UTC (permalink / raw)
  To: Michael Neuling
  Cc: Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, kvm,
	kvm-ppc, Michael Ellerman


On 23.05.14 12:00, Michael Neuling wrote:
> On Fri, 2014-05-23 at 11:53 +0200, Alexander Graf wrote:
>> On 23.05.14 10:15, Michael Neuling wrote:
>>> This patch series implements split core mode on POWER8.  This enables up to 4
>>> subcores per core which can each independently run guests (per guest SPRs like
>>> SDR1, LPIDR etc are replicated per subcore).  Lots more documentation on this
>>> feature in the code and commit messages.
>>>
>>> Most of this code is in the powernv platform but there's a couple of KVM
>>> specific patches too.
>>>
>>> Patch series authored by mpe and me with a few bug fixes from others.
>>>
>>> v2:
>>>     There are some minor updates based on comments and I've added the Acks by
>>>     Paulus and Alex for the KVM code.
>> I don't see changelogs inside the individual patches. Please make sure
>> to always mention what changed from one version to the next in a
>> particular patch, so that I have the chance to check whether that change
>> was good :).
> Sure, that was a bit sloppy.
>
> Only the last patch changed.  I changed the sysfs
> file from 600 permissions to 644 so that users can read it more easily
> as requested by Joel.
>
> The other change was to fix the possibility of a race when coming out of
> nap and checking if we need to split.  This fix was from paulus (worked
> out offline).
>
>> Also, is there any performance penalty associated with split core mode?
>> If not, could we just always default to split-by-4 on POWER8 bare metal?
> Yeah, there is a performance hit.  When you are split (ie
> subcores_per_core = 2 or 4), the core is stuck in SMT8 mode.  So if you
> only have 1 thread active (others napped), you won't get the benefit of
> ST mode in the core (more register renames per HW thread, more FXUs,
> more FPUs etc).

Ok, imagine I have 1 core with SMT8. I have one process running at 100% 
occupying one thread, the other 7 threads are idle.

Do I get performance benefits from having the other threads idle? Or do 
I have to configure the system into SMT1 mode to get my ST benefits?

If it's the latter, we could just have ppc64_cpu --smt=x also set the 
subcore amount in parallel to the thread count.

The reason I'm bringing this up is that I'm not quite sure who would be 
the one doing these performance tweaks. So I'd guess the majority 
of users will simply miss out on them.


Alex


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/6] Implement split core for POWER8
  2014-05-23 10:05       ` Alexander Graf
  (?)
@ 2014-05-23 10:11         ` Michael Neuling
  -1 siblings, 0 replies; 39+ messages in thread
From: Michael Neuling @ 2014-05-23 10:11 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, kvm,
	kvm-ppc, Michael Ellerman

> >> Also, is there any performance penalty associated with split core mode?
> >> If not, could we just always default to split-by-4 on POWER8 bare metal?
> > Yeah, there is a performance hit.  When you are split (ie
> > subcores_per_core = 2 or 4), the core is stuck in SMT8 mode.  So if you
> > only have 1 thread active (others napped), you won't get the benefit of
> > ST mode in the core (more register renames per HW thread, more FXUs,
> > more FPUs etc).
> 
> Ok, imagine I have 1 core with SMT8. I have one process running at 100% 
> occupying one thread, the other 7 threads are idle.
> 
> Do I get performance benefits from having the other threads idle? Or do 
> I have to configure the system into SMT1 mode to get my ST benefits?

You automatically get the performance benefit when they are idle.  When
threads enter nap, the core is able to reduce its SMT mode
automatically.

> If it's the latter, we could just have ppc64_cpu --smt=x also set the 
> subcore amount in parallel to the thread count.

FWIW on powernv we just nap the threads on hotplug.
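For the curious, a hedged sketch of what that looks like from userspace:
inspecting a sibling thread's hotplug state via the standard Linux sysfs
interface (the CPU number here is a hypothetical example).  On powernv,
an offlined thread simply naps, which is what lets the core drop to a
lower SMT mode:

```shell
# Inspect a sibling thread's hotplug state.  cpu1 is a hypothetical
# example; cpu0 typically has no 'online' file and can't be offlined.
cpu=1
online_file=/sys/devices/system/cpu/cpu$cpu/online
if [ -f "$online_file" ]; then
  state=$(cat "$online_file")
else
  state="n/a (no hotplug control for cpu$cpu here)"
fi
echo "cpu$cpu online: $state"
# To nap the thread (as root):  echo 0 > "$online_file"
```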

> The reason I'm bringing this up is that I'm not quite sure who would be 
> the instance doing these performance tweaks. So I'd guess the majority 
> of users will simply miss out on them.

Everyone, it's automatic on idle... except for split core mode
unfortunately.

Mikey

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/6] Implement split core for POWER8
  2014-05-23 10:11         ` Michael Neuling
  (?)
@ 2014-05-23 10:27           ` Alexander Graf
  -1 siblings, 0 replies; 39+ messages in thread
From: Alexander Graf @ 2014-05-23 10:27 UTC (permalink / raw)
  To: Michael Neuling
  Cc: Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, kvm,
	kvm-ppc, Michael Ellerman


On 23.05.14 12:11, Michael Neuling wrote:
>>>> Also, is there any performance penalty associated with split core mode?
>>>> If not, could we just always default to split-by-4 on POWER8 bare metal?
>>> Yeah, there is a performance hit.  When you are split (ie
>>> subcores_per_core = 2 or 4), the core is stuck in SMT8 mode.  So if you
>>> only have 1 thread active (others napped), you won't get the benefit of
>>> ST mode in the core (more register renames per HW thread, more FXUs,
>>> more FPUs etc).
>> Ok, imagine I have 1 core with SMT8. I have one process running at 100%
>> occupying one thread, the other 7 threads are idle.
>>
>> Do I get performance benefits from having the other threads idle? Or do
>> I have to configure the system into SMT1 mode to get my ST benefits?
> You automatically get the performance benefit when they are idle.  When
> threads enter nap, the core is able to reduce its SMT mode
> automatically.

Unless in split core mode - meh. That's a real bummer then, yeah.


>
>> If it's the latter, we could just have ppc64_cpu --smt=x also set the
>> subcore amount in parallel to the thread count.
> FWIW on powernv we just nap the threads on hotplug.
>
>> The reason I'm bringing this up is that I'm not quite sure who would be
>> the one doing these performance tweaks. So I'd guess the majority
>> of users will simply miss out on them.
> Everyone, it's automatic on idle... except for split core mode
> unfortunately.

Oh I meant when you want to use a POWER system as a VM host, you have to 
know about split core mode and configure it accordingly. That's 
something someone needs to do. And it's different from x86 which means 
people may miss out on it for their performance benchmarks.

But if we impose a general performance penalty for everyone with it, I 
don't think split core mode should be enabled by default.


Alex

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v2 0/6] Implement split core for POWER8
  2014-05-23 10:27           ` Alexander Graf
  (?)
@ 2014-05-23 10:50             ` Michael Neuling
  -1 siblings, 0 replies; 39+ messages in thread
From: Michael Neuling @ 2014-05-23 10:50 UTC (permalink / raw)
  To: Alexander Graf
  Cc: Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev, kvm,
	kvm-ppc, Michael Ellerman

Alex,

> >> If it's the latter, we could just have ppc64_cpu --smt=x also set the
> >> subcore amount in parallel to the thread count.
> > FWIW on powernv we just nap the threads on hotplug.
> >
> >> The reason I'm bringing this up is that I'm not quite sure who would be
> >> the one doing these performance tweaks. So I'd guess the majority
> >> of users will simply miss out on them.
> > Everyone, it's automatic on idle... except for split core mode
> > unfortunately.
> 
> Oh I meant when you want to use a POWER system as VM host, you have to 
> know about split core mode and configure it accordingly. That's 
> something someone needs to do. And it's different from x86 which means 
> people may miss out on it for their performance benchmarks.

It depends on what's running.  If you have 1 guest per core, then
running unsplit is probably best as you can nap threads as needed and
improve performance.  

If you have more than two guests per core, then running split core can
hugely improve performance as they may be able to run at the same time
without context switching.  4 guests with 2 threads per core can run at
the same time on a single physical core.

One thing to note here is guest doorbell IRQs (new in POWER8).  They
can't cross a core or subcore boundary and there is no way for the
hypervisor to virtualise them.  Hence if you run split 4 on an SMT8
POWER8, you can only run guests up to 2 threads per core (rather than 8
threads per core).
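To spell out the arithmetic behind that limit: a POWER8 core has 8
hardware threads, split evenly among subcores, and since guest doorbells
cannot cross a subcore boundary, guest SMT is capped at
threads-per-subcore.  A quick sketch:

```shell
# Guest SMT limit per split mode: 8 hardware threads per POWER8 core,
# divided evenly among subcores; doorbells confine a guest to one subcore.
core_threads=8
for subcores in 1 2 4; do
  echo "subcores_per_core=$subcores -> guests limited to SMT$((core_threads / subcores))"
done
guest_smt_split4=$((core_threads / 4))
```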

> But if we impose a general performance penalty for everyone with it, I 
> don't think split core mode should be enabled by default.

FWIW we'd like to make this dynamic eventually, so that each core is run
in whatever mode is currently best based on the running guests.

Mikey

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2014-05-23 10:50 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-23  8:15 [PATCH v2 0/6] Implement split core for POWER8 Michael Neuling
2014-05-23  8:15 ` Michael Neuling
2014-05-23  8:15 ` Michael Neuling
2014-05-23  8:15 ` [PATCH v2 1/6] KVM: PPC: Book3S HV: Rework the secondary inhibit code Michael Neuling
2014-05-23  8:15   ` Michael Neuling
2014-05-23  8:15   ` Michael Neuling
2014-05-23  8:15 ` [PATCH v2 2/6] powerpc/powernv: Make it possible to skip the IRQHAPPENED check in power7_nap() Michael Neuling
2014-05-23  8:15   ` Michael Neuling
2014-05-23  8:15   ` Michael Neuling
2014-05-23  8:15 ` [PATCH v2 3/6] powerpc: Add threads_per_subcore Michael Neuling
2014-05-23  8:15   ` Michael Neuling
2014-05-23  8:15   ` Michael Neuling
2014-05-23  8:15 ` [PATCH v2 4/6] powerpc: Check cpu_thread_in_subcore() in __cpu_up() Michael Neuling
2014-05-23  8:15   ` Michael Neuling
2014-05-23  8:15   ` Michael Neuling
2014-05-23  8:15 ` [PATCH v2 5/6] KVM: PPC: Book3S HV: Use threads_per_subcore in KVM Michael Neuling
2014-05-23  8:15   ` Michael Neuling
2014-05-23  8:15   ` Michael Neuling
2014-05-23  8:15 ` [PATCH v2 6/6] powerpc/powernv: Add support for POWER8 split core on powernv Michael Neuling
2014-05-23  8:15   ` Michael Neuling
2014-05-23  8:15   ` Michael Neuling
2014-05-23  9:53 ` [PATCH v2 0/6] Implement split core for POWER8 Alexander Graf
2014-05-23  9:53   ` Alexander Graf
2014-05-23  9:53   ` Alexander Graf
2014-05-23 10:00   ` Michael Neuling
2014-05-23 10:00     ` Michael Neuling
2014-05-23 10:00     ` Michael Neuling
2014-05-23 10:05     ` Alexander Graf
2014-05-23 10:05       ` Alexander Graf
2014-05-23 10:05       ` Alexander Graf
2014-05-23 10:11       ` Michael Neuling
2014-05-23 10:11         ` Michael Neuling
2014-05-23 10:11         ` Michael Neuling
2014-05-23 10:27         ` Alexander Graf
2014-05-23 10:27           ` Alexander Graf
2014-05-23 10:27           ` Alexander Graf
2014-05-23 10:50           ` Michael Neuling
2014-05-23 10:50             ` Michael Neuling
2014-05-23 10:50             ` Michael Neuling
