linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [RFC 00/11]: powerKVM, release the compute power of secondary hwthread on host
@ 2014-10-16 19:29 kernelfans
  2014-10-16 19:29 ` [RFC 01/11] sched: introduce sys_cpumask in tsk to adapt asymmetric system kernelfans
                   ` (11 more replies)
  0 siblings, 12 replies; 30+ messages in thread
From: kernelfans @ 2014-10-16 19:29 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf

Nowadays, when running powerKVM(book3s, hv mode), we should make the secondary hwthread
offline. Which means that if we run misc tsks other than dedicated KVM (e.g mix java and KVM),
we will lose the compute power of the secondary hwthread on host env.


This series aim to make the powerpc adaptive to the misc tsks on host.
( This series is just a sketch, with some broken patch. Sorry to bring up it in a hurry,
  I am afraid that I am on the wrong direction too far. So I hope I can get some advice and feedback
  in advance. I will go on the work on the "place holder" patch if my idea is reasonable.

  Please consider the code as the explaining of my idea.
)


The internal:
  -1.To enter guest, the primary hwthread schedule stopper func on the secondary to bring them into NAP mode.
        The proto will be:
             cpu1                              cpuX
          stop_cpus_async()
                                          bring cpuX to a special state
                                          signal flag and trapped
          check for flag
          set up guest env and ipi cpuX        
  -2.When exit to host, the secondary hardcode to jmp back to the stopper func, i.e back to host.


Drawbacks that I can think so far:
  -1. increase the sched interval on secondary but the schduler do NOT know it.(can it cause problem?)
  -2. lose some presice of hrtime on secondary hwthread for host.(To avoid the primary
       has too small time slice, we need to impose a threshold,so we may lose the presice)

Any suggestion? Thanks!

Liu Ping Fan (11):
  sched: introduce sys_cpumask in tsk to adapt asymmetric system
  powerpc: kvm: ensure vcpu-thread run only on primary hwthread
  powerpc: kvm: add interface to control kvm function on a core
  powerpc: kvm: introduce a kthread on primary thread to anti tickless
  sched: introduce stop_cpus_async() to schedule special tsk on cpu
  powerpc: kvm: introduce online in paca to indicate whether cpu is
    needed by host
  powerpc: kvm: the stopper func to cease secondary hwthread
  powerpc: kvm: add a flag in vcore to sync primary with secondry
    hwthread
  powerpc: kvm: handle time base on secondary hwthread
  powerpc: kvm: on_primary_thread() force the secondary threads into NAP
        mode
  powerpc: kvm: Kconfig add an option for enabling secondary hwthread

 arch/powerpc/include/asm/kvm_host.h     |  6 ++++
 arch/powerpc/include/asm/paca.h         |  3 ++
 arch/powerpc/kernel/asm-offsets.c       |  6 ++++
 arch/powerpc/kernel/smp.c               |  3 ++
 arch/powerpc/kernel/sysfs.c             | 41 ++++++++++++++++++++++
 arch/powerpc/kvm/Kconfig                |  4 +++
 arch/powerpc/kvm/book3s_hv.c            | 39 +++++++++++++++++++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 61 +++++++++++++++++++++++++++++++++
 arch/powerpc/sysdev/xics/xics-common.c  | 12 +++++++
 include/linux/init_task.h               |  1 +
 include/linux/sched.h                   |  6 ++++
 include/linux/stop_machine.h            |  2 ++
 kernel/sched/core.c                     | 10 ++++--
 kernel/stop_machine.c                   | 25 +++++++++++---
 14 files changed, 212 insertions(+), 7 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [RFC 01/11] sched: introduce sys_cpumask in tsk to adapt asymmetric system
  2014-10-16 19:29 [RFC 00/11]: powerKVM, release the compute power of secondary hwthread on host kernelfans
@ 2014-10-16 19:29 ` kernelfans
  2014-11-12  9:22   ` Srikar Dronamraju
  2014-10-16 19:29 ` [RFC 02/11] powerpc: kvm: ensure vcpu-thread run only on primary hwthread kernelfans
                   ` (10 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: kernelfans @ 2014-10-16 19:29 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf

Some system such as powerpc, some tsk (vcpu thread) can only run on
the dedicated cpu. Since we adapt some asymmetric method to monitor the
whole physical cpu. (powerKVM only allows the primary hwthread to
set up runtime env for the secondary when entering guest).

Nowadays, powerKVM run with all the secondary hwthread offline to ensure
the vcpu threads only run on the primary thread. But we plan to keep all
cpus online when running powerKVM to give more power when switching back
to host, so introduce sys_allowed cpumask to reflect the cpuset which
the vcpu thread can run on.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 include/linux/init_task.h |  1 +
 include/linux/sched.h     |  6 ++++++
 kernel/sched/core.c       | 10 ++++++++--
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index 2bb4c4f3..c56f69e 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -172,6 +172,7 @@ extern struct task_group root_task_group;
 	.normal_prio	= MAX_PRIO-20,					\
 	.policy		= SCHED_NORMAL,					\
 	.cpus_allowed	= CPU_MASK_ALL,					\
+	.sys_allowed = CPU_MASK_ALL,			\
 	.nr_cpus_allowed= NR_CPUS,					\
 	.mm		= NULL,						\
 	.active_mm	= &init_mm,					\
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5c2c885..ce429f3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1260,7 +1260,10 @@ struct task_struct {
 
 	unsigned int policy;
 	int nr_cpus_allowed;
+	/* Anded user and sys_allowed */
 	cpumask_t cpus_allowed;
+	/* due to the feature of asymmetric, some tsk can only run on such cpu */
+	cpumask_t sys_allowed;
 
 #ifdef CONFIG_PREEMPT_RCU
 	int rcu_read_lock_nesting;
@@ -2030,6 +2033,9 @@ static inline void tsk_restore_flags(struct task_struct *task,
 }
 
 #ifdef CONFIG_SMP
+extern void set_cpus_sys_allowed(struct task_struct *p,
+			const struct cpumask *new_mask);
+
 extern void do_set_cpus_allowed(struct task_struct *p,
 			       const struct cpumask *new_mask);
 
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ec1a286..2cd1ae3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4596,13 +4596,19 @@ void init_idle(struct task_struct *idle, int cpu)
 }
 
 #ifdef CONFIG_SMP
+void set_cpus_sys_allowed(struct task_struct *p,
+	const struct cpumask *new_mask)
+{
+	cpumask_copy(&p->sys_allowed, new_mask);
+}
+
 void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
 {
 	if (p->sched_class && p->sched_class->set_cpus_allowed)
 		p->sched_class->set_cpus_allowed(p, new_mask);
 
-	cpumask_copy(&p->cpus_allowed, new_mask);
-	p->nr_cpus_allowed = cpumask_weight(new_mask);
+	cpumask_and(&p->cpus_allowed, &p->sys_allowed, new_mask);
+	p->nr_cpus_allowed = cpumask_weight(&p->cpus_allowed);
 }
 
 /*
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC 02/11] powerpc: kvm: ensure vcpu-thread run only on primary hwthread
  2014-10-16 19:29 [RFC 00/11]: powerKVM, release the compute power of secondary hwthread on host kernelfans
  2014-10-16 19:29 ` [RFC 01/11] sched: introduce sys_cpumask in tsk to adapt asymmetric system kernelfans
@ 2014-10-16 19:29 ` kernelfans
  2014-11-12 10:17   ` Srikar Dronamraju
  2014-10-16 19:29 ` [RFC 03/11] powerpc: kvm: add interface to control kvm function on a core kernelfans
                   ` (9 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: kernelfans @ 2014-10-16 19:29 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf

When vcpu thread runs at the first time, it will ensure to stick
to the primary thread.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_host.h |  3 +++
 arch/powerpc/kvm/book3s_hv.c        | 17 +++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 98d9dd5..9a3355e 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -666,6 +666,9 @@ struct kvm_vcpu_arch {
 	spinlock_t tbacct_lock;
 	u64 busy_stolen;
 	u64 busy_preempt;
+#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
+	bool cpu_selected;
+#endif
 #endif
 };
 
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 27cced9..ba258c8 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1909,6 +1909,23 @@ static int kvmppc_vcpu_run_hv(struct kvm_run *run, struct kvm_vcpu *vcpu)
 {
 	int r;
 	int srcu_idx;
+#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
+	int cpu = smp_processor_id();
+	int target_cpu;
+	unsigned int cpu;
+	struct task_struct *p = current;
+
+	if (unlikely(!vcpu->arch.cpu_selected)) {
+		vcpu->arch.cpu_selected = true;
+		for (cpu = 0; cpu < NR_CPUS; cpu+=threads_per_core) {
+			cpumask_set_cpu(cpu, &p->sys_allowed);
+		}
+		if (cpu%threads_per_core != 0) {
+			target_cpu = cpu/threads_per_core*threads_per_core;
+			migrate_task_to(current, target_cpu);
+		}
+	}
+#endif
 
 	if (!vcpu->arch.sane) {
 		run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC 03/11] powerpc: kvm: add interface to control kvm function on a core
  2014-10-16 19:29 [RFC 00/11]: powerKVM, release the compute power of secondary hwthread on host kernelfans
  2014-10-16 19:29 ` [RFC 01/11] sched: introduce sys_cpumask in tsk to adapt asymmetric system kernelfans
  2014-10-16 19:29 ` [RFC 02/11] powerpc: kvm: ensure vcpu-thread run only on primary hwthread kernelfans
@ 2014-10-16 19:29 ` kernelfans
  2014-10-27  4:04   ` Preeti U Murthy
  2014-11-12 13:01   ` Srikar Dronamraju
  2014-10-16 19:29 ` [RFC 04/11] powerpc: kvm: introduce a kthread on primary thread to anti tickless kernelfans
                   ` (8 subsequent siblings)
  11 siblings, 2 replies; 30+ messages in thread
From: kernelfans @ 2014-10-16 19:29 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf

When kvm is enabled on a core, we migrate all external irq to primary
thread. Since currently, the kvmirq logic is handled by the primary
hwthread.

Todo: this patch lacks re-enable of irqbalance when kvm is disable on
the core

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/sysfs.c            | 39 ++++++++++++++++++++++++++++++++++
 arch/powerpc/sysdev/xics/xics-common.c | 12 +++++++++++
 2 files changed, 51 insertions(+)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index 67fd2fd..a2595dd 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -552,6 +552,45 @@ static void sysfs_create_dscr_default(void)
 	if (cpu_has_feature(CPU_FTR_DSCR))
 		err = device_create_file(cpu_subsys.dev_root, &dev_attr_dscr_default);
 }
+
+#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
+#define NR_CORES	(CONFIG_NR_CPUS/threads_per_core)
+static DECLARE_BITMAP(kvm_on_core, NR_CORES) __read_mostly
+
+static ssize_t show_kvm_enable(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+}
+
+static ssize_t __used store_kvm_enable(struct device *dev,
+		struct device_attribute *attr, const char *buf,
+		size_t count)
+{
+	struct cpumask stop_cpus;
+	unsigned long core, thr;
+
+	sscanf(buf, "%lx", &core);
+	if (core > NR_CORES)
+		return -1;
+	if (!test_bit(core, &kvm_on_core))
+		for (thr = 1; thr< threads_per_core; thr++)
+			if (cpu_online(thr * threads_per_core + thr))
+				cpumask_set_cpu(thr * threads_per_core + thr, &stop_cpus);
+
+	stop_machine(xics_migrate_irqs_away_secondary, NULL, &stop_cpus);
+	set_bit(core, &kvm_on_core);
+	return count;
+}
+
+static DEVICE_ATTR(kvm_enable, 0600,
+	show_kvm_enable, store_kvm_enable);
+
+static void sysfs_create_kvm_enable(void)
+{
+	device_create_file(cpu_subsys.dev_root, &dev_attr_kvm_enable);
+}
+#endif
+
 #endif /* CONFIG_PPC64 */
 
 #ifdef HAS_PPC_PMC_PA6T
diff --git a/arch/powerpc/sysdev/xics/xics-common.c b/arch/powerpc/sysdev/xics/xics-common.c
index fe0cca4..68b33d8 100644
--- a/arch/powerpc/sysdev/xics/xics-common.c
+++ b/arch/powerpc/sysdev/xics/xics-common.c
@@ -258,6 +258,18 @@ unlock:
 		raw_spin_unlock_irqrestore(&desc->lock, flags);
 	}
 }
+
+int xics_migrate_irqs_away_secondary(void *data)
+{
+	int cpu = smp_processor_id();
+	if(cpu%thread_per_core != 0) {
+		WARN(condition, format...);
+		return 0;
+	}
+	/* In fact, if we can migrate the primary, it will be more fine */
+	xics_migrate_irqs_away();
+	return 0;
+}
 #endif /* CONFIG_HOTPLUG_CPU */
 
 #ifdef CONFIG_SMP
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC 04/11] powerpc: kvm: introduce a kthread on primary thread to anti tickless
  2014-10-16 19:29 [RFC 00/11]: powerKVM, release the compute power of secondary hwthread on host kernelfans
                   ` (2 preceding siblings ...)
  2014-10-16 19:29 ` [RFC 03/11] powerpc: kvm: add interface to control kvm function on a core kernelfans
@ 2014-10-16 19:29 ` kernelfans
  2014-10-27  4:45   ` Preeti U Murthy
  2014-10-16 19:29 ` [RFC 05/11] sched: introduce stop_cpus_async() to schedule special tsk on cpu kernelfans
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: kernelfans @ 2014-10-16 19:29 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf

(This patch is a place holder.)

If there is only one vcpu thread is ready(the other vcpu thread can
wait for it to execute), the primary thread can enter tickless mode,
which causes the primary keeps running, so the secondary has no
opportunity to exit to host, even they have other tsk on them.

Introduce a kthread (anti_tickless) on primary, so when there is only
one vcpu thread on primary, the secondary can resort to anti_tickless
to keep the primary out of tickless mode.
(I thought that anti_tickless thread can goto NAP, so we can let the
secondary run).

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 arch/powerpc/kernel/sysfs.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
index a2595dd..f0b110e 100644
--- a/arch/powerpc/kernel/sysfs.c
+++ b/arch/powerpc/kernel/sysfs.c
@@ -575,9 +575,11 @@ static ssize_t __used store_kvm_enable(struct device *dev,
 	if (!test_bit(core, &kvm_on_core))
 		for (thr = 1; thr< threads_per_core; thr++)
 			if (cpu_online(thr * threads_per_core + thr))
-				cpumask_set_cpu(thr * threads_per_core + thr, &stop_cpus);
+				cpumask_set_cpu(core * threads_per_core + thr, &stop_cpus);
 
 	stop_machine(xics_migrate_irqs_away_secondary, NULL, &stop_cpus);
+	/* fixme, create a kthread on primary hwthread to handle tickless mode */
+	//kthread_create_on_cpu(prevent_tickless, NULL, core * threads_per_core, "ppckvm_prevent_tickless");
 	set_bit(core, &kvm_on_core);
 	return count;
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC 05/11] sched: introduce stop_cpus_async() to schedule special tsk on cpu
  2014-10-16 19:29 [RFC 00/11]: powerKVM, release the compute power of secondary hwthread on host kernelfans
                   ` (3 preceding siblings ...)
  2014-10-16 19:29 ` [RFC 04/11] powerpc: kvm: introduce a kthread on primary thread to anti tickless kernelfans
@ 2014-10-16 19:29 ` kernelfans
  2014-10-16 19:29 ` [RFC 06/11] powerpc: kvm: introduce online in paca to indicate whether cpu is needed by host kernelfans
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 30+ messages in thread
From: kernelfans @ 2014-10-16 19:29 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf

The proto will be:
     cpu1                              cpuX
  stop_cpus_async()
                                  bring cpuX to a special state
                                  signal flag and trapped
  check for flag

The func help powerpc to reuse the scheme of cpu_stopper_task
to force the secondary hwthread goto NAP state, in which state,
cpu will not run any longer until the master cpu tells them to
go.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 include/linux/stop_machine.h |  2 ++
 kernel/stop_machine.c        | 25 ++++++++++++++++++++-----
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index d2abbdb..871c1bf 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -32,6 +32,8 @@ int stop_two_cpus(unsigned int cpu1, unsigned int cpu2, cpu_stop_fn_t fn, void *
 void stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void *arg,
 			 struct cpu_stop_work *work_buf);
 int stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg);
+int stop_cpus_async(const struct cpumask *cpumask, cpu_stop_fn_t fn,
+	void *arg);
 int try_stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg);
 
 #else	/* CONFIG_SMP */
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 695f0c6..d26fd6a 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -354,13 +354,15 @@ static void queue_stop_cpus_work(const struct cpumask *cpumask,
 }
 
 static int __stop_cpus(const struct cpumask *cpumask,
-		       cpu_stop_fn_t fn, void *arg)
+		       cpu_stop_fn_t fn, void *arg, bool sync)
 {
 	struct cpu_stop_done done;
 
-	cpu_stop_init_done(&done, cpumask_weight(cpumask));
+	if (sync)
+		cpu_stop_init_done(&done, cpumask_weight(cpumask));
 	queue_stop_cpus_work(cpumask, fn, arg, &done);
-	wait_for_completion(&done.completion);
+	if (sync)
+		wait_for_completion(&done.completion);
 	return done.executed ? done.ret : -ENOENT;
 }
 
@@ -398,7 +400,20 @@ int stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg)
 
 	/* static works are used, process one request at a time */
 	mutex_lock(&stop_cpus_mutex);
-	ret = __stop_cpus(cpumask, fn, arg);
+	ret = __stop_cpus(cpumask, fn, arg, true);
+	mutex_unlock(&stop_cpus_mutex);
+	return ret;
+}
+
+/* similar to stop_cpus(), but not wait for the ack. */
+int stop_cpus_async(const struct cpumask *cpumask, cpu_stop_fn_t fn,
+	void *arg)
+{
+	int ret;
+
+	/* static works are used, process one request at a time */
+	mutex_lock(&stop_cpus_mutex);
+	ret = __stop_cpus(cpumask, fn, arg, false);
 	mutex_unlock(&stop_cpus_mutex);
 	return ret;
 }
@@ -428,7 +443,7 @@ int try_stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg)
 	/* static works are used, process one request at a time */
 	if (!mutex_trylock(&stop_cpus_mutex))
 		return -EAGAIN;
-	ret = __stop_cpus(cpumask, fn, arg);
+	ret = __stop_cpus(cpumask, fn, arg, true);
 	mutex_unlock(&stop_cpus_mutex);
 	return ret;
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC 06/11] powerpc: kvm: introduce online in paca to indicate whether cpu is needed by host
  2014-10-16 19:29 [RFC 00/11]: powerKVM, release the compute power of secondary hwthread on host kernelfans
                   ` (4 preceding siblings ...)
  2014-10-16 19:29 ` [RFC 05/11] sched: introduce stop_cpus_async() to schedule special tsk on cpu kernelfans
@ 2014-10-16 19:29 ` kernelfans
  2014-10-27  5:32   ` Preeti U Murthy
  2014-10-16 19:29 ` [RFC 07/11] powerpc: kvm: the stopper func to cease secondary hwthread kernelfans
                   ` (5 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: kernelfans @ 2014-10-16 19:29 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf

Nowadays, powerKVM runs with secondary hwthread offline. Although
we can make all secondary hwthread online later, we still preserve
this behavior for dedicated KVM env. Achieve this by setting
paca->online as false.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/paca.h         |  3 +++
 arch/powerpc/kernel/asm-offsets.c       |  3 +++
 arch/powerpc/kernel/smp.c               |  3 +++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 12 ++++++++++++
 4 files changed, 21 insertions(+)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index a5139ea..67c2500 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -84,6 +84,9 @@ struct paca_struct {
 	u8 cpu_start;			/* At startup, processor spins until */
 					/* this becomes non-zero. */
 	u8 kexec_state;		/* set when kexec down has irqs off */
+#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
+	u8 online;
+#endif
 #ifdef CONFIG_PPC_STD_MMU_64
 	struct slb_shadow *slb_shadow_ptr;
 	struct dtl_entry *dispatch_log;
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 9d7dede..0faa8fe 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -182,6 +182,9 @@ int main(void)
 	DEFINE(PACATOC, offsetof(struct paca_struct, kernel_toc));
 	DEFINE(PACAKBASE, offsetof(struct paca_struct, kernelbase));
 	DEFINE(PACAKMSR, offsetof(struct paca_struct, kernel_msr));
+#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
+	DEFINE(PACAONLINE, offsetof(struct paca_struct, online));
+#endif
 	DEFINE(PACASOFTIRQEN, offsetof(struct paca_struct, soft_enabled));
 	DEFINE(PACAIRQHAPPENED, offsetof(struct paca_struct, irq_happened));
 	DEFINE(PACACONTEXTID, offsetof(struct paca_struct, context.id));
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index a0738af..4c3843e 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -736,6 +736,9 @@ void start_secondary(void *unused)
 
 	cpu_startup_entry(CPUHP_ONLINE);
 
+#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
+	get_paca()->online = true;
+#endif 
 	BUG();
 }
 
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index f0c4db7..d5594b0 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -322,6 +322,13 @@ kvm_no_guest:
 	li	r0, KVM_HWTHREAD_IN_NAP
 	stb	r0, HSTATE_HWTHREAD_STATE(r13)
 kvm_do_nap:
+#ifdef PPCKVM_ENABLE_SECONDARY
+	/* check the cpu is needed by host or not */
+	ld      r2, PACAONLINE(r13)
+	ld      r3, 0
+	cmp     r2, r3
+	bne     kvm_secondary_exit_trampoline
+#endif
 	/* Clear the runlatch bit before napping */
 	mfspr	r2, SPRN_CTRLF
 	clrrdi	r2, r2, 1
@@ -340,6 +347,11 @@ kvm_do_nap:
 	nap
 	b	.
 
+#ifdef PPCKVM_ENABLE_SECONDARY
+kvm_secondary_exit_trampoline:
+	b	.
+#endif
+
 /******************************************************************************
  *                                                                            *
  *                               Entry code                                   *
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC 07/11] powerpc: kvm: the stopper func to cease secondary hwthread
  2014-10-16 19:29 [RFC 00/11]: powerKVM, release the compute power of secondary hwthread on host kernelfans
                   ` (5 preceding siblings ...)
  2014-10-16 19:29 ` [RFC 06/11] powerpc: kvm: introduce online in paca to indicate whether cpu is needed by host kernelfans
@ 2014-10-16 19:29 ` kernelfans
  2014-10-22  7:12   ` Preeti U Murthy
  2014-10-27  6:07   ` Preeti U Murthy
  2014-10-16 19:29 ` [RFC 08/11] powerpc: kvm: add a flag in vcore to sync primary with secondry hwthread kernelfans
                   ` (4 subsequent siblings)
  11 siblings, 2 replies; 30+ messages in thread
From: kernelfans @ 2014-10-16 19:29 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf

To enter guest, primary hwtherad schedules the stopper func on
secondary threads and force them into NAP mode.
When exit to host,secondary threads hardcode to restore the stack,
then switch back to the stopper func, i.e host.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_hv.c            | 15 +++++++++++++++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 34 +++++++++++++++++++++++++++++++++
 2 files changed, 49 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ba258c8..4348abd 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1486,6 +1486,21 @@ static void kvmppc_remove_runnable(struct kvmppc_vcore *vc,
 	list_del(&vcpu->arch.run_list);
 }
 
+#ifdef KVMPPC_ENABLE_SECONDARY
+
+extern void kvmppc_secondary_stopper_enter();
+
+static int kvmppc_secondary_stopper(void *data)
+{
+	int cpu =smp_processor_id();
+	struct paca_struct *lpaca = get_paca();
+	BUG_ON(!(cpu%thread_per_core));
+
+	kvmppc_secondary_stopper_enter();
+}
+
+#endif
+
 static int kvmppc_grab_hwthread(int cpu)
 {
 	struct paca_struct *tpaca;
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index d5594b0..254038b 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -349,7 +349,41 @@ kvm_do_nap:
 
 #ifdef PPCKVM_ENABLE_SECONDARY
 kvm_secondary_exit_trampoline:
+
+	/* all register is free to use, later kvmppc_secondary_stopper_exit set up them*/
+	//loop-wait for the primary to signal that host env is ready
+
+	LOAD_REG_ADDR(r5, kvmppc_secondary_stopper_exit)
+	/* fixme, load msr from lpaca stack */
+	li	r6, MSR_IR | MSR_DR
+	mtsrr0	r5
+	mtsrr1	r6
+	RFI
+
+_GLOBAL_TOC(kvmppc_secondary_stopper_enter)
+	mflr	r0
+	std	r0, PPC_LR_STKOFF(r1)
+	stdu	r1, -112(r1)
+
+	/* fixme: store other register such as msr */
+
+	/* prevent us to enter kernel */
+	li	r0, 1
+	stb	r0, HSTATE_HWTHREAD_REQ(r13)
+	/* tell the primary that we are ready */
+        li      r0,KVM_HWTHREAD_IN_KERNEL
+        stb     r0,HSTATE_HWTHREAD_STATE(r13)
+	nap
 	b	.
+
+/* enter with vmode */
+kvmppc_secondary_stopper_exit:
+	/* fixme, restore the stack which we store on lpaca */
+
+	ld	r0, 112+PPC_LR_STKOFF(r1)
+	addi	r1, r1, 112
+	mtlr	r0
+	blr
 #endif
 
 /******************************************************************************
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC 08/11] powerpc: kvm: add a flag in vcore to sync primary with secondry hwthread
  2014-10-16 19:29 [RFC 00/11]: powerKVM, release the compute power of secondary hwthread on host kernelfans
                   ` (6 preceding siblings ...)
  2014-10-16 19:29 ` [RFC 07/11] powerpc: kvm: the stopper func to cease secondary hwthread kernelfans
@ 2014-10-16 19:29 ` kernelfans
  2014-10-27  6:28   ` Preeti U Murthy
  2014-10-16 19:29 ` [RFC 09/11] powerpc: kvm: handle time base on secondary hwthread kernelfans
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: kernelfans @ 2014-10-16 19:29 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf

The secondary thread can only jump back to host until primary has set
up the env. Add host_ready field in kvm_vcore to sync this action.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/kvm_host.h     |  3 +++
 arch/powerpc/kernel/asm-offsets.c       |  3 +++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 11 ++++++++++-
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 9a3355e..1310e03 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -305,6 +305,9 @@ struct kvmppc_vcore {
 	u32 arch_compat;
 	ulong pcr;
 	ulong dpdes;		/* doorbell state (POWER8) */
+#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
+	u8 host_ready;
+#endif
 	void *mpp_buffer; /* Micro Partition Prefetch buffer */
 	bool mpp_buffer_is_valid;
 };
diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
index 0faa8fe..9c04ac2 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -562,6 +562,9 @@ int main(void)
 	DEFINE(VCORE_LPCR, offsetof(struct kvmppc_vcore, lpcr));
 	DEFINE(VCORE_PCR, offsetof(struct kvmppc_vcore, pcr));
 	DEFINE(VCORE_DPDES, offsetof(struct kvmppc_vcore, dpdes));
+#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
+	DEFINE(VCORE_HOST_READY, offsetof(struct kvmppc_vcore, host_ready));
+#endif
 	DEFINE(VCPU_SLB_E, offsetof(struct kvmppc_slb, orige));
 	DEFINE(VCPU_SLB_V, offsetof(struct kvmppc_slb, origv));
 	DEFINE(VCPU_SLB_SIZE, sizeof(struct kvmppc_slb));
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 254038b..89ea16c 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -351,7 +351,11 @@ kvm_do_nap:
 kvm_secondary_exit_trampoline:
 
 	/* all register is free to use, later kvmppc_secondary_stopper_exit set up them*/
-	//loop-wait for the primary to signal that host env is ready
+	/* wait until the primary to set up host env */
+	ld	r5, HSTATE_KVM_VCORE(r13)
+	ld	r0, VCORE_HOST_READY(r5)
+	cmp	r0,  //primary is ready?
+	bne	kvm_secondary_exit_trampoline
 
 	LOAD_REG_ADDR(r5, kvmppc_secondary_stopper_exit)
 	/* fixme, load msr from lpaca stack */
@@ -1821,6 +1825,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	li	r0, KVM_GUEST_MODE_NONE
 	stb	r0, HSTATE_IN_GUEST(r13)
 
+#ifdef PPCKVM_ENABLE_SECONDARY
+	/* signal the secondary that host env is ready */
+	li	r0, 1
+	stb	r0, VCORE_HOST_READY(r5)
+#endif
 	ld	r0, 112+PPC_LR_STKOFF(r1)
 	addi	r1, r1, 112
 	mtlr	r0
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC 09/11] powerpc: kvm: handle time base on secondary hwthread
  2014-10-16 19:29 [RFC 00/11]: powerKVM, release the compute power of secondary hwthread on host kernelfans
                   ` (7 preceding siblings ...)
  2014-10-16 19:29 ` [RFC 08/11] powerpc: kvm: add a flag in vcore to sync primary with secondry hwthread kernelfans
@ 2014-10-16 19:29 ` kernelfans
  2014-10-27  6:40   ` Preeti U Murthy
  2014-10-16 19:29 ` [RFC 10/11] powerpc: kvm: on_primary_thread() force the secondary threads into NAP mode kernelfans
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 30+ messages in thread
From: kernelfans @ 2014-10-16 19:29 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf

(This is a place holder patch.)
We need to store the time base for host on secondary hwthread.
Later when switching back, we need to reprogram it with elapse
time.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 89ea16c..a817ba6 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -371,6 +371,8 @@ _GLOBAL_TOC(kvmppc_secondary_stopper_enter)
 
 	/* fixme: store other register such as msr */
 
+	/* fixme: store the tb, and set it as MAX, so we cease the tick on secondary */
+
 	/* prevent us to enter kernel */
 	li	r0, 1
 	stb	r0, HSTATE_HWTHREAD_REQ(r13)
@@ -382,6 +384,10 @@ _GLOBAL_TOC(kvmppc_secondary_stopper_enter)
 
 /* enter with vmode */
 kvmppc_secondary_stopper_exit:
+	/* fixme: restore the tb, with the orig val plus time elapse
+         * so we can fire the hrtimer as soon as possible
+         */
+
 	/* fixme, restore the stack which we store on lpaca */
 
 	ld	r0, 112+PPC_LR_STKOFF(r1)
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC 10/11] powerpc: kvm: on_primary_thread() force the secondary threads into NAP mode
  2014-10-16 19:29 [RFC 00/11]: powerKVM, release the compute power of secondary hwthread on host kernelfans
                   ` (8 preceding siblings ...)
  2014-10-16 19:29 ` [RFC 09/11] powerpc: kvm: handle time base on secondary hwthread kernelfans
@ 2014-10-16 19:29 ` kernelfans
  2014-10-16 19:30 ` [RFC 11/11] powerpc: kvm: Kconfig add an option for enabling secondary hwthread kernelfans
  2014-11-18 17:54 ` [RFC 00/11]: powerKVM, release the compute power of secondary hwthread on host Alexander Graf
  11 siblings, 0 replies; 30+ messages in thread
From: kernelfans @ 2014-10-16 19:29 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf

The primary hwthread ceases the scheduler of secondary hwthread by
bringing them into NAP. Then, the secondary is ready for guest.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/book3s_hv.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 4348abd..7896c31 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -1593,15 +1593,22 @@ static int on_primary_thread(void)
 {
 	int cpu = smp_processor_id();
 	int thr;
+	struct cpumask msk;
 
 	/* Are we on a primary subcore? */
 	if (cpu_thread_in_subcore(cpu))
 		return 0;
 
 	thr = 0;
+#ifdef KVMPPC_ENABLE_SECONDARY
+	while (++thr < threads_per_subcore)
+		cpumask_set_cpu(thr, &msk);
+	stop_cpus_async(&msk, kvmppc_secondary_stopper, NULL);
+#else
 	while (++thr < threads_per_subcore)
 		if (cpu_online(cpu + thr))
 			return 0;
+#endif
 
 	/* Grab all hw threads so they can't go into the kernel */
 	for (thr = 1; thr < threads_per_subcore; ++thr) {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC 11/11] powerpc: kvm: Kconfig add an option for enabling secondary hwthread
  2014-10-16 19:29 [RFC 00/11]: powerKVM, release the compute power of secondary hwthread on host kernelfans
                   ` (9 preceding siblings ...)
  2014-10-16 19:29 ` [RFC 10/11] powerpc: kvm: on_primary_thread() force the secondary threads into NAP mode kernelfans
@ 2014-10-16 19:30 ` kernelfans
  2014-10-27  6:44   ` Preeti U Murthy
  2014-11-18 17:54 ` [RFC 00/11]: powerKVM, release the compute power of secondary hwthread on host Alexander Graf
  11 siblings, 1 reply; 30+ messages in thread
From: kernelfans @ 2014-10-16 19:30 UTC (permalink / raw)
  To: linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
---
 arch/powerpc/kvm/Kconfig | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 602eb51..de38566 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -93,6 +93,10 @@ config KVM_BOOK3S_64_HV
 
 	  If unsure, say N.
 
+config KVMPPC_ENABLE_SECONDARY
+	tristate "KVM support for running on secondary hwthread in host"
+	depends on KVM_BOOK3S_64_HV
+
 config KVM_BOOK3S_64_PR
 	tristate "KVM support without using hypervisor mode in host"
 	depends on KVM_BOOK3S_64
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC 07/11] powerpc: kvm: the stopper func to cease secondary hwthread
  2014-10-16 19:29 ` [RFC 07/11] powerpc: kvm: the stopper func to cease secondary hwthread kernelfans
@ 2014-10-22  7:12   ` Preeti U Murthy
  2014-10-27  6:07   ` Preeti U Murthy
  1 sibling, 0 replies; 30+ messages in thread
From: Preeti U Murthy @ 2014-10-22  7:12 UTC (permalink / raw)
  To: kernelfans, linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf

Hi Liu,

On 10/17/2014 12:59 AM, kernelfans@gmail.com wrote:
> To enter guest, primary hwtherad schedules the stopper func on
> secondary threads and force them into NAP mode.
> When exit to host,secondary threads hardcode to restore the stack,
> then switch back to the stopper func, i.e host.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kvm/book3s_hv.c            | 15 +++++++++++++++
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 34 +++++++++++++++++++++++++++++++++
>  2 files changed, 49 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index ba258c8..4348abd 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1486,6 +1486,21 @@ static void kvmppc_remove_runnable(struct kvmppc_vcore *vc,
>  	list_del(&vcpu->arch.run_list);
>  }
>  
> +#ifdef KVMPPC_ENABLE_SECONDARY
> +
> +extern void kvmppc_secondary_stopper_enter();
> +
> +static int kvmppc_secondary_stopper(void *data)
> +{
> +	int cpu =smp_processor_id();
> +	struct paca_struct *lpaca = get_paca();
> +	BUG_ON(!(cpu%thread_per_core));
> +
> +	kvmppc_secondary_stopper_enter();
> +}
> +
> +#endif
> +
>  static int kvmppc_grab_hwthread(int cpu)
>  {
>  	struct paca_struct *tpaca;
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index d5594b0..254038b 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -349,7 +349,41 @@ kvm_do_nap:
>  
>  #ifdef PPCKVM_ENABLE_SECONDARY
>  kvm_secondary_exit_trampoline:
> +
> +	/* all register is free to use, later kvmppc_secondary_stopper_exit set up them*/
> +	//loop-wait for the primary to signal that host env is ready
> +
> +	LOAD_REG_ADDR(r5, kvmppc_secondary_stopper_exit)
> +	/* fixme, load msr from lpaca stack */
> +	li	r6, MSR_IR | MSR_DR
> +	mtsrr0	r5
> +	mtsrr1	r6
> +	RFI
> +
> +_GLOBAL_TOC(kvmppc_secondary_stopper_enter)
> +	mflr	r0
> +	std	r0, PPC_LR_STKOFF(r1)
> +	stdu	r1, -112(r1)
> +
> +	/* fixme: store other register such as msr */
> +
> +	/* prevent us to enter kernel */
> +	li	r0, 1
> +	stb	r0, HSTATE_HWTHREAD_REQ(r13)
> +	/* tell the primary that we are ready */
> +        li      r0,KVM_HWTHREAD_IN_KERNEL
> +        stb     r0,HSTATE_HWTHREAD_STATE(r13)
> +	nap
>  	b	.

This does not look right. Let me explain.

We can get hypervisor decrementer interrupts on the secondary threads
since they are online. And we would ideally want to handle them. They
could be scheduler tick interrupts or some timer queued on that thread
could have fired.

If it is a scheduler tick, we would want to exit to host to schedule the
next task if the timeslice of the vcpu running on the secondary thread
is over.

To the best of my understanding, this is what I can tell happens when we
get a hypervisor decrementer interrupt in the guest. We call all threads
in the core to exit to the kernel. The secondaries now return to the
code in idle_power7.S where the wakeup from idle path takes effect. Then
we do an rfid which lands after the nap code above. And we spin.

>From your above code you seem to be indicating that you do not want the
secondary threads to exit to the kernel once they are made to run guest
for the first time. Is this right? Or am I missing something ?

Moreover the changelog in [patch 10/11] says that you cease the
scheduler. This pushes me to think that you do not expect the scheduler
to run on the secondary threads ever again after you have launched the
guest. This does not look right to me.

Regards
Preeti U Murthy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 03/11] powerpc: kvm: add interface to control kvm function on a core
  2014-10-16 19:29 ` [RFC 03/11] powerpc: kvm: add interface to control kvm function on a core kernelfans
@ 2014-10-27  4:04   ` Preeti U Murthy
  2014-11-18  5:17     ` Liu ping fan
  2014-11-12 13:01   ` Srikar Dronamraju
  1 sibling, 1 reply; 30+ messages in thread
From: Preeti U Murthy @ 2014-10-27  4:04 UTC (permalink / raw)
  To: kernelfans, linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf

Hi Liu,

On 10/17/2014 12:59 AM, kernelfans@gmail.com wrote:
> When kvm is enabled on a core, we migrate all external irq to primary
> thread. Since currently, the kvmirq logic is handled by the primary
> hwthread.
> 
> Todo: this patch lacks re-enable of irqbalance when kvm is disable on
> the core

Why is a sysfs file introduced to trigger irq migration? Why is it not
done during kvm module insert ? And similarly spread interrupts when the
module is removed? Isn't this a saner way ?
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/sysfs.c            | 39 ++++++++++++++++++++++++++++++++++
>  arch/powerpc/sysdev/xics/xics-common.c | 12 +++++++++++
>  2 files changed, 51 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index 67fd2fd..a2595dd 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -552,6 +552,45 @@ static void sysfs_create_dscr_default(void)
>  	if (cpu_has_feature(CPU_FTR_DSCR))
>  		err = device_create_file(cpu_subsys.dev_root, &dev_attr_dscr_default);
>  }
> +
> +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
> +#define NR_CORES	(CONFIG_NR_CPUS/threads_per_core)
> +static DECLARE_BITMAP(kvm_on_core, NR_CORES) __read_mostly
> +
> +static ssize_t show_kvm_enable(struct device *dev,
> +		struct device_attribute *attr, char *buf)
> +{
> +}
> +
> +static ssize_t __used store_kvm_enable(struct device *dev,
> +		struct device_attribute *attr, const char *buf,
> +		size_t count)
> +{
> +	struct cpumask stop_cpus;
> +	unsigned long core, thr;
> +
> +	sscanf(buf, "%lx", &core);
> +	if (core > NR_CORES)
> +		return -1;
> +	if (!test_bit(core, &kvm_on_core))
> +		for (thr = 1; thr< threads_per_core; thr++)
> +			if (cpu_online(thr * threads_per_core + thr))
> +				cpumask_set_cpu(thr * threads_per_core + thr, &stop_cpus);

What is the above logic trying to do? Did you mean
cpu_online(threads_per_core * core + thr) ?

> +
> +	stop_machine(xics_migrate_irqs_away_secondary, NULL, &stop_cpus);
> +	set_bit(core, &kvm_on_core);
> +	return count;
> +}
> +
> +static DEVICE_ATTR(kvm_enable, 0600,
> +	show_kvm_enable, store_kvm_enable);
> +
> +static void sysfs_create_kvm_enable(void)
> +{
> +	device_create_file(cpu_subsys.dev_root, &dev_attr_kvm_enable);
> +}
> +#endif
> +
>  #endif /* CONFIG_PPC64 */
>  
>  #ifdef HAS_PPC_PMC_PA6T
> diff --git a/arch/powerpc/sysdev/xics/xics-common.c b/arch/powerpc/sysdev/xics/xics-common.c
> index fe0cca4..68b33d8 100644
> --- a/arch/powerpc/sysdev/xics/xics-common.c
> +++ b/arch/powerpc/sysdev/xics/xics-common.c
> @@ -258,6 +258,18 @@ unlock:
>  		raw_spin_unlock_irqrestore(&desc->lock, flags);
>  	}
>  }
> +
> +int xics_migrate_irqs_away_secondary(void *data)
> +{
> +	int cpu = smp_processor_id();
> +	if(cpu%thread_per_core != 0) {
> +		WARN(condition, format...);
> +		return 0;
> +	}
> +	/* In fact, if we can migrate the primary, it will be more fine */
> +	xics_migrate_irqs_away();

Isn't the aim of the patch to migrate irqs away from the secondary onto
the primary? But from above it looks like we are returning when we find
out that we are secondary threads, isn't it?

> +	return 0;
> +}
>  #endif /* CONFIG_HOTPLUG_CPU */

Note that xics_migrate_irqs_away() is defined under CONFIG_CPU_HOTPLUG.
But we will need this option on PowerKVM even when hotplug is not
configured in.

Regards
Preeti U Murthy
>  #ifdef CONFIG_SMP
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 04/11] powerpc: kvm: introduce a kthread on primary thread to anti tickless
  2014-10-16 19:29 ` [RFC 04/11] powerpc: kvm: introduce a kthread on primary thread to anti tickless kernelfans
@ 2014-10-27  4:45   ` Preeti U Murthy
  2014-11-18  5:24     ` Liu ping fan
  0 siblings, 1 reply; 30+ messages in thread
From: Preeti U Murthy @ 2014-10-27  4:45 UTC (permalink / raw)
  To: kernelfans, linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf

On 10/17/2014 12:59 AM, kernelfans@gmail.com wrote:
> (This patch is a place holder.)
> 
> If there is only one vcpu thread is ready(the other vcpu thread can
> wait for it to execute), the primary thread can enter tickless mode,

We do not configure NOHZ_FULL to y by default. Hence no thread would
enter tickless mode.

> which causes the primary keeps running, so the secondary has no
> opportunity to exit to host, even they have other tsk on them.

The secondary threads can still get scheduling ticks. The decrementer of
the secondary threads is still active. So as long as secondary threads
are busy, scheduling ticks will fire and try to schedule a new task on them.

Regards
Preeti U Murthy
> 
> Introduce a kthread (anti_tickless) on primary, so when there is only
> one vcpu thread on primary, the secondary can resort to anti_tickless
> to keep the primary out of tickless mode.
> (I thought that anti_tickless thread can goto NAP, so we can let the
> secondary run).
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/sysfs.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index a2595dd..f0b110e 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -575,9 +575,11 @@ static ssize_t __used store_kvm_enable(struct device *dev,
>  	if (!test_bit(core, &kvm_on_core))
>  		for (thr = 1; thr< threads_per_core; thr++)
>  			if (cpu_online(thr * threads_per_core + thr))
> -				cpumask_set_cpu(thr * threads_per_core + thr, &stop_cpus);
> +				cpumask_set_cpu(core * threads_per_core + thr, &stop_cpus);
>  
>  	stop_machine(xics_migrate_irqs_away_secondary, NULL, &stop_cpus);
> +	/* fixme, create a kthread on primary hwthread to handle tickless mode */
> +	//kthread_create_on_cpu(prevent_tickless, NULL, core * threads_per_core, "ppckvm_prevent_tickless");
>  	set_bit(core, &kvm_on_core);
>  	return count;
>  }
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 06/11] powerpc: kvm: introduce online in paca to indicate whether cpu is needed by host
  2014-10-16 19:29 ` [RFC 06/11] powerpc: kvm: introduce online in paca to indicate whether cpu is needed by host kernelfans
@ 2014-10-27  5:32   ` Preeti U Murthy
  2014-11-18  5:29     ` Liu ping fan
  0 siblings, 1 reply; 30+ messages in thread
From: Preeti U Murthy @ 2014-10-27  5:32 UTC (permalink / raw)
  To: kernelfans, linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf

Hi Liu,

On 10/17/2014 12:59 AM, kernelfans@gmail.com wrote:
> Nowadays, powerKVM runs with secondary hwthread offline. Although
> we can make all secondary hwthread online later, we still preserve
> this behavior for dedicated KVM env. Achieve this by setting
> paca->online as false.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/paca.h         |  3 +++
>  arch/powerpc/kernel/asm-offsets.c       |  3 +++
>  arch/powerpc/kernel/smp.c               |  3 +++
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 12 ++++++++++++
>  4 files changed, 21 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
> index a5139ea..67c2500 100644
> --- a/arch/powerpc/include/asm/paca.h
> +++ b/arch/powerpc/include/asm/paca.h
> @@ -84,6 +84,9 @@ struct paca_struct {
>  	u8 cpu_start;			/* At startup, processor spins until */
>  					/* this becomes non-zero. */
>  	u8 kexec_state;		/* set when kexec down has irqs off */
> +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
> +	u8 online;
> +#endif
>  #ifdef CONFIG_PPC_STD_MMU_64
>  	struct slb_shadow *slb_shadow_ptr;
>  	struct dtl_entry *dispatch_log;
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index 9d7dede..0faa8fe 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -182,6 +182,9 @@ int main(void)
>  	DEFINE(PACATOC, offsetof(struct paca_struct, kernel_toc));
>  	DEFINE(PACAKBASE, offsetof(struct paca_struct, kernelbase));
>  	DEFINE(PACAKMSR, offsetof(struct paca_struct, kernel_msr));
> +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
> +	DEFINE(PACAONLINE, offsetof(struct paca_struct, online));
> +#endif
>  	DEFINE(PACASOFTIRQEN, offsetof(struct paca_struct, soft_enabled));
>  	DEFINE(PACAIRQHAPPENED, offsetof(struct paca_struct, irq_happened));
>  	DEFINE(PACACONTEXTID, offsetof(struct paca_struct, context.id));
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index a0738af..4c3843e 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -736,6 +736,9 @@ void start_secondary(void *unused)
>  
>  	cpu_startup_entry(CPUHP_ONLINE);
>  
> +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
> +	get_paca()->online = true;
> +#endif 
>  	BUG();
>  }
>  
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index f0c4db7..d5594b0 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -322,6 +322,13 @@ kvm_no_guest:
>  	li	r0, KVM_HWTHREAD_IN_NAP
>  	stb	r0, HSTATE_HWTHREAD_STATE(r13)
>  kvm_do_nap:
> +#ifdef PPCKVM_ENABLE_SECONDARY
> +	/* check the cpu is needed by host or not */
> +	ld      r2, PACAONLINE(r13)
> +	ld      r3, 0
> +	cmp     r2, r3
> +	bne     kvm_secondary_exit_trampoline
> +#endif
>  	/* Clear the runlatch bit before napping */
>  	mfspr	r2, SPRN_CTRLF
>  	clrrdi	r2, r2, 1
> @@ -340,6 +347,11 @@ kvm_do_nap:
>  	nap
>  	b	.
>  
> +#ifdef PPCKVM_ENABLE_SECONDARY
> +kvm_secondary_exit_trampoline:
> +	b	.

Uh? When we have no vcpu to run, we loop here instead of doing a nap?
What are we achieving?

If I understand the intention of the patch well, we are looking to
provide a knob whereby the host can indicate if it needs the secondaries
at all.

Today the host does boot with all threads online. There are some init
scripts which take the secondaries down. So today the host does not have
a say in preventing this, compile time or runtime. So lets see how we
can switch between the two behaviors if we don't have the init script,
which looks like a saner thing to do.

We should set the paca->online flag to false by default. If
KVM_PPC_ENABLE_SECONDARY is configured, we need to set this flag to
true. So at compile time, we resolve the flag.

While booting, we look at the flag and decide whether to get the
secondaries online. So we get the current behavior if we have not
configured KVM_PPC_ENABLE_SECONDARY. Will this achieve the purpose of
this patch?

Regards
Preeti U Murthy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 07/11] powerpc: kvm: the stopper func to cease secondary hwthread
  2014-10-16 19:29 ` [RFC 07/11] powerpc: kvm: the stopper func to cease secondary hwthread kernelfans
  2014-10-22  7:12   ` Preeti U Murthy
@ 2014-10-27  6:07   ` Preeti U Murthy
  1 sibling, 0 replies; 30+ messages in thread
From: Preeti U Murthy @ 2014-10-27  6:07 UTC (permalink / raw)
  To: kernelfans, linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf

On 10/17/2014 12:59 AM, kernelfans@gmail.com wrote:
> To enter guest, primary hwtherad schedules the stopper func on
> secondary threads and force them into NAP mode.
> When exit to host,secondary threads hardcode to restore the stack,
> then switch back to the stopper func, i.e host.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kvm/book3s_hv.c            | 15 +++++++++++++++
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 34 +++++++++++++++++++++++++++++++++
>  2 files changed, 49 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index ba258c8..4348abd 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1486,6 +1486,21 @@ static void kvmppc_remove_runnable(struct kvmppc_vcore *vc,
>  	list_del(&vcpu->arch.run_list);
>  }
>  
> +#ifdef KVMPPC_ENABLE_SECONDARY
> +
> +extern void kvmppc_secondary_stopper_enter();
> +
> +static int kvmppc_secondary_stopper(void *data)
> +{
> +	int cpu =smp_processor_id();
> +	struct paca_struct *lpaca = get_paca();
> +	BUG_ON(!(cpu%thread_per_core));
> +
> +	kvmppc_secondary_stopper_enter();
> +}
> +
> +#endif
> +
>  static int kvmppc_grab_hwthread(int cpu)
>  {
>  	struct paca_struct *tpaca;
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index d5594b0..254038b 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -349,7 +349,41 @@ kvm_do_nap:
>  
>  #ifdef PPCKVM_ENABLE_SECONDARY
>  kvm_secondary_exit_trampoline:
> +
> +	/* all register is free to use, later kvmppc_secondary_stopper_exit set up them*/
> +	//loop-wait for the primary to signal that host env is ready
> +
> +	LOAD_REG_ADDR(r5, kvmppc_secondary_stopper_exit)
> +	/* fixme, load msr from lpaca stack */
> +	li	r6, MSR_IR | MSR_DR
> +	mtsrr0	r5
> +	mtsrr1	r6
> +	RFI
> +
> +_GLOBAL_TOC(kvmppc_secondary_stopper_enter)
> +	mflr	r0
> +	std	r0, PPC_LR_STKOFF(r1)
> +	stdu	r1, -112(r1)
> +
> +	/* fixme: store other register such as msr */
> +
> +	/* prevent us to enter kernel */
> +	li	r0, 1
> +	stb	r0, HSTATE_HWTHREAD_REQ(r13)
> +	/* tell the primary that we are ready */
> +        li      r0,KVM_HWTHREAD_IN_KERNEL
> +        stb     r0,HSTATE_HWTHREAD_STATE(r13)
> +	nap
>  	b	.

I see a fundamental issues with this design. Note that the secondaries
are still capable of getting IPIs, timer interrupts and scheduler
interrupts in nap. Scheduling ticks do not get stopped just because we
nap. So the primary thread cannot assume that the secondaries will wait
in nap until it has switched to guest context. There will definitely be
race conditions here between primary switching context and the secondary
handling some interrupt in virtual mode.

So your answer to above could be that we are preventing the secondaries
from entering the kernel by setting the HSTATE_HWTHREAD_REQ above. But
we cannot do this because the secondary thread is as alive and kicking
as the primary thread on the host. There can be hrtimers queued, there
may be an IPI from a thread in another core. We cannot ignore handling
them while we wait for the primary thread to switch context.

There is also a bug as far as I can see with having a b . after nap. I
mentioned that in one of my earlier replies to this patch.


Regards
Preeti U Murthy
> +
> +/* enter with vmode */
> +kvmppc_secondary_stopper_exit:
> +	/* fixme, restore the stack which we store on lpaca */
> +
> +	ld	r0, 112+PPC_LR_STKOFF(r1)
> +	addi	r1, r1, 112
> +	mtlr	r0
> +	blr
>  #endif
>  
>  /******************************************************************************
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 08/11] powerpc: kvm: add a flag in vcore to sync primary with secondry hwthread
  2014-10-16 19:29 ` [RFC 08/11] powerpc: kvm: add a flag in vcore to sync primary with secondry hwthread kernelfans
@ 2014-10-27  6:28   ` Preeti U Murthy
  0 siblings, 0 replies; 30+ messages in thread
From: Preeti U Murthy @ 2014-10-27  6:28 UTC (permalink / raw)
  To: kernelfans, linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf

On 10/17/2014 12:59 AM, kernelfans@gmail.com wrote:
> The secondary thread can only jump back to host until primary has set
> up the env. Add host_ready field in kvm_vcore to sync this action.

Do we need to do this? We already synchronize among the sibling threads
when there is a need to enter the host. Today too, the secondaries may
require host kernel's intervention and there is already code in which
under such circumstances, the secondaries set the HDEC of the sibling
threads and wait for the primary to switch context. I see the code under
secondary_too_late. Besides this, do we need anything additional to
synchronize the exit to host path? IIUC, we will need synchronization
only in the entry to guest path.

And of course there will be more guest exits now since the secondaries
are as alive in the host as the primary and will have scheduling
interrupts atleast, forcing them to exit to host. This cannot be
trivially solved unless we can isolate these threads in some way.

Regards
Preeti U Murthy
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/kvm_host.h     |  3 +++
>  arch/powerpc/kernel/asm-offsets.c       |  3 +++
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 11 ++++++++++-
>  3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 9a3355e..1310e03 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -305,6 +305,9 @@ struct kvmppc_vcore {
>  	u32 arch_compat;
>  	ulong pcr;
>  	ulong dpdes;		/* doorbell state (POWER8) */
> +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
> +	u8 host_ready;
> +#endif
>  	void *mpp_buffer; /* Micro Partition Prefetch buffer */
>  	bool mpp_buffer_is_valid;
>  };
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index 0faa8fe..9c04ac2 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -562,6 +562,9 @@ int main(void)
>  	DEFINE(VCORE_LPCR, offsetof(struct kvmppc_vcore, lpcr));
>  	DEFINE(VCORE_PCR, offsetof(struct kvmppc_vcore, pcr));
>  	DEFINE(VCORE_DPDES, offsetof(struct kvmppc_vcore, dpdes));
> +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
> +	DEFINE(VCORE_HOST_READY, offsetof(struct kvmppc_vcore, host_ready));
> +#endif
>  	DEFINE(VCPU_SLB_E, offsetof(struct kvmppc_slb, orige));
>  	DEFINE(VCPU_SLB_V, offsetof(struct kvmppc_slb, origv));
>  	DEFINE(VCPU_SLB_SIZE, sizeof(struct kvmppc_slb));
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index 254038b..89ea16c 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -351,7 +351,11 @@ kvm_do_nap:
>  kvm_secondary_exit_trampoline:
>  
>  	/* all register is free to use, later kvmppc_secondary_stopper_exit set up them*/
> -	//loop-wait for the primary to signal that host env is ready
> +	/* wait until the primary to set up host env */
> +	ld	r5, HSTATE_KVM_VCORE(r13)
> +	ld	r0, VCORE_HOST_READY(r5)
> +	cmp	r0,  //primary is ready?
> +	bne	kvm_secondary_exit_trampoline
>  
>  	LOAD_REG_ADDR(r5, kvmppc_secondary_stopper_exit)
>  	/* fixme, load msr from lpaca stack */
> @@ -1821,6 +1825,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
>  	li	r0, KVM_GUEST_MODE_NONE
>  	stb	r0, HSTATE_IN_GUEST(r13)
>  
> +#ifdef PPCKVM_ENABLE_SECONDARY
> +	/* signal the secondary that host env is ready */
> +	li	r0, 1
> +	stb	r0, VCORE_HOST_READY(r5)
> +#endif
>  	ld	r0, 112+PPC_LR_STKOFF(r1)
>  	addi	r1, r1, 112
>  	mtlr	r0
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 09/11] powerpc: kvm: handle time base on secondary hwthread
  2014-10-16 19:29 ` [RFC 09/11] powerpc: kvm: handle time base on secondary hwthread kernelfans
@ 2014-10-27  6:40   ` Preeti U Murthy
  2014-11-18  5:43     ` Liu ping fan
  0 siblings, 1 reply; 30+ messages in thread
From: Preeti U Murthy @ 2014-10-27  6:40 UTC (permalink / raw)
  To: kernelfans, linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf

On 10/17/2014 12:59 AM, kernelfans@gmail.com wrote:
> (This is a place holder patch.)
> We need to store the time base for host on secondary hwthread.
> Later when switching back, we need to reprogram it with elapse
> time.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index 89ea16c..a817ba6 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -371,6 +371,8 @@ _GLOBAL_TOC(kvmppc_secondary_stopper_enter)
>  
>  	/* fixme: store other register such as msr */
>  
> +	/* fixme: store the tb, and set it as MAX, so we cease the tick on secondary */

This can lead to serious consequences. First of all, even if we set the
decrementer(not tb) to MAX, it is bound to fire after 4s. That is the
maximum time till which you can keep off the decrementer from firing.

In the hotplug path, the offline cpus nap and their decrementers are
programmed to fire at MAX too. But the difference is that we clear the
LPCR_PECE1 wakeup bit which prevents cpus from waking up on a
decrementer interrupt.

We cannot afford to do this here though because there are other tasks on
the secondary threads' runqueue. They need to be scheduled in.
There are also timers besides the tick_sched one, which can be queued on
these secondary threads. These patches have not taken care to migrate
timers or tasks before entering guest as far as I observed. Hence we
cannot just turn off time base like this and expect to handle the above
mentioned events the next time the primary thread decides to exit to the
host.

Regards
Preeti U Murthy
> +
>  	/* prevent us to enter kernel */
>  	li	r0, 1
>  	stb	r0, HSTATE_HWTHREAD_REQ(r13)
> @@ -382,6 +384,10 @@ _GLOBAL_TOC(kvmppc_secondary_stopper_enter)
>  
>  /* enter with vmode */
>  kvmppc_secondary_stopper_exit:
> +	/* fixme: restore the tb, with the orig val plus time elapse
> +         * so we can fire the hrtimer as soon as possible
> +         */
> +
>  	/* fixme, restore the stack which we store on lpaca */
>  
>  	ld	r0, 112+PPC_LR_STKOFF(r1)
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 11/11] powerpc: kvm: Kconfig add an option for enabling secondary hwthread
  2014-10-16 19:30 ` [RFC 11/11] powerpc: kvm: Kconfig add an option for enabling secondary hwthread kernelfans
@ 2014-10-27  6:44   ` Preeti U Murthy
  2014-11-18  5:47     ` Liu ping fan
  0 siblings, 1 reply; 30+ messages in thread
From: Preeti U Murthy @ 2014-10-27  6:44 UTC (permalink / raw)
  To: kernelfans, linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras, Alexander Graf

On 10/17/2014 01:00 AM, kernelfans@gmail.com wrote:
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kvm/Kconfig | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
> index 602eb51..de38566 100644
> --- a/arch/powerpc/kvm/Kconfig
> +++ b/arch/powerpc/kvm/Kconfig
> @@ -93,6 +93,10 @@ config KVM_BOOK3S_64_HV
>  
>  	  If unsure, say N.
>  
> +config KVMPPC_ENABLE_SECONDARY
> +	tristate "KVM support for running on secondary hwthread in host"
> +	depends on KVM_BOOK3S_64_HV

This patch is required ontop of all the rest :) The top patches won't
compile without this one. Every patch in the patchset should be able to
compile successfully without the aid of the patches that come after it.

Regards
Preeti U Murthy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 01/11] sched: introduce sys_cpumask in tsk to adapt asymmetric system
  2014-10-16 19:29 ` [RFC 01/11] sched: introduce sys_cpumask in tsk to adapt asymmetric system kernelfans
@ 2014-11-12  9:22   ` Srikar Dronamraju
  2014-11-18  5:07     ` Liu ping fan
  0 siblings, 1 reply; 30+ messages in thread
From: Srikar Dronamraju @ 2014-11-12  9:22 UTC (permalink / raw)
  To: kernelfans; +Cc: Paul Mackerras, linuxppc-dev, Alexander Graf, kvm-ppc

* kernelfans@gmail.com <kernelfans@gmail.com> [2014-10-16 15:29:50]:

> Some system such as powerpc, some tsk (vcpu thread) can only run on
> the dedicated cpu. Since we adapt some asymmetric method to monitor the
> whole physical cpu. (powerKVM only allows the primary hwthread to
> set up runtime env for the secondary when entering guest).
> 
> Nowadays, powerKVM run with all the secondary hwthread offline to ensure
> the vcpu threads only run on the primary thread. But we plan to keep all
> cpus online when running powerKVM to give more power when switching back
> to host, so introduce sys_allowed cpumask to reflect the cpuset which
> the vcpu thread can run on.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  include/linux/init_task.h |  1 +
>  include/linux/sched.h     |  6 ++++++
>  kernel/sched/core.c       | 10 ++++++++--
>  3 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/init_task.h b/include/linux/init_task.h
> index 2bb4c4f3..c56f69e 100644
> --- a/include/linux/init_task.h
> +++ b/include/linux/init_task.h
> @@ -172,6 +172,7 @@ extern struct task_group root_task_group;
>  	.normal_prio	= MAX_PRIO-20,					\
>  	.policy		= SCHED_NORMAL,					\
>  	.cpus_allowed	= CPU_MASK_ALL,					\
> +	.sys_allowed = CPU_MASK_ALL,			\

Do we really need another mask, cant we just use cpus_allowed itself.

>  	.nr_cpus_allowed= NR_CPUS,					\
>  	.mm		= NULL,						\
>  	.active_mm	= &init_mm,					\
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 5c2c885..ce429f3 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1260,7 +1260,10 @@ struct task_struct {
>  
>  	unsigned int policy;
>  	int nr_cpus_allowed;
> +	/* Anded user and sys_allowed */
>  	cpumask_t cpus_allowed;
> +	/* due to the feature of asymmetric, some tsk can only run on such cpu */
> +	cpumask_t sys_allowed;
>  
>  #ifdef CONFIG_PREEMPT_RCU
>  	int rcu_read_lock_nesting;
> @@ -2030,6 +2033,9 @@ static inline void tsk_restore_flags(struct task_struct *task,
>  }
>  
>  #ifdef CONFIG_SMP
> +extern void set_cpus_sys_allowed(struct task_struct *p,
> +			const struct cpumask *new_mask);
> +
>  extern void do_set_cpus_allowed(struct task_struct *p,
>  			       const struct cpumask *new_mask);
>  
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index ec1a286..2cd1ae3 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4596,13 +4596,19 @@ void init_idle(struct task_struct *idle, int cpu)
>  }
>  
>  #ifdef CONFIG_SMP
> +void set_cpus_sys_allowed(struct task_struct *p,
> +	const struct cpumask *new_mask)
> +{
> +	cpumask_copy(&p->sys_allowed, new_mask);
> +}
> +

This function doesnt seem to be used anywhere... Not sure why it is
introduced

>  void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
>  {
>  	if (p->sched_class && p->sched_class->set_cpus_allowed)
>  		p->sched_class->set_cpus_allowed(p, new_mask);
>  
> -	cpumask_copy(&p->cpus_allowed, new_mask);
> -	p->nr_cpus_allowed = cpumask_weight(new_mask);
> +	cpumask_and(&p->cpus_allowed, &p->sys_allowed, new_mask);
> +	p->nr_cpus_allowed = cpumask_weight(&p->cpus_allowed);
>  }
>  
>  /*
> -- 
> 1.8.3.1
> 
> 

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 02/11] powerpc: kvm: ensure vcpu-thread run only on primary hwthread
  2014-10-16 19:29 ` [RFC 02/11] powerpc: kvm: ensure vcpu-thread run only on primary hwthread kernelfans
@ 2014-11-12 10:17   ` Srikar Dronamraju
  0 siblings, 0 replies; 30+ messages in thread
From: Srikar Dronamraju @ 2014-11-12 10:17 UTC (permalink / raw)
  To: kernelfans; +Cc: Paul Mackerras, linuxppc-dev, Alexander Graf, kvm-ppc

* kernelfans@gmail.com <kernelfans@gmail.com> [2014-10-16 15:29:51]:

> When vcpu thread runs at the first time, it will ensure to stick
> to the primary thread.
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/kvm_host.h |  3 +++
>  arch/powerpc/kvm/book3s_hv.c        | 17 +++++++++++++++++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 98d9dd5..9a3355e 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -666,6 +666,9 @@ struct kvm_vcpu_arch {
>  	spinlock_t tbacct_lock;
>  	u64 busy_stolen;
>  	u64 busy_preempt;
> +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
> +	bool cpu_selected;
> +#endif
>  #endif
>  };
>  
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 27cced9..ba258c8 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -1909,6 +1909,23 @@ static int kvmppc_vcpu_run_hv(struct kvm_run *run, struct kvm_vcpu *vcpu)
>  {
>  	int r;
>  	int srcu_idx;
> +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
> +	int cpu = smp_processor_id();
> +	int target_cpu;
> +	unsigned int cpu;

2 variables with same name... cpu

> +	struct task_struct *p = current;
> +
> +	if (unlikely(!vcpu->arch.cpu_selected)) {
> +		vcpu->arch.cpu_selected = true;

Nit: something like cpumask_set seems to be better than cpu_selected

> +		for (cpu = 0; cpu < NR_CPUS; cpu+=threads_per_core) {
> +			cpumask_set_cpu(cpu, &p->sys_allowed);
		Dont we need to reset the cpumask first before we set
		the cpumask here?

> +		}
> +		if (cpu%threads_per_core != 0) {

At this time, cpu should be NR_CPUS and most times it should be a
multiple of threads_per_core. Unfortunately there wont be a cpu with
cpu number NR_CPUS.

> +			target_cpu = cpu/threads_per_core*threads_per_core;

Its probably better of to have parenthesis here. 

> +			migrate_task_to(current, target_cpu);
We are probably migrating to a non-existant cpu.
Also dont you need to check if the target_cpu is part of the cpumask?

> +		}
> +	}
> +#endif
>  
>  	if (!vcpu->arch.sane) {
>  		run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
> -- 
> 1.8.3.1
> 
> 

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 03/11] powerpc: kvm: add interface to control kvm function on a core
  2014-10-16 19:29 ` [RFC 03/11] powerpc: kvm: add interface to control kvm function on a core kernelfans
  2014-10-27  4:04   ` Preeti U Murthy
@ 2014-11-12 13:01   ` Srikar Dronamraju
  1 sibling, 0 replies; 30+ messages in thread
From: Srikar Dronamraju @ 2014-11-12 13:01 UTC (permalink / raw)
  To: kernelfans; +Cc: Paul Mackerras, linuxppc-dev, Alexander Graf, kvm-ppc

* kernelfans@gmail.com <kernelfans@gmail.com> [2014-10-16 15:29:52]:

> When kvm is enabled on a core, we migrate all external irq to primary
> thread. Since currently, the kvmirq logic is handled by the primary
> hwthread.
> 
> Todo: this patch lacks re-enable of irqbalance when kvm is disable on
> the core
> 
> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> ---
>  arch/powerpc/kernel/sysfs.c            | 39 ++++++++++++++++++++++++++++++++++
>  arch/powerpc/sysdev/xics/xics-common.c | 12 +++++++++++
>  2 files changed, 51 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
> index 67fd2fd..a2595dd 100644
> --- a/arch/powerpc/kernel/sysfs.c
> +++ b/arch/powerpc/kernel/sysfs.c
> @@ -552,6 +552,45 @@ static void sysfs_create_dscr_default(void)
>  	if (cpu_has_feature(CPU_FTR_DSCR))
>  		err = device_create_file(cpu_subsys.dev_root, &dev_attr_dscr_default);
>  }
> +
> +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
> +#define NR_CORES	(CONFIG_NR_CPUS/threads_per_core)
> +static DECLARE_BITMAP(kvm_on_core, NR_CORES) __read_mostly
> +
> +static ssize_t show_kvm_enable(struct device *dev,
> +		struct device_attribute *attr, char *buf)
> +{
> +}
> +
> +static ssize_t __used store_kvm_enable(struct device *dev,
> +		struct device_attribute *attr, const char *buf,
> +		size_t count)
> +{
> +	struct cpumask stop_cpus;
> +	unsigned long core, thr;
> +
> +	sscanf(buf, "%lx", &core);
> +	if (core > NR_CORES)
> +		return -1;
> +	if (!test_bit(core, &kvm_on_core))
> +		for (thr = 1; thr< threads_per_core; thr++)
> +			if (cpu_online(thr * threads_per_core + thr))
> +				cpumask_set_cpu(thr * threads_per_core + thr, &stop_cpus);

Shouldnt this be 
			if (cpu_online(core * threads_per_core + thr))
				cpumask_set_cpu(core * threads_per_core + thr, &stop_cpus);
?


-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 01/11] sched: introduce sys_cpumask in tsk to adapt asymmetric system
  2014-11-12  9:22   ` Srikar Dronamraju
@ 2014-11-18  5:07     ` Liu ping fan
  0 siblings, 0 replies; 30+ messages in thread
From: Liu ping fan @ 2014-11-18  5:07 UTC (permalink / raw)
  To: Srikar Dronamraju; +Cc: Paul Mackerras, linuxppc-dev, Alexander Graf, kvm-ppc

On Wed, Nov 12, 2014 at 5:22 PM, Srikar Dronamraju
<srikar@linux.vnet.ibm.com> wrote:
> * kernelfans@gmail.com <kernelfans@gmail.com> [2014-10-16 15:29:50]:
>
>> Some system such as powerpc, some tsk (vcpu thread) can only run on
>> the dedicated cpu. Since we adapt some asymmetric method to monitor the
>> whole physical cpu. (powerKVM only allows the primary hwthread to
>> set up runtime env for the secondary when entering guest).
>>
>> Nowadays, powerKVM run with all the secondary hwthread offline to ensure
>> the vcpu threads only run on the primary thread. But we plan to keep all
>> cpus online when running powerKVM to give more power when switching back
>> to host, so introduce sys_allowed cpumask to reflect the cpuset which
>> the vcpu thread can run on.
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  include/linux/init_task.h |  1 +
>>  include/linux/sched.h     |  6 ++++++
>>  kernel/sched/core.c       | 10 ++++++++--
>>  3 files changed, 15 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/init_task.h b/include/linux/init_task.h
>> index 2bb4c4f3..c56f69e 100644
>> --- a/include/linux/init_task.h
>> +++ b/include/linux/init_task.h
>> @@ -172,6 +172,7 @@ extern struct task_group root_task_group;
>>       .normal_prio    = MAX_PRIO-20,                                  \
>>       .policy         = SCHED_NORMAL,                                 \
>>       .cpus_allowed   = CPU_MASK_ALL,                                 \
>> +     .sys_allowed = CPU_MASK_ALL,                    \
>
> Do we really need another mask, cant we just use cpus_allowed itself.
>
I think it is not easy to cast two request: chip inherit and user's
configuration onto one mask.

>>       .nr_cpus_allowed= NR_CPUS,                                      \
>>       .mm             = NULL,                                         \
>>       .active_mm      = &init_mm,                                     \
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 5c2c885..ce429f3 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -1260,7 +1260,10 @@ struct task_struct {
>>
>>       unsigned int policy;
>>       int nr_cpus_allowed;
>> +     /* Anded user and sys_allowed */
>>       cpumask_t cpus_allowed;
>> +     /* due to the feature of asymmetric, some tsk can only run on such cpu */
>> +     cpumask_t sys_allowed;
>>
>>  #ifdef CONFIG_PREEMPT_RCU
>>       int rcu_read_lock_nesting;
>> @@ -2030,6 +2033,9 @@ static inline void tsk_restore_flags(struct task_struct *task,
>>  }
>>
>>  #ifdef CONFIG_SMP
>> +extern void set_cpus_sys_allowed(struct task_struct *p,
>> +                     const struct cpumask *new_mask);
>> +
>>  extern void do_set_cpus_allowed(struct task_struct *p,
>>                              const struct cpumask *new_mask);
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index ec1a286..2cd1ae3 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -4596,13 +4596,19 @@ void init_idle(struct task_struct *idle, int cpu)
>>  }
>>
>>  #ifdef CONFIG_SMP
>> +void set_cpus_sys_allowed(struct task_struct *p,
>> +     const struct cpumask *new_mask)
>> +{
>> +     cpumask_copy(&p->sys_allowed, new_mask);
>> +}
>> +
>
> This function doesnt seem to be used anywhere... Not sure why it is
> introduced
>
Not layered the patches well :(  It is used later in the series.

Thx,
Fan

>>  void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
>>  {
>>       if (p->sched_class && p->sched_class->set_cpus_allowed)
>>               p->sched_class->set_cpus_allowed(p, new_mask);
>>
>> -     cpumask_copy(&p->cpus_allowed, new_mask);
>> -     p->nr_cpus_allowed = cpumask_weight(new_mask);
>> +     cpumask_and(&p->cpus_allowed, &p->sys_allowed, new_mask);
>> +     p->nr_cpus_allowed = cpumask_weight(&p->cpus_allowed);
>>  }
>>
>>  /*
>> --
>> 1.8.3.1
>>
>>
>
> --
> Thanks and Regards
> Srikar Dronamraju
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 03/11] powerpc: kvm: add interface to control kvm function on a core
  2014-10-27  4:04   ` Preeti U Murthy
@ 2014-11-18  5:17     ` Liu ping fan
  0 siblings, 0 replies; 30+ messages in thread
From: Liu ping fan @ 2014-11-18  5:17 UTC (permalink / raw)
  To: Preeti U Murthy; +Cc: Paul Mackerras, linuxppc-dev, Alexander Graf, kvm-ppc

On Mon, Oct 27, 2014 at 12:04 PM, Preeti U Murthy
<preeti@linux.vnet.ibm.com> wrote:
> Hi Liu,
>
> On 10/17/2014 12:59 AM, kernelfans@gmail.com wrote:
>> When kvm is enabled on a core, we migrate all external irq to primary
>> thread. Since currently, the kvmirq logic is handled by the primary
>> hwthread.
>>
>> Todo: this patch lacks re-enable of irqbalance when kvm is disable on
>> the core
>
> Why is a sysfs file introduced to trigger irq migration? Why is it not
> done during kvm module insert ? And similarly spread interrupts when the
> module is removed? Isn't this a saner way ?

Consider the scene: coreA and coreB, we want to enable KVM on coreA,
while keeping coreB unchanged.
In fact, I try to acheive something un-symmetric on the platform. Do
you think it is an justification?
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/kernel/sysfs.c            | 39 ++++++++++++++++++++++++++++++++++
>>  arch/powerpc/sysdev/xics/xics-common.c | 12 +++++++++++
>>  2 files changed, 51 insertions(+)
>>
>> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
>> index 67fd2fd..a2595dd 100644
>> --- a/arch/powerpc/kernel/sysfs.c
>> +++ b/arch/powerpc/kernel/sysfs.c
>> @@ -552,6 +552,45 @@ static void sysfs_create_dscr_default(void)
>>       if (cpu_has_feature(CPU_FTR_DSCR))
>>               err = device_create_file(cpu_subsys.dev_root, &dev_attr_dscr_default);
>>  }
>> +
>> +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
>> +#define NR_CORES     (CONFIG_NR_CPUS/threads_per_core)
>> +static DECLARE_BITMAP(kvm_on_core, NR_CORES) __read_mostly
>> +
>> +static ssize_t show_kvm_enable(struct device *dev,
>> +             struct device_attribute *attr, char *buf)
>> +{
>> +}
>> +
>> +static ssize_t __used store_kvm_enable(struct device *dev,
>> +             struct device_attribute *attr, const char *buf,
>> +             size_t count)
>> +{
>> +     struct cpumask stop_cpus;
>> +     unsigned long core, thr;
>> +
>> +     sscanf(buf, "%lx", &core);
>> +     if (core > NR_CORES)
>> +             return -1;
>> +     if (!test_bit(core, &kvm_on_core))
>> +             for (thr = 1; thr< threads_per_core; thr++)
>> +                     if (cpu_online(thr * threads_per_core + thr))
>> +                             cpumask_set_cpu(thr * threads_per_core + thr, &stop_cpus);
>
> What is the above logic trying to do? Did you mean
> cpu_online(threads_per_core * core + thr) ?
>
Yeah. My mistake, should be cpumask_set_cpu(core * threads_per_core +
thr, &stop_cpus)

>> +
>> +     stop_machine(xics_migrate_irqs_away_secondary, NULL, &stop_cpus);
>> +     set_bit(core, &kvm_on_core);
>> +     return count;
>> +}
>> +
>> +static DEVICE_ATTR(kvm_enable, 0600,
>> +     show_kvm_enable, store_kvm_enable);
>> +
>> +static void sysfs_create_kvm_enable(void)
>> +{
>> +     device_create_file(cpu_subsys.dev_root, &dev_attr_kvm_enable);
>> +}
>> +#endif
>> +
>>  #endif /* CONFIG_PPC64 */
>>
>>  #ifdef HAS_PPC_PMC_PA6T
>> diff --git a/arch/powerpc/sysdev/xics/xics-common.c b/arch/powerpc/sysdev/xics/xics-common.c
>> index fe0cca4..68b33d8 100644
>> --- a/arch/powerpc/sysdev/xics/xics-common.c
>> +++ b/arch/powerpc/sysdev/xics/xics-common.c
>> @@ -258,6 +258,18 @@ unlock:
>>               raw_spin_unlock_irqrestore(&desc->lock, flags);
>>       }
>>  }
>> +
>> +int xics_migrate_irqs_away_secondary(void *data)
>> +{
>> +     int cpu = smp_processor_id();
>> +     if(cpu%thread_per_core != 0) {
>> +             WARN(condition, format...);
>> +             return 0;
>> +     }
>> +     /* In fact, if we can migrate the primary, it will be more fine */
>> +     ();
>
> Isn't the aim of the patch to migrate irqs away from the secondary onto
> the primary? But from above it looks like we are returning when we find
> out that we are secondary threads, isn't it?
>
Yes, will fix in next version.

>> +     return 0;
>> +}
>>  #endif /* CONFIG_HOTPLUG_CPU */
>
> Note that xics_migrate_irqs_away() is defined under CONFIG_CPU_HOTPLUG.
> But we will need this option on PowerKVM even when hotplug is not
> configured in.
>
Yes, will fix the dependency in next version

Thx,
Fan

> Regards
> Preeti U Murthy
>>  #ifdef CONFIG_SMP
>>
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 04/11] powerpc: kvm: introduce a kthread on primary thread to anti tickless
  2014-10-27  4:45   ` Preeti U Murthy
@ 2014-11-18  5:24     ` Liu ping fan
  0 siblings, 0 replies; 30+ messages in thread
From: Liu ping fan @ 2014-11-18  5:24 UTC (permalink / raw)
  To: Preeti U Murthy; +Cc: Paul Mackerras, linuxppc-dev, Alexander Graf, kvm-ppc

On Mon, Oct 27, 2014 at 12:45 PM, Preeti U Murthy
<preeti@linux.vnet.ibm.com> wrote:
> On 10/17/2014 12:59 AM, kernelfans@gmail.com wrote:
>> (This patch is a place holder.)
>>
>> If there is only one vcpu thread is ready(the other vcpu thread can
>> wait for it to execute), the primary thread can enter tickless mode,
>
> We do not configure NOHZ_FULL to y by default. Hence no thread would
> enter tickless mode.
>
But NOHZ_FULL can be chosen by user, and we should survive from it :)

>> which causes the primary keeps running, so the secondary has no
>> opportunity to exit to host, even they have other tsk on them.
>
> The secondary threads can still get scheduling ticks. The decrementer of
> the secondary threads is still active. So as long as secondary threads
> are busy, scheduling ticks will fire and try to schedule a new task on them.
>
No. As my original thought, after enable KVM on core, the HDEC on
secondary is disabled, otherwise the host exit will be too frequent.
Any suggestion?

Thx,
Fan

> Regards
> Preeti U Murthy
>>
>> Introduce a kthread (anti_tickless) on primary, so when there is only
>> one vcpu thread on primary, the secondary can resort to anti_tickless
>> to keep the primary out of tickless mode.
>> (I thought that anti_tickless thread can goto NAP, so we can let the
>> secondary run).
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/kernel/sysfs.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/kernel/sysfs.c b/arch/powerpc/kernel/sysfs.c
>> index a2595dd..f0b110e 100644
>> --- a/arch/powerpc/kernel/sysfs.c
>> +++ b/arch/powerpc/kernel/sysfs.c
>> @@ -575,9 +575,11 @@ static ssize_t __used store_kvm_enable(struct device *dev,
>>       if (!test_bit(core, &kvm_on_core))
>>               for (thr = 1; thr< threads_per_core; thr++)
>>                       if (cpu_online(thr * threads_per_core + thr))
>> -                             cpumask_set_cpu(thr * threads_per_core + thr, &stop_cpus);
>> +                             cpumask_set_cpu(core * threads_per_core + thr, &stop_cpus);
>>
>>       stop_machine(xics_migrate_irqs_away_secondary, NULL, &stop_cpus);
>> +     /* fixme, create a kthread on primary hwthread to handle tickless mode */
>> +     //kthread_create_on_cpu(prevent_tickless, NULL, core * threads_per_core, "ppckvm_prevent_tickless");
>>       set_bit(core, &kvm_on_core);
>>       return count;
>>  }
>>
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 06/11] powerpc: kvm: introduce online in paca to indicate whether cpu is needed by host
  2014-10-27  5:32   ` Preeti U Murthy
@ 2014-11-18  5:29     ` Liu ping fan
  0 siblings, 0 replies; 30+ messages in thread
From: Liu ping fan @ 2014-11-18  5:29 UTC (permalink / raw)
  To: Preeti U Murthy; +Cc: Paul Mackerras, linuxppc-dev, Alexander Graf, kvm-ppc

On Mon, Oct 27, 2014 at 1:32 PM, Preeti U Murthy
<preeti@linux.vnet.ibm.com> wrote:
> Hi Liu,
>
> On 10/17/2014 12:59 AM, kernelfans@gmail.com wrote:
>> Nowadays, powerKVM runs with secondary hwthread offline. Although
>> we can make all secondary hwthread online later, we still preserve
>> this behavior for dedicated KVM env. Achieve this by setting
>> paca->online as false.
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/include/asm/paca.h         |  3 +++
>>  arch/powerpc/kernel/asm-offsets.c       |  3 +++
>>  arch/powerpc/kernel/smp.c               |  3 +++
>>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 12 ++++++++++++
>>  4 files changed, 21 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
>> index a5139ea..67c2500 100644
>> --- a/arch/powerpc/include/asm/paca.h
>> +++ b/arch/powerpc/include/asm/paca.h
>> @@ -84,6 +84,9 @@ struct paca_struct {
>>       u8 cpu_start;                   /* At startup, processor spins until */
>>                                       /* this becomes non-zero. */
>>       u8 kexec_state;         /* set when kexec down has irqs off */
>> +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
>> +     u8 online;
>> +#endif
>>  #ifdef CONFIG_PPC_STD_MMU_64
>>       struct slb_shadow *slb_shadow_ptr;
>>       struct dtl_entry *dispatch_log;
>> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
>> index 9d7dede..0faa8fe 100644
>> --- a/arch/powerpc/kernel/asm-offsets.c
>> +++ b/arch/powerpc/kernel/asm-offsets.c
>> @@ -182,6 +182,9 @@ int main(void)
>>       DEFINE(PACATOC, offsetof(struct paca_struct, kernel_toc));
>>       DEFINE(PACAKBASE, offsetof(struct paca_struct, kernelbase));
>>       DEFINE(PACAKMSR, offsetof(struct paca_struct, kernel_msr));
>> +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
>> +     DEFINE(PACAONLINE, offsetof(struct paca_struct, online));
>> +#endif
>>       DEFINE(PACASOFTIRQEN, offsetof(struct paca_struct, soft_enabled));
>>       DEFINE(PACAIRQHAPPENED, offsetof(struct paca_struct, irq_happened));
>>       DEFINE(PACACONTEXTID, offsetof(struct paca_struct, context.id));
>> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
>> index a0738af..4c3843e 100644
>> --- a/arch/powerpc/kernel/smp.c
>> +++ b/arch/powerpc/kernel/smp.c
>> @@ -736,6 +736,9 @@ void start_secondary(void *unused)
>>
>>       cpu_startup_entry(CPUHP_ONLINE);
>>
>> +#ifdef CONFIG_KVMPPC_ENABLE_SECONDARY
>> +     get_paca()->online = true;
>> +#endif
>>       BUG();
>>  }
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> index f0c4db7..d5594b0 100644
>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> @@ -322,6 +322,13 @@ kvm_no_guest:
>>       li      r0, KVM_HWTHREAD_IN_NAP
>>       stb     r0, HSTATE_HWTHREAD_STATE(r13)
>>  kvm_do_nap:
>> +#ifdef PPCKVM_ENABLE_SECONDARY
>> +     /* check the cpu is needed by host or not */
>> +     ld      r2, PACAONLINE(r13)
>> +     ld      r3, 0
>> +     cmp     r2, r3
>> +     bne     kvm_secondary_exit_trampoline
>> +#endif
>>       /* Clear the runlatch bit before napping */
>>       mfspr   r2, SPRN_CTRLF
>>       clrrdi  r2, r2, 1
>> @@ -340,6 +347,11 @@ kvm_do_nap:
>>       nap
>>       b       .
>>
>> +#ifdef PPCKVM_ENABLE_SECONDARY
>> +kvm_secondary_exit_trampoline:
>> +     b       .
>
> Uh? When we have no vcpu to run, we loop here instead of doing a nap?
> What are we achieving?
>
> If I understand the intention of the patch well, we are looking to
> provide a knob whereby the host can indicate if it needs the secondaries
> at all.
>
Yes, you catch it :)

> Today the host does boot with all threads online. There are some init
> scripts which take the secondaries down. So today the host does not have
> a say in preventing this, compile time or runtime. So lets see how we
> can switch between the two behaviors if we don't have the init script,
> which looks like a saner thing to do.
>
> We should set the paca->online flag to false by default. If
> KVM_PPC_ENABLE_SECONDARY is configured, we need to set this flag to
> true. So at compile time, we resolve the flag.
>
> While booting, we look at the flag and decide whether to get the
> secondaries online. So we get the current behavior if we have not
> configured KVM_PPC_ENABLE_SECONDARY. Will this achieve the purpose of
> this patch?
>
At boot time, KVM can not run. So we can achieve the change of the
flag by soft cpu hotplug on/off.
I think this is a more flexible way.

Thx,
Fan

> Regards
> Preeti U Murthy
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 09/11] powerpc: kvm: handle time base on secondary hwthread
  2014-10-27  6:40   ` Preeti U Murthy
@ 2014-11-18  5:43     ` Liu ping fan
  0 siblings, 0 replies; 30+ messages in thread
From: Liu ping fan @ 2014-11-18  5:43 UTC (permalink / raw)
  To: Preeti U Murthy; +Cc: Paul Mackerras, linuxppc-dev, Alexander Graf, kvm-ppc

On Mon, Oct 27, 2014 at 2:40 PM, Preeti U Murthy
<preeti@linux.vnet.ibm.com> wrote:
> On 10/17/2014 12:59 AM, kernelfans@gmail.com wrote:
>> (This is a place holder patch.)
>> We need to store the time base for host on secondary hwthread.
>> Later when switching back, we need to reprogram it with elapse
>> time.
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/kvm/book3s_hv_rmhandlers.S | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> index 89ea16c..a817ba6 100644
>> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
>> @@ -371,6 +371,8 @@ _GLOBAL_TOC(kvmppc_secondary_stopper_enter)
>>
>>       /* fixme: store other register such as msr */
>>
>> +     /* fixme: store the tb, and set it as MAX, so we cease the tick on secondary */
>
> This can lead to serious consequences. First of all, even if we set the
> decrementer(not tb) to MAX, it is bound to fire after 4s. That is the
> maximum time till which you can keep off the decrementer from firing.
>
> In the hotplug path, the offline cpus nap and their decrementers are
> programmed to fire at MAX too. But the difference is that we clear the
> LPCR_PECE1 wakeup bit which prevents cpus from waking up on a
> decrementer interrupt.
>
> We cannot afford to do this here though because there are other tasks on
> the secondary threads' runqueue. They need to be scheduled in.
> There are also timers besides the tick_sched one, which can be queued on
> these secondary threads. These patches have not taken care to migrate
> timers or tasks before entering guest as far as I observed. Hence we
> cannot just turn off time base like this and expect to handle the above
> mentioned events the next time the primary thread decides to exit to the
> host.
>
Yes, that is the nut in this series. My plan is to compensate the
hrtimer when the secondary exit.
But as to the scheduler on secondary, if it is ceased, what is side-effect?

Thx,
Fan

> Regards
> Preeti U Murthy
>> +
>>       /* prevent us to enter kernel */
>>       li      r0, 1
>>       stb     r0, HSTATE_HWTHREAD_REQ(r13)
>> @@ -382,6 +384,10 @@ _GLOBAL_TOC(kvmppc_secondary_stopper_enter)
>>
>>  /* enter with vmode */
>>  kvmppc_secondary_stopper_exit:
>> +     /* fixme: restore the tb, with the orig val plus time elapse
>> +         * so we can fire the hrtimer as soon as possible
>> +         */
>> +
>>       /* fixme, restore the stack which we store on lpaca */
>>
>>       ld      r0, 112+PPC_LR_STKOFF(r1)
>>
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 11/11] powerpc: kvm: Kconfig add an option for enabling secondary hwthread
  2014-10-27  6:44   ` Preeti U Murthy
@ 2014-11-18  5:47     ` Liu ping fan
  0 siblings, 0 replies; 30+ messages in thread
From: Liu ping fan @ 2014-11-18  5:47 UTC (permalink / raw)
  To: Preeti U Murthy; +Cc: Paul Mackerras, linuxppc-dev, Alexander Graf, kvm-ppc

On Mon, Oct 27, 2014 at 2:44 PM, Preeti U Murthy
<preeti@linux.vnet.ibm.com> wrote:
> On 10/17/2014 01:00 AM, kernelfans@gmail.com wrote:
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/kvm/Kconfig | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
>> index 602eb51..de38566 100644
>> --- a/arch/powerpc/kvm/Kconfig
>> +++ b/arch/powerpc/kvm/Kconfig
>> @@ -93,6 +93,10 @@ config KVM_BOOK3S_64_HV
>>
>>         If unsure, say N.
>>
>> +config KVMPPC_ENABLE_SECONDARY
>> +     tristate "KVM support for running on secondary hwthread in host"
>> +     depends on KVM_BOOK3S_64_HV
>
> This patch is required ontop of all the rest :) The top patches won't
> compile without this one. Every patch in the patchset should be able to
> compile successfully without the aid of the patches that come after it.
>
I think here is a conflict. If we do so, then we should make effort to
prevent the independent patch to take effect before the whole patchset
is applied.

Thx,
Fan

> Regards
> Preeti U Murthy
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC 00/11]: powerKVM, release the compute power of secondary hwthread on host
  2014-10-16 19:29 [RFC 00/11]: powerKVM, release the compute power of secondary hwthread on host kernelfans
                   ` (10 preceding siblings ...)
  2014-10-16 19:30 ` [RFC 11/11] powerpc: kvm: Kconfig add an option for enabling secondary hwthread kernelfans
@ 2014-11-18 17:54 ` Alexander Graf
  11 siblings, 0 replies; 30+ messages in thread
From: Alexander Graf @ 2014-11-18 17:54 UTC (permalink / raw)
  To: kernelfans, linuxppc-dev, kvm-ppc; +Cc: Paul Mackerras



On 16.10.14 21:29, kernelfans@gmail.com wrote:
> Nowadays, when running powerKVM(book3s, hv mode), we should make the secondary hwthread
> offline. Which means that if we run misc tsks other than dedicated KVM (e.g mix java and KVM),
> we will lose the compute power of the secondary hwthread on host env.

I'm personally more concerned about IO threads and the likes blocking
CPUs that could do actual work.

But really, IMHO this should just get fixed in hardware. The patch set
looks like quite a good addition of complexity to an already complex
problem - which means it will definitely break :).

Couldn't we just do something as simple as partition the system into SMT
and non-SMT cores? Then the user can just say "keep 2 cores in SMT mode"
and we would refuse to run KVM threads on those.

But then again we would bounce on these threads and increase latency on
entry if we happen to get scheduled there, so it's probably not a win
either.

I really don't have a good answer, except for "POWER8 wasn't designed
for this".


Alex

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2014-11-18 17:54 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-16 19:29 [RFC 00/11]: powerKVM, release the compute power of secondary hwthread on host kernelfans
2014-10-16 19:29 ` [RFC 01/11] sched: introduce sys_cpumask in tsk to adapt asymmetric system kernelfans
2014-11-12  9:22   ` Srikar Dronamraju
2014-11-18  5:07     ` Liu ping fan
2014-10-16 19:29 ` [RFC 02/11] powerpc: kvm: ensure vcpu-thread run only on primary hwthread kernelfans
2014-11-12 10:17   ` Srikar Dronamraju
2014-10-16 19:29 ` [RFC 03/11] powerpc: kvm: add interface to control kvm function on a core kernelfans
2014-10-27  4:04   ` Preeti U Murthy
2014-11-18  5:17     ` Liu ping fan
2014-11-12 13:01   ` Srikar Dronamraju
2014-10-16 19:29 ` [RFC 04/11] powerpc: kvm: introduce a kthread on primary thread to anti tickless kernelfans
2014-10-27  4:45   ` Preeti U Murthy
2014-11-18  5:24     ` Liu ping fan
2014-10-16 19:29 ` [RFC 05/11] sched: introduce stop_cpus_async() to schedule special tsk on cpu kernelfans
2014-10-16 19:29 ` [RFC 06/11] powerpc: kvm: introduce online in paca to indicate whether cpu is needed by host kernelfans
2014-10-27  5:32   ` Preeti U Murthy
2014-11-18  5:29     ` Liu ping fan
2014-10-16 19:29 ` [RFC 07/11] powerpc: kvm: the stopper func to cease secondary hwthread kernelfans
2014-10-22  7:12   ` Preeti U Murthy
2014-10-27  6:07   ` Preeti U Murthy
2014-10-16 19:29 ` [RFC 08/11] powerpc: kvm: add a flag in vcore to sync primary with secondry hwthread kernelfans
2014-10-27  6:28   ` Preeti U Murthy
2014-10-16 19:29 ` [RFC 09/11] powerpc: kvm: handle time base on secondary hwthread kernelfans
2014-10-27  6:40   ` Preeti U Murthy
2014-11-18  5:43     ` Liu ping fan
2014-10-16 19:29 ` [RFC 10/11] powerpc: kvm: on_primary_thread() force the secondary threads into NAP mode kernelfans
2014-10-16 19:30 ` [RFC 11/11] powerpc: kvm: Kconfig add an option for enabling secondary hwthread kernelfans
2014-10-27  6:44   ` Preeti U Murthy
2014-11-18  5:47     ` Liu ping fan
2014-11-18 17:54 ` [RFC 00/11]: powerKVM, release the compute power of secondary hwthread on host Alexander Graf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).