* [PATCH v3 0/4] implement vcpu preempted check
@ 2016-07-21 11:45 Pan Xinhui
  2016-07-21 11:45 ` [PATCH v3 1/4] kernel/sched: introduce vcpu preempted check interface Pan Xinhui
                   ` (8 more replies)
  0 siblings, 9 replies; 37+ messages in thread
From: Pan Xinhui @ 2016-07-21 11:45 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev, virtualization, linux-s390,
	xen-devel-request, kvm
  Cc: benh, paulus, mpe, mingo, peterz, paulmck, will.deacon,
	kernellwp, jgross, pbonzini, bsingharora, Pan Xinhui

change from v2:
	no code change; fix typos and update some comments

change from v1:
	a simpler definition of the default vcpu_is_preempted
	skip the machine type check on ppc, and add the config dependency; remove the dedicated macro
	add one patch to drop the overhead of rwsem_spin_on_owner and mutex_spin_on_owner
	add more comments
	thanks to Boqun's and Peter's suggestions

This patch set aims to fix lock holder preemption issues.

test-case:
perf record -a perf bench sched messaging -g 400 -p && perf report

before patch:
18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
 5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
 3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
 3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
 2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call

after patch:
 9.99%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 5.28%  sched-messaging  [unknown]         [H] 0xc0000000000768e0
 4.27%  sched-messaging  [kernel.vmlinux]  [k] __copy_tofrom_user_power7
 3.77%  sched-messaging  [kernel.vmlinux]  [k] copypage_power7
 3.24%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
 3.02%  sched-messaging  [kernel.vmlinux]  [k] system_call
 2.69%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task

We introduce the interface bool vcpu_is_preempted(int cpu) and use it in the spin
loops of osq_lock(), rwsem_spin_on_owner() and mutex_spin_on_owner().
These spin-on-owner variants also caused RCU stalls before this patch set was applied.
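
As an illustration of the intended usage (a simplified sketch, not the exact
code from the patches; the spin_on_owner() helper below is made up for this
example), an architecture overrides the default and the optimistic-spin paths
consult it:

/* Generic fallback (patch 1/4): spinning is never cut short on bare metal. */
#ifndef vcpu_is_preempted
#define vcpu_is_preempted(cpu)	false
#endif

/* Simplified spin loop in the style of mutex/rwsem_spin_on_owner(). */
static bool spin_on_owner(struct task_struct *owner)
{
	while (READ_ONCE(owner->on_cpu)) {
		/* Stop spinning once the owner's vCPU has been preempted. */
		if (need_resched() || vcpu_is_preempted(task_cpu(owner)))
			return false;
		cpu_relax();
	}
	return true;
}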

Pan Xinhui (4):
  kernel/sched: introduce vcpu preempted check interface
  powerpc/spinlock: support vcpu preempted check
  locking/osq: Drop the overhead of osq_lock()
  kernel/locking: Drop the overhead of {mutex,rwsem}_spin_on_owner

 arch/powerpc/include/asm/spinlock.h | 18 ++++++++++++++++++
 include/linux/sched.h               | 12 ++++++++++++
 kernel/locking/mutex.c              | 15 +++++++++++++--
 kernel/locking/osq_lock.c           | 10 +++++++++-
 kernel/locking/rwsem-xadd.c         | 16 +++++++++++++---
 5 files changed, 65 insertions(+), 6 deletions(-)

-- 
2.4.11

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3 1/4] kernel/sched: introduce vcpu preempted check interface
  2016-07-21 11:45 [PATCH v3 0/4] implement vcpu preempted check Pan Xinhui
@ 2016-07-21 11:45 ` Pan Xinhui
  2016-07-21 11:45 ` Pan Xinhui
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Pan Xinhui @ 2016-07-21 11:45 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev, virtualization, linux-s390,
	xen-devel-request, kvm
  Cc: benh, paulus, mpe, mingo, peterz, paulmck, will.deacon,
	kernellwp, jgross, pbonzini, bsingharora, Pan Xinhui

This patch helps fix the lock holder preemption issue.

Kernel users can call bool vcpu_is_preempted(int cpu) to detect whether
a given vCPU is preempted or not.

The default implementation is a macro that evaluates to false, so the
compiler can optimize it away if the architecture does not provide a vCPU
preempted check.
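
For illustration only (not part of this patch): an architecture opts in by
defining the macro to its own implementation, as the pSeries version in
patch 2/4 does; hypervisor_vcpu_preempted() below is a hypothetical helper:

/* In a hypothetical arch's asm/spinlock.h: */
#define vcpu_is_preempted vcpu_is_preempted
static inline bool vcpu_is_preempted(int cpu)
{
	/* Ask the hypervisor whether this vCPU is currently descheduled. */
	return hypervisor_vcpu_preempted(cpu);
}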

Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
 include/linux/sched.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6e42ada..cbe0574 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3293,6 +3293,18 @@ static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)
 
 #endif /* CONFIG_SMP */
 
+/*
+ * In order to deal with various lock holder preemption issues, provide an
+ * interface to see if a vCPU is currently running or not.
+ *
+ * This allows us to terminate optimistic spin loops and block, analogous to
+ * the native optimistic spin heuristic of testing if the lock owner task is
+ * running or not.
+ */
+#ifndef vcpu_is_preempted
+#define vcpu_is_preempted(cpu)	false
+#endif
+
 extern long sched_setaffinity(pid_t pid, const struct cpumask *new_mask);
 extern long sched_getaffinity(pid_t pid, struct cpumask *mask);
 
-- 
2.4.11

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 2/4] powerpc/spinlock: support vcpu preempted check
  2016-07-21 11:45 [PATCH v3 0/4] implement vcpu preempted check Pan Xinhui
                   ` (2 preceding siblings ...)
  2016-07-21 11:45 ` [PATCH v3 2/4] powerpc/spinlock: support vcpu preempted check Pan Xinhui
@ 2016-07-21 11:45 ` Pan Xinhui
  2016-07-21 11:45 ` [PATCH v3 3/4] locking/osq: Drop the overhead of osq_lock() Pan Xinhui
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Pan Xinhui @ 2016-07-21 11:45 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev, virtualization, linux-s390,
	xen-devel-request, kvm
  Cc: benh, paulus, mpe, mingo, peterz, paulmck, will.deacon,
	kernellwp, jgross, pbonzini, bsingharora, Pan Xinhui

This is to fix some lock holder preemption issues. Several lock
implementations do a spin loop before acquiring the lock itself.
The kernel now has an interface, bool vcpu_is_preempted(int cpu), which
takes a cpu as parameter and returns true if that vCPU is preempted. The
kernel can then break out of the spin loops based on its return value.

As the core kernel already uses this interface, let's support it.

Only pSeries needs it. However, powerNV is built into the same kernel
image as pSeries, so in principle we would have to return false when
running on powerNV. The other fact is that lppaca->yield_count stays zero
on powerNV, so we can simply skip the machine type check.

Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/spinlock.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index 523673d..3ac9fcb 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -52,6 +52,24 @@
 #define SYNC_IO
 #endif
 
+/*
+ * This lets the kernel check whether a cpu is preempted or not,
+ * which helps fix some lock holder preemption issues.
+ */
+#ifdef CONFIG_PPC_PSERIES
+#define vcpu_is_preempted vcpu_is_preempted
+static inline bool vcpu_is_preempted(int cpu)
+{
+	/*
+	 * pSeries and powerNV can be built into the same kernel image. In
+	 * principle we would need to return false directly if we are
+	 * running as powerNV. However the yield_count is always zero on
+	 * powerNV, so the machine type check can be skipped.
+	 */
+	return !!(be32_to_cpu(lppaca_of(cpu).yield_count) & 1);
+}
+#endif
+
 static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 {
 	return lock.slock == 0;
-- 
2.4.11

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 3/4] locking/osq: Drop the overhead of osq_lock()
  2016-07-21 11:45 [PATCH v3 0/4] implement vcpu preempted check Pan Xinhui
                   ` (3 preceding siblings ...)
  2016-07-21 11:45 ` Pan Xinhui
@ 2016-07-21 11:45 ` Pan Xinhui
  2016-07-21 11:45 ` Pan Xinhui
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Pan Xinhui @ 2016-07-21 11:45 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev, virtualization, linux-s390,
	xen-devel-request, kvm
  Cc: benh, paulus, mpe, mingo, peterz, paulmck, will.deacon,
	kernellwp, jgross, pbonzini, bsingharora, Pan Xinhui

An over-committed guest with more vCPUs than pCPUs suffers heavy overhead in
osq_lock().

This is because vCPU A holds the osq lock and is preempted (yields out) while
vCPU B spins waiting for the per-cpu node->locked to be set. IOW, vCPU B
waits for vCPU A to run again and unlock the osq lock.

The kernel has an interface, bool vcpu_is_preempted(int cpu), to see whether
a vCPU is currently running or not, so break out of the spin loop when it
returns true.

test case:
perf record -a perf bench sched messaging -g 400 -p && perf report

before patch:
18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
 5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
 3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
 3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
 2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call

after patch:
20.68%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner
 8.45%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 4.12%  sched-messaging  [kernel.vmlinux]  [k] system_call
 3.01%  sched-messaging  [kernel.vmlinux]  [k] system_call_common
 2.83%  sched-messaging  [kernel.vmlinux]  [k] copypage_power7
 2.64%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
 2.00%  sched-messaging  [kernel.vmlinux]  [k] osq_lock

Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
 kernel/locking/osq_lock.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 05a3785..858a0ed 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -21,6 +21,11 @@ static inline int encode_cpu(int cpu_nr)
 	return cpu_nr + 1;
 }
 
+static inline int node_cpu(struct optimistic_spin_node *node)
+{
+	return node->cpu - 1;
+}
+
 static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
 {
 	int cpu_nr = encoded_cpu_val - 1;
@@ -118,8 +123,11 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 	while (!READ_ONCE(node->locked)) {
 		/*
 		 * If we need to reschedule bail... so we can block.
+		 * Use vcpu_is_preempted() to detect the lock holder preemption
+		 * issue and break out of the loop. vcpu_is_preempted() is a
+		 * macro defined as false if the arch does not support the check.
 		 */
-		if (need_resched())
+		if (need_resched() || vcpu_is_preempted(node_cpu(node->prev)))
 			goto unqueue;
 
 		cpu_relax_lowlatency();
-- 
2.4.11

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH v3 4/4] kernel/locking: Drop the overhead of {mutex,rwsem}_spin_on_owner
  2016-07-21 11:45 [PATCH v3 0/4] implement vcpu preempted check Pan Xinhui
  2016-07-21 11:45 ` [PATCH v3 1/4] kernel/sched: introduce vcpu preempted check interface Pan Xinhui
@ 2016-07-21 11:45   ` Pan Xinhui
  2016-07-21 11:45 ` [PATCH v3 2/4] powerpc/spinlock: support vcpu preempted check Pan Xinhui
                     ` (6 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Pan Xinhui @ 2016-07-21 11:45 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev, virtualization, linux-s390,
	xen-devel-request, kvm
  Cc: benh, paulus, mpe, mingo, peterz, paulmck, will.deacon,
	kernellwp, jgross, pbonzini, bsingharora, Pan Xinhui

An over-committed guest with more vCPUs than pCPUs suffers heavy overhead in
the two spin_on_owner loops. This is caused by the lock holder preemption issue.

The kernel has an interface, bool vcpu_is_preempted(int cpu), to see whether
a vCPU is currently running or not, so break out of the spin loops when it
returns true.

test-case:
perf record -a perf bench sched messaging -g 400 -p && perf report

before patch:
20.68%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner
 8.45%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 4.12%  sched-messaging  [kernel.vmlinux]  [k] system_call
 3.01%  sched-messaging  [kernel.vmlinux]  [k] system_call_common
 2.83%  sched-messaging  [kernel.vmlinux]  [k] copypage_power7
 2.64%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
 2.00%  sched-messaging  [kernel.vmlinux]  [k] osq_lock

after patch:
 9.99%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 5.28%  sched-messaging  [unknown]         [H] 0xc0000000000768e0
 4.27%  sched-messaging  [kernel.vmlinux]  [k] __copy_tofrom_user_power7
 3.77%  sched-messaging  [kernel.vmlinux]  [k] copypage_power7
 3.24%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
 3.02%  sched-messaging  [kernel.vmlinux]  [k] system_call
 2.69%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task

Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
 kernel/locking/mutex.c      | 15 +++++++++++++--
 kernel/locking/rwsem-xadd.c | 16 +++++++++++++---
 2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 79d2d76..4f00b0c 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -236,7 +236,13 @@ bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
 		 */
 		barrier();
 
-		if (!owner->on_cpu || need_resched()) {
+		/*
+		 * Use vcpu_is_preempted() to detect the lock holder preemption
+		 * issue and break out. vcpu_is_preempted() is a macro defined
+		 * as false if the arch does not support the check.
+		 */
+		if (!owner->on_cpu || need_resched() ||
+				vcpu_is_preempted(task_cpu(owner))) {
 			ret = false;
 			break;
 		}
@@ -261,8 +267,13 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
 
 	rcu_read_lock();
 	owner = READ_ONCE(lock->owner);
+
+	/*
+	 * Due to the lock holder preemption issue, we also skip spinning if
+	 * the task is not on a cpu or its cpu is preempted.
+	 */
 	if (owner)
-		retval = owner->on_cpu;
+		retval = owner->on_cpu && !vcpu_is_preempted(task_cpu(owner));
 	rcu_read_unlock();
 	/*
 	 * if lock->owner is not set, the mutex owner may have just acquired
diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index 09e30c6..99eb8fd 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -319,7 +319,11 @@ static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
 		goto done;
 	}
 
-	ret = owner->on_cpu;
+	/*
+	 * Due to the lock holder preemption issue, we also skip spinning if
+	 * the task is not on a cpu or its cpu is preempted.
+	 */
+	ret = owner->on_cpu && !vcpu_is_preempted(task_cpu(owner));
 done:
 	rcu_read_unlock();
 	return ret;
@@ -340,8 +344,14 @@ bool rwsem_spin_on_owner(struct rw_semaphore *sem, struct task_struct *owner)
 		 */
 		barrier();
 
-		/* abort spinning when need_resched or owner is not running */
-		if (!owner->on_cpu || need_resched()) {
+		/*
+		 * Abort spinning when need_resched() is set, the owner is not
+		 * running, or the owner's cpu is preempted. vcpu_is_preempted()
+		 * is a macro defined as false if the arch does not support
+		 * the vcpu preempted check.
+		 */
+		if (!owner->on_cpu || need_resched() ||
+				vcpu_is_preempted(task_cpu(owner))) {
 			rcu_read_unlock();
 			return false;
 		}
-- 
2.4.11

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/4] implement vcpu preempted check
  2016-07-21 11:45 [PATCH v3 0/4] implement vcpu preempted check Pan Xinhui
@ 2016-09-29 10:10   ` Peter Zijlstra
  2016-07-21 11:45 ` Pan Xinhui
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2016-09-29 10:10 UTC (permalink / raw)
  To: Pan Xinhui
  Cc: linux-kernel, linuxppc-dev, virtualization, linux-s390,
	xen-devel-request, kvm, benh, paulus, mpe, mingo, paulmck,
	will.deacon, kernellwp, jgross, pbonzini, bsingharora

On Thu, Jul 21, 2016 at 07:45:10AM -0400, Pan Xinhui wrote:
> change from v2:
> 	no code change, fix typos, update some comments
> 
> change from v1:
> 	a simplier definition of default vcpu_is_preempted
> 	skip mahcine type check on ppc, and add config. remove dedicated macro.
> 	add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner. 
> 	add more comments
> 	thanks boqun and Peter's suggestion.
> 
> This patch set aims to fix lock holder preemption issues.

So I really like the concept, but I would also really like to see
support for more hypervisors included before we can move forward with
this.

Please consider s390 and (x86/arm) KVM. Once we have a few, more can
follow later, but I think it's important to not only have PPC support for
this.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/4] implement vcpu preempted check
  2016-09-29 10:10   ` Peter Zijlstra
@ 2016-09-29 10:23     ` Christian Borntraeger
  -1 siblings, 0 replies; 37+ messages in thread
From: Christian Borntraeger @ 2016-09-29 10:23 UTC (permalink / raw)
  To: Peter Zijlstra, Pan Xinhui
  Cc: linux-kernel, linuxppc-dev, virtualization, linux-s390,
	xen-devel-request, kvm, benh, paulus, mpe, mingo, paulmck,
	will.deacon, kernellwp, jgross, pbonzini, bsingharora,
	linux-s390, Heiko Carstens

On 09/29/2016 12:10 PM, Peter Zijlstra wrote:
> On Thu, Jul 21, 2016 at 07:45:10AM -0400, Pan Xinhui wrote:
>> change from v2:
>> 	no code change, fix typos, update some comments
>>
>> change from v1:
>> 	a simplier definition of default vcpu_is_preempted
>> 	skip mahcine type check on ppc, and add config. remove dedicated macro.
>> 	add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner. 
>> 	add more comments
>> 	thanks boqun and Peter's suggestion.
>>
>> This patch set aims to fix lock holder preemption issues.
> 
> So I really like the concept, but I would also really like to see
> support for more hypervisors included before we can move forward with
> this.
> 
> Please consider s390 and (x86/arm) KVM. Once we have a few, more can
> follow later, but I think its important to not only have PPC support for
> this.

Actually the s390 preempted check via sigp sense running is available for
all hypervisors (z/VM, LPAR and KVM), which implies everywhere, as you can no
longer buy s390 systems without LPAR.

As Heiko already pointed out, we could simply use a small inline function
that calls cpu_is_preempted from arch/s390/lib/spinlock (or smp_vcpu_scheduled from smp.c).

Christian

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/4] implement vcpu preempted check
  2016-09-29 10:23     ` Christian Borntraeger
@ 2016-09-29 10:31       ` Peter Zijlstra
  -1 siblings, 0 replies; 37+ messages in thread
From: Peter Zijlstra @ 2016-09-29 10:31 UTC (permalink / raw)
  To: Christian Borntraeger
  Cc: Pan Xinhui, linux-kernel, linuxppc-dev, virtualization,
	linux-s390, xen-devel-request, kvm, benh, paulus, mpe, mingo,
	paulmck, will.deacon, kernellwp, jgross, pbonzini, bsingharora,
	Heiko Carstens

On Thu, Sep 29, 2016 at 12:23:19PM +0200, Christian Borntraeger wrote:
> On 09/29/2016 12:10 PM, Peter Zijlstra wrote:
> > On Thu, Jul 21, 2016 at 07:45:10AM -0400, Pan Xinhui wrote:
> >> change from v2:
> >> 	no code change, fix typos, update some comments
> >>
> >> change from v1:
> >> 	a simplier definition of default vcpu_is_preempted
> >> 	skip mahcine type check on ppc, and add config. remove dedicated macro.
> >> 	add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner. 
> >> 	add more comments
> >> 	thanks boqun and Peter's suggestion.
> >>
> >> This patch set aims to fix lock holder preemption issues.
> > 
> > So I really like the concept, but I would also really like to see
> > support for more hypervisors included before we can move forward with
> > this.
> > 
> > Please consider s390 and (x86/arm) KVM. Once we have a few, more can
> > follow later, but I think its important to not only have PPC support for
> > this.
> 
> Actually the s390 preemted check via sigp sense running  is available for
> all hypervisors (z/VM, LPAR and KVM) which implies everywhere as you can no
> longer buy s390 systems without LPAR.
> 
> As Heiko already pointed out we could simply use a small inline function
> that calls cpu_is_preempted from arch/s390/lib/spinlock (or smp_vcpu_scheduled from smp.c)

Sure, and I had vague memories of Heiko's email. This patch set however
completely fails to do that trivial hooking up.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/4] implement vcpu preempted check
  2016-09-29 10:23     ` Christian Borntraeger
                       ` (2 preceding siblings ...)
  (?)
@ 2016-09-29 10:40     ` Christian Borntraeger
  2016-09-29 11:05       ` Christian Borntraeger
  2016-09-29 11:05       ` Christian Borntraeger
  -1 siblings, 2 replies; 37+ messages in thread
From: Christian Borntraeger @ 2016-09-29 10:40 UTC (permalink / raw)
  To: Peter Zijlstra, Pan Xinhui
  Cc: linux-kernel, linuxppc-dev, virtualization, linux-s390,
	xen-devel-request, kvm, benh, paulus, mpe, mingo, paulmck,
	will.deacon, kernellwp, jgross, pbonzini, bsingharora,
	Heiko Carstens

On 09/29/2016 12:23 PM, Christian Borntraeger wrote:
> On 09/29/2016 12:10 PM, Peter Zijlstra wrote:
>> On Thu, Jul 21, 2016 at 07:45:10AM -0400, Pan Xinhui wrote:
>>> change from v2:
>>> 	no code change, fix typos, update some comments
>>>
>>> change from v1:
>>> 	a simplier definition of default vcpu_is_preempted
>>> 	skip mahcine type check on ppc, and add config. remove dedicated macro.
>>> 	add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner. 
>>> 	add more comments
>>> 	thanks boqun and Peter's suggestion.
>>>
>>> This patch set aims to fix lock holder preemption issues.
>>
>> So I really like the concept, but I would also really like to see
>> support for more hypervisors included before we can move forward with
>> this.
>>
>> Please consider s390 and (x86/arm) KVM. Once we have a few, more can
>> follow later, but I think its important to not only have PPC support for
>> this.
> 
> Actually the s390 preemted check via sigp sense running  is available for
> all hypervisors (z/VM, LPAR and KVM) which implies everywhere as you can no
> longer buy s390 systems without LPAR.
> 
> As Heiko already pointed out we could simply use a small inline function
> that calls cpu_is_preempted from arch/s390/lib/spinlock (or smp_vcpu_scheduled from smp.c)

Maybe something like
(untested and just pasted, so white space damaged)

diff --git a/arch/s390/include/asm/spinlock.h b/arch/s390/include/asm/spinlock.h
index 63ebf37..6e82986 100644
--- a/arch/s390/include/asm/spinlock.h
+++ b/arch/s390/include/asm/spinlock.h
@@ -21,6 +21,13 @@ _raw_compare_and_swap(unsigned int *lock, unsigned int old, unsigned int new)
        return __sync_bool_compare_and_swap(lock, old, new);
 }
 
+int arch_vcpu_is_preempted(int cpu);
+#define vcpu_is_preempted cpu_is_preempted
+static inline bool cpu_is_preempted(int cpu)
+{
+       return arch_vcpu_is_preempted(cpu);
+}
+
 /*
  * Simple spin lock operations.  There are two variants, one clears IRQ's
  * on the local processor, one does not.
diff --git a/arch/s390/lib/spinlock.c b/arch/s390/lib/spinlock.c
index e5f50a7..260d179 100644
--- a/arch/s390/lib/spinlock.c
+++ b/arch/s390/lib/spinlock.c
@@ -37,7 +37,7 @@ static inline void _raw_compare_and_delay(unsigned int *lock, unsigned int old)
        asm(".insn rsy,0xeb0000000022,%0,0,%1" : : "d" (old), "Q" (*lock));
 }
 
-static inline int cpu_is_preempted(int cpu)
+int arch_vcpu_is_preempted(int cpu)
 {
        if (test_cpu_flag_of(CIF_ENABLED_WAIT, cpu))
                return 0;
@@ -45,6 +45,7 @@ static inline int cpu_is_preempted(int cpu)
                return 0;
        return 1;
 }
+EXPORT_SYMBOL(arch_vcpu_is_preempted);
 
 void arch_spin_lock_wait(arch_spinlock_t *lp)
 {



If ok I can respin into a proper patch.

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/4] implement vcpu preempted check
  2016-09-29 10:40     ` Christian Borntraeger
@ 2016-09-29 11:05       ` Christian Borntraeger
  2016-09-29 11:05       ` Christian Borntraeger
  1 sibling, 0 replies; 37+ messages in thread
From: Christian Borntraeger @ 2016-09-29 11:05 UTC (permalink / raw)
  To: Peter Zijlstra, Pan Xinhui
  Cc: linux-kernel, linuxppc-dev, virtualization, linux-s390,
	xen-devel-request, kvm, benh, paulus, mpe, mingo, paulmck,
	will.deacon, kernellwp, jgross, pbonzini, bsingharora,
	Heiko Carstens

On 09/29/2016 12:40 PM, Christian Borntraeger wrote:
> On 09/29/2016 12:23 PM, Christian Borntraeger wrote:
>> On 09/29/2016 12:10 PM, Peter Zijlstra wrote:
>>> On Thu, Jul 21, 2016 at 07:45:10AM -0400, Pan Xinhui wrote:
>>>> change from v2:
>>>> 	no code change, fix typos, update some comments
>>>>
>>>> change from v1:
>>>> 	a simplier definition of default vcpu_is_preempted
>>>> 	skip mahcine type check on ppc, and add config. remove dedicated macro.
>>>> 	add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner. 
>>>> 	add more comments
>>>> 	thanks boqun and Peter's suggestion.
>>>>
>>>> This patch set aims to fix lock holder preemption issues.
>>>
>>> So I really like the concept, but I would also really like to see
>>> support for more hypervisors included before we can move forward with
>>> this.
>>>
>>> Please consider s390 and (x86/arm) KVM. Once we have a few, more can
>>> follow later, but I think its important to not only have PPC support for
>>> this.
>>
>> Actually the s390 preemted check via sigp sense running  is available for
>> all hypervisors (z/VM, LPAR and KVM) which implies everywhere as you can no
>> longer buy s390 systems without LPAR.
>>
>> As Heiko already pointed out we could simply use a small inline function
>> that calls cpu_is_preempted from arch/s390/lib/spinlock (or smp_vcpu_scheduled from smp.c)
> 
> Maybe something like
> (untested and just pasted, so white space damaged)

Now tested. With 8 host CPUs and 16 guest CPUs, perf bench sched shows the
same improvements as in Pan Xinhui's cover letter.
The runtime also shrinks a lot.

> 
> diff --git a/arch/s390/include/asm/spinlock.h b/arch/s390/include/asm/spinlock.h
> index 63ebf37..6e82986 100644
> --- a/arch/s390/include/asm/spinlock.h
> +++ b/arch/s390/include/asm/spinlock.h
> @@ -21,6 +21,13 @@ _raw_compare_and_swap(unsigned int *lock, unsigned int old, unsigned int new)
>         return __sync_bool_compare_and_swap(lock, old, new);
>  }
> 
> +int arch_vcpu_is_preempted(int cpu);
> +#define vcpu_is_preempted cpu_is_preempted
> +static inline bool cpu_is_preempted(int cpu)
> +{
> +       return arch_vcpu_is_preempted(cpu);
> +}
> +
>  /*
>   * Simple spin lock operations.  There are two variants, one clears IRQ's
>   * on the local processor, one does not.
> diff --git a/arch/s390/lib/spinlock.c b/arch/s390/lib/spinlock.c
> index e5f50a7..260d179 100644
> --- a/arch/s390/lib/spinlock.c
> +++ b/arch/s390/lib/spinlock.c
> @@ -37,7 +37,7 @@ static inline void _raw_compare_and_delay(unsigned int *lock, unsigned int old)
>         asm(".insn rsy,0xeb0000000022,%0,0,%1" : : "d" (old), "Q" (*lock));
>  }
> 
> -static inline int cpu_is_preempted(int cpu)
> +int arch_vcpu_is_preempted(int cpu)
>  {
>         if (test_cpu_flag_of(CIF_ENABLED_WAIT, cpu))
>                 return 0;
> @@ -45,6 +45,7 @@ static inline int cpu_is_preempted(int cpu)
>                 return 0;
>         return 1;
>  }
> +EXPORT_SYMBOL(arch_vcpu_is_preempted);
> 
>  void arch_spin_lock_wait(arch_spinlock_t *lp)
>  {
> 
> 
> 
> If ok I can respin into a proper patch.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/4] implement vcpu preempted check
  2016-09-29 10:10   ` Peter Zijlstra
@ 2016-09-30  4:00     ` Pan Xinhui
  -1 siblings, 0 replies; 37+ messages in thread
From: Pan Xinhui @ 2016-09-30  4:00 UTC (permalink / raw)
  To: Peter Zijlstra, Pan Xinhui
  Cc: kernellwp, linux-s390, jgross, kvm, xen-devel-request,
	will.deacon, linux-kernel, virtualization, mingo, paulus, mpe,
	benh, pbonzini, paulmck, linuxppc-dev



On 2016/9/29 18:10, Peter Zijlstra wrote:
> On Thu, Jul 21, 2016 at 07:45:10AM -0400, Pan Xinhui wrote:
>> change from v2:
>> 	no code change, fix typos, update some comments
>>
>> change from v1:
>> 	a simplier definition of default vcpu_is_preempted
>> 	skip mahcine type check on ppc, and add config. remove dedicated macro.
>> 	add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner.
>> 	add more comments
>> 	thanks boqun and Peter's suggestion.
>>
>> This patch set aims to fix lock holder preemption issues.
>
> So I really like the concept, but I would also really like to see
> support for more hypervisors included before we can move forward with
> this.
>
> Please consider s390 and (x86/arm) KVM. Once we have a few, more can
> follow later, but I think its important to not only have PPC support for
> this.
>
hi, Peter
	right, we need to consider more arches.
I will try to work out an RFC patch on x86 anyway. There are already some discussions of the solution on x86 :)

thanks
xinhui


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/4] implement vcpu preempted check
@ 2016-09-30  4:00     ` Pan Xinhui
  0 siblings, 0 replies; 37+ messages in thread
From: Pan Xinhui @ 2016-09-30  4:00 UTC (permalink / raw)
  To: Peter Zijlstra, Pan Xinhui
  Cc: linux-kernel, linuxppc-dev, virtualization, linux-s390,
	xen-devel-request, kvm, benh, paulus, mpe, mingo, paulmck,
	will.deacon, kernellwp, jgross, pbonzini, bsingharora



On 2016/9/29 18:10, Peter Zijlstra wrote:
> On Thu, Jul 21, 2016 at 07:45:10AM -0400, Pan Xinhui wrote:
>> change from v2:
>> 	no code change, fix typos, update some comments
>>
>> change from v1:
>> 	a simplier definition of default vcpu_is_preempted
>> 	skip mahcine type check on ppc, and add config. remove dedicated macro.
>> 	add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner.
>> 	add more comments
>> 	thanks boqun and Peter's suggestion.
>>
>> This patch set aims to fix lock holder preemption issues.
>
> So I really like the concept, but I would also really like to see
> support for more hypervisors included before we can move forward with
> this.
>
> Please consider s390 and (x86/arm) KVM. Once we have a few, more can
> follow later, but I think its important to not only have PPC support for
> this.
>
hi, Peter
	right, we need to consider more arches.
I will try to work out an RFC patch on x86 anyway. There are already some discussions of the solution on x86 :)

thanks
xinhui

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/4] implement vcpu preempted check
  2016-09-29 10:31       ` Peter Zijlstra
@ 2016-09-30  5:03         ` Pan Xinhui
  -1 siblings, 0 replies; 37+ messages in thread
From: Pan Xinhui @ 2016-09-30  5:03 UTC (permalink / raw)
  To: Peter Zijlstra, Christian Borntraeger
  Cc: Pan Xinhui, linux-kernel, linuxppc-dev, virtualization,
	linux-s390, xen-devel-request, kvm, benh, paulus, mpe, mingo,
	paulmck, will.deacon, kernellwp, jgross, pbonzini, bsingharora,
	Heiko Carstens



On 2016/9/29 18:31, Peter Zijlstra wrote:
> On Thu, Sep 29, 2016 at 12:23:19PM +0200, Christian Borntraeger wrote:
>> On 09/29/2016 12:10 PM, Peter Zijlstra wrote:
>>> On Thu, Jul 21, 2016 at 07:45:10AM -0400, Pan Xinhui wrote:
>>>> change from v2:
>>>> 	no code change, fix typos, update some comments
>>>>
>>>> change from v1:
>>>> 	a simplier definition of default vcpu_is_preempted
>>>> 	skip mahcine type check on ppc, and add config. remove dedicated macro.
>>>> 	add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner.
>>>> 	add more comments
>>>> 	thanks boqun and Peter's suggestion.
>>>>
>>>> This patch set aims to fix lock holder preemption issues.
>>>
>>> So I really like the concept, but I would also really like to see
>>> support for more hypervisors included before we can move forward with
>>> this.
>>>
>>> Please consider s390 and (x86/arm) KVM. Once we have a few, more can
>>> follow later, but I think its important to not only have PPC support for
>>> this.
>>
>> Actually the s390 preemted check via sigp sense running  is available for
>> all hypervisors (z/VM, LPAR and KVM) which implies everywhere as you can no
>> longer buy s390 systems without LPAR.
>>
>> As Heiko already pointed out we could simply use a small inline function
>> that calls cpu_is_preempted from arch/s390/lib/spinlock (or smp_vcpu_scheduled from smp.c)
>
> Sure, and I had vague memories of Heiko's email. This patch set however
> completely fails to do that trivial hooking up.
>

sorry for that.
I will try to work it out on x86.

Hi, Will
	I would appreciate it if you or some other Arm folks could help with it. :)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/4] implement vcpu preempted check
@ 2016-09-30  5:03         ` Pan Xinhui
  0 siblings, 0 replies; 37+ messages in thread
From: Pan Xinhui @ 2016-09-30  5:03 UTC (permalink / raw)
  To: Peter Zijlstra, Christian Borntraeger
  Cc: kernellwp, linux-s390, benh, jgross, kvm, Pan Xinhui,
	will.deacon, linux-kernel, Heiko Carstens, virtualization, mingo,
	paulus, mpe, xen-devel-request, pbonzini, paulmck, linuxppc-dev



On 2016/9/29 18:31, Peter Zijlstra wrote:
> On Thu, Sep 29, 2016 at 12:23:19PM +0200, Christian Borntraeger wrote:
>> On 09/29/2016 12:10 PM, Peter Zijlstra wrote:
>>> On Thu, Jul 21, 2016 at 07:45:10AM -0400, Pan Xinhui wrote:
>>>> change from v2:
>>>> 	no code change, fix typos, update some comments
>>>>
>>>> change from v1:
>>>> 	a simplier definition of default vcpu_is_preempted
>>>> 	skip mahcine type check on ppc, and add config. remove dedicated macro.
>>>> 	add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner.
>>>> 	add more comments
>>>> 	thanks boqun and Peter's suggestion.
>>>>
>>>> This patch set aims to fix lock holder preemption issues.
>>>
>>> So I really like the concept, but I would also really like to see
>>> support for more hypervisors included before we can move forward with
>>> this.
>>>
>>> Please consider s390 and (x86/arm) KVM. Once we have a few, more can
>>> follow later, but I think its important to not only have PPC support for
>>> this.
>>
>> Actually the s390 preemted check via sigp sense running  is available for
>> all hypervisors (z/VM, LPAR and KVM) which implies everywhere as you can no
>> longer buy s390 systems without LPAR.
>>
>> As Heiko already pointed out we could simply use a small inline function
>> that calls cpu_is_preempted from arch/s390/lib/spinlock (or smp_vcpu_scheduled from smp.c)
>
> Sure, and I had vague memories of Heiko's email. This patch set however
> completely fails to do that trivial hooking up.
>

sorry for that.
I will try to work it out on x86.

Hi, Will
	I would appreciate it if you or some other Arm folks could help with it. :)


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/4] implement vcpu preempted check
  2016-09-30  5:03         ` Pan Xinhui
@ 2016-09-30  6:58           ` Paolo Bonzini
  -1 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2016-09-30  6:58 UTC (permalink / raw)
  To: Pan Xinhui
  Cc: Peter Zijlstra, Christian Borntraeger, Pan Xinhui, linux-kernel,
	linuxppc-dev, virtualization, linux-s390, xen-devel-request, kvm,
	benh, paulus, mpe, mingo, paulmck, will deacon, kernellwp,
	jgross, bsingharora, Heiko Carstens

> > > > Please consider s390 and (x86/arm) KVM. Once we have a few, more can
> > > > follow later, but I think its important to not only have PPC support for
> > > > this.
> > >
> > > Actually the s390 preemted check via sigp sense running  is available for
> > > all hypervisors (z/VM, LPAR and KVM) which implies everywhere as you can
> > > no longer buy s390 systems without LPAR.
> > >
> > > As Heiko already pointed out we could simply use a small inline function
> > > that calls cpu_is_preempted from arch/s390/lib/spinlock (or
> > > smp_vcpu_scheduled from smp.c)
> >
> > Sure, and I had vague memories of Heiko's email. This patch set however
> > completely fails to do that trivial hooking up.
> 
> sorry for that.
> I will try to work it out on x86.

x86 has no hypervisor support for this yet, and I'd like to understand the
desired semantics first, so I don't think it should block this series.  In
particular, there are at least the following choices:

1) an exit to userspace (5,000-10,000 clock cycles best case) counts as
lock holder preemption

2) any time the vCPU thread is not running counts as lock holder
preemption

To implement the latter you'd need a hypercall or MSR (at least as
a slow path), because the KVM preempt notifier is only active
during the KVM_RUN ioctl.

Paolo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/4] implement vcpu preempted check
@ 2016-09-30  6:58           ` Paolo Bonzini
  0 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2016-09-30  6:58 UTC (permalink / raw)
  To: Pan Xinhui
  Cc: kernellwp, linux-s390, benh, jgross, kvm, Peter Zijlstra,
	Pan Xinhui, will deacon, linux-kernel, Heiko Carstens,
	virtualization, mingo, paulus, mpe, xen-devel-request, paulmck,
	linuxppc-dev

> > > > Please consider s390 and (x86/arm) KVM. Once we have a few, more can
> > > > follow later, but I think its important to not only have PPC support for
> > > > this.
> > >
> > > Actually the s390 preemted check via sigp sense running  is available for
> > > all hypervisors (z/VM, LPAR and KVM) which implies everywhere as you can
> > > no longer buy s390 systems without LPAR.
> > >
> > > As Heiko already pointed out we could simply use a small inline function
> > > that calls cpu_is_preempted from arch/s390/lib/spinlock (or
> > > smp_vcpu_scheduled from smp.c)
> >
> > Sure, and I had vague memories of Heiko's email. This patch set however
> > completely fails to do that trivial hooking up.
> 
> sorry for that.
> I will try to work it out on x86.

x86 has no hypervisor support for this yet, and I'd like to understand the
desired semantics first, so I don't think it should block this series.  In
particular, there are at least the following choices:

1) an exit to userspace (5,000-10,000 clock cycles best case) counts as
lock holder preemption

2) any time the vCPU thread is not running counts as lock holder
preemption

To implement the latter you'd need a hypercall or MSR (at least as
a slow path), because the KVM preempt notifier is only active
during the KVM_RUN ioctl.

Paolo

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/4] implement vcpu preempted check
  2016-09-30  6:58           ` Paolo Bonzini
@ 2016-09-30  8:52             ` Pan Xinhui
  -1 siblings, 0 replies; 37+ messages in thread
From: Pan Xinhui @ 2016-09-30  8:52 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Zijlstra, Christian Borntraeger, Pan Xinhui, linux-kernel,
	linuxppc-dev, virtualization, linux-s390, xen-devel-request, kvm,
	benh, paulus, mpe, mingo, paulmck, will deacon, kernellwp,
	jgross, bsingharora, Heiko Carstens


hi, Paolo
	thanks for your reply.

On 2016/9/30 14:58, Paolo Bonzini wrote:
>>>>> Please consider s390 and (x86/arm) KVM. Once we have a few, more can
>>>>> follow later, but I think its important to not only have PPC support for
>>>>> this.
>>>>
>>>> Actually the s390 preemted check via sigp sense running  is available for
>>>> all hypervisors (z/VM, LPAR and KVM) which implies everywhere as you can
>>>> no longer buy s390 systems without LPAR.
>>>>
>>>> As Heiko already pointed out we could simply use a small inline function
>>>> that calls cpu_is_preempted from arch/s390/lib/spinlock (or
>>>> smp_vcpu_scheduled from smp.c)
>>>
>>> Sure, and I had vague memories of Heiko's email. This patch set however
>>> completely fails to do that trivial hooking up.
>>
>> sorry for that.
>> I will try to work it out on x86.
>
> x86 has no hypervisor support, and I'd like to understand the desired
> semantics first, so I don't think it should block this series.  In

Once a guest does a hypercall or something similar, IOW there is a kvm_guest_exit, we treat this as lock holder preemption.
And PPC implements it this way.

> particular, there are at least the following choices:
>
> 1) exit to userspace (5-10.000 clock cycles best case) counts as
> lock holder preemption
>
Just to avoid any misunderstanding:
you are saying that the guest does an IO operation, for example, and then exits to QEMU, right?
Yes, in that scenario it's hard to guarantee that such an IO operation, or something like it, could be finished in time.

  
> 2) any time the vCPU thread not running counts as lock holder
> preemption
>
> To implement the latter you'd need a hypercall or MSR (at least as
> a slow path), because the KVM preempt notifier is only active
> during the KVM_RUN ioctl.
>
That seems a little expensive. :(
How many clock cycles might it cost?

I am still looking for a struct shared between KVM and the guest kernel on x86,
so that every time kvm_guest_exit/enter is called we can store some info in it, and the guest kernel can quickly check whether a vcpu is running or not. A rough sketch of the idea is below.
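
Something like this, purely illustrative (the struct, field names and hooks
are made up for the example, not an existing KVM ABI):

/* Hypothetical area shared between host and guest, one entry per vcpu. */
struct vcpu_run_state {
	u32 preempted;		/* written by the host, read by the guest */
	u32 pad[15];
};

/* Host side, conceptually around kvm_guest_exit/enter or sched in/out. */
static void host_set_preempted(struct vcpu_run_state *rs, bool preempted)
{
	WRITE_ONCE(rs->preempted, preempted);
}

/* Guest side: cheap check backing vcpu_is_preempted(cpu). */
static bool guest_vcpu_is_preempted(struct vcpu_run_state *states, int cpu)
{
	return READ_ONCE(states[cpu].preempted) != 0;
}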

thanks
xinhui

> Paolo
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/4] implement vcpu preempted check
@ 2016-09-30  8:52             ` Pan Xinhui
  0 siblings, 0 replies; 37+ messages in thread
From: Pan Xinhui @ 2016-09-30  8:52 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kernellwp, linux-s390, benh, jgross, kvm, Peter Zijlstra,
	Pan Xinhui, will deacon, linux-kernel, Heiko Carstens,
	virtualization, mingo, paulus, mpe, xen-devel-request, paulmck,
	linuxppc-dev


hi, Paolo
	thanks for your reply.

On 2016/9/30 14:58, Paolo Bonzini wrote:
>>>>> Please consider s390 and (x86/arm) KVM. Once we have a few, more can
>>>>> follow later, but I think its important to not only have PPC support for
>>>>> this.
>>>>
>>>> Actually the s390 preemted check via sigp sense running  is available for
>>>> all hypervisors (z/VM, LPAR and KVM) which implies everywhere as you can
>>>> no longer buy s390 systems without LPAR.
>>>>
>>>> As Heiko already pointed out we could simply use a small inline function
>>>> that calls cpu_is_preempted from arch/s390/lib/spinlock (or
>>>> smp_vcpu_scheduled from smp.c)
>>>
>>> Sure, and I had vague memories of Heiko's email. This patch set however
>>> completely fails to do that trivial hooking up.
>>
>> sorry for that.
>> I will try to work it out on x86.
>
> x86 has no hypervisor support, and I'd like to understand the desired
> semantics first, so I don't think it should block this series.  In

Once a guest does a hypercall or something similar, IOW there is a kvm_guest_exit, we treat this as lock holder preemption.
And PPC implements it this way.

> particular, there are at least the following choices:
>
> 1) exit to userspace (5-10.000 clock cycles best case) counts as
> lock holder preemption
>
Just to avoid any misunderstanding:
you are saying that the guest does an IO operation, for example, and then exits to QEMU, right?
Yes, in that scenario it's hard to guarantee that such an IO operation, or something like it, could be finished in time.

  
> 2) any time the vCPU thread not running counts as lock holder
> preemption
>
> To implement the latter you'd need a hypercall or MSR (at least as
> a slow path), because the KVM preempt notifier is only active
> during the KVM_RUN ioctl.
>
That seems a little expensive. :(
How many clock cycles might it cost?

I am still looking for a struct shared between KVM and the guest kernel on x86,
so that every time kvm_guest_exit/enter is called we can store some info in it, and the guest kernel can quickly check whether a vcpu is running or not.

thanks
xinhui

> Paolo
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/4] implement vcpu preempted check
  2016-09-30  8:52             ` Pan Xinhui
  (?)
  (?)
@ 2016-09-30  9:08             ` Paolo Bonzini
  2016-09-30  9:31                 ` Pan Xinhui
  -1 siblings, 1 reply; 37+ messages in thread
From: Paolo Bonzini @ 2016-09-30  9:08 UTC (permalink / raw)
  To: Pan Xinhui
  Cc: Peter Zijlstra, Christian Borntraeger, Pan Xinhui, linux-kernel,
	linuxppc-dev, virtualization, linux-s390, xen-devel-request, kvm,
	benh, paulus, mpe, mingo, paulmck, will deacon, kernellwp,
	jgross, bsingharora, Heiko Carstens



On 30/09/2016 10:52, Pan Xinhui wrote:
>> x86 has no hypervisor support, and I'd like to understand the desired
>> semantics first, so I don't think it should block this series.  In
> 
> Once a guest do a hypercall or something similar, IOW, there is a
> kvm_guest_exit. we think this is a lock holder preemption.
> Adn PPC implement it in this way.

Ok, good.

>> particular, there are at least the following choices:
>>
>> 1) exit to userspace (5-10.000 clock cycles best case) counts as
>> lock holder preemption
>>
>> 2) any time the vCPU thread not running counts as lock holder
>> preemption
>>
>> To implement the latter you'd need a hypercall or MSR (at least as
>> a slow path), because the KVM preempt notifier is only active
>> during the KVM_RUN ioctl.
>
> seems a little expensive. :(
> How many clock cycles it might cost.

An MSR read is about 1500 clock cycles, but it need not be the fast path
(e.g. use a bit to check if the CPU is running; if not, use the MSR to
check whether the CPU is in userspace but the CPU thread is still scheduled).
But it's not necessary if you are just matching PPC semantics.

Then the simplest thing is to use the kvm_steal_time struct, and add a
new field to it that replaces pad[0].  You can write a 0 to the flag in
record_steal_time (not preempted) and a 1 in kvm_arch_vcpu_put
(preempted).  record_steal_time is called before the VM starts running,
immediately after KVM_RUN and also after every sched_in.

If KVM doesn't implement the flag, it won't touch that field at all.  So
the kernel can write a 0, meaning "not preempted", and not care if the
hypervisor implements the flag or not: the answer will always be safe.

The pointer to the flag can be placed in a per-cpu u32*, and again if
the u32* is NULL that means "not preempted".
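
On the guest side that could look roughly like the sketch below; the variable
and function names are illustrative, the series does not define them yet:

static DEFINE_PER_CPU(u32 *, vcpu_preempted_flag);

static bool x86_vcpu_is_preempted(int cpu)
{
	u32 *flag = per_cpu(vcpu_preempted_flag, cpu);

	/* No flag published by the hypervisor: report "not preempted". */
	if (!flag)
		return false;

	return READ_ONCE(*flag) != 0;
}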

Paolo


> I am still looking for one shared struct between kvm and guest kernel on
> x86.
> and every time kvm_guest_exit/enter called, we store some info in it. So
> guest kernel can check one vcpu is running or not quickly.
> 
> thanks
> xinhui
> 
>> Paolo
>>
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/4] implement vcpu preempted check
  2016-09-30  8:52             ` Pan Xinhui
  (?)
@ 2016-09-30  9:08             ` Paolo Bonzini
  -1 siblings, 0 replies; 37+ messages in thread
From: Paolo Bonzini @ 2016-09-30  9:08 UTC (permalink / raw)
  To: Pan Xinhui
  Cc: kernellwp, linux-s390, benh, jgross, kvm, Peter Zijlstra,
	Pan Xinhui, will deacon, linux-kernel, Heiko Carstens,
	virtualization, mingo, paulus, mpe, xen-devel-request, paulmck,
	linuxppc-dev



On 30/09/2016 10:52, Pan Xinhui wrote:
>> x86 has no hypervisor support, and I'd like to understand the desired
>> semantics first, so I don't think it should block this series.  In
> 
> Once a guest do a hypercall or something similar, IOW, there is a
> kvm_guest_exit. we think this is a lock holder preemption.
> Adn PPC implement it in this way.

Ok, good.

>> particular, there are at least the following choices:
>>
>> 1) exit to userspace (5-10.000 clock cycles best case) counts as
>> lock holder preemption
>>
>> 2) any time the vCPU thread not running counts as lock holder
>> preemption
>>
>> To implement the latter you'd need a hypercall or MSR (at least as
>> a slow path), because the KVM preempt notifier is only active
>> during the KVM_RUN ioctl.
>
> seems a little expensive. :(
> How many clock cycles it might cost.

An MSR read is about 1500 clock cycles, but it need not be the fast path
(e.g. use a bit to check if the CPU is running; if not, use the MSR to
check whether the CPU is in userspace but the CPU thread is still scheduled).
But it's not necessary if you are just matching PPC semantics.

Then the simplest thing is to use the kvm_steal_time struct, and add a
new field to it that replaces pad[0].  You can write a 0 to the flag in
record_steal_time (not preempted) and a 1 in kvm_arch_vcpu_put
(preempted).  record_steal_time is called before the VM starts running,
immediately after KVM_RUN and also after every sched_in.

If KVM doesn't implement the flag, it won't touch that field at all.  So
the kernel can write a 0, meaning "not preempted", and not care if the
hypervisor implements the flag or not: the answer will always be safe.

The pointer to the flag can be placed in a per-cpu u32*, and again if
the u32* is NULL that means "not preempted".

Paolo


> I am still looking for one shared struct between kvm and guest kernel on
> x86.
> and every time kvm_guest_exit/enter called, we store some info in it. So
> guest kernel can check one vcpu is running or not quickly.
> 
> thanks
> xinhui
> 
>> Paolo
>>
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/4] implement vcpu preempted check
  2016-09-30  9:08             ` Paolo Bonzini
@ 2016-09-30  9:31                 ` Pan Xinhui
  0 siblings, 0 replies; 37+ messages in thread
From: Pan Xinhui @ 2016-09-30  9:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Peter Zijlstra, Christian Borntraeger, Pan Xinhui, linux-kernel,
	linuxppc-dev, virtualization, linux-s390, xen-devel-request, kvm,
	benh, paulus, mpe, mingo, paulmck, will deacon, kernellwp,
	jgross, bsingharora, Heiko Carstens



On 2016/9/30 17:08, Paolo Bonzini wrote:
>
>
> On 30/09/2016 10:52, Pan Xinhui wrote:
>>> x86 has no hypervisor support, and I'd like to understand the desired
>>> semantics first, so I don't think it should block this series.  In
>>
>> Once a guest do a hypercall or something similar, IOW, there is a
>> kvm_guest_exit. we think this is a lock holder preemption.
>> Adn PPC implement it in this way.
>
> Ok, good.
>
>>> particular, there are at least the following choices:
>>>
>>> 1) exit to userspace (5-10.000 clock cycles best case) counts as
>>> lock holder preemption
>>>
>>> 2) any time the vCPU thread not running counts as lock holder
>>> preemption
>>>
>>> To implement the latter you'd need a hypercall or MSR (at least as
>>> a slow path), because the KVM preempt notifier is only active
>>> during the KVM_RUN ioctl.
>>
>> seems a little expensive. :(
>> How many clock cycles it might cost.
>
> An MSR read is about 1500 clock cycles, but it need not be the fast path
> (e.g. use a bit to check if the CPU is running, if not use the MSR to
> check if the CPU is in userspace but the CPU thread is scheduled).  But
> it's not necessary if you are just matching PPC semantics.
>
> Then the simplest thing is to use the kvm_steal_time struct, and add a
> new field to it that replaces pad[0].  You can write a 0 to the flag in
> record_steal_time (not preempted) and a 1 in kvm_arch_vcpu_put
> (preempted).  record_steal_time is called before the VM starts running,
> immediately after KVM_RUN and also after every sched_in.
>
> If KVM doesn't implement the flag, it won't touch that field at all.  So
> the kernel can write a 0, meaning "not preempted", and not care if the
> hypervisor implements the flag or not: the answer will always be safe.
>
> The pointer to the flag can be placed in a per-cpu u32*, and again if
> the u32* is NULL that means "not preempted".
>
really nice suggestion!  That's what I want :)

thanks
xinhui

> Paolo
>
>
>> I am still looking for one shared struct between kvm and guest kernel on
>> x86.
>> and every time kvm_guest_exit/enter called, we store some info in it. So
>> guest kernel can check one vcpu is running or not quickly.
>>
>> thanks
>> xinhui
>>
>>> Paolo
>>>
>>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/4] implement vcpu preempted check
@ 2016-09-30  9:31                 ` Pan Xinhui
  0 siblings, 0 replies; 37+ messages in thread
From: Pan Xinhui @ 2016-09-30  9:31 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kernellwp, linux-s390, benh, jgross, kvm, Peter Zijlstra,
	Pan Xinhui, will deacon, linux-kernel, Heiko Carstens,
	virtualization, mingo, paulus, mpe, xen-devel-request, paulmck,
	linuxppc-dev



On 2016/9/30 17:08, Paolo Bonzini wrote:
>
>
> On 30/09/2016 10:52, Pan Xinhui wrote:
>>> x86 has no hypervisor support, and I'd like to understand the desired
>>> semantics first, so I don't think it should block this series.  In
>>
>> Once a guest do a hypercall or something similar, IOW, there is a
>> kvm_guest_exit. we think this is a lock holder preemption.
>> Adn PPC implement it in this way.
>
> Ok, good.
>
>>> particular, there are at least the following choices:
>>>
>>> 1) exit to userspace (5-10.000 clock cycles best case) counts as
>>> lock holder preemption
>>>
>>> 2) any time the vCPU thread not running counts as lock holder
>>> preemption
>>>
>>> To implement the latter you'd need a hypercall or MSR (at least as
>>> a slow path), because the KVM preempt notifier is only active
>>> during the KVM_RUN ioctl.
>>
>> seems a little expensive. :(
>> How many clock cycles it might cost.
>
> An MSR read is about 1500 clock cycles, but it need not be the fast path
> (e.g. use a bit to check if the CPU is running, if not use the MSR to
> check if the CPU is in userspace but the CPU thread is scheduled).  But
> it's not necessary if you are just matching PPC semantics.
>
> Then the simplest thing is to use the kvm_steal_time struct, and add a
> new field to it that replaces pad[0].  You can write a 0 to the flag in
> record_steal_time (not preempted) and a 1 in kvm_arch_vcpu_put
> (preempted).  record_steal_time is called before the VM starts running,
> immediately after KVM_RUN and also after every sched_in.
>
> If KVM doesn't implement the flag, it won't touch that field at all.  So
> the kernel can write a 0, meaning "not preempted", and not care if the
> hypervisor implements the flag or not: the answer will always be safe.
>
> The pointer to the flag can be placed in a per-cpu u32*, and again if
> the u32* is NULL that means "not preempted".
>
really nice suggestion!  That's what I want :)

thanks
xinhui

> Paolo
>
>
>> I am still looking for one shared struct between kvm and guest kernel on
>> x86.
>> and every time kvm_guest_exit/enter called, we store some info in it. So
>> guest kernel can check one vcpu is running or not quickly.
>>
>> thanks
>> xinhui
>>
>>> Paolo
>>>
>>
>


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/4] implement vcpu preempted check
  2016-09-30  6:58           ` Paolo Bonzini
@ 2016-09-30 10:44             ` Christian Borntraeger
  -1 siblings, 0 replies; 37+ messages in thread
From: Christian Borntraeger @ 2016-09-30 10:44 UTC (permalink / raw)
  To: Paolo Bonzini, Pan Xinhui
  Cc: Peter Zijlstra, Pan Xinhui, linux-kernel, linuxppc-dev,
	virtualization, linux-s390, xen-devel-request, kvm, benh, paulus,
	mpe, mingo, paulmck, will deacon, kernellwp, jgross, bsingharora,
	Heiko Carstens

On 09/30/2016 08:58 AM, Paolo Bonzini wrote:
>>>>> Please consider s390 and (x86/arm) KVM. Once we have a few, more can
>>>>> follow later, but I think its important to not only have PPC support for
>>>>> this.
>>>>
>>>> Actually the s390 preemted check via sigp sense running  is available for
>>>> all hypervisors (z/VM, LPAR and KVM) which implies everywhere as you can
>>>> no longer buy s390 systems without LPAR.
>>>>
>>>> As Heiko already pointed out we could simply use a small inline function
>>>> that calls cpu_is_preempted from arch/s390/lib/spinlock (or
>>>> smp_vcpu_scheduled from smp.c)
>>>
>>> Sure, and I had vague memories of Heiko's email. This patch set however
>>> completely fails to do that trivial hooking up.
>>
>> sorry for that.
>> I will try to work it out on x86.
> 
> x86 has no hypervisor support, and I'd like to understand the desired
> semantics first, so I don't think it should block this series.  In
> particular, there are at least the following choices:

I think the semantics can be slightly different for different architectures;
after all, it is still a heuristic to improve performance.
> 
> 1) exit to userspace (5-10.000 clock cycles best case) counts as
> lock holder preemption
> 
> 2) any time the vCPU thread not running counts as lock holder
> preemption
> 
> To implement the latter you'd need a hypercall or MSR (at least as
> a slow path), because the KVM preempt notifier is only active
> during the KVM_RUN ioctl.

FWIW, the s390 implementation uses kvm_arch_vcpu_put/load as trigger
points for (un)setting CPUSTAT_RUNNING. Strictly speaking an exit to
userspace is not a preemption, but as KVM has no control over whether we are
being scheduled out while in QEMU, this is the compromise that seems to work
quite well for the s390 spinlock code (which checks the running state before
doing a yield hypercall).
In addition, an exit to QEMU is really a rare case.
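
Roughly, that spin-wait path has the following shape (simplified, not the
exact code in arch/s390/lib/spinlock.c, and using the arch_vcpu_is_preempted
name from the hook proposed earlier in this thread):

static void spin_wait_on_owner(arch_spinlock_t *lp, int owner_cpu)
{
	while (arch_spin_is_locked(lp)) {
		/*
		 * Only issue the (expensive) directed-yield hypercall when
		 * the owner is not currently running; otherwise just
		 * busy-wait on the lock word.
		 */
		if (arch_vcpu_is_preempted(owner_cpu))
			smp_yield_cpu(owner_cpu);
		else
			cpu_relax();
	}
}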

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/4] implement vcpu preempted check
@ 2016-09-30 10:44             ` Christian Borntraeger
  0 siblings, 0 replies; 37+ messages in thread
From: Christian Borntraeger @ 2016-09-30 10:44 UTC (permalink / raw)
  To: Paolo Bonzini, Pan Xinhui
  Cc: kernellwp, linux-s390, benh, jgross, kvm, Peter Zijlstra,
	Pan Xinhui, will deacon, linux-kernel, Heiko Carstens,
	virtualization, mingo, paulus, mpe, xen-devel-request, paulmck,
	linuxppc-dev

On 09/30/2016 08:58 AM, Paolo Bonzini wrote:
>>>>> Please consider s390 and (x86/arm) KVM. Once we have a few, more can
>>>>> follow later, but I think its important to not only have PPC support for
>>>>> this.
>>>>
>>>> Actually the s390 preemted check via sigp sense running  is available for
>>>> all hypervisors (z/VM, LPAR and KVM) which implies everywhere as you can
>>>> no longer buy s390 systems without LPAR.
>>>>
>>>> As Heiko already pointed out we could simply use a small inline function
>>>> that calls cpu_is_preempted from arch/s390/lib/spinlock (or
>>>> smp_vcpu_scheduled from smp.c)
>>>
>>> Sure, and I had vague memories of Heiko's email. This patch set however
>>> completely fails to do that trivial hooking up.
>>
>> sorry for that.
>> I will try to work it out on x86.
> 
> x86 has no hypervisor support, and I'd like to understand the desired
> semantics first, so I don't think it should block this series.  In
> particular, there are at least the following choices:

I think the semantics can be slightly different for different architectures;
after all, it is still a heuristic to improve performance.
> 
> 1) exit to userspace (5-10.000 clock cycles best case) counts as
> lock holder preemption
> 
> 2) any time the vCPU thread not running counts as lock holder
> preemption
> 
> To implement the latter you'd need a hypercall or MSR (at least as
> a slow path), because the KVM preempt notifier is only active
> during the KVM_RUN ioctl.

FWIW, the s390 implementation uses kvm_arch_vcpu_put/load as trigger
points for (un)setting CPUSTAT_RUNNING. Strictly speaking an exit to
userspace is not a preemption, but as KVM has no control over whether we are
being scheduled out while in QEMU, this is the compromise that seems to work
quite well for the s390 spinlock code (which checks the running state before
doing a yield hypercall).
In addition, an exit to QEMU is really a rare case.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/4] implement vcpu preempted check
  2016-07-21 11:45 [PATCH v3 0/4] implement vcpu preempted check Pan Xinhui
@ 2016-10-05 11:00   ` Christian Borntraeger
  2016-07-21 11:45 ` Pan Xinhui
                     ` (7 subsequent siblings)
  8 siblings, 0 replies; 37+ messages in thread
From: Christian Borntraeger @ 2016-10-05 11:00 UTC (permalink / raw)
  To: Pan Xinhui, linux-kernel, linuxppc-dev, virtualization,
	linux-s390, xen-devel-request, kvm
  Cc: benh, paulus, mpe, mingo, peterz, paulmck, will.deacon,
	kernellwp, jgross, pbonzini, bsingharora

On 07/21/2016 01:45 PM, Pan Xinhui wrote:
> change from v2:
> 	no code change, fix typos, update some comments
> 
> change from v1:
> 	a simplier definition of default vcpu_is_preempted
> 	skip mahcine type check on ppc, and add config. remove dedicated macro.
> 	add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner. 
> 	add more comments
> 	thanks boqun and Peter's suggestion.
> 
> This patch set aims to fix lock holder preemption issues.
> 
> test-case:
> perf record -a perf bench sched messaging -g 400 -p && perf report
> 
> before patch:
> 18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
> 12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
>  5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
>  3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
>  3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
>  3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
>  2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call
> 
> after patch:
>  9.99%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
>  5.28%  sched-messaging  [unknown]         [H] 0xc0000000000768e0
>  4.27%  sched-messaging  [kernel.vmlinux]  [k] __copy_tofrom_user_power7
>  3.77%  sched-messaging  [kernel.vmlinux]  [k] copypage_power7
>  3.24%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
>  3.02%  sched-messaging  [kernel.vmlinux]  [k] system_call
>  2.69%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
> 
> We introduce interface bool vcpu_is_preempted(int cpu) and use it in some spin
> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
> These spin_on_onwer variant also cause rcu stall before we apply this patch set
> 
> Pan Xinhui (4):
>   kernel/sched: introduce vcpu preempted check interface
>   powerpc/spinlock: support vcpu preempted check
>   locking/osq: Drop the overhead of osq_lock()
>   kernel/locking: Drop the overhead of {mutex,rwsem}_spin_on_owner
> 
>  arch/powerpc/include/asm/spinlock.h | 18 ++++++++++++++++++
>  include/linux/sched.h               | 12 ++++++++++++
>  kernel/locking/mutex.c              | 15 +++++++++++++--
>  kernel/locking/osq_lock.c           | 10 +++++++++-
>  kernel/locking/rwsem-xadd.c         | 16 +++++++++++++---
>  5 files changed, 65 insertions(+), 6 deletions(-)
> 

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
for the full series. With my patch on top this really improves
some benchmarks for overcommitted KVM guests.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH v3 0/4] implement vcpu preempted check
@ 2016-10-05 11:00   ` Christian Borntraeger
  0 siblings, 0 replies; 37+ messages in thread
From: Christian Borntraeger @ 2016-10-05 11:00 UTC (permalink / raw)
  To: Pan Xinhui, linux-kernel, linuxppc-dev, virtualization,
	linux-s390, xen-devel-request, kvm
  Cc: kernellwp, jgross, peterz, benh, will.deacon, mingo, paulus, mpe,
	pbonzini, paulmck

On 07/21/2016 01:45 PM, Pan Xinhui wrote:
> change from v2:
> 	no code change, fix typos, update some comments
> 
> change from v1:
> 	a simplier definition of default vcpu_is_preempted
> 	skip mahcine type check on ppc, and add config. remove dedicated macro.
> 	add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner. 
> 	add more comments
> 	thanks boqun and Peter's suggestion.
> 
> This patch set aims to fix lock holder preemption issues.
> 
> test-case:
> perf record -a perf bench sched messaging -g 400 -p && perf report
> 
> before patch:
> 18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
> 12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
>  5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
>  3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
>  3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
>  3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
>  2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call
> 
> after patch:
>  9.99%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
>  5.28%  sched-messaging  [unknown]         [H] 0xc0000000000768e0
>  4.27%  sched-messaging  [kernel.vmlinux]  [k] __copy_tofrom_user_power7
>  3.77%  sched-messaging  [kernel.vmlinux]  [k] copypage_power7
>  3.24%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
>  3.02%  sched-messaging  [kernel.vmlinux]  [k] system_call
>  2.69%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
> 
> We introduce interface bool vcpu_is_preempted(int cpu) and use it in some spin
> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
> These spin_on_onwer variant also cause rcu stall before we apply this patch set
> 
> Pan Xinhui (4):
>   kernel/sched: introduce vcpu preempted check interface
>   powerpc/spinlock: support vcpu preempted check
>   locking/osq: Drop the overhead of osq_lock()
>   kernel/locking: Drop the overhead of {mutex,rwsem}_spin_on_owner
> 
>  arch/powerpc/include/asm/spinlock.h | 18 ++++++++++++++++++
>  include/linux/sched.h               | 12 ++++++++++++
>  kernel/locking/mutex.c              | 15 +++++++++++++--
>  kernel/locking/osq_lock.c           | 10 +++++++++-
>  kernel/locking/rwsem-xadd.c         | 16 +++++++++++++---
>  5 files changed, 65 insertions(+), 6 deletions(-)
> 

Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
for the full series. With my patch on top this really improves
some benchmarks for overcommitted KVM guests.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH v3 0/4] implement vcpu preempted check
@ 2016-07-21 11:45 Pan Xinhui
  0 siblings, 0 replies; 37+ messages in thread
From: Pan Xinhui @ 2016-07-21 11:45 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev, virtualization, linux-s390,
	xen-devel-request, kvm
  Cc: kernellwp, jgross, Pan Xinhui, peterz, benh, will.deacon, mingo,
	paulus, mpe, pbonzini, paulmck

change from v2:
	no code change, fix typos, update some comments

change from v1:
	a simplier definition of default vcpu_is_preempted
	skip mahcine type check on ppc, and add config. remove dedicated macro.
	add one patch to drop overload of rwsem_spin_on_owner and mutex_spin_on_owner. 
	add more comments
	thanks boqun and Peter's suggestion.

This patch set aims to fix lock holder preemption issues.

test-case:
perf record -a perf bench sched messaging -g 400 -p && perf report

before patch:
18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
 5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
 3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
 3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
 2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call

after patch:
 9.99%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 5.28%  sched-messaging  [unknown]         [H] 0xc0000000000768e0
 4.27%  sched-messaging  [kernel.vmlinux]  [k] __copy_tofrom_user_power7
 3.77%  sched-messaging  [kernel.vmlinux]  [k] copypage_power7
 3.24%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
 3.02%  sched-messaging  [kernel.vmlinux]  [k] system_call
 2.69%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task

We introduce interface bool vcpu_is_preempted(int cpu) and use it in some spin
loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
These spin_on_onwer variant also cause rcu stall before we apply this patch set

Pan Xinhui (4):
  kernel/sched: introduce vcpu preempted check interface
  powerpc/spinlock: support vcpu preempted check
  locking/osq: Drop the overhead of osq_lock()
  kernel/locking: Drop the overhead of {mutex,rwsem}_spin_on_owner

 arch/powerpc/include/asm/spinlock.h | 18 ++++++++++++++++++
 include/linux/sched.h               | 12 ++++++++++++
 kernel/locking/mutex.c              | 15 +++++++++++++--
 kernel/locking/osq_lock.c           | 10 +++++++++-
 kernel/locking/rwsem-xadd.c         | 16 +++++++++++++---
 5 files changed, 65 insertions(+), 6 deletions(-)

-- 
2.4.11

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2016-10-05 11:01 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-07-21 11:45 [PATCH v3 0/4] implement vcpu preempted check Pan Xinhui
2016-07-21 11:45 ` [PATCH v3 1/4] kernel/sched: introduce vcpu preempted check interface Pan Xinhui
2016-07-21 11:45 ` Pan Xinhui
2016-07-21 11:45 ` [PATCH v3 2/4] powerpc/spinlock: support vcpu preempted check Pan Xinhui
2016-07-21 11:45 ` Pan Xinhui
2016-07-21 11:45 ` [PATCH v3 3/4] locking/osq: Drop the overhead of osq_lock() Pan Xinhui
2016-07-21 11:45 ` Pan Xinhui
2016-07-21 11:45 ` [PATCH v3 4/4] kernel/locking: Drop the overhead of {mutex,rwsem}_spin_on_owner Pan Xinhui
2016-07-21 11:45   ` [PATCH v3 4/4] kernel/locking: Drop the overhead of {mutex, rwsem}_spin_on_owner Pan Xinhui
2016-07-21 11:45   ` Pan Xinhui
2016-09-29 10:10 ` [PATCH v3 0/4] implement vcpu preempted check Peter Zijlstra
2016-09-29 10:10   ` Peter Zijlstra
2016-09-29 10:23   ` Christian Borntraeger
2016-09-29 10:23     ` Christian Borntraeger
2016-09-29 10:31     ` Peter Zijlstra
2016-09-29 10:31       ` Peter Zijlstra
2016-09-30  5:03       ` Pan Xinhui
2016-09-30  5:03         ` Pan Xinhui
2016-09-30  6:58         ` Paolo Bonzini
2016-09-30  6:58           ` Paolo Bonzini
2016-09-30  8:52           ` Pan Xinhui
2016-09-30  8:52             ` Pan Xinhui
2016-09-30  9:08             ` Paolo Bonzini
2016-09-30  9:08             ` Paolo Bonzini
2016-09-30  9:31               ` Pan Xinhui
2016-09-30  9:31                 ` Pan Xinhui
2016-09-30 10:44           ` Christian Borntraeger
2016-09-30 10:44             ` Christian Borntraeger
2016-09-29 10:40     ` Christian Borntraeger
2016-09-29 10:40     ` Christian Borntraeger
2016-09-29 11:05       ` Christian Borntraeger
2016-09-29 11:05       ` Christian Borntraeger
2016-09-30  4:00   ` Pan Xinhui
2016-09-30  4:00     ` Pan Xinhui
2016-10-05 11:00 ` Christian Borntraeger
2016-10-05 11:00   ` Christian Borntraeger
  -- strict thread matches above, loose matches on Subject: below --
2016-07-21 11:45 Pan Xinhui
