* Re: [PATCH 1/3] kernel/sched: introduce vcpu preempted check interface
  2016-06-27 17:41 ` [PATCH 1/3] kernel/sched: introduce vcpu preempted check interface Pan Xinhui
@ 2016-06-27 14:00   ` Peter Zijlstra
  2016-06-27 14:02     ` Peter Zijlstra
  2016-06-28  7:00     ` Heiko Carstens
  2016-06-27 14:05   ` Boqun Feng
  1 sibling, 2 replies; 18+ messages in thread
From: Peter Zijlstra @ 2016-06-27 14:00 UTC (permalink / raw)
  To: Pan Xinhui
  Cc: linuxppc-dev, linux-kernel, paulmck, mingo, mpe, paulus, benh,
	Waiman.Long, boqun.feng, will.deacon, dave

On Mon, Jun 27, 2016 at 01:41:28PM -0400, Pan Xinhui wrote:
> +++ b/include/linux/sched.h
> @@ -3293,6 +3293,15 @@ static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)
>  
>  #endif /* CONFIG_SMP */
>  
> +#ifdef arch_vcpu_is_preempted
> +static inline bool vcpu_is_preempted(int cpu)
> +{
> +	return arch_vcpu_is_preempted(cpu);
> +}
> +#else
> +#define vcpu_is_preempted(cpu)	false
> +#endif

#ifndef vcpu_is_preempted
#define vcpu_is_preempted(cpu)		(false)
#endif

Is so much simpler...

Also, please Cc the virt list so that other interested parties can
comment, and maybe also the s390 folks.


* Re: [PATCH 1/3] kernel/sched: introduce vcpu preempted check interface
  2016-06-27 14:00   ` Peter Zijlstra
@ 2016-06-27 14:02     ` Peter Zijlstra
  2016-06-28  3:14       ` xinhui
  2016-06-28  7:00     ` Heiko Carstens
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2016-06-27 14:02 UTC (permalink / raw)
  To: Pan Xinhui
  Cc: linuxppc-dev, linux-kernel, paulmck, mingo, mpe, paulus, benh,
	Waiman.Long, boqun.feng, will.deacon, dave

On Mon, Jun 27, 2016 at 04:00:43PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 27, 2016 at 01:41:28PM -0400, Pan Xinhui wrote:
> > +++ b/include/linux/sched.h
> > @@ -3293,6 +3293,15 @@ static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)
> >  
> >  #endif /* CONFIG_SMP */
> >  
> > +#ifdef arch_vcpu_is_preempted
> > +static inline bool vcpu_is_preempted(int cpu)
> > +{
> > +	return arch_vcpu_is_preempted(cpu);
> > +}
> > +#else
> > +#define vcpu_is_preempted(cpu)	false
> > +#endif
> 
> #ifndef vcpu_is_preempted
> #define vcpu_is_preempted(cpu)		(false)
> #endif
> 
> Is so much simpler...
> 
> Also, please Cc the virt list so that other interested parties can
> comment, and maybe also the s390 folks.

And before you hurry off to post again, add a patch doing
mutex_spin_on_owner() and rwsem_spin_on_owner().
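For the mutex side, such a patch might look roughly like this (a
sketch against the then-current mutex_spin_on_owner(); illustrative
and untested):

static noinline
bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
{
	bool ret = true;

	rcu_read_lock();
	while (lock->owner == owner) {
		barrier();
		/*
		 * Bail when the owner is off-CPU, when we should
		 * reschedule, or when the owner's vCPU is preempted;
		 * spinning is wasted work in all three cases.
		 */
		if (!owner->on_cpu || need_resched() ||
		    vcpu_is_preempted(task_cpu(owner))) {
			ret = false;
			break;
		}
		cpu_relax_lowlatency();
	}
	rcu_read_unlock();

	return ret;
}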


* Re: [PATCH 1/3] kernel/sched: introduce vcpu preempted check interface
  2016-06-27 17:41 ` [PATCH 1/3] kernel/sched: introduce vcpu preempted check interface Pan Xinhui
  2016-06-27 14:00   ` Peter Zijlstra
@ 2016-06-27 14:05   ` Boqun Feng
  2016-06-28  3:15     ` xinhui
  1 sibling, 1 reply; 18+ messages in thread
From: Boqun Feng @ 2016-06-27 14:05 UTC (permalink / raw)
  To: Pan Xinhui
  Cc: linuxppc-dev, linux-kernel, paulmck, peterz, mingo, mpe, paulus,
	benh, Waiman.Long, will.deacon, dave


On Mon, Jun 27, 2016 at 01:41:28PM -0400, Pan Xinhui wrote:
> This helps fix lock holder preemption issues when running as a guest.
> 
> Kernel users can call bool vcpu_is_preempted(int cpu) to detect
> whether a given vCPU is preempted or not.
> 
> The default implementation is a macro defined as false, so the
> compiler can optimize it away if the arch does not support such a
> vCPU preemption check.
> 
> Archs can implement it by defining arch_vcpu_is_preempted().
> 
> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
> ---
>  include/linux/sched.h | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 6e42ada..dc0a9c3 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -3293,6 +3293,15 @@ static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)
>  
>  #endif /* CONFIG_SMP */
>  
> +#ifdef arch_vcpu_is_preempted
> +static inline bool vcpu_is_preempted(int cpu)
> +{
> +	return arch_vcpu_is_preempted(cpu);
> +}
> +#else
> +#define vcpu_is_preempted(cpu)	false
> +#endif
> +

I think you are missing Peter's comment here. We can

#ifndef vcpu_is_preempted
#define vcpu_is_preempted(cpu)	false
#endif

And different archs implement their own versions of vcpu_is_preempted(),
IOW, no need for an arch_vcpu_is_preempted().
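For example, an arch header could then provide its own definition
like this (a sketch; the body is illustrative only):

#define vcpu_is_preempted vcpu_is_preempted
static inline bool vcpu_is_preempted(int cpu)
{
	/* arch-specific check whether the vCPU of @cpu is preempted */
	return false;
}

and the #ifndef fallback in sched.h is then skipped on that arch.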

Regards,
Boqun

>  extern long sched_setaffinity(pid_t pid, const struct cpumask *new_mask);
>  extern long sched_getaffinity(pid_t pid, struct cpumask *mask);
>  
> -- 
> 2.4.11
> 



* Re: [PATCH 2/3] powerpc/spinlock: support vcpu preempted check
  2016-06-27 17:41 ` [PATCH 2/3] powerpc/spinlock: support vcpu preempted check Pan Xinhui
@ 2016-06-27 14:17   ` Peter Zijlstra
  2016-06-28  3:23     ` xinhui
  2016-06-27 14:58   ` Boqun Feng
  1 sibling, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2016-06-27 14:17 UTC (permalink / raw)
  To: Pan Xinhui
  Cc: linuxppc-dev, linux-kernel, paulmck, mingo, mpe, paulus, benh,
	Waiman.Long, boqun.feng, will.deacon, dave

On Mon, Jun 27, 2016 at 01:41:29PM -0400, Pan Xinhui wrote:
> diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
> index 523673d..ae938ee 100644
> --- a/arch/powerpc/include/asm/spinlock.h
> +++ b/arch/powerpc/include/asm/spinlock.h
> @@ -52,6 +52,21 @@
>  #define SYNC_IO
>  #endif
>  
> +/* For fixing some spinning issues in a guest.
> + * kernel would check if vcpu is preempted during a spin loop.
> + * we support that.
> + */

If you look around in that file you'll notice that the above comment
style is inconsistent.

Nor is the comment really clarifying things; for one, you fail to mention
the problem by its known name. You also forget to explain how this
interface will help. How about something like this:

/*
 * In order to deal with various lock holder preemption issues, provide
 * an interface to see if a vCPU is currently running or not.
 *
 * This allows us to terminate optimistic spin loops and block,
 * analogous to the native optimistic spin heuristic of testing if the
 * lock owner task is running or not.
 */

Also, since you now have a useful comment, which is not architecture
specific, I would place it with the common vcpu_is_preempted()
definition in sched.h.

Hmm?
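
I.e., the sched.h bit could then end up looking like this (sketch):

/*
 * In order to deal with various lock holder preemption issues, provide
 * an interface to see if a vCPU is currently running or not.
 *
 * This allows us to terminate optimistic spin loops and block,
 * analogous to the native optimistic spin heuristic of testing if the
 * lock owner task is running or not.
 */
#ifndef vcpu_is_preempted
#define vcpu_is_preempted(cpu)	(false)
#endif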

> +#define arch_vcpu_is_preempted arch_vcpu_is_preempted
> +static inline bool arch_vcpu_is_preempted(int cpu)
> +{
> +	struct lppaca *lp = &lppaca_of(cpu);
> +
> +	if (unlikely(!(lppaca_shared_proc(lp) ||
> +			lppaca_dedicated_proc(lp))))
> +		return false;
> +	return !!(be32_to_cpu(lp->yield_count) & 1);
> +}
> +
>  static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
>  {
>  	return lock.slock == 0;
> -- 
> 2.4.11
> 


* Re: [PATCH 3/3] locking/osq: Drop the overload of osq_lock()
  2016-06-27 17:41 ` [PATCH 3/3] locking/osq: Drop the overload of osq_lock() Pan Xinhui
@ 2016-06-27 14:21   ` Peter Zijlstra
  0 siblings, 0 replies; 18+ messages in thread
From: Peter Zijlstra @ 2016-06-27 14:21 UTC (permalink / raw)
  To: Pan Xinhui
  Cc: linuxppc-dev, linux-kernel, paulmck, mingo, mpe, paulus, benh,
	Waiman.Long, boqun.feng, will.deacon, dave

On Mon, Jun 27, 2016 at 01:41:30PM -0400, Pan Xinhui wrote:
> @@ -118,8 +123,17 @@ bool osq_lock(struct optimistic_spin_queue *lock)
>  	while (!READ_ONCE(node->locked)) {
>  		/*
>  		 * If we need to reschedule bail... so we can block.
> +		 * An over-committed guest with more vCPUs than pCPUs
> +		 * might fall into this loop and cause a huge overload.
> +		 * This is because vCPU A (prev) holds the osq lock and yields
> +		 * out while vCPU B (node) waits for ->locked to be set; IOW,
> +		 * it waits until vCPU A runs and unlocks the osq lock. Such
> +		 * spinning is meaningless, so use vcpu_is_preempted() to
> +		 * detect that case. If the arch does not support the vcpu
> +		 * preempted check, vcpu_is_preempted() is a macro defined as false.

Or you could mention lock holder preemption and everybody will know what
you're talking about.

>  		 */
> -		if (need_resched())
> +		if (need_resched() ||
> +			vcpu_is_preempted(node_cpu(node->prev)))

Did you really need that linebreak?

>  			goto unqueue;
>  
>  		cpu_relax_lowlatency();
> -- 
> 2.4.11
> 
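Taking both points together, the hunk could end up as simple as this
(a sketch):

		/*
		 * If we need to reschedule bail... so we can block.
		 * Also bail on lock holder preemption: if the vCPU running
		 * our predecessor is preempted, ->locked will not be set
		 * until that vCPU runs again, so spinning is pointless.
		 */
		if (need_resched() || vcpu_is_preempted(node_cpu(node->prev)))
			goto unqueue;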


* Re: [PATCH 2/3] powerpc/spinlock: support vcpu preempted check
  2016-06-27 17:41 ` [PATCH 2/3] powerpc/spinlock: support vcpu preempted check Pan Xinhui
  2016-06-27 14:17   ` Peter Zijlstra
@ 2016-06-27 14:58   ` Boqun Feng
  2016-06-28  3:39     ` xinhui
  1 sibling, 1 reply; 18+ messages in thread
From: Boqun Feng @ 2016-06-27 14:58 UTC (permalink / raw)
  To: Pan Xinhui
  Cc: linuxppc-dev, linux-kernel, paulmck, peterz, mingo, mpe, paulus,
	benh, Waiman.Long, will.deacon, dave


Hi Xinhui,

On Mon, Jun 27, 2016 at 01:41:29PM -0400, Pan Xinhui wrote:
> This is to fix some lock holder preemption issues. Spinning on a
> vCPU which is preempted is meaningless.
> 
> The kernel needs such an interface, so let's support it.
> 
> We should also support both the shared and dedicated modes,
> so add an lppaca_dedicated_proc() helper in lppaca.h.
> 
> Suggested-by: Boqun Feng <boqun.feng@gmail.com>
> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/lppaca.h   |  6 ++++++
>  arch/powerpc/include/asm/spinlock.h | 15 +++++++++++++++
>  2 files changed, 21 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
> index d0a2a2f..0a263d3 100644
> --- a/arch/powerpc/include/asm/lppaca.h
> +++ b/arch/powerpc/include/asm/lppaca.h
> @@ -111,12 +111,18 @@ extern struct lppaca lppaca[];
>   * we will have to transition to something better.
>   */
>  #define LPPACA_OLD_SHARED_PROC		2
> +#define LPPACA_OLD_DEDICATED_PROC      (1 << 6)
>  

I think you should describe a little bit about the magic number here,
i.e. what document/specification says this should work, and how this
works.

>  static inline bool lppaca_shared_proc(struct lppaca *l)
>  {
>  	return !!(l->__old_status & LPPACA_OLD_SHARED_PROC);
>  }
>  
> +static inline bool lppaca_dedicated_proc(struct lppaca *l)
> +{
> +	return !!(l->__old_status & LPPACA_OLD_DEDICATED_PROC);
> +}
> +
>  /*
>   * SLB shadow buffer structure as defined in the PAPR.  The save_area
>   * contains adjacent ESID and VSID pairs for each shadowed SLB.  The
> diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
> index 523673d..ae938ee 100644
> --- a/arch/powerpc/include/asm/spinlock.h
> +++ b/arch/powerpc/include/asm/spinlock.h
> @@ -52,6 +52,21 @@
>  #define SYNC_IO
>  #endif
>  
> +/* For fixing some spinning issues in a guest.
> + * kernel would check if vcpu is preempted during a spin loop.
> + * we support that.
> + */
> +#define arch_vcpu_is_preempted arch_vcpu_is_preempted
> +static inline bool arch_vcpu_is_preempted(int cpu)

This function should be guarded by #ifdef PPC_PSERIES .. #endif, right?
Because if the kernel is not compiled with guest support,
vcpu_is_preempted() should always be false, right?
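
That is, something like this (a sketch, assuming CONFIG_PPC_PSERIES
is the right gate for PAPR guest support):

#ifdef CONFIG_PPC_PSERIES
#define arch_vcpu_is_preempted arch_vcpu_is_preempted
static inline bool arch_vcpu_is_preempted(int cpu)
{
	struct lppaca *lp = &lppaca_of(cpu);

	if (unlikely(!(lppaca_shared_proc(lp) || lppaca_dedicated_proc(lp))))
		return false;
	return !!(be32_to_cpu(lp->yield_count) & 1);
}
#endif /* CONFIG_PPC_PSERIES */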

> +{
> +	struct lppaca *lp = &lppaca_of(cpu);
> +
> +	if (unlikely(!(lppaca_shared_proc(lp) ||
> +			lppaca_dedicated_proc(lp))))

Do you want to detect whether we are running in a guest(ie. pseries
kernel) here? Then I wonder whether "machine_is(pseries)" works here.

Regards,
Boqun

> +		return false;
> +	return !!(be32_to_cpu(lp->yield_count) & 1);
> +}
> +
>  static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
>  {
>  	return lock.slock == 0;
> -- 
> 2.4.11
> 



* [PATCH 0/3] implement vcpu preempted check
@ 2016-06-27 17:41 Pan Xinhui
  2016-06-27 17:41 ` [PATCH 1/3] kernel/sched: introduce vcpu preempted check interface Pan Xinhui
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Pan Xinhui @ 2016-06-27 17:41 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: paulmck, peterz, mingo, mpe, paulus, benh, Waiman.Long,
	boqun.feng, will.deacon, dave, Pan Xinhui

This is to fix some serious performance issues on an over-committed guest.

test case:
perf record -a perf bench sched messaging -g 400 -p && perf report

18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
 5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
 3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
 3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
 2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call

osq_lock() spins for a long time with preemption disabled, which is
really bad.

This is because vCPU A holds the osq lock and yields out, while vCPU B
waits for the per-CPU node->locked to be set. IOW, vCPU B waits for
vCPU A to run and unlock the osq lock. Even need_resched() does not
help in such a scenario.

We may also need to fix the other XXX_spin_on_owner() variants later,
based on this patch set; these spin_on_owner variants can cause RCU
stalls.

Pan Xinhui (3):
  powerpc/spinlock: support vcpu preempted check
  locking/osq: Drop the overload of osq_lock()
  kernel/sched: introduce vcpu preempted check interface

 arch/powerpc/include/asm/lppaca.h   |  6 ++++++
 arch/powerpc/include/asm/spinlock.h | 15 +++++++++++++++
 include/linux/sched.h               |  9 +++++++++
 kernel/locking/osq_lock.c           | 16 +++++++++++++++-
 4 files changed, 45 insertions(+), 1 deletion(-)

-- 
2.4.11


* [PATCH 1/3] kernel/sched: introduce vcpu preempted check interface
  2016-06-27 17:41 [PATCH 0/3] implement vcpu preempted check Pan Xinhui
@ 2016-06-27 17:41 ` Pan Xinhui
  2016-06-27 14:00   ` Peter Zijlstra
  2016-06-27 14:05   ` Boqun Feng
  2016-06-27 17:41 ` [PATCH 2/3] powerpc/spinlock: support vcpu preempted check Pan Xinhui
  2016-06-27 17:41 ` [PATCH 3/3] locking/osq: Drop the overload of osq_lock() Pan Xinhui
  2 siblings, 2 replies; 18+ messages in thread
From: Pan Xinhui @ 2016-06-27 17:41 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: paulmck, peterz, mingo, mpe, paulus, benh, Waiman.Long,
	boqun.feng, will.deacon, dave, Pan Xinhui

This helps fix lock holder preemption issues when running as a guest.

Kernel users can call bool vcpu_is_preempted(int cpu) to detect
whether a given vCPU is preempted or not.

The default implementation is a macro defined as false, so the
compiler can optimize it away if the arch does not support such a
vCPU preemption check.

Archs can implement it by defining arch_vcpu_is_preempted().

Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
 include/linux/sched.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6e42ada..dc0a9c3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3293,6 +3293,15 @@ static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)
 
 #endif /* CONFIG_SMP */
 
+#ifdef arch_vcpu_is_preempted
+static inline bool vcpu_is_preempted(int cpu)
+{
+	return arch_vcpu_is_preempted(cpu);
+}
+#else
+#define vcpu_is_preempted(cpu)	false
+#endif
+
 extern long sched_setaffinity(pid_t pid, const struct cpumask *new_mask);
 extern long sched_getaffinity(pid_t pid, struct cpumask *mask);
 
-- 
2.4.11


* [PATCH 2/3] powerpc/spinlock: support vcpu preempted check
  2016-06-27 17:41 [PATCH 0/3] implement vcpu preempted check Pan Xinhui
  2016-06-27 17:41 ` [PATCH 1/3] kernel/sched: introduce vcpu preempted check interface Pan Xinhui
@ 2016-06-27 17:41 ` Pan Xinhui
  2016-06-27 14:17   ` Peter Zijlstra
  2016-06-27 14:58   ` Boqun Feng
  2016-06-27 17:41 ` [PATCH 3/3] locking/osq: Drop the overload of osq_lock() Pan Xinhui
  2 siblings, 2 replies; 18+ messages in thread
From: Pan Xinhui @ 2016-06-27 17:41 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: paulmck, peterz, mingo, mpe, paulus, benh, Waiman.Long,
	boqun.feng, will.deacon, dave, Pan Xinhui

This is to fix some lock holder preemption issues. Spinning on a
vCPU which is preempted is meaningless.

The kernel needs such an interface, so let's support it.

We should also support both the shared and dedicated modes,
so add an lppaca_dedicated_proc() helper in lppaca.h.

Suggested-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/lppaca.h   |  6 ++++++
 arch/powerpc/include/asm/spinlock.h | 15 +++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
index d0a2a2f..0a263d3 100644
--- a/arch/powerpc/include/asm/lppaca.h
+++ b/arch/powerpc/include/asm/lppaca.h
@@ -111,12 +111,18 @@ extern struct lppaca lppaca[];
  * we will have to transition to something better.
  */
 #define LPPACA_OLD_SHARED_PROC		2
+#define LPPACA_OLD_DEDICATED_PROC      (1 << 6)
 
 static inline bool lppaca_shared_proc(struct lppaca *l)
 {
 	return !!(l->__old_status & LPPACA_OLD_SHARED_PROC);
 }
 
+static inline bool lppaca_dedicated_proc(struct lppaca *l)
+{
+	return !!(l->__old_status & LPPACA_OLD_DEDICATED_PROC);
+}
+
 /*
  * SLB shadow buffer structure as defined in the PAPR.  The save_area
  * contains adjacent ESID and VSID pairs for each shadowed SLB.  The
diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
index 523673d..ae938ee 100644
--- a/arch/powerpc/include/asm/spinlock.h
+++ b/arch/powerpc/include/asm/spinlock.h
@@ -52,6 +52,21 @@
 #define SYNC_IO
 #endif
 
+/* For fixing some spinning issues in a guest.
+ * kernel would check if vcpu is preempted during a spin loop.
+ * we support that.
+ */
+#define arch_vcpu_is_preempted arch_vcpu_is_preempted
+static inline bool arch_vcpu_is_preempted(int cpu)
+{
+	struct lppaca *lp = &lppaca_of(cpu);
+
+	if (unlikely(!(lppaca_shared_proc(lp) ||
+			lppaca_dedicated_proc(lp))))
+		return false;
+	return !!(be32_to_cpu(lp->yield_count) & 1);
+}
+
 static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 {
 	return lock.slock == 0;
-- 
2.4.11


* [PATCH 3/3] locking/osq: Drop the overload of osq_lock()
  2016-06-27 17:41 [PATCH 0/3] implement vcpu preempted check Pan Xinhui
  2016-06-27 17:41 ` [PATCH 1/3] kernel/sched: introduce vcpu preempted check interface Pan Xinhui
  2016-06-27 17:41 ` [PATCH 2/3] powerpc/spinlock: support vcpu preempted check Pan Xinhui
@ 2016-06-27 17:41 ` Pan Xinhui
  2016-06-27 14:21   ` Peter Zijlstra
  2 siblings, 1 reply; 18+ messages in thread
From: Pan Xinhui @ 2016-06-27 17:41 UTC (permalink / raw)
  To: linuxppc-dev, linux-kernel
  Cc: paulmck, peterz, mingo, mpe, paulus, benh, Waiman.Long,
	boqun.feng, will.deacon, dave, Pan Xinhui

An over-committed guest with more vCPUs than pCPUs suffers heavy
overhead in osq_lock().

This is because vCPU A holds the osq lock and yields out, while vCPU B
waits for the per-CPU node->locked to be set. IOW, vCPU B waits for
vCPU A to run and unlock the osq lock. Such spinning is meaningless.

So let's use vcpu_is_preempted() to detect whether we need to stop
spinning.

test case:
perf record -a perf bench sched messaging -g 400 -p && perf report

before patch:
18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
 5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
 3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
 3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
 2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call

after patch:
20.68%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner
 8.45%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 4.12%  sched-messaging  [kernel.vmlinux]  [k] system_call
 3.01%  sched-messaging  [kernel.vmlinux]  [k] system_call_common
 2.83%  sched-messaging  [kernel.vmlinux]  [k] copypage_power7
 2.64%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
 2.00%  sched-messaging  [kernel.vmlinux]  [k] osq_lock

Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
 kernel/locking/osq_lock.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 05a3785..9e86f0b 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -21,6 +21,11 @@ static inline int encode_cpu(int cpu_nr)
 	return cpu_nr + 1;
 }
 
+static inline int node_cpu(struct optimistic_spin_node *node)
+{
+	return node->cpu - 1;
+}
+
 static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
 {
 	int cpu_nr = encoded_cpu_val - 1;
@@ -118,8 +123,17 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 	while (!READ_ONCE(node->locked)) {
 		/*
 		 * If we need to reschedule bail... so we can block.
+		 * An over-committed guest with more vCPUs than pCPUs
+		 * might fall into this loop and cause a huge overload.
+		 * This is because vCPU A (prev) holds the osq lock and yields
+		 * out while vCPU B (node) waits for ->locked to be set; IOW,
+		 * it waits until vCPU A runs and unlocks the osq lock. Such
+		 * spinning is meaningless, so use vcpu_is_preempted() to
+		 * detect that case. If the arch does not support the vcpu
+		 * preempted check, vcpu_is_preempted() is a macro defined as false.
 		 */
-		if (need_resched())
+		if (need_resched() ||
+			vcpu_is_preempted(node_cpu(node->prev)))
 			goto unqueue;
 
 		cpu_relax_lowlatency();
-- 
2.4.11


* Re: [PATCH 1/3] kernel/sched: introduce vcpu preempted check interface
  2016-06-27 14:02     ` Peter Zijlstra
@ 2016-06-28  3:14       ` xinhui
  0 siblings, 0 replies; 18+ messages in thread
From: xinhui @ 2016-06-28  3:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linuxppc-dev, linux-kernel, paulmck, mingo, mpe, paulus, benh,
	Waiman.Long, boqun.feng, will.deacon, dave



On 2016-06-27 22:02, Peter Zijlstra wrote:
> On Mon, Jun 27, 2016 at 04:00:43PM +0200, Peter Zijlstra wrote:
>> On Mon, Jun 27, 2016 at 01:41:28PM -0400, Pan Xinhui wrote:
>>> +++ b/include/linux/sched.h
>>> @@ -3293,6 +3293,15 @@ static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)
>>>
>>>   #endif /* CONFIG_SMP */
>>>
>>> +#ifdef arch_vcpu_is_preempted
>>> +static inline bool vcpu_is_preempted(int cpu)
>>> +{
>>> +	return arch_vcpu_is_preempted(cpu);
>>> +}
>>> +#else
>>> +#define vcpu_is_preempted(cpu)	false
>>> +#endif
>>
>> #ifndef vcpu_is_preempted
>> #define vcpu_is_preempted(cpu)		(false)
>> #endif
>>
>> Is so much simpler...
>>
fair enough.

>> Also, please Cc the virt list so that other interested parties can
>> comment, and maybe also the s390 folks.
>
Oh, I forgot that. Maybe we need to Cc more.

root@ltcalpine2-lp13:~/linux# find ./arch -name kvm
./arch/arm/kvm
./arch/arm64/kvm
./arch/mips/kvm
./arch/powerpc/kvm
./arch/s390/kvm
./arch/tile/kvm
./arch/x86/kvm

> And before you hurry off to post again, add a patch doing
> mutex_spin_on_owner() and rwsem_spin_on_owner().
>
will do that.

thanks for your suggestion. :)


* Re: [PATCH 1/3] kernel/sched: introduce vcpu preempted check interface
  2016-06-27 14:05   ` Boqun Feng
@ 2016-06-28  3:15     ` xinhui
  0 siblings, 0 replies; 18+ messages in thread
From: xinhui @ 2016-06-28  3:15 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linuxppc-dev, linux-kernel, paulmck, peterz, mingo, mpe, paulus,
	benh, Waiman.Long, will.deacon, dave



On 2016-06-27 22:05, Boqun Feng wrote:
> On Mon, Jun 27, 2016 at 01:41:28PM -0400, Pan Xinhui wrote:
>> This helps fix lock holder preemption issues when running as a guest.
>>
>> Kernel users can call bool vcpu_is_preempted(int cpu) to detect
>> whether a given vCPU is preempted or not.
>>
>> The default implementation is a macro defined as false, so the
>> compiler can optimize it away if the arch does not support such a
>> vCPU preemption check.
>>
>> Archs can implement it by defining arch_vcpu_is_preempted().
>>
>> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
>> ---
>>   include/linux/sched.h | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 6e42ada..dc0a9c3 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -3293,6 +3293,15 @@ static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)
>>
>>   #endif /* CONFIG_SMP */
>>
>> +#ifdef arch_vcpu_is_preempted
>> +static inline bool vcpu_is_preempted(int cpu)
>> +{
>> +	return arch_vcpu_is_preempted(cpu);
>> +}
>> +#else
>> +#define vcpu_is_preempted(cpu)	false
>> +#endif
>> +
>
> I think you are missing Peter's comment here. We can
>
> #ifndef vcpu_is_preempted
> #define vcpu_is_preempted(cpu)	false
> #endif
>
> And different archs implement their own versions of vcpu_is_preempted(),
> IOW, no need for an arch_vcpu_is_preempted().
>
Yes, right.
Just vcpu_is_preempted(), no arch_vcpu_is_preempted().
Thanks.

> Regards,
> Boqun
>
>>   extern long sched_setaffinity(pid_t pid, const struct cpumask *new_mask);
>>   extern long sched_getaffinity(pid_t pid, struct cpumask *mask);
>>
>> --
>> 2.4.11
>>


* Re: [PATCH 2/3] powerpc/spinlock: support vcpu preempted check
  2016-06-27 14:17   ` Peter Zijlstra
@ 2016-06-28  3:23     ` xinhui
  0 siblings, 0 replies; 18+ messages in thread
From: xinhui @ 2016-06-28  3:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linuxppc-dev, linux-kernel, paulmck, mingo, mpe, paulus, benh,
	Waiman.Long, boqun.feng, will.deacon, dave



On 2016-06-27 22:17, Peter Zijlstra wrote:
> On Mon, Jun 27, 2016 at 01:41:29PM -0400, Pan Xinhui wrote:
>> diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
>> index 523673d..ae938ee 100644
>> --- a/arch/powerpc/include/asm/spinlock.h
>> +++ b/arch/powerpc/include/asm/spinlock.h
>> @@ -52,6 +52,21 @@
>>   #define SYNC_IO
>>   #endif
>>
>> +/* For fixing some spinning issues in a guest.
>> + * kernel would check if vcpu is preempted during a spin loop.
>> + * we support that.
>> + */
>
> If you look around in that file you'll notice that the above comment
> style is inconsistent.
>
> Nor is the comment really clarifying things; for one, you fail to mention
> the problem by its known name. You also forget to explain how this
> interface will help. How about something like this:
>
> /*
>   * In order to deal with various lock holder preemption issues, provide
>   * an interface to see if a vCPU is currently running or not.
>   *
>   * This allows us to terminate optimistic spin loops and block,
>   * analogous to the native optimistic spin heuristic of testing if the
>   * lock owner task is running or not.
>   */
thanks!!!

>
> Also, since you now have a useful comment, which is not architecture
> specific, I would place it with the common vcpu_is_preempted()
> definition in sched.h.
>
Agreed, will do that. I will also add your Suggested-by.
Thanks.

> Hmm?
>
>> +#define arch_vcpu_is_preempted arch_vcpu_is_preempted
>> +static inline bool arch_vcpu_is_preempted(int cpu)
>> +{
>> +	struct lppaca *lp = &lppaca_of(cpu);
>> +
>> +	if (unlikely(!(lppaca_shared_proc(lp) ||
>> +			lppaca_dedicated_proc(lp))))
>> +		return false;
>> +	return !!(be32_to_cpu(lp->yield_count) & 1);
>> +}
>> +
>>   static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
>>   {
>>   	return lock.slock == 0;
>> --
>> 2.4.11
>>
>


* Re: [PATCH 2/3] powerpc/spinlock: support vcpu preempted check
  2016-06-27 14:58   ` Boqun Feng
@ 2016-06-28  3:39     ` xinhui
  2016-06-28  5:03       ` Boqun Feng
  0 siblings, 1 reply; 18+ messages in thread
From: xinhui @ 2016-06-28  3:39 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linuxppc-dev, linux-kernel, paulmck, peterz, mingo, mpe, paulus,
	benh, Waiman.Long, will.deacon, dave



On 2016-06-27 22:58, Boqun Feng wrote:
> Hi Xinhui,
>
> On Mon, Jun 27, 2016 at 01:41:29PM -0400, Pan Xinhui wrote:
>> This is to fix some lock holder preemption issues. Spinning on a
>> vCPU which is preempted is meaningless.
>>
>> The kernel needs such an interface, so let's support it.
>>
>> We should also support both the shared and dedicated modes,
>> so add an lppaca_dedicated_proc() helper in lppaca.h.
>>
>> Suggested-by: Boqun Feng <boqun.feng@gmail.com>
>> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
>> ---
>>   arch/powerpc/include/asm/lppaca.h   |  6 ++++++
>>   arch/powerpc/include/asm/spinlock.h | 15 +++++++++++++++
>>   2 files changed, 21 insertions(+)
>>
>> diff --git a/arch/powerpc/include/asm/lppaca.h b/arch/powerpc/include/asm/lppaca.h
>> index d0a2a2f..0a263d3 100644
>> --- a/arch/powerpc/include/asm/lppaca.h
>> +++ b/arch/powerpc/include/asm/lppaca.h
>> @@ -111,12 +111,18 @@ extern struct lppaca lppaca[];
>>    * we will have to transition to something better.
>>    */
>>   #define LPPACA_OLD_SHARED_PROC		2
>> +#define LPPACA_OLD_DEDICATED_PROC      (1 << 6)
>>
>
> I think you should describe a little bit about the magic number here,
right.

> i.e. what document/specification says this should work, and how this
> works.
>
Yep, I need to add some comments here, e.g. that this bit is firmware reserved...
Thanks, will do that.

>>   static inline bool lppaca_shared_proc(struct lppaca *l)
>>   {
>>   	return !!(l->__old_status & LPPACA_OLD_SHARED_PROC);
>>   }
>>
>> +static inline bool lppaca_dedicated_proc(struct lppaca *l)
>> +{
>> +	return !!(l->__old_status & LPPACA_OLD_DEDICATED_PROC);
>> +}
>> +
>>   /*
>>    * SLB shadow buffer structure as defined in the PAPR.  The save_area
>>    * contains adjacent ESID and VSID pairs for each shadowed SLB.  The
>> diff --git a/arch/powerpc/include/asm/spinlock.h b/arch/powerpc/include/asm/spinlock.h
>> index 523673d..ae938ee 100644
>> --- a/arch/powerpc/include/asm/spinlock.h
>> +++ b/arch/powerpc/include/asm/spinlock.h
>> @@ -52,6 +52,21 @@
>>   #define SYNC_IO
>>   #endif
>>
>> +/* For fixing some spinning issues in a guest.
>> + * kernel would check if vcpu is preempted during a spin loop.
>> + * we support that.
>> + */
>> +#define arch_vcpu_is_preempted arch_vcpu_is_preempted
>> +static inline bool arch_vcpu_is_preempted(int cpu)
>
> This function should be guarded by #ifdef PPC_PSERIES .. #endif, right?
> Because if the kernel is not compiled with guest support,
> vcpu_is_preempted() should always be false, right?
>
Oh, I forgot that. Thanks for pointing it out.

>> +{
>> +	struct lppaca *lp = &lppaca_of(cpu);
>> +
>> +	if (unlikely(!(lppaca_shared_proc(lp) ||
>> +			lppaca_dedicated_proc(lp))))
>
> Do you want to detect whether we are running in a guest(ie. pseries
> kernel) here? Then I wonder whether "machine_is(pseries)" works here.
>
I tried that yesterday, but the .h files have dependency problems.
As you said, if we add #ifdef PPC_PSERIES, this is not a big problem; only powernv will be affected, as they are built into the same kernel image.

> Regards,
> Boqun
>
>> +		return false;
>> +	return !!(be32_to_cpu(lp->yield_count) & 1);
>> +}
>> +
>>   static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
>>   {
>>   	return lock.slock == 0;
>> --
>> 2.4.11
>>


* Re: [PATCH 2/3] powerpc/spinlock: support vcpu preempted check
  2016-06-28  3:39     ` xinhui
@ 2016-06-28  5:03       ` Boqun Feng
  2016-06-28  5:38         ` xinhui
  0 siblings, 1 reply; 18+ messages in thread
From: Boqun Feng @ 2016-06-28  5:03 UTC (permalink / raw)
  To: xinhui
  Cc: linuxppc-dev, linux-kernel, paulmck, peterz, mingo, mpe, paulus,
	benh, Waiman.Long, will.deacon, dave


On Tue, Jun 28, 2016 at 11:39:18AM +0800, xinhui wrote:
[snip]
> > > +{
> > > +	struct lppaca *lp = &lppaca_of(cpu);
> > > +
> > > +	if (unlikely(!(lppaca_shared_proc(lp) ||
> > > +			lppaca_dedicated_proc(lp))))
> > 
> > Do you want to detect whether we are running in a guest(ie. pseries
> > kernel) here? Then I wonder whether "machine_is(pseries)" works here.
> > 
> I tried that yesterday, but the .h files have dependency problems.
> As you said, if we add #ifdef PPC_PSERIES, this is not a big problem; only powernv will be affected, as they are built into the same kernel image.
> 

I never said this is not a big problem ;-)

The problem here is that we only need to detect vcpu preemption in
a guest, and there could be several ways to detect whether the
kernel is running in a guest. It's worthwhile to try to find the best
one for this. Besides, it's really better to make sure we have run
out of options before you introduce something like
lppaca_dedicated_proc().

I have a feeling that yield_count is non-zero only if we are running
in a guest; if so, we can use this and save several loads. But surely
we need confirmation from the ppc maintainers.
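
If that holds, the whole check could shrink to something like this
(a sketch, assuming yield_count really stays zero on bare metal):

static inline bool arch_vcpu_is_preempted(int cpu)
{
	/*
	 * yield_count is odd while the vCPU is preempted by the
	 * hypervisor and, by the assumption above, stays zero on
	 * bare metal, so no shared/dedicated-processor check is needed.
	 */
	return !!(be32_to_cpu(lppaca_of(cpu).yield_count) & 1);
}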

Regards,
Boqun



* Re: [PATCH 2/3] powerpc/spinlock: support vcpu preempted check
  2016-06-28  5:03       ` Boqun Feng
@ 2016-06-28  5:38         ` xinhui
  0 siblings, 0 replies; 18+ messages in thread
From: xinhui @ 2016-06-28  5:38 UTC (permalink / raw)
  To: Boqun Feng
  Cc: linuxppc-dev, linux-kernel, paulmck, peterz, mingo, mpe, paulus,
	benh, Waiman.Long, will.deacon, dave



On 2016-06-28 13:03, Boqun Feng wrote:
> On Tue, Jun 28, 2016 at 11:39:18AM +0800, xinhui wrote:
> [snip]
>>>> +{
>>>> +	struct lppaca *lp = &lppaca_of(cpu);
>>>> +
>>>> +	if (unlikely(!(lppaca_shared_proc(lp) ||
>>>> +			lppaca_dedicated_proc(lp))))
>>>
>>> Do you want to detect whether we are running in a guest(ie. pseries
>>> kernel) here? Then I wonder whether "machine_is(pseries)" works here.
>>>
>> I tried that yesterday, but the .h files have dependency problems.
>> As you said, if we add #ifdef PPC_PSERIES, this is not a big problem; only powernv will be affected, as they are built into the same kernel image.
>>
>
> I never said this is not a big problem ;-)
>
> The problem here is that we only need to detect vcpu preemption in
> a guest, and there could be several ways to detect whether the
> kernel is running in a guest. It's worthwhile to try to find the best
> one for this. Besides, it's really better to make sure we have run
> out of options before you introduce something like
> lppaca_dedicated_proc().
>
> I have a feeling that yield_count is non-zero only if we are running
> in a guest; if so, we can use this and save several loads. But surely
> we need confirmation from the ppc maintainers.
>
Yes, on powernv, printing lppaca.yield_count shows it is always zero;
it looks like only the hypervisor and the OS can touch/modify it.


> Regards,
> Boqun
>


* Re: [PATCH 1/3] kernel/sched: introduce vcpu preempted check interface
  2016-06-27 14:00   ` Peter Zijlstra
  2016-06-27 14:02     ` Peter Zijlstra
@ 2016-06-28  7:00     ` Heiko Carstens
  2016-06-28  9:47       ` xinhui
  1 sibling, 1 reply; 18+ messages in thread
From: Heiko Carstens @ 2016-06-28  7:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Pan Xinhui, linuxppc-dev, linux-kernel, paulmck, mingo, mpe,
	paulus, benh, Waiman.Long, boqun.feng, will.deacon, dave,
	Martin Schwidefsky, Christian Borntraeger

On Mon, Jun 27, 2016 at 04:00:43PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 27, 2016 at 01:41:28PM -0400, Pan Xinhui wrote:
> > +++ b/include/linux/sched.h
> > @@ -3293,6 +3293,15 @@ static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)
> >  
> >  #endif /* CONFIG_SMP */
> >  
> > +#ifdef arch_vcpu_is_preempted
> > +static inline bool vcpu_is_preempted(int cpu)
> > +{
> > +	return arch_vcpu_is_preempted(cpu);
> > +}
> > +#else
> > +#define vcpu_is_preempted(cpu)	false
> > +#endif
> 
> #ifndef vcpu_is_preempted
> #define vcpu_is_preempted(cpu)		(false)
> #endif
> 
> Is so much simpler...
> 
> Also, please Cc the virt list so that other interested parties can
> comment, and maybe also the s390 folks.

The s390 implementation would be to simply use cpu_is_preempted() from
arch/s390/lib/spinlock.c.
It's nice that there will be a common code function for this!
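
For reference, the s390 glue might look roughly like this (a sketch
reusing the logic of that helper; the exact hook-up is illustrative
only):

/* arch/s390/include/asm/spinlock.h -- sketch, not an actual patch */
#define vcpu_is_preempted vcpu_is_preempted
static inline bool vcpu_is_preempted(int cpu)
{
	if (test_cpu_flag_of(CIF_ENABLED_WAIT, cpu))
		return false;	/* idle in enabled wait, not preempted */
	if (smp_vcpu_scheduled(cpu))
		return false;	/* backed by a real CPU right now */
	return true;
}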


* Re: [PATCH 1/3] kernel/sched: introduce vcpu preempted check interface
  2016-06-28  7:00     ` Heiko Carstens
@ 2016-06-28  9:47       ` xinhui
  0 siblings, 0 replies; 18+ messages in thread
From: xinhui @ 2016-06-28  9:47 UTC (permalink / raw)
  To: Heiko Carstens, Peter Zijlstra
  Cc: linuxppc-dev, linux-kernel, paulmck, mingo, mpe, paulus, benh,
	Waiman.Long, boqun.feng, will.deacon, dave, Martin Schwidefsky,
	Christian Borntraeger



On 2016-06-28 15:00, Heiko Carstens wrote:
> On Mon, Jun 27, 2016 at 04:00:43PM +0200, Peter Zijlstra wrote:
>> On Mon, Jun 27, 2016 at 01:41:28PM -0400, Pan Xinhui wrote:
>>> +++ b/include/linux/sched.h
>>> @@ -3293,6 +3293,15 @@ static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)
>>>
>>>   #endif /* CONFIG_SMP */
>>>
>>> +#ifdef arch_vcpu_is_preempted
>>> +static inline bool vcpu_is_preempted(int cpu)
>>> +{
>>> +	return arch_vcpu_is_preempted(cpu);
>>> +}
>>> +#else
>>> +#define vcpu_is_preempted(cpu)	false
>>> +#endif
>>
>> #ifndef vcpu_is_preempted
>> #define vcpu_is_preempted(cpu)		(false)
>> #endif
>>
>> Is so much simpler...
>>
>> Also, please Cc the virt list so that other interested parties can
>> comment, and maybe also the s390 folks.
>
> The s390 implementation would be to simply use cpu_is_preempted() from
> arch/s390/lib/spinlock.c.
That's great.

> It's nice that there will be a common code function for this!
>

