linux-kernel.vger.kernel.org archive mirror
* Re: [PATCH 2/2] locking/osq: Drop the overload of osq_lock()
  2016-06-26 10:41 ` [PATCH 2/2] locking/osq: Drop the overload of osq_lock() Pan Xinhui
@ 2016-06-26  7:12   ` panxinhui
  0 siblings, 0 replies; 6+ messages in thread
From: panxinhui @ 2016-06-26  7:12 UTC (permalink / raw)
  To: Pan Xinhui; +Cc: panxinhui, linux-kernel, peterz, mingo, boqun.feng


> 在 2016年6月26日,18:41,Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> 写道:
> 
> An over-committed guest with more vCPUs than pCPUs suffers from heavy
> overhead in osq_lock().
> 
> This is because vCPU A holds the osq lock and yields out, while vCPU B
> spins waiting for the per-cpu node->locked to be set. IOW, vCPU B waits
> for vCPU A to run again and release the osq lock.
> 
> So let's also use need_yield_to() to detect whether we need to stop
> spinning.
> 
> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
> ---
> kernel/locking/osq_lock.c | 18 +++++++++++++++---
> 1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index 05a3785..4287603 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -21,6 +21,11 @@ static inline int encode_cpu(int cpu_nr)
> 	return cpu_nr + 1;
> }
> 
> +static inline int node_cpu(struct optimistic_spin_node *node)
> +{
> +	return node->cpu - 1;
> +}
> +
> static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
> {
> 	int cpu_nr = encoded_cpu_val - 1;
> @@ -84,9 +89,10 @@ osq_wait_next(struct optimistic_spin_queue *lock,
> bool osq_lock(struct optimistic_spin_queue *lock)
> {
> 	struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
> -	struct optimistic_spin_node *prev, *next;
> +	struct optimistic_spin_node *prev, *next, *prev_old;
> 	int curr = encode_cpu(smp_processor_id());
> 	int old;
> +	unsigned int yield_count;
> 
> 	node->locked = 0;
> 	node->next = NULL;
> @@ -114,14 +120,20 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> 	 * guaranteed their existence -- this allows us to apply
> 	 * cmpxchg in an attempt to undo our queueing.
> 	 */
> -
> +	prev_old = prev;
> +	yield_count = vcpu_get_yield_count(node_cpu(prev));
> 	while (!READ_ONCE(node->locked)) {
> 		/*
> 		 * If we need to reschedule bail... so we can block.
> 		 */
> -		if (need_resched())
> +		if (need_resched() ||
> +			need_yield_to(node_cpu(prev), yield_count))
> 			goto unqueue;
> 
> +		prev = READ_ONCE(node->prev);
> +		if (prev != prev_old)
> +			yield_count = vcpu_get_yield_count(node_cpu(prev));
> +

OK, one mistake here; it should be:

	if (prev != prev_old) {
		prev_old = prev;
		yield_count = vcpu_get_yield_count(node_cpu(prev));
	}

(The full wait loop with this fix folded in is sketched below.)

> 		cpu_relax_lowlatency();
> 	}
> 	return true;
> -- 
> 2.4.11
> 
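Folding that fix into the hunk above, the wait loop would then read
roughly as follows; this is just a sketch of the corrected version, not a
repost of the patch:

	prev_old = prev;
	yield_count = vcpu_get_yield_count(node_cpu(prev));
	while (!READ_ONCE(node->locked)) {
		/*
		 * If we need to reschedule bail... so we can block.
		 */
		if (need_resched() ||
		    need_yield_to(node_cpu(prev), yield_count))
			goto unqueue;

		/* Re-snapshot the yield count whenever our predecessor changes. */
		prev = READ_ONCE(node->prev);
		if (prev != prev_old) {
			prev_old = prev;
			yield_count = vcpu_get_yield_count(node_cpu(prev));
		}

		cpu_relax_lowlatency();
	}
	return true;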


* [PATCH 0/2] implement vcpu preempted check
@ 2016-06-26 10:41 Pan Xinhui
  2016-06-26 10:41 ` [PATCH 1/2] kernel/sched: introduce vcpu preempted interface Pan Xinhui
  2016-06-26 10:41 ` [PATCH 2/2] locking/osq: Drop the overload of osq_lock() Pan Xinhui
  0 siblings, 2 replies; 6+ messages in thread
From: Pan Xinhui @ 2016-06-26 10:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, boqun.feng, Pan Xinhui

This is to fix some bad issues on an over-committed guest.

Test case:
perf record -a perf bench sched messaging -g 400 -p && perf report

18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
 5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
 3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
 3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
 3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
 2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call

osq_lock() spins for a long time with preemption disabled, which is
really bad.

This is because vCPU A holds the osq lock and yields out, while vCPU B
spins waiting for the per-cpu node->locked to be set. IOW, vCPU B waits
for vCPU A to run again and release the osq lock. Even though there is a
need_resched() check in the loop, it does not help in this scenario.

We may also need to fix the other XXX_spin_on_owner() variants later,
based on this patch set; those spin_on_owner loops can cause RCU stalls.
A rough sketch of how such a loop could use the new interface follows
below.
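For illustration only, here is a simplified, hypothetical spin-on-owner
loop using the new interface; struct lock and owner_cpu are made-up
placeholders, and this is not the actual mutex/rwsem code:

/*
 * Hypothetical spin-on-owner loop: snapshot the owner vCPU's yield
 * count once, then stop spinning as soon as the owner is seen
 * preempted or its yield count moves on.
 */
static bool spin_on_owner(struct lock *lock, struct task_struct *owner,
			  int owner_cpu)
{
	unsigned int yield_count = vcpu_get_yield_count(owner_cpu);

	while (READ_ONCE(lock->owner) == owner) {
		if (need_resched() ||
		    need_yield_to(owner_cpu, yield_count))
			return false;	/* stop spinning, go block */

		cpu_relax_lowlatency();
	}
	return true;
}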

Pan Xinhui (1):
  locking/osq: Drop the overload of osq_lock()

pan xinhui (1):
  kernel/sched: introduce vcpu preempted interface

 include/linux/sched.h     | 34 ++++++++++++++++++++++++++++++++++
 kernel/locking/osq_lock.c | 18 +++++++++++++++---
 2 files changed, 49 insertions(+), 3 deletions(-)

-- 
2.4.11


* [PATCH 1/2] kernel/sched: introduce vcpu preempted interface
  2016-06-26 10:41 [PATCH 0/2] implement vcpu preempted check Pan Xinhui
@ 2016-06-26 10:41 ` Pan Xinhui
  2016-06-27  8:42   ` Peter Zijlstra
  2016-06-26 10:41 ` [PATCH 2/2] locking/osq: Drop the overload of osq_lock() Pan Xinhui
  1 sibling, 1 reply; 6+ messages in thread
From: Pan Xinhui @ 2016-06-26 10:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, boqun.feng, pan xinhui

From: pan xinhui <xinhui.pan@linux.vnet.ibm.com>

This helps fix the lock-holder-preemption issue when running as a guest.

Two interfaces are introduced:

	bool vcpu_is_preempted(int cpu);
	unsigned int vcpu_get_yield_count(int cpu);

An arch may need to implement either of them.

Some spinners may also need to call need_yield_to(int cpu, unsigned int
old_yield_count) to know whether they need to stop spinning.

Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
 include/linux/sched.h | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6e42ada..9c565d2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3293,6 +3293,40 @@ static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)
 
 #endif /* CONFIG_SMP */
 
+#ifdef arch_vcpu_is_preempted
+static inline bool vcpu_is_preempted(int cpu)
+{
+	return arch_vcpu_is_preempted(cpu);
+}
+#else
+static inline bool vcpu_is_preempted(int cpu)
+{
+	return 0;
+}
+#endif
+
+#ifdef arch_vcpu_get_yield_count
+static inline unsigned int vcpu_get_yield_count(int cpu)
+{
+	return arch_vcpu_get_yield_count(cpu);
+}
+#else
+static inline unsigned int vcpu_get_yield_count(int cpu)
+{
+	return 0;
+}
+#endif
+
+static inline bool
+need_yield_to(int vcpu, unsigned int old_yield_count)
+{
+	/* If we find the vcpu is preempted,
+	 * then we may want to kick it, IOW, yield to it.
+	 */
+	return vcpu_is_preempted(vcpu) ||
+		(vcpu_get_yield_count(vcpu) != old_yield_count);
+}
+
 extern long sched_setaffinity(pid_t pid, const struct cpumask *new_mask);
 extern long sched_getaffinity(pid_t pid, struct cpumask *mask);
 
-- 
2.4.11
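Note, purely for illustration: an architecture opts in by providing the
arch_* helpers before the generic fallbacks above are seen. A hypothetical
pSeries-style sketch is below; the lppaca_of()/yield_count accessor and
the odd-count-means-preempted convention are assumptions here, not part of
this patch:

/*
 * Hypothetical arch hooks for a platform where the hypervisor keeps a
 * per-vCPU yield count in a shared area (names and convention assumed).
 */
#define arch_vcpu_get_yield_count arch_vcpu_get_yield_count
static inline unsigned int arch_vcpu_get_yield_count(int cpu)
{
	return be32_to_cpu(lppaca_of(cpu).yield_count);
}

#define arch_vcpu_is_preempted arch_vcpu_is_preempted
static inline bool arch_vcpu_is_preempted(int cpu)
{
	/* An odd count would mean the vCPU is currently dispatched out. */
	return !!(arch_vcpu_get_yield_count(cpu) & 1);
}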


* [PATCH 2/2] locking/osq: Drop the overload of osq_lock()
  2016-06-26 10:41 [PATCH 0/2] implement vcpu preempted check Pan Xinhui
  2016-06-26 10:41 ` [PATCH 1/2] kernel/sched: introduce vcpu preempted interface Pan Xinhui
@ 2016-06-26 10:41 ` Pan Xinhui
  2016-06-26  7:12   ` panxinhui
  1 sibling, 1 reply; 6+ messages in thread
From: Pan Xinhui @ 2016-06-26 10:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: peterz, mingo, boqun.feng, Pan Xinhui

An over-committed guest with more vCPUs than pCPUs suffers from heavy
overhead in osq_lock().

This is because vCPU A holds the osq lock and yields out, while vCPU B
spins waiting for the per-cpu node->locked to be set. IOW, vCPU B waits
for vCPU A to run again and release the osq lock.

So let's also use need_yield_to() to detect whether we need to stop
spinning.

Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
 kernel/locking/osq_lock.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 05a3785..4287603 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -21,6 +21,11 @@ static inline int encode_cpu(int cpu_nr)
 	return cpu_nr + 1;
 }
 
+static inline int node_cpu(struct optimistic_spin_node *node)
+{
+	return node->cpu - 1;
+}
+
 static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
 {
 	int cpu_nr = encoded_cpu_val - 1;
@@ -84,9 +89,10 @@ osq_wait_next(struct optimistic_spin_queue *lock,
 bool osq_lock(struct optimistic_spin_queue *lock)
 {
 	struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
-	struct optimistic_spin_node *prev, *next;
+	struct optimistic_spin_node *prev, *next, *prev_old;
 	int curr = encode_cpu(smp_processor_id());
 	int old;
+	unsigned int yield_count;
 
 	node->locked = 0;
 	node->next = NULL;
@@ -114,14 +120,20 @@ bool osq_lock(struct optimistic_spin_queue *lock)
 	 * guaranteed their existence -- this allows us to apply
 	 * cmpxchg in an attempt to undo our queueing.
 	 */
-
+	prev_old = prev;
+	yield_count = vcpu_get_yield_count(node_cpu(prev));
 	while (!READ_ONCE(node->locked)) {
 		/*
 		 * If we need to reschedule bail... so we can block.
 		 */
-		if (need_resched())
+		if (need_resched() ||
+			need_yield_to(node_cpu(prev), yield_count))
 			goto unqueue;
 
+		prev = READ_ONCE(node->prev);
+		if (prev != prev_old)
+			yield_count = vcpu_get_yield_count(node_cpu(prev));
+
 		cpu_relax_lowlatency();
 	}
 	return true;
-- 
2.4.11


* Re: [PATCH 1/2] kernel/sched: introduce vcpu preempted interface
  2016-06-26 10:41 ` [PATCH 1/2] kernel/sched: introduce vcpu preempted interface Pan Xinhui
@ 2016-06-27  8:42   ` Peter Zijlstra
  2016-06-27  9:42     ` xinhui
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2016-06-27  8:42 UTC (permalink / raw)
  To: Pan Xinhui; +Cc: linux-kernel, mingo, boqun.feng

On Sun, Jun 26, 2016 at 06:41:54AM -0400, Pan Xinhui wrote:

> +#ifdef arch_vcpu_is_preempted
> +static inline bool vcpu_is_preempted(int cpu)
> +{
> +	return arch_vcpu_is_preempted(cpu);
> +}
> +#else
> +static inline bool vcpu_is_preempted(int cpu)
> +{
> +	return 0;
> +}
> +#endif
> +
> +#ifdef arch_vcpu_get_yield_count
> +static inline unsigned int vcpu_get_yield_count(int cpu)
> +{
> +	return arch_vcpu_get_yield_count(cpu);
> +}
> +#else
> +static inline unsigned int vcpu_get_yield_count(int cpu)
> +{
> +	return 0;
> +}
> +#endif


Please, just do something like:

#ifndef vcpu_is_preempted
static inline bool vcpu_is_preempted(int cpu)
{
	return false;
}
#endif

No point in making it more complicated.

> +static inline bool
> +need_yield_to(int vcpu, unsigned int old_yield_count)

namespace... this thing should be called: vcpu_something()

> +{
> +	/* If we find the vcpu is preempted,
> +	 * then we may want to kick it, IOW, yield to it.
> +	 */
> +	return vcpu_is_preempted(vcpu) ||
> +		(vcpu_get_yield_count(vcpu) != old_yield_count);
> +}

And can you make doubly sure (and mention in the Changelog) that the OSQ
code compiles all this away when using these definitions.


* Re: [PATCH 1/2] kernel/sched: introduce vcpu preempted interface
  2016-06-27  8:42   ` Peter Zijlstra
@ 2016-06-27  9:42     ` xinhui
  0 siblings, 0 replies; 6+ messages in thread
From: xinhui @ 2016-06-27  9:42 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, mingo, boqun.feng



On 2016年06月27日 16:42, Peter Zijlstra wrote:
> On Sun, Jun 26, 2016 at 06:41:54AM -0400, Pan Xinhui wrote:
>
>> +#ifdef arch_vcpu_is_preempted
>> +static inline bool vcpu_is_preempted(int cpu)
>> +{
>> +	return arch_vcpu_is_preempted(cpu);
>> +}
>> +#else
>> +static inline bool vcpu_is_preempted(int cpu)
>> +{
>> +	return 0;
>> +}
>> +#endif
>> +
>> +#ifdef arch_vcpu_get_yield_count
>> +static inline unsigned int vcpu_get_yield_count(int cpu)
>> +{
>> +	return arch_vcpu_get_yield_count(cpu);
>> +}
>> +#else
>> +static inline unsigned int vcpu_get_yield_count(int cpu)
>> +{
>> +	return 0;
>> +}
>> +#endif
>
>
> Please, just do something like:
>
> #ifndef vcpu_is_preempted
> static inline bool vcpu_is_preempted(int cpu)
> {
> 	return false;
> }
> #endif
>
> No point in making it more complicated.
>
Right, vcpu_is_preempted() is good enough to handle our osq issue.

>> +static inline bool
>> +need_yield_to(int vcpu, unsigned int old_yield_count)
>
> namespace... this thing should be called: vcpu_something()
>
>> +{
>> +	/* If we find the vcpu is preempted,
>> +	 * then we may want to kick it, IOW, yield to it.
>> +	 */
>> +	return vcpu_is_preempted(vcpu) ||
>> +		(vcpu_get_yield_count(vcpu) != old_yield_count);
>> +}
>
> And can you make doubly sure (and mention in the Changelog) that the OSQ
> code compiles all this away when using these definitions.
>
vimdiff shows that osq_lock.o differs slightly, because osq_lock() still
reads the yield_count and prev even though they are not used.

However, as you suggest above, if I remove vcpu_get_yield_count() and the
relevant code in osq_lock(), then the binaries are identical.
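For reference, the direction suggested above would leave the wait loop in
osq_lock() looking roughly like the sketch below, keeping only
vcpu_is_preempted(); this is a sketch based on the discussion, not a
posted patch:

	while (!READ_ONCE(node->locked)) {
		/*
		 * If we need to reschedule, or if the vCPU owning the node
		 * we are spinning on has been preempted, bail so we can
		 * block instead of burning the time slice.
		 */
		if (need_resched() ||
		    vcpu_is_preempted(node_cpu(node->prev)))
			goto unqueue;

		cpu_relax_lowlatency();
	}
	return true;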


Thread overview: 6+ messages
2016-06-26 10:41 [PATCH 0/2] implement vcpu preempted check Pan Xinhui
2016-06-26 10:41 ` [PATCH 1/2] kernel/sched: introduce vcpu preempted interface Pan Xinhui
2016-06-27  8:42   ` Peter Zijlstra
2016-06-27  9:42     ` xinhui
2016-06-26 10:41 ` [PATCH 2/2] locking/osq: Drop the overload of osq_lock() Pan Xinhui
2016-06-26  7:12   ` panxinhui
