* Re: [PATCH 2/2] locking/osq: Drop the overload of osq_lock()
2016-06-26 10:41 ` [PATCH 2/2] locking/osq: Drop the overload of osq_lock() Pan Xinhui
@ 2016-06-26 7:12 ` panxinhui
0 siblings, 0 replies; 6+ messages in thread
From: panxinhui @ 2016-06-26 7:12 UTC (permalink / raw)
To: Pan Xinhui; +Cc: panxinhui, linux-kernel, peterz, mingo, boqun.feng
> On 2016-06-26, at 18:41, Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> wrote:
>
> An over-committed guest with more vCPUs than pCPUs shows a heavy
> overload in osq_lock().
>
> This is because vCPU A holds the osq lock and yields out, while vCPU B
> waits for its per-CPU node->locked to be set. IOW, vCPU B waits for
> vCPU A to run again and unlock the osq lock.
>
> So let's also use need_yield_to() to detect whether we should stop
> spinning.
>
> Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
> ---
> kernel/locking/osq_lock.c | 18 +++++++++++++++---
> 1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
> index 05a3785..4287603 100644
> --- a/kernel/locking/osq_lock.c
> +++ b/kernel/locking/osq_lock.c
> @@ -21,6 +21,11 @@ static inline int encode_cpu(int cpu_nr)
> return cpu_nr + 1;
> }
>
> +static inline int node_cpu(struct optimistic_spin_node *node)
> +{
> + return node->cpu - 1;
> +}
> +
> static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
> {
> int cpu_nr = encoded_cpu_val - 1;
> @@ -84,9 +89,10 @@ osq_wait_next(struct optimistic_spin_queue *lock,
> bool osq_lock(struct optimistic_spin_queue *lock)
> {
> struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
> - struct optimistic_spin_node *prev, *next;
> + struct optimistic_spin_node *prev, *next, *prev_old;
> int curr = encode_cpu(smp_processor_id());
> int old;
> + unsigned int yield_count;
>
> node->locked = 0;
> node->next = NULL;
> @@ -114,14 +120,20 @@ bool osq_lock(struct optimistic_spin_queue *lock)
> * guaranteed their existence -- this allows us to apply
> * cmpxchg in an attempt to undo our queueing.
> */
> -
> + prev_old = prev;
> + yield_count = vcpu_get_yield_count(node_cpu(prev));
> while (!READ_ONCE(node->locked)) {
> /*
> * If we need to reschedule bail... so we can block.
> */
> - if (need_resched())
> + if (need_resched() ||
> + need_yield_to(node_cpu(prev), yield_count))
> goto unqueue;
>
> + prev = READ_ONCE(node->prev);
> + if (prev != prev_old)
> + yield_count = vcpu_get_yield_count(node_cpu(prev));
> +
OK, one mistake; it should be:

	if (prev != prev_old) {
		prev_old = prev;
		yield_count = vcpu_get_yield_count(node_cpu(prev));
	}
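For clarity, here is the whole corrected wait loop with the fix folded
in (a sketch of the intended code, same logic as the hunk above, not
tested in this exact form):

	prev_old = prev;
	yield_count = vcpu_get_yield_count(node_cpu(prev));
	while (!READ_ONCE(node->locked)) {
		/*
		 * If we need to reschedule, or the vCPU running our
		 * predecessor has yielded out, bail so we can unqueue
		 * and block.
		 */
		if (need_resched() ||
		    need_yield_to(node_cpu(prev), yield_count))
			goto unqueue;

		/*
		 * Our predecessor can change while we spin (unqueue on
		 * another CPU); re-sample it and refresh the
		 * yield_count snapshot when it does.
		 */
		prev = READ_ONCE(node->prev);
		if (prev != prev_old) {
			prev_old = prev;
			yield_count = vcpu_get_yield_count(node_cpu(prev));
		}

		cpu_relax_lowlatency();
	}
	return true;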
> cpu_relax_lowlatency();
> }
> return true;
> --
> 2.4.11
>
* [PATCH 0/2] implement vcpu preempted check
@ 2016-06-26 10:41 Pan Xinhui
2016-06-26 10:41 ` [PATCH 1/2] kernel/sched: introduce vcpu preempted interface Pan Xinhui
2016-06-26 10:41 ` [PATCH 2/2] locking/osq: Drop the overload of osq_lock() Pan Xinhui
0 siblings, 2 replies; 6+ messages in thread
From: Pan Xinhui @ 2016-06-26 10:41 UTC (permalink / raw)
To: linux-kernel; +Cc: peterz, mingo, boqun.feng, Pan Xinhui
This is to fix some bad issues on an over-committed guest.

Test case:
perf record -a perf bench sched messaging -g 400 -p && perf report
18.09% sched-messaging [kernel.vmlinux] [k] osq_lock
12.28% sched-messaging [kernel.vmlinux] [k] rwsem_spin_on_owner
5.27% sched-messaging [kernel.vmlinux] [k] mutex_unlock
3.89% sched-messaging [kernel.vmlinux] [k] wait_consider_task
3.64% sched-messaging [kernel.vmlinux] [k] _raw_write_lock_irq
3.41% sched-messaging [kernel.vmlinux] [k] mutex_spin_on_owner.is
2.49% sched-messaging [kernel.vmlinux] [k] system_call
osq_lock() spins for a long time with preemption disabled, which is
really bad.

This is because vCPU A holds the osq lock and yields out, while vCPU B
waits for its per-CPU node->locked to be set. IOW, vCPU B waits for
vCPU A to run again and unlock the osq lock. Even though need_resched()
is checked in the spin loop, it does not help in this scenario.

We may also need to fix the other *_spin_on_owner() variants later,
based on this patch set; these spin_on_owner variants can cause RCU
stalls.
Pan Xinhui (1):
locking/osq: Drop the overload of osq_lock()
pan xinhui (1):
kernel/sched: introduce vcpu preempted interface
include/linux/sched.h | 34 ++++++++++++++++++++++++++++++++++
kernel/locking/osq_lock.c | 18 +++++++++++++++---
2 files changed, 49 insertions(+), 3 deletions(-)
--
2.4.11
* [PATCH 1/2] kernel/sched: introduce vcpu preempted interface
2016-06-26 10:41 [PATCH 0/2] implement vcpu preempted check Pan Xinhui
@ 2016-06-26 10:41 ` Pan Xinhui
2016-06-27 8:42 ` Peter Zijlstra
2016-06-26 10:41 ` [PATCH 2/2] locking/osq: Drop the overload of osq_lock() Pan Xinhui
1 sibling, 1 reply; 6+ messages in thread
From: Pan Xinhui @ 2016-06-26 10:41 UTC (permalink / raw)
To: linux-kernel; +Cc: peterz, mingo, boqun.feng, pan xinhui
From: pan xinhui <xinhui.pan@linux.vnet.ibm.com>
This helps fix the lock holder preemption issue that shows up when
running as a guest.

It introduces two interfaces:

	bool vcpu_is_preempted(int cpu);
	unsigned int vcpu_get_yield_count(int cpu);

An arch may need to implement either of them.

Some spinners may also need to call need_yield_to(int cpu, unsigned int
old_yield_count) to know whether they should stop spinning.
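As an illustration only (not part of this patch), an arch could hook
these up roughly as in the sketch below. It assumes a pseries-like
lppaca whose per-vCPU yield_count is bumped by the hypervisor on every
dispatch, so an odd value means the vCPU is currently not running;
names and details here are illustrative:

	/* e.g. in an arch header (sketch) */
	#define arch_vcpu_is_preempted arch_vcpu_is_preempted
	static inline bool arch_vcpu_is_preempted(int cpu)
	{
		/* odd yield_count => vCPU preempted (assumption) */
		return !!(be32_to_cpu(lppaca_of(cpu).yield_count) & 1);
	}

	#define arch_vcpu_get_yield_count arch_vcpu_get_yield_count
	static inline unsigned int arch_vcpu_get_yield_count(int cpu)
	{
		return be32_to_cpu(lppaca_of(cpu).yield_count);
	}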
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
include/linux/sched.h | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6e42ada..9c565d2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -3293,6 +3293,40 @@ static inline void set_task_cpu(struct task_struct *p, unsigned int cpu)
#endif /* CONFIG_SMP */
+#ifdef arch_vcpu_is_preempted
+static inline bool vcpu_is_preempted(int cpu)
+{
+ return arch_vcpu_is_preempted(cpu);
+}
+#else
+static inline bool vcpu_is_preempted(int cpu)
+{
+ return 0;
+}
+#endif
+
+#ifdef arch_vcpu_get_yield_count
+static inline unsigned int vcpu_get_yield_count(int cpu)
+{
+ return arch_vcpu_get_yield_count(cpu);
+}
+#else
+static inline unsigned int vcpu_get_yield_count(int cpu)
+{
+ return 0;
+}
+#endif
+
+static inline bool
+need_yield_to(int vcpu, unsigned int old_yield_count)
+{
+ /* if we find the vcpu is preempted,
+ * then we may want to kick it, IOW, yield to it
+ */
+ return vcpu_is_preempted(vcpu) ||
+ (vcpu_get_yield_count(vcpu) != old_yield_count);
+}
+
extern long sched_setaffinity(pid_t pid, const struct cpumask *new_mask);
extern long sched_getaffinity(pid_t pid, struct cpumask *mask);
--
2.4.11
* [PATCH 2/2] locking/osq: Drop the overload of osq_lock()
2016-06-26 10:41 [PATCH 0/2] implement vcpu preempted check Pan Xinhui
2016-06-26 10:41 ` [PATCH 1/2] kernel/sched: introduce vcpu preempted interface Pan Xinhui
@ 2016-06-26 10:41 ` Pan Xinhui
2016-06-26 7:12 ` panxinhui
1 sibling, 1 reply; 6+ messages in thread
From: Pan Xinhui @ 2016-06-26 10:41 UTC (permalink / raw)
To: linux-kernel; +Cc: peterz, mingo, boqun.feng, Pan Xinhui
An over-committed guest with more vCPUs than pCPUs shows a heavy
overload in osq_lock().

This is because vCPU A holds the osq lock and yields out, while vCPU B
waits for its per-CPU node->locked to be set. IOW, vCPU B waits for
vCPU A to run again and unlock the osq lock.

So let's also use need_yield_to() to detect whether we should stop
spinning.
Signed-off-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com>
---
kernel/locking/osq_lock.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/kernel/locking/osq_lock.c b/kernel/locking/osq_lock.c
index 05a3785..4287603 100644
--- a/kernel/locking/osq_lock.c
+++ b/kernel/locking/osq_lock.c
@@ -21,6 +21,11 @@ static inline int encode_cpu(int cpu_nr)
return cpu_nr + 1;
}
+static inline int node_cpu(struct optimistic_spin_node *node)
+{
+ return node->cpu - 1;
+}
+
static inline struct optimistic_spin_node *decode_cpu(int encoded_cpu_val)
{
int cpu_nr = encoded_cpu_val - 1;
@@ -84,9 +89,10 @@ osq_wait_next(struct optimistic_spin_queue *lock,
bool osq_lock(struct optimistic_spin_queue *lock)
{
struct optimistic_spin_node *node = this_cpu_ptr(&osq_node);
- struct optimistic_spin_node *prev, *next;
+ struct optimistic_spin_node *prev, *next, *prev_old;
int curr = encode_cpu(smp_processor_id());
int old;
+ unsigned int yield_count;
node->locked = 0;
node->next = NULL;
@@ -114,14 +120,20 @@ bool osq_lock(struct optimistic_spin_queue *lock)
* guaranteed their existence -- this allows us to apply
* cmpxchg in an attempt to undo our queueing.
*/
-
+ prev_old = prev;
+ yield_count = vcpu_get_yield_count(node_cpu(prev));
while (!READ_ONCE(node->locked)) {
/*
* If we need to reschedule bail... so we can block.
*/
- if (need_resched())
+ if (need_resched() ||
+ need_yield_to(node_cpu(prev), yield_count))
goto unqueue;
+ prev = READ_ONCE(node->prev);
+ if (prev != prev_old)
+ yield_count = vcpu_get_yield_count(node_cpu(prev));
+
cpu_relax_lowlatency();
}
return true;
--
2.4.11
* Re: [PATCH 1/2] kernel/sched: introduce vcpu preempted interface
2016-06-26 10:41 ` [PATCH 1/2] kernel/sched: introduce vcpu preempted interface Pan Xinhui
@ 2016-06-27 8:42 ` Peter Zijlstra
2016-06-27 9:42 ` xinhui
0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2016-06-27 8:42 UTC (permalink / raw)
To: Pan Xinhui; +Cc: linux-kernel, mingo, boqun.feng
On Sun, Jun 26, 2016 at 06:41:54AM -0400, Pan Xinhui wrote:
> +#ifdef arch_vcpu_is_preempted
> +static inline bool vcpu_is_preempted(int cpu)
> +{
> + return arch_vcpu_is_preempted(cpu);
> +}
> +#else
> +static inline bool vcpu_is_preempted(int cpu)
> +{
> + return 0;
> +}
> +#endif
> +
> +#ifdef arch_vcpu_get_yield_count
> +static inline unsigned int vcpu_get_yield_count(int cpu)
> +{
> + return arch_vcpu_get_yield_count(cpu);
> +}
> +#else
> +static inline unsigned int vcpu_get_yield_count(int cpu)
> +{
> + return 0;
> +}
> +#endif
Please, just do something like:
#ifndef vcpu_is_preempted
static inline bool vcpu_is_preempted(int cpu)
{
return false;
}
#endif
No point in making it more complicated.
> +static inline bool
> +need_yield_to(int vcpu, unsigned int old_yield_count)
namespace... this thing should be called: vcpu_something()
> +{
> + /* if we find the vcpu is preempted,
> + * then we may want to kick it, IOW, yield to it
> + */
> + return vcpu_is_preempted(vcpu) ||
> + (vcpu_get_yield_count(vcpu) != old_yield_count);
> +}
And can you make doubly sure (and mention in the Changelog) that the OSQ
code compiles all this away when using these definitions.
* Re: [PATCH 1/2] kernel/sched: introduce vcpu preempted interface
2016-06-27 8:42 ` Peter Zijlstra
@ 2016-06-27 9:42 ` xinhui
0 siblings, 0 replies; 6+ messages in thread
From: xinhui @ 2016-06-27 9:42 UTC (permalink / raw)
To: Peter Zijlstra; +Cc: linux-kernel, mingo, boqun.feng
On 2016-06-27 16:42, Peter Zijlstra wrote:
> On Sun, Jun 26, 2016 at 06:41:54AM -0400, Pan Xinhui wrote:
>
>> +#ifdef arch_vcpu_is_preempted
>> +static inline bool vcpu_is_preempted(int cpu)
>> +{
>> + return arch_vcpu_is_preempted(cpu);
>> +}
>> +#else
>> +static inline bool vcpu_is_preempted(int cpu)
>> +{
>> + return 0;
>> +}
>> +#endif
>> +
>> +#ifdef arch_vcpu_get_yield_count
>> +static inline unsigned int vcpu_get_yield_count(int cpu)
>> +{
>> + return arch_vcpu_get_yield_count(cpu);
>> +}
>> +#else
>> +static inline unsigned int vcpu_get_yield_count(int cpu)
>> +{
>> + return 0;
>> +}
>> +#endif
>
>
> Please, just do something like:
>
> #ifndef vcpu_is_preempted
> static inline bool vcpu_is_preempted(int cpu)
> {
> return false;
> }
> #endif
>
> No point in making it more complicated.
>
Right, vcpu_is_preempted() is good enough to handle our osq issue.
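With only vcpu_is_preempted(), the osq wait loop could then shrink to
something like this sketch (no yield_count snapshot to maintain;
untested):

	while (!READ_ONCE(node->locked)) {
		/*
		 * If we need to reschedule, or the vCPU running our
		 * predecessor has been preempted, bail so we can
		 * unqueue and block.
		 */
		if (need_resched() ||
		    vcpu_is_preempted(node_cpu(READ_ONCE(node->prev))))
			goto unqueue;

		cpu_relax_lowlatency();
	}
	return true;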
>> +static inline bool
>> +need_yield_to(int vcpu, unsigned int old_yield_count)
>
> namespace... this thing should be called: vcpu_something()
>
>> +{
>> + /* if we find the vcpu is preempted,
>> + * then we may want to kick it, IOW, yield to it
>> + */
>> + return vcpu_is_preempted(vcpu) ||
>> + (vcpu_get_yield_count(vcpu) != old_yield_count);
>> +}
>
> And can you make doubly sure (and mention in the Changelog) that the OSQ
> code compiles all this away when using these definitions.
>
vimdiff shows osq_lock.o differs slightly, because osq_lock() still
reads yield_count and prev even though they are unused. However, as you
suggest above, if I remove vcpu_get_yield_count() and the related code
in osq_lock(), the binaries are identical.
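FWIW, a sketch of why the fallback definitions should let the compiler
fold everything away (assuming the stubs are inlined; the remaining
READ_ONCE(node->prev) is a volatile access and cannot be removed, which
would explain the small difference above):

	static inline bool vcpu_is_preempted(int cpu) { return false; }
	static inline unsigned int vcpu_get_yield_count(int cpu) { return 0; }

	/*
	 * need_yield_to(cpu, yield_count) then reduces to
	 *     false || (0 != yield_count)
	 * and since yield_count is only ever assigned from
	 * vcpu_get_yield_count(), i.e. the constant 0, the whole
	 * condition is compile-time false.
	 */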