linux-kernel.vger.kernel.org archive mirror
* [PATCH] sched: correct testing need_resched in mutex_spin_on_owner()
@ 2011-06-07 13:41 Hillf Danton
  2011-06-07 13:47 ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Hillf Danton @ 2011-06-07 13:41 UTC (permalink / raw)
  To: LKML; +Cc: Ingo Molnar, Peter Zijlstra

It is supposed to check the owner task, which is not necessarily running on the
local CPU, and if NEED_RESCHED happens to be set on the current task of the local
CPU, we get an incorrect result.

Signed-off-by: Hillf Danton <dhillf@gmail.com>
---
 kernel/sched.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index fd18f39..3ea64fe 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4326,7 +4326,7 @@ int mutex_spin_on_owner(struct mutex *lock,
struct task_struct *owner)
 		return 0;

 	while (owner_running(lock, owner)) {
-		if (need_resched())
+		if (test_tsk_need_resched(owner))
 			return 0;

 		arch_mutex_cpu_relax();


* Re: [PATCH] sched: correct testing need_resched in mutex_spin_on_owner()
  2011-06-07 13:41 [PATCH] sched: correct testing need_resched in mutex_spin_on_owner() Hillf Danton
@ 2011-06-07 13:47 ` Peter Zijlstra
  2011-06-07 13:50   ` Peter Zijlstra
                     ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Peter Zijlstra @ 2011-06-07 13:47 UTC (permalink / raw)
  To: Hillf Danton; +Cc: LKML, Ingo Molnar

On Tue, 2011-06-07 at 21:41 +0800, Hillf Danton wrote:
> It is supposed to check the owner task, which is not necessarily running on the
> local CPU,

Oh, why do you think so?

> and if NEED_RESCHED happens to be set on the current task of the local
> CPU, we get an incorrect result.

Only if your above assumption holds, which it doesn't. It explicitly
checks to see if _this_ cpu needs a resched while spinning, if so it
bails the spinning and calls schedule in the lock slow path.

If the owner cpu reschedules, owner will leave the rq and
owner_running() will return false, also breaking the loop.
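
For reference, the whole of mutex_spin_on_owner() looks roughly like this
in that kernel's kernel/sched.c (a sketch from memory, so the exact tail
and the sched_feat() guard may differ slightly from the tree you patched):

int mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
{
        if (!sched_feat(OWNER_SPIN))
                return 0;

        while (owner_running(lock, owner)) {
                /*
                 * This is the *spinner's* TIF_NEED_RESCHED: if this cpu
                 * has something better to do, stop being greedy and fall
                 * back to the sleeping slow path.
                 */
                if (need_resched())
                        return 0;

                arch_mutex_cpu_relax();
        }

        /*
         * We stopped spinning because the owner went to sleep or the lock
         * changed hands; returning 1 lets the caller keep retrying the
         * lock cmpxchg, returning 0 sends it to the sleeping slow path.
         */
        return lock->owner == NULL;
}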

> 
> Signed-off-by: Hillf Danton <dhillf@gmail.com>
> ---
>  kernel/sched.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/kernel/sched.c b/kernel/sched.c
> index fd18f39..3ea64fe 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -4326,7 +4326,7 @@ int mutex_spin_on_owner(struct mutex *lock,
> struct task_struct *owner)
>  		return 0;
> 
>  	while (owner_running(lock, owner)) {
> -		if (need_resched())
> +		if (test_tsk_need_resched(owner))
>  			return 0;
> 
>  		arch_mutex_cpu_relax();



* Re: [PATCH] sched: correct testing need_resched in mutex_spin_on_owner()
  2011-06-07 13:47 ` Peter Zijlstra
@ 2011-06-07 13:50   ` Peter Zijlstra
  2011-06-07 13:59   ` Peter Zijlstra
  2011-06-07 14:10   ` Hillf Danton
  2 siblings, 0 replies; 15+ messages in thread
From: Peter Zijlstra @ 2011-06-07 13:50 UTC (permalink / raw)
  To: Hillf Danton; +Cc: LKML, Ingo Molnar

On Tue, 2011-06-07 at 15:47 +0200, Peter Zijlstra wrote:
> Only if your above assumption holds, which it doesn't. It explicitly
> checks to see if _this_ cpu needs a resched while spinning, if so it
> bails the spinning and calls schedule in the lock slow path.
> 
> If the owner cpu reschedules, owner will leave the rq and

s/rq/cpu/

> owner_running() will return false, also breaking the loop. 


* Re: [PATCH] sched: correct testing need_resched in mutex_spin_on_owner()
  2011-06-07 13:47 ` Peter Zijlstra
  2011-06-07 13:50   ` Peter Zijlstra
@ 2011-06-07 13:59   ` Peter Zijlstra
  2011-06-07 14:14     ` Hillf Danton
  2011-06-07 14:10   ` Hillf Danton
  2 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2011-06-07 13:59 UTC (permalink / raw)
  To: Hillf Danton; +Cc: LKML, Ingo Molnar

On Tue, 2011-06-07 at 15:47 +0200, Peter Zijlstra wrote:
> On Tue, 2011-06-07 at 21:41 +0800, Hillf Danton wrote:
> > It is supposed to check the owner task, which is not necessarily running on the
> > local CPU,
> 
> Oh, why do you think so?
> 
> > and if NEED_RESCHED happens to be set on the current task of the local
> > CPU, we get an incorrect result.
> 
> Only if your above assumption holds, which it doesn't. It explicitly
> checks to see if _this_ cpu needs a resched while spinning, if so it
> bails the spinning and calls schedule in the lock slow path.
> 
> If the owner cpu reschedules, owner will leave the rq and
> owner_running() will return false, also breaking the loop.
> 
> > 
> > Signed-off-by: Hillf Danton <dhillf@gmail.com>
> > ---
> >  kernel/sched.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/kernel/sched.c b/kernel/sched.c
> > index fd18f39..3ea64fe 100644
> > --- a/kernel/sched.c
> > +++ b/kernel/sched.c
> > @@ -4326,7 +4326,7 @@ int mutex_spin_on_owner(struct mutex *lock,
> > struct task_struct *owner)
> >  		return 0;
> > 
> >  	while (owner_running(lock, owner)) {
> > -		if (need_resched())
> > +		if (test_tsk_need_resched(owner))
> >  			return 0;
> > 

Furthermore, that can crash the machine, as there's no guarantee owner
is a sane pointer at this point.


* Re: [PATCH] sched: correct testing need_resched in mutex_spin_on_owner()
  2011-06-07 13:47 ` Peter Zijlstra
  2011-06-07 13:50   ` Peter Zijlstra
  2011-06-07 13:59   ` Peter Zijlstra
@ 2011-06-07 14:10   ` Hillf Danton
  2011-06-07 14:24     ` Peter Zijlstra
  2 siblings, 1 reply; 15+ messages in thread
From: Hillf Danton @ 2011-06-07 14:10 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, Ingo Molnar

On Tue, Jun 7, 2011 at 9:47 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, 2011-06-07 at 21:41 +0800, Hillf Danton wrote:
>> It is supposed to check the owner task, which is not necessarily running on the
>> local CPU,
>
> Oh, why do you think so?
>
as the comment in __mutex_lock_common() says,

#ifdef CONFIG_MUTEX_SPIN_ON_OWNER
	/*
	 * Optimistic spinning.
	 *
	 * We try to spin for acquisition when we find that there are no
	 * pending waiters and the lock owner is currently running on a
	 ***************************
         * (different) CPU.
	 ***************************
         * .....
         */

>> and if NEED_RESCHED happens to be set on the current task of the local
>> CPU, we get an incorrect result.
>
> Only if your above assumption holds, which it doesn't. It explicitly
> checks to see if _this_ cpu needs a resched while spinning, if so it
> bails the spinning and calls schedule in the lock slow path.
>
> If the owner cpu reschedules, owner will leave the rq and
> owner_running() will return false, also breaking the loop.
>

need_resched is checked after owner_running() returns true; in other
words, owner is still on its CPU, so owner should be checked here.
Even if owner's CPU == this CPU, checking owner also gives the correct
result.

thanks
           Hillf


* Re: [PATCH] sched: correct testing need_resched in mutex_spin_on_owner()
  2011-06-07 13:59   ` Peter Zijlstra
@ 2011-06-07 14:14     ` Hillf Danton
  0 siblings, 0 replies; 15+ messages in thread
From: Hillf Danton @ 2011-06-07 14:14 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, Ingo Molnar

On Tue, Jun 7, 2011 at 9:59 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, 2011-06-07 at 15:47 +0200, Peter Zijlstra wrote:
>> On Tue, 2011-06-07 at 21:41 +0800, Hillf Danton wrote:
>> > It is supposed to check the owner task, which is not necessarily running on the
>> > local CPU,
>>
>> Oh, why do you think so?
>>
>> > and if NEED_RESCHED happens to be set on the current task of the local
>> > CPU, we get an incorrect result.
>>
>> Only if your above assumption holds, which it doesn't. It explicitly
>> checks to see if _this_ cpu needs a resched while spinning, if so it
>> bails the spinning and calls schedule in the lock slow path.
>>
>> If the owner cpu reschedules, owner will leave the rq and
>> owner_running() will return false, also breaking the loop.
>>
>> >
>> > Signed-off-by: Hillf Danton <dhillf@gmail.com>
>> > ---
>> >  kernel/sched.c |    2 +-
>> >  1 files changed, 1 insertions(+), 1 deletions(-)
>> >
>> > diff --git a/kernel/sched.c b/kernel/sched.c
>> > index fd18f39..3ea64fe 100644
>> > --- a/kernel/sched.c
>> > +++ b/kernel/sched.c
>> > @@ -4326,7 +4326,7 @@ int mutex_spin_on_owner(struct mutex *lock,
>> > struct task_struct *owner)
>> >             return 0;
>> >
>> >     while (owner_running(lock, owner)) {
>> > -           if (need_resched())
>> > +           if (test_tsk_need_resched(owner))
>> >                     return 0;
>> >
>
> Furthermore, that can crash the machine, as there's no guarantee owner
> is a sane pointer at this point.
>

Well, why is it safe to loop on owner_running() then?

thanks
           Hillf


* Re: [PATCH] sched: correct testing need_resched in mutex_spin_on_owner()
  2011-06-07 14:10   ` Hillf Danton
@ 2011-06-07 14:24     ` Peter Zijlstra
  2011-06-07 14:36       ` Hillf Danton
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2011-06-07 14:24 UTC (permalink / raw)
  To: Hillf Danton; +Cc: LKML, Ingo Molnar

On Tue, 2011-06-07 at 22:10 +0800, Hillf Danton wrote:
> On Tue, Jun 7, 2011 at 9:47 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Tue, 2011-06-07 at 21:41 +0800, Hillf Danton wrote:
> >> It is supposed to check the owner task, which is not necessarily running on the
> >> local CPU,
> >
> > Oh, why do you think so?
> >
> as the comment in __mutex_lock_common() says,
> 
> #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
> 	/*
> 	 * Optimistic spinning.
> 	 *
> 	 * We try to spin for acquisition when we find that there are no
> 	 * pending waiters and the lock owner is currently running on a
> 	 ***************************
>          * (different) CPU.
> 	 ***************************
>          * .....
>          */

Which is exactly what owner_running() does. Clearly owner cannot be
running on the current cpu, since then we wouldn't be running and
checking things, so for as long as owner is on_cpu we spin.

However,

> need_resched is checked after owner_running() returns true; in other
> words, owner is still on its CPU, so owner should be checked here.
> Even if owner's CPU == this CPU, checking owner also gives the correct
> result.

no, after rcu_read_unlock() in owner_running() -- read that comment
again -- the owner pointer can point to freed or reused memory.

Also, since we already check owner_running(), this need_resched() is
clearly something different. By your argument it would be superfluous,
not wrong.

Now, since it appears superfluous, ask yourself why it would exist. The
answer is simple: we should not be greedy and consume our cpu when
there's someone else that wants to run; we should yield the cpu on
need_resched(), which is exactly what happens.
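
For reference, owner_running() itself looks roughly like this (a sketch
of that era's kernel/sched.c, so label names and such may differ; what
matters is the rcu_read_lock() section):

static inline bool owner_running(struct mutex *lock, struct task_struct *owner)
{
        bool ret = false;

        rcu_read_lock();
        if (lock->owner != owner)
                goto out;

        /*
         * Dereference owner only _after_ checking that lock->owner still
         * matches it: if the check fails, owner may point to freed memory;
         * if it succeeds, the rcu_read_lock() keeps that memory valid
         * while we read owner->on_cpu.
         */
        barrier();

        ret = owner->on_cpu;
out:
        rcu_read_unlock();

        return ret;
}

Outside that rcu_read_lock()/rcu_read_unlock() pair -- which is exactly
where your test_tsk_need_resched(owner) would run -- nothing keeps the
owner task_struct alive.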




* Re: [PATCH] sched: correct testing need_resched in mutex_spin_on_owner()
  2011-06-07 14:24     ` Peter Zijlstra
@ 2011-06-07 14:36       ` Hillf Danton
  2011-06-07 14:40         ` Peter Zijlstra
  0 siblings, 1 reply; 15+ messages in thread
From: Hillf Danton @ 2011-06-07 14:36 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, Ingo Molnar

On Tue, Jun 7, 2011 at 10:24 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, 2011-06-07 at 22:10 +0800, Hillf Danton wrote:
>> On Tue, Jun 7, 2011 at 9:47 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Tue, 2011-06-07 at 21:41 +0800, Hillf Danton wrote:
>> >> It is supposed to check the owner task, which is not necessarily running on the
>> >> local CPU,
>> >
>> > Oh, why do you think so?
>> >
>> as the comment in __mutex_lock_common() says,
>>
>> #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
>>       /*
>>        * Optimistic spinning.
>>        *
>>        * We try to spin for acquisition when we find that there are no
>>        * pending waiters and the lock owner is currently running on a
>>        ***************************
>>          * (different) CPU.
>>        ***************************
>>          * .....
>>          */
>
> Which is exactly what owner_running() does. Clearly owner cannot be
> running on the current cpu, since then we wouldn't be running and
> checking things, so for as long as owner is on_cpu we spin.
>
> However,
>
>> need_resched is checked after owner_running() returns true; in other
>> words, owner is still on its CPU, so owner should be checked here.
>> Even if owner's CPU == this CPU, checking owner also gives the correct
>> result.
>
> no, after rcu_read_unlock() in owner_running() -- read that comment
> again -- the owner pointer can point to freed or reused memory.
>
> Also, since we already check owner_running(), this need_resched() is
> clearly something different. By your argument it would be superfluous,
> not wrong.
>
> Now, since it appears superfluous, ask yourself why it would exist. The
> answer is simple: we should not be greedy and consume our cpu when
> there's someone else that wants to run; we should yield the cpu on
> need_resched(), which is exactly what happens.
>
If you are right, the following comment also in __mutex_lock_common()

	for (;;) {
		struct task_struct *owner;

		/*
		 * If there's an owner, wait for it to either
		 * release the lock or go to sleep.
		 */
		owner = ACCESS_ONCE(lock->owner);
		if (owner && !mutex_spin_on_owner(lock, owner))
			break;

looks misleading too, but if owner is on this CPU, what are we waiting for?

thanks
           Hillf


* Re: [PATCH] sched: correct testing need_resched in mutex_spin_on_owner()
  2011-06-07 14:36       ` Hillf Danton
@ 2011-06-07 14:40         ` Peter Zijlstra
  2011-06-07 14:47           ` Hillf Danton
  0 siblings, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2011-06-07 14:40 UTC (permalink / raw)
  To: Hillf Danton; +Cc: LKML, Ingo Molnar

On Tue, 2011-06-07 at 22:36 +0800, Hillf Danton wrote:

> If you are right, the following comment also in __mutex_lock_common()
> 
> 	for (;;) {
> 		struct task_struct *owner;
> 
> 		/*
> 		 * If there's an owner, wait for it to either
> 		 * release the lock or go to sleep.
> 		 */
> 		owner = ACCESS_ONCE(lock->owner);
> 		if (owner && !mutex_spin_on_owner(lock, owner))
> 			break;
> 
> looks misleading too, but if owner is on this CPU, for what does we wait?

huh, wtf!? it cannot be on this cpu, if it was we wouldn't be running
the above code but whatever owner was doing.

So my argument was, it cannot be on this cpu, therefore, by checking it
is on a cpu, we check that it's on a different cpu.

And I really don't see how any of that is related to the above.


* Re: [PATCH] sched: correct testing need_resched in mutex_spin_on_owner()
  2011-06-07 14:40         ` Peter Zijlstra
@ 2011-06-07 14:47           ` Hillf Danton
  2011-06-07 15:08             ` Steven Rostedt
  2011-06-07 15:20             ` Peter Zijlstra
  0 siblings, 2 replies; 15+ messages in thread
From: Hillf Danton @ 2011-06-07 14:47 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, Ingo Molnar

On Tue, Jun 7, 2011 at 10:40 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, 2011-06-07 at 22:36 +0800, Hillf Danton wrote:
>
>> If you are right, the following comment also in __mutex_lock_common()
>>
>>       for (;;) {
>>               struct task_struct *owner;
>>
>>               /*
>>                * If there's an owner, wait for it to either
>>                * release the lock or go to sleep.
>>                */
>>               owner = ACCESS_ONCE(lock->owner);
>>               if (owner && !mutex_spin_on_owner(lock, owner))
>>                       break;
>>
>> looks misleading too, but if owner is on this CPU, what are we waiting for?
>
> huh, wtf!? it cannot be on this cpu, if it was we wouldn't be running
> the above code but whatever owner was doing.
>
> So my argument was, it cannot be on this cpu, therefore, by checking it
> > is on a cpu, we check that it's on a different cpu.
>
> And I really don't see how any of that is related to the above.
>
Oh, it looks like you are willing to rethink testing need_resched?

thanks
           Hillf


* Re: [PATCH] sched: correct testing need_resched in mutex_spin_on_owner()
  2011-06-07 14:47           ` Hillf Danton
@ 2011-06-07 15:08             ` Steven Rostedt
  2011-06-07 18:22               ` Hillf Danton
  2011-06-07 15:20             ` Peter Zijlstra
  1 sibling, 1 reply; 15+ messages in thread
From: Steven Rostedt @ 2011-06-07 15:08 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Peter Zijlstra, LKML, Ingo Molnar

On Tue, Jun 07, 2011 at 10:47:21PM +0800, Hillf Danton wrote:
> On Tue, Jun 7, 2011 at 10:40 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Tue, 2011-06-07 at 22:36 +0800, Hillf Danton wrote:
> >
> >> If you are right, the following comment also in __mutex_lock_common()
> >>
> >>       for (;;) {
> >>               struct task_struct *owner;
> >>
> >>               /*
> >>                * If there's an owner, wait for it to either
> >>                * release the lock or go to sleep.
> >>                */
> >>               owner = ACCESS_ONCE(lock->owner);
> >>               if (owner && !mutex_spin_on_owner(lock, owner))
> >>                       break;
> >>
> >> looks misleading too, but if owner is on this CPU, what are we waiting for?
> >
> > huh, wtf!? it cannot be on this cpu, if it was we wouldn't be running
> > the above code but whatever owner was doing.
> >
> > So my argument was, it cannot be on this cpu, therefore, by checking it
> > is on a cpu, we check that it's on a different cpu.
> >
> > And I really don't see how any of that is related to the above.
> >
> Oh, it looks like you are willing to rethink testing need_resched?

NO!

Hillf, reading this thread I understand that you have no idea of what is
happening here. You are totally clueless. Let me give you a clue.

The idea of an adaptive mutex is quite simple. If a mutex is held for a
short period of time, and on another CPU a task tries to grab the same
mutex, it would be slower to have that task schedule out than to just
spin waiting for that mutex to be released.

Ideally, this mutex would then act like a spin lock, and things would
run very efficiently. But, as it is not a spin lock, there are things we
must also be aware of.

1) If the owner we are waiting on to release the mutex schedules out.

2) If another task wants us to run on the CPU of the waiter that is
spinning.

3) If the waiter that is spinning runs out of its time quantum and
should yield to another task.

#1 is checked by owner_running(). If that fails, then we should not
spin anymore and it's good to schedule. BTW, the reason owner_running()
is safe to check owner is that it takes rcu_read_lock() to make sure
the owner pointer stays valid.

#2 and #3 are checked by testing NEED_RESCHED on the waiter (NOT THE
OWNER!).  If the waiter that is spinning should be preempted, the waiter
(not the owner) will have its NEED_RESCHED bit set. That is why the
waiter's NEED_RESCHED flag is checked and not the owner's.

And also we don't need to check need_resched() in the above code because
the loop checks it anyway, or we break out of the loop.
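
To put #1, #2 and #3 together, the optimistic-spin part of
__mutex_lock_common() looks roughly like this (a sketch of that era's
kernel/mutex.c, reconstructed here, so details may differ; task and ip
are locals set up earlier in the function):

        for (;;) {
                struct task_struct *owner;

                /*
                 * If there's an owner, wait for it to either release the
                 * lock or go to sleep.  mutex_spin_on_owner() spins while
                 * the owner is running and returns 0 when we should stop:
                 * the waiter's own TIF_NEED_RESCHED got set (#2 and #3),
                 * or the owner went to sleep with the lock still held (#1).
                 */
                owner = ACCESS_ONCE(lock->owner);
                if (owner && !mutex_spin_on_owner(lock, owner))
                        break;

                /* The lock may have been released meanwhile: try to take it. */
                if (atomic_cmpxchg(&lock->count, 1, 0) == 1) {
                        lock_acquired(&lock->dep_map, ip);
                        mutex_set_owner(lock);
                        preempt_enable();
                        return 0;
                }

                /*
                 * No owner recorded yet: we may have preempted the owner
                 * between it taking the lock and setting lock->owner, so
                 * bail if this cpu should reschedule, and never let an RT
                 * task spin here or it could live-lock the owner.
                 */
                if (!owner && (need_resched() || rt_task(task)))
                        break;

                arch_mutex_cpu_relax();
        }

When we break out of this loop we take the wait_lock and go to sleep like
a normal mutex waiter.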

Understand?

-- Steve



* Re: [PATCH] sched: correct testing need_resched in mutex_spin_on_owner()
  2011-06-07 14:47           ` Hillf Danton
  2011-06-07 15:08             ` Steven Rostedt
@ 2011-06-07 15:20             ` Peter Zijlstra
  2011-06-07 15:43               ` Steven Rostedt
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2011-06-07 15:20 UTC (permalink / raw)
  To: Hillf Danton; +Cc: LKML, Ingo Molnar

On Tue, 2011-06-07 at 22:47 +0800, Hillf Danton wrote:
> Oh, it looks like you are willing to rethink testing need_resched?

Dude, however did you come up with that deduction?

mutex_spin_on_owner() does two things:

 - it validates that owner is in fact still running (if so it must be on
another cpu, since we're running on this one).

 - it ensures we play nice and reschedule when we need to, so we don't
hog our cpu.

Testing TIF_NEED_RESCHED on owner like you propose is wrong because:

 - owner is not a stable pointer you can dereference; see
owner_running(), you first need to validate that it's still a valid
pointer and then keep it valid while dereferencing it.

 - if owner were to reschedule, it would leave the cpu and we'd break
out of the loop anyway by means of owner_running() failing, so it's
superfluous.

Please, get a grip on reality and stop sending endless streams of
patches based on wrong assumptions and misunderstandings.




* Re: [PATCH] sched: correct testing need_resched in mutex_spin_on_owner()
  2011-06-07 15:20             ` Peter Zijlstra
@ 2011-06-07 15:43               ` Steven Rostedt
  0 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2011-06-07 15:43 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Hillf Danton, LKML, Ingo Molnar

On Tue, Jun 07, 2011 at 05:20:40PM +0200, Peter Zijlstra wrote:
> 
> Please, get a grip on reality and stop sending endless streams of
> patches based on wrong assumptions and misunderstandings.

Hillf,

Some of your patches are good, but others are frivolous. It becomes a
burden on the maintainer to figure out which is which.

I've asked you to please back up your patches with tests and/or
benchmarks that clearly demonstrate what you are showing. I have yet to
see any of that. If you have trouble coming up with test cases it may be
because your assumptions about what is happening are incorrect.

If you just randomly send out a bunch of patches where the noise/signal
ratio is high, you will soon just be ignored. I don't want that to
happen; as I stated, some of your patches are good. But when you
start to fight back and it becomes clear that you totally do not
understand what you are talking about, it becomes hard to take you
seriously.

-- Steve



* Re: [PATCH] sched: correct testing need_resched in mutex_spin_on_owner()
  2011-06-07 15:08             ` Steven Rostedt
@ 2011-06-07 18:22               ` Hillf Danton
  2011-06-07 18:34                 ` Steven Rostedt
  0 siblings, 1 reply; 15+ messages in thread
From: Hillf Danton @ 2011-06-07 18:22 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: Peter Zijlstra, LKML, Ingo Molnar

On Tue, Jun 7, 2011 at 11:08 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Tue, Jun 07, 2011 at 10:47:21PM +0800, Hillf Danton wrote:
>> On Tue, Jun 7, 2011 at 10:40 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Tue, 2011-06-07 at 22:36 +0800, Hillf Danton wrote:
>> >
>> >> If you are right, the following comment also in __mutex_lock_common()
>> >>
>> >>       for (;;) {
>> >>               struct task_struct *owner;
>> >>
>> >>               /*
>> >>                * If there's an owner, wait for it to either
>> >>                * release the lock or go to sleep.
>> >>                */
>> >>               owner = ACCESS_ONCE(lock->owner);
>> >>               if (owner && !mutex_spin_on_owner(lock, owner))
>> >>                       break;
>> >>
>> >> looks misleading too, but if owner is on this CPU, what are we waiting for?
>> >
>> > huh, wtf!? it cannot be on this cpu, if it was we wouldn't be running
>> > the above code but whatever owner was doing.
>> >
>> > So my argument was, it cannot be on this cpu, therefore, by checking it
>> > is on a cpu, we check that it's on a different cpu.
>> >
>> > And I really don't see how any of that is related to the above.
>> >
>> Oh, it looks like you are willing to rethink testing need_resched?
>
> NO!
>
> Hillf, reading this thread I understand that you have no idea of what is
> happening here. You are totally clueless. Let me give you a clue.
>
> The idea of an adaptive mutex is quite simple. If a mutex is held for a
> short period of time, and on another CPU a task tries to grab the same
> mutex, it would be slower to have that task schedule out than to just
> spin waiting for that mutex to be released.
>
> Ideally, this mutex would then act like a spin lock, and things would
> run very efficiently. But, as it is not a spin lock, there are things we
> must also be aware of.
>
> 1) If the owner we are waiting on to release the mutex schedules out.
>
> 2) If another task wants us to run on the CPU of the waiter that is
> spinning.
>
> 3) If the waiter that is spinning runs out of its time quantum and
> should yield to another task.
>
> #1 is checked by owner_running(). If that fails, then we should not
> spin anymore and it's good to schedule. BTW, the reason owner_running()
> is safe to check owner is that it takes rcu_read_lock() to make sure
> the owner pointer stays valid.
>
> #2 and #3 are checked by testing NEED_RESCHED on the waiter (NOT THE
> OWNER!).  If the waiter that is spinning should be preempted, the waiter
> (not the owner) will have its NEED_RESCHED bit set. That is why the
> waiter's NEED_RESCHED flag is checked and not the owner's.
>
> And also we don't need to check need_resched() in the above code because
> the loop checks it anyway, or we break out of the loop.
>
> Understand?
>

After hours chewing on __mutex_lock_common() and mutex_spin_on_owner(),
and on what you and Peter said, I see that the owner is on a different
CPU from the waiter, and the need_resched() I patched simply means the
waiter is no longer willing to wait, for whatever reason.

It is really an unlucky night :(

Hillf


* Re: [PATCH] sched: correct testing need_resched in mutex_spin_on_owner()
  2011-06-07 18:22               ` Hillf Danton
@ 2011-06-07 18:34                 ` Steven Rostedt
  0 siblings, 0 replies; 15+ messages in thread
From: Steven Rostedt @ 2011-06-07 18:34 UTC (permalink / raw)
  To: Hillf Danton; +Cc: Peter Zijlstra, LKML, Ingo Molnar

On Wed, 2011-06-08 at 02:22 +0800, Hillf Danton wrote:

> After hours chewing on __mutex_lock_common() and mutex_spin_on_owner(),
> and on what you and Peter said, I see that the owner is on a different
> CPU from the waiter, and the need_resched() I patched simply means the
> waiter is no longer willing to wait, for whatever reason.
>
> It is really an unlucky night :(

Please don't get discouraged.

We just want you to spend a bit more time looking at the code. Do what I
do when something looks wrong. Try to figure out what the author was
doing and why it may actually be correct. You may find out that the code
is indeed correct.

When I see code that looks wrong, I first think my assumptions are
incorrect, and I try to prove myself wrong. Only when I fail to do that,
do I send a patch or bring it up to the author. And then, I do it in
such a way that the author of the code may still have the ability to
prove me wrong. And several times, an author will tell me something that
I overlooked, and the original code stays as is.

Don't take it personally, this is just a learning phase. ;)

-- Steve



