linux-kernel.vger.kernel.org archive mirror
* [PATCH] wait_task_inactive: don't consider task->nivcsw
       [not found] <200807260245.m6Q2jwB4012297@imap1.linux-foundation.org>
@ 2008-07-27 11:37 ` Oleg Nesterov
  2008-07-27 19:55   ` Roland McGrath
  2008-07-27 12:15 ` Q: wait_task_inactive() and !CONFIG_SMP && CONFIG_PREEMPT Oleg Nesterov
  2008-07-27 12:42 ` [PATCH] kthread_bind: use wait_task_inactive(TASK_UNINTERRUPTIBLE) Oleg Nesterov
  2 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2008-07-27 11:37 UTC (permalink / raw)
  To: akpm; +Cc: torvalds, roland, mingo, linux-kernel

If wait_task_inactive() returns success, the task was deactivated.
In that case schedule() always increments ->nvcsw, which alone can
be used as a "generation counter".

If the next call returns the same number, we can be sure that the
task remained unscheduled. Otherwise, since we know that .on_rq == 0
again, ->nvcsw must have changed in between.

Q: perhaps it is better to do "ncsw = (p->nvcsw << 1) | 1"? This
decreases the possibility of a "was it unscheduled" false positive
when ->nvcsw == 0.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

--- LINUS/kernel/sched.c~	2008-07-27 14:46:03.000000000 +0400
+++ LINUS/kernel/sched.c	2008-07-27 14:55:12.000000000 +0400
@@ -1922,11 +1922,8 @@ unsigned long wait_task_inactive(struct 
 		running = task_running(rq, p);
 		on_rq = p->se.on_rq;
 		ncsw = 0;
-		if (!match_state || p->state == match_state) {
-			ncsw = p->nivcsw + p->nvcsw;
-			if (unlikely(!ncsw))
-				ncsw = 1;
-		}
+		if (!match_state || p->state == match_state)
+			ncsw = p->nvcsw ?: 1;
 		task_rq_unlock(rq, &flags);
 
 		/*


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Q: wait_task_inactive() and !CONFIG_SMP && CONFIG_PREEMPT
       [not found] <200807260245.m6Q2jwB4012297@imap1.linux-foundation.org>
  2008-07-27 11:37 ` [PATCH] wait_task_inactive: don't consider task->nivcsw Oleg Nesterov
@ 2008-07-27 12:15 ` Oleg Nesterov
  2008-07-27 16:51   ` Linus Torvalds
  2008-07-27 20:05   ` Roland McGrath
  2008-07-27 12:42 ` [PATCH] kthread_bind: use wait_task_inactive(TASK_UNINTERRUPTIBLE) Oleg Nesterov
  2 siblings, 2 replies; 12+ messages in thread
From: Oleg Nesterov @ 2008-07-27 12:15 UTC (permalink / raw)
  To: akpm; +Cc: torvalds, roland, mingo, linux-kernel

Without CONFIG_SMP wait_task_inactive() is a no-op, which doesn't look right.
Shouldn't we also take CONFIG_PREEMPT into account?

Not that it really matters, just curious. kthread_bind() itself could be
a no-op without CONFIG_SMP. ptrace_check_attach() shouldn't have real
problems, but still.


Also, the !SMP version of wait_task_inactive() always returns 1, which
doesn't conform to the comment near kernel/sched.c:wait_task_inactive().

Oleg.



* [PATCH] kthread_bind: use wait_task_inactive(TASK_UNINTERRUPTIBLE)
       [not found] <200807260245.m6Q2jwB4012297@imap1.linux-foundation.org>
  2008-07-27 11:37 ` [PATCH] wait_task_inactive: don't consider task->nivcsw Oleg Nesterov
  2008-07-27 12:15 ` Q: wait_task_inactive() and !CONFIG_SMP && CONFIG_PREEMPT Oleg Nesterov
@ 2008-07-27 12:42 ` Oleg Nesterov
  2 siblings, 0 replies; 12+ messages in thread
From: Oleg Nesterov @ 2008-07-27 12:42 UTC (permalink / raw)
  To: akpm; +Cc: roland, mingo, linux-kernel

Now that wait_task_inactive(task, state) checks task->state == state,
we can simplify the code and make this debugging check more robust.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

--- LINUS/kernel/kthread.c~	2008-07-27 16:19:27.000000000 +0400
+++ LINUS/kernel/kthread.c	2008-07-27 16:22:17.000000000 +0400
@@ -171,12 +171,11 @@ EXPORT_SYMBOL(kthread_create);
  */
 void kthread_bind(struct task_struct *k, unsigned int cpu)
 {
-	if (k->state != TASK_UNINTERRUPTIBLE) {
+	/* Must have done schedule() in kthread() before we set_task_cpu */
+	if (!wait_task_inactive(k, TASK_UNINTERRUPTIBLE)) {
 		WARN_ON(1);
 		return;
 	}
-	/* Must have done schedule() in kthread() before we set_task_cpu */
-	wait_task_inactive(k, 0);
 	set_task_cpu(k, cpu);
 	k->cpus_allowed = cpumask_of_cpu(cpu);
 	k->rt.nr_cpus_allowed = 1;



* Re: Q: wait_task_inactive() and !CONFIG_SMP && CONFIG_PREEMPT
  2008-07-27 12:15 ` Q: wait_task_inactive() and !CONFIG_SMP && CONFIG_PREEMPT Oleg Nesterov
@ 2008-07-27 16:51   ` Linus Torvalds
  2008-07-27 20:05   ` Roland McGrath
  1 sibling, 0 replies; 12+ messages in thread
From: Linus Torvalds @ 2008-07-27 16:51 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: akpm, roland, mingo, linux-kernel



On Sun, 27 Jul 2008, Oleg Nesterov wrote:
>
> Without CONFIG_SMP wait_task_inactive() is a no-op, which doesn't look right.
> Shouldn't we also take CONFIG_PREEMPT into account?

The exit path had _better_ be non-preemptable until it hits the final 
schedule that disables it (and that wait_task_inactive() was waiting for). 
At least that used to be the rule.

Of course, historically the only user for this was just the "wait for 
exit to complete" case. And that isn't really true any more, I guess. So I 
dunno.

		Linus


* Re: [PATCH] wait_task_inactive: don't consider task->nivcsw
  2008-07-27 11:37 ` [PATCH] wait_task_inactive: don't consider task->nivcsw Oleg Nesterov
@ 2008-07-27 19:55   ` Roland McGrath
  0 siblings, 0 replies; 12+ messages in thread
From: Roland McGrath @ 2008-07-27 19:55 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: akpm, torvalds, mingo, linux-kernel

> If wait_task_inactive() returns success, the task was deactivated.
> In that case schedule() always increments ->nvcsw, which alone can
> be used as a "generation counter".

Thanks, that looks fine to me.

> Q: perhaps it is better to do "ncsw = (p->nvcsw << 1) | 1"? This
> decreases the possibility of a "was it unscheduled" false positive
> when ->nvcsw == 0.

That makes sense.


Thanks,
Roland


* Re: Q: wait_task_inactive() and !CONFIG_SMP && CONFIG_PREEMPT
  2008-07-27 12:15 ` Q: wait_task_inactive() and !CONFIG_SMP && CONFIG_PREEMPT Oleg Nesterov
  2008-07-27 16:51   ` Linus Torvalds
@ 2008-07-27 20:05   ` Roland McGrath
  2008-07-28 12:57     ` Oleg Nesterov
  1 sibling, 1 reply; 12+ messages in thread
From: Roland McGrath @ 2008-07-27 20:05 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: akpm, torvalds, mingo, linux-kernel

> Without CONFIG_SMP wait_task_inactive() is a no-op, which doesn't look right.
> Shouldn't we also take CONFIG_PREEMPT into account?

wait_task_inactive is only called when task->state is nonzero (i.e. not
TASK_RUNNING).  Preemption leaves a task in TASK_RUNNING, so a preempted
task shouldn't ever be passed to wait_task_inactive.  I don't see the problem.

> Also, the !SMP version of wait_task_inactive() always returns 1, this
> doesn't conform to the comment near kernel/sched.c:wait_task_inactive().

You mean the "(its total switch count)" part of the comment?
The normative part was only meant to be "a positive number".


Thanks,
Roland


* Re: Q: wait_task_inactive() and !CONFIG_SMP && CONFIG_PREEMPT
  2008-07-27 20:05   ` Roland McGrath
@ 2008-07-28 12:57     ` Oleg Nesterov
  2008-07-28 23:39       ` Roland McGrath
  0 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2008-07-28 12:57 UTC (permalink / raw)
  To: Roland McGrath; +Cc: akpm, torvalds, mingo, linux-kernel

On 07/27, Roland McGrath wrote:
>
> > Without CONFIG_SMP wait_task_inactive() is a no-op, which doesn't look right.
> > Shouldn't we also take CONFIG_PREEMPT into account?
>
> wait_task_inactive is only called when task->state is nonzero (i.e. not
> TASK_RUNNING).  Preemption leaves a task in TASK_RUNNING, so a preempted
> task shouldn't ever be passed to wait_task_inactive.

No, schedule() doesn't change prev->state when a task with ->state !=
TASK_RUNNING is preempted. Note this check

	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {

in schedule().

Let's suppose the child does ptrace_stop(). It sets state = TASK_TRACED
and unlocks ->siglock.

If it is preempted by the parent which does ptrace_check_attach(),
wait_task_inactive() must wait until the child leaves the runqueue,
but the dummy version just returns success.

sys_ptrace() continues assuming that the child sleeps in TASK_TRACED,
while in fact it is running, despite its ->state == TASK_TRACED.


As I said, nothing really bad can happen, the child can't return to
user-space or anything like that, but this just means that afaics
ptrace_check_attach() doesn't have strong reasons for wait_task_inactive().

> > Also, the !SMP version of wait_task_inactive() always returns 1, this
> > doesn't conform to the comment near kernel/sched.c:wait_task_inactive().
>
> You mean the "(its total switch count)" part of the comment?
> The normative part was only meant to be "a positive number".

I refer to this part of the comment:

	If a second call a short while later returns the same number, the
	caller can be sure that @p has remained unscheduled the whole time.

The dummy version always returns the same number == 1.


So. I think that wait_task_inactive() needs "defined(SMP) || defined(PREEMPT)"
and the dummy version should return ->nvcsw too.

Oleg.



* Re: Q: wait_task_inactive() and !CONFIG_SMP && CONFIG_PREEMPT
  2008-07-28 12:57     ` Oleg Nesterov
@ 2008-07-28 23:39       ` Roland McGrath
  2008-07-29 12:21         ` Oleg Nesterov
  0 siblings, 1 reply; 12+ messages in thread
From: Roland McGrath @ 2008-07-28 23:39 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: akpm, torvalds, mingo, linux-kernel

> If it is preempted by the parent which does ptrace_check_attach(),
> wait_task_inactive() must wait until the child leaves the runqueue,
> but the dummy version just returns success.

I see your point.

> sys_ptrace() continues assuming that the child sleeps in TASK_TRACED,
> while in fact it is running, despite its ->state == TASK_TRACED.

For ptrace, the only real expectation has ever been that it's no longer on
the physical CPU, i.e. we are not racing with context switch itself.  On a
uniprocessor, this can of course never happen.

The historical picture is that the preemption issue wasn't thought about
much.  ptrace has always used lock_kernel(), and mostly this implied
disabling preemption anyway (there was CONFIG_PREEMPT_BKL for a while).
So it's moot there.

Even if preemption were an issue for ptrace, it's not a problem.  All that
matters is that the tracee is not going to run any code that changes the
thread machine state ptrace accesses (pt_regs, thread.foo, etc).  If ptrace
gets preempted, the tracee gets switched in, gets switched back out, and
the ptrace-calling thread switched back in, there is no problem.  All the
flutter on the kernel memory ptrace might touch took place during the
context switches themselves, and every byte was back in the same place
between when ptrace got preempted and when it resumed its next instruction.

I can't speak to the kthread case.  I suspect that set_task_cpu() is always
safe on !SMP PREEMPT, and that's why it's fine.  But I'm not really sure.

> I refer to this part of the comment:
> 
> 	If a second call a short while later returns the same number, the
> 	caller can be sure that @p has remained unscheduled the whole time.
> 
> The dummy version always returns the same number == 1.

Right.  For the general case where this is the contract wait_task_inactive
is expected to meet, it does matter.  I think task_current_syscall() does
want this checked for the preempted uniprocessor case, for example.

> So. I think that wait_task_inactive() needs "defined(SMP) || defined(PREEMPT)"
> and the dummy version should return ->nvcsw too.

Is this what we want?

	#ifdef CONFIG_SMP
	extern unsigned long wait_task_inactive(struct task_struct *, long);
	#else
	static inline unsigned long wait_task_inactive(struct task_struct *p,
						       long match_state)
	{
		unsigned long ret = 0;
		if (match_state) {
			preempt_disable();
			if (p->state == match_state)
				ret = (p->nvcsw << 1) | 1;
			preempt_enable();
		}
		return ret;
	}
	#endif


Thanks,
Roland


* Re: Q: wait_task_inactive() and !CONFIG_SMP && CONFIG_PREEMPT
  2008-07-28 23:39       ` Roland McGrath
@ 2008-07-29 12:21         ` Oleg Nesterov
  2008-08-01  1:27           ` Roland McGrath
  0 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2008-07-29 12:21 UTC (permalink / raw)
  To: Roland McGrath; +Cc: akpm, torvalds, mingo, linux-kernel

On 07/28, Roland McGrath wrote:
>
> I can't speak to the kthread case.  I suspect that set_task_cpu() is always
> safe on !SMP PREEMPT, and that's why it's fine.

Yes, kthread_bind() is fine, it changes nothing in *k if !SMP.

> > I refer to this part of the comment:
> > 
> > 	If a second call a short while later returns the same number, the
> > 	caller can be sure that @p has remained unscheduled the whole time.
> > 
> > The dummy version always returns the same number == 1.
> 
> Right.  For the general case where this is the contract wait_task_inactive
> is expected to meet, it does matter.  I think task_current_syscall() does
> want this checked for the preempted uniprocessor case, for example.
> 
> > So. I think that wait_task_inactive() needs "defined(SMP) || defined(PREEMPT)"
> > and the dummy version should return ->nvcsw too.
> 
> Is this what we want?
> 
> 	#ifdef CONFIG_SMP
> 	extern unsigned long wait_task_inactive(struct task_struct *, long);
> 	#else
> 	static inline unsigned long wait_task_inactive(struct task_struct *p,
> 						       long match_state)
> 	{
> 		unsigned long ret = 0;
> 		if (match_state) {
> 			preempt_disable();
> 			if (p->state == match_state)
> 				ret = (p->nvcsw << 1) | 1;
> 			preempt_enable();
> 		}
> 		return ret;
> 	}
> 	#endif

I don't think this is right.

Firstly, the above always fails if match_state == 0, this is not right.

But more importantly, we can't just check ->state == match_state. And
preempt_disable() buys nothing.

Let's look at task_current_syscall(). The "target" can set, say,
TASK_UNINTERRUPTIBLE many times, do a lot of syscalls, and not once
call schedule().

And the task remains fully preemptible even if it runs in
TASK_UNINTERRUPTIBLE state.


Let's suppose we implement kthread_set_nice() in the same manner
as kthread_bind(),

	kthread_set_nice(struct task_struct *k, long nice)
	{
		wait_task_inactive(k, TASK_UNINTERRUPTIBLE);

		... just change ->prio/static_prio ...
	}

	the above is ugly of course, but should be correct even
	with !SMP && PREEMPT.


I think we need

	#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT)
	extern unsigned long wait_task_inactive(struct task_struct *, long);
	#else
	static inline unsigned long wait_task_inactive(struct task_struct *p,
							long match_state)
	{
		if (match_state && p->state != match_state)
			return 0;
		return p->nvcsw | LONG_MIN; /* same encoding as in sched.c */
	}
	#endif

Oleg.



* Re: Q: wait_task_inactive() and !CONFIG_SMP && CONFIG_PREEMPT
  2008-07-29 12:21         ` Oleg Nesterov
@ 2008-08-01  1:27           ` Roland McGrath
  2008-08-01 11:49             ` Oleg Nesterov
  0 siblings, 1 reply; 12+ messages in thread
From: Roland McGrath @ 2008-08-01  1:27 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: akpm, torvalds, mingo, linux-kernel

> I don't think this is right.
> 
> Firstly, the above always fails if match_state == 0, this is not right.

A call with 0 is the "legacy case", where the return value is 0 and nothing
but the traditional wait_task_inactive behavior is expected.  On UP, this
was a nop before and still is.

Anyway, this is moot since we are soon to have no callers that pass 0.

> But more importantly, we can't just check ->state == match_state. And
> preempt_disable() buys nothing.

It ensures that the samples of ->state and ->nvcsw both came while the
target could never have run in between.  Without it, a preemption after the
->state check could mean the ->nvcsw value we use is from a later block in
a different state than the one intended.

> Let's look at task_current_syscall(). The "target" can set, say,
> TASK_UNINTERRUPTIBLE many times, do a lot of syscalls, and not once
> call schedule().
> 
> And the task remains fully preemptible even if it runs in
> TASK_UNINTERRUPTIBLE state.

One of us is missing something basic.  We are on the only CPU.  If target
does *anything*, it means we got preempted, then target switched in, did
things, and then called schedule (including via preemption)--so that we
could possibly be running again now afterwards.  That schedule call bumped
the counter after we sampled it.  The second call done for "is it still
blocked afterwards?" will see a different count and abort.  Am I confused?

Ah, I think it was me who was missing something when I let you talk me into
checking only ->nvcsw.  It really should be ->nivcsw + ->nvcsw as I had it
originally (| LONG_MIN as you've done, a good trick).  That makes what I
just said true in the preemption case.  This bit:

	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {

will not hit, so switch_count = &prev->nivcsw; remains from before.
This is why it was nivcsw + nvcsw to begin with.

What am I missing here?


Thanks,
Roland


* Re: Q: wait_task_inactive() and !CONFIG_SMP && CONFIG_PREEMPT
  2008-08-01  1:27           ` Roland McGrath
@ 2008-08-01 11:49             ` Oleg Nesterov
  2008-08-01 21:22               ` Roland McGrath
  0 siblings, 1 reply; 12+ messages in thread
From: Oleg Nesterov @ 2008-08-01 11:49 UTC (permalink / raw)
  To: Roland McGrath; +Cc: akpm, torvalds, mingo, linux-kernel, Dmitry Adamushko

On 07/31, Roland McGrath wrote:
>
> > I don't think this is right.
> > 
> > Firstly, the above always fails if match_state == 0, this is not right.
>
> A call with 0 is the "legacy case", where the return value is 0 and nothing
> but the traditional wait_task_inactive behavior is expected.  On UP, this
> was a nop before and still is.
>
> Anyway, this is moot since we are soon to have no callers that pass 0.

This means we can't use wait_task_inactive(p) unless we know p->state
is already != TASK_RUNNING. OK, the current callers assume exactly this.

But perhaps we should change the state checks from "==" to "&", note
that pfm_task_incompatible() checks task_is_stopped_or_traced().

> > But more importantly, we can't just check ->state == match_state. And
> > preempt_disable() buys nothing.
> 
> It ensures that the samples of ->state and ->nvcsw both came while the
> target could never have run in between.  Without it, a preemption after the
> ->state check could mean the ->nvcsw value we use is from a later block in
> a different state than the one intended.

I meant, it buys nothing because the task can set its state == TASK_RUNNING
right after preempt_enable(): the task can be runnable despite the fact
that its state is (say) TASK_UNINTERRUPTIBLE.

Please see below.

> > Let's look at task_current_syscall(). The "target" can set, say,
> > TASK_UNINTERRUPTIBLE many times, do a lot of syscalls, and not once
> > call schedule().
> > 
> > And the task remains fully preemptible even if it runs in
> > TASK_UNINTERRUPTIBLE state.
> 
> One of us is missing something basic.  We are on the only CPU.  If target
> does *anything*, it means we got preempted, then target switched in, did
> things, and then called schedule (including via preemption)--so that we
> could possibly be running again now afterwards.  That schedule call bumped
> the counter after we sampled it.  The second call done for "is it still
> blocked afterwards?" will see a different count and abort.  Am I confused?

I think we misunderstood each other. I never said the _current_ callers have
problems (except the dummy version can't just return 1 of course).

I proposed to synchronize the two implementations to avoid possible problems,
and personally I still dislike the fact that the !SMP version behaves
so differently.

But yes, if we only want to "fix" the current callers, we can make the !SMP
version

	static inline unsigned long wait_task_inactive(struct task_struct *p,
						       long match_state)
	{
		int ret = 0;

		preempt_disable();
		if (!match_state || p->state == match_state)
			ret = (p->nvcsw + p->nivcsw) | LONG_MIN;
		preempt_enable();

		return ret;
	}

> Ah, I think it was me who was missing something when I let you talk me into
> checking only ->nvcsw.  It really should be ->nivcsw + ->nvcsw as I had it
> originally (| LONG_MIN as you've done, a good trick).  That makes what I
> just said true in the preemption case.  This bit:
>
> 	if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
>
> will not hit, so switch_count = &prev->nivcsw; remains from before.
> This is why it was nivcsw + nvcsw to begin with.

This is fine on SMP, afaics.

More precisely, this is fine if wait_task_inactive(p) returns success only
when p->se.on_rq == 0; we shouldn't worry about preemption (nivcsw) at all.

Let's forget about the overflow, and suppose that 2 subsequent calls return
the same ->nvcsw. Every time the task leaves the runqueue (so that !.on_rq
becomes true), schedule() bumps its ->nvcsw. If the task was scheduled in
between, we must notice this, because the second success means the task
was deactivate_task()'ed again.

We shouldn't look at nivcsw at all. While the task is deactivated, its
nivcsw/nvcsw can't be changed. If it is scheduled again, it must increment
->nvcsw before wait_task_inactive() can return success.

(Dmitry cc'ed to check my understanding).

Oleg.



* Re: Q: wait_task_inactive() and !CONFIG_SMP && CONFIG_PREEMPT
  2008-08-01 11:49             ` Oleg Nesterov
@ 2008-08-01 21:22               ` Roland McGrath
  0 siblings, 0 replies; 12+ messages in thread
From: Roland McGrath @ 2008-08-01 21:22 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: akpm, torvalds, mingo, linux-kernel, Dmitry Adamushko

> This means we can't use wait_task_inactive(p) unless we know p->state
> is already != TASK_RUNNING. OK, the current callers assume exactly this.

Correct.

> But perhaps we should change the state checks from "==" to "&", note
> that pfm_task_incompatible() checks task_is_stopped_or_traced().

No, the intent is "not changed at all", not "in some acceptable state".  So
the caller should sample ->state, decide that value is ok, and pass that
same value in.  Then wait_task_inactive succeeds only if exactly that value
stayed in place and the target got off the CPU.  (If it came out and went
back into the exact same state before we got into wait_task_inactive, we
don't really care.  The callers have outside means to know that either is
good enough for their purposes, or is impossible.)

> I meant, it buys nothing because the task can set its state == TASK_RUNNING
> right after preempt_enable(), because the task can be runnable despite
> the fact its state is (say) TASK_UNINTERRUPTIBLE.

Right, but we don't care about that.  The predicate really is "says
expected ->state, and is off the CPU", not "... and is not runnable".

> I think we misunderstood each other. I never said the _current_ callers have
> problems (except the dummy version can't just return 1 of course).

Callers like the current ones are the only reasons we have this function.
Its only purpose is to meet those needs.  If those requirements are not as
strong as a "generic" specification of what the function does, then we can
just change the specification.

> I proposed to synchronize the two implementations to avoid possible problems,
> and personally I still dislike the fact that the !SMP version behaves
> so differently.

It's only "very different" if what you call its "behavior" is an abstract
specification based on what the SMP version happens to do, rather than
saying "its behavior" is just to meet the guarantees I set out earlier.
Meeting those guarantees is what the function is for.  So it should do the
cheapest thing to satisfies them in each configuration.

> But yes, if we only want to "fix" the current callers, we can make the !SMP
> version
[...]
> 		int ret = 0;

unsigned long

> 
> 		preempt_disable();
> 		if (!match_state || p->state == match_state)
> 			ret = (p->nvcsw + p->nivcsw) | LONG_MIN;
> 		preempt_enable();

In the !match_state case (nearly moot now anyway), there's no need to set
the return value.  Callers passing 0 never expected a return value.

> This is fine on SMP, afaics.
> 
> More precisely, this is fine if wait_task_inactive(p) returns success only
> when p->se.on_rq == 0, we shouldn't worry about preemption (nivcsw) at all.

Ok.  I was looking at the !SMP PREEMPT context when I said that.  I don't
dispute your rationale about the SMP case; it relies on scheduler innards
that I don't know at all well.


Thanks,
Roland

