All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] do_wait fix for 2.6.10-rc1
@ 2004-11-05  9:57 Sripathi Kodi
  2004-11-05 10:03 ` Ingo Molnar
       [not found] ` <Pine.LNX.4.58.0411051101500.30457@ppc970.osdl.org>
  0 siblings, 2 replies; 12+ messages in thread
From: Sripathi Kodi @ 2004-11-05  9:57 UTC (permalink / raw)
  To: linux-kernel; +Cc: akpm

Here is a patch that fixes a condition that can make a thread get stuck
forever in do_wait unless manually interrupted. 

I have provided a testcase that can recreate this problem while running
with NPTL on a multi-processor machine. I have seen this on x86 and PPC64. 
The condition happens when two threads race in waitpid to collect exit 
status of the same child. Sometimes one of the threads gets permanently 
stuck in waitpid, instead of returning ECHILD.

In do_wait we use the variable 'flag' to indicate if there are eligible 
children to collect. But we don't re-initialize it to 0 at the beginning 
of the do-while loop, which makes the thread always believe that there are 
eligible children in it's second run through the loop. It calls schedule, 
but there won't be anyone to wake it up. This can happen when two threads 
enter do-wait simultaneously. We need to re-initialize 'flag' to zero 
every time we loop through the do-while to solve this problem.

Thanks, 
Sripathi.

A simple testcase can demonstrate this problem. Patch follows the testcase.

#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <signal.h>
#include <limits.h>
#include <sys/wait.h>

int status_freq = 0;
int child_limit = INT_MAX;

void
catch_sigchld(int signo)
{
    pid_t wait_pid;
    int  status;

    do {
        wait_pid = waitpid((pid_t) -1, &status, WNOHANG);
    } while (wait_pid > 0);

    return;
}

void *
thread_rtn(void *arg)
{
    int  rc, child_cnt, status;
    int pfds[2];
    pid_t child_pid, wait_pid;
    char output[80];
    ssize_t bytes_read;
    sigset_t sigmask;

        if ((rc = sigemptyset(&sigmask)) != 0) {
            fprintf(stderr, "sigemptyset() failed: %d.\n", errno);
            exit(1);
        }

        if ((rc = sigaddset(&sigmask, SIGCHLD)) != 0) {
            fprintf(stderr, "sigaddset() failed: %d.\n", errno);
            exit(1);
        }

        if ((rc = pthread_sigmask(SIG_SETMASK, &sigmask, NULL)) != 0) {
            fprintf(stderr, "Error setting thread signal mask: %d.\n", rc);
            exit(1);
        }

    for (child_cnt = 1; child_cnt <= child_limit; child_cnt++) {

        if ((rc = pipe(pfds)) != 0) {
            fprintf(stderr, "Error calling pipe(): %d.\n", errno);
            exit(1);
        }

        child_pid = fork();
        if (child_pid < 0) {
            fprintf(stderr, "Error calling fork(): %d.\n", errno);
            exit(1);
        }

        if (child_pid == 0) {               /* This code run in the child    */
            close(pfds[0]);                 /* Close read end of pipe        */
            dup2(pfds[1], STDOUT_FILENO);   /* Make stdout write end of pipe */
            close(pfds[1]);                 /* Close duplicate write fd      */
            execlp("date", "date", NULL);   /* Execute the "date" program    */
            exit(1);                        /* It's an error to reach here   */
        }

        do {
            rc = close(pfds[1]);
        } while (rc == -1 && errno == EINTR);

        do {
            bytes_read = read(pfds[0], output, sizeof output);
        } while (bytes_read > 0 || (bytes_read == -1 && errno == EINTR));

        do {
            rc = close(pfds[0]);
        } while (rc == -1 && errno == EINTR);

        fprintf(stderr, "calling waitpid() for %d.\n", child_pid);

        do {
            wait_pid = waitpid(child_pid, &status, 0);
        } while (wait_pid == -1 && errno == EINTR);

        if (status_freq > 0 && (child_cnt % status_freq) == 0) {
            fprintf(stderr, "Child count: %d.\n", child_cnt);
        }
    }

    exit(0);
    return NULL;
}

int
main(int argc, char **argv)
{
    struct sigaction sa;
    pthread_t thread_id;
    int flag, rc;

    status_freq = 1000;
    child_limit = 10000;

    sa.sa_handler = catch_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;

    rc = sigaction(SIGCHLD, &sa, NULL);
    if (rc != 0) {
        fprintf(stderr, "sigaction() failed: %d.\n", errno);
        exit(1);
    }

    if ((rc = pthread_create(&thread_id, NULL, thread_rtn, NULL)) != 0) {
        fprintf(stderr, "Error creating thread: %d.\n", rc);
        exit(1);
    }

    for (;;) {
        sleep(10000);
    }

    return 0;
}

Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>

--- linux-2.6.10-rc1/kernel/exit.c	2004-11-05 16:08:39.458442280 +0530
+++ ./exit-rc1.c	2004-11-05 16:09:47.017171792 +0530
@@ -1306,11 +1306,10 @@ static long do_wait(pid_t pid, int optio
 {
 	DECLARE_WAITQUEUE(wait, current);
 	struct task_struct *tsk;
-	int flag, retval;
+	int flag = 0, retval;
 
 	add_wait_queue(&current->wait_chldexit,&wait);
 repeat:
-	flag = 0;
 	current->state = TASK_INTERRUPTIBLE;
 	read_lock(&tasklist_lock);
 	tsk = current;
@@ -1319,6 +1318,7 @@ repeat:
 		struct list_head *_p;
 		int ret;
 
+		flag = 0;
 		list_for_each(_p,&tsk->children) {
 			p = list_entry(_p,struct task_struct,sibling);


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] do_wait fix for 2.6.10-rc1
  2004-11-05  9:57 [PATCH] do_wait fix for 2.6.10-rc1 Sripathi Kodi
@ 2004-11-05 10:03 ` Ingo Molnar
       [not found] ` <Pine.LNX.4.58.0411051101500.30457@ppc970.osdl.org>
  1 sibling, 0 replies; 12+ messages in thread
From: Ingo Molnar @ 2004-11-05 10:03 UTC (permalink / raw)
  To: Sripathi Kodi; +Cc: linux-kernel, akpm


* Sripathi Kodi <sripathik@in.ibm.com> wrote:

> Here is a patch that fixes a condition that can make a thread get
> stuck forever in do_wait unless manually interrupted. 

ahh ... very nice catch!

	Ingo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] do_wait fix for 2.6.10-rc1
       [not found] ` <Pine.LNX.4.58.0411051101500.30457@ppc970.osdl.org>
@ 2004-11-08 14:27   ` Sripathi Kodi
  2004-11-08 16:01     ` Linus Torvalds
  0 siblings, 1 reply; 12+ messages in thread
From: Sripathi Kodi @ 2004-11-08 14:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, Roland McGrath, Ingo Molnar, Andrew Morton, dino

On Fri, Nov 05, 2004 at 11:17:44AM -0800, Linus Torvalds wrote:

>> I think the real fix is to notice when we have dropped the tasklist_lock 
>> inside the loop, and _not_ re-schedule in that case, but just repeat the 
>> loop from the top.
>> 
>> And that's easy enough to do: set current->state to TASK_RUNNING in the
>> cases where we might have raced with somebody else. That will cause the
>> schedule() to be a no-op.
>> 
>> We could also choose to just wake up all our siblings "child_wait" lists
>> when we reap a child ourselves. They likely got woken up _anyway_ when the
>> child died in the first place, after all. For extra bonus points, make the
>> child_wait thing use the self-removing waitqueue entries, ie use
>> "prepare_to_wait()" instead of add_wait_queue(), and move it after the
>> "repeat:" thing.
>> 
>> 		Linus
>  
>

Linus,

Thanks for your suggestions. I have attached the re-done patch. I have 
implemented your first suggestion because it was much easier. I hope it 
looks better now.

Thanks and regards,
Sripathi Kodi.

Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>

--- linux-2.6.10-rc1/kernel/exit.c	2004-11-08 23:38:17.358375128 +0530
+++ /home/sripathi/12013/patch/take2/exit.c	2004-11-08 23:33:44.973783880 +0530
@@ -1345,8 +1345,10 @@ repeat:
 				break;
 			default:
 			// case EXIT_DEAD:
-				if (p->exit_state == EXIT_DEAD)
-					continue;
+				if (p->exit_state == EXIT_DEAD) {
+					current->state = TASK_RUNNING;
+					break;
+				}
 			// case EXIT_ZOMBIE:
 				if (p->exit_state == EXIT_ZOMBIE) {
 					/*
@@ -1363,6 +1365,7 @@ repeat:
 					/* He released the lock.  */
 					if (retval != 0)
 						goto end;
+					current->state = TASK_RUNNING;
 					break;
 				}
 check_continued:



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] do_wait fix for 2.6.10-rc1
  2004-11-08 14:27   ` Sripathi Kodi
@ 2004-11-08 16:01     ` Linus Torvalds
  2004-11-08 16:13       ` Linus Torvalds
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2004-11-08 16:01 UTC (permalink / raw)
  To: Sripathi Kodi
  Cc: linux-kernel, Roland McGrath, Ingo Molnar, Andrew Morton, dino



On Mon, 8 Nov 2004, Sripathi Kodi wrote:
> 
> Thanks for your suggestions. I have attached the re-done patch. I have 
> implemented your first suggestion because it was much easier. I hope it 
> looks better now.

Hmm.. I'm a grumpy old man, and I'm not happy yet. Feel free to explain to 
me why I'm wrong, though - I don't mind.

> --- linux-2.6.10-rc1/kernel/exit.c	2004-11-08 23:38:17.358375128 +0530
> +++ /home/sripathi/12013/patch/take2/exit.c	2004-11-08 23:33:44.973783880 +0530
> @@ -1345,8 +1345,10 @@ repeat:
>  				break;
>  			default:
>  			// case EXIT_DEAD:
> -				if (p->exit_state == EXIT_DEAD)
> -					continue;
> +				if (p->exit_state == EXIT_DEAD) {
> +					current->state = TASK_RUNNING;
> +					break;
> +				}

Why here? We haven't actually released the lock, so I don't see why we'd 
consider this a special case..

That said, I wonder if we should consider this kind of task non-eligible. 
It may well be a bug to consider a EXIT_DEAD child to be eligible for 
waiting, since we can't actually ever reap it any more. So maybe you found 
a bug, but the fix might be to check this case in "eligible_child()"?

Roland?

>  			// case EXIT_ZOMBIE:
>  				if (p->exit_state == EXIT_ZOMBIE) {
>  					/*
> @@ -1363,6 +1365,7 @@ repeat:
>  					/* He released the lock.  */
>  					if (retval != 0)
>  						goto end;
> +					current->state = TASK_RUNNING;
>  					break;
>  				}

There are cases where 'wait_task_zombie()' will return 0 _without_ 
releasing the lock, and you now made those be busy-loops (with a 
"schedule()", so the machine still works, it just eats CPU time like mad).

I think the _real_ case is the "wait_noreap_copyout()", case, but that 
already returns non-zero in case it released the lock, so it looks all 
good already.

In fact, I think the only case we care about is actually in 
wait_task_stopped(), in the "bail_ref:" case. That's where we have 
released the lock and potentially another process came in and reaped it, 
but we still return zero.

That case should set itself to running, so that we don't sleep forever.

In fact, it looks like the "bail_ref" case has _another_ bug: since it 
returns zero, we may still be _using_ the task pointer (to get to the next 
task), so we must NOT do a "put_task_struct()" intil we've re-acquired the 
tasklist_lock.

And I think _those_ two things were the real bugs. Does this patch fix it 
for you?

Roland, I'd still love to hear your input on the EXIT_DEAD case. Is it 
reapable?

		Linus

-----
===== kernel/exit.c 1.166 vs edited =====
--- 1.166/kernel/exit.c	2004-11-04 11:13:19 -08:00
+++ edited/kernel/exit.c	2004-11-08 08:00:12 -08:00
@@ -1200,8 +1200,10 @@
 		 */
 		write_unlock_irq(&tasklist_lock);
 bail_ref:
-		put_task_struct(p);
 		read_lock(&tasklist_lock);
+		put_task_struct(p);
+		/* When the waiter does a sched(), we need to re-start the loop! */
+		current->state = TASK_RUNNING;
 		return 0;
 	}
 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] do_wait fix for 2.6.10-rc1
  2004-11-08 16:01     ` Linus Torvalds
@ 2004-11-08 16:13       ` Linus Torvalds
  2004-11-08 16:37         ` Linus Torvalds
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2004-11-08 16:13 UTC (permalink / raw)
  To: Sripathi Kodi
  Cc: linux-kernel, Roland McGrath, Ingo Molnar, Andrew Morton, dino



On Mon, 8 Nov 2004, Linus Torvalds wrote:
> 
> In fact, it looks like the "bail_ref" case has _another_ bug: since it 
> returns zero, we may still be _using_ the task pointer (to get to the next 
> task), so we must NOT do a "put_task_struct()" intil we've re-acquired the 
> tasklist_lock.

Actually, this part of the patch is bogus. If our "put_task_struct" is the
last one, it doesn't help at all that we're insidfe the tasklist_lock. 
We'll just release the task structure ourselves.

The problem remains, though: "do_wait()" does end up accessing "tsk" in

	tsk = next_thread(tsk);

and as far as I can see, "tsk" may be gone by then.

Is there anything else protecting us? This looks like a serious (if 
extremely unlikely) bug..

Anyway, here's an updated patch for Sripathi's original problem, with a 
better comment, and with the put_task_struct back where it was originally 
since it shouldn't matter.

Help me, Obi-Wan McGrath,

		Linus

----
===== kernel/exit.c 1.166 vs edited =====
--- 1.166/kernel/exit.c	2004-11-04 11:13:19 -08:00
+++ edited/kernel/exit.c	2004-11-08 08:08:18 -08:00
@@ -1202,6 +1202,18 @@
 bail_ref:
 		put_task_struct(p);
 		read_lock(&tasklist_lock);
+
+		/*
+		 * We are returning to the wait loop without having successfully
+		 * removed the process, and a zero value. That means that the wait
+		 * will continue - but somebody else might have reaped a child
+		 * (maybe this one) that we already decided was eligible.
+		 *
+		 * As a result, we need to make sure that we don't wait for
+		 * children forever - they might be all gone. So mark us
+		 * running, and the "schedule()" in do_wait() is fine.
+		 */
+		current->state = TASK_RUNNING;
 		return 0;
 	}
 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] do_wait fix for 2.6.10-rc1
  2004-11-08 16:13       ` Linus Torvalds
@ 2004-11-08 16:37         ` Linus Torvalds
  2004-11-09  1:09           ` Linus Torvalds
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2004-11-08 16:37 UTC (permalink / raw)
  To: Sripathi Kodi
  Cc: linux-kernel, Roland McGrath, Ingo Molnar, Andrew Morton, dino



On Mon, 8 Nov 2004, Linus Torvalds wrote:
> 
> Actually, this part of the patch is bogus. If our "put_task_struct" is the
> last one, it doesn't help at all that we're insidfe the tasklist_lock. 
> We'll just release the task structure ourselves.
> 
> The problem remains, though: "do_wait()" does end up accessing "tsk" in
> 
> 	tsk = next_thread(tsk);
> 
> and as far as I can see, "tsk" may be gone by then.
> 
> Is there anything else protecting us? This looks like a serious (if 
> extremely unlikely) bug..

It also looks like we should have gotten an oops in do_wait() if this ever
happened with SLAB poisoning. Google doesn't seem to find any reports like
that.

Of course, the race to make this happen is likely extremely small indeed,
so the reason may just be that nobody ever triggers it in practice (and it
almost certainly requires that a threaded app waits for its children in
multiple threads, which is also fairly unusual - so this is likely a very
small race coupled with an access pattern that doesn't actually happen in
real life).

But maybe it's because I just missed some reason why this all is ok. I'd 
love to be wrong about it.

Anyway, if I'm right, the suggested fix would be something like this (this 
replaces the earlier patches, since it also makes the zero return case go 
away - we don't need to mark anything runnable, since we restart the whole 
loop).

NOTE! -EAGAIN should be safe, because the other routines involved can only
return -EFAULT as an error, so this is all unique to the "try again"  
case.

Ok, three patches for the same piece of code withing minutes. Please tell 
me this one is not also broken..

			Linus

----
===== kernel/exit.c 1.166 vs edited =====
--- 1.166/kernel/exit.c	2004-11-04 11:13:19 -08:00
+++ edited/kernel/exit.c	2004-11-08 08:34:37 -08:00
@@ -1201,8 +1201,15 @@
 		write_unlock_irq(&tasklist_lock);
 bail_ref:
 		put_task_struct(p);
-		read_lock(&tasklist_lock);
-		return 0;
+		/*
+		 * We are returning to the wait loop without having successfully
+		 * removed the process and having released the lock. We cannot
+		 * continue, since the "p" task pointer is potentially stale.
+		 *
+		 * Return -EAGAIN, and do_wait() will restart the loop from the
+		 * beginning. Do _not_ re-acquire the lock.
+		 */
+		return -EAGAIN;
 	}
 
 	/* move to end of parent's list to avoid starvation */
@@ -1343,6 +1350,8 @@
 							   (options & WNOWAIT),
 							   infop,
 							   stat_addr, ru);
+				if (retval == -EAGAIN)
+					goto repeat;
 				if (retval != 0) /* He released the lock.  */
 					goto end;
 				break;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] do_wait fix for 2.6.10-rc1
  2004-11-08 16:37         ` Linus Torvalds
@ 2004-11-09  1:09           ` Linus Torvalds
  2004-11-09 14:31             ` Dinakar Guniguntala
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2004-11-09  1:09 UTC (permalink / raw)
  To: Sripathi Kodi
  Cc: linux-kernel, Roland McGrath, Ingo Molnar, Andrew Morton, dino



Ok, I haven't gotten any response on this one, but I'm running it in my 
tree, and look as I might, it still seems right to me. It should fix not 
only the problem Sripathi Kodi saw (can we verify that?), but also a more 
fundamental problem with the access of stale memory.

I'll commit it. I would still prefer to have somebody else check out my 
logic (or lack there-of).

		Linus

On Mon, 8 Nov 2004, Linus Torvalds wrote:
> 
> Anyway, if I'm right, the suggested fix would be something like this (this 
> replaces the earlier patches, since it also makes the zero return case go 
> away - we don't need to mark anything runnable, since we restart the whole 
> loop).
> 
> NOTE! -EAGAIN should be safe, because the other routines involved can only
> return -EFAULT as an error, so this is all unique to the "try again"  
> case.
> 
> Ok, three patches for the same piece of code withing minutes. Please tell 
> me this one is not also broken..
> 
> 			Linus
> 
> ----
> ===== kernel/exit.c 1.166 vs edited =====
> --- 1.166/kernel/exit.c	2004-11-04 11:13:19 -08:00
> +++ edited/kernel/exit.c	2004-11-08 08:34:37 -08:00
> @@ -1201,8 +1201,15 @@
>  		write_unlock_irq(&tasklist_lock);
>  bail_ref:
>  		put_task_struct(p);
> -		read_lock(&tasklist_lock);
> -		return 0;
> +		/*
> +		 * We are returning to the wait loop without having successfully
> +		 * removed the process and having released the lock. We cannot
> +		 * continue, since the "p" task pointer is potentially stale.
> +		 *
> +		 * Return -EAGAIN, and do_wait() will restart the loop from the
> +		 * beginning. Do _not_ re-acquire the lock.
> +		 */
> +		return -EAGAIN;
>  	}
>  
>  	/* move to end of parent's list to avoid starvation */
> @@ -1343,6 +1350,8 @@
>  							   (options & WNOWAIT),
>  							   infop,
>  							   stat_addr, ru);
> +				if (retval == -EAGAIN)
> +					goto repeat;
>  				if (retval != 0) /* He released the lock.  */
>  					goto end;
>  				break;
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] do_wait fix for 2.6.10-rc1
  2004-11-09  1:09           ` Linus Torvalds
@ 2004-11-09 14:31             ` Dinakar Guniguntala
  2004-11-09 16:11               ` Linus Torvalds
  0 siblings, 1 reply; 12+ messages in thread
From: Dinakar Guniguntala @ 2004-11-09 14:31 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sripathi Kodi, linux-kernel, Roland McGrath, Ingo Molnar, Andrew Morton

Linus,

All of your patches address the TASK_STOPPED case, whereas the current
problem that Sripathi reported does not have a task in the stopped state.
So the problem still remains.  Let me restate the problem


Thread A (PID 1)                Thread B (PID 1)        PID 2

                                   fork()
                                                        do something

   wait()                          wait()               exit()


   do_wait()                       do_wait()


   switch(p->state)              switch(p->state)
     -> default                    -> default
        (EXIT_ZOMBIE)                 (EXIT_ZOMBIE)


   wait_task_zombie()            wait_task_zombie()


   state = EXIT_DEAD
    (wins race)                     (loses race)
                                    return 0
    reap child

                                 since flag is set,
                              this thread will now hang



Hence the patches Sripathi sent which did fix the problem, but they end up 
busy looping because of schedule. The only clean solution seems to be to 
wake up the siblings if there are any in the case we reap a process from 
wait_task_zombie. What do you think?

Regards,

Dinakar


On Mon, Nov 08, 2004 at 05:09:59PM -0800, Linus Torvalds wrote:
> 
> 
> Ok, I haven't gotten any response on this one, but I'm running it in my 
> tree, and look as I might, it still seems right to me. It should fix not 
> only the problem Sripathi Kodi saw (can we verify that?), but also a more 
> fundamental problem with the access of stale memory.
> 
> I'll commit it. I would still prefer to have somebody else check out my 
> logic (or lack there-of).
> 
> 		Linus
> 
> On Mon, 8 Nov 2004, Linus Torvalds wrote:
> > 
> > Anyway, if I'm right, the suggested fix would be something like this (this 
> > replaces the earlier patches, since it also makes the zero return case go 
> > away - we don't need to mark anything runnable, since we restart the whole 
> > loop).
> > 
> > NOTE! -EAGAIN should be safe, because the other routines involved can only
> > return -EFAULT as an error, so this is all unique to the "try again"  
> > case.
> > 
> > Ok, three patches for the same piece of code withing minutes. Please tell 
> > me this one is not also broken..
> > 
> > 			Linus
> > 
> > ----
> > ===== kernel/exit.c 1.166 vs edited =====
> > --- 1.166/kernel/exit.c	2004-11-04 11:13:19 -08:00
> > +++ edited/kernel/exit.c	2004-11-08 08:34:37 -08:00
> > @@ -1201,8 +1201,15 @@
> >  		write_unlock_irq(&tasklist_lock);
> >  bail_ref:
> >  		put_task_struct(p);
> > -		read_lock(&tasklist_lock);
> > -		return 0;
> > +		/*
> > +		 * We are returning to the wait loop without having successfully
> > +		 * removed the process and having released the lock. We cannot
> > +		 * continue, since the "p" task pointer is potentially stale.
> > +		 *
> > +		 * Return -EAGAIN, and do_wait() will restart the loop from the
> > +		 * beginning. Do _not_ re-acquire the lock.
> > +		 */
> > +		return -EAGAIN;
> >  	}
> >  
> >  	/* move to end of parent's list to avoid starvation */
> > @@ -1343,6 +1350,8 @@
> >  							   (options & WNOWAIT),
> >  							   infop,
> >  							   stat_addr, ru);
> > +				if (retval == -EAGAIN)
> > +					goto repeat;
> >  				if (retval != 0) /* He released the lock.  */
> >  					goto end;
> >  				break;
> > 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] do_wait fix for 2.6.10-rc1
  2004-11-09 14:31             ` Dinakar Guniguntala
@ 2004-11-09 16:11               ` Linus Torvalds
  2004-11-10 14:35                 ` Dinakar Guniguntala
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2004-11-09 16:11 UTC (permalink / raw)
  To: Dinakar Guniguntala
  Cc: Sripathi Kodi, linux-kernel, Roland McGrath, Ingo Molnar, Andrew Morton



On Tue, 9 Nov 2004, Dinakar Guniguntala wrote:
> 
> All of your patches address the TASK_STOPPED case, whereas the current
> problem that Sripathi reported does not have a task in the stopped state.

Sorry, I'm a retard.

> Hence the patches Sripathi sent which did fix the problem, but they end up 
> busy looping because of schedule. The only clean solution seems to be to 
> wake up the siblings if there are any in the case we reap a process from 
> wait_task_zombie. What do you think?

Ok, that was my original second suggestion, but the more I think about it, 
the more unnecessarily complicated it sounds.

I think I have a potentially better approach: make the rules for "flag" a 
bit more explicit, and make it have structure. We use "flag" for two 
things: we use it to determine if we should return -ECHILD (no children), 
and for whether we should wait for something that might become available 
later. So just split up "flag" into these two meanings, instead of just 
trying to use a single bit:

 - one bit that says "we found a child that _will_ wake us up when it's 
   ready". In other words, that's a child that is ours, and is not yet a 
   zombie.

 - one bit that says "we found a child that is ours".

Make "eligible_child()" follow these rules, and then instead of just 
setting "flag = 1", we'd set "flag |= ret".

Now, with these rules, we know just what to do: we only do the wait if the 
"we have children that will wake us up" bit is set. But we return ECHILD 
only if flag is totally clear.

Does that sound like it would fix the problem?

		Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] do_wait fix for 2.6.10-rc1
  2004-11-09 16:11               ` Linus Torvalds
@ 2004-11-10 14:35                 ` Dinakar Guniguntala
  2004-11-10 17:04                   ` Linus Torvalds
  0 siblings, 1 reply; 12+ messages in thread
From: Dinakar Guniguntala @ 2004-11-10 14:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Sripathi Kodi, linux-kernel, Roland McGrath, Ingo Molnar, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1376 bytes --]

On Tue, Nov 09, 2004 at 08:11:41AM -0800, Linus Torvalds wrote:
> 
> I think I have a potentially better approach: make the rules for "flag" a 
> bit more explicit, and make it have structure. We use "flag" for two 
> things: we use it to determine if we should return -ECHILD (no children), 
> and for whether we should wait for something that might become available 
> later. So just split up "flag" into these two meanings, instead of just 
> trying to use a single bit:
> 
>  - one bit that says "we found a child that _will_ wake us up when it's 
>    ready". In other words, that's a child that is ours, and is not yet a 
>    zombie.
> 
>  - one bit that says "we found a child that is ours".
> 
> Make "eligible_child()" follow these rules, and then instead of just 
> setting "flag = 1", we'd set "flag |= ret".
> 
> Now, with these rules, we know just what to do: we only do the wait if the 
> "we have children that will wake us up" bit is set. But we return ECHILD 
> only if flag is totally clear.
> 
> Does that sound like it would fix the problem?
> 

How about if we set the flag only in the cases when the exit state is not
either TASK_DEAD or TASK_ZOMBIE. 
Patch attached below. I confirmed that this fixes the problem and I also ran 
some LTP tests

Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>



[-- Attachment #2: do_wait.patch --]
[-- Type: text/plain, Size: 751 bytes --]

diff -Naurp linux-2.6.10-rc1.orig/kernel/exit.c linux-2.6.10-rc1/kernel/exit.c
--- linux-2.6.10-rc1.orig/kernel/exit.c	2004-10-23 03:10:06.000000000 +0530
+++ linux-2.6.10-rc1/kernel/exit.c	2004-11-10 17:18:20.818103584 +0530
@@ -1325,14 +1325,15 @@ repeat:
 			ret = eligible_child(pid, options, p);
 			if (!ret)
 				continue;
-			flag = 1;
 
 			switch (p->state) {
 			case TASK_TRACED:
+				flag = 1;
 				if (!my_ptrace_child(p))
 					continue;
 				/*FALLTHROUGH*/
 			case TASK_STOPPED:
+				flag = 1;
 				if (!(options & WUNTRACED) &&
 				    !my_ptrace_child(p))
 					continue;
@@ -1365,6 +1366,7 @@ repeat:
 						goto end;
 					break;
 				}
+				flag = 1;
 check_continued:
 				if (!unlikely(options & WCONTINUED))
 					continue;

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] do_wait fix for 2.6.10-rc1
  2004-11-10 14:35                 ` Dinakar Guniguntala
@ 2004-11-10 17:04                   ` Linus Torvalds
  2004-11-11 11:20                     ` Sripathi Kodi
  0 siblings, 1 reply; 12+ messages in thread
From: Linus Torvalds @ 2004-11-10 17:04 UTC (permalink / raw)
  To: Dinakar Guniguntala
  Cc: Sripathi Kodi, linux-kernel, Roland McGrath, Ingo Molnar, Andrew Morton



On Wed, 10 Nov 2004, Dinakar Guniguntala wrote:
> 
> How about if we set the flag only in the cases when the exit state is not
> either TASK_DEAD or TASK_ZOMBIE. 

Why TASK_DEAD? We can't reap such a process anyway, no? But yes, I agree 
with the approach.

		Linus

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] do_wait fix for 2.6.10-rc1
  2004-11-10 17:04                   ` Linus Torvalds
@ 2004-11-11 11:20                     ` Sripathi Kodi
  0 siblings, 0 replies; 12+ messages in thread
From: Sripathi Kodi @ 2004-11-11 11:20 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dinakar Guniguntala, linux-kernel, Roland McGrath, Ingo Molnar,
	Andrew Morton

Linus Torvalds wrote:

>On Wed, 10 Nov 2004, Dinakar Guniguntala wrote:
>  
>
>>How about if we set the flag only in the cases when the exit state is not
>>either TASK_DEAD or TASK_ZOMBIE. 
>>    
>>
>
>Why TASK_DEAD? We can't reap such a process anyway, no? But yes, I agree 
>with the approach.
>
>		Linus
>  
>
Linus,

Yes, when we see a process in TASK_DEAD state, we can't reap it. We 
shouldn't set the flag in this scenario because we have not really found 
an eligible child (one in any state other than TASK_DEAD). So we need to 
go back to parsing the list of children as if we have not found an 
eligible child at all.

Please let us know if this argument is correct and if we have reached 
the final shape for this fix.

Thanks,
Sripathi.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2004-11-11 11:25 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-11-05  9:57 [PATCH] do_wait fix for 2.6.10-rc1 Sripathi Kodi
2004-11-05 10:03 ` Ingo Molnar
     [not found] ` <Pine.LNX.4.58.0411051101500.30457@ppc970.osdl.org>
2004-11-08 14:27   ` Sripathi Kodi
2004-11-08 16:01     ` Linus Torvalds
2004-11-08 16:13       ` Linus Torvalds
2004-11-08 16:37         ` Linus Torvalds
2004-11-09  1:09           ` Linus Torvalds
2004-11-09 14:31             ` Dinakar Guniguntala
2004-11-09 16:11               ` Linus Torvalds
2004-11-10 14:35                 ` Dinakar Guniguntala
2004-11-10 17:04                   ` Linus Torvalds
2004-11-11 11:20                     ` Sripathi Kodi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.