* [parisc-linux] Re: Expect defunct, kill -9 panics kernel?
From: John David Anglin @ 2007-02-10 18:10 UTC
To: Carlos O'Donell; +Cc: dave.anglin, parisc-linux

> Is this the usual behaviour you see?
>
> 1. I run the gcc testsuite.
> 2. expect dies, leaving a defunct process.
> 3. Killing another expect panics the kernel.

It's similar to the behavior that I see. I don't usually see this with
expect, though; possibly this is because I use my own build of expect
linked with tcl8.3.

I see this behavior quite consistently on my c3750 if I:

1. Run the gcc libjava testsuite.
2. Usually, a set of processes (e.g., Process_3) is left running after
   the testsuite ends. These processes are not defunct and load the
   processor. I can kill all but the oldest thread.
3. Killing the oldest thread panics the kernel.

Sometimes the system reboots. Often, however, it hangs doing endless
panics.

I suspect a timing issue, as the c3750 is the fastest processor that I
test on. I don't see as many problems with the libjava testsuite on
slower hardware. At one time, I thought this might be a 32- versus
64-bit issue, but I see the same problems running a 64-bit kernel.

Dave
--
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
* Re: [parisc-linux] Expect defunct, kill -9 panics kernel?
From: James Bottomley @ 2007-02-10 18:35 UTC
To: Carlos O'Donell; +Cc: John David Anglin, parisc-linux

On Sat, 2007-02-10 at 12:16 -0500, Carlos O'Donell wrote:
> At what point in the process life are we in __wake_up and
> __wake_up_common?
>
> An address of 0x10 is very suspicious.

Almost every internal kernel event or semaphore uses these. Because of
the empty backtrace, I'd be inclined to say it was the scheduler,
possibly.

0x10 looks to be curr->func implying curr is NULL and thus the queue
task_list is corrupt.

That's the best I can do without the kernel to pull apart.

James
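To see why a fault at address 0x10 points at curr->func, here is a small userspace sketch that mirrors the field order of the 2.6-era struct __wait_queue (flags, private, func, task_list). It is not the kernel header: the demo_* names are hypothetical, and the offsets assume an ordinary ABI where unsigned int is 4 bytes and pointers are naturally aligned. Under those assumptions, reading curr->func through a NULL curr loads from the address equal to the offset of func.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical userspace stand-ins; NOT the kernel's definitions. */
struct demo_list_head {
	struct demo_list_head *next, *prev;
};

/* Callback type shaped like the kernel's wait_queue_func_t (assumption). */
typedef int (*demo_wait_func_t)(void *entry, unsigned mode, int sync, void *key);

/* Field order copied from the 2.6-era struct __wait_queue. */
struct demo_wait_queue {
	unsigned int flags;
	void *private_data;             /* named "private" in the kernel */
	demo_wait_func_t func;
	struct demo_list_head task_list;
};

/* Address a load of curr->func would touch if curr were NULL. */
static uintptr_t fault_addr_for_null_entry(void)
{
	return (uintptr_t)offsetof(struct demo_wait_queue, func);
}

/* Print each field offset and sanity-check where func lands. */
static void print_layout(void)
{
	printf("flags     at 0x%zx\n", offsetof(struct demo_wait_queue, flags));
	printf("private   at 0x%zx\n", offsetof(struct demo_wait_queue, private_data));
	printf("func      at 0x%zx\n", offsetof(struct demo_wait_queue, func));
	printf("task_list at 0x%zx\n", offsetof(struct demo_wait_queue, task_list));
	/* flags is padded out to one pointer slot, private fills the next. */
	assert(offsetof(struct demo_wait_queue, func) == 2 * sizeof(void *));
}
```

On an LP64 host, print_layout() should report func at 0x10, matching the faulting address; a 32-bit build would put it at 0x8 instead, so the arithmetic only lines up for a 64-bit kernel like the one whose disassembly appears later in the thread.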
* Re: [parisc-linux] Expect defunct, kill -9 panics kernel?
From: James Bottomley @ 2007-02-11 1:50 UTC
To: John David Anglin; +Cc: dave.anglin, parisc-linux

On Sat, 2007-02-10 at 14:37 -0500, John David Anglin wrote:
> > 0x10 looks to be curr->func implying curr is NULL and thus the queue
> > task_list is corrupt.
>
> Do you think it would help to add a check in __wake_up for a NULL pointer?

I suppose so ... I'd really like someone to validate my guess though,
although an additional BUG_ON() can't hurt.

James
* Re: [parisc-linux] Expect defunct, kill -9 panics kernel?
From: John David Anglin @ 2007-02-11 17:17 UTC
To: Carlos O'Donell; +Cc: James.Bottomley, dave.anglin, parisc-linux

> On 2/10/07, James Bottomley <James.Bottomley@steeleye.com> wrote:
> > On Sat, 2007-02-10 at 14:37 -0500, John David Anglin wrote:
> > > > 0x10 looks to be curr->func implying curr is NULL and thus the queue
> > > > task_list is corrupt.
> > >
> > > Do you think it would help to add a check in __wake_up for a NULL pointer?
> >
> > I suppose so ... I'd really like someone to validate my guess though,
> > although an additional BUG_ON() can't hurt.
>
> How do I validate your guess? Look for a null or bogus curr->func when
> scheduling?

I'm trying the change below. It hasn't triggered yet.

Dave

diff --git a/kernel/sched.c b/kernel/sched.c
index cca93cc..277e426 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3703,6 +3703,7 @@ void fastcall __wake_up(wait_queue_head_t *q, unsigned int mode,
 {
 	unsigned long flags;
 
+	BUG_ON(!q);
 	spin_lock_irqsave(&q->lock, flags);
 	__wake_up_common(q, mode, nr_exclusive, 0, key);
 	spin_unlock_irqrestore(&q->lock, flags);
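The BUG_ON(!q) in the patch above only fires when the wait-queue head itself is NULL; it cannot catch the failure James described, where an entry reached through q->task_list is bogus. The userspace model below walks a circular list roughly the way __wake_up_common() does (saving the next pointer before calling each entry's callback) and reports which failure mode it hits. It is a sketch, not kernel code: every demo_* name and check_wake_list() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a wait queue; NOT kernel code. */
struct demo_node {
	struct demo_node *next, *prev;
};

struct demo_entry {
	int (*func)(struct demo_entry *self);   /* wake-up callback */
	struct demo_node link;                  /* plays the role of task_list */
};

struct demo_head {
	struct demo_node task_list;             /* circular list anchor */
};

enum wake_check {
	WAKE_OK = 0,
	WAKE_NULL_HEAD,   /* the only case BUG_ON(!q) catches */
	WAKE_NULL_LINK,   /* a next pointer in the ring is NULL */
	WAKE_NULL_FUNC    /* entry reached, but its callback is NULL */
};

static void demo_head_init(struct demo_head *q)
{
	q->task_list.next = q->task_list.prev = &q->task_list;
}

/* Append e at the tail of q's ring. */
static void demo_add_tail(struct demo_head *q, struct demo_entry *e)
{
	assert(q != NULL && e != NULL);
	struct demo_node *tail = q->task_list.prev;
	e->link.prev = tail;
	e->link.next = &q->task_list;
	tail->next = &e->link;
	q->task_list.prev = &e->link;
}

/* Recover the enclosing entry from its embedded node (like container_of). */
static struct demo_entry *entry_of(struct demo_node *n)
{
	return (struct demo_entry *)((char *)n - offsetof(struct demo_entry, link));
}

/* Walk the ring, validating each step before dereferencing it. */
static enum wake_check check_wake_list(struct demo_head *q, int *woken)
{
	*woken = 0;
	if (q == NULL)
		return WAKE_NULL_HEAD;
	struct demo_node *n = q->task_list.next;
	while (n != &q->task_list) {
		if (n == NULL)
			return WAKE_NULL_LINK;
		struct demo_node *next = n->next;  /* saved first, as the _safe walk does */
		struct demo_entry *curr = entry_of(n);
		if (curr->func == NULL)
			return WAKE_NULL_FUNC;
		curr->func(curr);
		(*woken)++;
		n = next;
	}
	return WAKE_OK;
}

/* Trivial callback standing in for a real wake function. */
static int demo_wake(struct demo_entry *self)
{
	(void)self;
	return 1;
}
```

In the kernel, the analogous diagnostic would presumably put BUG_ON() checks on curr and curr->func inside the list walk in __wake_up_common(), rather than on q in __wake_up().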
* Re: [parisc-linux] Expect defunct, kill -9 panics kernel?
From: James Bottomley @ 2007-02-11 19:19 UTC
To: Carlos O'Donell; +Cc: John David Anglin, dave.anglin, parisc-linux

On Sun, 2007-02-11 at 12:09 -0500, Carlos O'Donell wrote:
> How do I validate your guess? Look for a null or bogus curr->func when
> scheduling?

Disassemble the piece in vmlinux for __wake_up_common and check that the
instruction that faulted is where the code gets the curr->func.

James
* Re: [parisc-linux] Expect defunct, kill -9 panics kernel?
From: James Bottomley @ 2007-02-11 20:35 UTC
To: Carlos O'Donell; +Cc: John David Anglin, dave.anglin, parisc-linux

On Sun, 2007-02-11 at 15:22 -0500, Carlos O'Donell wrote:
> On 2/11/07, Carlos O'Donell <carlos@systemhalted.org> wrote:
> > The faulting instruction is:
> >   74:   52 82 00 20     ldd 10(r20),rp
> >
> > Which is just before the curr->func call.
> >   78:   e8 40 f0 00     bve,l (rp),rp
> >   7c:   52 9b 00 30     ldd 18(r20),dp
> >
> > So your assumption was correct. The value of curr->func is null.
> > How did the list get corrupted?
>
> ... to be precise, the faulting instruction is the break at 0x10 that
> we use for null pointer dereferences.

Right, now here's a bit of really useful detective work:

In the same piece of disassembly can you see what happens to %r26 ...
the first argument to __wake_up_common() which is the wait queue? It
may be clobbered, but if it isn't by the time we fault we know that
0x45f10250 is the address of the wait queue. If we're incredibly lucky,
it's a symbol in the vmlinux, can you see if it is (and if it's valid)?

Knowing what the wait queue is will tell us (hopefully) with precision
where the fault lies.

James
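Checking whether 0x45f10250 is a symbol in vmlinux amounts to finding the last symbol whose start address is at or below the target in an address-sorted listing (such as System.map or the output of nm -n vmlinux), then checking whether the target lies within that symbol's size. A minimal sketch of that lookup follows. The table entries are made-up placeholders chosen only to exercise the search; they are not symbols from Carlos's kernel, and struct demo_sym, nearest_sym() and sym_contains() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_sym {
	uint64_t addr;      /* start address */
	uint64_t size;      /* 0 if unknown */
	const char *name;
};

/* Placeholder table, sorted by address; NOT real vmlinux symbols. */
static const struct demo_sym demo_table[] = {
	{ 0x45f10000, 0x200, "placeholder_sym_a" },
	{ 0x45f10200, 0x100, "placeholder_sym_b" },
	{ 0x45f10400, 0x080, "placeholder_sym_c" },
};

/* Binary search for the last symbol with addr <= target; NULL if none. */
static const struct demo_sym *nearest_sym(const struct demo_sym *tab, size_t n,
					  uint64_t target)
{
	size_t lo = 0, hi = n;
	while (lo < hi) {               /* find the first addr > target */
		size_t mid = lo + (hi - lo) / 2;
		if (tab[mid].addr <= target)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo == 0 ? NULL : &tab[lo - 1];
}

/* True only if target lies within [addr, addr + size) of symbol s. */
static int sym_contains(const struct demo_sym *s, uint64_t target)
{
	return s != NULL && s->size != 0 && target - s->addr < s->size;
}
```

The size check matters: an address past the end of the nearest symbol (for example, inside a kmalloc'd object) will still have a "nearest" symbol, but that symbol does not describe it.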
* Re: [parisc-linux] Expect defunct, kill -9 panics kernel?
From: John David Anglin @ 2007-02-11 20:59 UTC
To: James Bottomley; +Cc: dave.anglin, parisc-linux

> Right, now here's a bit of really useful detective work:
>
> In the same piece of disassembly can you see what happens to %r26 ...
> the first argument to __wake_up_common() which is the wait queue? It
> may be clobbered, but if it isn't by the time we fault we know that
> 0x45f10250 is the address of the wait queue. If we're incredibly lucky,
> it's a symbol in the vmlinux, can you see if it is (and if it's valid)?

In the code I'm looking at, r26 is copied to r7 near the beginning of
__wake_up_common(). r7 is 0 in the register dump. Of course, Carlos'
kernel may differ.

Dave