* Re: [parisc-linux] Expect defunct, kill -9 panics kernel?
       [not found] <200702101937.l1AJb7Uo014941@hiauly1.hia.nrc.ca>
@ 2007-02-11  1:50 ` James Bottomley
       [not found]   ` <1171158607.3373.54.camel@mulgrave.il.steeleye.com>
  1 sibling, 0 replies; 6+ messages in thread
From: James Bottomley @ 2007-02-11  1:50 UTC
To: John David Anglin; +Cc: dave.anglin, parisc-linux

On Sat, 2007-02-10 at 14:37 -0500, John David Anglin wrote:
> > 0x10 looks to be curr->func implying curr is NULL and thus the queue
> > task_list is corrupt.
>
> Do you think it would help to add a check in __wake_up for a NULL pointer?

I suppose so ... I'd really like someone to validate my guess though,
although an additional BUG_ON() can't hurt.

James

_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
[parent not found: <1171158607.3373.54.camel@mulgrave.il.steeleye.com>]
[parent not found: <119aab440702110909r2018a297k98b4f1baed54821a@mail.gmail.com>]
* Re: [parisc-linux] Expect defunct, kill -9 panics kernel?
       [not found]   ` <119aab440702110909r2018a297k98b4f1baed54821a@mail.gmail.com>
@ 2007-02-11 17:17     ` John David Anglin
  2007-02-11 19:19     ` James Bottomley
       [not found]     ` <1171221592.3406.32.camel@mulgrave.il.steeleye.com>
  2 siblings, 0 replies; 6+ messages in thread
From: John David Anglin @ 2007-02-11 17:17 UTC
To: Carlos O'Donell; +Cc: James.Bottomley, dave.anglin, parisc-linux

> On 2/10/07, James Bottomley <James.Bottomley@steeleye.com> wrote:
> > On Sat, 2007-02-10 at 14:37 -0500, John David Anglin wrote:
> > > > 0x10 looks to be curr->func implying curr is NULL and thus the queue
> > > > task_list is corrupt.
> > >
> > > Do you think it would help to add a check in __wake_up for a NULL pointer?
> >
> > I suppose so ... I'd really like someone to validate my guess though,
> > although an additional BUG_ON() can't hurt.
>
> How do I validate your guess? Look for a null or bogus curr->func when
> scheduling?

I'm trying the change below.  Hasn't triggered yet.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

diff --git a/kernel/sched.c b/kernel/sched.c
index cca93cc..277e426 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3703,6 +3703,7 @@ void fastcall __wake_up(wait_queue_head_t *q, unsigned int mode,
 {
 	unsigned long flags;
 
+	BUG_ON(!q);
 	spin_lock_irqsave(&q->lock, flags);
 	__wake_up_common(q, mode, nr_exclusive, 0, key);
 	spin_unlock_irqrestore(&q->lock, flags);
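The BUG_ON(!q) above only catches a NULL queue head, while the guess under discussion is a bad entry found while walking the queue. As a rough userspace sketch of what a check inside the walk could look like (this is not the kernel's __wake_up_common(); every type and function name below is a hypothetical stand-in, and the field layout is an assumption):

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's list and wait queue types. */
struct list_node {
	struct list_node *next, *prev;
};

struct wait_entry;
typedef int (*wake_fn)(struct wait_entry *entry, void *key);

struct wait_entry {
	unsigned int flags;
	void *private_data;
	wake_fn func;
	struct list_node task_list;
};

struct wait_head {
	struct list_node task_list;
};

/* Recover the entry that embeds a given list node. */
#define ENTRY_OF(node) \
	((struct wait_entry *)((char *)(node) - offsetof(struct wait_entry, task_list)))

/* Make the head an empty circular list. */
static void wait_head_init(struct wait_head *q)
{
	q->task_list.next = &q->task_list;
	q->task_list.prev = &q->task_list;
}

/* Append an entry at the tail of the queue. */
static void wait_head_add(struct wait_head *q, struct wait_entry *e)
{
	e->task_list.prev = q->task_list.prev;
	e->task_list.next = &q->task_list;
	q->task_list.prev->next = &e->task_list;
	q->task_list.prev = &e->task_list;
}

/* Sample callback that always reports a successful wake-up. */
static int always_wake(struct wait_entry *entry, void *key)
{
	(void)entry;
	(void)key;
	return 1;
}

/*
 * Walk the queue and call each entry's callback.  Returns the number of
 * entries woken, or -1 as soon as a NULL head, a NULL link or a NULL
 * callback is seen (the place where a kernel build might use BUG_ON()).
 * Unlike the kernel, this walk assumes callbacks do not unlink entries.
 */
static int wake_all_checked(struct wait_head *q, void *key)
{
	struct list_node *pos;
	int woken = 0;

	if (q == NULL)
		return -1;

	for (pos = q->task_list.next; pos != &q->task_list; pos = pos->next) {
		struct wait_entry *curr;

		if (pos == NULL)
			return -1;	/* broken link in the list */
		curr = ENTRY_OF(pos);
		if (curr->func == NULL)
			return -1;	/* entry with no wake callback */
		if (curr->func(curr, key))
			woken++;
	}
	return woken;
}
```

Checking at the point of use reports which entry is bad, whereas a check on q alone stays silent when the head is valid and only an entry is damaged.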
* Re: [parisc-linux] Expect defunct, kill -9 panics kernel?
       [not found]   ` <119aab440702110909r2018a297k98b4f1baed54821a@mail.gmail.com>
  2007-02-11 17:17     ` John David Anglin
@ 2007-02-11 19:19     ` James Bottomley
       [not found]     ` <1171221592.3406.32.camel@mulgrave.il.steeleye.com>
  2 siblings, 0 replies; 6+ messages in thread
From: James Bottomley @ 2007-02-11 19:19 UTC
To: Carlos O'Donell; +Cc: John David Anglin, dave.anglin, parisc-linux

On Sun, 2007-02-11 at 12:09 -0500, Carlos O'Donell wrote:
> How do I validate your guess? Look for a null or bogus curr->func when
> scheduling?

Disassemble the piece in vmlinux for __wake_up_common and check that the
instruction that faulted is where the code gets the curr->func.

James
[parent not found: <1171221592.3406.32.camel@mulgrave.il.steeleye.com>]
[parent not found: <119aab440702111221k19b2643em26ac943399274b9f@mail.gmail.com>]
[parent not found: <119aab440702111222v3562f308v9808b4dea7b73d59@mail.gmail.com>]
* Re: [parisc-linux] Expect defunct, kill -9 panics kernel?
       [not found]           ` <119aab440702111222v3562f308v9808b4dea7b73d59@mail.gmail.com>
@ 2007-02-11 20:35             ` James Bottomley
  0 siblings, 0 replies; 6+ messages in thread
From: James Bottomley @ 2007-02-11 20:35 UTC
To: Carlos O'Donell; +Cc: John David Anglin, dave.anglin, parisc-linux

On Sun, 2007-02-11 at 15:22 -0500, Carlos O'Donell wrote:
> On 2/11/07, Carlos O'Donell <carlos@systemhalted.org> wrote:
> > The faulting instruction is:
> >   74:   52 82 00 20     ldd 10(r20),rp
> >
> > Which is just before the curr->func call.
> >   78:   e8 40 f0 00     bve,l (rp),rp
> >   7c:   52 9b 00 30     ldd 18(r20),dp
> >
> > So your assumption was correct. The value of curr->func is null.
> > How did the list get corrupted?
>
> ... to be precise, the faulting instruction is the break at 0x10 that
> we use for null pointer dereferences.

Right, now here's a bit of really useful detective work:

In the same piece of disassembly can you see what happens to %r26 ...
the first argument to __wake_up_common() which is the wait queue?  It
may be clobbered, but if it isn't by the time we fault we know that
0x45f10250 is the address of the wait queue.  If we're incredibly lucky,
it's a symbol in the vmlinux, can you see if it is (and if it's valid)?

Knowing what the wait queue is will tell us (hopefully) with precision
where the fault lies.

James
[parent not found: <1171226106.3406.47.camel@mulgrave.il.steeleye.com>]
* Re: [parisc-linux] Expect defunct, kill -9 panics kernel?
       [not found] <1171226106.3406.47.camel@mulgrave.il.steeleye.com>
@ 2007-02-11 20:59 ` John David Anglin
  0 siblings, 0 replies; 6+ messages in thread
From: John David Anglin @ 2007-02-11 20:59 UTC
To: James Bottomley; +Cc: dave.anglin, parisc-linux

> Right, now here's a bit of really useful detective work:
>
> In the same piece of disassembly can you see what happens to %r26 ...
> the first argument to __wake_up_common() which is the wait queue?  It
> may be clobbered, but if it isn't by the time we fault we know that
> 0x45f10250 is the address of the wait queue.  If we're incredibly lucky,
> it's a symbol in the vmlinux, can you see if it is (and if it's valid)?

In the code I'm looking at, r26 is copied to r7 near the beginning of
__wake_up_common().  r7 is 0 in the register dump.  Of course, Carlos'
kernel may differ.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)
[parent not found: <119aab440702100916q504101b1xe99f65ff5945e712@mail.gmail.com>]
* Re: [parisc-linux] Expect defunct, kill -9 panics kernel?
       [not found] <119aab440702100916q504101b1xe99f65ff5945e712@mail.gmail.com>
@ 2007-02-10 18:35 ` James Bottomley
  0 siblings, 0 replies; 6+ messages in thread
From: James Bottomley @ 2007-02-10 18:35 UTC
To: Carlos O'Donell; +Cc: John David Anglin, parisc-linux

On Sat, 2007-02-10 at 12:16 -0500, Carlos O'Donell wrote:
> At what point in the process life are we in __wake_up and
> __wake_up_common?
> An address of 0x10 is very suspicious.

Almost every internal kernel event or semaphore uses these.  Because of
the empty backtrace, I'd be inclined to say it was the scheduler,
possibly.

0x10 looks to be curr->func implying curr is NULL and thus the queue
task_list is corrupt.

That's the best I can do without the kernel to pull apart.

James
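The reasoning that a fault address of 0x10 points at curr->func can be illustrated with a small offset calculation. The sketch below is an assumption-laden userspace model, not the kernel header: the struct and helper names are hypothetical, and the field order is only assumed to resemble a 2.6-era wait queue entry. On a 64-bit (LP64) build, a 4-byte flags word plus padding and one pointer precede func, so reading func through a NULL entry pointer would touch address 0x10.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's doubly linked list node. */
struct list_head_sketch {
	struct list_head_sketch *next, *prev;
};

typedef int (*wait_func_sketch)(void *entry, unsigned int mode,
				int sync, void *key);

/*
 * Assumed field order for a wait queue entry; the names are
 * illustrative and not taken from this thread.
 */
struct wait_entry_sketch {
	unsigned int flags;		/* offset 0x00, padded to 8 on LP64 */
	void *private_data;		/* offset 0x08 on LP64 */
	wait_func_sketch func;		/* offset 0x10 on LP64 */
	struct list_head_sketch task_list;	/* offset 0x18 on LP64 */
};

/* Address a load of entry->func would touch if entry were NULL. */
static size_t null_entry_func_address(void)
{
	return offsetof(struct wait_entry_sketch, func);
}

/* Address a load of entry->task_list.next would touch if entry were NULL. */
static size_t null_entry_list_address(void)
{
	return offsetof(struct wait_entry_sketch, task_list);
}
```

A matching offset makes the guess plausible but does not prove it, since other structures can have a pointer at the same offset; that is why checking the faulting instruction in the disassembly is still needed.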
end of thread, other threads:[~2007-02-11 20:59 UTC | newest]

Thread overview: 6+ messages
     [not found] <200702101937.l1AJb7Uo014941@hiauly1.hia.nrc.ca>
2007-02-11  1:50 ` [parisc-linux] Expect defunct, kill -9 panics kernel? James Bottomley
     [not found]   ` <1171158607.3373.54.camel@mulgrave.il.steeleye.com>
     [not found]     ` <119aab440702110909r2018a297k98b4f1baed54821a@mail.gmail.com>
2007-02-11 17:17       ` John David Anglin
2007-02-11 19:19       ` James Bottomley
     [not found]       ` <1171221592.3406.32.camel@mulgrave.il.steeleye.com>
     [not found]         ` <119aab440702111221k19b2643em26ac943399274b9f@mail.gmail.com>
     [not found]           ` <119aab440702111222v3562f308v9808b4dea7b73d59@mail.gmail.com>
2007-02-11 20:35             ` James Bottomley
     [not found] <1171226106.3406.47.camel@mulgrave.il.steeleye.com>
2007-02-11 20:59 ` John David Anglin
     [not found] <119aab440702100916q504101b1xe99f65ff5945e712@mail.gmail.com>
2007-02-10 18:35 ` James Bottomley