* Re: [parisc-linux] Expect defunct, kill -9 panics kernel?
       [not found] <200702101937.l1AJb7Uo014941@hiauly1.hia.nrc.ca>
@ 2007-02-11  1:50 ` James Bottomley
       [not found] ` <1171158607.3373.54.camel@mulgrave.il.steeleye.com>
  1 sibling, 0 replies; 6+ messages in thread
From: James Bottomley @ 2007-02-11  1:50 UTC
  To: John David Anglin; +Cc: dave.anglin, parisc-linux

On Sat, 2007-02-10 at 14:37 -0500, John David Anglin wrote:
> > 0x10 looks to be curr->func implying curr is NULL and thus the queue
> > task_list is corrupt.
> 
> Do you think it would help to add a check in __wake_up for a NULL pointer?

I suppose so ... I'd really like someone to validate my guess, though
an additional BUG_ON() can't hurt.

James


_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux

* Re: [parisc-linux] Expect defunct, kill -9 panics kernel?
       [not found]   ` <119aab440702110909r2018a297k98b4f1baed54821a@mail.gmail.com>
@ 2007-02-11 17:17     ` John David Anglin
  2007-02-11 19:19     ` James Bottomley
       [not found]     ` <1171221592.3406.32.camel@mulgrave.il.steeleye.com>
  2 siblings, 0 replies; 6+ messages in thread
From: John David Anglin @ 2007-02-11 17:17 UTC
  To: Carlos O'Donell; +Cc: James.Bottomley, dave.anglin, parisc-linux

> On 2/10/07, James Bottomley <James.Bottomley@steeleye.com> wrote:
> > On Sat, 2007-02-10 at 14:37 -0500, John David Anglin wrote:
> > > > 0x10 looks to be curr->func implying curr is NULL and thus the queue
> > > > task_list is corrupt.
> > >
> > > Do you think it would help to add a check in __wake_up for a NULL pointer?
> >
> > I suppose so ... I'd really like someone to validate my guess, though
> > an additional BUG_ON() can't hurt.
> 
> How do I validate your guess? Look for a null or bogus curr->func when
> scheduling?

I'm trying the change below.  Hasn't triggered yet.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc-cnrc.gc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)

diff --git a/kernel/sched.c b/kernel/sched.c
index cca93cc..277e426 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -3703,6 +3703,7 @@ void fastcall __wake_up(wait_queue_head_t *q, unsigned int mode,
 {
 	unsigned long flags;
 
+	BUG_ON(!q);
 	spin_lock_irqsave(&q->lock, flags);
 	__wake_up_common(q, mode, nr_exclusive, 0, key);
 	spin_unlock_irqrestore(&q->lock, flags);
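For readers following the thread, here is a userspace sketch of the kind of walk __wake_up_common() performs over q->task_list, with BUG_ON()-style checks placed where a corrupt queue would otherwise fault. It is a hedged model, not the kernel source: the names wait_head, wait_entry, wake_up_common_checked and count_wake are hypothetical, the list is a minimal re-implementation of the kernel's struct list_head idea, and assert() stands in for BUG_ON().

```c
#include <assert.h>
#include <stddef.h>

/* Minimal intrusive circular doubly linked list (userspace model of
 * the kernel's struct list_head). */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct wait_entry; /* hypothetical stand-in for wait_queue_t */
typedef int (*wake_fn)(struct wait_entry *entry, void *key);

struct wait_entry {
	unsigned int flags;
	void *private_data;
	wake_fn func;
	struct list_head task_list;
};

struct wait_head { /* hypothetical stand-in for wait_queue_head_t */
	struct list_head task_list;
};

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Call each entry's wake function; assert() marks where a NULL queue,
 * a NULL list link or a NULL wake function would fault instead.
 * Returns the number of entries whose wake function returned nonzero. */
static int wake_up_common_checked(struct wait_head *q, void *key)
{
	struct list_head *tmp, *next;
	int woken = 0;

	assert(q != NULL);                 /* the check in the patch above */
	for (tmp = q->task_list.next; tmp != &q->task_list; tmp = next) {
		struct wait_entry *curr;

		assert(tmp != NULL);           /* corrupt link in task_list */
		next = tmp->next;              /* read next before the callback */
		curr = container_of(tmp, struct wait_entry, task_list);
		assert(curr->func != NULL);    /* the indirect call that faulted */
		if (curr->func(curr, key))
			woken++;
	}
	return woken;
}

/* Example wake function: counts how many times it was called. */
static int count_wake(struct wait_entry *entry, void *key)
{
	(void)entry;
	++*(int *)key;
	return 1;
}
```

The patch above only checks q itself; the extra assertions show where a corrupt task_list link or a NULL wake function would be caught before the call rather than inside it.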

* Re: [parisc-linux] Expect defunct, kill -9 panics kernel?
       [not found]   ` <119aab440702110909r2018a297k98b4f1baed54821a@mail.gmail.com>
  2007-02-11 17:17     ` John David Anglin
@ 2007-02-11 19:19     ` James Bottomley
       [not found]     ` <1171221592.3406.32.camel@mulgrave.il.steeleye.com>
  2 siblings, 0 replies; 6+ messages in thread
From: James Bottomley @ 2007-02-11 19:19 UTC
  To: Carlos O'Donell; +Cc: John David Anglin, dave.anglin, parisc-linux

On Sun, 2007-02-11 at 12:09 -0500, Carlos O'Donell wrote:
> How do I validate your guess? Look for a null or bogus curr->func when
> scheduling?

Disassemble __wake_up_common in vmlinux and check that the faulting
instruction is the one where the code loads curr->func.

James

* Re: [parisc-linux] Expect defunct, kill -9 panics kernel?
       [not found]         ` <119aab440702111222v3562f308v9808b4dea7b73d59@mail.gmail.com>
@ 2007-02-11 20:35           ` James Bottomley
  0 siblings, 0 replies; 6+ messages in thread
From: James Bottomley @ 2007-02-11 20:35 UTC
  To: Carlos O'Donell; +Cc: John David Anglin, dave.anglin, parisc-linux

On Sun, 2007-02-11 at 15:22 -0500, Carlos O'Donell wrote:
> On 2/11/07, Carlos O'Donell <carlos@systemhalted.org> wrote:
> > The faulting instruction is:
> >   74:   52 82 00 20     ldd 10(r20),rp
> >
> > Which is just before the curr->func call.
> >   78:   e8 40 f0 00     bve,l (rp),rp
> >   7c:   52 9b 00 30     ldd 18(r20),dp
> >
> > So your assumption was correct. The value of curr->func is null.
> > How did the list get corrupted?
> 
> ... to be precise, the faulting instruction is the break at 0x10 that
> we use for null pointer dereferences.
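To see why the fault address is 0x10 even though the faulting load is "ldd 10(r20),rp", recall that on 64-bit PA-RISC an indirect call goes through a function descriptor: r20 holds the function pointer, the entry address is loaded from offset 0x10 and the global pointer (dp) from offset 0x18 in the branch delay slot. The sketch below assumes the descriptor layout matches the parisc kernel's Elf64_Fdesc (two unused words, then addr and gp); pa64_fdesc and fdesc_entry_load_address are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of a 64-bit PA-RISC function descriptor: two unused
 * words, then the entry point and the global pointer. */
struct pa64_fdesc {
	uint64_t dummy[2];
	uint64_t addr; /* loaded by "ldd 10(r20),rp", then "bve,l (rp),rp" */
	uint64_t gp;   /* loaded by "ldd 18(r20),dp" in the delay slot */
};

/* Address touched by the entry-point load when r20 holds the
 * descriptor base; with a NULL function pointer this is 0 + 0x10. */
static uint64_t fdesc_entry_load_address(uint64_t descriptor_base)
{
	return descriptor_base + offsetof(struct pa64_fdesc, addr);
}
```

Under that assumption, a NULL curr->func held in r20 faults at 0 + 0x10 = 0x10, consistent with the reading of the dump quoted above.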

Right, now here's a bit of really useful detective work:

In the same piece of disassembly can you see what happens to %r26 ...
the first argument to __wake_up_common() which is the wait queue?  It
may be clobbered, but if it isn't by the time we fault we know that
0x45f10250 is the address of the wait queue.  If we're incredibly lucky,
it's a symbol in the vmlinux, can you see if it is (and if it's valid)?

Knowing what the wait queue is will tell us (hopefully) with precision
where the fault lies.

James

* Re: [parisc-linux] Expect defunct, kill -9 panics kernel?
       [not found] <1171226106.3406.47.camel@mulgrave.il.steeleye.com>
@ 2007-02-11 20:59 ` John David Anglin
  0 siblings, 0 replies; 6+ messages in thread
From: John David Anglin @ 2007-02-11 20:59 UTC
  To: James Bottomley; +Cc: dave.anglin, parisc-linux

> Right, now here's a bit of really useful detective work:
> 
> In the same piece of disassembly can you see what happens to %r26 ...
> the first argument to __wake_up_common() which is the wait queue?  It
> may be clobbered, but if it isn't by the time we fault we know that
> 0x45f10250 is the address of the wait queue.  If we're incredibly lucky,
> it's a symbol in the vmlinux, can you see if it is (and if it's valid)?

In the code I'm looking at, r26 is copied to r7 near the beginning of
__wake_up_common().  r7 is 0 in the register dump.  Of course, Carlos'
kernel may differ.

Dave

* Re: [parisc-linux] Expect defunct, kill -9 panics kernel?
       [not found] <119aab440702100916q504101b1xe99f65ff5945e712@mail.gmail.com>
@ 2007-02-10 18:35 ` James Bottomley
  0 siblings, 0 replies; 6+ messages in thread
From: James Bottomley @ 2007-02-10 18:35 UTC
  To: Carlos O'Donell; +Cc: John David Anglin, parisc-linux

On Sat, 2007-02-10 at 12:16 -0500, Carlos O'Donell wrote:
> At what point in the process life are we in __wake_up and
> __wake_up_common?
> An address of 0x10 is very suspicious.

Almost every internal kernel event or semaphore uses these.

Given the empty backtrace, I'd tentatively say it was the
scheduler.

0x10 looks to be curr->func implying curr is NULL and thus the queue
task_list is corrupt.
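The inference rests on where func sits inside a wait queue entry. A sketch, assuming the 2.6.20-era field order of wait_queue_t (flags, private, func, task_list) and a 64-bit kernel: flags plus padding and the private pointer fill the first 16 bytes, so reading curr->func through a NULL curr touches address 0x10. The struct and function names are hypothetical mirrors, not the kernel header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the wait_queue_t field order. */
struct list_head { struct list_head *next, *prev; };
struct wait_queue_sketch;
typedef int (*wait_queue_func_sketch)(struct wait_queue_sketch *wait,
                                      unsigned mode, int sync, void *key);

struct wait_queue_sketch {
	unsigned int flags;          /* 4 bytes, padded to pointer alignment */
	void *private_data;          /* "private" in the kernel */
	wait_queue_func_sketch func; /* the field read as curr->func */
	struct list_head task_list;
};

/* Address read by curr->func when curr == NULL. */
static uintptr_t null_curr_func_address(void)
{
	return (uintptr_t)offsetof(struct wait_queue_sketch, func);
}
```

On common ILP32 and LP64 ABIs the offset works out to two pointer widths, which is 0x10 on a 64-bit kernel.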

That's the best I can do without the kernel to pull apart.

James

Thread overview: 6+ messages
     [not found] <200702101937.l1AJb7Uo014941@hiauly1.hia.nrc.ca>
2007-02-11  1:50 ` [parisc-linux] Expect defunct, kill -9 panics kernel? James Bottomley
     [not found] ` <1171158607.3373.54.camel@mulgrave.il.steeleye.com>
     [not found]   ` <119aab440702110909r2018a297k98b4f1baed54821a@mail.gmail.com>
2007-02-11 17:17     ` John David Anglin
2007-02-11 19:19     ` James Bottomley
     [not found]     ` <1171221592.3406.32.camel@mulgrave.il.steeleye.com>
     [not found]       ` <119aab440702111221k19b2643em26ac943399274b9f@mail.gmail.com>
     [not found]         ` <119aab440702111222v3562f308v9808b4dea7b73d59@mail.gmail.com>
2007-02-11 20:35           ` James Bottomley
     [not found] <1171226106.3406.47.camel@mulgrave.il.steeleye.com>
2007-02-11 20:59 ` John David Anglin
     [not found] <119aab440702100916q504101b1xe99f65ff5945e712@mail.gmail.com>
2007-02-10 18:35 ` James Bottomley