linux-kernel.vger.kernel.org archive mirror
* BUG: wait_task_zombie NULL dereference
@ 2012-12-04 13:48 Bill Huey (hui)
  2012-12-04 14:03 ` Bill Huey (hui)
  0 siblings, 1 reply; 3+ messages in thread
From: Bill Huey (hui) @ 2012-12-04 13:48 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: ebiederm

I'm hitting this under a heavy scheduler test load with SCHED_RR tasks
exiting normally after completion and the parent exiting with some of
the pthreads still running:

(gdb) bt
#0  no_context (regs=0xffff880018c55d58, error_code=0, address=4,
signal=signal@entry=11,
    si_code=si_code@entry=196609) at arch/x86/mm/fault.c:630
#1  0xffffffff816a02fe in __bad_area_nosemaphore
(regs=regs@entry=0xffff880018c55d58,
    error_code=error_code@entry=0, address=address@entry=4,
si_code=si_code@entry=196609)
    at arch/x86/mm/fault.c:767
#2  0xffffffff816a0565 in __bad_area (si_code=196609, address=4,
error_code=0, regs=0xffff880018c55d58)
    at arch/x86/mm/fault.c:789
#3  bad_area (regs=regs@entry=0xffff880018c55d58,
error_code=error_code@entry=0, address=address@entry=4)
    at arch/x86/mm/fault.c:795
#4  0xffffffff816b381c in do_page_fault
(regs=regs@entry=0xffff880018c55d58, error_code=error_code@entry=0)
    at arch/x86/mm/fault.c:1159
#5  0xffffffff816b2ff5 in do_async_page_fault
(regs=0xffff880018c55d58, error_code=0) at arch/x86/kernel/kvm.c:246
#6  <signal handler called>
#7  wait_task_zombie (p=0xffff88003a034500, wo=0xffff880018c55f00) at
kernel/exit.c:1224
#8  wait_consider_task (p=0xffff88003a034500, ptrace=0,
wo=0xffff880018c55f00) at kernel/exit.c:1591
#9  wait_consider_task (wo=0xffff880018c55f00, ptrace=0,
p=0xffff88003a034500) at kernel/exit.c:1544
#10 0xffffffff8105a910 in do_wait_thread (tsk=0xffff88002f510000,
wo=0xffff880018c55f00) at kernel/exit.c:1666
#11 do_wait (wo=wo@entry=0xffff880018c55f00) at kernel/exit.c:1735
#12 0xffffffff8105bd45 in sys_wait4 (upid=<optimized out>,
stat_addr=0x7fff40f4168c, options=<optimized out>,
    ru=0x0 <irq_stack_union>) at kernel/exit.c:1865
#13 <signal handler called>
#14 0x00007f58c4d7f4ea in ?? ()
#15 0xffff88000000001b in ?? ()
#16 0xdead4ead001e001e in ?? ()
#17 0x00000000ffffffff in ?? ()
#18 0xffffffffffffffff in ?? ()
#19 0xffffffff8280e5e8 in __key.30461 ()
#20 0xffffffff8205f850 in lock_classes ()
#21 0x0000000000000000 in ?? ()



(gdb) down
#7  wait_task_zombie (p=0xffff88003a034500, wo=0xffff880018c55f00) at
kernel/exit.c:1224
1224            kuid_t two= task_uid(p);


[   23.324284] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
[   23.324284] IP: [<ffffffff8105a1a0>] wait_consider_task+0x5b0/0xc20
[   23.324284] PGD 2fa48067 PUD 39ff4067 PMD 0
[   23.324284] Oops: 0000 [#1] SMP

......

It looks like it crashes at that point with a NULL pointer dereference.
I expanded the arguments to from_kuid_munged() out into separate
statements so that gdb can point at a specific line.
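
A note on the fault address: one reading consistent with the trace (an
assumption, not something confirmed in this thread) is that task_uid(p)
read the uid field through a NULL credentials pointer, since a field at
byte offset 4 of a NULL base faults at address 4. The sketch below uses a
hypothetical mock_cred struct, not the kernel's struct cred, whose real
layout depends on the kernel version and config options.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the first few fields of a credentials
 * structure. This is NOT the kernel's struct cred; it only assumes a
 * 4-byte reference count followed directly by the uid.
 */
struct mock_cred {
    int32_t  usage;  /* offset 0 */
    uint32_t uid;    /* offset 4 */
    uint32_t gid;    /* offset 8 */
};

/* Compile-time check of the layout assumed above. */
static_assert(offsetof(struct mock_cred, uid) == 4,
              "uid is assumed to sit 4 bytes into the struct");

/* Address the CPU would touch when reading ->uid through `base`. */
static uintptr_t uid_access_address(uintptr_t base)
{
    return base + offsetof(struct mock_cred, uid);
}

/*
 * Returns 1 if a faulting address matches a read of ->uid through a
 * NULL base pointer, 0 otherwise.
 */
static int looks_like_null_cred_uid_read(uintptr_t fault_addr)
{
    return fault_addr == uid_access_address(0);
}
```

With this assumed layout, looks_like_null_cred_uid_read(4) is true, which
matches the address=4 reported in the backtrace. A different field order
or extra debug fields would shift the offset, so this is only a hint
about where to look.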

bill


* Re: BUG: wait_task_zombie NULL dereference
  2012-12-04 13:48 BUG: wait_task_zombie NULL dereference Bill Huey (hui)
@ 2012-12-04 14:03 ` Bill Huey (hui)
  2012-12-04 19:20   ` Eric W. Biederman
  0 siblings, 1 reply; 3+ messages in thread
From: Bill Huey (hui) @ 2012-12-04 14:03 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: ebiederm

I should add that I encountered this on 3.6.0 with some mild
modifications to the scheduler path that enqueue/dequeue a task before
any of the schedule exit logic gets hit. The SCHED_FF/FIFO rebalancer
does much the same so I can't imagine that being the source of the
problem.

I could be wrong however.

bill

On Tue, Dec 4, 2012 at 5:48 AM, Bill Huey (hui) <bill.huey@gmail.com> wrote:
> I'm hitting this under a heavy scheduler test load with SCHED_RR tasks
> exiting normally after completion and the parent exiting with some of
> the pthreads still running:


* Re: BUG: wait_task_zombie NULL dereference
  2012-12-04 14:03 ` Bill Huey (hui)
@ 2012-12-04 19:20   ` Eric W. Biederman
  0 siblings, 0 replies; 3+ messages in thread
From: Eric W. Biederman @ 2012-12-04 19:20 UTC (permalink / raw)
  To: Bill Huey (hui); +Cc: Linux Kernel Mailing List

"Bill Huey (hui)" <bill.huey@gmail.com> writes:

> I should add that I encountered this on 3.6.0 with some mild
> modifications to the scheduler path that enqueue/dequeue a task before
> any of the schedule exit logic gets hit. The SCHED_FF/FIFO rebalancer
> does much the same so I can't imagine that being the source of the
> problem.
>
> I could be wrong however.

In 3.6 from_kuid_munged should only be expanded to the inline no-op
version.
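
For reference, the inline variant mentioned above has roughly the shape
sketched here. This is a user-space approximation written from memory of
include/linux/uidgid.h with user namespaces disabled; every mock_* name
and the MOCK_OVERFLOWUID value are assumptions for illustration, and the
real definitions may differ.

```c
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel types. */
typedef unsigned int mock_uid_t;
typedef mock_uid_t mock_kuid_t;   /* assumes kuid_t is a plain integer */
struct mock_user_namespace;       /* opaque; never dereferenced below */

/* Assumed default overflow uid ("nobody"); a tunable in a real kernel. */
#define MOCK_OVERFLOWUID 65534u

/* Without namespace mapping the conversion is the identity. */
static inline mock_uid_t mock_from_kuid(struct mock_user_namespace *to,
                                        mock_kuid_t kuid)
{
    (void)to;  /* unused in the no-namespace variant */
    return kuid;
}

/* Same as above, but an invalid uid is replaced by the overflow uid. */
static inline mock_uid_t mock_from_kuid_munged(struct mock_user_namespace *to,
                                               mock_kuid_t kuid)
{
    mock_uid_t uid = mock_from_kuid(to, kuid);

    if (uid == (mock_uid_t)-1)
        uid = MOCK_OVERFLOWUID;
    return uid;
}
```

Nothing in this variant dereferences a pointer, so under these
assumptions the conversion itself could not be the source of a NULL
dereference; the fault would have to come from evaluating its arguments.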

The code you quote does not exist in kernel/exit.c in wait_task_zombie
and has not existed in wait_task_zombie in Linus's tree.  So since I
can't see the code I can't help.

I suspect the bug relates to your local modifications.

Eric

