All of lore.kernel.org
 help / color / mirror / Atom feed
* INFO: task snmpd:398 blocked for more than 120 seconds.
@ 2010-11-08 15:05 Joakim Tjernlund
  2010-11-08 15:19 ` David Laight
  2010-11-08 17:33 ` Joakim Tjernlund
  0 siblings, 2 replies; 7+ messages in thread
From: Joakim Tjernlund @ 2010-11-08 15:05 UTC (permalink / raw)
  To: linuxppc-dev


I can't make out what is causing this hang every now an then:

INFO: task snmpd:398 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
snmpd         D 0fad1760     0   398      1 0x00000000
Call Trace:
[c7b5fc70] [c7a49a80] 0xc7a49a80 (unreliable)
[c7b5fd30] [c0008608] __switch_to+0x4c/0x6c
[c7b5fd40] [c028fed4] schedule+0x184/0x310
[c7b5fd70] [c0290e5c] __mutex_lock_slowpath+0xa4/0x150
[c7b5fdb0] [c0290c50] mutex_lock+0x44/0x48
[c7b5fdc0] [c01eb5b0] rtnl_lock+0x18/0x28
[c7b5fdd0] [c01df6c4] dev_ioctl+0x2b4/0x764
[c7b5fe60] [c01cafc8] sock_ioctl+0x18c/0x2a0
[c7b5fe80] [c009a16c] vfs_ioctl+0xcc/0xe0
[c7b5fea0] [c009a998] do_vfs_ioctl+0xc4/0x79c
[c7b5ff10] [c009b0b0] sys_ioctl+0x40/0x74
[c7b5ff40] [c00117c4] ret_from_syscall+0x0/0x38

The dev_ioctl is in SIOCGIFCONF:
int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
{
	struct ifreq ifr;
	int ret;
	char *colon;

	/* One special case: SIOCGIFCONF takes ifconf argument
	   and requires shared lock, because it sleeps writing
	   to user space.
	 */

	if (cmd == SIOCGIFCONF) {
		rtnl_lock();
		ret = dev_ifconf(net, (char __user *) arg);
		rtnl_unlock();
		return ret;
	}

Any ideas? Kernel is 2.6.35

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: INFO: task snmpd:398 blocked for more than 120 seconds.
  2010-11-08 15:05 INFO: task snmpd:398 blocked for more than 120 seconds Joakim Tjernlund
@ 2010-11-08 15:19 ` David Laight
  2010-11-08 15:51   ` Joakim Tjernlund
  2010-11-08 17:33 ` Joakim Tjernlund
  1 sibling, 1 reply; 7+ messages in thread
From: David Laight @ 2010-11-08 15:19 UTC (permalink / raw)
  To: linuxppc-dev

=20
> I can't make out what is causing this hang every now an then:
>=20
> INFO: task snmpd:398 blocked for more than 120 seconds.

My problem with that 'error' message is that there is no way
for a driver to disable it on a per-process basis.
We have some processes whose 'normal state' is to sleep
uninterruptibly in the kernel. Shutdown is handled by
an explicit request (not by sending a signal).
The processes could be kernel worker threads (except that
is is ~impossible to handle them exiting from a loadble
kernel module) so are actually children of a daemon sat
inside an ioctl() request that never terminates!

However, on the face of it, your case does look as though
the mutex is fubar'ed.

Might be worth (somehow) dumping the mutex state.

	David

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: INFO: task snmpd:398 blocked for more than 120 seconds.
  2010-11-08 15:19 ` David Laight
@ 2010-11-08 15:51   ` Joakim Tjernlund
  2010-11-09  9:47     ` Norbert van Bolhuis
  0 siblings, 1 reply; 7+ messages in thread
From: Joakim Tjernlund @ 2010-11-08 15:51 UTC (permalink / raw)
  To: David Laight; +Cc: linuxppc-dev

>
> > I can't make out what is causing this hang every now an then:
> >
> > INFO: task snmpd:398 blocked for more than 120 seconds.
>
> My problem with that 'error' message is that there is no way
> for a driver to disable it on a per-process basis.
> We have some processes whose 'normal state' is to sleep
> uninterruptibly in the kernel. Shutdown is handled by
> an explicit request (not by sending a signal).
> The processes could be kernel worker threads (except that
> is is ~impossible to handle them exiting from a loadble
> kernel module) so are actually children of a daemon sat
> inside an ioctl() request that never terminates!
>
> However, on the face of it, your case does look as though
> the mutex is fubar'ed.
>
> Might be worth (somehow) dumping the mutex state.

ehh, it is locked, isn't it? How to find who locked it
and forgot to release it?

       Jocke

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: INFO: task snmpd:398 blocked for more than 120 seconds.
  2010-11-08 15:05 INFO: task snmpd:398 blocked for more than 120 seconds Joakim Tjernlund
  2010-11-08 15:19 ` David Laight
@ 2010-11-08 17:33 ` Joakim Tjernlund
  2010-11-08 18:22   ` Joakim Tjernlund
  1 sibling, 1 reply; 7+ messages in thread
From: Joakim Tjernlund @ 2010-11-08 17:33 UTC (permalink / raw)
  Cc: linuxppc-dev



linuxppc-dev-bounces+joakim.tjernlund=transmode.se@lists.ozlabs.org wrote on 2010/11/08 16:05:30:

> From: Joakim Tjernlund <joakim.tjernlund@transmode.se>
> To: linuxppc-dev@ozlabs.org
> Date: 2010/11/08 16:12
> Subject: INFO: task snmpd:398 blocked for more than 120 seconds.
> Sent by: linuxppc-dev-bounces+joakim.tjernlund=transmode.se@lists.ozlabs.org
>
>
> I can't make out what is causing this hang every now an then:
>
> INFO: task snmpd:398 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> snmpd         D 0fad1760     0   398      1 0x00000000
> Call Trace:
> [c7b5fc70] [c7a49a80] 0xc7a49a80 (unreliable)
> [c7b5fd30] [c0008608] __switch_to+0x4c/0x6c
> [c7b5fd40] [c028fed4] schedule+0x184/0x310
> [c7b5fd70] [c0290e5c] __mutex_lock_slowpath+0xa4/0x150
> [c7b5fdb0] [c0290c50] mutex_lock+0x44/0x48
> [c7b5fdc0] [c01eb5b0] rtnl_lock+0x18/0x28
> [c7b5fdd0] [c01df6c4] dev_ioctl+0x2b4/0x764
> [c7b5fe60] [c01cafc8] sock_ioctl+0x18c/0x2a0
> [c7b5fe80] [c009a16c] vfs_ioctl+0xcc/0xe0
> [c7b5fea0] [c009a998] do_vfs_ioctl+0xc4/0x79c
> [c7b5ff10] [c009b0b0] sys_ioctl+0x40/0x74
> [c7b5ff40] [c00117c4] ret_from_syscall+0x0/0x38
>
> The dev_ioctl is in SIOCGIFCONF:
> int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
> {
>    struct ifreq ifr;
>    int ret;
>    char *colon;
>
>    /* One special case: SIOCGIFCONF takes ifconf argument
>       and requires shared lock, because it sleeps writing
>       to user space.
>     */
>
>    if (cmd == SIOCGIFCONF) {
>       rtnl_lock();
>       ret = dev_ifconf(net, (char __user *) arg);
>       rtnl_unlock();
>       return ret;
>    }
>
> Any ideas? Kernel is 2.6.35

I tried PROVE_LOCKING but now my kernel doesn't boot at all :)

Something goes very wrong in arch/powerpc/kernel/prom.c:694
694		memblock_reserve(PHYSICAL_START, __pa(klimit) - PHYSICAL_START);

(gdb) print klimit
$8 = 0xc08c9000

What is klimit and can I do something about it?

kernel 2.6.35 on a mpc8321

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: INFO: task snmpd:398 blocked for more than 120 seconds.
  2010-11-08 17:33 ` Joakim Tjernlund
@ 2010-11-08 18:22   ` Joakim Tjernlund
  0 siblings, 0 replies; 7+ messages in thread
From: Joakim Tjernlund @ 2010-11-08 18:22 UTC (permalink / raw)
  Cc: linuxppc-dev

>
> linuxppc-dev-bounces+joakim.tjernlund=transmode.se@lists.ozlabs.org wrote on 2010/11/08 16:05:30:
>
> > From: Joakim Tjernlund <joakim.tjernlund@transmode.se>
> > To: linuxppc-dev@ozlabs.org
> > Date: 2010/11/08 16:12
> > Subject: INFO: task snmpd:398 blocked for more than 120 seconds.
> > Sent by: linuxppc-dev-bounces+joakim.tjernlund=transmode.se@lists.ozlabs.org
> >
> >
> > I can't make out what is causing this hang every now an then:
> >
> > INFO: task snmpd:398 blocked for more than 120 seconds.
> > "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > snmpd         D 0fad1760     0   398      1 0x00000000
> > Call Trace:
> > [c7b5fc70] [c7a49a80] 0xc7a49a80 (unreliable)
> > [c7b5fd30] [c0008608] __switch_to+0x4c/0x6c
> > [c7b5fd40] [c028fed4] schedule+0x184/0x310
> > [c7b5fd70] [c0290e5c] __mutex_lock_slowpath+0xa4/0x150
> > [c7b5fdb0] [c0290c50] mutex_lock+0x44/0x48
> > [c7b5fdc0] [c01eb5b0] rtnl_lock+0x18/0x28
> > [c7b5fdd0] [c01df6c4] dev_ioctl+0x2b4/0x764
> > [c7b5fe60] [c01cafc8] sock_ioctl+0x18c/0x2a0
> > [c7b5fe80] [c009a16c] vfs_ioctl+0xcc/0xe0
> > [c7b5fea0] [c009a998] do_vfs_ioctl+0xc4/0x79c
> > [c7b5ff10] [c009b0b0] sys_ioctl+0x40/0x74
> > [c7b5ff40] [c00117c4] ret_from_syscall+0x0/0x38
> >
> > The dev_ioctl is in SIOCGIFCONF:
> > int dev_ioctl(struct net *net, unsigned int cmd, void __user *arg)
> > {
> >    struct ifreq ifr;
> >    int ret;
> >    char *colon;
> >
> >    /* One special case: SIOCGIFCONF takes ifconf argument
> >       and requires shared lock, because it sleeps writing
> >       to user space.
> >     */
> >
> >    if (cmd == SIOCGIFCONF) {
> >       rtnl_lock();
> >       ret = dev_ifconf(net, (char __user *) arg);
> >       rtnl_unlock();
> >       return ret;
> >    }
> >
> > Any ideas? Kernel is 2.6.35
>
> I tried PROVE_LOCKING but now my kernel doesn't boot at all :)
>
> Something goes very wrong in arch/powerpc/kernel/prom.c:694
> 694      memblock_reserve(PHYSICAL_START, __pa(klimit) - PHYSICAL_START);
>
> (gdb) print klimit
> $8 = 0xc08c9000
>
> What is klimit and can I do something about it?
>
> kernel 2.6.35 on a mpc8321

That was not quite right, turns out that early_init_devtree(*params)
points to a memory that has been zeroed:
print initial_boot_params
$3 = (struct boot_param_header *) 0xc07fa000

Above matches what u-boot writes:
Booting kernel from Legacy Image at 00200000 ...
   Image Name:   oskernel02a:p1a:99
   Created:      2010-11-08  18:09:28 UTC
   Image Type:   PowerPC Linux Kernel Image (gzip compressed)
   Data Size:    1898430 Bytes =  1.8 MB
   Load Address: 00000000
   Entry Point:  00000000
   Verifying Checksum ... OK
## Flattened Device Tree blob at 07fdebb0
   Booting using the fdt blob at 0x7fdebb0
   Uncompressing Kernel Image ... OK
   Loading Device Tree to 007fa000, end 007ff327 ... OK

(gdb) print *initial_boot_params
$4 = {magic = 0x0, totalsize = 0x0, off_dt_struct = 0x0, off_dt_strings = 0x0,
  off_mem_rsvmap = 0x0, version = 0x0, last_comp_version = 0x0,
  boot_cpuid_phys = 0x0, dt_strings_size = 0x0, dt_struct_size = 0x0}

 Jocke

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: INFO: task snmpd:398 blocked for more than 120 seconds.
  2010-11-08 15:51   ` Joakim Tjernlund
@ 2010-11-09  9:47     ` Norbert van Bolhuis
  2010-11-09 12:30       ` Joakim Tjernlund
  0 siblings, 1 reply; 7+ messages in thread
From: Norbert van Bolhuis @ 2010-11-09  9:47 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: linuxppc-dev, David Laight

Joakim Tjernlund wrote:
>>> I can't make out what is causing this hang every now an then:
>>>
>>> INFO: task snmpd:398 blocked for more than 120 seconds.
>> My problem with that 'error' message is that there is no way
>> for a driver to disable it on a per-process basis.
>> We have some processes whose 'normal state' is to sleep
>> uninterruptibly in the kernel. Shutdown is handled by
>> an explicit request (not by sending a signal).
>> The processes could be kernel worker threads (except that
>> is is ~impossible to handle them exiting from a loadble
>> kernel module) so are actually children of a daemon sat
>> inside an ioctl() request that never terminates!
>>
>> However, on the face of it, your case does look as though
>> the mutex is fubar'ed.
>>
>> Might be worth (somehow) dumping the mutex state.
> 
> ehh, it is locked, isn't it? How to find who locked it
> and forgot to release it?
> 

You could add show_state to check_hung_task

It worked for me while solving some (application) task hangups.

---
N. van Bolhuis.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: INFO: task snmpd:398 blocked for more than 120 seconds.
  2010-11-09  9:47     ` Norbert van Bolhuis
@ 2010-11-09 12:30       ` Joakim Tjernlund
  0 siblings, 0 replies; 7+ messages in thread
From: Joakim Tjernlund @ 2010-11-09 12:30 UTC (permalink / raw)
  To: Norbert van Bolhuis; +Cc: linuxppc-dev, David Laight

Norbert van Bolhuis <nvbolhuis@aimvalley.nl> wrote on 2010/11/09 10:47:46:
>
> Joakim Tjernlund wrote:
> >>> I can't make out what is causing this hang every now an then:
> >>>
> >>> INFO: task snmpd:398 blocked for more than 120 seconds.
> >> My problem with that 'error' message is that there is no way
> >> for a driver to disable it on a per-process basis.
> >> We have some processes whose 'normal state' is to sleep
> >> uninterruptibly in the kernel. Shutdown is handled by
> >> an explicit request (not by sending a signal).
> >> The processes could be kernel worker threads (except that
> >> is is ~impossible to handle them exiting from a loadble
> >> kernel module) so are actually children of a daemon sat
> >> inside an ioctl() request that never terminates!
> >>
> >> However, on the face of it, your case does look as though
> >> the mutex is fubar'ed.
> >>
> >> Might be worth (somehow) dumping the mutex state.
> >
> > ehh, it is locked, isn't it? How to find who locked it
> > and forgot to release it?
> >
>
> You could add show_state to check_hung_task
>
> It worked for me while solving some (application) task hangups.

Will have a look at that too, thanks.

Meanwhile i have tried to turn on various LOCK debug and
that results in an unbootable kernel.
Somewhere the dev. tree. gets mangled and it happens when
DEBUG_LOCK_ALLOC is set.
Don't know why and I must move on.
Someone else might want to try if DEBUG_LOCK_ALLOC works
on embedded ppc boards, especially 83xx.

    Jocke

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-11-09 12:30 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-08 15:05 INFO: task snmpd:398 blocked for more than 120 seconds Joakim Tjernlund
2010-11-08 15:19 ` David Laight
2010-11-08 15:51   ` Joakim Tjernlund
2010-11-09  9:47     ` Norbert van Bolhuis
2010-11-09 12:30       ` Joakim Tjernlund
2010-11-08 17:33 ` Joakim Tjernlund
2010-11-08 18:22   ` Joakim Tjernlund

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.