linux-kernel.vger.kernel.org archive mirror
* [2.6.21.1] totally hanging altough not crashed
@ 2007-07-20 19:53 Folkert van Heusden
  2007-07-20 20:43 ` Chuck Ebbert
  0 siblings, 1 reply; 4+ messages in thread
From: Folkert van Heusden @ 2007-07-20 19:53 UTC
  To: linux-kernel

Hi,

One of my systems, running 2.6.21.1 on a P4 with HT and 2GB of RAM,
occasionally hangs. There is no oops or panic; it just suddenly stops
doing anything. It still answers pings and still forwards IP traffic,
but you can't log in, neither via ssh nor on the console: pressing enter
moves the cursor to the next line and that's it. It is NOT swapping at
all; in fact alt+sysrq+m shows that there's still plenty of memory and
swap available:

[783341.404485] SysRq : HELP : loglevel0-8 reBoot Crashdump show-all-locks(D) tErm Full kIll saK showMem Nice powerOff showPc show-all-timers(Q) unRaw Sync showTasks Unmount shoW-blocked-tasks 
[783344.380160] SysRq : Show Memory
[783344.380247] Mem-info:
[783344.380288] DMA per-cpu:
[783344.380331] CPU    0: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
[783344.380383] CPU    1: Hot: hi:    0, btch:   1 usd:   0   Cold: hi:    0, btch:   1 usd:   0
[783344.380434] Normal per-cpu:
[783344.380477] CPU    0: Hot: hi:  186, btch:  31 usd:   5   Cold: hi:   62, btch:  15 usd:   0
[783344.380531] CPU    1: Hot: hi:  186, btch:  31 usd:  87   Cold: hi:   62, btch:  15 usd:  60
[783344.380581] HighMem per-cpu:
[783344.380624] CPU    0: Hot: hi:  186, btch:  31 usd: 100   Cold: hi:   62, btch:  15 usd:   3
[783344.380675] CPU    1: Hot: hi:  186, btch:  31 usd: 104   Cold: hi:   62, btch:  15 usd:  61
[783344.380728] Active:283054 inactive:192477 dirty:298 writeback:0 unstable:0
[783344.380730]  free:24391 slab:12716 mapped:14430 pagetables:1464 bounce:3
[783344.380818] DMA free:8604kB min:68kB low:84kB high:100kB active:7172kB inactive:0kB present:16256kB pages_scanned:0 all_unreclaimable? no
[783344.380871] lowmem_reserve[]: 0 873 2015
[783344.381048] Normal free:78776kB min:3744kB low:4680kB high:5616kB active:280560kB inactive:450412kB present:894080kB pages_scanned:0 all_unreclaimable? no
[783344.381104] lowmem_reserve[]: 0 0 9137
[783344.381280] HighMem free:10184kB min:512kB low:1736kB high:2960kB active:844484kB inactive:319496kB present:1169608kB pages_scanned:0 all_unreclaimable? no
[783344.381336] lowmem_reserve[]: 0 0 0
[783344.381510] DMA: 93*4kB 61*8kB 60*16kB 56*32kB 22*64kB 2*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 8604kB
[783344.381965] Normal: 6383*4kB 1501*8kB 2121*16kB 143*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 78676kB
[783344.382420] HighMem: 945*4kB 585*8kB 81*16kB 4*32kB 0*64kB 0*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 10140kB
[783344.382871] Swap cache: add 43, delete 43, find 0/0, race 0+0
[783344.382915] Free swap  = 454496kB
[783344.382956] Total swap = 454600kB
[783344.382998] Free swap:       454496kB
[783344.394272] 524080 pages of RAM
[783344.394323] 294704 pages of HIGHMEM
[783344.394368] 5983 reserved pages
[783344.394410] 134216 pages shared
[783344.394451] 0 pages swap cached
[783344.394493] 298 pages dirty
[783344.394535] 0 pages writeback
[783344.394575] 14430 pages mapped
[783344.394616] 12716 pages slab
[783344.394656] 1464 pages pagetables
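
As a cross-check on the dump above, the per-zone free lists can be summed mechanically. The following is a minimal Python sketch and not part of the original report: the function name `buddy_total_kb` is hypothetical, and it assumes each buddy line uses the `count*sizekB` format shown above and that pages are 4 kB, as the smallest block size in the dump suggests.

```python
import re

def buddy_total_kb(line: str) -> int:
    """Sum the 'count*sizekB' terms of one buddy-allocator line from SysRq-m output."""
    # Only look at the part before '=', so the reported total is not counted.
    terms = line.split("=")[0]
    return sum(int(count) * int(size)
               for count, size in re.findall(r"(\d+)\*(\d+)kB", terms))

# DMA line copied from the dump above.
dma = ("DMA: 93*4kB 61*8kB 60*16kB 56*32kB 22*64kB 2*128kB 1*256kB "
       "0*512kB 1*1024kB 1*2048kB 0*4096kB = 8604kB")
print(buddy_total_kb(dma))  # 8604

# The global 'free:24391' page count should equal the sum of the per-zone
# 'free:' values (DMA, Normal, HighMem) when converted to kB.
free_pages = 24391
zone_free_kb = [8604, 78776, 10184]
print(free_pages * 4, sum(zone_free_kb))  # 97564 97564
```

With roughly 95 MB free and swap almost untouched, the arithmetic agrees with the statement that the machine is not under memory pressure.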

The only way to recover from this is a hard reboot, because even
ctrl+alt+delete is ignored.

alt+sysrq+p gives:

[783336.561037] SysRq : Show Regs
[783336.561116] 
[783336.561152] Pid: 0, comm:              swapper
[783336.561191] EIP: 0060:[<c100206b>] CPU: 0
[783336.561233] EIP is at default_idle+0x45/0x5a
[783336.561273]  EFLAGS: 00000282    Not tainted  (2.6.21.1test3 #11)
[783336.561313] EAX: 1d91386d EBX: c1336f00 ECX: 00000000 EDX: 00000000
[783336.561355] ESI: c1336e80 EDI: c12f5000 EBP: c1302fcc DS: 007b ES: 007b FS: 00d8
[783336.561432] CR0: 8005003b CR2: b7af3000 CR3: 1dd3c000 CR4: 000006d0
[783336.561473]  =======================

I've put the output of alt+sysrq+t at
http://keetweej.vanheusden.com/~folkert/ast.txt as it is half a
megabyte in size.

Can anyone suggest what is going on, and what to do about
it?


Folkert van Heusden

-- 
www.vanheusden.com/multitail - multitail is tail on steroids. multiple
               windows, filtering, coloring, anything you can think of
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com


* Re: [2.6.21.1] totally hanging altough not crashed
  2007-07-20 20:43 ` Chuck Ebbert
@ 2007-07-20 20:22   ` Folkert van Heusden
  2007-07-20 20:29   ` Folkert van Heusden
  1 sibling, 0 replies; 4+ messages in thread
From: Folkert van Heusden @ 2007-07-20 20:22 UTC
  To: Chuck Ebbert; +Cc: linux-kernel

> > One of my systems running 2.6.21.1 on a P4 with HT and 2GB of ram
> > occasionally crashes. Not with an oops or panic, it just suddenly stops
> > doing anything. It still lets you ping and it still forwards ip traffic
> > but you can't login: not via ssh and not on the console - pressing enter
> > brings the cursor to the next line and that's it. It is NOT swapping at
> > all, in fact alt+sysrq+m says that there's still plenty of memory and
> > swap available:
> Lots of processes are in uninterruptible wait at
> start_this_handle+0x208/0x365. Can you find out what line of code
> that matches? (There are a few of those waits in that function.)

I hope I followed the correct procedure; in any case, gdb tells me this:

0 root@muur:/usr/src/linux$ gdb ./vmlinux
GNU gdb 6.5
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i486-slackware-linux"...Using host libthread_db library "/lib/tls/libthread_db.so.1".

(gdb) l *c10c23a0
No symbol "c10c23a0" in current context.
(gdb) p start_this_handle
$1 = {int (journal_t *, handle_t *)} 0xc10c7614 <start_this_handle>
(gdb) l *0xC10C7979
0xc10c7979 is in new_handle (fs/jbd/transaction.c:238).
233             return ret;
234     }
235
236     /* Allocate a new handle.  This should probably be in a slab... */
237     static handle_t *new_handle(int nblocks)
238     {
239             handle_t *handle = jbd_alloc_handle(GFP_NOFS);
240             if (!handle)
241                     return NULL;
242             memset(handle, 0, sizeof(*handle));
(gdb) l *0xC10C781C
0xc10c781c is in start_this_handle (fs/jbd/transaction.c:155).
150
151                     prepare_to_wait(&journal->j_wait_transaction_locked,
152                                             &wait, TASK_UNINTERRUPTIBLE);
153                     spin_unlock(&journal->j_state_lock);
154                     schedule();
155                     finish_wait(&journal->j_wait_transaction_locked, &wait);
156                     goto repeat;
157             }
158
159             /*
(gdb)
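
The lookup above turns a trace entry of the form `symbol+0xOFFSET/0xSIZE` into an absolute address by adding the offset to the address that `p symbol` prints. A minimal Python sketch of that arithmetic follows; it is an illustration only, and the helper names `parse_frame` and `gdb_list_command` are hypothetical.

```python
def parse_frame(frame: str) -> tuple[str, int, int]:
    """Split 'symbol+0xOFF/0xSIZE' into (symbol, offset, size)."""
    symbol, rest = frame.split("+", 1)
    offset, size = rest.split("/", 1)
    return symbol, int(offset, 16), int(size, 16)

def gdb_list_command(symbol_address: int, frame: str) -> str:
    """Build the gdb 'l *ADDRESS' command for a trace frame.

    symbol_address is the value gdb prints for 'p <symbol>'.
    """
    _, offset, size = parse_frame(frame)
    if offset >= size:
        # An offset at or past the function size points into the next function.
        raise ValueError("offset lies outside the function")
    return f"l *{symbol_address + offset:#x}"

# 0xc10c7614 is the address gdb reported for start_this_handle above.
print(gdb_list_command(0xc10c7614, "start_this_handle+0x208/0x365"))
# l *0xc10c781c
```

The size check explains the first lookup above: 0xC10C7979 is the start address plus 0x365, the function's size rather than the offset, so it lands in the following function, new_handle. gdb generally also accepts the symbolic form `l *(start_this_handle+0x208)`, which avoids the manual addition.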


Folkert van Heusden

-- 
www.vanheusden.com/multitail - win a vlaai from multivlaai! get
multitail included in Fedora Core, AIX, Solaris or
HP/UX and win a vlaai of your choice
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com


* Re: [2.6.21.1] totally hanging altough not crashed
  2007-07-20 20:43 ` Chuck Ebbert
  2007-07-20 20:22   ` Folkert van Heusden
@ 2007-07-20 20:29   ` Folkert van Heusden
  1 sibling, 0 replies; 4+ messages in thread
From: Folkert van Heusden @ 2007-07-20 20:29 UTC
  To: Chuck Ebbert; +Cc: linux-kernel

> > One of my systems running 2.6.21.1 on a P4 with HT and 2GB of ram
> > occasionally crashes. Not with an oops or panic, it just suddenly stops
> > doing anything. It still lets you ping and it still forwards ip traffic
> > but you can't login: not via ssh and not on the console - pressing enter
> > brings the cursor to the next line and that's it. It is NOT swapping at
> > all, in fact alt+sysrq+m says that there's still plenty of memory and
> > swap available:
> 
> Lots of processes are in uninterruptible wait at
> start_this_handle+0x208/0x365. Can you find out what line of code
> that matches? (There are a few of those waits in that function.)

I did some more investigating and found:

folkert@muur:~$ grep -A 1 "Call Trace" www/ast.txt | grep -v -e "^--" -v -e "Call Trace:" | cut -d " "  -f 2- | genstats | more
    1   358     54.82%     1.75  [<c120f326>] schedule_timeout+0x8c/0x8e
    2    88     13.48%     7.42  [<c10c23a0>] start_this_handle+0x208/0x365
    3    63      9.65%    10.10  [<c120f2e0>] schedule_timeout+0x46/0x8e
    4    36      5.51%    17.69  [<c1020868>] do_wait+0x2c4/0x396
    5    36      5.51%    16.81  [<c1098cbd>] inotify_read+0x8d/0x1b5
    6    17      2.60%    37.59  [<c1074f52>] pipe_wait+0x8a/0xab
    7    13      1.99%     2.69  [<c102d22e>] worker_thread+0x130/0x165
    8    11      1.68%    55.73  [<c1210307>] do_nanosleep+0x42/0x70
    9     7      1.07%    92.29  [<c120f5e9>] __mutex_lock_slowpath+0xac/0x28c
   10     7      1.07%    90.71  [<c101f9fc>] do_exit+0x253/0x428
   11     2      0.31%   145.50  [<c1138101>] write_chan+0x15b/0x1d6
   12     2      0.31%     3.50  [<c104de2a>] watchdog+0x47/0x55
   13     2      0.31%     3.00  [<c10224f6>] ksoftirqd+0x85/0x98
   14     2      0.31%     2.50  [<c1018e44>] migration_thread+0x8a/0x10f
   15     1      0.15%   617.00  [<c10c7d56>] log_wait_commit+0xb5/0x121
   16     1      0.15%   562.00  [<c1029e46>] sys_pause+0x14/0x1b
   17     1      0.15%   225.00  [<f8a58ae1>] schluffen+0xad/0xaf [zaptel]
   18     1      0.15%    63.00  [<c1029efc>] sys_rt_sigsuspend+0xaf/0xcf
   19     1      0.15%    23.00  [<c10c75ff>] kjournald+0x1fd/0x207
   20     1      0.15%    22.00  [<c10c4b6f>] journal_commit_transaction+0x242/0xcd3
   21     1      0.15%    17.00  [<c105ab1b>] kswapd+0xf7/0x10b
   22     1      0.15%    16.00  [<c1191647>] serio_thread+0xfa/0xff
   23     1      0.15%    15.00  [<c11730e7>] hub_thread+0xe6/0xe8

(please ignore the fourth column)
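
For readers without the `genstats` tool, the same tally (the frame directly after each "Call Trace:" line, with count and percentage) can be approximated in a few lines of Python. This is a sketch under assumptions: the function name `top_frames` is hypothetical, the sample text is made up for illustration, and the real ast.txt may be formatted differently.

```python
import re
from collections import Counter

def top_frames(trace_text: str) -> list[tuple[str, int, float]]:
    """Count the first frame after each 'Call Trace:' line.

    Returns (frame, count, percent) tuples, most frequent first.
    """
    lines = trace_text.splitlines()
    counts = Counter()
    for i, line in enumerate(lines[:-1]):
        if "Call Trace:" in line:
            # Frames look like '[<c120f326>] schedule_timeout+0x8c/0x8e'.
            match = re.search(r"\[<[0-9a-f]+>\]\s+\S+", lines[i + 1])
            if match:
                counts[match.group(0)] += 1
    total = sum(counts.values())
    return [(frame, n, 100.0 * n / total) for frame, n in counts.most_common()]

# Made-up sample in the style of SysRq-t output, not taken from ast.txt.
sample = """\
Call Trace:
 [<c120f326>] schedule_timeout+0x8c/0x8e
 [<c1074f52>] pipe_wait+0x8a/0xab
Call Trace:
 [<c10c23a0>] start_this_handle+0x208/0x365
Call Trace:
 [<c120f326>] schedule_timeout+0x8c/0x8e
"""

for frame, count, percent in top_frames(sample):
    print(f"{count:6d} {percent:9.2f}%  {frame}")
```

Only the topmost frame of each task is counted, which matches the `grep -A 1` approach above; tasks sleeping in schedule_timeout are common on an idle system, so the second entry is the more unusual one.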

As I'm not entirely sure that this is an innocent place for a kernel to
be in, I looked this address up in gdb:

(gdb) p schedule_timeout
$1 = {long int (long int)} 0xc121a8b4 <schedule_timeout>
(gdb) l *0xC120F3B2
0xc120f3b2 is in __xfrm_state_insert (net/xfrm/xfrm_hash.h:49).
44      static inline unsigned __xfrm_src_hash(xfrm_address_t *daddr,
45                                             xfrm_address_t *saddr,
46                                             unsigned short family,
47                                             unsigned int hmask)
48      {
49              unsigned int h = family;
50              switch (family) {
51              case AF_INET:
52                      h ^= __xfrm4_daddr_saddr_hash(daddr, saddr);
53                      break;

I very much hope this is of some help.


Folkert van Heusden

-- 
www.vanheusden.com/multitail - win a vlaai from multivlaai! get
multitail included in Fedora Core, AIX, Solaris or
HP/UX and win a vlaai of your choice
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com


* Re: [2.6.21.1] totally hanging altough not crashed
  2007-07-20 19:53 [2.6.21.1] totally hanging altough not crashed Folkert van Heusden
@ 2007-07-20 20:43 ` Chuck Ebbert
  2007-07-20 20:22   ` Folkert van Heusden
  2007-07-20 20:29   ` Folkert van Heusden
  0 siblings, 2 replies; 4+ messages in thread
From: Chuck Ebbert @ 2007-07-20 20:43 UTC
  To: Folkert van Heusden; +Cc: linux-kernel

On 07/20/2007 03:53 PM, Folkert van Heusden wrote:
> Hi,
> 
> One of my systems running 2.6.21.1 on a P4 with HT and 2GB of ram
> occasionally crashes. Not with an oops or panic, it just suddenly stops
> doing anything. It still lets you ping and it still forwards ip traffic
> but you can't login: not via ssh and not on the console - pressing enter
> brings the cursor to the next line and that's it. It is NOT swapping at
> all, in fact alt+sysrq+m says that there's still plenty of memory and
> swap available:

Lots of processes are in uninterruptible wait at
start_this_handle+0x208/0x365. Can you find out what line of code
that matches? (There are a few of those waits in that function.)
