Re: 2.4.22-pre lockups (decoded oops for pre8)

linux-kernel.vger.kernel.org archive mirror

* Re: 2.4.22-pre lockups (decoded oops for pre8)
       [not found]   ` <Pine.LNX.4.55L.0307251545090.14733@freak.distro.conectiva>
@ 2003-08-02 12:27     ` Stephan von Krawczynski
  2003-08-03  7:25       ` Willy Tarreau
  2003-08-05 16:40       ` Marcelo Tosatti
  0 siblings, 2 replies; 21+ messages in thread
From: Stephan von Krawczynski @ 2003-08-02 12:27 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: andrea, linux-kernel

Hello Marcelo, hello andrea,

after some days of running 2.4.22-pre8 I finally got the crash (a freeze, as
usual). This time the debugging setup worked and I got:


ksymoops 2.4.8 on i686 2.4.22-pre8.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.22-pre8/ (default)
     -m /boot/System.map-2.4.22-pre8 (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Unable to handle kernel paging request at virtual address 4129b0fc
c0130084
*pde = 313f6067
Oops: 0002
CPU:    1
EIP:    0010:[<c0130084>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000000   ebx: c2cfdba0   ecx: 00000000   edx: 4129b0fc
esi: d5fb0a24   edi: 0001ca22   ebp: c02eaaa8   esp: c345df30
ds: 0018   es: 0018   ss: 0018
Process kswapd (pid: 5, stackpage=c345d000)
Stack: c2cfdba0 d5fb0a24 c2cfdba0 c013924f c2cfdba0 000001d0 00000200 000001d0 
       00000006 00000020 000001d0 00000020 00000006 c0139493 00000006 00000001 
       c02eaaa8 000001d0 00000006 c02eaaa8 00000000 c013950e 00000020 c02eaaa8 
Call Trace:    [<c013924f>] [<c0139493>] [<c013950e>] [<c013961c>] [<c01396a8>]
  [<c01397d8>] [<c0139740>] [<c0105000>] [<c010592e>] [<c0139740>]
Code: 89 02 c7 43 24 00 00 00 00 f0 ff 0d 9c a5 37 c0 5a 5b 5e c3 


>>EIP; c0130084 <__remove_inode_page+44/60>   <=====

>>ebx; c2cfdba0 <_end+2952980/3852ee40>
>>esi; d5fb0a24 <_end+15c05804/3852ee40>
>>ebp; c02eaaa8 <contig_page_data+168/340>
>>esp; c345df30 <_end+30b2d10/3852ee40>

Trace; c013924f <shrink_cache+2df/3b0>
Trace; c0139493 <shrink_caches+63/a0>
Trace; c013950e <try_to_free_pages_zone+3e/60>
Trace; c013961c <kswapd_balance_pgdat+4c/b0>
Trace; c01396a8 <kswapd_balance+28/40>
Trace; c01397d8 <kswapd+98/c0>
Trace; c0139740 <kswapd+0/c0>
Trace; c0105000 <_stext+0/0>
Trace; c010592e <arch_kernel_thread+2e/40>
Trace; c0139740 <kswapd+0/c0>

Code;  c0130084 <__remove_inode_page+44/60>
00000000 <_EIP>:
Code;  c0130084 <__remove_inode_page+44/60>   <=====
   0:   89 02                     mov    %eax,(%edx)   <=====
Code;  c0130086 <__remove_inode_page+46/60>
   2:   c7 43 24 00 00 00 00      movl   $0x0,0x24(%ebx)
Code;  c013008d <__remove_inode_page+4d/60>
   9:   f0 ff 0d 9c a5 37 c0      lock decl 0xc037a59c
Code;  c0130094 <__remove_inode_page+54/60>
  10:   5a                        pop    %edx
Code;  c0130095 <__remove_inode_page+55/60>
  11:   5b                        pop    %ebx
Code;  c0130096 <__remove_inode_page+56/60>
  12:   5e                        pop    %esi
Code;  c0130097 <__remove_inode_page+57/60>
  13:   c3                        ret    


1 warning issued.  Results may not be reliable.
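For readers decoding the output above: ksymoops annotates each address as symbol+offset/size in hex, so "c0130084 <__remove_inode_page+44/60>" means 0x44 bytes into a function 0x60 bytes long, which puts the function start at c0130084 - 0x44 = c0130040. A minimal sketch of that lookup follows, assuming a symbol table already sorted by address (for example parsed from System.map). The struct and the function resolve() are hypothetical, not ksymoops code, and the size is taken here as the distance to the next symbol.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One entry of a symbol table sorted by ascending address, e.g. parsed
 * from System.map. The last entry only marks where the previous
 * symbol ends. */
struct sym {
    unsigned long addr;
    const char *name;
};

/* Write addr as "name+off/size" (hex), where off is the distance from
 * the start of the enclosing symbol and size is the distance to the
 * next symbol. Returns 0 on success, -1 if addr is not covered. */
static int resolve(const struct sym *tab, size_t n, unsigned long addr,
                   char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    for (size_t i = 0; i + 1 < n; i++) {
        if (addr >= tab[i].addr && addr < tab[i + 1].addr) {
            snprintf(buf, len, "%s+%lx/%lx", tab[i].name,
                     addr - tab[i].addr, tab[i + 1].addr - tab[i].addr);
            return 0;
        }
    }
    buf[0] = '\0';
    return -1;
}
```

With a table holding __remove_inode_page at c0130040 and a (hypothetical) following symbol at c01300a0, resolving c0130084 yields "__remove_inode_page+44/60", matching the EIP line. This is also why the warning matters: if the map does not match the running kernel, every name in the trace can be wrong.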


Hope this helps.
Anything further I can do?

Regards,
Stephan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: 2.4.22-pre lockups (decoded oops for pre8)
  2003-08-02 12:27     ` 2.4.22-pre lockups (decoded oops for pre8) Stephan von Krawczynski
@ 2003-08-03  7:25       ` Willy Tarreau
  2003-08-03  9:40         ` Stephan von Krawczynski
  2003-08-05 16:40       ` Marcelo Tosatti
  1 sibling, 1 reply; 21+ messages in thread
From: Willy Tarreau @ 2003-08-03  7:25 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: Marcelo Tosatti, andrea, linux-kernel

Hi Stephan,

This is in remove_page_from_hash_queue() at filemap.c:114:
    *pprev = next;

pprev is taken from page->pprev_hash, and its value here (4129b0fc, in %edx)
is invalid. Assuming it was corrupted earlier, it seems that the only files
able to touch it, directly or indirectly, are:
  - mm/filemap.c (add_page_to_hash_queue, add_to_page_cache*)
  - mm/shmem.c (add_to_page_cache_unique)
  - mm/swap_state.c (idem)
  - fs/ext3/inode.c and fs/buffer.c (find_or_create_page)
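The failing store can be seen in a minimal user-space model of such a hash chain. This is a sketch with a simplified, hypothetical struct page (not the kernel's definition); it shows why a corrupted pprev_hash turns "*pprev = next" into a write to an arbitrary address, and the comments map the registers from the oops (eax = next = 0, edx = pprev) onto the code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical model of a 2.4-style page-cache hash chain
 * (not the kernel's struct page). next_hash points forward; pprev_hash
 * points at whichever pointer currently points at this page: the
 * bucket head or the previous page's next_hash. */
struct page {
    unsigned long index;
    struct page *next_hash;
    struct page **pprev_hash;
};

/* Insert page at the head of a bucket. */
static void add_page_to_hash(struct page *page, struct page **bucket)
{
    page->next_hash = *bucket;
    if (*bucket)
        (*bucket)->pprev_hash = &page->next_hash;
    *bucket = page;
    page->pprev_hash = bucket;
}

/* Unlink page from its bucket in O(1), without walking the chain. */
static void remove_page_from_hash(struct page *page)
{
    struct page *next = page->next_hash;
    struct page **pprev = page->pprev_hash;

    assert(pprev != NULL);            /* page must currently be hashed */
    if (next)
        next->pprev_hash = pprev;
    /* The faulting `mov %eax,(%edx)` in the oops corresponds to this
     * store: eax = next (0, the page was last in its chain) and
     * edx = pprev (0x4129b0fc). With a corrupted pprev_hash the write
     * lands on an arbitrary address. */
    *pprev = next;
    /* Appears to match the following `movl $0x0,0x24(%ebx)`, ebx = page. */
    page->pprev_hash = NULL;
}
```

Because removal writes through pprev_hash without walking the chain, nothing checks that the pointer is still sane, so whatever corrupted it is long gone by the time the store faults.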

So the problem may be narrowed down to a few files. Perhaps digging through
the VM changes made since the last kernel that worked for you will give you
more clues...

Cheers,
Willy


* Re: 2.4.22-pre lockups (decoded oops for pre8)
  2003-08-03  7:25       ` Willy Tarreau
@ 2003-08-03  9:40         ` Stephan von Krawczynski
  0 siblings, 0 replies; 21+ messages in thread
From: Stephan von Krawczynski @ 2003-08-03  9:40 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: marcelo, andrea, linux-kernel

On Sun, 3 Aug 2003 09:25:25 +0200
Willy Tarreau <willy@w.ods.org> wrote:

> Hi Stephan,
> 
> This is in remove_page_from_hash_queue() at filemap.c:114 :
>     *pprev = next;
> 
> pprev is taken from page->pprev_hash and is considered invalid here
> (4129b0fc). Assuming it has been corrupted earlier, it seems that the only
> files able to touch this either directly or indirectly are :
>   - mm/filemap.c (add_page_to_hash_queue, add_to_page_cache*)
>   - mm/shmem.c (add_to_page_cache_unique)
>   - mm/swap_state.c (idem)

>   - fs/ext3/inode.c and fs/buffer.c (find_or_create_page)

Ext3 is unlikely to be related; the box has never used ext3. Ext2 is only used
on /boot (so very unlikely, too); everything else is reiserfs.


> 
> So the problem may be narrowed down to a few files. Perhaps digging through
> the VM changes since before you had a problem will give you more clues...
> 
> Cheers,
> Willy

Thanks for commenting. The problem really is annoying because I _know_ the box
will freeze; it just takes time, this time 4 days...

Regards,
Stephan


* Re: 2.4.22-pre lockups (decoded oops for pre8)
  2003-08-02 12:27     ` 2.4.22-pre lockups (decoded oops for pre8) Stephan von Krawczynski
  2003-08-03  7:25       ` Willy Tarreau
@ 2003-08-05 16:40       ` Marcelo Tosatti
  2003-08-06  2:37         ` Stephan von Krawczynski
  2003-08-06  7:41         ` 2.4.22-pre lockups (now decoded oops for pre10) Stephan von Krawczynski
  1 sibling, 2 replies; 21+ messages in thread
From: Marcelo Tosatti @ 2003-08-05 16:40 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: andrea, linux-kernel


Stephan,

Is this _STOCK_ 2.4.22-pre10 (no vmware, no other modules) ? 



* Re: 2.4.22-pre lockups (decoded oops for pre8)
  2003-08-05 16:40       ` Marcelo Tosatti
@ 2003-08-06  2:37         ` Stephan von Krawczynski
  2003-08-06  7:41         ` 2.4.22-pre lockups (now decoded oops for pre10) Stephan von Krawczynski
  1 sibling, 0 replies; 21+ messages in thread
From: Stephan von Krawczynski @ 2003-08-06  2:37 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: andrea, linux-kernel

On Tue, 5 Aug 2003 13:40:48 -0300 (BRT)
Marcelo Tosatti <marcelo@conectiva.com.br> wrote:

> 
> Stephan,
> 
> Is this _STOCK_ 2.4.22-pre10 (no vmware, no other modules) ? 

This was from a pre8. There were no strange modules and no vmware involved.
Everything was clean: kernel 2.4.22-pre8 on top of the SuSE 8.2 distribution.
The output was captured via the serial console.

Regards,
Stephan



* Re: 2.4.22-pre lockups (now decoded oops for pre10)
  2003-08-05 16:40       ` Marcelo Tosatti
  2003-08-06  2:37         ` Stephan von Krawczynski
@ 2003-08-06  7:41         ` Stephan von Krawczynski
  2003-08-06  8:58           ` Oleg Drokin
                             ` (2 more replies)
  1 sibling, 3 replies; 21+ messages in thread
From: Stephan von Krawczynski @ 2003-08-06  7:41 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: andrea, linux-kernel, green

On Tue, 5 Aug 2003 13:40:48 -0300 (BRT)
Marcelo Tosatti <marcelo@conectiva.com.br> wrote:

> 
> Stephan,
> 
> Is this _STOCK_ 2.4.22-pre10 (no vmware, no other modules) ? 

Hello Marcelo,

today I have a fresh -pre10 oops for you.

Everything seems to start with these messages (there is no I/O error or the
like; is it possible that the fs got damaged during earlier crashes?):

sd(8,17):vs-4080: reiserfs_free_block: free_block (0811:14478481)[dev:blocknr]: bit already cleared
sd(8,17):vs-4080: reiserfs_free_block: free_block (0811:14478445)[dev:blocknr]: bit already cleared
sd(8,17):vs-4080: reiserfs_free_block: free_block (0811:14478441)[dev:blocknr]: bit already cleared
sd(8,17):vs-4080: reiserfs_free_block: free_block (0811:14478348)[dev:blocknr]: bit already cleared
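The vs-4080 warning means reiserfs was asked to free a block whose bit in the free-space bitmap was already clear, i.e. a block already marked free, which suggests a double free or stale metadata referencing that block. A minimal sketch of that check follows, assuming a simple in-memory bitmap where a set bit means "in use"; all names are hypothetical and this is not the reiserfs implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical in-memory free-space bitmap: bit set = block in use. */
struct bitmap {
    unsigned long *words;
    unsigned long nblocks;
};

static int test_block(const struct bitmap *bm, unsigned long nr)
{
    return (int)((bm->words[nr / BITS_PER_WORD] >> (nr % BITS_PER_WORD)) & 1UL);
}

static void mark_used(struct bitmap *bm, unsigned long nr)
{
    assert(nr < bm->nblocks);
    bm->words[nr / BITS_PER_WORD] |= 1UL << (nr % BITS_PER_WORD);
}

/* Mark block nr free. If its bit is already clear, the block is being
 * freed a second time (or the metadata that referenced it is stale), so
 * warn and change nothing, in the spirit of the vs-4080 message.
 * Returns 0 on a normal free, -1 otherwise. */
static int free_block(struct bitmap *bm, unsigned long nr)
{
    if (nr >= bm->nblocks)
        return -1;
    if (!test_block(bm, nr)) {
        fprintf(stderr, "free_block (%lu): bit already cleared\n", nr);
        return -1;
    }
    bm->words[nr / BITS_PER_WORD] &= ~(1UL << (nr % BITS_PER_WORD));
    return 0;
}
```

The warning itself is harmless bookkeeping, but it signals that the filesystem's idea of which blocks are in use no longer matches the tree, which is why a reiserfsck run is the natural next step.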

And then:

ksymoops 2.4.8 on i686 2.4.22-pre10.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.22-pre10/ (default)
     -m /boot/System.map-2.4.22-pre10 (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Unable to handle kernel NULL pointer dereference at virtual address 00000006
c0144b14
*pde = 00000000
Oops: 0002
CPU:    1
EIP:    0010:[<c0144b14>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246
eax: 00000000   ebx: f0f66540   ecx: f0f66540   edx: 00000006
esi: f0f66540   edi: f0f66540   ebp: c2ce0350   esp: c345df24
ds: 0018   es: 0018   ss: 0018
Process kswapd (pid: 5, stackpage=c345d000)
Stack: c0147ddf f0f66540 00000000 c2ce0350 0001bcad c02eab68 c0139228 c2ce0350
       000001d0 00000200 000001d0 00000016 00000020 000001d0 00000020 00000006
       c01394b3 00000006 c345c000 c02eab68 000001d0 00000006 c02eab68 00000000 
Call Trace:    [<c0147ddf>] [<c0139228>] [<c01394b3>] [<c013952e>] [<c013963c>]
  [<c01396c8>] [<c01397f8>] [<c0139760>] [<c0105000>] [<c010592e>] [<c0139760>]
Code: 89 02 c7 41 30 00 00 00 00 89 4c 24 04 e9 7a ff ff ff 8d 76 


>>EIP; c0144b14 <__remove_from_queues+14/30>   <=====

>>ebx; f0f66540 <_end+30bbb320/3852ee40>
>>ecx; f0f66540 <_end+30bbb320/3852ee40>
>>esi; f0f66540 <_end+30bbb320/3852ee40>
>>edi; f0f66540 <_end+30bbb320/3852ee40>
>>ebp; c2ce0350 <_end+2935130/3852ee40>
>>esp; c345df24 <_end+30b2d04/3852ee40>

Trace; c0147ddf <try_to_free_buffers+7f/170>
Trace; c0139228 <shrink_cache+298/3b0>
Trace; c01394b3 <shrink_caches+63/a0>
Trace; c013952e <try_to_free_pages_zone+3e/60>
Trace; c013963c <kswapd_balance_pgdat+4c/b0>
Trace; c01396c8 <kswapd_balance+28/40>
Trace; c01397f8 <kswapd+98/c0>
Trace; c0139760 <kswapd+0/c0>
Trace; c0105000 <_stext+0/0>
Trace; c010592e <arch_kernel_thread+2e/40>
Trace; c0139760 <kswapd+0/c0>

Code;  c0144b14 <__remove_from_queues+14/30>
00000000 <_EIP>:
Code;  c0144b14 <__remove_from_queues+14/30>   <=====
   0:   89 02                     mov    %eax,(%edx)   <=====
Code;  c0144b16 <__remove_from_queues+16/30>
   2:   c7 41 30 00 00 00 00      movl   $0x0,0x30(%ecx)
Code;  c0144b1d <__remove_from_queues+1d/30>
   9:   89 4c 24 04               mov    %ecx,0x4(%esp,1)
Code;  c0144b21 <__remove_from_queues+21/30>
   d:   e9 7a ff ff ff            jmp    ffffff8c <_EIP+0xffffff8c>
Code;  c0144b26 <__remove_from_queues+26/30>
  12:   8d 76 00                  lea    0x0(%esi),%esi


1 warning issued.  Results may not be reliable.

Regards,
Stephan



* Re: 2.4.22-pre lockups (now decoded oops for pre10)
  2003-08-06  7:41         ` 2.4.22-pre lockups (now decoded oops for pre10) Stephan von Krawczynski
@ 2003-08-06  8:58           ` Oleg Drokin
  2003-08-06  9:09           ` Willy Tarreau
  2003-08-06 18:15           ` Marcelo Tosatti
  2 siblings, 0 replies; 21+ messages in thread
From: Oleg Drokin @ 2003-08-06  8:58 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: Marcelo Tosatti, andrea, linux-kernel

Hello!

On Wed, Aug 06, 2003 at 09:41:50AM +0200, Stephan von Krawczynski wrote:

> > Is this _STOCK_ 2.4.22-pre10 (no vmware, no other modules) ? 
> Hello Marcelo,
> today I have a fresh -pre10 oops for you.
> Everything seems to start with (there is no i/o error or the like, is it
> possible that the fs got damaged during former crashes?):

Well, you'd better run reiserfsck after crashes with binary modules just to make sure everything is ok.

> sd(8,17):vs-4080: reiserfs_free_block: free_block (0811:14478481)[dev:blocknr]:
> bit already cleared
> sd(8,17):vs-4080: reiserfs_free_block: free_block (0811:14478445)[dev:blocknr]:
> bit already cleared
> sd(8,17):vs-4080: reiserfs_free_block: free_block (0811:14478441)[dev:blocknr]:
> bit already cleared
> sd(8,17):vs-4080: reiserfs_free_block: free_block (0811:14478348)[dev:blocknr]:
> bit already cleared

Bye,
    Oleg

* Re: 2.4.22-pre lockups (now decoded oops for pre10)
  2003-08-06  7:41         ` 2.4.22-pre lockups (now decoded oops for pre10) Stephan von Krawczynski
  2003-08-06  8:58           ` Oleg Drokin
@ 2003-08-06  9:09           ` Willy Tarreau
  2003-08-06  9:36             ` Stephan von Krawczynski
  2003-08-18 14:23             ` Andrea Arcangeli
  2003-08-06 18:15           ` Marcelo Tosatti
  2 siblings, 2 replies; 21+ messages in thread
From: Willy Tarreau @ 2003-08-06  9:09 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: Marcelo Tosatti, andrea, linux-kernel, green

On Wed, Aug 06, 2003 at 09:41:50AM +0200, Stephan von Krawczynski wrote:
 
> Code;  c0144b14 <__remove_from_queues+14/30>
> 00000000 <_EIP>:
> Code;  c0144b14 <__remove_from_queues+14/30>   <=====
>    0:   89 02                     mov    %eax,(%edx)   <=====
> Code;  c0144b16 <__remove_from_queues+16/30>
>    2:   c7 41 30 00 00 00 00      movl   $0x0,0x30(%ecx)
> Code;  c0144b1d <__remove_from_queues+1d/30>
>    9:   89 4c 24 04               mov    %ecx,0x4(%esp,1)
> Code;  c0144b21 <__remove_from_queues+21/30>
>    d:   e9 7a ff ff ff            jmp    ffffff8c <_EIP+0xffffff8c>
> Code;  c0144b26 <__remove_from_queues+26/30>
>   12:   8d 76 00                  lea    0x0(%esi),%esi

Once again, it's *pprev = next which is causing trouble, with pprev=6 this
time (fs/buffer.c:523). Something really seems to be playing badly with
this...
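One possible debugging aid along these lines (not something proposed in the thread) is to verify the back-pointer invariant *pprev == entry before unlinking. The sketch below uses a hypothetical buffer-hash entry: a bare NULL test on pprev lets a small garbage value such as 6 through, whereas the invariant check, when pprev still points at readable memory, reports the damaged entry instead of writing through it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical buffer-hash entry with a forward link and a back-pointer
 * to whichever pointer currently points at it. */
struct bh_entry {
    unsigned long blocknr;
    struct bh_entry *b_next;
    struct bh_entry **b_pprev;
};

/* Debug unlink: check that *pprev really points back at this entry
 * before writing through it. A wild pprev (like 0x6) can still fault on
 * the read; a real kernel check would also need an address-range test.
 * Returns 0 on success, -1 if the entry looks corrupted. */
static int hash_unlink_checked(struct bh_entry *bh)
{
    struct bh_entry **pprev;

    assert(bh != NULL);
    pprev = bh->b_pprev;
    if (!pprev)
        return 0;                     /* not hashed: nothing to do */
    if (*pprev != bh) {
        fprintf(stderr, "block %lu: b_pprev=%p does not point back at it\n",
                bh->blocknr, (void *)pprev);
        return -1;                    /* leave the chain untouched */
    }
    if (bh->b_next)
        bh->b_next->b_pprev = pprev;
    *pprev = bh->b_next;
    bh->b_pprev = NULL;
    return 0;
}
```

Reporting at the first inconsistent unlink narrows the window between the corruption and its detection, which is the information missing from an oops taken at the final faulting store.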

I find it amazing that such widely used code only triggers panics on your
system! Either it's a rare combination of several components/drivers, or a
strange hardware problem, although I can't imagine which (CPU? bus locking?).

Cheers,
Willy


* Re: 2.4.22-pre lockups (now decoded oops for pre10)
  2003-08-06  9:09           ` Willy Tarreau
@ 2003-08-06  9:36             ` Stephan von Krawczynski
  2003-08-06 12:45               ` Willy Tarreau
  2003-08-18 14:23             ` Andrea Arcangeli
  1 sibling, 1 reply; 21+ messages in thread
From: Stephan von Krawczynski @ 2003-08-06  9:36 UTC (permalink / raw)
  To: Willy Tarreau; +Cc: marcelo, andrea, linux-kernel, green

On Wed, 6 Aug 2003 11:09:20 +0200
Willy Tarreau <willy@w.ods.org> wrote:

> On Wed, Aug 06, 2003 at 09:41:50AM +0200, Stephan von Krawczynski wrote:
>  
> > Code;  c0144b14 <__remove_from_queues+14/30>
> > 00000000 <_EIP>:
> > Code;  c0144b14 <__remove_from_queues+14/30>   <=====
> >    0:   89 02                     mov    %eax,(%edx)   <=====
> > Code;  c0144b16 <__remove_from_queues+16/30>
> >    2:   c7 41 30 00 00 00 00      movl   $0x0,0x30(%ecx)
> > Code;  c0144b1d <__remove_from_queues+1d/30>
> >    9:   89 4c 24 04               mov    %ecx,0x4(%esp,1)
> > Code;  c0144b21 <__remove_from_queues+21/30>
> >    d:   e9 7a ff ff ff            jmp    ffffff8c <_EIP+0xffffff8c>
> > Code;  c0144b26 <__remove_from_queues+26/30>
> >   12:   8d 76 00                  lea    0x0(%esi),%esi
> 
> once again, it's *pprev=next which is is causing trouble, with pprev=6 this
> time (fs/buffer.c:523). There really seems to be something playing badly with
> this...
> 
> I find amazing that such widely used portions of code only trigger panics on
> your system ! either it's a rare combinations of several components/drivers,
> or a strange hardware problem, although I can't imagine which (cpu? bus
> locking?).

Hm, the hardware may not be that widespread. I guess not many people are really
using SMP, 64-bit PCI networking, 3 GB RAM, 3ware RAID5 and a ServerWorks board
all together in one box. I can't shake the impression that it has something to
do with locking issues. It doesn't really look like a hardware problem; you would
not expect crashes in the same kind of code then.
The question is: what additional information is needed to find the underlying
problem?

Regards,
Stephan

* Re: 2.4.22-pre lockups (now decoded oops for pre10)
  2003-08-06  9:36             ` Stephan von Krawczynski
@ 2003-08-06 12:45               ` Willy Tarreau
  0 siblings, 0 replies; 21+ messages in thread
From: Willy Tarreau @ 2003-08-06 12:45 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: marcelo, andrea, linux-kernel, green, alan

> Hm, the hardware may not be that widespread. I guess not many people are really
> using SMP, 64 bit PCI network, 3 GB RAM, 3ware RAID5 and serverworks board
> altogether in one box. I can't fight the impression it has something to do with
> locking issues. It doesn't look exactly like a hardware problem, you would not
> expect crashes on the same type of code then.

Well, it depends... I once had an overclocked CPU which died in only one
case: a car simulator. It always crashed on exactly the same race, at the
same position in the lap! I even knew that if I could get past that position,
it was OK for another lap! So I later used that game as a reliability test
when I was not sure about the origin of a crash :-)
It seems that a particular sequence of data and/or code could reliably trigger
it, although parallel makes never did.

> The question is: what additional information is needed to find the underlying
> problem?

Perhaps cache poisoning could help. Alan has used this technique extensively
in the past, and might still have a patch which could apply to your kernel
without too many changes. Alan?

Alternatively, you could do it by hand, but it's a bit tedious. You have to
find every place where memory is freed and write particular data into it just
before the free, ideally data which identifies who freed the page.

Then, after the next crash, you can identify who used the page last. This can
sometimes lead you to a driver that is missing a lock, but that's not certain.
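The by-hand approach described above can be sketched in user space as follows. It assumes a per-call-site tag (here __LINE__) is enough to identify the freeing code; the names and the fill byte are hypothetical, and in the kernel the hook would sit in front of kfree()/kmem_cache_free() rather than libc free().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define POISON_BYTE 0x6b   /* arbitrary fill pattern for this sketch */

/* Overwrite a buffer that is about to be freed: fill it with a known
 * pattern and stamp the first four bytes with a tag naming the caller. */
static void poison_before_free(void *p, size_t size, uint32_t tag)
{
    assert(p != NULL);
    memset(p, POISON_BYTE, size);
    if (size >= sizeof tag)
        memcpy(p, &tag, sizeof tag);
}

/* Inspect memory found in a crash dump (or a buffer not yet freed).
 * Returns the freeing tag if the rest of the buffer is still intact
 * poison, or 0 if something wrote into it after it was poisoned. Writes
 * that only hit the tag bytes themselves are not detected. */
static uint32_t poisoned_by(const void *p, size_t size)
{
    const unsigned char *b = p;
    uint32_t tag;

    if (size < sizeof tag)
        return 0;
    for (size_t i = sizeof tag; i < size; i++)
        if (b[i] != POISON_BYTE)
            return 0;
    memcpy(&tag, b, sizeof tag);
    return tag;
}

/* Hypothetical wrapper: tag each free with the source line it came from. */
#define DEBUG_FREE(p, size)                                      \
    do {                                                         \
        poison_before_free((p), (size), (uint32_t)__LINE__);     \
        free(p);                                                 \
    } while (0)
```

A tag read back from a crashed structure names the last freer; a buffer that is no longer pure poison shows that someone kept writing to it after the free, which is the use-after-free pattern a missing lock would produce.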

Cheers,
Willy

* Re: 2.4.22-pre lockups (now decoded oops for pre10)
  2003-08-06  9:09           ` Willy Tarreau
  2003-08-06  9:36             ` Stephan von Krawczynski
@ 2003-08-18 14:23             ` Andrea Arcangeli
  1 sibling, 0 replies; 21+ messages in thread
From: Andrea Arcangeli @ 2003-08-18 14:23 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Stephan von Krawczynski, Marcelo Tosatti, linux-kernel, green

On Wed, Aug 06, 2003 at 11:09:20AM +0200, Willy Tarreau wrote:
> On Wed, Aug 06, 2003 at 09:41:50AM +0200, Stephan von Krawczynski wrote:
>  
> > Code;  c0144b14 <__remove_from_queues+14/30>
> > 00000000 <_EIP>:
> > Code;  c0144b14 <__remove_from_queues+14/30>   <=====
> >    0:   89 02                     mov    %eax,(%edx)   <=====
> > Code;  c0144b16 <__remove_from_queues+16/30>
> >    2:   c7 41 30 00 00 00 00      movl   $0x0,0x30(%ecx)
> > Code;  c0144b1d <__remove_from_queues+1d/30>
> >    9:   89 4c 24 04               mov    %ecx,0x4(%esp,1)
> > Code;  c0144b21 <__remove_from_queues+21/30>
> >    d:   e9 7a ff ff ff            jmp    ffffff8c <_EIP+0xffffff8c>
> > Code;  c0144b26 <__remove_from_queues+26/30>
> >   12:   8d 76 00                  lea    0x0(%esi),%esi
> 
> once again, it's *pprev=next which is is causing trouble, with pprev=6 this
> time (fs/buffer.c:523). There really seems to be something playing badly with
> this...
> 
> I find amazing that such widely used portions of code only trigger panics on
> your system ! either it's a rare combinations of several components/drivers, or
> a strange hardware problem, although I can't imagine which (cpu? bus locking?).

Normally it's bad RAM (or anyway a problem with the memory) when bugs
trigger reproducibly in that place. The list walking thrashes the L2 cache and
that puts more stress on the RAM. If it were random (software) memory
corruption it would more likely crash in different places (though that's
not guaranteed ;).
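To check the bad-RAM theory, the usual step is a dedicated memory tester such as memtest86 booted on the bare machine. The idea behind a simple own-address style pattern check, one classic test in such tools, can be sketched as below. The helper is hypothetical; run in user space it only touches whatever pages the allocator hands out and goes through the caches, so a clean result proves little.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Own-index pattern test: write each word's own index into it, read
 * everything back, then repeat with the bits inverted. Returns the
 * number of mismatching reads. volatile keeps the compiler from
 * optimizing the read-back away. */
static size_t pattern_test(volatile uintptr_t *mem, size_t nwords)
{
    size_t errors = 0;

    assert(mem != NULL);
    for (int pass = 0; pass < 2; pass++) {
        uintptr_t mask = pass ? ~(uintptr_t)0 : 0;
        for (size_t i = 0; i < nwords; i++)
            mem[i] = (uintptr_t)i ^ mask;
        for (size_t i = 0; i < nwords; i++)
            if (mem[i] != ((uintptr_t)i ^ mask))
                errors++;
    }
    return errors;
}
```

Writing each word's own index makes address-line faults visible (two words aliasing the same cell read back the wrong value), and the inverted pass exercises every bit in both states.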

Andrea

* Re: 2.4.22-pre lockups (now decoded oops for pre10)
  2003-08-06  7:41         ` 2.4.22-pre lockups (now decoded oops for pre10) Stephan von Krawczynski
  2003-08-06  8:58           ` Oleg Drokin
  2003-08-06  9:09           ` Willy Tarreau
@ 2003-08-06 18:15           ` Marcelo Tosatti
  2003-08-07  2:14             ` Stephan von Krawczynski
  2 siblings, 1 reply; 21+ messages in thread
From: Marcelo Tosatti @ 2003-08-06 18:15 UTC (permalink / raw)
  To: Stephan von Krawczynski; +Cc: andrea, linux-kernel, green



On Wed, 6 Aug 2003, Stephan von Krawczynski wrote:

> Unable to handle kernel NULL pointer dereference at virtual address 00000006
> c0144b14
> *pde = 00000000
> Oops: 0002
> CPU:    1
> EIP:    0010:[<c0144b14>]    Not tainted
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00010246
> eax: 00000000   ebx: f0f66540   ecx: f0f66540   edx: 00000006
> esi: f0f66540   edi: f0f66540   ebp: c2ce0350   esp: c345df24
> ds: 0018   es: 0018   ss: 0018
> Process kswapd (pid: 5, stackpage=c345d000)
> Stack: c0147ddf f0f66540 00000000 c2ce0350 0001bcad c02eab68 c0139228 c2ce0350
>        000001d0 00000200 000001d0 00000016 00000020 000001d0 00000020 00000006
>        c01394b3 00000006 c345c000 c02eab68 000001d0 00000006 c02eab68 00000000 
> Call Trace:    [<c0147ddf>] [<c0139228>] [<c01394b3>] [<c013952e>] [<c013963c>]
>   [<c01396c8>] [<c01397f8>] [<c0139760>] [<c0105000>] [<c010592e>] [<c0139760>]
> Code: 89 02 c7 41 30 00 00 00 00 89 4c 24 04 e9 7a ff ff ff 8d 76 
> 
> 
> >>EIP; c0144b14 <__remove_from_queues+14/30>   <=====
> 
> >>ebx; f0f66540 <_end+30bbb320/3852ee40>
> >>ecx; f0f66540 <_end+30bbb320/3852ee40>
> >>esi; f0f66540 <_end+30bbb320/3852ee40>
> >>edi; f0f66540 <_end+30bbb320/3852ee40>
> >>ebp; c2ce0350 <_end+2935130/3852ee40>
> >>esp; c345df24 <_end+30b2d04/3852ee40>

Stephan,

I'm pretty worried about this problem.

Your oopses seem to be the result of some kind of memory corruption. In
the earlier oopses we could see the kernel oopsing in
remove_page_from_hash_queue due to corrupted pointers (as Willy pointed
out).

Can you please try to crash your box again with 

CONFIG_DEBUG_SLAB=y 
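In 2.4, CONFIG_DEBUG_SLAB (in the "Kernel hacking" menu) turns on the slab allocator's debugging checks, which include red zones around objects and poisoning of freed ones; the exact behavior varies by version. The red-zone half of that idea can be sketched with hypothetical helpers wrapping malloc()/free(). The guard value is arbitrary, not the kernel's, and alignment is ignored for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define RED_ZONE 0xa5a5a5a5u   /* arbitrary guard value for this sketch */

/* Allocate size bytes with a 4-byte guard word on each side.
 * Alignment of the returned pointer is ignored for brevity. */
static void *rz_alloc(size_t size)
{
    uint32_t guard = RED_ZONE;
    unsigned char *raw = malloc(size + 2 * sizeof guard);

    if (!raw)
        return NULL;
    memcpy(raw, &guard, sizeof guard);
    memcpy(raw + sizeof guard + size, &guard, sizeof guard);
    return raw + sizeof guard;
}

/* Verify both guard words, then free. Returns 0 if both are intact,
 * -1 if either was overwritten (the memory is freed either way). */
static int rz_free(void *p, size_t size)
{
    unsigned char *raw;
    uint32_t head, tail;

    assert(p != NULL);
    raw = (unsigned char *)p - sizeof head;
    memcpy(&head, raw, sizeof head);
    memcpy(&tail, raw + sizeof head + size, sizeof tail);
    free(raw);
    return (head == RED_ZONE && tail == RED_ZONE) ? 0 : -1;
}
```

The point of such checks is to fail at the free of the overrun object, close to the buggy writer, rather than later inside an unrelated list walk such as the hash-queue removals in these oopses.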

Again, thanks a lot for your reports.


* Re: 2.4.22-pre lockups (now decoded oops for pre10)
  2003-08-06 18:15           ` Marcelo Tosatti
@ 2003-08-07  2:14             ` Stephan von Krawczynski
  2003-08-07  5:35               ` Oleg Drokin
  2003-08-07 12:45               ` Marcelo Tosatti
  0 siblings, 2 replies; 21+ messages in thread
From: Stephan von Krawczynski @ 2003-08-07  2:14 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: andrea, linux-kernel, green

On Wed, 6 Aug 2003 15:15:39 -0300 (BRT)
Marcelo Tosatti <marcelo@conectiva.com.br> wrote:

> Stephan,
> 
> I'm pretty worried about this problem.
> 
> Your oopses seem to be the result of some kind of memory corruption. On
> the other oopses we could see the kernel oopsing on
> remove_page_from_hash_queue due to corrupted pointers (as Willy pointed 
> out). 
> 
> Can you please try to crash your box again with 
> 
> CONFIG_DEBUG_SLAB=y 
> 
> Again, thanks a lot for your reports.

Ok, I have two things. 
First, another oops. I upgraded the system to rc1 yesterday and it did not
survive a single day. Here's the decoded oops, the box was "clean" meaning no
weird modules or the like:


ksymoops 2.4.8 on i686 2.4.22-rc1.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.22-rc1/ (default)
     -m /boot/System.map-2.4.22-rc1 (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Unable to handle kernel NULL pointer dereference at virtual address 00000004
c0145060
*pde = 00000000
Oops: 0002
CPU:    1
EIP:    0010:[<c0145060>]    Not tainted   
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010283
eax: 00000000   ebx: c822feb4   ecx: c822fe60   edx: e07e7780
esi: 00000000   edi: e07e7780   ebp: f59bfe3c   esp: f59bfe2c
ds: 0018   es: 0018   ss: 0018
Process nfsd (pid: 1737, stackpage=f59bf000)
Stack: f0cce7a0 00000001 f59bfe38 c822fe60 f0cce7f4 eec54ef4 00000000 e07e7760
       f59be000 f59bfea8 c0183ef5 e07e7780 e07e77cc c02ed880 e07e7760 f8c84fc8
       f59bfea8 dfe6c960 00000000 e07e7760 dfe6c960 00000000 f59c6e04 f59bfea8
Call Trace:    [<c0183ef5>] [<f8c84fc8>] [<f8c856f1>] [<f8c8cee4>] [<f8c8e295>]
  [<f8c923f4>] [<f8c80699>] [<f8c65938>] [<f8c923f4>] [<f8c91a38>] [<f8c91a58>]
  [<f8c80411>] [<c010592e>] [<f8c80210>]
Code: 89 50 04 c7 41 54 00 00 00 00 c7 43 04 00 00 00 00 8b 44 24


>>EIP; c0145060 <fsync_buffers_list+50/1b0>   <=====

>>ebx; c822feb4 <_end+7e84c94/3852ee40>
>>ecx; c822fe60 <_end+7e84c40/3852ee40>
>>edx; e07e7780 <_end+2043c560/3852ee40>
>>edi; e07e7780 <_end+2043c560/3852ee40>
>>ebp; f59bfe3c <_end+35614c1c/3852ee40>
>>esp; f59bfe2c <_end+35614c0c/3852ee40>

Trace; c0183ef5 <reiserfs_sync_file+65/d0>
Trace; f8c84fc8 <[nfsd]nfsd_sync+78/d0>
Trace; f8c856f1 <[nfsd]nfsd_commit+a1/b0>
Trace; f8c8cee4 <[nfsd]nfsd3_proc_commit+94/130>
Trace; f8c8e295 <[nfsd]nfs3svc_decode_commitargs+35/e0>
Trace; f8c923f4 <[nfsd]nfsd_procedures3+2f4/320>
Trace; f8c80699 <[nfsd]nfsd_dispatch+119/21d>
Trace; f8c65938 <[sunrpc]svc_process+4d8/570>
Trace; f8c923f4 <[nfsd]nfsd_procedures3+2f4/320>
Trace; f8c91a38 <[nfsd]nfsd_version3+0/10>
Trace; f8c91a58 <[nfsd]nfsd_program+0/28>
Trace; f8c80411 <[nfsd]nfsd+201/370>
Trace; c010592e <arch_kernel_thread+2e/40>
Trace; f8c80210 <[nfsd]nfsd+0/370>

Code;  c0145060 <fsync_buffers_list+50/1b0>
00000000 <_EIP>:
Code;  c0145060 <fsync_buffers_list+50/1b0>   <=====
   0:   89 50 04                  mov    %edx,0x4(%eax)   <=====
Code;  c0145063 <fsync_buffers_list+53/1b0>
   3:   c7 41 54 00 00 00 00      movl   $0x0,0x54(%ecx)
Code;  c014506a <fsync_buffers_list+5a/1b0>
   a:   c7 43 04 00 00 00 00      movl   $0x0,0x4(%ebx)
Code;  c0145071 <fsync_buffers_list+61/1b0>
  11:   8b 44 24 00               mov    0x0(%esp,1),%eax


1 warning issued.  Results may not be reliable.


As you can see reiserfs seems involved. Regarding reiserfs and my last postings
I can assure you that all reiserfs partitions were checked via reiserfsck right
before installation of rc1 - as Oleg advised - and found:
"Comparing bitmaps.. vpf-10640: The on-disk and the correct bitmaps differs"
I was told to use --fix-fixable option which I did and it indeed fixed the
problem. Trying reiserfsck after that found no errors any more. So I see no
chance that corrupt data on the media (through former crashes) is responsible
for this one. Hint: spelling in reiserfsck should be checked ;-)

Second, I am re-installing the box with CONFIG_DEBUG_SLAB="y" right now. Please tell
me if I should perform special steps (SYSRQ or the like) after the next crash
happens, or if the decoded oops will be sufficient.

Regards,
Stephan

* Re: 2.4.22-pre lockups (now decoded oops for pre10)
  2003-08-07  2:14             ` Stephan von Krawczynski
@ 2003-08-07  5:35               ` Oleg Drokin
  2003-08-07 12:45               ` Marcelo Tosatti
  1 sibling, 0 replies; 21+ messages in thread
From: Oleg Drokin @ 2003-08-07  5:35 UTC
  To: Stephan von Krawczynski; +Cc: Marcelo Tosatti, andrea, linux-kernel

Hello!

On Thu, Aug 07, 2003 at 04:14:40AM +0200, Stephan von Krawczynski wrote:

> Unable to handle kernel NULL pointer dereference at virtual address 00000004

Hm NULL pointer in j_dirty_buffers list. This cannot happen, basically.
This is a cyclically linked list of buffers. And we add stuff to it via standard
functions, so the linkage happens by itself.
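A short illustration of why this fault address points at a broken list link. The faulting instruction is mov %edx,0x4(%eax) with eax = 00000000, so the store went to 0 + 4 = 0x00000004. In a two-pointer list_head on i386, prev sits at offset 4, so a store to next->prev while unlinking an entry whose next pointer is NULL would fault at exactly that address. That reading is consistent with the oops, not proven by it. The sketch below is a userspace re-creation of a circular doubly linked list in the 2.4 style (names mirror the kernel's, but this is not the kernel header) plus a hypothetical list_is_sane() checker.

```c
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Insert new right after head. In a healthy ring no pointer is ever NULL. */
static void list_add(struct list_head *new, struct list_head *head)
{
    struct list_head *next = head->next;

    next->prev = new;
    new->next = next;
    new->prev = head;
    head->next = new;
}

/* Unlink entry, then clear its pointers as 2.4's list_del() does.
 * If entry->next were NULL, the first store would write to address
 * offsetof(struct list_head, prev), i.e. 4 on a 32-bit build. */
static void list_del(struct list_head *entry)
{
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    entry->next = NULL;
    entry->prev = NULL;
}

/* Hypothetical checker: walk the ring and verify that every link is
 * non-NULL and that next->prev points back. Returns 1 if sane. */
static int list_is_sane(const struct list_head *head, size_t max_nodes)
{
    const struct list_head *p = head;

    for (size_t i = 0; i <= max_nodes; i++) {
        if (p->next == NULL || p->next->prev != p)
            return 0;
        p = p->next;
        if (p == head)
            return 1;
    }
    return 0; /* longer than expected: treat as corrupt */
}
```

Because the list is built only through list_add() and list_del(), a NULL link like the one in this oops would have to come from something writing over a buffer head or list node from outside these helpers.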

> Trace; c0183ef5 <reiserfs_sync_file+65/d0>
> Trace; f8c84fc8 <[nfsd]nfsd_sync+78/d0>
> Code;  c0145060 <fsync_buffers_list+50/1b0>
> 00000000 <_EIP>:
> Code;  c0145060 <fsync_buffers_list+50/1b0>   <=====
>    0:   89 50 04                  mov    %edx,0x4(%eax)   <=====

> As you can see reiserfs seems involved. Regarding reiserfs and my last postings
> I can assure you that all reiserfs partitions were checked via reiserfsck right
> before installation of rc1 - as Oleg advised - and found:
> "Comparing bitmaps.. vpf-10640: The on-disk and the correct bitmaps differs"

That might explain your prior "freeing already free block" messages.
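To make this concrete: suppose the on-disk allocation bitmap has lost the bit for a block that a file still owns. The allocator then believes that block is free. When the file later releases it, the free routine finds the bit already clear. The toy sketch below models a single 64-block bitmap in userspace C. The toy_* names and the warning text are hypothetical and do not reproduce reiserfs's actual code or message.

```c
#include <stdint.h>
#include <stdio.h>

#define TOY_NR_BLOCKS 64

/* One bit per block: 1 = allocated, 0 = free. */
struct toy_bitmap {
    uint64_t bits;
};

/* Number of blocks on which two bitmaps disagree. */
static unsigned toy_bitmap_diff(const struct toy_bitmap *a,
                                const struct toy_bitmap *b)
{
    uint64_t x = a->bits ^ b->bits;
    unsigned n = 0;

    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Free a block. Returns 0 on success, or -1 if the block number is out
 * of range or its bit was already clear (a double free as far as the
 * bitmap can tell). */
static int toy_free_block(struct toy_bitmap *bm, unsigned block)
{
    uint64_t mask;

    if (block >= TOY_NR_BLOCKS)
        return -1;
    mask = (uint64_t)1 << block;
    if (!(bm->bits & mask)) {
        fprintf(stderr, "toy: freeing already free block %u\n", block);
        return -1;
    }
    bm->bits &= ~mask;
    return 0;
}
```

In Stephan's case, reiserfsck --fix-fixable reportedly brought the bitmaps back into agreement. The sketch only shows why a mismatch left in place would produce warnings of this kind.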

> I was told to use --fix-fixable option which I did and it indeed fixed the
> problem. Trying reiserfsck after that found no errors any more. So I see no
> chance that corrupt data on the media (through former crashes) is responsible
> for this one. Hint: spelling in reiserfsck should be checked ;-)

Yes, but how the condition that triggered the oops arose is totally unclear to me.

Bye,
    Oleg

* Re: 2.4.22-pre lockups (now decoded oops for pre10)
  2003-08-07  2:14             ` Stephan von Krawczynski
  2003-08-07  5:35               ` Oleg Drokin
@ 2003-08-07 12:45               ` Marcelo Tosatti
       [not found]                 ` <3F325198.2010301@namesys.com>
  2003-08-07 15:52                 ` Stephan von Krawczynski
  1 sibling, 2 replies; 21+ messages in thread
From: Marcelo Tosatti @ 2003-08-07 12:45 UTC
  To: Stephan von Krawczynski; +Cc: andrea, linux-kernel, green



On Thu, 7 Aug 2003, Stephan von Krawczynski wrote:

> On Wed, 6 Aug 2003 15:15:39 -0300 (BRT)
> Marcelo Tosatti <marcelo@conectiva.com.br> wrote:
> 
> > Stephan,
> > 
> > I'm pretty worried about this problem.
> > 
> > Your oopses seem to be the result of some kind of memory corruption. On
> > the other oopses we could see the kernel oopsing on
> > remove_page_from_hash_queue due to corrupted pointers (as Willy pointed 
> > out). 
> > 
> > Can you please try to crash your box again with 
> > 
> > CONFIG_DEBUG_SLAB=y 
> > 
> > Again, thanks a lot for your reports.
> 
> Ok, I have two things. 
> First, another oops. I upgraded the system to rc1 yesterday and it did not
> survive a single day. Here's the decoded oops, the box was "clean" meaning no
> weird modules or the like:
>
> [...]
>
> As you can see reiserfs seems involved. Regarding reiserfs and my last postings
> I can assure you that all reiserfs partitions were checked via reiserfsck right
> before installation of rc1 - as Oleg advised - and found:
> "Comparing bitmaps.. vpf-10640: The on-disk and the correct bitmaps differs"
> I was told to use --fix-fixable option which I did and it indeed fixed the
> problem. Trying reiserfsck after that found no errors any more. So I see no
> chance that corrupt data on the media (through former crashes) is responsible
> for this one. Hint: spelling in reiserfsck should be checked ;-)

It might be a problem in reiserfs. You're getting oopses on different
places with different stack traces, which is weird. 

I'll take a closer look at this oops now. 

> Second, I am re-installing the box with CONFIG_DEBUG_SLAB="y" right now. Please tell
> me if I should perform special steps (SYSRQ or the like) after the next crash
> happens, or if the decoded oops will be sufficient.

The decoded oops should be sufficient. 


[parent not found: <3F325198.2010301@namesys.com>]

* Re: 2.4.22-pre lockups (now decoded oops for pre10)
       [not found]                 ` <3F325198.2010301@namesys.com>
@ 2003-08-07 13:32                   ` Stephan von Krawczynski
  2003-08-18 20:29                     ` Mike Fedyk
  0 siblings, 1 reply; 21+ messages in thread
From: Stephan von Krawczynski @ 2003-08-07 13:32 UTC
  To: Hans Reiser; +Cc: linux-kernel

On Thu, 07 Aug 2003 17:18:16 +0400
Hans Reiser <reiser@namesys.com> wrote:

> >On Thu, 7 Aug 2003, Stephan von Krawczynski wrote:
> >>for this one. Hint: spelling in reiserfsck should be checked ;-)
> >
> where?

Hello Hans,

I am not a native English speaker, but
"Comparing bitmaps.. vpf-10640: The on-disk and the correct bitmaps differs"
feels uncomfortable in my ears ;-)
I'd say "two things differ", without trailing "s". I am not even sure if
"bitmaps" shouldn't be singular "bitmap" instead.

But, as stated, I am not a native speaker, so I can't be sure.

Regards,
Stephan

* Re: 2.4.22-pre lockups (now decoded oops for pre10)
  2003-08-07 13:32                   ` Stephan von Krawczynski
@ 2003-08-18 20:29                     ` Mike Fedyk
  2003-08-18 20:39                       ` Stephan von Krawczynski
  0 siblings, 1 reply; 21+ messages in thread
From: Mike Fedyk @ 2003-08-18 20:29 UTC
  To: Stephan von Krawczynski; +Cc: Hans Reiser, linux-kernel

On Thu, Aug 07, 2003 at 03:32:57PM +0200, Stephan von Krawczynski wrote:
> On Thu, 07 Aug 2003 17:18:16 +0400
> Hans Reiser <reiser@namesys.com> wrote:
> 
> > >On Thu, 7 Aug 2003, Stephan von Krawczynski wrote:
> > >>for this one. Hint: spelling in reiserfsck should be checked ;-)
> > >
> > where?
> 
> Hello Hans,
> 
> I am not a native English speaker, but
> "Comparing bitmaps.. vpf-10640: The on-disk and the correct bitmaps differs"
> feels uncomfortable in my ears ;-)
> I'd say "two things differ", without trailing "s". I am not even sure if
> "bitmaps" shouldn't be singular "bitmap" instead.

"bitmaps" with your changes would be correct.

Though, just turn "bitmaps" into "bitmap" and it should be fine.  I can't
really think of a phrase specific enough for the error message without
adding enough text to make it two lines, which wouldn't be good.

"Comparing bitmaps.. vpf-10640: The on-disk and the correct bitmap differs"

* Re: 2.4.22-pre lockups (now decoded oops for pre10)
  2003-08-18 20:29                     ` Mike Fedyk
@ 2003-08-18 20:39                       ` Stephan von Krawczynski
  2003-08-18 21:05                         ` [grammar] " Matt Gibson
  2003-08-18 21:09                         ` Mike Fedyk
  0 siblings, 2 replies; 21+ messages in thread
From: Stephan von Krawczynski @ 2003-08-18 20:39 UTC
  To: Mike Fedyk; +Cc: reiser, linux-kernel

On Mon, 18 Aug 2003 13:29:49 -0700
Mike Fedyk <mfedyk@matchmail.com> wrote:

> > I'd say "two things differ", without trailing "s". I am not even sure if
> > "bitmaps" shouldn't be singular "bitmap" instead.
> 
> "bitmaps" with your changes would be correct.
> 
> Though, just turn "bitmaps" into "bitmap" and it should be fine.  I can't
> really think of a phrase specific enough for the error message without
> adding enough text to make it two lines, which wouldn't be good.
> 
> "Comparing bitmaps.. vpf-10640: The on-disk and the correct bitmap differs"

Hm, but:

"a and b differ"
"a differs from b"

or not?

Alternatives:

"a and b are different"

But if you use "are" here, you cannot use "differs" above, right?

Regards,
Stephan

* Re: [grammar] 2.4.22-pre lockups (now decoded oops for pre10)
  2003-08-18 20:39                       ` Stephan von Krawczynski
@ 2003-08-18 21:05                         ` Matt Gibson
  2003-08-18 21:09                         ` Mike Fedyk
  1 sibling, 0 replies; 21+ messages in thread
From: Matt Gibson @ 2003-08-18 21:05 UTC
  To: linux-kernel

On Monday 18 Aug 2003 21:39, Stephan von Krawczynski wrote:
> Mike Fedyk <mfedyk@matchmail.com> wrote:
> > "Comparing bitmaps.. vpf-10640: The on-disk and the correct bitmap
> > differs"
>
> Hm, but:
>
> "a and b differ"
> "a differs from b"

Yes.  Assuming that you're reporting the comparison of two single bitmaps:

"The on-disk bitmap and the correct bitmap differ."
"The on-disk and the correct bitmap differ."
"The on-disk bitmap differs from the correct bitmap."

I'd say the last of those three sounds best; the second sounds a little 
stilted because you have to think for a moment to realise that "on-disk" is 
being used as a contraction of "on-disk bitmap."

If the difference is between two sets of bitmaps:

"The on-disk bitmaps and the correct bitmaps differ."
"The on-disk and the correct bitmaps differ."
"The on-disk bitmaps differ from the correct bitmaps."

Matt (and that's the last you'll hear from me on this one; there's enough 
traffic on here as it is...)

-- 
"It's the small gaps between the rain that count,
 and learning how to live amongst them."
	      -- Jeff Noon

* Re: 2.4.22-pre lockups (now decoded oops for pre10)
  2003-08-18 20:39                       ` Stephan von Krawczynski
  2003-08-18 21:05                         ` [grammar] " Matt Gibson
@ 2003-08-18 21:09                         ` Mike Fedyk
  1 sibling, 0 replies; 21+ messages in thread
From: Mike Fedyk @ 2003-08-18 21:09 UTC
  To: Stephan von Krawczynski; +Cc: reiser, linux-kernel

On Mon, Aug 18, 2003 at 10:39:46PM +0200, Stephan von Krawczynski wrote:
> On Mon, 18 Aug 2003 13:29:49 -0700
> Mike Fedyk <mfedyk@matchmail.com> wrote:
> 
> > > I'd say "two things differ", without trailing "s". I am not even sure if
> > > "bitmaps" shouldn't be singular "bitmap" instead.
> > 
> > "bitmaps" with your changes would be correct.
> > 
> > Though, just turn "bitmaps" into "bitmap" and it should be fine.  I can't
> > really think of a phrase specific enough for the error message without
> > adding enough text to make it two lines, which wouldn't be good.
> > 
> > "Comparing bitmaps.. vpf-10640: The on-disk and the correct bitmap differs"
> 
> Hm, but:
> 
> "a and b differ"

1) "Comparing bitmaps.. vpf-10640: The on-disk and correct bitmap differ"

> "a differs from b"

2) "Comparing bitmaps.. vpf-10640: The on-disk differs from the correct bitmap"

> 
> or not?
> 
> Alternatives:
> 
> "a and b are different"

3) "Comparing bitmaps.. vpf-10640: The on-disk and correct are different"

> 
> But if you use "are" here, you cannot use "differs" above, right?
> 

Yes.

I kinda like (1), or the original changed to "bitmap" instead of
"bitmaps".

Mike

* Re: 2.4.22-pre lockups (now decoded oops for pre10)
  2003-08-07 12:45               ` Marcelo Tosatti
       [not found]                 ` <3F325198.2010301@namesys.com>
@ 2003-08-07 15:52                 ` Stephan von Krawczynski
  1 sibling, 0 replies; 21+ messages in thread
From: Stephan von Krawczynski @ 2003-08-07 15:52 UTC
  To: Marcelo Tosatti; +Cc: andrea, linux-kernel, green

On Thu, 7 Aug 2003 09:45:36 -0300 (BRT)
Marcelo Tosatti <marcelo@conectiva.com.br> wrote:

> The decoded oops should be sufficient. 

Well, how about this one:


ksymoops 2.4.8 on i686 2.4.22-rc1.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.22-rc1/ (default)
     -m /boot/System.map-2.4.22-rc1 (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

Unable to handle kernel paging request at virtual address 63eabdb3
c0145f31 
*pde = 00000000
Oops: 0000
CPU:    0
EIP:    0010:[<c0145f31>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010206
eax: 00000000   ebx: 00000000   ecx: 00000061   edx: 63eabd93
esi: 00000000   edi: 00001000   ebp: 00000000   esp: c34f7e60
ds: 0018   es: 0018   ss: 0018
Process kupdated (pid: 7, stackpage=c34f7000)
Stack: 00000000 f7afb1f0 c0146018 00000000 c01312e9 00000000 c1849dd0 00001000
       00001000 00000803 c014823a c1849dd0 00001000 00000000 f79b7fa4 00001e18
       c0148428 f79b7fa4 00001e18 00001000 e9640000 00000000 00000803 00001000
Call Trace:    [<c0146018>] [<c01312e9>] [<c014823a>] [<c0148428>] [<c0145b36>]
  [<c0197328>] [<c019ceb9>] [<c019c4f5>] [<c0188e94>] [<c01498cb>] [<c014887c>]
  [<c0148be9>] [<c0105000>] [<c010592e>] [<c0148af0>]
Code: 8b 42 20 a3 30 c6 37 c0 8d 41 ff a3 34 c6 37 c0 c6 05 c0 bb


>>EIP; c0145f31 <get_unused_buffer_head+21/b0>   <=====

>>esp; c34f7e60 <_end+314cc40/3852ee40>

Trace; c0146018 <create_buffers+28/100>
Trace; c01312e9 <find_or_create_page+109/110>
Trace; c014823a <grow_dev_page+7a/c0>
Trace; c0148428 <grow_buffers+98/110>
Trace; c0145b36 <getblk+46/80>
Trace; c0197328 <journal_getblk+28/30>
Trace; c019ceb9 <do_journal_end+139/bb0>
Trace; c019c4f5 <flush_old_commits+135/1d0>
Trace; c0188e94 <reiserfs_write_super+64/90>
Trace; c01498cb <sync_supers+14b/170>
Trace; c014887c <sync_old_buffers+3c/b0>
Trace; c0148be9 <kupdate+f9/130>
Trace; c0105000 <_stext+0/0>
Trace; c010592e <arch_kernel_thread+2e/40>
Trace; c0148af0 <kupdate+0/130>

Code;  c0145f31 <get_unused_buffer_head+21/b0>
00000000 <_EIP>:
Code;  c0145f31 <get_unused_buffer_head+21/b0>   <=====
   0:   8b 42 20                  mov    0x20(%edx),%eax   <=====
Code;  c0145f34 <get_unused_buffer_head+24/b0>
   3:   a3 30 c6 37 c0            mov    %eax,0xc037c630
Code;  c0145f39 <get_unused_buffer_head+29/b0>
   8:   8d 41 ff                  lea    0xffffffff(%ecx),%eax
Code;  c0145f3c <get_unused_buffer_head+2c/b0>
   b:   a3 34 c6 37 c0            mov    %eax,0xc037c634
Code;  c0145f41 <get_unused_buffer_head+31/b0>
  10:   c6 05 c0 bb 00 00 00      movb   $0x0,0xbbc0


1 warning issued.  Results may not be reliable.
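The faulting address in this oops can be checked by hand. The instruction at EIP is mov 0x20(%edx),%eax, and edx holds 63eabd93, so the effective address is 0x63eabd93 + 0x20 = 0x63eabdb3, exactly the address the kernel could not handle. Unlike the NULL dereference in the nfsd oops, this pointer is not NULL. It lies below the kernel's direct mapping, assuming the default i386 3G/1G split where kernel lowmem starts at 0xc0000000, so it looks like garbage. The next instructions store the loaded value to a global and store ecx - 1 to another global. That looks like popping the head off a counted free list, presumably the unused buffer-head list, whose head pointer was already corrupt. The helper below covers only the arithmetic. The names and the PAGE_OFFSET value are assumptions made for this sketch.

```c
#include <stdint.h>

/* Assumption: default i386 2.4 configuration with a 3G/1G split. */
#define ASSUMED_PAGE_OFFSET 0xc0000000u

/* i386 "disp(%reg)" addressing: base register plus signed displacement,
 * computed modulo 2^32 as the CPU does. */
static uint32_t effective_address(uint32_t base, int32_t disp)
{
    return base + (uint32_t)disp;
}

/* Rough plausibility check for a kernel lowmem data pointer under the
 * assumed split: such addresses are at or above PAGE_OFFSET. */
static int looks_like_kernel_pointer(uint32_t addr)
{
    return addr >= ASSUMED_PAGE_OFFSET;
}
```

The same arithmetic applied to the nfsd oops, effective_address(0x00000000, 0x4), gives the reported 00000004.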


After that I received this one:


ksymoops 2.4.8 on i686 2.4.22-rc1.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.22-rc1/ (default)
     -m /boot/System.map-2.4.22-rc1 (default)

Warning: You did not tell me where to find symbol information.  I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc.  ksymoops -h explains the options.

 NMI Watchdog detected LOCKUP on CPU1, eip c011a747, registers:
CPU:    1
EIP:    0010:[<c011a747>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00000082
eax: cef0b8dc   ebx: cef0b894   ecx: 00000001   edx: 00000003  
esi: 00000008   edi: cef0b8dc   ebp: ec8efe48   esp: ec8efe28
ds: 0018   es: 0018   ss: 0018
Process tar (pid: 13603, stackpage=ec8ef000)
Stack: 00000000 cef0b894 00000000 00000282 00000003 cef0b894 00000008 cef0b8dc
       00000000 c01c4f41 00000000 cef0b894 00000000 0001679d cef0b894 00001000 
       c0146c87 00000000 cef0b894 cef0b894 00000004 cef0b894 ec8ee000 00000001
Call Trace:    [<c01c4f41>] [<c0146c87>] [<c013ae92>] [<c0119630>] [<c0130d7e>]
  [<c017ff50>] [<c013146f>] [<c0131751>] [<c0131d50>] [<c0131ffc>] [<c0131d50>]
  [<c014328b>] [<c010782f>]
Code: 7e f9 e9 d9 ec ff ff 80 38 00 f3 90 7e f9 e9 5d ed ff ff 80 


>>EIP; c011a747 <.text.lock.sched+3f/178>   <=====

>>eax; cef0b8dc <_end+eb606bc/3852ee40>
>>ebx; cef0b894 <_end+eb60674/3852ee40>
>>edi; cef0b8dc <_end+eb606bc/3852ee40>
>>ebp; ec8efe48 <_end+2c544c28/3852ee40>
>>esp; ec8efe28 <_end+2c544c08/3852ee40>

Trace; c01c4f41 <submit_bh+a1/c0>
Trace; c0146c87 <block_read_full_page+2d7/2f0>
Trace; c013ae92 <__alloc_pages+42/190>
Trace; c0119630 <wait_for_completion+70/b0>
Trace; c0130d7e <page_cache_read+be/e0>
Trace; c017ff50 <reiserfs_get_block+0/1490>
Trace; c013146f <generic_file_readahead+af/1a0>
Trace; c0131751 <do_generic_file_read+1c1/470>
Trace; c0131d50 <file_read_actor+0/110>
Trace; c0131ffc <generic_file_read+19c/1b0>
Trace; c0131d50 <file_read_actor+0/110>
Trace; c014328b <sys_read+9b/180>
Trace; c010782f <system_call+33/38>

Code;  c011a747 <.text.lock.sched+3f/178>
00000000 <_EIP>:
Code;  c011a747 <.text.lock.sched+3f/178>   <=====
   0:   7e f9                     jle    fffffffb <_EIP+0xfffffffb>   <=====
Code;  c011a749 <.text.lock.sched+41/178>
   2:   e9 d9 ec ff ff            jmp    ffffece0 <_EIP+0xffffece0>
Code;  c011a74e <.text.lock.sched+46/178>
   7:   80 38 00                  cmpb   $0x0,(%eax)
Code;  c011a751 <.text.lock.sched+49/178>
   a:   f3 90                     repz nop 
Code;  c011a753 <.text.lock.sched+4b/178>
   c:   7e f9                     jle    7 <_EIP+0x7>
Code;  c011a755 <.text.lock.sched+4d/178>
   e:   e9 5d ed ff ff            jmp    ffffed70 <_EIP+0xffffed70>
Code;  c011a75a <.text.lock.sched+52/178>
  13:   80 00 00                  addb   $0x0,(%eax)


1 warning issued.  Results may not be reliable.
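For reference, the bytes at EIP in this lockup are cmpb $0x0,(%eax), then repz nop, then a jle back to the cmpb. That matches the out-of-line contended path of an i386 2.4 spin_lock(). The lock byte is decremented atomically. If the result is negative, the CPU spins in a separate .text.lock section, pausing and re-reading the byte until it becomes positive, and then retries. On that reading, .text.lock.sched is lock-spinning code emitted from kernel/sched.c, and the NMI watchdog fired because CPU1 never left that loop. The sketch below models such a lock with C11 atomics in userspace. It is an approximation for illustration (the toy_* names are made up), not the kernel's inline assembly.

```c
#include <stdatomic.h>

/* Toy model of the i386 2.4 spinlock byte: 1 = free, <= 0 = held. */
typedef struct {
    atomic_schar slock;
} toy_spinlock_t;

static void toy_spin_init(toy_spinlock_t *l)
{
    atomic_init(&l->slock, 1);
}

static int toy_spin_is_locked(toy_spinlock_t *l)
{
    return atomic_load(&l->slock) <= 0;
}

static void toy_spin_lock(toy_spinlock_t *l)
{
    for (;;) {
        /* "lock; decb": an old value of 1 means the lock is now ours. */
        if (atomic_fetch_sub(&l->slock, 1) > 0)
            return;
        /* Contended path ("cmpb $0; rep;nop; jle"): wait until the holder
         * stores 1 again. If the lock is never released (or its byte is
         * corrupted), the CPU stays in this loop, which is what the NMI
         * watchdog reports as a LOCKUP. */
        while (atomic_load_explicit(&l->slock, memory_order_relaxed) <= 0)
            ; /* the kernel executes PAUSE ("rep;nop") here */
        /* "jmp 1b": retry the atomic decrement. */
    }
}

static void toy_spin_unlock(toy_spinlock_t *l)
{
    /* 2.4 i386 releases the lock with a plain store of 1. */
    atomic_store(&l->slock, 1);
}
```

The decrement-and-test fast path keeps the uncontended case to a single locked instruction. Only waiters fall into the read-only spin, which avoids hammering the cache line with locked writes.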


There were no I/O errors or any other spectacular things happening. It just
died while I was sitting right next to it during the verify run of tar.

Regards,
Stephan

end of thread, other threads:[~2003-08-18 21:38 UTC | newest]

Thread overview: 21+ messages
     [not found] <Pine.LNX.4.55L.0307251040240.12645@freak.distro.conectiva>
     [not found] ` <20030725174517.5b21116d.skraw@ithnet.com>
     [not found]   ` <Pine.LNX.4.55L.0307251545090.14733@freak.distro.conectiva>
2003-08-02 12:27     ` 2.4.22-pre lockups (decoded oops for pre8) Stephan von Krawczynski
2003-08-03  7:25       ` Willy Tarreau
2003-08-03  9:40         ` Stephan von Krawczynski
2003-08-05 16:40       ` Marcelo Tosatti
2003-08-06  2:37         ` Stephan von Krawczynski
2003-08-06  7:41         ` 2.4.22-pre lockups (now decoded oops for pre10) Stephan von Krawczynski
2003-08-06  8:58           ` Oleg Drokin
2003-08-06  9:09           ` Willy Tarreau
2003-08-06  9:36             ` Stephan von Krawczynski
2003-08-06 12:45               ` Willy Tarreau
2003-08-18 14:23             ` Andrea Arcangeli
2003-08-06 18:15           ` Marcelo Tosatti
2003-08-07  2:14             ` Stephan von Krawczynski
2003-08-07  5:35               ` Oleg Drokin
2003-08-07 12:45               ` Marcelo Tosatti
     [not found]                 ` <3F325198.2010301@namesys.com>
2003-08-07 13:32                   ` Stephan von Krawczynski
2003-08-18 20:29                     ` Mike Fedyk
2003-08-18 20:39                       ` Stephan von Krawczynski
2003-08-18 21:05                         ` [grammar] " Matt Gibson
2003-08-18 21:09                         ` Mike Fedyk
2003-08-07 15:52                 ` Stephan von Krawczynski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).