* Re: 2.4.22-pre lockups (decoded oops for pre8) [not found] ` <Pine.LNX.4.55L.0307251545090.14733@freak.distro.conectiva> @ 2003-08-02 12:27 ` Stephan von Krawczynski 2003-08-03 7:25 ` Willy Tarreau 2003-08-05 16:40 ` Marcelo Tosatti 0 siblings, 2 replies; 21+ messages in thread From: Stephan von Krawczynski @ 2003-08-02 12:27 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: andrea, linux-kernel Hello Marcelo, hello Andrea, after some days of running 2.4.22-pre8 I finally got the crash (freeze as usual). This time the debugging setup worked and I got: ksymoops 2.4.8 on i686 2.4.22-pre8. Options used -V (default) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.4.22-pre8/ (default) -m /boot/System.map-2.4.22-pre8 (default) Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options.
Unable to handle kernel paging request at virtual address 4129b0fc c0130084 *pde = 313f6067 Oops: 0002 CPU: 1 EIP: 0010:[<c0130084>] Not tainted Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010246 eax: 00000000 ebx: c2cfdba0 ecx: 00000000 edx: 4129b0fc esi: d5fb0a24 edi: 0001ca22 ebp: c02eaaa8 esp: c345df30 ds: 0018 es: 0018 ss: 0018 Process kswapd (pid: 5, stackpage=c345d000) Stack: c2cfdba0 d5fb0a24 c2cfdba0 c013924f c2cfdba0 000001d0 00000200 000001d0 00000006 00000020 000001d0 00000020 00000006 c0139493 00000006 00000001 c02eaaa8 000001d0 00000006 c02eaaa8 00000000 c013950e 00000020 c02eaaa8 Call Trace: [<c013924f>] [<c0139493>] [<c013950e>] [<c013961c>] [<c01396a8>] [<c01397d8>] [<c0139740>] [<c0105000>] [<c010592e>] [<c0139740>] Code: 89 02 c7 43 24 00 00 00 00 f0 ff 0d 9c a5 37 c0 5a 5b 5e c3 >>EIP; c0130084 <__remove_inode_page+44/60> <===== >>ebx; c2cfdba0 <_end+2952980/3852ee40> >>esi; d5fb0a24 <_end+15c05804/3852ee40> >>ebp; c02eaaa8 <contig_page_data+168/340> >>esp; c345df30 <_end+30b2d10/3852ee40> Trace; c013924f <shrink_cache+2df/3b0> Trace; c0139493 <shrink_caches+63/a0> Trace; c013950e <try_to_free_pages_zone+3e/60> Trace; c013961c <kswapd_balance_pgdat+4c/b0> Trace; c01396a8 <kswapd_balance+28/40> Trace; c01397d8 <kswapd+98/c0> Trace; c0139740 <kswapd+0/c0> Trace; c0105000 <_stext+0/0> Trace; c010592e <arch_kernel_thread+2e/40> Trace; c0139740 <kswapd+0/c0> Code; c0130084 <__remove_inode_page+44/60> 00000000 <_EIP>: Code; c0130084 <__remove_inode_page+44/60> <===== 0: 89 02 mov %eax,(%edx) <===== Code; c0130086 <__remove_inode_page+46/60> 2: c7 43 24 00 00 00 00 movl $0x0,0x24(%ebx) Code; c013008d <__remove_inode_page+4d/60> 9: f0 ff 0d 9c a5 37 c0 lock decl 0xc037a59c Code; c0130094 <__remove_inode_page+54/60> 10: 5a pop %edx Code; c0130095 <__remove_inode_page+55/60> 11: 5b pop %ebx Code; c0130096 <__remove_inode_page+56/60> 12: 5e pop %esi Code; c0130097 <__remove_inode_page+57/60> 13: c3 ret 1 warning issued. 
Results may not be reliable. Hope this helps. Anything further I can do? Regards, Stephan ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.4.22-pre lockups (decoded oops for pre8) 2003-08-02 12:27 ` 2.4.22-pre lockups (decoded oops for pre8) Stephan von Krawczynski @ 2003-08-03 7:25 ` Willy Tarreau 2003-08-03 9:40 ` Stephan von Krawczynski 2003-08-05 16:40 ` Marcelo Tosatti 1 sibling, 1 reply; 21+ messages in thread From: Willy Tarreau @ 2003-08-03 7:25 UTC (permalink / raw) To: Stephan von Krawczynski; +Cc: Marcelo Tosatti, andrea, linux-kernel Hi Stephan, This is in remove_page_from_hash_queue() at filemap.c:114 : *pprev = next; pprev is taken from page->pprev_hash and is considered invalid here (4129b0fc). Assuming it has been corrupted earlier, it seems that the only files able to touch this either directly or indirectly are : - mm/filemap.c (add_page_to_hash_queue, add_to_page_cache*) - mm/shmem.c (add_to_page_cache_unique) - mm/swap_state.c (idem) - fs/ext3/inode.c and fs/buffer.c (find_or_create_page) So the problem may be narrowed down to a few files. Perhaps digging through the VM changes since before you had a problem will give you more clues... 
Cheers, Willy On Sat, Aug 02, 2003 at 02:27:34PM +0200, Stephan von Krawczynski wrote: > Unable to handle kernel paging request at virtual address 4129b0fc > c0130084 > *pde = 313f6067 > Oops: 0002 > CPU: 1 > EIP: 0010:[<c0130084>] Not tainted > Using defaults from ksymoops -t elf32-i386 -a i386 > EFLAGS: 00010246 > eax: 00000000 ebx: c2cfdba0 ecx: 00000000 edx: 4129b0fc > esi: d5fb0a24 edi: 0001ca22 ebp: c02eaaa8 esp: c345df30 > ds: 0018 es: 0018 ss: 0018 > Process kswapd (pid: 5, stackpage=c345d000) > Stack: c2cfdba0 d5fb0a24 c2cfdba0 c013924f c2cfdba0 000001d0 00000200 000001d0 > 00000006 00000020 000001d0 00000020 00000006 c0139493 00000006 00000001 > c02eaaa8 000001d0 00000006 c02eaaa8 00000000 c013950e 00000020 c02eaaa8 > Call Trace: [<c013924f>] [<c0139493>] [<c013950e>] [<c013961c>] [<c01396a8>] > [<c01397d8>] [<c0139740>] [<c0105000>] [<c010592e>] [<c0139740>] > Code: 89 02 c7 43 24 00 00 00 00 f0 ff 0d 9c a5 37 c0 5a 5b 5e c3 > > > >>EIP; c0130084 <__remove_inode_page+44/60> <===== > > >>ebx; c2cfdba0 <_end+2952980/3852ee40> > >>esi; d5fb0a24 <_end+15c05804/3852ee40> > >>ebp; c02eaaa8 <contig_page_data+168/340> > >>esp; c345df30 <_end+30b2d10/3852ee40> > > Trace; c013924f <shrink_cache+2df/3b0> > Trace; c0139493 <shrink_caches+63/a0> > Trace; c013950e <try_to_free_pages_zone+3e/60> > Trace; c013961c <kswapd_balance_pgdat+4c/b0> > Trace; c01396a8 <kswapd_balance+28/40> > Trace; c01397d8 <kswapd+98/c0> > Trace; c0139740 <kswapd+0/c0> > Trace; c0105000 <_stext+0/0> > Trace; c010592e <arch_kernel_thread+2e/40> > Trace; c0139740 <kswapd+0/c0> > > Code; c0130084 <__remove_inode_page+44/60> > 00000000 <_EIP>: > Code; c0130084 <__remove_inode_page+44/60> <===== > 0: 89 02 mov %eax,(%edx) <===== > Code; c0130086 <__remove_inode_page+46/60> > 2: c7 43 24 00 00 00 00 movl $0x0,0x24(%ebx) > Code; c013008d <__remove_inode_page+4d/60> > 9: f0 ff 0d 9c a5 37 c0 lock decl 0xc037a59c > Code; c0130094 <__remove_inode_page+54/60> > 10: 5a pop %edx > Code; c0130095 
<__remove_inode_page+55/60> > 11: 5b pop %ebx > Code; c0130096 <__remove_inode_page+56/60> > 12: 5e pop %esi > Code; c0130097 <__remove_inode_page+57/60> > 13: c3 ret > > > 1 warning issued. Results may not be reliable. > > > Hope this helps. > Anything further I can do? > > Regards, > Stephan ^ permalink raw reply [flat|nested] 21+ messages in thread
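The unlink Willy refers to (`*pprev = next`) can be illustrated in userspace. What follows is a minimal sketch, not the kernel source: `struct node`, `hash_add` and `hash_del` are hypothetical names standing in for `struct page` and the hash-queue helpers, and only the `next_hash` / `pprev_hash` fields mentioned in the thread are modelled. It shows why a corrupted `pprev_hash` (such as 4129b0fc in the oops) turns the unlink into a store to an arbitrary address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct page; only the two
 * hash-chain fields discussed in the thread are modelled. */
struct node {
    struct node *next_hash;    /* next entry in the bucket, or NULL */
    struct node **pprev_hash;  /* address of the pointer that points at us */
};

/* Insert an entry at the head of a bucket. */
static void hash_add(struct node **bucket, struct node *n)
{
    struct node *first = *bucket;

    n->next_hash = first;
    if (first)
        first->pprev_hash = &n->next_hash;
    n->pprev_hash = bucket;
    *bucket = n;
}

/* Unlink an entry without walking the chain. */
static void hash_del(struct node *n)
{
    struct node *next = n->next_hash;
    struct node **pprev = n->pprev_hash;

    assert(pprev != NULL);     /* userspace-only sanity check */
    if (next)
        next->pprev_hash = pprev;
    *pprev = next;             /* this store goes wherever pprev points;
                                * a garbage pprev means a wild write */
    n->pprev_hash = NULL;
}
```

Because `hash_del` trusts `pprev_hash` blindly, whoever overwrote that field earlier is not the code that ends up faulting, which fits Willy's suggestion to look at every path that can touch it.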
* Re: 2.4.22-pre lockups (decoded oops for pre8) 2003-08-03 7:25 ` Willy Tarreau @ 2003-08-03 9:40 ` Stephan von Krawczynski 0 siblings, 0 replies; 21+ messages in thread From: Stephan von Krawczynski @ 2003-08-03 9:40 UTC (permalink / raw) To: Willy Tarreau; +Cc: marcelo, andrea, linux-kernel On Sun, 3 Aug 2003 09:25:25 +0200 Willy Tarreau <willy@w.ods.org> wrote: > Hi Stephan, > > This is in remove_page_from_hash_queue() at filemap.c:114 : > *pprev = next; > > pprev is taken from page->pprev_hash and is considered invalid here > (4129b0fc). Assuming it has been corrupted earlier, it seems that the only > files able to touch this either directly or indirectly are : > - mm/filemap.c (add_page_to_hash_queue, add_to_page_cache*) > - mm/shmem.c (add_to_page_cache_unique) > - mm/swap_state.c (idem) > - fs/ext3/inode.c and fs/buffer.c (find_or_create_page) Ext3 is unlikely to be related, the box never saw ext3. Ext2 is only used on /boot (so very unlikely, too), everything else is reiserfs. > > So the problem may be narrowed down to a few files. Perhaps digging through > the VM changes since before you had a problem will give you more clues... > > Cheers, > Willy Thanks for commenting, the problem really is annoying because I _know_ the box will freeze, only it takes time, this time 4 days... Regards, Stephan ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.4.22-pre lockups (decoded oops for pre8) 2003-08-02 12:27 ` 2.4.22-pre lockups (decoded oops for pre8) Stephan von Krawczynski 2003-08-03 7:25 ` Willy Tarreau @ 2003-08-05 16:40 ` Marcelo Tosatti 2003-08-06 2:37 ` Stephan von Krawczynski 2003-08-06 7:41 ` 2.4.22-pre lockups (now decoded oops for pre10) Stephan von Krawczynski 1 sibling, 2 replies; 21+ messages in thread From: Marcelo Tosatti @ 2003-08-05 16:40 UTC (permalink / raw) To: Stephan von Krawczynski; +Cc: andrea, linux-kernel Stephan, Is this _STOCK_ 2.4.22-pre10 (no vmware, no other modules) ? On Sat, 2 Aug 2003, Stephan von Krawczynski wrote: > Hello Marcelo, hello andrea, > > after some days of running 2.4.22-pre8 I finally got the crash (freeze as > usual). This time the debuggin setup worked and I got: > > Unable to handle kernel paging request at virtual address 4129b0fc > c0130084 > *pde = 313f6067 > Oops: 0002 > CPU: 1 > EIP: 0010:[<c0130084>] Not tainted > Using defaults from ksymoops -t elf32-i386 -a i386 > EFLAGS: 00010246 > eax: 00000000 ebx: c2cfdba0 ecx: 00000000 edx: 4129b0fc > esi: d5fb0a24 edi: 0001ca22 ebp: c02eaaa8 esp: c345df30 > ds: 0018 es: 0018 ss: 0018 > Process kswapd (pid: 5, stackpage=c345d000) > Stack: c2cfdba0 d5fb0a24 c2cfdba0 c013924f c2cfdba0 000001d0 00000200 000001d0 > 00000006 00000020 000001d0 00000020 00000006 c0139493 00000006 00000001 > c02eaaa8 000001d0 00000006 c02eaaa8 00000000 c013950e 00000020 c02eaaa8 > Call Trace: [<c013924f>] [<c0139493>] [<c013950e>] [<c013961c>] [<c01396a8>] > [<c01397d8>] [<c0139740>] [<c0105000>] [<c010592e>] [<c0139740>] > Code: 89 02 c7 43 24 00 00 00 00 f0 ff 0d 9c a5 37 c0 5a 5b 5e c3 > > > >>EIP; c0130084 <__remove_inode_page+44/60> <===== > > >>ebx; c2cfdba0 <_end+2952980/3852ee40> > >>esi; d5fb0a24 <_end+15c05804/3852ee40> > >>ebp; c02eaaa8 <contig_page_data+168/340> > >>esp; c345df30 <_end+30b2d10/3852ee40> > > Trace; c013924f <shrink_cache+2df/3b0> > Trace; c0139493 <shrink_caches+63/a0> > Trace; c013950e 
<try_to_free_pages_zone+3e/60> > Trace; c013961c <kswapd_balance_pgdat+4c/b0> > Trace; c01396a8 <kswapd_balance+28/40> > Trace; c01397d8 <kswapd+98/c0> > Trace; c0139740 <kswapd+0/c0> > Trace; c0105000 <_stext+0/0> > Trace; c010592e <arch_kernel_thread+2e/40> > Trace; c0139740 <kswapd+0/c0> > > Code; c0130084 <__remove_inode_page+44/60> > 00000000 <_EIP>: > Code; c0130084 <__remove_inode_page+44/60> <===== > 0: 89 02 mov %eax,(%edx) <===== > Code; c0130086 <__remove_inode_page+46/60> > 2: c7 43 24 00 00 00 00 movl $0x0,0x24(%ebx) > Code; c013008d <__remove_inode_page+4d/60> > 9: f0 ff 0d 9c a5 37 c0 lock decl 0xc037a59c > Code; c0130094 <__remove_inode_page+54/60> > 10: 5a pop %edx > Code; c0130095 <__remove_inode_page+55/60> > 11: 5b pop %ebx > Code; c0130096 <__remove_inode_page+56/60> > 12: 5e pop %esi > Code; c0130097 <__remove_inode_page+57/60> > 13: c3 ret ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.4.22-pre lockups (decoded oops for pre8) 2003-08-05 16:40 ` Marcelo Tosatti @ 2003-08-06 2:37 ` Stephan von Krawczynski 2003-08-06 7:41 ` 2.4.22-pre lockups (now decoded oops for pre10) Stephan von Krawczynski 1 sibling, 0 replies; 21+ messages in thread From: Stephan von Krawczynski @ 2003-08-06 2:37 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: andrea, linux-kernel On Tue, 5 Aug 2003 13:40:48 -0300 (BRT) Marcelo Tosatti <marcelo@conectiva.com.br> wrote: > > Stephan, > > Is this _STOCK_ 2.4.22-pre10 (no vmware, no other modules) ? This was from a pre8. There were no strange modules and no vmware involved. Everything clean, kernel 2.4.22-pre8 on top of SuSE 8.2 distro. Output was created via serial console. Regards, Stephan > > On Sat, 2 Aug 2003, Stephan von Krawczynski wrote: > > > Hello Marcelo, hello andrea, > > > > after some days of running 2.4.22-pre8 I finally got the crash (freeze as > > usual). This time the debuggin setup worked and I got: > > > > Unable to handle kernel paging request at virtual address 4129b0fc > > c0130084 > > *pde = 313f6067 > > Oops: 0002 > > CPU: 1 > > EIP: 0010:[<c0130084>] Not tainted > > Using defaults from ksymoops -t elf32-i386 -a i386 > > EFLAGS: 00010246 > > eax: 00000000 ebx: c2cfdba0 ecx: 00000000 edx: 4129b0fc > > esi: d5fb0a24 edi: 0001ca22 ebp: c02eaaa8 esp: c345df30 > > ds: 0018 es: 0018 ss: 0018 > > Process kswapd (pid: 5, stackpage=c345d000) > > Stack: c2cfdba0 d5fb0a24 c2cfdba0 c013924f c2cfdba0 000001d0 00000200 > > 000001d0 > > 00000006 00000020 000001d0 00000020 00000006 c0139493 00000006 > > 00000001 c02eaaa8 000001d0 00000006 c02eaaa8 00000000 c013950e > > 00000020 c02eaaa8 > > Call Trace: [<c013924f>] [<c0139493>] [<c013950e>] [<c013961c>] > > [<c01396a8>] > > [<c01397d8>] [<c0139740>] [<c0105000>] [<c010592e>] [<c0139740>] > > Code: 89 02 c7 43 24 00 00 00 00 f0 ff 0d 9c a5 37 c0 5a 5b 5e c3 > > > > > > >>EIP; c0130084 <__remove_inode_page+44/60> <===== > > > > >>ebx; c2cfdba0 
<_end+2952980/3852ee40> > > >>esi; d5fb0a24 <_end+15c05804/3852ee40> > > >>ebp; c02eaaa8 <contig_page_data+168/340> > > >>esp; c345df30 <_end+30b2d10/3852ee40> > > > > Trace; c013924f <shrink_cache+2df/3b0> > > Trace; c0139493 <shrink_caches+63/a0> > > Trace; c013950e <try_to_free_pages_zone+3e/60> > > Trace; c013961c <kswapd_balance_pgdat+4c/b0> > > Trace; c01396a8 <kswapd_balance+28/40> > > Trace; c01397d8 <kswapd+98/c0> > > Trace; c0139740 <kswapd+0/c0> > > Trace; c0105000 <_stext+0/0> > > Trace; c010592e <arch_kernel_thread+2e/40> > > Trace; c0139740 <kswapd+0/c0> > > > > Code; c0130084 <__remove_inode_page+44/60> > > 00000000 <_EIP>: > > Code; c0130084 <__remove_inode_page+44/60> <===== > > 0: 89 02 mov %eax,(%edx) <===== > > Code; c0130086 <__remove_inode_page+46/60> > > 2: c7 43 24 00 00 00 00 movl $0x0,0x24(%ebx) > > Code; c013008d <__remove_inode_page+4d/60> > > 9: f0 ff 0d 9c a5 37 c0 lock decl 0xc037a59c > > Code; c0130094 <__remove_inode_page+54/60> > > 10: 5a pop %edx > > Code; c0130095 <__remove_inode_page+55/60> > > 11: 5b pop %ebx > > Code; c0130096 <__remove_inode_page+56/60> > > 12: 5e pop %esi > > Code; c0130097 <__remove_inode_page+57/60> > > 13: c3 ret > ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.4.22-pre lockups (now decoded oops for pre10) 2003-08-05 16:40 ` Marcelo Tosatti 2003-08-06 2:37 ` Stephan von Krawczynski @ 2003-08-06 7:41 ` Stephan von Krawczynski 2003-08-06 8:58 ` Oleg Drokin ` (2 more replies) 1 sibling, 3 replies; 21+ messages in thread From: Stephan von Krawczynski @ 2003-08-06 7:41 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: andrea, linux-kernel, green On Tue, 5 Aug 2003 13:40:48 -0300 (BRT) Marcelo Tosatti <marcelo@conectiva.com.br> wrote: > > Stephan, > > Is this _STOCK_ 2.4.22-pre10 (no vmware, no other modules) ? Hello Marcelo, today I have a fresh -pre10 oops for you. Everything seems to start with the following (there is no I/O error or the like; is it possible that the fs got damaged during earlier crashes?): sd(8,17):vs-4080: reiserfs_free_block: free_block (0811:14478481)[dev:blocknr]: bit already cleared sd(8,17):vs-4080: reiserfs_free_block: free_block (0811:14478445)[dev:blocknr]: bit already cleared sd(8,17):vs-4080: reiserfs_free_block: free_block (0811:14478441)[dev:blocknr]: bit already cleared sd(8,17):vs-4080: reiserfs_free_block: free_block (0811:14478348)[dev:blocknr]: bit already cleared And then: ksymoops 2.4.8 on i686 2.4.22-pre10. Options used -V (default) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.4.22-pre10/ (default) -m /boot/System.map-2.4.22-pre10 (default) Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options.
Unable to handle kernel NULL pointer dereference at virtual address 00000006 c0144b14 *pde = 00000000 Oops: 0002 CPU: 1 EIP: 0010:[<c0144b14>] Not tainted Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010246 eax: 00000000 ebx: f0f66540 ecx: f0f66540 edx: 00000006 esi: f0f66540 edi: f0f66540 ebp: c2ce0350 esp: c345df24 ds: 0018 es: 0018 ss: 0018 Process kswapd (pid: 5, stackpage=c345d000) Stack: c0147ddf f0f66540 00000000 c2ce0350 0001bcad c02eab68 c0139228 c2ce0350 000001d0 00000200 000001d0 00000016 00000020 000001d0 00000020 00000006 c01394b3 00000006 c345c000 c02eab68 000001d0 00000006 c02eab68 00000000 Call Trace: [<c0147ddf>] [<c0139228>] [<c01394b3>] [<c013952e>] [<c013963c>] [<c01396c8>] [<c01397f8>] [<c0139760>] [<c0105000>] [<c010592e>] [<c0139760>] Code: 89 02 c7 41 30 00 00 00 00 89 4c 24 04 e9 7a ff ff ff 8d 76 >>EIP; c0144b14 <__remove_from_queues+14/30> <===== >>ebx; f0f66540 <_end+30bbb320/3852ee40> >>ecx; f0f66540 <_end+30bbb320/3852ee40> >>esi; f0f66540 <_end+30bbb320/3852ee40> >>edi; f0f66540 <_end+30bbb320/3852ee40> >>ebp; c2ce0350 <_end+2935130/3852ee40> >>esp; c345df24 <_end+30b2d04/3852ee40> Trace; c0147ddf <try_to_free_buffers+7f/170> Trace; c0139228 <shrink_cache+298/3b0> Trace; c01394b3 <shrink_caches+63/a0> Trace; c013952e <try_to_free_pages_zone+3e/60> Trace; c013963c <kswapd_balance_pgdat+4c/b0> Trace; c01396c8 <kswapd_balance+28/40> Trace; c01397f8 <kswapd+98/c0> Trace; c0139760 <kswapd+0/c0> Trace; c0105000 <_stext+0/0> Trace; c010592e <arch_kernel_thread+2e/40> Trace; c0139760 <kswapd+0/c0> Code; c0144b14 <__remove_from_queues+14/30> 00000000 <_EIP>: Code; c0144b14 <__remove_from_queues+14/30> <===== 0: 89 02 mov %eax,(%edx) <===== Code; c0144b16 <__remove_from_queues+16/30> 2: c7 41 30 00 00 00 00 movl $0x0,0x30(%ecx) Code; c0144b1d <__remove_from_queues+1d/30> 9: 89 4c 24 04 mov %ecx,0x4(%esp,1) Code; c0144b21 <__remove_from_queues+21/30> d: e9 7a ff ff ff jmp ffffff8c <_EIP+0xffffff8c> Code; c0144b26 
<__remove_from_queues+26/30> 12: 8d 76 00 lea 0x0(%esi),%esi 1 warning issued. Results may not be reliable. Regards, Stephan ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.4.22-pre lockups (now decoded oops for pre10) 2003-08-06 7:41 ` 2.4.22-pre lockups (now decoded oops for pre10) Stephan von Krawczynski @ 2003-08-06 8:58 ` Oleg Drokin 2003-08-06 9:09 ` Willy Tarreau 2003-08-06 18:15 ` Marcelo Tosatti 2 siblings, 0 replies; 21+ messages in thread From: Oleg Drokin @ 2003-08-06 8:58 UTC (permalink / raw) To: Stephan von Krawczynski; +Cc: Marcelo Tosatti, andrea, linux-kernel Hello! On Wed, Aug 06, 2003 at 09:41:50AM +0200, Stephan von Krawczynski wrote: > > Is this _STOCK_ 2.4.22-pre10 (no vmware, no other modules) ? > Hello Marcelo, > today I have a fresh -pre10 oops for you. > Everything seems to start with (there is no i/o error or the like, is it > possible that the fs got damaged during former crashes?): Well, you'd better run reiserfsck after crashes with binary modules just to make sure everything is ok. > sd(8,17):vs-4080: reiserfs_free_block: free_block (0811:14478481)[dev:blocknr]: > bit already cleared > sd(8,17):vs-4080: reiserfs_free_block: free_block (0811:14478445)[dev:blocknr]: > bit already cleared > sd(8,17):vs-4080: reiserfs_free_block: free_block (0811:14478441)[dev:blocknr]: > bit already cleared > sd(8,17):vs-4080: reiserfs_free_block: free_block (0811:14478348)[dev:blocknr]: > bit already cleared Bye, Oleg ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.4.22-pre lockups (now decoded oops for pre10) 2003-08-06 7:41 ` 2.4.22-pre lockups (now decoded oops for pre10) Stephan von Krawczynski 2003-08-06 8:58 ` Oleg Drokin @ 2003-08-06 9:09 ` Willy Tarreau 2003-08-06 9:36 ` Stephan von Krawczynski 2003-08-18 14:23 ` Andrea Arcangeli 2003-08-06 18:15 ` Marcelo Tosatti 2 siblings, 2 replies; 21+ messages in thread From: Willy Tarreau @ 2003-08-06 9:09 UTC (permalink / raw) To: Stephan von Krawczynski; +Cc: Marcelo Tosatti, andrea, linux-kernel, green On Wed, Aug 06, 2003 at 09:41:50AM +0200, Stephan von Krawczynski wrote: > Code; c0144b14 <__remove_from_queues+14/30> > 00000000 <_EIP>: > Code; c0144b14 <__remove_from_queues+14/30> <===== > 0: 89 02 mov %eax,(%edx) <===== > Code; c0144b16 <__remove_from_queues+16/30> > 2: c7 41 30 00 00 00 00 movl $0x0,0x30(%ecx) > Code; c0144b1d <__remove_from_queues+1d/30> > 9: 89 4c 24 04 mov %ecx,0x4(%esp,1) > Code; c0144b21 <__remove_from_queues+21/30> > d: e9 7a ff ff ff jmp ffffff8c <_EIP+0xffffff8c> > Code; c0144b26 <__remove_from_queues+26/30> > 12: 8d 76 00 lea 0x0(%esi),%esi Once again, it's *pprev=next which is causing trouble, with pprev=6 this time (fs/buffer.c:523). There really seems to be something playing badly with this... I find it amazing that such widely used portions of code only trigger panics on your system! Either it's a rare combination of several components/drivers, or a strange hardware problem, although I can't imagine which (CPU? bus locking?). Cheers, Willy ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.4.22-pre lockups (now decoded oops for pre10) 2003-08-06 9:09 ` Willy Tarreau @ 2003-08-06 9:36 ` Stephan von Krawczynski 2003-08-06 12:45 ` Willy Tarreau 2003-08-18 14:23 ` Andrea Arcangeli 1 sibling, 1 reply; 21+ messages in thread From: Stephan von Krawczynski @ 2003-08-06 9:36 UTC (permalink / raw) To: Willy Tarreau; +Cc: marcelo, andrea, linux-kernel, green On Wed, 6 Aug 2003 11:09:20 +0200 Willy Tarreau <willy@w.ods.org> wrote: > On Wed, Aug 06, 2003 at 09:41:50AM +0200, Stephan von Krawczynski wrote: > > > Code; c0144b14 <__remove_from_queues+14/30> > > 00000000 <_EIP>: > > Code; c0144b14 <__remove_from_queues+14/30> <===== > > 0: 89 02 mov %eax,(%edx) <===== > > Code; c0144b16 <__remove_from_queues+16/30> > > 2: c7 41 30 00 00 00 00 movl $0x0,0x30(%ecx) > > Code; c0144b1d <__remove_from_queues+1d/30> > > 9: 89 4c 24 04 mov %ecx,0x4(%esp,1) > > Code; c0144b21 <__remove_from_queues+21/30> > > d: e9 7a ff ff ff jmp ffffff8c <_EIP+0xffffff8c> > > Code; c0144b26 <__remove_from_queues+26/30> > > 12: 8d 76 00 lea 0x0(%esi),%esi > > Once again, it's *pprev=next which is causing trouble, with pprev=6 this > time (fs/buffer.c:523). There really seems to be something playing badly with > this... > > I find it amazing that such widely used portions of code only trigger panics on > your system! Either it's a rare combination of several components/drivers, > or a strange hardware problem, although I can't imagine which (CPU? bus > locking?). Hm, the hardware may not be that widespread. I guess not many people are really using SMP, 64 bit PCI network, 3 GB RAM, 3ware RAID5 and a ServerWorks board all together in one box. I can't fight the impression it has something to do with locking issues. It doesn't look exactly like a hardware problem; you would not expect crashes in the same type of code then. The question is: what additional information is needed to find the underlying problem? Regards, Stephan ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.4.22-pre lockups (now decoded oops for pre10) 2003-08-06 9:36 ` Stephan von Krawczynski @ 2003-08-06 12:45 ` Willy Tarreau 0 siblings, 0 replies; 21+ messages in thread From: Willy Tarreau @ 2003-08-06 12:45 UTC (permalink / raw) To: Stephan von Krawczynski; +Cc: marcelo, andrea, linux-kernel, green, alan > Hm, the hardware may not be that widespread. I guess not many people are really > using SMP, 64 bit PCI network, 3 GB RAM, 3ware RAID5 and a ServerWorks board > all together in one box. I can't fight the impression it has something to do with > locking issues. It doesn't look exactly like a hardware problem; you would not > expect crashes in the same type of code then. Well, it depends... I once had an overclocked CPU which died only in one case: it was a car simulator, and it always crashed on exactly the same race, at the same position in the round! I even knew that if I could pass that position, it was OK for another round! So I later used that game as a reliability test when I was not sure about the origin of a crash :-) It seems as if a particular sequence of data and/or code could reliably trigger it, although parallel makes never hurt it. > The question is: what additional information is needed to find the underlying > problem? Perhaps cache poisoning could help. Alan has already used this technique extensively in the past, and might still have a patch which could apply to your kernel without too many changes. Alan? On the other hand, you could also do it by hand, but it's a little hard. You have to pick every place where there's a free and write particular data before the free; if possible, data which can identify who has freed the page. Then after the next crash, you can identify who used the page last. It can sometimes lead you to some driver missing a lock. But that's not certain. Cheers, Willy ^ permalink raw reply [flat|nested] 21+ messages in thread
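The by-hand poisoning Willy describes could look roughly like the userspace sketch below. Everything in it is an assumption made for illustration: the `POISON_BASE` value, the 16-bit "freer tag" and the helper names are hypothetical, and this is not Alan's patch nor what CONFIG_DEBUG_SLAB does. The idea is only that each free site stamps the memory with a pattern naming itself, so that a stray value seen later in an oops (a register or a stack word) can be matched back to the last freer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical pattern: the high 16 bits mark "poisoned", the low
 * 16 bits carry a tag chosen per free site. */
#define POISON_BASE 0x6b6b0000u

/* Fill a buffer with a 32-bit pattern that encodes who freed it.
 * Trailing bytes that do not fill a whole word are left untouched. */
static void poison_fill(void *buf, size_t len, uint16_t freer_tag)
{
    uint32_t pattern = POISON_BASE | freer_tag;
    unsigned char *p = buf;
    size_t i;

    for (i = 0; i + sizeof(pattern) <= len; i += sizeof(pattern))
        memcpy(p + i, &pattern, sizeof(pattern));
}

/* Stamp the buffer, then release it. In userspace an optimizing
 * compiler may drop the stores right before free(); a real debugging
 * build would need to prevent that. */
static void poisoned_free(void *buf, size_t len, uint16_t freer_tag)
{
    poison_fill(buf, len, freer_tag);
    free(buf);
}

/* Given a suspicious word from an oops, return the tag of the free
 * site that stamped it, or -1 if it is not a poison value. */
static int poison_decode(uint32_t word)
{
    if ((word & 0xffff0000u) != POISON_BASE)
        return -1;
    return (int)(word & 0xffffu);
}
```

With such a scheme, the bad pointers from this thread (4129b0fc, 00000006) would decode to -1, meaning "not a poison value", which would itself be a hint that the corruption did not come from a use-after-free of stamped memory.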
* Re: 2.4.22-pre lockups (now decoded oops for pre10) 2003-08-06 9:09 ` Willy Tarreau 2003-08-06 9:36 ` Stephan von Krawczynski @ 2003-08-18 14:23 ` Andrea Arcangeli 1 sibling, 0 replies; 21+ messages in thread From: Andrea Arcangeli @ 2003-08-18 14:23 UTC (permalink / raw) To: Willy Tarreau Cc: Stephan von Krawczynski, Marcelo Tosatti, linux-kernel, green On Wed, Aug 06, 2003 at 11:09:20AM +0200, Willy Tarreau wrote: > On Wed, Aug 06, 2003 at 09:41:50AM +0200, Stephan von Krawczynski wrote: > > > Code; c0144b14 <__remove_from_queues+14/30> > > 00000000 <_EIP>: > > Code; c0144b14 <__remove_from_queues+14/30> <===== > > 0: 89 02 mov %eax,(%edx) <===== > > Code; c0144b16 <__remove_from_queues+16/30> > > 2: c7 41 30 00 00 00 00 movl $0x0,0x30(%ecx) > > Code; c0144b1d <__remove_from_queues+1d/30> > > 9: 89 4c 24 04 mov %ecx,0x4(%esp,1) > > Code; c0144b21 <__remove_from_queues+21/30> > > d: e9 7a ff ff ff jmp ffffff8c <_EIP+0xffffff8c> > > Code; c0144b26 <__remove_from_queues+26/30> > > 12: 8d 76 00 lea 0x0(%esi),%esi > > Once again, it's *pprev=next which is causing trouble, with pprev=6 this > time (fs/buffer.c:523). There really seems to be something playing badly with > this... > > I find it amazing that such widely used portions of code only trigger panics on > your system! Either it's a rare combination of several components/drivers, or > a strange hardware problem, although I can't imagine which (CPU? bus locking?). Normally it's bad RAM (or anyway a problem with the memory) when bugs trigger in that place reproducibly. The list walking thrashes the L2 cache and that puts more stress on the RAM. If it were random memory corruption (software), it would more likely crash in different places (though it's not guaranteed ;). Andrea ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.4.22-pre lockups (now decoded oops for pre10) 2003-08-06 7:41 ` 2.4.22-pre lockups (now decoded oops for pre10) Stephan von Krawczynski 2003-08-06 8:58 ` Oleg Drokin 2003-08-06 9:09 ` Willy Tarreau @ 2003-08-06 18:15 ` Marcelo Tosatti 2003-08-07 2:14 ` Stephan von Krawczynski 2 siblings, 1 reply; 21+ messages in thread From: Marcelo Tosatti @ 2003-08-06 18:15 UTC (permalink / raw) To: Stephan von Krawczynski; +Cc: andrea, linux-kernel, green On Wed, 6 Aug 2003, Stephan von Krawczynski wrote: > Unable to handle kernel NULL pointer dereference at virtual address 00000006 > c0144b14 > *pde = 00000000 > Oops: 0002 > CPU: 1 > EIP: 0010:[<c0144b14>] Not tainted > Using defaults from ksymoops -t elf32-i386 -a i386 > EFLAGS: 00010246 > eax: 00000000 ebx: f0f66540 ecx: f0f66540 edx: 00000006 > esi: f0f66540 edi: f0f66540 ebp: c2ce0350 esp: c345df24 > ds: 0018 es: 0018 ss: 0018 > Process kswapd (pid: 5, stackpage=c345d000) > Stack: c0147ddf f0f66540 00000000 c2ce0350 0001bcad c02eab68 c0139228 c2ce0350 > 000001d0 00000200 000001d0 00000016 00000020 000001d0 00000020 00000006 > c01394b3 00000006 c345c000 c02eab68 000001d0 00000006 c02eab68 00000000 > Call Trace: [<c0147ddf>] [<c0139228>] [<c01394b3>] [<c013952e>] [<c013963c>] > [<c01396c8>] [<c01397f8>] [<c0139760>] [<c0105000>] [<c010592e>] [<c0139760>] > Code: 89 02 c7 41 30 00 00 00 00 89 4c 24 04 e9 7a ff ff ff 8d 76 > > > >>EIP; c0144b14 <__remove_from_queues+14/30> <===== > > >>ebx; f0f66540 <_end+30bbb320/3852ee40> > >>ecx; f0f66540 <_end+30bbb320/3852ee40> > >>esi; f0f66540 <_end+30bbb320/3852ee40> > >>edi; f0f66540 <_end+30bbb320/3852ee40> > >>ebp; c2ce0350 <_end+2935130/3852ee40> > >>esp; c345df24 <_end+30b2d04/3852ee40> Stephan, I'm pretty worried about this problem. Your oopses seem to be the result of some kind of memory corruption. On the other oopses we could see the kernel oopsing on remove_page_from_hash_queue due to corrupted pointers (as Willy pointed out). 
Can you please try to crash your box again with CONFIG_DEBUG_SLAB=y? Again, thanks a lot for your reports. ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: 2.4.22-pre lockups (now decoded oops for pre10) 2003-08-06 18:15 ` Marcelo Tosatti @ 2003-08-07 2:14 ` Stephan von Krawczynski 2003-08-07 5:35 ` Oleg Drokin 2003-08-07 12:45 ` Marcelo Tosatti 0 siblings, 2 replies; 21+ messages in thread From: Stephan von Krawczynski @ 2003-08-07 2:14 UTC (permalink / raw) To: Marcelo Tosatti; +Cc: andrea, linux-kernel, green On Wed, 6 Aug 2003 15:15:39 -0300 (BRT) Marcelo Tosatti <marcelo@conectiva.com.br> wrote: > Stephan, > > I'm pretty worried about this problem. > > Your oopses seem to be the result of some kind of memory corruption. On > the other oopses we could see the kernel oopsing on > remove_page_from_hash_queue due to corrupted pointers (as Willy pointed > out). > > Can you please try to crash your box again with > > CONFIG_DEBUG_SLAB=y > > Again, thanks a lot for your reports. Ok, I have two things. First, another oops. I upgraded the system to rc1 yesterday and it did not survive a single day. Here's the decoded oops, the box was "clean" meaning no weird modules or the like: ksymoops 2.4.8 on i686 2.4.22-rc1. Options used -V (default) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.4.22-rc1/ (default) -m /boot/System.map-2.4.22-rc1 (default) Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options. 
Unable to handle kernel NULL pointer dereference at virtual address 00000004
c0145060
*pde = 00000000
Oops: 0002
CPU: 1
EIP: 0010:[<c0145060>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010283
eax: 00000000 ebx: c822feb4 ecx: c822fe60 edx: e07e7780
esi: 00000000 edi: e07e7780 ebp: f59bfe3c esp: f59bfe2c
ds: 0018 es: 0018 ss: 0018
Process nfsd (pid: 1737, stackpage=f59bf000)
Stack: f0cce7a0 00000001 f59bfe38 c822fe60 f0cce7f4 eec54ef4 00000000 e07e7760
       f59be000 f59bfea8 c0183ef5 e07e7780 e07e77cc c02ed880 e07e7760 f8c84fc8
       f59bfea8 dfe6c960 00000000 e07e7760 dfe6c960 00000000 f59c6e04 f59bfea8
Call Trace: [<c0183ef5>] [<f8c84fc8>] [<f8c856f1>] [<f8c8cee4>] [<f8c8e295>]
  [<f8c923f4>] [<f8c80699>] [<f8c65938>] [<f8c923f4>] [<f8c91a38>] [<f8c91a58>]
  [<f8c80411>] [<c010592e>] [<f8c80210>]
Code: 89 50 04 c7 41 54 00 00 00 00 c7 43 04 00 00 00 00 8b 44 24


>>EIP; c0145060 <fsync_buffers_list+50/1b0> <=====

>>ebx; c822feb4 <_end+7e84c94/3852ee40>
>>ecx; c822fe60 <_end+7e84c40/3852ee40>
>>edx; e07e7780 <_end+2043c560/3852ee40>
>>edi; e07e7780 <_end+2043c560/3852ee40>
>>ebp; f59bfe3c <_end+35614c1c/3852ee40>
>>esp; f59bfe2c <_end+35614c0c/3852ee40>

Trace; c0183ef5 <reiserfs_sync_file+65/d0>
Trace; f8c84fc8 <[nfsd]nfsd_sync+78/d0>
Trace; f8c856f1 <[nfsd]nfsd_commit+a1/b0>
Trace; f8c8cee4 <[nfsd]nfsd3_proc_commit+94/130>
Trace; f8c8e295 <[nfsd]nfs3svc_decode_commitargs+35/e0>
Trace; f8c923f4 <[nfsd]nfsd_procedures3+2f4/320>
Trace; f8c80699 <[nfsd]nfsd_dispatch+119/21d>
Trace; f8c65938 <[sunrpc]svc_process+4d8/570>
Trace; f8c923f4 <[nfsd]nfsd_procedures3+2f4/320>
Trace; f8c91a38 <[nfsd]nfsd_version3+0/10>
Trace; f8c91a58 <[nfsd]nfsd_program+0/28>
Trace; f8c80411 <[nfsd]nfsd+201/370>
Trace; c010592e <arch_kernel_thread+2e/40>
Trace; f8c80210 <[nfsd]nfsd+0/370>

Code; c0145060 <fsync_buffers_list+50/1b0>
00000000 <_EIP>:
Code; c0145060 <fsync_buffers_list+50/1b0> <=====
   0: 89 50 04 mov %edx,0x4(%eax) <=====
Code; c0145063 <fsync_buffers_list+53/1b0>
   3: c7 41 54 00 00 00 00 movl $0x0,0x54(%ecx)
Code; c014506a <fsync_buffers_list+5a/1b0>
   a: c7 43 04 00 00 00 00 movl $0x0,0x4(%ebx)
Code; c0145071 <fsync_buffers_list+61/1b0>
  11: 8b 44 24 00 mov 0x0(%esp,1),%eax


1 warning issued. Results may not be reliable.


As you can see reiserfs seems involved. Regarding reiserfs and my last postings
I can assure you that all reiserfs partitions were checked via reiserfsck right
before installation of rc1 - as Oleg advised - and found:
"Comparing bitmaps.. vpf-10640: The on-disk and the correct bitmaps differs"
I was told to use --fix-fixable option which I did and it indeed fixed the
problem. Trying reiserfsck after that found no errors any more. So I see no
chance that corrupt data on the media (through former crashes) is responsible
for this one. Hint: spelling in reiserfsck should be checked ;-)

Second, I re-install the box with CONFIG_DEBUG_SLAB="y" right now. Please tell
me if I should perform special steps (SYSRQ or the like) after the next crash
happens, or if the decoded oops will be sufficient.

Regards,
Stephan
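A note for readers of the dump above: on i386 the number after "Oops:" is the page-fault error code, and its low three bits say whether the page was present, whether the access was a write, and whether it came from user mode. The following is a minimal Python sketch of that decoding; the function name decode_oops_code is made up for illustration and is not part of ksymoops or the kernel.

```python
def decode_oops_code(code_hex: str) -> dict:
    """Decode the low three bits of an i386 page-fault error code.

    bit 0: 0 = page not present, 1 = protection fault
    bit 1: 0 = read access,      1 = write access
    bit 2: 0 = kernel mode,      1 = user mode
    """
    code = int(code_hex, 16)
    return {
        "cause": "protection fault" if code & 0x1 else "page not present",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "kernel",
    }
```

Read this way, "Oops: 0002" is a kernel-mode write to a page that is not mapped, which fits a store through a NULL-based pointer.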
* Re: 2.4.22-pre lockups (now decoded oops for pre10)
From: Oleg Drokin @ 2003-08-07 5:35 UTC
To: Stephan von Krawczynski; +Cc: Marcelo Tosatti, andrea, linux-kernel

Hello!

On Thu, Aug 07, 2003 at 04:14:40AM +0200, Stephan von Krawczynski wrote:

> Unable to handle kernel NULL pointer dereference at virtual address 00000004

Hm, a NULL pointer in the j_dirty_buffers list.
This cannot happen, basically.
This is a cyclically linked list of buffers, and we add stuff to it via the
standard functions, so the linkage happens by itself.

> Trace; c0183ef5 <reiserfs_sync_file+65/d0>
> Trace; f8c84fc8 <[nfsd]nfsd_sync+78/d0>
> Code; c0145060 <fsync_buffers_list+50/1b0>
> 00000000 <_EIP>:
> Code; c0145060 <fsync_buffers_list+50/1b0> <=====
> 0: 89 50 04 mov %edx,0x4(%eax) <=====
> As you can see reiserfs seems involved. Regarding reiserfs and my last postings
> I can assure you that all reiserfs partitions were checked via reiserfsck right
> before installation of rc1 - as Oleg advised - and found:
> "Comparing bitmaps.. vpf-10640: The on-disk and the correct bitmaps differs"

That might explain your prior "freeing already free block" messages.

> I was told to use --fix-fixable option which I did and it indeed fixed the
> problem. Trying reiserfsck after that found no errors any more. So I see no
> chance that corrupt data on the media (through former crashes) is responsible
> for this one. Hint: spelling in reiserfsck should be checked ;-)

Yes, but how the condition that triggered the oops came about is totally
unclear to me.

Bye,
    Oleg
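Oleg's argument can be illustrated with a toy model. The sketch below is not the kernel's code: it models a circular doubly linked list in Python, all names are invented for the example, and it assumes the i386 layout in which a list node is two 4-byte pointers, next at offset 0 and prev at offset 4. Unlinking a node stores to next->prev, so a NULL next would make that store hit address 0 + 4, which would be consistent with the reported address 00000004 and with "mov %edx,0x4(%eax)" while eax is 0.

```python
PREV_OFFSET = 4  # assumption: 32-bit pointers, node laid out as {next, prev}


class Node:
    """One element of a circular doubly linked list (toy model)."""

    def __init__(self, name):
        self.name = name
        self.next = self  # an empty list head points at itself
        self.prev = self


def list_add(new, head):
    """Insert `new` right after `head`, keeping the ring closed."""
    new.next = head.next
    new.prev = head
    head.next.prev = new
    head.next = new


def list_del(entry):
    """Unlink `entry`; both neighbours are written."""
    entry.next.prev = entry.prev  # analogous to the store that faulted
    entry.prev.next = entry.next


def ring_is_consistent(head):
    """True if node.next.prev is node for every node on the ring."""
    node = head
    while True:
        if node.next is None or node.next.prev is not node:
            return False
        node = node.next
        if node is head:
            return True


def fault_address(next_pointer):
    """Address written by `next->prev = prev` for a given `next` value."""
    return next_pointer + PREV_OFFSET
```

As long as nodes are only added and removed through list_add and list_del, ring_is_consistent stays true and no next pointer can become None, which is why a NULL here points at something outside the list code overwriting the node.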
* Re: 2.4.22-pre lockups (now decoded oops for pre10)
From: Marcelo Tosatti @ 2003-08-07 12:45 UTC
To: Stephan von Krawczynski; +Cc: andrea, linux-kernel, green

On Thu, 7 Aug 2003, Stephan von Krawczynski wrote:

> On Wed, 6 Aug 2003 15:15:39 -0300 (BRT)
> Marcelo Tosatti <marcelo@conectiva.com.br> wrote:
>
> > Stephan,
> >
> > I'm pretty worried about this problem.
> >
> > Your oopses seem to be the result of some kind of memory corruption. On
> > the other oopses we could see the kernel oopsing on
> > remove_page_from_hash_queue due to corrupted pointers (as Willy pointed
> > out).
> >
> > Can you please try to crash your box again with
> >
> > CONFIG_DEBUG_SLAB=y
> >
> > Again, thanks a lot for your reports.
>
> Ok, I have two things.
> First, another oops. I upgraded the system to rc1 yesterday and it did not
> survive a single day. Here's the decoded oops, the box was "clean" meaning no
> weird modules or the like:
>
>
> ksymoops 2.4.8 on i686 2.4.22-rc1. Options used
> -V (default)
> -k /proc/ksyms (default)
> -l /proc/modules (default)
> -o /lib/modules/2.4.22-rc1/ (default)
> -m /boot/System.map-2.4.22-rc1 (default)
>
> Warning: You did not tell me where to find symbol information. I will
> assume that the log matches the kernel and modules that are running
> right now and I'll use the default options above for symbol resolution.
> If the current kernel and/or modules do not match the log, you can get
> more accurate output by telling me the kernel version and where to find
> map, modules, ksyms etc. ksymoops -h explains the options.
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000004
> c0145060
> *pde = 00000000
> Oops: 0002
> CPU: 1
> EIP: 0010:[<c0145060>] Not tainted
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00010283
> eax: 00000000 ebx: c822feb4 ecx: c822fe60 edx: e07e7780
> esi: 00000000 edi: e07e7780 ebp: f59bfe3c esp: f59bfe2c
> ds: 0018 es: 0018 ss: 0018
> Process nfsd (pid: 1737, stackpage=f59bf000)
> Stack: f0cce7a0 00000001 f59bfe38 c822fe60 f0cce7f4 eec54ef4 00000000 e07e7760
> f59be000 f59bfea8 c0183ef5 e07e7780 e07e77cc c02ed880 e07e7760 f8c84fc8
> f59bfea8 dfe6c960 00000000 e07e7760 dfe6c960 00000000 f59c6e04 f59bfea8
> Call Trace: [<c0183ef5>] [<f8c84fc8>] [<f8c856f1>] [<f8c8cee4>] [<f8c8e295>]
> [<f8c923f4>] [<f8c80699>] [<f8c65938>] [<f8c923f4>] [<f8c91a38>] [<f8c91a58>]
> [<f8c80411>] [<c010592e>] [<f8c80210>]
> Code: 89 50 04 c7 41 54 00 00 00 00 c7 43 04 00 00 00 00 8b 44 24
>
>
> >>EIP; c0145060 <fsync_buffers_list+50/1b0> <=====
>
> >>ebx; c822feb4 <_end+7e84c94/3852ee40>
> >>ecx; c822fe60 <_end+7e84c40/3852ee40>
> >>edx; e07e7780 <_end+2043c560/3852ee40>
> >>edi; e07e7780 <_end+2043c560/3852ee40>
> >>ebp; f59bfe3c <_end+35614c1c/3852ee40>
> >>esp; f59bfe2c <_end+35614c0c/3852ee40>
>
> Trace; c0183ef5 <reiserfs_sync_file+65/d0>
> Trace; f8c84fc8 <[nfsd]nfsd_sync+78/d0>
> Trace; f8c856f1 <[nfsd]nfsd_commit+a1/b0>
> Trace; f8c8cee4 <[nfsd]nfsd3_proc_commit+94/130>
> Trace; f8c8e295 <[nfsd]nfs3svc_decode_commitargs+35/e0>
> Trace; f8c923f4 <[nfsd]nfsd_procedures3+2f4/320>
> Trace; f8c80699 <[nfsd]nfsd_dispatch+119/21d>
> Trace; f8c65938 <[sunrpc]svc_process+4d8/570>
> Trace; f8c923f4 <[nfsd]nfsd_procedures3+2f4/320>
> Trace; f8c91a38 <[nfsd]nfsd_version3+0/10>
> Trace; f8c91a58 <[nfsd]nfsd_program+0/28>
> Trace; f8c80411 <[nfsd]nfsd+201/370>
> Trace; c010592e <arch_kernel_thread+2e/40>
> Trace; f8c80210 <[nfsd]nfsd+0/370>
>
> Code; c0145060 <fsync_buffers_list+50/1b0>
> 00000000 <_EIP>:
> Code; c0145060 <fsync_buffers_list+50/1b0> <=====
> 0: 89 50 04 mov %edx,0x4(%eax) <=====
> Code; c0145063 <fsync_buffers_list+53/1b0>
> 3: c7 41 54 00 00 00 00 movl $0x0,0x54(%ecx)
> Code; c014506a <fsync_buffers_list+5a/1b0>
> a: c7 43 04 00 00 00 00 movl $0x0,0x4(%ebx)
> Code; c0145071 <fsync_buffers_list+61/1b0>
> 11: 8b 44 24 00 mov 0x0(%esp,1),%eax
>
>
> 1 warning issued. Results may not be reliable.
>
>
> As you can see reiserfs seems involved. Regarding reiserfs and my last postings
> I can assure you that all reiserfs partitions were checked via reiserfsck right
> before installation of rc1 - as Oleg advised - and found:
> "Comparing bitmaps.. vpf-10640: The on-disk and the correct bitmaps differs"
> I was told to use --fix-fixable option which I did and it indeed fixed the
> problem. Trying reiserfsck after that found no errors any more. So I see no
> chance that corrupt data on the media (through former crashes) is responsible
> for this one. Hint: spelling in reiserfsck should be checked ;-)

It might be a problem in reiserfs. You're getting oopses in different places
with different stack traces, which is weird.

I'll take a closer look at this oops now.

> Second, I re-install the box with CONFIG_DEBUG_SLAB="y" right now. Please tell
> me if I should perform special steps (SYSRQ or the like) after the next crash
> happens, or if the decoded oops will be sufficient.

The decoded oops should be sufficient.
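Since the point above is that the oopses land in different places with different stack traces, it can help to reduce each decoded oops to its list of symbols before comparing. The Python sketch below does that for ksymoops "Trace;" lines, where an annotation such as <fsync_buffers_list+50/1b0> gives the offset into the function and the function's size, both in hex. The regular expression, the function name parse_trace and the "kernel" label for built-in symbols are assumptions made for this example, not anything ksymoops provides.

```python
import re

# address, optional [module], symbol, offset into symbol, symbol size
TRACE_RE = re.compile(
    r"Trace;\s+([0-9a-f]{8})\s+"
    r"<(?:\[(\w+)\])?([\w.]+)\+([0-9a-f]+)/([0-9a-f]+)>"
)


def parse_trace(text):
    """Return one dict per 'Trace;' entry found in decoded oops text."""
    frames = []
    for addr, module, symbol, offset, size in TRACE_RE.findall(text):
        off = int(offset, 16)
        frames.append({
            "module": module or "kernel",  # label for built-in code (our choice)
            "symbol": symbol,
            "start": int(addr, 16) - off,  # address where the function begins
            "offset": off,
            "size": int(size, 16),
        })
    return frames


def symbols(text):
    """Just the symbol names, in call-trace order, for quick comparison."""
    return [frame["symbol"] for frame in parse_trace(text)]
```

Comparing symbols() of two reports makes it quick to see whether they share any frames beyond the thread entry points.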
* Re: 2.4.22-pre lockups (now decoded oops for pre10)
From: Stephan von Krawczynski @ 2003-08-07 13:32 UTC
To: Hans Reiser; +Cc: linux-kernel

On Thu, 07 Aug 2003 17:18:16 +0400
Hans Reiser <reiser@namesys.com> wrote:

> >On Thu, 7 Aug 2003, Stephan von Krawczynski wrote:
> >>for this one. Hint: spelling in reiserfsck should be checked ;-)
> >
> where?

Hello Hans,

I am no native english, but
"Comparing bitmaps.. vpf-10640: The on-disk and the correct bitmaps differs"
feels uncomfortable in my ears ;-)
I'd say "two things differ", without trailing "s". I am not even sure if
"bitmaps" shouldn't be singular "bitmap" instead. But, as stated, I am no
native, I can't be sure.

Regards,
Stephan
* Re: 2.4.22-pre lockups (now decoded oops for pre10)
From: Mike Fedyk @ 2003-08-18 20:29 UTC
To: Stephan von Krawczynski; +Cc: Hans Reiser, linux-kernel

On Thu, Aug 07, 2003 at 03:32:57PM +0200, Stephan von Krawczynski wrote:
> On Thu, 07 Aug 2003 17:18:16 +0400
> Hans Reiser <reiser@namesys.com> wrote:
>
> > >On Thu, 7 Aug 2003, Stephan von Krawczynski wrote:
> > >>for this one. Hint: spelling in reiserfsck should be checked ;-)
> > >
> > where?
>
> Hello Hans,
>
> I am no native english, but
> "Comparing bitmaps.. vpf-10640: The on-disk and the correct bitmaps differs"
> feels uncomfortable in my ears ;-)
> I'd say "two things differ", without trailing "s". I am not even sure if
> "bitmaps" shouldn't be singular "bitmap" instead.

"bitmaps" with your changes would be correct.

Though, just turn "bitmaps" into "bitmap" and it should be fine. I can't
really think of a phrase specific enough for the error message without
adding enough text to make it two lines, which wouldn't be good.

"Comparing bitmaps.. vpf-10640: The on-disk and the correct bitmap differs"
* Re: 2.4.22-pre lockups (now decoded oops for pre10)
From: Stephan von Krawczynski @ 2003-08-18 20:39 UTC
To: Mike Fedyk; +Cc: reiser, linux-kernel

On Mon, 18 Aug 2003 13:29:49 -0700
Mike Fedyk <mfedyk@matchmail.com> wrote:

> > I'd say "two things differ", without trailing "s". I am not even sure if
> > "bitmaps" shouldn't be singular "bitmap" instead.
>
> "bitmaps" with your changes would be correct.
>
> Though, just turn "bitmaps" into "bitmap" and it should be fine. I can't
> really think of a phrase specific enough for the error message without
> adding enough text to make it two lines, which wouldn't be good.
>
> "Comparing bitmaps.. vpf-10640: The on-disk and the correct bitmap differs"

Hm, but:

"a and b differ"
"a differs from b"

or not?

Alternatives:

"a and b are different"

But if you use "are" here, you cannot use "differs" above, right?

Regards,
Stephan
* Re: [grammar] 2.4.22-pre lockups (now decoded oops for pre10)
From: Matt Gibson @ 2003-08-18 21:05 UTC
To: linux-kernel

On Monday 18 Aug 2003 21:39, Stephan von Krawczynski wrote:
> Mike Fedyk <mfedyk@matchmail.com> wrote:
> > "Comparing bitmaps.. vpf-10640: The on-disk and the correct bitmap
> > differs"
>
> Hm, but:
>
> "a and b differ"
> "a differs from b"

Yes. Assuming that you're reporting the comparison of two single bitmaps:

"The on-disk bitmap and the correct bitmap differ."
"The on-disk and the correct bitmap differ."
"The on-disk bitmap differs from the correct bitmap."

I'd say the last of those three sounds best; the second sounds a little stilted
because you have to think for a moment to realise that "on-disk" is being used
as a contraction of "on-disk bitmap."

If the difference is between two sets of bitmaps:

"The on-disk bitmaps and the correct bitmaps differ."
"The on-disk and the correct bitmaps differ."
"The on-disk bitmaps differ from the correct bitmaps."

Matt

(and that's the last you'll hear from me on this one; there's enough traffic on
here as it is...)

-- 
"It's the small gaps between the rain that count,
 and learning how to live amongst them."
 -- Jeff Noon
* Re: 2.4.22-pre lockups (now decoded oops for pre10)
From: Mike Fedyk @ 2003-08-18 21:09 UTC
To: Stephan von Krawczynski; +Cc: reiser, linux-kernel

On Mon, Aug 18, 2003 at 10:39:46PM +0200, Stephan von Krawczynski wrote:
> On Mon, 18 Aug 2003 13:29:49 -0700
> Mike Fedyk <mfedyk@matchmail.com> wrote:
>
> > > I'd say "two things differ", without trailing "s". I am not even sure if
> > > "bitmaps" shouldn't be singular "bitmap" instead.
> >
> > "bitmaps" with your changes would be correct.
> >
> > Though, just turn "bitmaps" into "bitmap" and it should be fine. I can't
> > really think of a phrase specific enough for the error message without
> > adding enough text to make it two lines, which wouldn't be good.
> >
> > "Comparing bitmaps.. vpf-10640: The on-disk and the correct bitmap differs"
>
> Hm, but:
>
> "a and b differ"

1) "Comparing bitmaps.. vpf-10640: The on-disk and correct bitmap differ"

> "a differs from b"

2) "Comparing bitmaps.. vpf-10640: The on-disk differs from the correct bitmap"

>
> or not?
>
> Alternatives:
>
> "a and b are different"

3) "Comparing bitmaps.. vpf-10640: The on-disk and correct are different"

>
> But if you use "are" here, you cannot use "differs" above, right?
>

Yes.

I kinda like (1), or the original changed to "bitmap" instead of "bitmaps".

Mike
* Re: 2.4.22-pre lockups (now decoded oops for pre10)
From: Stephan von Krawczynski @ 2003-08-07 15:52 UTC
To: Marcelo Tosatti; +Cc: andrea, linux-kernel, green

On Thu, 7 Aug 2003 09:45:36 -0300 (BRT)
Marcelo Tosatti <marcelo@conectiva.com.br> wrote:

> The decoded oops should be sufficient.

Well, how about this one:

ksymoops 2.4.8 on i686 2.4.22-rc1. Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.22-rc1/ (default)
     -m /boot/System.map-2.4.22-rc1 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

Unable to handle kernel paging request at virtual address 63eabdb3
c0145f31
*pde = 00000000
Oops: 0000
CPU: 0
EIP: 0010:[<c0145f31>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010206
eax: 00000000 ebx: 00000000 ecx: 00000061 edx: 63eabd93
esi: 00000000 edi: 00001000 ebp: 00000000 esp: c34f7e60
ds: 0018 es: 0018 ss: 0018
Process kupdated (pid: 7, stackpage=c34f7000)
Stack: 00000000 f7afb1f0 c0146018 00000000 c01312e9 00000000 c1849dd0 00001000
       00001000 00000803 c014823a c1849dd0 00001000 00000000 f79b7fa4 00001e18
       c0148428 f79b7fa4 00001e18 00001000 e9640000 00000000 00000803 00001000
Call Trace: [<c0146018>] [<c01312e9>] [<c014823a>] [<c0148428>] [<c0145b36>]
  [<c0197328>] [<c019ceb9>] [<c019c4f5>] [<c0188e94>] [<c01498cb>] [<c014887c>]
  [<c0148be9>] [<c0105000>] [<c010592e>] [<c0148af0>]
Code: 8b 42 20 a3 30 c6 37 c0 8d 41 ff a3 34 c6 37 c0 c6 05 c0 bb


>>EIP; c0145f31 <get_unused_buffer_head+21/b0> <=====

>>esp; c34f7e60 <_end+314cc40/3852ee40>

Trace; c0146018 <create_buffers+28/100>
Trace; c01312e9 <find_or_create_page+109/110>
Trace; c014823a <grow_dev_page+7a/c0>
Trace; c0148428 <grow_buffers+98/110>
Trace; c0145b36 <getblk+46/80>
Trace; c0197328 <journal_getblk+28/30>
Trace; c019ceb9 <do_journal_end+139/bb0>
Trace; c019c4f5 <flush_old_commits+135/1d0>
Trace; c0188e94 <reiserfs_write_super+64/90>
Trace; c01498cb <sync_supers+14b/170>
Trace; c014887c <sync_old_buffers+3c/b0>
Trace; c0148be9 <kupdate+f9/130>
Trace; c0105000 <_stext+0/0>
Trace; c010592e <arch_kernel_thread+2e/40>
Trace; c0148af0 <kupdate+0/130>

Code; c0145f31 <get_unused_buffer_head+21/b0>
00000000 <_EIP>:
Code; c0145f31 <get_unused_buffer_head+21/b0> <=====
   0: 8b 42 20 mov 0x20(%edx),%eax <=====
Code; c0145f34 <get_unused_buffer_head+24/b0>
   3: a3 30 c6 37 c0 mov %eax,0xc037c630
Code; c0145f39 <get_unused_buffer_head+29/b0>
   8: 8d 41 ff lea 0xffffffff(%ecx),%eax
Code; c0145f3c <get_unused_buffer_head+2c/b0>
   b: a3 34 c6 37 c0 mov %eax,0xc037c634
Code; c0145f41 <get_unused_buffer_head+31/b0>
  10: c6 05 c0 bb 00 00 00 movb $0x0,0xbbc0


1 warning issued. Results may not be reliable.


After that I received this one:

ksymoops 2.4.8 on i686 2.4.22-rc1. Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.22-rc1/ (default)
     -m /boot/System.map-2.4.22-rc1 (default)

Warning: You did not tell me where to find symbol information. I will
assume that the log matches the kernel and modules that are running
right now and I'll use the default options above for symbol resolution.
If the current kernel and/or modules do not match the log, you can get
more accurate output by telling me the kernel version and where to find
map, modules, ksyms etc. ksymoops -h explains the options.

NMI Watchdog detected LOCKUP on CPU1, eip c011a747, registers:
CPU: 1
EIP: 0010:[<c011a747>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00000082
eax: cef0b8dc ebx: cef0b894 ecx: 00000001 edx: 00000003
esi: 00000008 edi: cef0b8dc ebp: ec8efe48 esp: ec8efe28
ds: 0018 es: 0018 ss: 0018
Process tar (pid: 13603, stackpage=ec8ef000)
Stack: 00000000 cef0b894 00000000 00000282 00000003 cef0b894 00000008 cef0b8dc
       00000000 c01c4f41 00000000 cef0b894 00000000 0001679d cef0b894 00001000
       c0146c87 00000000 cef0b894 cef0b894 00000004 cef0b894 ec8ee000 00000001
Call Trace: [<c01c4f41>] [<c0146c87>] [<c013ae92>] [<c0119630>] [<c0130d7e>]
  [<c017ff50>] [<c013146f>] [<c0131751>] [<c0131d50>] [<c0131ffc>] [<c0131d50>]
  [<c014328b>] [<c010782f>]
Code: 7e f9 e9 d9 ec ff ff 80 38 00 f3 90 7e f9 e9 5d ed ff ff 80


>>EIP; c011a747 <.text.lock.sched+3f/178> <=====

>>eax; cef0b8dc <_end+eb606bc/3852ee40>
>>ebx; cef0b894 <_end+eb60674/3852ee40>
>>edi; cef0b8dc <_end+eb606bc/3852ee40>
>>ebp; ec8efe48 <_end+2c544c28/3852ee40>
>>esp; ec8efe28 <_end+2c544c08/3852ee40>

Trace; c01c4f41 <submit_bh+a1/c0>
Trace; c0146c87 <block_read_full_page+2d7/2f0>
Trace; c013ae92 <__alloc_pages+42/190>
Trace; c0119630 <wait_for_completion+70/b0>
Trace; c0130d7e <page_cache_read+be/e0>
Trace; c017ff50 <reiserfs_get_block+0/1490>
Trace; c013146f <generic_file_readahead+af/1a0>
Trace; c0131751 <do_generic_file_read+1c1/470>
Trace; c0131d50 <file_read_actor+0/110>
Trace; c0131ffc <generic_file_read+19c/1b0>
Trace; c0131d50 <file_read_actor+0/110>
Trace; c014328b <sys_read+9b/180>
Trace; c010782f <system_call+33/38>

Code; c011a747 <.text.lock.sched+3f/178>
00000000 <_EIP>:
Code; c011a747 <.text.lock.sched+3f/178> <=====
   0: 7e f9 jle fffffffb <_EIP+0xfffffffb> <=====
Code; c011a749 <.text.lock.sched+41/178>
   2: e9 d9 ec ff ff jmp ffffece0 <_EIP+0xffffece0>
Code; c011a74e <.text.lock.sched+46/178>
   7: 80 38 00 cmpb $0x0,(%eax)
Code; c011a751 <.text.lock.sched+49/178>
   a: f3 90 repz nop
Code; c011a753 <.text.lock.sched+4b/178>
   c: 7e f9 jle 7 <_EIP+0x7>
Code; c011a755 <.text.lock.sched+4d/178>
   e: e9 5d ed ff ff jmp ffffed70 <_EIP+0xffffed70>
Code; c011a75a <.text.lock.sched+52/178>
  13: 80 00 00 addb $0x0,(%eax)


1 warning issued. Results may not be reliable.


There were no I/O errors or any other spectacular things happening. It just
died while I was sitting right next to it during the verify run of tar.

Regards,
Stephan
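A quick cross-check on the kupdated oops above: the faulting instruction is "mov 0x20(%edx),%eax" and the register dump shows edx = 63eabd93, so base plus displacement should reproduce the reported fault address. The Python sketch below does that arithmetic. The helper names are invented for the example, and the kernel-pointer test assumes the default i386 3G/1G split, where kernel virtual addresses start at 0xc0000000.

```python
def effective_address(base_hex, displacement):
    """Base register value plus displacement, wrapped to 32 bits."""
    return (int(base_hex, 16) + displacement) & 0xFFFFFFFF


def looks_like_kernel_pointer(addr, page_offset=0xC0000000):
    """Rough plausibility check; page_offset is an assumed default."""
    return addr >= page_offset


if __name__ == "__main__":
    addr = effective_address("63eabd93", 0x20)
    print(f"{addr:08x}", looks_like_kernel_pointer(addr))
```

The sum comes out as 63eabdb3, the address in the report, and it lies well below the kernel range, so under that assumption edx did not hold a valid kernel pointer when the load was attempted.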