* Re: kswapd OOPS under 2.4.19-pre8 (ext3, Reiserfs + (soft)raid0)
From: Todd R. Eigenschink @ 2002-05-16 12:28 UTC
To: Mike Galbraith; +Cc: linux-kernel

Mike Galbraith writes:
>Methinks there's an easier way to get to the line in question. Compile
>sched.c with -g via make kernel/sched.o EXTRA_CFLAGS=-g. gdb can then
>be used to get you the line with list *__wake_up+0xb2.

Ooh, spiffy idea. (Like I said, asm rookie.) I just compiled gdb, and
here's what it says. Interesting, to me, at least.

(gdb) list *__wake_up+0xb2
0x9d6 is in __wake_up (/src/linux-2.4.19-pre8/include/asm/processor.h:488).
483     #ifdef CONFIG_MPENTIUMIII
484
485     #define ARCH_HAS_PREFETCH
486     extern inline void prefetch(const void *x)
487     {
488             __asm__ __volatile__ ("prefetchnta (%0)" : : "r"(x));
489     }
490
491     #elif CONFIG_X86_USE_3DNOW
492

* Re: kswapd OOPS under 2.4.19-pre8 (ext3, Reiserfs + (soft)raid0)
From: William Lee Irwin III @ 2002-05-16 19:38 UTC
To: Todd R. Eigenschink; +Cc: Mike Galbraith, linux-kernel

On Thu, May 16, 2002 at 07:28:44AM -0500, Todd R. Eigenschink wrote:
> Ooh, spiffy idea. (Like I said, asm rookie.) I just compiled gdb,
> and here's what it says. Interesting, to me, at least.
> (gdb) list *__wake_up+0xb2
> 0x9d6 is in __wake_up
> (/src/linux-2.4.19-pre8/include/asm/processor.h:488).
> 483     #ifdef CONFIG_MPENTIUMIII
> 484
> 485     #define ARCH_HAS_PREFETCH
> 486     extern inline void prefetch(const void *x)
> 487     {
> 488             __asm__ __volatile__ ("prefetchnta (%0)" : : "r"(x));
> 489     }
> 490
> 491     #elif CONFIG_X86_USE_3DNOW

list_for_each() uses prefetch() and is used in __wake_up_common(), which
is in turn used by __wake_up(). This is waitqueue list corruption.

Cheers,
Bill
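To make Bill's point concrete, here is a minimal userspace sketch of the loop shape involved. It assumes a doubly linked list modeled on the 2.4 <linux/list.h> helpers, with prefetch() approximated by GCC's __builtin_prefetch (only a cache hint); struct waiter and sum_waiter_ids() are hypothetical names for illustration. Because list_for_each() prefetches pos->next on every step, an address inside the inlined waitqueue walk can be attributed by gdb to prefetch() in processor.h, even though the code being executed is the loop in __wake_up_common().

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked list, modeled on the 2.4 <linux/list.h> helpers. */
struct list_head {
    struct list_head *next, *prev;
};

/* Stand-in for the kernel's prefetch(); GCC's builtin is only a cache hint. */
#define prefetch(x) __builtin_prefetch(x)

/* Recover the containing structure from an embedded list node (list_entry()). */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Same loop shape as 2.4's list_for_each(): every step prefetches pos->next. */
#define list_for_each(pos, head) \
    for (pos = (head)->next, prefetch(pos->next); pos != (head); \
         pos = pos->next, prefetch(pos->next))

static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Link 'entry' in just before 'head', i.e. at the tail of the list. */
static void list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Hypothetical waiter: just the embedded node plus an id for checking. */
struct waiter {
    int id;
    struct list_head task_list;
};

/* Walk the queue the way __wake_up_common() does and add up the ids. */
static int sum_waiter_ids(struct list_head *head)
{
    struct list_head *tmp;
    int sum = 0;

    list_for_each(tmp, head) {
        struct waiter *w = list_entry(tmp, struct waiter, task_list);
        sum += w->id;
    }
    return sum;
}
```

Note that nothing in this walk validates the pointers it follows: if a node or the data hanging off it is bad, the loop simply uses it, which is why corruption shows up at whatever instruction first touches it.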

* Re: kswapd OOPS under 2.4.19-pre8 (ext3, Reiserfs + (soft)raid0)
From: Todd R. Eigenschink @ 2002-05-20 12:58 UTC
To: linux-kernel

Todd R. Eigenschink writes:
>Mike Galbraith writes:
>>Methinks there's an easier way to get to the line in question. Compile
>>sched.c with -g via make kernel/sched.o EXTRA_CFLAGS=-g. gdb can then
>>be used to get you the line with list *__wake_up+0xb2.

Since the particular snippet of code at the point of oops in the last
one I posted was P3-specific, I recompiled for 586. The oops remains the
same, although the call stack happens to be a lot longer this time.

I'm going to run memtest86 on it for a while after it gets done with its
morning processing, although this failure seems a little too consistent
to be memory related.

Trace; c0129b39 <unlock_page+81/88>
Trace; c0139179 <end_buffer_io_async+8d/a8>
Trace; c01b6f45 <end_that_request_first+65/c8>
Trace; c01c1c3c <ide_end_request+68/a8>
Trace; c01c806a <ide_dma_intr+6a/ac>
Trace; c01c38ad <ide_intr+f9/164>
Trace; c01c8000 <ide_dma_intr+0/ac>
Trace; c010a1e1 <handle_IRQ_event+59/84>
Trace; c010a3d9 <do_IRQ+a9/f4>
Trace; c010c568 <call_do_IRQ+5/d>
Trace; c0154b07 <statm_pgd_range+133/1a8>
Trace; c0154c43 <proc_pid_statm+c7/16c>
Trace; c015279e <proc_info_read+5a/118>
Trace; c0137497 <sys_read+8f/104>
Trace; c0108a43 <system_call+33/40>
Code;  c0116383 <__wake_up+3b/c0>
00000000 <_EIP>:
Code;  c0116383 <__wake_up+3b/c0>   <=====
   0:   8b 01                     mov    (%ecx),%eax   <=====
Code;  c0116385 <__wake_up+3d/c0>
   2:   85 45 fc                  test   %eax,0xfffffffc(%ebp)
Code;  c0116388 <__wake_up+40/c0>
   5:   74 66                     je     6d <_EIP+0x6d> c01163f0 <__wake_up+a8/c0>
Code;  c011638a <__wake_up+42/c0>
   7:   31 d2                     xor    %edx,%edx
Code;  c011638c <__wake_up+44/c0>
   9:   9c                        pushf
Code;  c011638d <__wake_up+45/c0>
   a:   5e                        pop    %esi
Code;  c011638e <__wake_up+46/c0>
   b:   fa                        cli
Code;  c011638f <__wake_up+47/c0>
   c:   f0 fe 0d 80 99 30 c0      lock decb 0xc0309980
Code;  c0116396 <__wake_up+4e/c0>
  13:   0f 00 00                  sldtl  (%eax)

(gdb) list *__wake_up+0x3b
0x96f is in __wake_up (kernel/sched.c:732).
727                     wait_queue_t *curr = list_entry(tmp, wait_queue_t, task_list);
728
729                     CHECK_MAGIC(curr->__magic);
730                     p = curr->task;
731                     state = p->state;
732                     if (state & mode) {
733                             WQ_NOTE_WAKER(curr);
734                             if (try_to_wake_up(p, sync) && (curr->flags&WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
735                                     break;
736                     }

* Re: kswapd OOPS under 2.4.19-pre8 (ext3, Reiserfs + (soft)raid0)
From: William Lee Irwin III @ 2002-05-20 17:00 UTC
To: Todd R. Eigenschink; +Cc: linux-kernel

On Mon, May 20, 2002 at 07:58:25AM -0500, Todd R. Eigenschink wrote:
> Since the particular snippet of code at the point of oops in the last
> one I posted was P3-specific, I recompiled for 586. The oops remains
> the same, although the call stack happens to be a lot longer this
> time.

I suspect the lowest parts of the call chain are being handed bad data.

On Mon, May 20, 2002 at 07:58:25AM -0500, Todd R. Eigenschink wrote:
> I'm going to run memtest86 on it for a while after it gets done with
> its morning processing, although this failure seems a little too
> consistent to be memory related.

I hope I didn't say that.

On Mon, May 20, 2002 at 07:58:25AM -0500, Todd R. Eigenschink wrote:
> Trace; c0129b39 <unlock_page+81/88>
> Trace; c0139179 <end_buffer_io_async+8d/a8>
> Trace; c01b6f45 <end_that_request_first+65/c8>
> Trace; c01c1c3c <ide_end_request+68/a8>
> Trace; c01c806a <ide_dma_intr+6a/ac>
> Trace; c01c38ad <ide_intr+f9/164>
> Trace; c01c8000 <ide_dma_intr+0/ac>
> Trace; c010a1e1 <handle_IRQ_event+59/84>
> Trace; c010a3d9 <do_IRQ+a9/f4>
> Trace; c010c568 <call_do_IRQ+5/d>
> Trace; c0154b07 <statm_pgd_range+133/1a8>
> Trace; c0154c43 <proc_pid_statm+c7/16c>
> Trace; c015279e <proc_info_read+5a/118>
> Trace; c0137497 <sys_read+8f/104>
> Trace; c0108a43 <system_call+33/40>

The __wake_up()/unlock_page() isn't the interesting part of the call
chain, the parts from end_buffer_io_async() to ide_dma_intr() are.

Any chance you can list them in gdb?

Cheers,
Bill
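The Trace lines above use ksymoops' <symbol+offset/length> notation, with both numbers in hex; gdb's list *unlock_page+0x81 takes the same offset. As a small worked example (a sketch assuming exactly the line format printed above; parse_ksym(), func_start(), func_end() and struct ksym_loc are hypothetical helpers, not part of ksymoops), subtracting the offset from the address gives the function's start address: c0129b39 - 0x81 = c0129ab8 for unlock_page, and adding the length 0x88 gives c0129b40, one byte past its end.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical holder for one decoded "addr <sym+off/len>" trace entry. */
struct ksym_loc {
    unsigned long addr;  /* absolute address printed before '<' */
    char sym[64];        /* function name */
    unsigned long off;   /* offset into the function (hex in the text) */
    unsigned long len;   /* length of the function (hex in the text) */
};

/* Parse a line such as "Trace; c0129b39 <unlock_page+81/88>".
 * Returns 1 if all four fields were read, 0 otherwise. */
static int parse_ksym(const char *line, struct ksym_loc *out)
{
    int n = sscanf(line, "%*[^;]; %lx <%63[^+]+%lx/%lx>",
                   &out->addr, out->sym, &out->off, &out->len);
    return n == 4;
}

/* Start of the function containing addr: the address minus the offset. */
static unsigned long func_start(const struct ksym_loc *k)
{
    return k->addr - k->off;
}

/* One byte past the function's end: its start plus its length. */
static unsigned long func_end(const struct ksym_loc *k)
{
    return func_start(k) + k->len;
}
```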

* Re: kswapd OOPS under 2.4.19-pre8 (ext3, Reiserfs + (soft)raid0)
From: Todd R. Eigenschink @ 2002-05-20 20:26 UTC
To: William Lee Irwin III; +Cc: linux-kernel

William Lee Irwin III writes:
>On Mon, May 20, 2002 at 07:58:25AM -0500, Todd R. Eigenschink wrote:
>> I'm going to run memtest86 on it for a while after it gets done with
>> its morning processing, although this failure seems a little too
>> consistent to be memory related.
>
>I hope I didn't say that.

Someone else had suggested testing the memory and power supply.
memtest86 is easy to run, so I'll try that. It'll have to be tonight,
now.

>The __wake_up()/unlock_page() isn't the interesting part of the call
>chain, the parts from end_buffer_io_async() to ide_dma_intr() are.
>
>Any chance you can list them in gdb?

Well, after my posting from earlier today, I recompiled the kernel after
stripping some more stuff. I just induced an oops in that one, so I can
list the call stack for it.

No IDE stuff this time; this looks a lot like most of the other ones
I've seen. This morning was the first time I've ever seen IDE stuff in
the post-oops call stack.

It seems I can pretty much induce them at will, now. I started up four
simultaneous Webtrends sessions, which grow fairly quickly to 400-600 MB
each, give or take. (The machine has 2 GB of RAM, so it only swaps a
little, sometimes.) Within half an hour, it fell over.

Here's the oops itself, then the gdb output.
----------------------------------------------------------------------
Oops: 0000
CPU:    1
EIP:    0010:[<c0116363>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010087
eax: c2802db4   ebx: c2002db4   ecx: 00000000   edx: 00000003
esi: c2802db0   edi: c2802db0   ebp: f7bf3ee8   esp: f7bf3ecc
ds: 0018   es: 0018   ss: 0018
Process kswapd (pid: 5, stackpage=f7bf3000)
Stack: c133d790 c2802db0 c02acbf4 c2802db4 00000000 00000282 00000003 d911d9f0
       c0129b19 0076eb00 c133d790 f7bf3f4c c0130817 00000000 c12e9ca0 00000020
       00008efe 81e65000 81a7d000 1147d047 00000009 81c00000 f6c99818 81c00000
Call Trace: [<c0129b19>] [<c0130817>] [<c0130ca7>] [<c0130ea0>] [<c0130efc>]
   [<c0130f97>] [<c0130ff6>] [<c0131107>] [<c010712c>]
Code: 8b 01 85 45 fc 74 66 31 d2 9c 5e fa f0 fe 0d 80 99 30 c0 0f

>>EIP; c0116363 <__wake_up+3b/c0>   <=====
>>eax; c2802db4 <END_OF_CODE+249b758/????>
>>ebx; c2002db4 <END_OF_CODE+1c9b758/????>
>>esi; c2802db0 <END_OF_CODE+249b754/????>
>>edi; c2802db0 <END_OF_CODE+249b754/????>
>>ebp; f7bf3ee8 <END_OF_CODE+3788c88c/????>
>>esp; f7bf3ecc <END_OF_CODE+3788c870/????>
Trace; c0129b19 <unlock_page+81/88>
Trace; c0130817 <swap_out+347/4b4>
Trace; c0130ca7 <shrink_cache+323/3cc>
Trace; c0130ea0 <shrink_caches+5c/84>
Trace; c0130efc <try_to_free_pages+34/54>
Trace; c0130f97 <kswapd_balance_pgdat+47/90>
Trace; c0130ff6 <kswapd_balance+16/2c>
Trace; c0131107 <kswapd+9b/b6>
Trace; c010712c <kernel_thread+28/38>
Code;  c0116363 <__wake_up+3b/c0>
00000000 <_EIP>:
Code;  c0116363 <__wake_up+3b/c0>   <=====
   0:   8b 01                     mov    (%ecx),%eax   <=====
Code;  c0116365 <__wake_up+3d/c0>
   2:   85 45 fc                  test   %eax,0xfffffffc(%ebp)
Code;  c0116368 <__wake_up+40/c0>
   5:   74 66                     je     6d <_EIP+0x6d> c01163d0 <__wake_up+a8/c0>
Code;  c011636a <__wake_up+42/c0>
   7:   31 d2                     xor    %edx,%edx
Code;  c011636c <__wake_up+44/c0>
   9:   9c                        pushf
Code;  c011636d <__wake_up+45/c0>
   a:   5e                        pop    %esi
Code;  c011636e <__wake_up+46/c0>
   b:   fa                        cli
Code;  c011636f <__wake_up+47/c0>
   c:   f0 fe 0d 80 99 30 c0      lock decb 0xc0309980
Code;  c0116376 <__wake_up+4e/c0>
  13:   0f 00 00                  sldtl  (%eax)
----------------------------------------------------------------------
(gdb) list *__wake_up+0x3b
0x973 is in __wake_up (sched.c:731).
726                     unsigned int state;
727                     wait_queue_t *curr = list_entry(tmp, wait_queue_t, task_list);
728
729                     CHECK_MAGIC(curr->__magic);
730                     p = curr->task;
731                     state = p->state;
732                     if (state & mode) {
733                             WQ_NOTE_WAKER(curr);
734                             if (try_to_wake_up(p, sync) && (curr->flags&WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
735                                     break;

(gdb) list *unlock_page+0x81
0xcf9 is in unlock_page (filemap.c:845).
840             smp_mb__before_clear_bit();
841             if (!test_and_clear_bit(PG_locked, &(page)->flags))
842                     BUG();
843             smp_mb__after_clear_bit();
844             if (waitqueue_active(waitqueue))
845                     wake_up_all(waitqueue);
846     }
847
848     /*
849      * Get a lock on the page, assuming we need to sleep

(gdb) list *swap_out+0x347
No source file for address 0x347.
(gdb) list *swap_out
0x0 is in kswapd_init (vmscan.c:750).
745                     }
746             }
747
748     static int __init kswapd_init(void)
749     {
750             printk("Starting kswapd\n");
751             swap_setup();
752             kernel_thread(kswapd, NULL, CLONE_FS | CLONE_FILES | CLONE_SIGNAL);
753             return 0;
754     }

(I'm fuzzy on swap_out... can I not see the code because it's a static
function?)

(gdb) list *shrink_cache+0x323
0x7d7 is in shrink_cache (vmscan.c:483).
478              * Alert! We've found too many mapped pages on the
479              * inactive list, so we start swapping out now!
480              */
481             spin_unlock(&pagemap_lru_lock);
482             swap_out(priority, gfp_mask, classzone);
483             return nr_pages;
484     }
485
486     /*
487      * It is critical to check PageDirty _after_ we made sure

(gdb) list *shrink_caches+0x5c
0x9d0 is in shrink_caches (vmscan.c:571).
566             nr_pages = chunk_size;
567             /* try to keep the active list 2/3 of the size of the cache */
568             ratio = (unsigned long) nr_pages * nr_active_pages / ((nr_inactive_pages + 1) * 2);
569             refill_inactive(ratio);
570
571             nr_pages = shrink_cache(nr_pages, classzone, gfp_mask, priority);
572             if (nr_pages <= 0)
573                     return 0;
574
575             shrink_dcache_memory(priority, gfp_mask);

(gdb) list *try_to_free_pages+0x34
0xa2c is in try_to_free_pages (vmscan.c:591).
586             int priority = DEF_PRIORITY;
587             int nr_pages = SWAP_CLUSTER_MAX;
588
589             gfp_mask = pf_gfp_mask(gfp_mask);
590             do {
591                     nr_pages = shrink_caches(classzone, priority, gfp_mask, nr_pages);
592                     if (nr_pages <= 0)
593                             return 1;
594             } while (--priority);
595

(gdb) list *kswapd_balance_pgdat+0x47
0xac7 is in kswapd_balance_pgdat (vmscan.c:630).
625                     zone = pgdat->node_zones + i;
626                     if (unlikely(current->need_resched))
627                             schedule();
628                     if (!zone->need_balance)
629                             continue;
630                     if (!try_to_free_pages(zone, GFP_KSWAPD, 0)) {
631                             zone->need_balance = 0;
632                             __set_current_state(TASK_INTERRUPTIBLE);
633                             schedule_timeout(HZ);
634                             continue;

(gdb) list *kswapd_balance+0x16
0xb26 is in kswapd_balance (vmscan.c:655).
650             do {
651                     need_more_balance = 0;
652                     pgdat = pgdat_list;
653                     do
654                             need_more_balance |= kswapd_balance_pgdat(pgdat);
655                     while ((pgdat = pgdat->node_next));
656             } while (need_more_balance);
657     }
658
659     static int kswapd_can_sleep_pgdat(pg_data_t * pgdat)

(gdb) list *kswapd+0x9b
0xc37 is in kswapd (/src/linux-2.4.19-pre8/include/linux/tqueue.h:121).
116
117     extern void __run_task_queue(task_queue *list);
118
119     static inline void run_task_queue(task_queue *list)
120     {
121             if (TQ_ACTIVE(*list))
122                     __run_task_queue(list);
123     }
124
125     #endif /* _LINUX_TQUEUE_H */

(gdb) list *kernel_thread+0x28
0x3fc is in kernel_thread (process.c:492).
487      */
488     int kernel_thread(int (*fn)(void *), void * arg, unsigned long flags)
489     {
490             long retval, d0;
491
492             __asm__ __volatile__(
493                     "movl %%esp,%%esi\n\t"
494                     "int $0x80\n\t"         /* Linux/i386 system call */
495                     "cmpl %%esp,%%esi\n\t"  /* child or parent? */
496                     "je 1f\n\t"             /* parent - jump */
----------------------------------------------------------------------
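Todd's trigger is four Webtrends runs of roughly 400-600 MB each on a 2 GB box. For anyone trying to approximate that load without Webtrends, a hypothetical memory hog like the sketch below is one option. It assumes, without anything in this thread confirming it, that the relevant factor is several large processes keeping kswapd busy; touch_mib() is an illustrative name. It writes one byte per page because untouched malloc()ed memory is typically not backed by physical pages until it is written.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Allocate 'mib' MiB and write one byte per page so the memory is actually
 * committed. Returns the number of pages touched (0 if malloc fails).
 * On success the caller owns *out and must free() it. */
static size_t touch_mib(size_t mib, unsigned char **out)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t bytes = mib * 1024 * 1024;
    size_t pages = 0;
    unsigned char *buf;

    if (page <= 0)
        page = 4096;            /* fallback assumption: 4 KiB pages */

    buf = malloc(bytes);
    if (buf == NULL)
        return 0;

    for (size_t i = 0; i < bytes; i += (size_t)page) {
        buf[i] = (unsigned char)i;  /* first write faults the page in */
        pages++;
    }
    *out = buf;
    return pages;
}
```

A driver program would call touch_mib(500, &buf) from main() and then keep re-touching the buffer or sleep, with four copies started in parallel; whether that reproduces the oops is untested.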

* Re: kswapd OOPS under 2.4.19-pre8 (ext3, Reiserfs + (soft)raid0)
From: William Lee Irwin III @ 2002-05-20 22:36 UTC
To: Todd R. Eigenschink; +Cc: linux-kernel

On Mon, May 20, 2002 at 03:26:56PM -0500, Todd R. Eigenschink wrote:
> Someone else had suggested testing the memory and power supply.
> memtest86 is easy to run, so I'll try that. It'll have to be tonight,
> now.

Bitflips are usually things where a pointer turns up invalid (or
non-NULL) and the difference between it and a valid pointer (or NULL)
is one bit. I don't see that here and don't like blaming hardware.

On Mon, May 20, 2002 at 03:26:56PM -0500, Todd R. Eigenschink wrote:
> Well, after my posting from earlier today, I recompiled the kernel
> after stripping some more stuff. I just induced an oops in that one,
> so I can list the call stack for it.

Nice, I presume you've got -g there? Any chance of doing something like
objdump --disassemble --source vmlinux and sending me the annotated
disassembly of __wake_up()? I want to doublecheck something...

On Mon, May 20, 2002 at 03:26:56PM -0500, Todd R. Eigenschink wrote:
> No IDE stuff this time; this looks a lot like most of the other ones
> I've seen. This morning was the first time I've ever seen IDE stuff
> in the post-oops call stack.

This is pretty strange, yes.

On Mon, May 20, 2002 at 03:26:56PM -0500, Todd R. Eigenschink wrote:
> It seems I can pretty much induce them at will, now. I started up
> four simultaneous Webtrends sessions, which grow fairly quickly to
> 400-600 MB each, give or take. (The machine has 2 GB of RAM, so it
> only swaps a little, sometimes.) Within half an hour, it fell over.
> Here's the oops itself, then the gdb output.

Great stuff! Thanks.

On Mon, May 20, 2002 at 03:26:56PM -0500, Todd R. Eigenschink wrote:
> Oops: 0000
> CPU:    1
> EIP:    0010:[<c0116363>]    Not tainted
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00010087
> eax: c2802db4   ebx: c2002db4   ecx: 00000000   edx: 00000003
> esi: c2802db0   edi: c2802db0   ebp: f7bf3ee8   esp: f7bf3ecc
> ds: 0018   es: 0018   ss: 0018

Okay, %ecx is 0 -- no bitflip, just plain old NULL...

On Mon, May 20, 2002 at 03:26:56PM -0500, Todd R. Eigenschink wrote:
> Code;  c0116363 <__wake_up+3b/c0>
> 00000000 <_EIP>:
> Code;  c0116363 <__wake_up+3b/c0>   <=====
>    0:   8b 01                     mov    (%ecx),%eax   <=====
> Code;  c0116365 <__wake_up+3d/c0>
>    2:   85 45 fc                  test   %eax,0xfffffffc(%ebp)
> Code;  c0116368 <__wake_up+40/c0>
>    5:   74 66                     je     6d <_EIP+0x6d> c01163d0 <__wake_up+a8/c0>

Okay, the offending instruction is mov (%ecx), %eax -- dereferencing the
NULL %ecx...

On Mon, May 20, 2002 at 03:26:56PM -0500, Todd R. Eigenschink wrote:
> (gdb) list *__wake_up+0x3b
> 0x973 is in __wake_up (sched.c:731).
> 726                     unsigned int state;
> 727                     wait_queue_t *curr = list_entry(tmp, wait_queue_t, task_list);
> 728
> 729                     CHECK_MAGIC(curr->__magic);
> 730                     p = curr->task;
> 731                     state = p->state;
> 732                     if (state & mode) {
> 733                             WQ_NOTE_WAKER(curr);
> 734                             if (try_to_wake_up(p, sync) && (curr->flags&WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
> 735                                     break;

This makes it pretty clear the offending instruction corresponds to the
first dereference of curr->task. Someone's leaving a NULL pointer in
there when they shouldn't. So this entire call chain has nothing to do
with the offender -- it only trips over the bad pointer the offending
code left behind.

This looks like a PITA. The objdump --disassemble --source stuff is just
to have the assembly and source next to each other for a "more
convincing" demonstration, not that this isn't already pretty good as it
stands. Of course, finding the offender will be painful.

Cheers,
Bill
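Bill's bitflip test can be written down directly: XOR the observed value with the expected one and count the set bits. In this minimal sketch (bits_differ() and looks_like_bitflip() are hypothetical names), exactly one differing bit fits the bitflip pattern he describes, while %ecx = 00000000 checked against NULL differs in zero bits, which is the "plain old NULL" case. The heuristic is only as good as the expected value it is compared against.

```c
#include <assert.h>
#include <stdint.h>

/* Count the bit positions in which two 32-bit register values differ. */
static int bits_differ(uint32_t observed, uint32_t expected)
{
    uint32_t x = observed ^ expected;
    int n = 0;

    while (x) {
        x &= x - 1;     /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* A single-bit difference from the expected value is the classic
 * bitflip signature; zero bits means the value is exactly as expected. */
static int looks_like_bitflip(uint32_t observed, uint32_t expected)
{
    return bits_differ(observed, expected) == 1;
}
```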

* Re: kswapd OOPS under 2.4.19-pre8 (ext3, Reiserfs + (soft)raid0)
From: Todd R. Eigenschink @ 2002-05-20 23:07 UTC
To: William Lee Irwin III; +Cc: linux-kernel

William Lee Irwin III writes:
>Bitflips are usually things where a pointer turns up invalid (or
>non-NULL) and the difference between it and a valid pointer (or NULL)
>is one bit. I don't see that here and don't like blaming hardware.

Good point.

>Nice, I presume you've got -g there? Any chance of doing something like
>objdump --disassemble --source vmlinux and sending me the annotated
>disassembly of __wake_up()? I want to doublecheck something...

Everything's compiled with -g at the moment. In fact, I tried compiling
without the -O2, but found out pretty quickly that You Can't Do That. :)

The disassembly is included below. It's not too big.

I was upstairs rebooting from another oops when your mail arrived, just
a few hours after the last oops. (Same workload, continuing where it
left off before.) It was identical apart from trivialities, and of
course %ecx was 0.

>The objdump --disassemble --source stuff is just to have the assembly
>and source next to each other for a "more convincing" demonstration,
>not that this isn't already pretty good as it stands. Of course,
>finding the offender will be painful.

I'll be glad to do whatever I can to help. If four jobs crash it in a
couple of hours, 20 will probably crash it a lot sooner. :)

For whatever this may be worth--probably nothing--I have softdog
compiled in, but it has only successfully rebooted after an oops maybe
twice out of 20 or more oopsen. On a bunch of them, the message has come
out to the serial console that it was initiating a reboot (but it
didn't). Most of the time, it's just the oops and then...darkness.

Also, on the off chance that this is a code generation problem, this is
gcc 2.95.3. I actually was about to say 3.0.4 and wait for the
slaps-upside-the-head, but I just checked and realized I haven't
upgraded this box.

Todd

Partial disassembly follows. If for some strange reason you want the
whole thing, it's ~5MB and at
http://www.mixi.net/~eigenstr/vmlinux.disassembly.bz2 .
----------------------------------------------------------------------
c0116328 <__wake_up>:
/*
 * The core wakeup function.  Non-exclusive wakeups (nr_exclusive == 0) just wake everything
 * up.  If it's an exclusive wakeup (nr_exclusive == small +ve number) then we wake all the
 * non-exclusive tasks and one exclusive task.
 *
 * There are circumstances in which we can try to wake a task which has already
 * started to run but is not in state TASK_RUNNING.  try_to_wake_up() returns zero
 * in this (rare) case, and we handle it by contonuing to scan the queue.
 */
static inline void __wake_up_common (wait_queue_head_t *q, unsigned int mode,
                                     int nr_exclusive, const int sync)
{
        struct list_head *tmp;
        struct task_struct *p;

        CHECK_MAGIC_WQHEAD(q);
        WQ_CHECK_LIST_HEAD(&q->task_list);

        list_for_each(tmp,&q->task_list) {
                unsigned int state;
                wait_queue_t *curr = list_entry(tmp, wait_queue_t, task_list);

                CHECK_MAGIC(curr->__magic);
                p = curr->task;
                state = p->state;
                if (state & mode) {
                        WQ_NOTE_WAKER(curr);
                        if (try_to_wake_up(p, sync) && (curr->flags&WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
                                break;
                }
        }
}

void __wake_up(wait_queue_head_t *q, unsigned int mode, int nr)
{
c0116328:       55                      push   %ebp
c0116329:       89 e5                   mov    %esp,%ebp
c011632b:       83 ec 10                sub    $0x10,%esp
c011632e:       57                      push   %edi
c011632f:       56                      push   %esi
c0116330:       53                      push   %ebx
c0116331:       89 55 fc                mov    %edx,0xfffffffc(%ebp)
c0116334:       89 c7                   mov    %eax,%edi
        if (q) {
c0116336:       85 ff                   test   %edi,%edi
c0116338:       0f 84 a2 00 00 00       je     c01163e0 <__wake_up+0xb8>
                unsigned long flags;
                wq_read_lock_irqsave(&q->lock, flags);
c011633e:       9c                      pushf
c011633f:       8f 45 f8                popl   0xfffffff8(%ebp)
c0116342:       fa                      cli
                printk("eip: %p\n", &&here);
                BUG();
        }
#endif
        __asm__ __volatile__(
c0116343:       f0 fe 0f                lock decb (%edi)
c0116346:       0f 88 5f 0f 00 00       js     c01172ab <Letext+0x8a>
c011634c:       89 4d f4                mov    %ecx,0xfffffff4(%ebp)
c011634f:       8b 5f 04                mov    0x4(%edi),%ebx
c0116352:       8d 47 04                lea    0x4(%edi),%eax
c0116355:       89 45 f0                mov    %eax,0xfffffff0(%ebp)
c0116358:       39 c3                   cmp    %eax,%ebx
c011635a:       74 7b                   je     c01163d7 <__wake_up+0xaf>
c011635c:       8d 74 26 00             lea    0x0(%esi,1),%esi
c0116360:       8b 4b fc                mov    0xfffffffc(%ebx),%ecx
c0116363:       8b 01                   mov    (%ecx),%eax
c0116365:       85 45 fc                test   %eax,0xfffffffc(%ebp)
c0116368:       74 66                   je     c01163d0 <__wake_up+0xa8>
c011636a:       31 d2                   xor    %edx,%edx
c011636c:       9c                      pushf
c011636d:       5e                      pop    %esi
c011636e:       fa                      cli
                printk("eip: %p\n", &&here);
                BUG();
        }
#endif
        __asm__ __volatile__(
c011636f:       f0 fe 0d 80 99 30 c0    lock decb 0xc0309980
c0116376:       0f 88 3b 0f 00 00       js     c01172b7 <Letext+0x96>
c011637c:       c7 01 00 00 00 00       movl   $0x0,(%ecx)
c0116382:       83 79 3c 00             cmpl   $0x0,0x3c(%ecx)
c0116386:       75 2d                   jne    c01163b5 <__wake_up+0x8d>
 */
static __inline__ void __list_add(struct list_head * new,
        struct list_head * prev,
        struct list_head * next)
{
c0116388:       a1 c0 b5 2a c0          mov    0xc02ab5c0,%eax
        next->prev = new;
        new->next = next;
        new->prev = prev;
        prev->next = new;
}

/**
 * list_add - add a new entry
 * @new: new entry to be added
 * @head: list head to add it after
 *
 * Insert a new entry after the specified head.
 * This is good for implementing stacks.
 */
static __inline__ void list_add(struct list_head *new, struct list_head *head)
{
c011638d:       8d 51 3c                lea    0x3c(%ecx),%edx
c0116390:       89 50 04                mov    %edx,0x4(%eax)
c0116393:       89 41 3c                mov    %eax,0x3c(%ecx)
c0116396:       c7 42 04 c0 b5 2a c0    movl   $0xc02ab5c0,0x4(%edx)
c011639d:       89 15 c0 b5 2a c0       mov    %edx,0xc02ab5c0
c01163a3:       ff 05 60 7a 32 c0       incl   0xc0327a60
c01163a9:       89 c8                   mov    %ecx,%eax
c01163ab:       e8 48 f6 ff ff          call   c01159f8 <reschedule_idle>
c01163b0:       ba 01 00 00 00          mov    $0x1,%edx
                :"0" (oldval) : "memory"

static inline void spin_unlock(spinlock_t *lock)
{
        char oldval = 1;
c01163b5:       b0 01                   mov    $0x1,%al
#if SPINLOCK_DEBUG
        if (lock->magic != SPINLOCK_MAGIC)
                BUG();
        if (!spin_is_locked(lock))
                BUG();
#endif
        __asm__ __volatile__(
c01163b7:       86 05 80 99 30 c0       xchg   %al,0xc0309980
c01163bd:       56                      push   %esi
c01163be:       9d                      popf
c01163bf:       85 d2                   test   %edx,%edx
c01163c1:       74 0d                   je     c01163d0 <__wake_up+0xa8>
c01163c3:       f6 43 f8 01             testb  $0x1,0xfffffff8(%ebx)
c01163c7:       74 07                   je     c01163d0 <__wake_up+0xa8>
c01163c9:       ff 4d f4                decl   0xfffffff4(%ebp)
c01163cc:       74 09                   je     c01163d7 <__wake_up+0xaf>
c01163ce:       89 f6                   mov    %esi,%esi
c01163d0:       8b 1b                   mov    (%ebx),%ebx
c01163d2:       3b 5d f0                cmp    0xfffffff0(%ebp),%ebx
c01163d5:       75 89                   jne    c0116360 <__wake_up+0x38>
                :"0" (oldval) : "memory"

static inline void spin_unlock(spinlock_t *lock)
{
        char oldval = 1;
c01163d7:       b0 01                   mov    $0x1,%al
#if SPINLOCK_DEBUG
        if (lock->magic != SPINLOCK_MAGIC)
                BUG();
        if (!spin_is_locked(lock))
                BUG();
#endif
        __asm__ __volatile__(
c01163d9:       86 07                   xchg   %al,(%edi)
                __wake_up_common(q, mode, nr, 0);
                wq_read_unlock_irqrestore(&q->lock, flags);
c01163db:       ff 75 f8                pushl  0xfffffff8(%ebp)
c01163de:       9d                      popf
        }
c01163df:       90                      nop
c01163e0:       5b                      pop    %ebx
c01163e1:       5e                      pop    %esi
c01163e2:       5f                      pop    %edi
c01163e3:       89 ec                   mov    %ebp,%esp
c01163e5:       5d                      pop    %ebp
c01163e6:       c3                      ret
}
c01163e7:       90                      nop
----------------------------------------------------------------------
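The annotated disassembly lines up with Bill's reading. After mov 0x4(%edi),%ebx, %ebx holds tmp (the current list node), and mov 0xfffffffc(%ebx),%ecx loads the word 4 bytes before it, i.e. curr->task; mov (%ecx),%eax then loads p->state from offset 0 of the task, which is the faulting instruction at c0116363 when %ecx is 0. Later, testb $0x1,0xfffffff8(%ebx) reads curr->flags 8 bytes before tmp. The sketch below models a layout consistent with those offsets (flags, then task, then task_list, which appears to match 2.4's wait_queue_t without WAITQUEUE_DEBUG). The _model types and first_null_task() are hypothetical, a debugging aid rather than a fix, since the real question is who stores the NULL.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in task: 'state' is the first member, so p->state is a plain
 * load from offset 0 of p, like "mov (%ecx),%eax". */
struct task_model {
    volatile long state;
};

/* Layout consistent with the objdump output: with tmp = &curr->task_list,
 * curr->task sits one pointer before tmp (tmp-4 on i386) and curr->flags
 * just before that (tmp-8 on i386). */
struct wait_queue_model {
    unsigned int flags;
    struct task_model *task;
    struct list_head task_list;
};

/* Hypothetical debug walk: rather than dereferencing curr->task blindly,
 * return the 0-based position of the first entry whose task is NULL,
 * or -1 if every entry has a task. */
static int first_null_task(struct list_head *head)
{
    int pos = 0;

    for (struct list_head *tmp = head->next; tmp != head; tmp = tmp->next, pos++) {
        struct wait_queue_model *curr =
            list_entry(tmp, struct wait_queue_model, task_list);
        if (curr->task == NULL)
            return pos;
    }
    return -1;
}
```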

* Re: kswapd OOPS under 2.4.19-pre8 (ext3, Reiserfs + (soft)raid0)
From: William Lee Irwin III @ 2002-05-20 23:28 UTC
To: Todd R. Eigenschink; +Cc: linux-kernel

On Mon, May 20, 2002 at 06:07:12PM -0500, Todd R. Eigenschink wrote:
> For whatever this may be worth--probably nothing--I have softdog
> compiled in, but it has only successfully rebooted after an oops maybe
> twice out of 20 or more oopsen. On a bunch of them, the message has
> come out to the serial console that it was initiating a reboot (but it
> didn't). Most of the time, it's just the oops and then...darkness.

Actually, getting a notion of your sourcebase and what's actually
running sounds like a great idea. Any chance you could rattle off what
patches you've got and/or name the tree, and maybe send me a .config?
Also, any chance you could tell me a little about the hardware?
I'm not going to tell you what to run or not to run, I just want to
know where to start looking.

On Mon, May 20, 2002 at 06:07:12PM -0500, Todd R. Eigenschink wrote:
> Also, on the off chance that this is a code generation problem, this
> is gcc 2.95.3. I actually was about to say 3.0.4 and wait for the
> slaps-upside-the-head, but I just checked and realized I haven't
> upgraded this box.

I don't know of any particular issues with gcc 2.95.3, but I'll compare
the disassemblies you sent me just in case.

Your help in tracking this down has been immense, I hope you have the
patience to bear with me as I try to fix this for you.

Thanks,
Bill

* Re: kswapd OOPS under 2.4.19-pre8 (ext3, Reiserfs + (soft)raid0)
From: Todd R. Eigenschink @ 2002-05-20 23:59 UTC
To: William Lee Irwin III; +Cc: linux-kernel

William Lee Irwin III writes:
>Actually, getting a notion of your sourcebase and what's actually
>running sounds like a great idea. Any chance you could rattle off what
>patches you've got and/or name the tree, and maybe send me a .config?
>Also, any chance you could tell me a little about the hardware?
>I'm not going to tell you what to run or not to run, I just want to
>know where to start looking.

Kernel: vanilla 2.4.19-pre8 at the moment. I recompiled after adding
Stephen Tweedie's latest ext3 patch the other night, but that's it. I've
been following the 2.4.19-pre kernels "religiously", but never mix in
*any* other patches. While I don't have any actual oops output from
previous kernels, I think this has been around in every 2.4.19-pre.
(I've been having trouble for longer than that, but my last round--see
link below--at least *appeared* different.)

Stuff That Runs (all vanilla): syslog-ng, bind 9.2.1, gated, portmap,
ypserv, xinetd, automount, cron, rpc.mountd, ypbind, rpc.nfsd, Apache
(hardly ever touched), Backup Exec agent, postgres 7.2.1 (only hit by
Apache).

Webtrends runs early every morning. A bunch of other machines rcp log
files to it between midnight and 04:00. I've had oopsen while Webtrends
is running and while it's not running. I've had them when the only
activity was rsh/rcp sessions from a couple of different machines at the
same time. I've even had them when the machine is (as far as I could
predict) completely idle.

If you have suggestions for stuff to run (or not)--whatever--I'll be
glad to try it. I can start going backwards kernel-wise, if you want me
to try to pin a starting point for the problem.
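Going backwards kernel-wise is a search problem. If the bug is assumed to be monotonic (absent up to some release, present in every later one) and the newest kernel is known bad, a binary search over the ordered candidates (say 2.4.18, then 2.4.19-pre1 through -pre8) needs roughly log2(n) test boots instead of n. The sketch below uses hypothetical names; each call to the predicate stands for one boot plus a reproduction run, and because the failure is intermittent a single "good" verdict may be wrong, so good results deserve long runs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate: nonzero if kernel number i (0 = oldest)
 * reproduces the oops under the test workload. */
typedef int (*reproduces_fn)(size_t i, void *ctx);

/* Kernels are ordered oldest..newest, n >= 1, and the newest (n-1) is
 * assumed bad; the bug is assumed to persist once introduced.
 * Returns the index of the first bad kernel. */
static size_t first_bad(size_t n, reproduces_fn bad, void *ctx)
{
    size_t lo = 0, hi = n - 1;      /* invariant: kernel hi is bad */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (bad(mid, ctx))
            hi = mid;               /* first bad is mid or earlier */
        else
            lo = mid + 1;           /* first bad is after mid */
    }
    return lo;
}

/* Example predicate for exercising the search: every kernel at or after
 * *(size_t *)ctx is "bad". */
static int bad_from_threshold(size_t i, void *ctx)
{
    return i >= *(size_t *)ctx;
}
```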

A couple other references:
http://groups.google.com/groups?q=todd+eigenschink&hl=en&lr=&ie=utf-8&oe=utf-8&scoring=d&selm=linux.kernel.15404.36497.77658.797884%40rtfm.ofc.tekinteractive.com&rnum=7
http://groups.google.com/groups?q=todd+eigenschink&hl=en&lr=&ie=utf-8&oe=utf-8&scoring=d&selm=linux.kernel.3C3D375C.E4A7EE77%40zip.com.au&rnum=6

>Your help in tracking this down has been immense, I hope you have the
>patience to bear with me as I try to fix this for you.

I have a lot more patience than kernel hacking skill, so I'll do what I
can, and you do your thing. :-) A steak dinner and a case of your
favorite if you fix it.

I'm *really* tired of getting paged and driving in to the office in the
wee hours of the morning to hit the freaking reset button. I do
preemptive reboots some evenings so I can control it, but it may still
croak a couple hours later. (I'd love an APC MasterSwitch right now, but
I can do a *lot* of driving and switch-flipping for $600.)

Todd

(Hardware info and .config follow.)
----------------------------------------------------------------------
Hardware:

Intel L440GX-C mainboard. Dual P3/500 CPUs, 2 GB of RAM. One 9GB SCSI
disk, one 36GB SCSI disk, and 4 x 30GB IDE disks, all on the internal
IDE & Adaptec SCSI. (The IDE used to be one 4-disk softraid RAID0
partition; now it's two separate 2-disk RAID0 partitions.)
----------------------------------------------------------------------
"grep =y .config" (nothing configured as modules). It had been
CONFIG_MPENTIUMIII; I recompiled as M586 a few days ago. No change.

CONFIG_X86=y
CONFIG_ISA=y
CONFIG_UID16=y
CONFIG_EXPERIMENTAL=y
CONFIG_MODULES=y
CONFIG_MODVERSIONS=y
CONFIG_KMOD=y
CONFIG_M586=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_USE_STRING_486=y
CONFIG_X86_ALIGNMENT_16=y
CONFIG_X86_PPRO_FENCE=y
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_HIGHMEM4G=y
CONFIG_HIGHMEM=y
CONFIG_MTRR=y
CONFIG_SMP=y
CONFIG_HAVE_DEC_LOCK=y
CONFIG_NET=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_PCI=y
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_NAMES=y
CONFIG_SYSVIPC=y
CONFIG_SYSCTL=y
CONFIG_KCORE_ELF=y
CONFIG_BINFMT_ELF=y
CONFIG_BLK_DEV_FD=y
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_RAID0=y
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_NETLINK_DEV=y
CONFIG_NETFILTER=y
CONFIG_FILTER=y
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_NF_CONNTRACK=y
CONFIG_IP_NF_FTP=y
CONFIG_IP_NF_IPTABLES=y
CONFIG_IP_NF_MATCH_MULTIPORT=y
CONFIG_IP_NF_MATCH_STATE=y
CONFIG_IP_NF_FILTER=y
CONFIG_IP_NF_TARGET_REJECT=y
CONFIG_IP_NF_TARGET_LOG=y
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_BLK_DEV_IDECD=y
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
CONFIG_IDEDMA_PCI_AUTO=y
CONFIG_BLK_DEV_IDEDMA=y
CONFIG_BLK_DEV_ADMA=y
CONFIG_BLK_DEV_PIIX=y
CONFIG_PIIX_TUNING=y
CONFIG_IDE_CHIPSETS=y
CONFIG_IDEDMA_AUTO=y
CONFIG_BLK_DEV_IDE_MODES=y
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
CONFIG_SCSI_CONSTANTS=y
CONFIG_SCSI_AIC7XXX=y
CONFIG_NETDEVICES=y
CONFIG_NET_ETHERNET=y
CONFIG_NET_PCI=y
CONFIG_EEPRO100=y
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_SERIAL=y
CONFIG_SERIAL_CONSOLE=y
CONFIG_UNIX98_PTYS=y
CONFIG_WATCHDOG=y
CONFIG_SOFT_WATCHDOG=y
CONFIG_RTC=y
CONFIG_AUTOFS_FS=y
CONFIG_AUTOFS4_FS=y
CONFIG_EXT3_FS=y
CONFIG_JBD=y
CONFIG_RAMFS=y
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_PROC_FS=y
CONFIG_DEVPTS_FS=y
CONFIG_EXT2_FS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
CONFIG_NFSD=y
CONFIG_NFSD_V3=y
CONFIG_SUNRPC=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_MSDOS_PARTITION=y
CONFIG_NLS=y
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_ISO8859_1=y
CONFIG_VGA_CONSOLE=y
----------------------------------------------------------------------