* Possible dcache BUG
From: Brett Charbeneau @ 2004-08-02 13:14 UTC
To: linux-kernel

Greetings,

    I am getting the oops below - twice since 7/26, but I haven't a
clue what's causing it.
    I am not a subscriber, so any replies directed to me would be
gratefully received.
    Thank you for your hard work on this!

--
Brett Charbeneau, Network Administrator    Tel: 757-259-7750
Williamsburg Regional Library              FAX: 757-259-7798
7770 Croaker Road                          brett@wrl.org
Williamsburg, VA 23188-7064                http://www.wrl.org

ksymoops 2.4.9 on i686 2.4.26.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.26/ (default)
     -m /boot/System.map (specified)

1151MB HIGHMEM available.
3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
kernel BUG at dcache.c:345!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c014322d>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010206
eax: 00040000   ebx: eb8d7c70   ecx: c281b394   edx: e5636700
esi: eb8d7c58   edi: c281b394   ebp: d2b15f34   esp: d2b15f08
ds: 0018   es: 0018   ss: 0018
Process umount (pid: 14814, stackpage=d2b15000)
Stack: c0128f81 c281b49c c281f000 00000246 d2b15f34 f721e1a0 00000466 f721e178
       f721e178 f721e178 c02991c0 d2b15f44 c01435a6 00000150 f7b6f400 d2b15f5c
       c013714f f721e178 d2b15f88 08052179 0804d82b d2b15f7c c013afea f7b6f400
Call Trace:    [<c0128f81>] [<c01435a6>] [<c013714f>] [<c013afea>] [<c01472d0>]
  [<c01472ee>] [<c0106d93>]
Code: 0f 0b 59 01 1e d6 25 c0 8d 56 10 8b 4a 04 8b 46 10 89 48 04

>>EIP; c014322d <prune_dcache+5d/140>   <=====

>>ebx; eb8d7c70 <_end+2b5bb734/384f6ac4>
>>ecx; c281b394 <_end+24fee58/384f6ac4>
>>edx; e5636700 <_end+2531a1c4/384f6ac4>
>>esi; eb8d7c58 <_end+2b5bb71c/384f6ac4>
>>edi; c281b394 <_end+24fee58/384f6ac4>
>>ebp; d2b15f34 <_end+127f99f8/384f6ac4>
>>esp; d2b15f08 <_end+127f99cc/384f6ac4>

Trace; c0128f81 <kmem_cache_free+1c1/270>
Trace; c01435a6 <shrink_dcache_parent+16/30>
Trace; c013714f <kill_super+5f/f0>
Trace; c013afea <path_release+2a/40>
Trace; c01472d0 <sys_umount+80/90>
Trace; c01472ee <sys_oldumount+e/20>
Trace; c0106d93 <system_call+33/38>

Code;  c014322d <prune_dcache+5d/140>
00000000 <_EIP>:
Code;  c014322d <prune_dcache+5d/140>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c014322f <prune_dcache+5f/140>
   2:   59                        pop    %ecx
Code;  c0143230 <prune_dcache+60/140>
   3:   01 1e                     add    %ebx,(%esi)
Code;  c0143232 <prune_dcache+62/140>
   5:   d6                        (bad)
Code;  c0143233 <prune_dcache+63/140>
   6:   25 c0 8d 56 10            and    $0x10568dc0,%eax
Code;  c0143238 <prune_dcache+68/140>
   b:   8b 4a 04                  mov    0x4(%edx),%ecx
Code;  c014323b <prune_dcache+6b/140>
   e:   8b 46 10                  mov    0x10(%esi),%eax
Code;  c014323e <prune_dcache+6e/140>
  11:   89 48 04                  mov    %ecx,0x4(%eax)

kernel BUG at dcache.c:345!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c014322d>]    Not tainted
EFLAGS: 00010206
eax: 00040000   ebx: ea612c70   ecx: c281b394   edx: dd1f64bc
esi: ea612c58   edi: c281b394   ebp: c2825f00   esp: c2825ed4
ds: 0018   es: 0018   ss: 0018
Process kswapd (pid: 4, stackpage=c2825000)
Stack: 00000187 00000003 c2825ef4 c0128525 c281b418 d8728000 c281b418 00000006
       00000000 c233bfb0 00000003 c2825f0c c01435e2 00000d1d c2825f4c c012a284
       00000006 000001d0 c2824000 ffffffff 00012199 000001d0 c02970d0 c2825f50
Call Trace:    [<c0128525>] [<c01435e2>] [<c012a284>] [<c012a462>] [<c012a501>]
  [<c012a580>] [<c012a739>] [<c012a7b6>] [<c012a8ff>] [<c012a860>] [<c0105000>]
  [<c01055b6>] [<c012a860>]
Code: 0f 0b 59 01 1e d6 25 c0 8d 56 10 8b 4a 04 8b 46 10 89 48 04

>>EIP; c014322d <prune_dcache+5d/140>   <=====

>>ebx; ea612c70 <_end+2a2f6734/384f6ac4>
>>ecx; c281b394 <_end+24fee58/384f6ac4>
>>edx; dd1f64bc <_end+1ced9f80/384f6ac4>
>>esi; ea612c58 <_end+2a2f671c/384f6ac4>
>>edi; c281b394 <_end+24fee58/384f6ac4>
>>ebp; c2825f00 <_end+25099c4/384f6ac4>
>>esp; c2825ed4 <_end+2509998/384f6ac4>

Trace; c0128525 <__kmem_cache_shrink_locked+45/70>
Trace; c01435e2 <shrink_dcache_memory+22/40>
Trace; c012a284 <shrink_cache+294/370>
Trace; c012a462 <refill_inactive+102/170>
Trace; c012a501 <shrink_caches+31/40>
Trace; c012a580 <try_to_free_pages_zone+70/f0>
Trace; c012a739 <kswapd_balance_pgdat+59/b0>
Trace; c012a7b6 <kswapd_balance+26/40>
Trace; c012a8ff <kswapd+9f/c0>
Trace; c012a860 <kswapd+0/c0>
Trace; c0105000 <_stext+0/0>
Trace; c01055b6 <arch_kernel_thread+26/40>
Trace; c012a860 <kswapd+0/c0>

Code;  c014322d <prune_dcache+5d/140>
00000000 <_EIP>:
Code;  c014322d <prune_dcache+5d/140>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c014322f <prune_dcache+5f/140>
   2:   59                        pop    %ecx
Code;  c0143230 <prune_dcache+60/140>
   3:   01 1e                     add    %ebx,(%esi)
Code;  c0143232 <prune_dcache+62/140>
   5:   d6                        (bad)
Code;  c0143233 <prune_dcache+63/140>
   6:   25 c0 8d 56 10            and    $0x10568dc0,%eax
Code;  c0143238 <prune_dcache+68/140>
   b:   8b 4a 04                  mov    0x4(%edx),%ecx
Code;  c014323b <prune_dcache+6b/140>
   e:   8b 46 10                  mov    0x10(%esi),%eax
Code;  c014323e <prune_dcache+6e/140>
  11:   89 48 04                  mov    %ecx,0x4(%eax)
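A note on reading the "Code:" line in the oops above. The instructions that
ksymoops prints after ud2a (pop %ecx, add %ebx,(%esi), "(bad)") are probably
not instructions at all. As far as I know, the 2.4 i386 BUG() macro emits
ud2 followed by a 16-bit line number and a 32-bit pointer to the file name
string. The sketch below assumes that layout; decode_bug_site() and struct
bug_site are hypothetical names made up for illustration, not kernel or
ksymoops APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decoded form of a BUG() site, assuming the layout
 * "ud2; .word line; .long file" (an assumption, see the note above). */
struct bug_site {
    int      is_ud2;     /* 1 if the bytes start with the ud2 opcode 0f 0b */
    uint16_t line;       /* little-endian 16-bit line number after the opcode */
    uint32_t file_addr;  /* little-endian 32-bit address of the file name */
};

/* Hypothetical helper: decode the first 8 bytes of an oops "Code:" line. */
struct bug_site decode_bug_site(const uint8_t *code, size_t len)
{
    struct bug_site site = {0, 0, 0};

    assert(code != NULL);
    if (len < 8 || code[0] != 0x0f || code[1] != 0x0b)
        return site;    /* not a ud2-based BUG() site */

    site.is_ud2 = 1;
    site.line = (uint16_t)(code[2] | (code[3] << 8));
    site.file_addr = (uint32_t)code[4] |
                     ((uint32_t)code[5] << 8) |
                     ((uint32_t)code[6] << 16) |
                     ((uint32_t)code[7] << 24);
    return site;
}

/* The leading bytes of the "Code:" line from both oopses above. */
const uint8_t oops_code[] = {0x0f, 0x0b, 0x59, 0x01, 0x1e, 0xd6, 0x25, 0xc0};
```

Under that assumption the bytes 59 01 decode to 0x0159, which is 345 and
matches "kernel BUG at dcache.c:345!", and the next four bytes would be the
address of the "dcache.c" string rather than code.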
* Re: Possible dcache BUG
From: Gene Heskett @ 2004-08-05 2:16 UTC
To: linux-kernel

On Monday 02 August 2004 09:14, Brett Charbeneau wrote:
>Greetings,
>
> I am getting the oops below - twice since 7/26, but I haven't a
>clue what's causing it.
> I am not a subscriber, so any replies directed to me would be
>gratefully received.
> Thank you for your hard work on this!

The attachment this gentleman included specifically points to
prune_dcache().  Thats nice.  It also means I'm not alone.  See the
'prune_dcache() Oops, the saga continues' thread.

I got in about 9pm after spending the afternoon inside a tv
transmitter, having left the house about 1ish.  Black screen.
keyboard leds out.  The usual.

Last log entry was at 14:49 EDT this afternoon.  Some file fam
couldn't find message.  Whenever it went down, it went so fast there
was no logged trace.  The next entry is syslogd restarting after I'd
hit the reset button.

So whatever took it down, did it all by itself as the only non-system
processes running were setiathome, X and kmail (from kde3.2,
kde3.2.3, and kde3.3-beta2, makes no diff, all fail in
prune_dcache() ) making an every 10 minute run to get the mail.

I *thought* I had PREEMPT turned off, but when I did a make xconfig,
it was turned on.  So its now off, and a new 2.6.8-rc3 is building.
It was frame pointers I had turned on for the last build, still on for
this one underway now.

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.
* Re: Possible dcache BUG
From: Andrew Morton @ 2004-08-05 3:46 UTC
To: gene.heskett; +Cc: linux-kernel

Gene Heskett <gene.heskett@verizon.net> wrote:
>
> On Monday 02 August 2004 09:14, Brett Charbeneau wrote:
> >Greetings,
> >
> > I am getting the oops below - twice since 7/26, but I haven't a
> >clue what's causing it.
> > I am not a subscriber, so any replies directed to me would be
> >gratefully received.
> > Thank you for your hard work on this!
>
> The attachment this gentleman included specifically points to
> prune_dcache().  Thats nice.  It also means I'm not alone.  See the
> 'prune_dcache() Oops, the saga continues' thread.

Except he's running a 2.4 kernel.

Is there any reason why I'm wrong in thinking that you have dodgy
hardware?
* Re: Possible dcache BUG
From: Gene Heskett @ 2004-08-05 4:31 UTC
To: linux-kernel

On Wednesday 04 August 2004 23:46, Andrew Morton wrote:
>Gene Heskett <gene.heskett@verizon.net> wrote:
>> On Monday 02 August 2004 09:14, Brett Charbeneau wrote:
>> >Greetings,
>> >
>> > I am getting the oops below - twice since 7/26, but I haven't a
>> >clue what's causing it.
>> > I am not a subscriber, so any replies directed to me would be
>> >gratefully received.
>> > Thank you for your hard work on this!
>>
>> The attachment this gentleman included specifically points to
>> prune_dcache().  Thats nice.  It also means I'm not alone.  See
>> the 'prune_dcache() Oops, the saga continues' thread.
>
>Except he's running a 2.4 kernel.
>
>Is there any reason why I'm wrong in thinking that you have dodgy
>hardware?

Well, it has, in the past week, ran memtest86-3a for 12 full passes
over the whole gig of ram with no errors.  This was the longest test,
I gave it a 2 hour, 5 pass test before I ever booted linux the first
time on this motherboard over 2 weeks ago now, a new Biostar
M7NCD-Pro, with an nforce2(3?) chipset.  I did that because I was
coming from an older board whose memory had been overstressed by a
failing video card and I wanted to make sure this new memory, nearly
$210 worth of it, was good.  I gave it another, probably 4 hour test
after the first couple of crashes, which it also passed.

And it got worse as the kernel versions incremented from 2.6.7.  I
can have the same fault in prune_dcache() while running a 2.6.7
kernel without an instant lockup, but it will eventually die, maybe
half an hour later.  Move to 2.6.7-mm1, which has a patch to
fs/dcache.c that remains untouched thru 2.6.8-rc2, and those kernels,
if they lock up, do it totally, often with nothing in the logs at
all.  That was the case today, on 2.6.8-rc3, which has a new dcache.c
patch in it if I read the release notes correctly.

If this is dodgy hardware, give me something to take to tcwo.com when
I ask for an rma.  Not having M$ windows of any kind here, I frankly
haven't had the inclination to look at the cd's that came with the
board.  Should I?

Or does linux have a hardware test suite I've not heard about?

--
Cheers, Gene
* Re: Possible dcache BUG
From: Chris Shoemaker @ 2004-08-05 0:44 UTC
To: Gene Heskett; +Cc: linux-kernel

On Thu, Aug 05, 2004 at 12:31:21AM -0400, Gene Heskett wrote:
> On Wednesday 04 August 2004 23:46, Andrew Morton wrote:
> >Gene Heskett <gene.heskett@verizon.net> wrote:
> >>
> >> The attachment this gentleman included specifically points to
> >> prune_dcache().  Thats nice.  It also means I'm not alone.  See
> >> the 'prune_dcache() Oops, the saga continues' thread.
> >
> >Except he's running a 2.4 kernel.
> >
> >Is there any reason why I'm wrong in thinking that you have dodgy
> >hardware?
>
> Well, it has, in the past week, ran memtest86-3a for 12 full passes
> over the whole gig of ram with no errors.  This was the longest test,
> I gave it a 2 hour, 5 pass test before I ever booted linux the first
> time on this motherboard over 2 weeks ago now, a new Biostar
> M7NCD-Pro, with an nforce2(3?) chipset.  I did that because I was
> coming from an older board whose memory had been overstressed by a
> failing video card and I wanted to make sure this new memory, nearly
> $210 worth of it, was good.  I gave it another, probably 4 hour test
> after the first couple of crashes, which it also passed.  And it got
> worse as the kernel versions incremented from 2.6.7.  I can have the
> same fault in prune_dcache() while running a 2.6.7 kernel without an
> instant lockup, but it will eventually die, maybe half an hour later.
> Move to 2.6.7-mm1, which has a patch to fs/dcache.c that remains
> untouched thru 2.6.8-rc2, and those kernels, if they lock up, do it
> totally, often with nothing in the logs at all.  That was the case
> today, on 2.6.8-rc3, which has a new dcache.c patch in it if I read
> the release notes correctly.
>
> If this is dodgy hardware, give me something to take to tcwo.com when
> I ask for an rma.  Not having M$ windows of any kind here, I frankly
> haven't had the inclination to look at the cd's that came with the
> board.  Should I?
>
> Or does linux have a hardware test suite I've not heard about?

Gene,
	I sympathize with you.  Back in March and April I was seeing
oopses in prune_dcache() once every few days.  After tracing the asm
down for a few of them, I found one that looked like a 3 bit flip and
then one that looked like a single bit flip.  I memtested my RAM for
days with no failure.  I tried cpuburn.  I looped over kernel
compiles.  I couldn't make it fail, but every day or two, as long as I
wasn't trying, I'd get an oops, and more than 50% were in
prune_dcache.  I believed that there was a correspondence with low
memory conditions, but I never proved this.  I _added_ a memory module
(keeping everything I had) and I compiled 2.6.7-rc3 on Jun 10th.  I
haven't oopsed since.  (I think I may also have turned off PREEMPT
around this time, so that's why I suggested it earlier.)

	FWIW, I've seen no fewer than 4 independent reports that looked
suspiciously like yours and mine over the past 3 months.  Maybe we all
have bad hardware, and memtest86 just isn't stressful enough to show
it.  The alternative is that there's some bug that has affected
several versions of 2.6 (and maybe 2.4) that seems to hit in low
memory conditions (e.g. as a result of a 4am cron.daily, or a large
rsync).

	If you're curious, search google groups for "+oops +prune_dcache
group:linux.kernel", sort by date and look through the first 3 or 4
pages.  You'll see the same story with the same oopses over and over.
I know the few single bit flips are _probably_ bad hardware, but the
more similarities I see, the more I wonder.

	But, since my problems have completely gone away by adding more
RAM, I haven't been motivated to track it down anymore.

	Sorry I can't be more helpful.  Good luck.

-chris
* Re: Possible dcache BUG
From: Denis Vlasenko @ 2004-08-05 8:35 UTC
To: Chris Shoemaker, Gene Heskett; +Cc: linux-kernel

> FWIW, I've seen no fewer than 4 independent reports that looked
> suspiciously like yours and mine over the past 3 months.  Maybe we all
> have bad hardware, and memtest86 just isn't stressful enough to show it.
> The alternative is that there's some bug that has affected several
> versions of 2.6 (and maybe 2.4) that seems to hit in low memory
> conditions (e.g. as a result of a 4am cron.daily, or a large rsync).
>
> If you're curious, search google groups for "+oops +prune_dcache
> group:linux.kernel", sort by date and look through the first 3 or 4
> pages.  You'll see the same story with the same oopses over and over.
> I know the few single bit flips are _probably_ bad hardware, but the more
> similarities I see, the more I wonder.
>
> But, since my problems have completely gone away by adding more RAM,
> I haven't been motivated to track it down anymore.

Let's rule out PREEMPT first

> Sorry I can't be more helpful.  Good luck.

Maybe turn PREEMPT back on?
--
vda
* Re: Possible dcache BUG
From: Gene Heskett @ 2004-08-05 14:14 UTC
To: linux-kernel

On Thursday 05 August 2004 04:35, Denis Vlasenko wrote:
>
>Let's rule out PREEMPT first
>
>> Sorry I can't be more helpful.  Good luck.
>
>Maybe turn PREEMPT back on?

I found it was on when I checked last night, and turned it off for
this build.  About 9 hours uptime now.

--
Cheers, Gene
* Re: Possible dcache BUG
From: Gene Heskett @ 2004-08-05 13:48 UTC
To: linux-kernel

On Wednesday 04 August 2004 20:44, Chris Shoemaker wrote:
>On Thu, Aug 05, 2004 at 12:31:21AM -0400, Gene Heskett wrote:
>> On Wednesday 04 August 2004 23:46, Andrew Morton wrote:
>> >Gene Heskett <gene.heskett@verizon.net> wrote:
>> >> The attachment this gentleman included specifically points to
>> >> prune_dcache().  Thats nice.  It also means I'm not alone.
>> >> See the 'prune_dcache() Oops, the saga continues' thread.
>> >
>> >Except he's running a 2.4 kernel.

I didn't take note of that.  What triggered my response was the
prune_dcache() problem.  In my case we've taken a couple of them
apart and the oops is actually in the _dput() statement where the
eax register contains a very small, but non-zero value, like maybe
0x00000820.  Thats a bit difficult as some of this code is marked as
__inline, and can reach over 130 bytes between the labels we put into
the srcs.  IMO, thats too much to inline if its used more than once,
and it is.  And guess what, both prune_dcache() and _dput() are
inlined...

>> >Is there any reason why I'm wrong in thinking that you have dodgy
>> >hardware?
>>
>> Well, it has, in the past week, ran memtest86-3a for 12 full
>> passes over the whole gig of ram with no errors.  This was the
>> longest test, I gave it a 2 hour, 5 pass test before I ever booted
>> linux the first time on this motherboard over 2 weeks ago now, a
>> new Biostar M7NCD-Pro, with an nforce2(3?) chipset.  I did that
>> because I was coming from an older board whose memory had been
>> overstressed by a failing video card and I wanted to make sure
>> this new memory, nearly $210 worth of it, was good.  I gave it
>> another, probably 4 hour test after the first couple of crashes,
>> which it also passed.  And it got worse as the kernel versions
>> incremented from 2.6.7.  I can have the same fault in
>> prune_dcache() while running a 2.6.7 kernel without an instant
>> lockup, but it will eventually die, maybe half an hour later.  Move
>> to 2.6.7-mm1, which has a patch to fs/dcache.c that remains
>> untouched thru 2.6.8-rc2, and those kernels, if they lock up, do
>> it totally, often with nothing in the logs at all.  That was the
>> case today, on 2.6.8-rc3, which has a new dcache.c patch in it if
>> I read the release notes correctly.
>>
>> If this is dodgy hardware, give me something to take to tcwo.com
>> when I ask for an rma.  Not having M$ windows of any kind here, I
>> frankly haven't had the inclination to look at the cd's that came
>> with the board.  Should I?
>>
>> Or does linux have a hardware test suite I've not heard about?
>
>Gene,
> I sympathize with you.  Back in March and April I was seeing
>oopses in prune_dcache() once every few days.  After tracing the asm
>down for a few of them, I found one that looked like a 3 bit flip
> and then one that looked like a single bit flip.  I memtested my
> RAM for days with no failure.  I tried cpuburn.  I looped over
> kernel compiles.  I couldn't make it fail, but every day or two, as
> long as I wasn't trying, I'd get an oops, and more than 50% were in
> prune_dcache.  I believed that there was a correspondence with low
> memory conditions, but I never proved this.  I _added_ a memory
> module (keeping everything I had) and I compiled 2.6.7-rc3 on Jun
> 10th.  I haven't oopsed since.  (I think I may also have turned off
> PREEMPT around this time, so that's why I suggested it earlier.)
>
> FWIW, I've seen no fewer than 4 independent reports that looked
>suspiciously like yours and mine over the past 3 months.  Maybe we
> all have bad hardware, and memtest86 just isn't stressful enough to
> show it.  The alternative is that there's some bug that has affected
> several versions of 2.6 (and maybe 2.4) that seems to hit in low
> memory conditions (e.g. as a result of a 4am cron.daily, or a large
> rsync).

That does seem to correlate slightly, but yesterdays was in the
middle of the afternoon while I was elsewhere, and very little is
cron related at that time of day.

> If you're curious, search google groups for "+oops +prune_dcache
>group:linux.kernel", sort by date and look through the first 3 or 4
>pages.  You'll see the same story with the same oopses over and
> over.  I know the few single bit flips are _probably_ bad hardware,
> but the more similarities I see, the more I wonder.

Me too, says he in a plaintive voice.

> But, since my problems have completely gone away by adding more
> RAM, I haven't been motivated to track it down anymore.
>
> Sorry I can't be more helpful.  Good luck.

This is a '3 slots for ram' board, and according to the docs, the
Dual Channel DDR 400 banking scheme only works if the ram is in the
1st and 3rd slots, so thats where I put it.  memtest86 reports a ram
bandwidth of around 1.2 Gb/sec, and an L1 cache bandwidth of around
12Gb/sec.  No L2 present.

I might add that the first time I ran memtest86, the bios was
misconfigured, at factory defaults, and was running the athlon 2800
at 3200, and the memory bus at over 450 mhz.  No problem, but I did
find an FSB and multiplier setting that gave a 400Mb bus, and which
says the athlon-XP is a 2800+, so I figure that should be correct.
The defaults didn't always want to post, but gave no other problems
once it had.

If I put another stick in the last, center slot, how does the
hardware accept that?  I'd have to go get one as the 256's in the old
board are known dodgy.

Would this incipient Oom condition not be handled correctly?  If
thats the cause, then maybe that portion of the code needs looked at,
by experts the likes of which I don't pretend to be.  But with a Gig
of ram, I don't recall it ever using any swap.  But it's perilously
close to that right now according to top:

top - 08:58:30 up  9:59,  5 users,  load average: 1.21, 1.31, 1.16
Tasks: 100 total,   2 running,  97 sleeping,   0 stopped,   1 zombie
Cpu(s):  5.6% us,  2.6% sy, 91.4% ni,  0.0% id,  0.0% wa,  0.3% hi,  0.0% si
Mem:   1036020k total,  1018644k used,    17376k free,   230960k buffers
Swap:  3857104k total,        0k used,  3857104k free,   119552k cached

mmm, I wonder who the zombie is.  Ahh, it's ~/bin/its-daylight.  It's
a script that cron triggered, and which changes the mode of the
heyu/xtend stuff for daytime operations.  Its (a bash script)
apparently hung looking for a response it didn't get.  I have 3 of
those at various times of the day and I've never gotten email from
that one.  The mode change does occur though...  FWIW heyu has been
fixed, the distro version has a severe scope problem from a missing
'}' which was not caught by the compiler, but by a tool I wrote years
ago for os9 that I've ported to linux!  The heyu author ): didn't
seem to be interested in fixing it either.

I'll go take a look at it after I've sent this, but it does bring up
a sore point.  linux doesn't get this right, os9 did.  zombies are
killable by os9, it simply takes it out of the execution queue, and
reclaims all resources used back into the free pool, no questions
asked or expected.  We shouldn't have to reboot just to kill a
fscking zombie...

In any event, PREEMPT is now off, if this takes a dump, then the
hi-mem support gets turned off and PREEMPT back on.  One thing at a
time.

Thanks for the discussion, it was "enlightening" :-)

>-chris

--
Cheers, Gene
[parent not found: <200408210118.02011.vda@port.imtp.ilyichevsk.odessa.ua>]
* Re: Possible dcache BUG
From: Gene Heskett @ 2004-08-21 1:40 UTC
To: linux-kernel; +Cc: Denis Vlasenko

On Friday 20 August 2004 18:18, Denis Vlasenko wrote:
>> mmm, I wonder who the zombie is.  Ahh, it's ~/bin/its-daylight.
>> It's a script that cron triggered, and which changes the mode of
>> the heyu/xtend stuff for daytime operations.  Its (a bash script)
>> apparently hung looking for a response it didn't get.  I have 3
>> of those at various times of the day and I've never gotten email
>> from that one.  The mode change does occur though...  FWIW heyu
>> has been fixed, the distro version has a severe scope problem
>> from a missing '}' which was not caught by the compiler, but by
>> a tool I wrote years ago for os9 that I've ported to linux!  The
>> heyu author ): didn't seem to be interested in fixing it either.
>>
>> I'll go take a look at it after I've sent this, but it does bring
>> up a sore point.  linux doesn't get this right, os9 did.  zombies
>> are killable by os9, it simply takes it out of the execution
>> queue, and reclaims all resources used back into the free pool, no
>> questions asked or expected.  We shouldn't have to reboot just to
>> kill a fscking zombie...
>
>zombie is not much more than an exit code to be collected by
>wait() syscall.  All other resources are already freed.
>
>Zombies result when parent does not wait() for dead children.
>Trivial example:
>
>#!/bin/sh
>sleep 10 &
>exec env - sleep 100
>
>26752 pts/0    S      0:00 sleep 100
>26753 pts/0    Z      0:00 [sleep <defunct>]
>
>Such zombies got reparented to init *as soon as parent dies itself*.
>Properly functioning init constantly wait()s for any unexpected
> children, so it takes care of zombies.
>--
>vda

Oh oh, looks like I need a lesson in bash then.  The whole basic idea
of what I was doing there was for the parent shell to go away,
leaving the child process sitting there until its done some 10
seconds later.  If I didn't do that, then cron seemed to hang on the
first execution as if it was dutifully waiting for bash to exit...

The bash manual I have is both too concise, and too verbose because
bash is as close to emacs as I can think of when looking for a
universal executer.

In the crontab its this:

00 05 * * * /root/bin/its-daylight

Then /root/bin/its-daylight calls 2 other scripts using the "&"
syntax.  So I guess its time to RTFM on bash again.  Thanks.

Now, to get this back on-thread..  I switched the memory sticks to
each others sockets this afternoon.  And "memburn 512" megabytes,
which puts me into the swap about 70 megs, is still running with no
detected errors in 1162 loops.  About 4:30 elapsed time so far.

I've got all my fingers and toes crossed, and everything but tied a
knot in it, hoping this may be the end of the problem.  If not, then
the nightmare continues.

--
Cheers, Gene
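For readers who want to see Denis's point about wait() in C rather than
shell, here is a minimal sketch. It assumes Linux with /proc mounted, and
the names make_and_reap_zombie(), proc_state() and struct zombie_report are
made up for illustration. The child exits at once, the parent uses waitid()
with WNOWAIT to learn that it has died without collecting it, looks at the
state letter in /proc/<pid>/stat, and only then calls waitpid(), which is
the step that makes the zombie go away.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct zombie_report {
    char state_before_wait; /* 'Z' expected on Linux, '?' if /proc is unreadable */
    int  exit_code;         /* status collected by waitpid(), -1 on error */
};

/* Read the one-letter process state from /proc/<pid>/stat (Linux-specific). */
char proc_state(pid_t pid)
{
    char path[64], buf[512];
    char *paren;
    FILE *fp;

    assert(pid > 0);
    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    fp = fopen(path, "r");
    if (fp == NULL)
        return '?';
    if (fgets(buf, sizeof buf, fp) == NULL) {
        fclose(fp);
        return '?';
    }
    fclose(fp);

    /* Format is "pid (comm) state ..."; comm may contain ')' so use the last one. */
    paren = strrchr(buf, ')');
    if (paren == NULL || paren[1] != ' ' || paren[2] == '\0')
        return '?';
    return paren[2];
}

struct zombie_report make_and_reap_zombie(void)
{
    struct zombie_report r = {'?', -1};
    siginfo_t info;
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return r;
    if (pid == 0)
        _exit(7);   /* child: die immediately with a recognisable status */

    /* Block until the child has exited, but leave it uncollected (WNOWAIT). */
    memset(&info, 0, sizeof info);
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) != 0)
        return r;
    r.state_before_wait = proc_state(pid);

    /* Collect the exit code; after this the zombie entry is gone. */
    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        r.exit_code = WEXITSTATUS(status);
    return r;
}
```

A parent that never makes the final waitpid() call (or exits so that init
can make it) is what leaves the "Z" entry sitting in ps and top.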
* Re: Possible dcache BUG
From: Denis Vlasenko @ 2004-08-05 8:33 UTC
To: gene.heskett, linux-kernel

> Well, it has, in the past week, ran memtest86-3a for 12 full passes
> over the whole gig of ram with no errors.  This was the longest test,
> I gave it a 2 hour, 5 pass test before I ever booted linux the first
> time on this motherboard over 2 weeks ago now, a new Biostar
> M7NCD-Pro, with an nforce2(3?) chipset.  I did that because I was
> coming from an older board whose memory had been overstressed by a
> failing video card and I wanted to make sure this new memory, nearly
> $210 worth of it, was good.  I gave it another, probably 4 hour test
> after the first couple of crashes, which it also passed.  And it got

You may use cpuburn to test RAM/CPU too.

Although I have a memory which, when clocked a bit too high,
pass both memtest86 and cpuburn for extended periods of time,
yet large compile runs die with sig11 sometimes.  Using a tiny
bit less aggressive clocking helped. :)
--
vda
* Re: Possible dcache BUG
From: Gene Heskett @ 2004-08-05 14:19 UTC
To: linux-kernel

On Thursday 05 August 2004 04:33, Denis Vlasenko wrote:
>> Well, it has, in the past week, ran memtest86-3a for 12 full
>> passes over the whole gig of ram with no errors.  This was the
>> longest test, I gave it a 2 hour, 5 pass test before I ever booted
>> linux the first time on this motherboard over 2 weeks ago now, a
>> new Biostar M7NCD-Pro, with an nforce2(3?) chipset.  I did that
>> because I was coming from an older board whose memory had been
>> overstressed by a failing video card and I wanted to make sure
>> this new memory, nearly $210 worth of it, was good.  I gave it
>> another, probably 4 hour test after the first couple of crashes,
>> which it also passed.  And it got
>
>You may use cpuburn to test RAM/CPU too.

Setiathome should be doing a pretty good job of that, the cpu is at
100% 99.99% of the time.  Only going down for a few seconds as its
managing script switches the link to a new data packet directory when
its done with the current one.  I keep 101 packets cached here. :)

>Although I have a memory which, when clocked a bit too high,
>pass both memtest86 and cpuburn for extended periods of time,
>yet large compile runs die with sig11 sometimes.  Using a tiny
>bit less aggressive clocking helped. :)

--
Cheers, Gene
[parent not found: <200408070203.35268.vda@port.imtp.ilyichevsk.odessa.ua>]
* Re: Possible dcache BUG
From: Gene Heskett @ 2004-08-07 1:28 UTC
To: linux-kernel; +Cc: Denis Vlasenko

On Friday 06 August 2004 19:03, Denis Vlasenko wrote:
>Hi Gene,
>
>Please do not remove my address from To or CC
>fields, I can miss your emails otherwise.
>
Denis: Mmm, sorry.  I was in the habit of using a button on kmail
that replies only to the mailing list, thinking that then I wasn't
bombarding everyone with 2 or more copies of my replies.  I've now
re-arranged it so that I have a "reply all" button, and will use that
one from now on unless the subject is really OT.

Linus: One comment re the patch, I'm seeing a huge slowdown in the
seti processing, its only done about 2.5 units since 6am local, and
it should be well into the 4th by now.

Anybody: Speaking of somewhat OT, what is the command I should use to
actually turn on the PREEMPT option in the kernel?  Its on in the
compile, but I think I read someplace where I had to do an
"echo 1 >someplace in /proc" to actually enable it.  I've survived
over 24 hours now with the patch Linus sent, and I thought maybe I'd
get some exercise pushing my luck :)

[...]

--
Cheers Denis, Gene
* Re: Possible dcache BUG 2004-08-05 8:33 ` Denis Vlasenko 2004-08-05 14:19 ` Gene Heskett @ 2004-08-05 21:26 ` Chris Shoemaker 1 sibling, 0 replies; 146+ messages in thread From: Chris Shoemaker @ 2004-08-05 21:26 UTC (permalink / raw) To: Denis Vlasenko; +Cc: gene.heskett, linux-kernel On Thu, Aug 05, 2004 at 11:33:44AM +0300, Denis Vlasenko wrote: > > You may use cpuburn to test RAM/CPU too. > > Although I have a memory which, when clocked a bit too high, > pass both memtest86 and cpuburn for extended periods of time, > yet large compile runs die with sig11 sometimes. Using a tiny > bit less aggressive clocking helped. :) > -- > vda Oh yes, now I remember that it was you who recommended cpuburn to me back in April/May or so. I also was suspicious that neither memtest86 nor cpuburn was really stressful enough, but the large-compiles-in-a-loop weren't any better for me. I would _love_ to just have some confident test to say "yep, your hardware is bad, go buy a shiny new box" :) I've seen memtest86 actually find bad RAM on a machine before, so I know it works _sometimes_. Can anyone say the same for cpuburn? What does a failure look like, and were there correlated symptoms like kernel oopses? -chris > - > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-05 2:16 ` Gene Heskett 2004-08-05 3:46 ` Andrew Morton @ 2004-08-05 7:25 ` Linus Torvalds 2004-08-05 7:31 ` Andrew Morton ` (2 more replies) 1 sibling, 3 replies; 146+ messages in thread From: Linus Torvalds @ 2004-08-05 7:25 UTC (permalink / raw) To: Gene Heskett; +Cc: Kernel Mailing List, Andrew Morton On Wed, 4 Aug 2004, Gene Heskett wrote: > > I *thought* I had PREEMPT turned off, but when I did a make xconfig, > it was turned on. So its now off, and a new 2.6.8-rc3 is building. > It was frame pointers I had turned on for the last build, still on for > this one underway now. Your latest bug report definitely had preempt on, you could see the preempt code in the oops output when disassembled. Also, could you please enable CONFIG_DEBUG_BUGVERBOSE by hand if you use the -mm tree, since you definitely hit a BUG() in there somewhere, but in the -mm tree, the BUG() message is totally unreadable unless you enable BUGVERBOSE (and it's not in the config file). (Andrew - I think you should drop that patch, or at least enable BUGVERBOSE on x86 - it looks like it's disabled and with no way to enable it in the current -mm tree..) I _suspect_ you hit the new "list_del-debug.patch" in Andrew's tree, because in my tree there are no BUG_ON's in prune_dcache() at all. If so, I think the last oops you had was BUG_ON(entry->next->prev != entry); in list_del(), but the fact is, the _interesting_ part in prune_dcache() ends up being the "list_del_init()" at the top, which is _not_ instrumented by the list_del-debug patch. So what I'd actually _like_ you to do is: - test 2.6.8-rc3, but with the "list_del-debug.patch" applied (appended). That way the BUG message will actually be readable. - add the same two BUG_ON() to "list_del_init()" too, for better coverage. Most of the dcache uses the "init" version. - keep PREEMPT on, since it is quite possible (likely) that this is a preempt problem. I'd love to see if you can hit the BUG() that way. 
Linus ---- >From Manfred Spraul A list_del debugging check. Signed-off-by: Andrew Morton <akpm@osdl.org> --- 25-akpm/include/linux/list.h | 3 +++ 1 files changed, 3 insertions(+) diff -puN include/linux/list.h~list_del-debug include/linux/list.h --- 25/include/linux/list.h~list_del-debug Mon Jun 14 16:44:07 2004 +++ 25-akpm/include/linux/list.h Mon Jun 14 16:51:27 2004 @@ -6,6 +6,7 @@ #include <linux/stddef.h> #include <linux/prefetch.h> #include <asm/system.h> +#include <asm/bug.h> /* * These are non-NULL pointers that will result in page faults @@ -160,6 +161,8 @@ static inline void __list_del(struct lis */ static inline void list_del(struct list_head *entry) { + BUG_ON(entry->prev->next != entry); + BUG_ON(entry->next->prev != entry); __list_del(entry->prev, entry->next); entry->next = LIST_POISON1; entry->prev = LIST_POISON2; _ ^ permalink raw reply [flat|nested] 146+ messages in thread
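[Editorial sketch, not part of the original thread.] The request above, to add the same two BUG_ON() checks to list_del_init(), might look roughly like the following in a userspace mock-up. This is only an illustration under stated assumptions: it is not the kernel's include/linux/list.h, BUG_ON() is mapped onto assert() here, the list helpers are re-declared locally, and list_debug_demo() is a hypothetical helper that exists only to exercise them.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in (assumption): the kernel's BUG_ON() oopses, here we assert. */
#define BUG_ON(cond) assert(!(cond))

struct list_head {
	struct list_head *next, *prev;
};

static inline void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

/* Insert entry right after head. */
static inline void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static inline void __list_del(struct list_head *prev, struct list_head *next)
{
	next->prev = prev;
	prev->next = next;
}

/*
 * list_del_init() carrying the same two consistency checks that the
 * patch above adds to list_del(): both neighbours must still point
 * back at the entry being removed.
 */
static inline void list_del_init(struct list_head *entry)
{
	BUG_ON(entry->prev->next != entry);
	BUG_ON(entry->next->prev != entry);
	__list_del(entry->prev, entry->next);
	INIT_LIST_HEAD(entry);
}

static inline int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Hypothetical self-check: returns 0 if a well-formed list passes the checks. */
int list_debug_demo(void)
{
	struct list_head head, a, b;

	INIT_LIST_HEAD(&head);
	list_add(&a, &head);
	list_add(&b, &head);	/* list is now head -> b -> a */

	list_del_init(&a);
	if (head.next != &b || b.next != &head || !list_empty(&a))
		return 1;

	list_del_init(&b);
	return (list_empty(&head) && list_empty(&b)) ? 0 : 1;
}
```

Note that the checks read the neighbouring entries before unlinking, so an entry whose neighbours were freed or overwritten trips the assertion instead of silently corrupting the list. A repeated list_del_init() on an already self-linked entry still passes, since such an entry is its own neighbour.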
* Re: Possible dcache BUG 2004-08-05 7:25 ` Linus Torvalds @ 2004-08-05 7:31 ` Andrew Morton 2004-08-05 8:33 ` Denis Vlasenko 2004-08-06 2:50 ` Linus Torvalds 2 siblings, 0 replies; 146+ messages in thread From: Andrew Morton @ 2004-08-05 7:31 UTC (permalink / raw) To: Linus Torvalds; +Cc: gene.heskett, linux-kernel Linus Torvalds <torvalds@osdl.org> wrote: > > (Andrew - I think you should drop that patch, or at least enable > BUGVERBOSE on x86 - it looks like it's disabled and with no way to enable > it in the current -mm tree..) ah, OK. I'll put it back to `#if 1', thanks. ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-05 7:25 ` Linus Torvalds 2004-08-05 7:31 ` Andrew Morton @ 2004-08-05 8:33 ` Denis Vlasenko 2004-08-05 14:55 ` Gene Heskett 2004-08-05 16:26 ` Linus Torvalds 2004-08-06 2:50 ` Linus Torvalds 2 siblings, 2 replies; 146+ messages in thread From: Denis Vlasenko @ 2004-08-05 8:33 UTC (permalink / raw) To: Linus Torvalds, Gene Heskett; +Cc: Kernel Mailing List, Andrew Morton Hi Linus, On Thursday 05 August 2004 10:25, Linus Torvalds wrote: > On Wed, 4 Aug 2004, Gene Heskett wrote: > > I *thought* I had PREEMPT turned off, but when I did a make xconfig, > > it was turned on. So its now off, and a new 2.6.8-rc3 is building. > > It was frame pointers I had turned on for the last build, still on for > > this one underway now. > > Your latest bug report definitely had preempt on, you could see the > preempt code in the oops output when disassembled. > > Also, could you please enable CONFIG_DEBUG_BUGVERBOSE by hand if you use > the -mm tree, since you definitely hit a BUG() in there somewhere, but in > the -mm tree, the BUG() message is totally unreadable unless you enable > BUGVERBOSE (and it's not in the config file). It is not a BUG(). It's an oops (dereferencing a d_op pointer with value 0x00000900+14 IIRC, Gene has complete disassembly with location of that event). It is not reproducible on request, but happens for him from time to time in the same place with the same bogus value of d_op. -- vda ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-05 8:33 ` Denis Vlasenko @ 2004-08-05 14:55 ` Gene Heskett 2004-08-05 16:26 ` Linus Torvalds 1 sibling, 0 replies; 146+ messages in thread From: Gene Heskett @ 2004-08-05 14:55 UTC (permalink / raw) To: linux-kernel On Thursday 05 August 2004 04:33, Denis Vlasenko wrote: >Hi Linus, > >On Thursday 05 August 2004 10:25, Linus Torvalds wrote: >> On Wed, 4 Aug 2004, Gene Heskett wrote: >> > I *thought* I had PREEMPT turned off, but when I did a make >> > xconfig, it was turned on. So its now off, and a new 2.6.8-rc3 >> > is building. It was frame pointers I had turned on for the last >> > build, still on for this one underway now. >> >> Your latest bug report definitely had preempt on, you could see >> the preempt code in the oops output when disassembled. >> >> Also, could you please enable CONFIG_DEBUG_BUGVERBOSE by hand if >> you use the -mm tree, since you definitely hit a BUG() in there >> somewhere, but in the -mm tree, the BUG() message is totally >> unreadable unless you enable BUGVERBOSE (and it's not in the >> config file). > >It is not a BUG(). > >It's an oops (dereferencing a d_op pointer with value 0x00000900+14 >IIRC, Gene has complete disassembly with location of that event). Unforch Denis, this is 2.6.8-rc3, the stuff we dissed was from 2.6.7, where it can be hit without (usually that is) killing the machine instantly. From 2.6.7-mm1 on, the death seems generally sudden and instant, generally no logs get written at all. >It is not reproducible on request, but happens for him from time >to time in the same place with the same bogus value of d_op. -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) 99.24% setiathome rank, not too shabby for a WV hillbilly Yahoo.com attorneys please note, additions to this message by Gene Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved. 
^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-05 8:33 ` Denis Vlasenko 2004-08-05 14:55 ` Gene Heskett @ 2004-08-05 16:26 ` Linus Torvalds 2004-08-05 18:06 ` Ingo Molnar ` (3 more replies) 1 sibling, 4 replies; 146+ messages in thread From: Linus Torvalds @ 2004-08-05 16:26 UTC (permalink / raw) To: Denis Vlasenko; +Cc: Gene Heskett, Kernel Mailing List, Andrew Morton On Thu, 5 Aug 2004, Denis Vlasenko wrote: > > It is not a BUG(). Oh yes it is. The 2.6.8-rc2-mm2 report definitely was a BUG(). Earlier ones may not have been, but on the other hand, earlier ones may not have had the BUG()-check for corrupted list_del() usage - it's not in the standard kernel, and I don't know when it was added to -mm. (We used to have it a _long_ time ago, but then we removed it because there were no reports of problems). > It's an oops (dereferencing a d_op pointer with value 0x00000900+14 > IIRC, Gene has complete disassembly with location of that event). .. and that must be because of some kind of pointer corruption, where the dentry was either freed twice or the dentry simply isn't a dentry at all, it just got to be used as such because of some bug. > It is not reproducible on request, but happens for him from time > to time in the same place with the same bogus value of d_op. I've followed the discussion. You may not have noticed that the last one was different. (And I _think_ it may have been the first time Gene did a -mm kernel, so I do believe that the list_del() debugging was the thing that caught it). Anyway, one other thing that makes me worry is the fact that Gene apparently has a K7. One of the things AMD has gotten wrong several times is prefetching, and it so happens that the dcache code is one of the users of the prefetch instruction. prune_dcache() in particular. So I'm also entertaining the notion that there's an actual prefetch data corruption, not just the known AMD bug with occasional spurious page faults. Who else has seen the problem? What CPUs are involved? 
Linus ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-05 16:26 ` Linus Torvalds @ 2004-08-05 18:06 ` Ingo Molnar 2004-08-05 18:50 ` Linus Torvalds 2004-08-05 21:10 ` Chris Shoemaker ` (2 subsequent siblings) 3 siblings, 1 reply; 146+ messages in thread From: Ingo Molnar @ 2004-08-05 18:06 UTC (permalink / raw) To: Linus Torvalds Cc: Denis Vlasenko, Gene Heskett, Kernel Mailing List, Andrew Morton [-- Attachment #1: Type: text/plain, Size: 1952 bytes --] * Linus Torvalds <torvalds@osdl.org> wrote: > Anyway, one other thing that makes me worry is the fact that Gene > apparently has a K7. One of the things AMD has gotten wrong several > times is prefetching, and it so happens that the dcache code is one of > the users of the prefetch instruction. prune_dcache() in particular. hm, i too happen to have an Athlon64 box (running the x86 kernel) where i can reproduce dcache pruning crashes after a few hours of testing using a near-vanilla kernel. The crash is triggered by two infinite loops of: while true; do du /; done while true; do dd if=/dev/zero of=/tmp/bigfile bs=1000000 count=500; sync; sleep 30; done using FC2, stock normal ext3, 1GB of RAM, single-disk IDE and nothing else. NOTE: i discovered these crashes while working on the voluntary-preempt stuff, so it's not a pristine kernel. But i reproduced it using 2.6.8-rc2 plus voluntary-preempt=1 (i.e. no softirq or hardirq redirection to process context) - so it does nothing that CONFIG_PREEMPT wouldn't do. (i had CONFIG_PREEMPT on but kernel_preemption=0.) I've attached 3 oopses. this patch does introduce a conditional reschedule in prune_icache: --- linux/fs/inode.c.orig +++ linux/fs/inode.c @@ -428,6 +429,8 @@ static void prune_icache(int nr_to_scan) for (nr_scanned = 0; nr_scanned < nr_to_scan; nr_scanned++) { struct inode *inode; + voluntary_resched_lock(&inode_lock); + if (list_empty(&inode_unused)) break; but it should be perfectly fine to do that there. 
NOTE2: i tried hard but couldnt reproduce the problem using the very same kernel and the same workload on a PIII box. Once i ran it overnight to check. Only the Athlon64 box does it. It could also be a hardware problem - albeit the box withstood days of memtest86. NOTE3: there's no history of instability on this box otherwise, but i only started doing this test 1-2 weeks ago. Ingo [-- Attachment #2: 11 --] [-- Type: text/plain, Size: 1425 bytes --] Unable to handle kernel paging request at virtual address ffffffd8 printing eip: c016a3d0 *pde = 00000000 Oops: 0000 [#1] PREEMPT SMP Modules linked in: CPU: 0 EIP: 0060:[<c016a3d0>] Not tainted VLI EFLAGS: 00010217 (2.6.8-rc2-mm2) EIP is at remove_inode_buffers+0x60/0xe0 eax: 00000000 ebx: c03ba9dc ecx: 00000000 edx: c03ba8d0 esi: c03ba8d0 edi: c0379b2a ebp: c4115ec4 esp: c4115eac ds: 007b es: 007b ss: 0068 Process kswapd0 (pid: 39, threadinfo=c4114000 task=c40aa070) Stack: c03ba8d0 c0379b76 00000001 c03ba8d8 c03ba8d0 00000000 c4115ef8 c0186c4c c03ba8d0 00000077 c4114000 00000000 0000004d 00000000 c4115ee4 c4115ee4 c4114000 c07fd6a0 00004e09 c4115f04 c0186df5 00000080 c4115f38 c014f4b3 Call Trace: [<c01059ff>] show_stack+0x8f/0xb0 [<c0105bb3>] show_registers+0x163/0x1d0 [<c0105dc6>] die+0xe6/0x1c0 [<c0117773>] do_page_fault+0x213/0x6c0 [<c0105674>] exception_start+0x6/0xe [<c0186c4c>] prune_icache+0x20c/0x390 [<c0186df5>] shrink_icache_memory+0x25/0x50 [<c014f4b3>] shrink_slab+0x123/0x1d0 [<c01511ee>] balance_pgdat+0x24e/0x2a0 [<c015130c>] kswapd+0xcc/0xe0 [<c0102899>] kernel_thread_helper+0x5/0xc Code: 00 e0 ff ff 21 e0 ff 40 14 8d 47 4c 89 45 ec 31 c0 86 47 4c 84 c0 0f 8e 79 00 00 00 8b 86 0c 01 00 00 39 d8 74 23 89 c1 8d 76 00 <8b> 41 d8 a8 02 75 5a 8b 01 8b 51 04 89 02 89 09 89 50 04 8b 03 <6>note: kswapd0[39] exited with preempt_count 1 [-- Attachment #3: 12 --] [-- Type: text/plain, Size: 1500 bytes --] Unable to handle kernel NULL pointer dereference at virtual address 00000104 printing eip: c014c0d1 *pde = 
36c9c001 *pte = 00000000 Oops: 0002 [#1] PREEMPT SMP Modules linked in: CPU: 0 EIP: 0060:[<c014c0d1>] Not tainted EFLAGS: 00010016 (2.6.8-rc2) EIP is at free_block+0x51/0xe0 eax: 00000100 ebx: e7d580c8 ecx: e7d58100 edx: e7d58100 esi: c40e5040 edi: 00000014 ebp: c413be38 esp: c413be1c ds: 007b es: 007b ss: 0068 Process kswapd0 (pid: 39, threadinfo=c413a000 task=c40a8070) Stack: c40e5040 eb864100 c40e5068 c40e5078 c4160050 00000282 e61addc0 c413be64 c014c1d0 c40e5040 c4160050 0000001b c4160050 c40e50a0 0000001b c4160040 00000282 e61addc0 c413be80 c014c792 c40e5040 c4160040 e61ade5c c413bee4 Call Trace: [<c0105a0f>] show_stack+0x8f/0xb0 [<c0105bc3>] show_registers+0x163/0x1c0 [<c0105d97>] die+0xb7/0x180 [<c0116fb3>] do_page_fault+0x213/0x6c9 [<c0105684>] exception_start+0x6/0xe [<c014c1d0>] cache_flusharray+0x70/0x140 [<c014c792>] kmem_cache_free+0x52/0x60 [<c01b8094>] ext3_destroy_inode+0x24/0x30 [<c018713b>] destroy_inode+0x3b/0x60 [<c0187479>] dispose_list+0x59/0x110 [<c0187927>] prune_icache+0x127/0x3a0 [<c0187be8>] shrink_icache_memory+0x48/0x50 [<c014f4ec>] shrink_slab+0x15c/0x1d0 [<c0151237>] balance_pgdat+0x217/0x270 [<c015135c>] kswapd+0xcc/0xe0 [<c0102859>] kernel_thread_helper+0x5/0xc Code: 89 50 04 89 02 8b 43 0c 31 d2 c7 03 00 01 10 00 c7 43 04 00 <6>note: kswapd0[39] exited with preempt_count 1 [-- Attachment #4: 13 --] [-- Type: text/plain, Size: 1258 bytes --] Unable to handle kernel NULL pointer dereference at virtual address 0000000c printing eip: c019a4e1 *pde = 0ddbe001 *pte = 00000000 Oops: 0002 [#1] PREEMPT Modules linked in: CPU: 0 EIP: 0060:[<c019a4e1>] Not tainted EFLAGS: 00010202 (2.6.8-rc2) EIP is at prune_icache+0x431/0x600 eax: 00000008 ebx: c0538b3c ecx: c0538b44 edx: f1b3d17c esi: 00000029 edi: c03f0790 ebp: f7ee9f04 esp: f7ee9ec8 ds: 007b es: 007b ss: 0068 Process kswapd0 (pid: 38, threadinfo=f7ee8000 task=f7eb0670) Stack: c0538b3c 00000077 f7ee9f10 c01b63a1 c17590a0 c17590c0 c17590e0 f7ee8000 00000000 00000029 c0538dc4 c07d6884 
00000080 00000000 f7ee8000 f7ee9f10 c019a6f8 00000080 f7ee9f44 c015450c 00000080 000000d0 00017a89 9384c800 Call Trace: [<c0105d6f>] show_stack+0x7f/0xa0 [<c0105f1e>] show_registers+0x15e/0x1c0 [<c0106127>] die+0xe7/0x240 [<c0116113>] do_page_fault+0x213/0x6c8 [<c0105a01>] error_code+0x2d/0x38 [<c019a6f8>] shrink_icache_memory+0x48/0x50 [<c015450c>] shrink_slab+0x15c/0x1a0 [<c015657e>] balance_pgdat+0x1ce/0x210 [<c015667f>] kswapd+0xbf/0xd0 [<c0102795>] kernel_thread_helper+0x5/0x10 Code: 89 50 04 c7 03 00 00 00 00 c7 43 04 00 00 00 00 8d 53 08 8b <6>note: kswapd0[38] exited with preempt_count 1 ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-05 18:06 ` Ingo Molnar @ 2004-08-05 18:50 ` Linus Torvalds 2004-08-05 20:29 ` Andi Kleen [not found] ` <20040806073739.GA6617@elte.hu> 0 siblings, 2 replies; 146+ messages in thread From: Linus Torvalds @ 2004-08-05 18:50 UTC (permalink / raw) To: Ingo Molnar Cc: Denis Vlasenko, Gene Heskett, Kernel Mailing List, Andrew Morton, Andi Kleen On Thu, 5 Aug 2004, Ingo Molnar wrote: > > * Linus Torvalds <torvalds@osdl.org> wrote: > > > Anyway, one other thing that makes me worry is the fact that Gene > > apparently has a K7. One of the things AMD has gotten wrong several > > times is prefetching, and it so happens that the dcache code is one of > > the users of the prefetch instruction. prune_dcache() in particular. > > hm, i too happen to have an Athlon64 box (running the x86 kernel) where > i can reproduce dcache pruning crashes after a few hours of testing > using a near-vanilla kernel. Very interesting. The K8 core (aka Opteron or Athlon64) has exactly the same prefetch page fault bugs that the K7 core has. This, coupled with your observation > NOTE2: i tried hard but couldn't reproduce the problem using the very > same kernel and the same workload on a PIII box. Once i ran it overnight > to check. Only the Athlon64 box does it. It could also be a hardware > problem - albeit the box withstood days of memtest86. really makes me wonder.. NOTE! Almost every time we've wondered about a CPU bug, it really wasn't. It usually ends up being something really subtle with memory ordering, with TLB updates, or something. So I'm putting the prefetch issue up on the table as just a wild theory. It would be interesting to see if we can get a bigger set of boxes with this crash. Andi, I think you were the contact for the AMD prefetch bug. Can you ask around the same people whether there might be other problems in this area? No point in putting a lot of effort into it, but just as one thing to check for.. 
Linus ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-05 18:50 ` Linus Torvalds @ 2004-08-05 20:29 ` Andi Kleen [not found] ` <20040806073739.GA6617@elte.hu> 1 sibling, 0 replies; 146+ messages in thread From: Andi Kleen @ 2004-08-05 20:29 UTC (permalink / raw) To: Linus Torvalds; +Cc: mingo, vda, gene.heskett, linux-kernel, akpm On Thu, 5 Aug 2004 11:50:33 -0700 (PDT) Linus Torvalds <torvalds@osdl.org> wrote: > > > On Thu, 5 Aug 2004, Ingo Molnar wrote: > > > > * Linus Torvalds <torvalds@osdl.org> wrote: > > > > > Anyway, one other thing that makes me worry is the fact that Gene > > > apparently has a K7. One of the things AMD has gotten wrong several > > > times is prefetching, and it so happens that the dcache code is one of > > > the users of the prefetch instruction. prude_dcache() in particular. > > > > hm, i too happen to have an Athlon64 box (running the x86 kernel) where > > i can reproduce dcache pruning crashes after a few hours of testing > > using a near-vanilla kernel. > > Very interesthing. > > The K8 core (aka Opteron or Athlon64) has exactly the same prefetch page > fault bugs that the K7 core has. This, coupled with your observation Yep, but they should be handled. Of course in theory it could be a subtle bug in the prefetch handler. But normally even when that goes wrong you just get a obvious oops on the prefetch instruction itself. When you disable the use of prefetch does it still happen? diff -u linux-2.6.8rc2-update/include/asm-i386/processor.h-o linux-2.6.8rc2-update/include/asm-i386/processor.h --- linux-2.6.8rc2-update/include/asm-i386/processor.h-o 2004-07-28 02:23:44.000000000 +0200 +++ linux-2.6.8rc2-update/include/asm-i386/processor.h 2004-08-05 22:25:46.000000000 +0200 @@ -612,6 +612,7 @@ #define ASM_NOP_MAX 8 +#if 0 /* Prefetch instructions for Pentium III and AMD Athlon */ /* It's not worth to care about 3dnow! prefetches for the K6 because they are microcoded there and very slow. 
@@ -640,6 +641,7 @@ "r" (x)); } #define spin_lock_prefetch(x) prefetchw(x) +#endif extern void select_idle_routine(const struct cpuinfo_x86 *c); > > NOTE2: i tried hard but couldnt reproduce the problem using the very > > same kernel and the same workload on a PIII box. Once i ran it overnight > > to check. Only the Athlon64 box does it. It could also be a hardware > > problem - albeit the box withstood days of memtest86. Both K8/K7 are usually a lot faster and a lot more aggressive in out of order execution than the P3 box. A P4 would be a better comparison. > Andi, I think you were the contact for the AMD prefetch bug. Can you ask > around the same people whether there might be other problems in this area? > No point in putting a lot of effort into it, but just as one thing to > check for.. A bigger sample size that shows it really only happens on AMD first would be useful. -Andi ^ permalink raw reply [flat|nested] 146+ messages in thread
[parent not found: <20040806073739.GA6617@elte.hu>]
[parent not found: <20040806004231.143c8bd2.akpm@osdl.org>]
* Re: Possible dcache BUG [not found] ` <20040806004231.143c8bd2.akpm@osdl.org> @ 2004-08-06 8:27 ` Ingo Molnar 2004-08-06 11:51 ` Gene Heskett 1 sibling, 0 replies; 146+ messages in thread From: Ingo Molnar @ 2004-08-06 8:27 UTC (permalink / raw) To: Andrew Morton; +Cc: torvalds, vda, gene.heskett, linux-kernel, ak * Andrew Morton <akpm@osdl.org> wrote: > Ingo Molnar <mingo@elte.hu> wrote: > > > > [btw., it would be nice to dump > > instructions prior the crash point so that we could know precisely what > > prefetch instruction the kernel included.] > > I've had a patch (from Keith) to do that in -mm for over a year, and > ksymoops has supported it for that long. But I think Linus has some > problem-which-I-never-understood with the whole idea. There were some more naive patches around previously i believe and those problems are solved in this patch: the dump splits the pre-crash and post-crash instruction stream decoding, so crash-EIP decoding is never unreliable. > 25-akpm/arch/i386/kernel/traps.c | 18 ++++++++++-------- > 1 files changed, 10 insertions(+), 8 deletions(-) a strong ack from me. Signed-off-by: Ingo Molnar <mingo@elte.hu> Ingo ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG [not found] ` <20040806004231.143c8bd2.akpm@osdl.org> 2004-08-06 8:27 ` Ingo Molnar @ 2004-08-06 11:51 ` Gene Heskett 2004-08-06 16:58 ` Linus Torvalds 1 sibling, 1 reply; 146+ messages in thread From: Gene Heskett @ 2004-08-06 11:51 UTC (permalink / raw) To: linux-kernel; +Cc: Andrew Morton, Ingo Molnar, torvalds, vda, ak On Friday 06 August 2004 03:42, Andrew Morton wrote: >Ingo Molnar <mingo@elte.hu> wrote: >> [btw., it would be nice to dump >> instructions prior the crash point so that we could know >> precisely what prefetch instruction the kernel included.] > >I've had a patch (from Keith) to do that in -mm for over a year, and >ksymoops has supported it for that long. But I think Linus has some >problem-which-I-never-understood with the whole idea. > > > > >This teaches the i386 oops dumper to dump opcodes preceding and > after the offending EIP. Supporting code against ksymoops has been > tested and produces output like the below. > >Support for this was added to ksymoops-2.4.9. > >Note that ksymoops will guarantee that the disassembly after the > <eip> value is always in sync - if the disassembly from the start > of the Code: line does not sync up with the EIP address ksymoops > will perform the resync. > > >Warning (merge_maps): no symbols in merged map >Mar 18 23:47:36 vmm kernel: kernel BUG at fs/open.c:802! 
>Mar 18 23:47:36 vmm kernel: invalid operand: 0000 [#1] >Mar 18 23:47:36 vmm kernel: CPU: 0 >Mar 18 23:47:36 vmm kernel: EIP: 0060:[<c014fedf>] VLI Not > tainted Using defaults from ksymoops -t elf32-i386 -a i386 >Mar 18 23:47:36 vmm kernel: EFLAGS: 00010246 >Mar 18 23:47:36 vmm kernel: eax: ccdfb900 ebx: 4001020d ecx: > 00000000 edx: 0000007b Mar 18 23:47:36 vmm kernel: esi: 00000000 > edi: bfffdd70 ebp: ccdfdfbc esp: ccdfdfb0 Mar 18 23:47:36 vmm > kernel: ds: 007b es: 007b ss: 0068 >Mar 18 23:47:36 vmm kernel: Stack: 4001020d 00000000 bfffdd70 > ccdfc000 c0109213 4001020d 00000000 00000003 Mar 18 23:47:36 vmm > kernel: 00000000 bfffdd70 bfffdc88 00000005 0000007b > 0000007b 00000005 4000ef94 Mar 18 23:47:36 vmm kernel: > 00000073 00000206 bfffdbd8 0000007b Mar 18 23:47:36 vmm kernel: > Call Trace: >Mar 18 23:47:36 vmm kernel: [<c0109213>] syscall_call+0x7/0xb >Mar 18 23:47:36 vmm kernel: Code: 14 98 f0 81 41 04 00 00 00 01 5b > 89 ec 5d c3 90 b8 00 e0 ff ff 21 e0 55 89 e5 57 56 53 8b 00 81 b8 > e4 01 00 00 0f 27 00 00 75 08 <0f> 0b 22 03 85 18 2f c0 8b 45 08 50 > e8 30 d4 00 00 89 c7 83 c4 > >>>EIP; c014fedf No symbols available <===== > >Trace; c0109213 No symbols available > >This architecture has variable length instructions, decoding before > eip is unreliable, take these instructions with a pinch of salt. 
> >Code; c014feb4 No symbols available >00000000 <_EIP>: >Code; c014feb4 No symbols available > 0: 14 98 adc $0x98,%al >Code; c014feb6 No symbols available > 2: f0 81 41 04 00 00 00 lock addl $0x1000000,0x4(%ecx) >Code; c014febd No symbols available > 9: 01 >Code; c014febe No symbols available > a: 5b pop %ebx >Code; c014febf No symbols available > b: 89 ec mov %ebp,%esp >Code; c014fec1 No symbols available > d: 5d pop %ebp >Code; c014fec2 No symbols available > e: c3 ret >Code; c014fec3 No symbols available > f: 90 nop >Code; c014fec4 No symbols available > 10: b8 00 e0 ff ff mov $0xffffe000,%eax >Code; c014fec9 No symbols available > 15: 21 e0 and %esp,%eax >Code; c014fecb No symbols available > 17: 55 push %ebp >Code; c014fecc No symbols available > 18: 89 e5 mov %esp,%ebp >Code; c014fece No symbols available > 1a: 57 push %edi >Code; c014fecf No symbols available > 1b: 56 push %esi >Code; c014fed0 No symbols available > 1c: 53 push %ebx >Code; c014fed1 No symbols available > 1d: 8b 00 mov (%eax),%eax >Code; c014fed3 No symbols available > 1f: 81 b8 e4 01 00 00 0f cmpl $0x270f,0x1e4(%eax) >Code; c014feda No symbols available > 26: 27 00 00 >Code; c014fedd No symbols available > 29: 75 08 jne 33 <_EIP+0x33> c014fee7 No > symbols available > >This decode from eip onwards should be reliable > >Code; c014fedf No symbols available >00000000 <_EIP>: >Code; c014fedf No symbols available <===== > 0: 0f 0b ud2a <===== >Code; c014fee1 No symbols available > 2: 22 03 and (%ebx),%al >Code; c014fee3 No symbols available > 4: 85 18 test %ebx,(%eax) >Code; c014fee5 No symbols available > 6: 2f das >Code; c014fee6 No symbols available > 7: c0 8b 45 08 50 e8 30 rorb $0x30,0xe8500845(%ebx) >Code; c014feed No symbols available > e: d4 00 aam $0x0 >Code; c014feef No symbols available > 10: 00 .byte 0x0 >Code; c014fef0 No symbols available > 11: 89 c7 mov %eax,%edi >Code; c014fef2 No symbols available > 13: 83 .byte 0x83 >Code; c014fef3 No symbols available > 14: c4 .byte 0xc4 > > > 
>Signed-off-by: Andrew Morton <akpm@osdl.org> >--- > > 25-akpm/arch/i386/kernel/traps.c | 18 ++++++++++-------- > 1 files changed, 10 insertions(+), 8 deletions(-) > >diff -puN arch/i386/kernel/traps.c~oops-dump-preceding-code > arch/i386/kernel/traps.c --- > 25/arch/i386/kernel/traps.c~oops-dump-preceding-code 2004-06-28 > 00:47:26.807038944 -0700 +++ > 25-akpm/arch/i386/kernel/traps.c 2004-06-28 00:47:26.812038184 > -0700 @@ -250,7 +250,7 @@ void show_registers(struct pt_regs *regs > ss = regs->xss & 0xffff; > } > print_modules(); >- printk("CPU: %d\nEIP: %04x:[<%08lx>] %s\nEFLAGS: %08lx" >+ printk("CPU: %d\nEIP: %04x:[<%08lx>] %s VLI\nEFLAGS: > %08lx" " (%s) \n", > smp_processor_id(), 0xffff & regs->xcs, regs->eip, > print_tainted(), regs->eflags, UTS_RELEASE); >@@ -268,23 +268,25 @@ void show_registers(struct pt_regs *regs > * time of the fault.. > */ > if (in_kernel) { >+ u8 *eip; > > printk("\nStack: "); > show_stack(NULL, (unsigned long*)esp); > > printk("Code: "); >- if(regs->eip < PAGE_OFFSET) >- goto bad; > >- for(i=0;i<20;i++) >- { >+ eip = (u8 *)regs->eip - 43; >+ for (i = 0; i < 64; i++, eip++) { > unsigned char c; >- if(__get_user(c, &((unsigned char*)regs->eip)[i])) { >-bad: >+ >+ if (eip < (u8 *)PAGE_OFFSET || __get_user(c, eip)) { > printk(" Bad EIP value."); > break; > } >- printk("%02x ", c); >+ if (eip == (u8 *)regs->eip) >+ printk("<%02x> ", c); >+ else >+ printk("%02x ", c); > } > } > printk("\n"); >_ Veddy veddy Interestink. Linus, Andrew, should I apply this patch too at the next remake? FWIW, I'm still up (20:38) this morning, and showing plenty (127+ megs) of free memory. No crash, no odd log (other than samba squawking about some option thats been changed & I haven't fixed the smb.conf) so far. I'm beginning to like this test patch, Linus, thanks :) -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." 
-Ed Howdershelt (Author) 99.24% setiathome rank, not too shabby for a WV hillbilly Yahoo.com attorneys please note, additions to this message by Gene Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved. ^ permalink raw reply [flat|nested] 146+ messages in thread
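[Editorial sketch, not part of the original thread.] The "Code:" line change in the patch quoted above can be approximated in userspace as follows. This is a sketch under stated assumptions only: format_code_bytes() is a hypothetical name, the bytes come from an ordinary buffer rather than through __get_user(), and an offset outside the buffer stands in for the PAGE_OFFSET / __get_user() failure path. The 43-byte lookback and 64-byte window are taken from the quoted patch.

```c
#include <stdio.h>
#include <stddef.h>

#define CODE_LOOKBACK 43	/* bytes shown before the faulting address (from the patch) */
#define CODE_WINDOW   64	/* total bytes shown (from the patch) */

/*
 * Hypothetical helper: format the bytes around text[fault_off] the way
 * the patched show_registers() prints them, wrapping the faulting byte
 * in <..>. An offset outside [0, len) ends the dump with
 * " Bad EIP value.", mirroring the patch's failure branch.
 * Returns the number of characters written to out (excluding the NUL).
 */
size_t format_code_bytes(const unsigned char *text, size_t len,
			 size_t fault_off, char *out, size_t out_len)
{
	long start = (long)fault_off - CODE_LOOKBACK;
	size_t used = 0;
	int i;

	if (out_len == 0)
		return 0;
	out[0] = '\0';

	for (i = 0; i < CODE_WINDOW; i++) {
		long off = start + i;
		int bad = (off < 0 || (size_t)off >= len);
		int n;

		if (bad)
			n = snprintf(out + used, out_len - used, " Bad EIP value.");
		else if ((size_t)off == fault_off)
			n = snprintf(out + used, out_len - used, "<%02x> ", text[off]);
		else
			n = snprintf(out + used, out_len - used, "%02x ", text[off]);

		if (n < 0 || (size_t)n >= out_len - used) {
			out[used] = '\0';	/* output buffer full: drop the partial item */
			break;
		}
		used += (size_t)n;
		if (bad)
			break;		/* like the patch, stop after a bad address */
	}
	return used;
}
```

The <..> marker is what lets a decoder split the stream at the faulting instruction, which is how the quoted ksymoops output can keep the decode from EIP onwards in sync even though the bytes before it may be misaligned.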
* Re: Possible dcache BUG 2004-08-06 11:51 ` Gene Heskett @ 2004-08-06 16:58 ` Linus Torvalds 2004-08-06 17:16 ` Gene Heskett 2004-08-06 23:09 ` Chris Shoemaker 0 siblings, 2 replies; 146+ messages in thread From: Linus Torvalds @ 2004-08-06 16:58 UTC (permalink / raw) To: Gene Heskett Cc: Kernel Mailing List, Andrew Morton, Ingo Molnar, vda, ak, Chris Shoemaker, William Lee Irwin III On Fri, 6 Aug 2004, Gene Heskett wrote: > > Linus, Andrew, should I apply this patch too at the next remake? Might be worth it, but it's more important to see any oops at all, or lack of oopses.. > FWIW, I'm still up (20:38) this morning, and showing plenty (127+ > megs) of free memory. No crash, no odd log (other than samba > squawking about some option thats been changed & I haven't fixed the > smb.conf) so far. > > I'm beginning to like this test patch, Linus, thanks :) If the only thing you have done is add the list_del_init() debugging patch, then the only thing that has changed is really the access patterns to uncached memory. The original list_del_init() tries to only do a few single _writes_ to the dentries around it. The added debugging will do _reads_ (and thus bring it into the cache) of the dentry pointers of the dentries around it. If that change makes a real difference, I really only see two possibilities: - there really is a prefetch bug (or possibly, there's a bug in our prefetch fixup code, and the known prefetch bug just triggers the problem indirectly) - it just changes the timing enough that whatever bug you hit went away. Now, Chris Shoemaker reported dentry problems on a intel CPU and said that wli had seen something too, but I'm wondering whether Chris and wli might have been seeing the knfsd/xfs-related dentry bug that I found yesterday. So I think the prefetch theory is still alive, but we should check with Chris. Chris? Linus ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-06 16:58 ` Linus Torvalds @ 2004-08-06 17:16 ` Gene Heskett 2004-08-06 17:26 ` William Lee Irwin III 2004-08-06 23:09 ` Chris Shoemaker 1 sibling, 1 reply; 146+ messages in thread From: Gene Heskett @ 2004-08-06 17:16 UTC (permalink / raw) To: linux-kernel Cc: Linus Torvalds, Andrew Morton, Ingo Molnar, vda, ak, Chris Shoemaker, William Lee Irwin III On Friday 06 August 2004 12:58, Linus Torvalds wrote: >On Fri, 6 Aug 2004, Gene Heskett wrote: >> Linus, Andrew, should I apply this patch too at the next remake? > >Might be worth it, but it's more important to see any oops at all, > or lack of oopses.. > >> FWIW, I'm still up (20:38) this morning, and showing plenty (127+ >> megs) of free memory. No crash, no odd log (other than samba >> squawking about some option thats been changed & I haven't fixed >> the smb.conf) so far. >> >> I'm beginning to like this test patch, Linus, thanks :) > >If the only thing you have done is add the list_del_init() debugging >patch, then the only thing that has changed is really the access > patterns to uncached memory. > >The original list_del_init() tries to only do a few single _writes_ > to the dentries around it. The added debugging will do _reads_ (and > thus bring it into the cache) of the dentry pointers of the > dentries around it. > >If that change makes a real difference, I really only see two >possibilities: > - there really is a prefetch bug (or possibly, there's a bug in our > prefetch fixup code, and the known prefetch bug just triggers the > problem indirectly) > - it just changes the timing enough that whatever bug you hit went > away. > >Now, Chris Shoemaker reported dentry problems on a intel CPU and > said that wli had seen something too, but I'm wondering whether > Chris and wli might have been seeing the knfsd/xfs-related dentry > bug that I found yesterday. So I think the prefetch theory is still > alive, but we should check with Chris. Chris? 
> > Linus I'm still up, a bit over 24 hours now. :) Free memory is slowly going away, I ran mozilla for a while which got rid of about 60 megs, and now I see I'm down to 23 free, whereas at the 11 hour up marker I had nearly 130 megs free yet. I've got to go to town, so that will leave seti and kmail doing their thing till I get back. If it goes down, hopefully it will record something, unlike the last couple of times. -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) 99.24% setiathome rank, not too shabby for a WV hillbilly Yahoo.com attorneys please note, additions to this message by Gene Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved. ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-06 17:16 ` Gene Heskett @ 2004-08-06 17:26 ` William Lee Irwin III 2004-08-06 23:19 ` Chris Shoemaker 0 siblings, 1 reply; 146+ messages in thread From: William Lee Irwin III @ 2004-08-06 17:26 UTC (permalink / raw) To: Gene Heskett Cc: linux-kernel, Linus Torvalds, Andrew Morton, Ingo Molnar, vda, ak, Chris Shoemaker On Friday 06 August 2004 12:58, Linus Torvalds wrote: >> Now, Chris Shoemaker reported dentry problems on a intel CPU and >> said that wli had seen something too, but I'm wondering whether >> Chris and wli might have been seeing the knfsd/xfs-related dentry >> bug that I found yesterday. So I think the prefetch theory is still >> alive, but we should check with Chris. Chris? On Fri, Aug 06, 2004 at 01:16:24PM -0400, Gene Heskett wrote: > I'm still up, a bit over 24 hours now. :) Free memory is slowly going > away, I ran mozilla for a while which got rid of about 60 megs, and > now I see I'm down to 23 free, whereas at the 11 hour up marker I had > nearly 130 megs free yet. I've got to go to town, so that will leave > seti and kmail doing their thing till I get back. If it goes down, > hopefully it will record something, unlike the last couple of times. I've not had issues around the dcache for quite some time, I think not since the 2.5.65 timeframe. IIRC maneesh and dipankar had some fixes that resolved all my issues not long afterward. So unfortunately I have nothing strictly dcache-related to report. Chris may have been referring to some potentially pathological NFS behavior I've seen for a long time centered around extended periods of knfsd unresponsiveness. -- wli ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-06 17:26 ` William Lee Irwin III @ 2004-08-06 23:19 ` Chris Shoemaker 2004-08-07 4:15 ` William Lee Irwin III 0 siblings, 1 reply; 146+ messages in thread From: Chris Shoemaker @ 2004-08-06 23:19 UTC (permalink / raw) To: William Lee Irwin III, Gene Heskett, linux-kernel, Linus Torvalds, Andrew Morton, Ingo Molnar, vda, ak On Fri, Aug 06, 2004 at 10:26:07AM -0700, William Lee Irwin III wrote: > On Friday 06 August 2004 12:58, Linus Torvalds wrote: > >> Now, Chris Shoemaker reported dentry problems on a intel CPU and > >> said that wli had seen something too, but I'm wondering whether > >> Chris and wli might have been seeing the knfsd/xfs-related dentry > >> bug that I found yesterday. So I think the prefetch theory is still > >> alive, but we should check with Chris. Chris? > > I've not had issues around the dcache for quite some time, I think not > since the 2.5.65 timeframe. IIRC maneesh and dipankar had some fixes > that resolved all my issues not long afterward. So unfortunately I have > nothing strictly dcache-related to report. Chris may have been > referring to some potentially pathological NFS behavior I've seen for a > long time centered around extended periods of knfsd unresponsiveness. > > -- wli I was referring to: http://www.ussg.iu.edu/hypermail/linux/kernel/0406.2/0410.html ...doesn't look NFS-related to me. OTOH, it does bear some resemblance to some other oopses floating around. Did you solve this one? -chris ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-06 23:19 ` Chris Shoemaker @ 2004-08-07 4:15 ` William Lee Irwin III 2004-08-07 0:05 ` Chris Shoemaker 0 siblings, 1 reply; 146+ messages in thread From: William Lee Irwin III @ 2004-08-07 4:15 UTC (permalink / raw) To: Chris Shoemaker Cc: Gene Heskett, linux-kernel, Linus Torvalds, Andrew Morton, Ingo Molnar, vda, ak On Fri, Aug 06, 2004 at 10:26:07AM -0700, William Lee Irwin III wrote: >> I've not had issues around the dcache for quite some time, I think not >> since the 2.5.65 timeframe. IIRC maneesh and dipankar had some fixes >> that resolved all my issues not long afterward. So unfortunately I have >> nothing strictly dcache-related to report. Chris may have been >> referring to some potentially pathological NFS behavior I've seen for a >> long time centered around extended periods of knfsd unresponsiveness. On Fri, Aug 06, 2004 at 07:19:02PM -0400, Chris Shoemaker wrote: > I was referring to: > http://www.ussg.iu.edu/hypermail/linux/kernel/0406.2/0410.html > ...doesn't look NFS-related to me. OTOH, it does bear some resemblance > to some other oopses floating around. Did you solve this one? I've not seen this ever again after some point, and don't recall enough of the context/etc. to say much about what was going on with it. -- wli ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-07 4:15 ` William Lee Irwin III @ 2004-08-07 0:05 ` Chris Shoemaker 2004-08-07 5:50 ` William Lee Irwin III 0 siblings, 1 reply; 146+ messages in thread From: Chris Shoemaker @ 2004-08-07 0:05 UTC (permalink / raw) To: William Lee Irwin III, Gene Heskett, linux-kernel, Linus Torvalds, Andrew Morton, Ingo Molnar, vda, ak On Fri, Aug 06, 2004 at 09:15:50PM -0700, William Lee Irwin III wrote: > On Fri, Aug 06, 2004 at 10:26:07AM -0700, William Lee Irwin III wrote: > >> I've not had issues around the dcache for quite some time, I think not > >> since the 2.5.65 timeframe. IIRC maneesh and dipankar had some fixes > >> that resolved all my issues not long afterward. So unfortunately I have > >> nothing strictly dcache-related to report. Chris may have been > >> referring to some potentially pathological NFS behavior I've seen for a > >> long time centered around extended periods of knfsd unresponsiveness. > > On Fri, Aug 06, 2004 at 07:19:02PM -0400, Chris Shoemaker wrote: > > I was referring to: > > http://www.ussg.iu.edu/hypermail/linux/kernel/0406.2/0410.html > > ...doesn't look NFS-related to me. OTOH, it does bear some resemblance > > to some other oopses floating around. Did you solve this one? > > I've not seen this ever again after some point, and don't recall enough > of the context/etc. to say much about what was going on with it. > > -- wli I know what you mean. Sometimes I don't know which bothers me more, the oopses that inexplicably DON'T come back, or the ones that DO. Perchance, have you added RAM since the oops, or changed the machine's memory-related behavior? -chris ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-07 0:05 ` Chris Shoemaker @ 2004-08-07 5:50 ` William Lee Irwin III 0 siblings, 0 replies; 146+ messages in thread From: William Lee Irwin III @ 2004-08-07 5:50 UTC (permalink / raw) To: Chris Shoemaker Cc: Gene Heskett, linux-kernel, Linus Torvalds, Andrew Morton, Ingo Molnar, vda, ak On Fri, Aug 06, 2004 at 09:15:50PM -0700, William Lee Irwin III wrote: >> I've not seen this ever again after some point, and don't recall enough >> of the context/etc. to say much about what was going on with it. On Fri, Aug 06, 2004 at 08:05:21PM -0400, Chris Shoemaker wrote: > I know what you mean. Sometimes I don't know which bothers me more, the > oopses that inexplicably DON'T come back, or the ones that DO. > Perchance, have you added RAM since the oops, or changed the machine's > memory-related behavior? Neither. Only the kernel has changed. Upon closer inspection, local changes with direct impact on the inode cache are likely suspects. -- wli ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-06 16:58 ` Linus Torvalds 2004-08-06 17:16 ` Gene Heskett @ 2004-08-06 23:09 ` Chris Shoemaker 2004-08-07 6:20 ` Linus Torvalds 1 sibling, 1 reply; 146+ messages in thread From: Chris Shoemaker @ 2004-08-06 23:09 UTC (permalink / raw) To: Linus Torvalds Cc: Gene Heskett, Kernel Mailing List, Andrew Morton, Ingo Molnar, vda, ak, William Lee Irwin III [-- Attachment #1: Type: text/plain, Size: 929 bytes --] On Fri, Aug 06, 2004 at 09:58:35AM -0700, Linus Torvalds wrote: > > Now, Chris Shoemaker reported dentry problems on a intel CPU and said that > wli had seen something too, but I'm wondering whether Chris and wli might > have been seeing the knfsd/xfs-related dentry bug that I found yesterday. > So I think the prefetch theory is still alive, but we should check with > Chris. Chris? > > Linus My oopses were not related to nfs or xfs. I don't use either of these on this box. In the interest of contributing more than conspiracy theories, I'm trying to dig up some records of the dcache problems I was having. Unfortunately, a period of low free disk space led to some aggressive "cleaning" on my part since then. :( I _was_ able to find the attached oops, but I don't think I have the corresponding object files, so I hope the decoding it contains is good enough. Just ask if you need some more info. 
-chris [-- Attachment #2: Mar17.4.txt --] [-- Type: text/plain, Size: 2190 bytes --] Mar 17 16:42:01 peace kernel: Unable to handle kernel paging request at virtual address 0b7eec1c Mar 17 16:42:01 peace kernel: printing eip: Mar 17 16:42:01 peace kernel: c01a6667 Mar 17 16:42:01 peace kernel: *pde = 00000000 Mar 17 16:42:01 peace kernel: Oops: 0000 [#1] Mar 17 16:42:01 peace kernel: PREEMPT DEBUG_PAGEALLOC Mar 17 16:42:01 peace kernel: CPU: 0 Mar 17 16:42:01 peace kernel: EIP: 0060:[iput+23/112] Not tainted Mar 17 16:42:01 peace kernel: EFLAGS: 00010202 Mar 17 16:42:01 peace kernel: EIP is at iput+0x17/0x70 Mar 17 16:42:01 peace kernel: eax: 0b7eebf8 ebx: c33fee3c ecx: c33fee4c edx: c33fee4c Mar 17 16:42:01 peace kernel: esi: c33f2ef8 edi: cba32000 ebp: cba33e54 esp: cba33e50 Mar 17 16:42:01 peace kernel: ds: 007b es: 007b ss: 0068 Mar 17 16:42:01 peace kernel: Process kswapd0 (pid: 7, threadinfo=cba32000 task=cba559e0) Mar 17 16:42:01 peace kernel: Stack: c33fee3c cba33e88 c019f540 00000066 cba33e60 cba33e60 00000000 00000001 Mar 17 16:42:01 peace kernel: 00000000 c11bcc40 0000003c 00000080 cba32000 0000009c cba33e90 c01a06d7 Mar 17 16:42:01 peace kernel: cba33ec4 c01612d8 000e1048 00000000 000079d9 0000001d 00000000 cbffb654 Mar 17 16:42:01 peace kernel: Call Trace: Mar 17 16:42:01 peace kernel: [prune_dcache+1120/1952] prune_dcache+0x460/0x7a0 Mar 17 16:42:01 peace kernel: [shrink_dcache_memory+23/32] shrink_dcache_memory+0x17/0x20 Mar 17 16:42:01 peace kernel: [shrink_slab+280/368] shrink_slab+0x118/0x170 Mar 17 16:42:01 peace kernel: [balance_pgdat+492/528] balance_pgdat+0x1ec/0x210 Mar 17 16:42:01 peace kernel: [kswapd+220/240] kswapd+0xdc/0xf0 Mar 17 16:42:01 peace kernel: [autoremove_wake_function+0/64] autoremove_wake_function+0x0/0x40 Mar 17 16:42:01 peace kernel: [ret_from_fork+6/20] ret_from_fork+0x6/0x14 Mar 17 16:42:01 peace kernel: [autoremove_wake_function+0/64] autoremove_wake_function+0x0/0x40 Mar 17 16:42:01 peace kernel: [kswapd+0/240] 
kswapd+0x0/0xf0 Mar 17 16:42:01 peace kernel: [kernel_thread_helper+5/12] kernel_thread_helper+0x5/0xc Mar 17 16:42:01 peace kernel: Mar 17 16:42:01 peace kernel: Code: 8b 40 24 74 4a 85 c0 74 07 8b 50 14 85 d2 75 39 8d 43 1c ba ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-06 23:09 ` Chris Shoemaker @ 2004-08-07 6:20 ` Linus Torvalds 2004-08-07 12:38 ` Gene Heskett 2004-08-07 13:44 ` Chris Shoemaker 0 siblings, 2 replies; 146+ messages in thread
From: Linus Torvalds @ 2004-08-07 6:20 UTC (permalink / raw)
To: Chris Shoemaker
Cc: Gene Heskett, Kernel Mailing List, Andrew Morton, Ingo Molnar, vda, ak, William Lee Irwin III

On Fri, 6 Aug 2004, Chris Shoemaker wrote:
>
> I _was_ able to find the attached oops, but I don't think I have the
> corresponding object files, so I hope the decoding it contains is
> good enough.

It's fine.

It oopses on

	inode->i_sb->s_op

where "i_sb" is bad and contains the pointer "0x0b7eebf8" which is
definitely not a valid kernel pointer.

There's a few other strange details in your oops report too. One being
that the inode pointer (in %ebx, apparently) doesn't show on the stack
where I'd expect it to show. Hmm. That might be just a different compiler
issue, though.

Anyway, this does look somewhat like the ones Gene is seeing. If I had to
guess, I'd guess that either the inode pointer is bad, or it's just stale
from an inode that has already been free'd. Most likely because of
prune_dcache() having had a corrupt LRU list with a stale/corrupt entry.

That would blow the prefetch theory out of the water.

		Linus

^ permalink raw reply [flat|nested] 146+ messages in thread
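The reading of the oops above can be checked with a little arithmetic. The attached report shows eax = 0b7eebf8, a fault at virtual address 0b7eec1c, and code bytes starting 8b 40 24. The sketch below decodes that one instruction form and confirms that the load address equals the reported fault address. Two things are assumptions here rather than facts from the thread: that offset 0x24 is where `s_op` sits in `struct super_block` for that particular build, and that the machine used the default i386 3G/1G split, so kernel addresses start at 0xc0000000. The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: default i386 3G/1G split, kernel mapped from here up. */
#define PAGE_OFFSET_DEFAULT 0xc0000000u

struct mov_disp8 {
	int dst_reg;	/* x86 register number, 0 = eax */
	int base_reg;	/* x86 register number, 0 = eax */
	int8_t disp;	/* signed 8-bit displacement */
};

/*
 * Decode only "8b /r" (mov r/m32 -> r32) with mod = 01 (disp8) and
 * no SIB byte. Returns 0 on success, -1 for any other encoding.
 */
static int decode_mov_disp8(const uint8_t *code, struct mov_disp8 *out)
{
	uint8_t modrm;

	if (code[0] != 0x8b)
		return -1;
	modrm = code[1];
	if ((modrm >> 6) != 1)		/* mod must be 01: disp8 follows */
		return -1;
	if ((modrm & 7) == 4)		/* rm = 100 would mean a SIB byte */
		return -1;
	out->dst_reg = (modrm >> 3) & 7;
	out->base_reg = modrm & 7;
	out->disp = (int8_t)code[2];
	return 0;
}

/* Address the instruction loads from, given the base register value. */
static uint32_t effective_address(uint32_t base, int8_t disp)
{
	return base + (uint32_t)(int32_t)disp;
}

/* Rough plausibility check under the assumed address-space split. */
static int looks_like_kernel_pointer(uint32_t p)
{
	return p >= PAGE_OFFSET_DEFAULT;
}
```

Fed the bytes 8b 40 24, the decoder yields a load from 0x24(%eax) into %eax, and 0x0b7eebf8 + 0x24 is 0x0b7eec1c, the address in the "Unable to handle kernel paging request" line. Under the assumed split, 0x0b7eebf8 falls in the user-space range, while the %ebx value c33fee3c does not.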
* Re: Possible dcache BUG 2004-08-07 6:20 ` Linus Torvalds @ 2004-08-07 12:38 ` Gene Heskett 2004-08-07 13:44 ` Chris Shoemaker 1 sibling, 0 replies; 146+ messages in thread From: Gene Heskett @ 2004-08-07 12:38 UTC (permalink / raw) To: linux-kernel Cc: Linus Torvalds, Chris Shoemaker, Andrew Morton, Ingo Molnar, vda, ak, William Lee Irwin III On Saturday 07 August 2004 02:20, Linus Torvalds wrote: >On Fri, 6 Aug 2004, Chris Shoemaker wrote: >> I _was_ able to find the attached oops, but I don't think I have >> the corresponding object files, so I hope the decoding it contains >> is good enough. > >It's fine. > >It oopses on > > inode->i_sb->s_op > >where "i_sb" is bad and contains the pointer "0x0b7eebf8" which is >definitely not a valid kernel pointer. > >There's a few other strange details in your oops report too. One > being that the inode pointer (in %ebx, apparently) doesn't show on > the stack where I'd expect it to show. Hmm. That might be just a > different compiler issue, though. > >Anyway, this does look somewhat like the ones Gene is seeing. If I > had to guess, I'd guess that either the inode pointer is bad, or > it's just stale from an inode that has already been free'd. Most > likely because of prune_dcache() having had a corrupt LRU list with > a stale/corrupt entry. > >That would blow the prefetch theory out of the water. > > Linus And I'm still up, no Oops yet. 08:34:07 up 1 day, 21:25, 4 users, load average: 1.10, 1.08, 1.03 I've also only done 3 seti units since yesterday morning, about 40% to 50% of my usual production even with the crashes. In other words, system seems stable, but old dog slow too. & thats with top showing seti getting 97-99% of the cpu. Ouch! -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." 
-Ed Howdershelt (Author) 99.24% setiathome rank, not too shabby for a WV hillbilly Yahoo.com attorneys please note, additions to this message by Gene Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved. ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-07 6:20 ` Linus Torvalds 2004-08-07 12:38 ` Gene Heskett @ 2004-08-07 13:44 ` Chris Shoemaker 2004-08-07 18:49 ` Linus Torvalds 2004-08-07 19:01 ` Gene Heskett 1 sibling, 2 replies; 146+ messages in thread From: Chris Shoemaker @ 2004-08-07 13:44 UTC (permalink / raw) To: Linus Torvalds Cc: Gene Heskett, Kernel Mailing List, Andrew Morton, Ingo Molnar, vda, ak, William Lee Irwin III [-- Attachment #1: Type: text/plain, Size: 1457 bytes --] On Fri, Aug 06, 2004 at 11:20:28PM -0700, Linus Torvalds wrote: > > > On Fri, 6 Aug 2004, Chris Shoemaker wrote: > > > > I _was_ able to find the attached oops, but I don't think I have the > > corresponding object files, so I hope the decoding it contains is > > good enough. > > It's fine. Well then, maybe you'd like more? I attached two more from the same period. Please remember that these are 5 months old, and could represent bugs already fixed. I think this was stock 2.6.4. > > It oopses on > > inode->i_sb->s_op > > where "i_sb" is bad and contains the pointer "0x0b7eebf8" which is > definitely not a valid kernel pointer. > > There's a few other strange details in your oops report too. One being > that the inode pointer (in %ebx, apparently) doesn't show on the stack > where I'd expect it to show. Hmm. That might be just a different compiler > issue, though. Perhaps due to CONFIG_REGPARM? I haven't used it for quite a while, but back in March I was a bit bolder about config options marked experimental. Gene, are you using REGPARM? -chris > > Anyway, this does look somewhat like the ones Gene is seeing. If I had to > guess, I'd guess that either the inode pointer is bad, or it's just stale > from an inode that has already been free'd. Most likely because of > prune_dcache() having had a corrupt LRU list with a stale/corrupt entry. > > That would blow the prefetch theory out of the water. 
> > Linus [-- Attachment #2: Mar17.2.txt --] [-- Type: text/plain, Size: 4271 bytes --] Mar 17 03:34:28 peace kernel: Unable to handle kernel paging request at virtual address 0034779d Mar 17 03:34:28 peace kernel: printing eip: Mar 17 03:34:28 peace kernel: c0211e8f Mar 17 03:34:28 peace kernel: *pde = 00000000 Mar 17 03:34:28 peace kernel: Oops: 0000 [#1] Mar 17 03:34:28 peace kernel: PREEMPT DEBUG_PAGEALLOC Mar 17 03:34:28 peace kernel: CPU: 0 Mar 17 03:34:28 peace kernel: EIP: 0060:[vsnprintf+799/1184] Not tainted Mar 17 03:34:28 peace kernel: EFLAGS: 00010097 Mar 17 03:34:28 peace kernel: EIP is at vsnprintf+0x31f/0x4a0 Mar 17 03:34:28 peace kernel: eax: 0034779d ebx: 0000000a ecx: 0034779d edx: fffffffe Mar 17 03:34:28 peace kernel: esi: c042d1fb edi: 00000000 ebp: cba33cf4 esp: cba33cb8 Mar 17 03:34:28 peace kernel: ds: 007b es: 007b ss: 0068 Mar 17 03:34:29 peace kernel: Process kswapd0 (pid: 7, threadinfo=cba32000 task=cba559e0) Mar 17 03:34:29 peace kernel: Stack: 000001a0 00000000 0000000a ffffffff 00000002 00000002 ffffffff ffffffff Mar 17 03:34:29 peace kernel: c042d5df 00000400 c042d1e0 c033a3f2 00000400 00000246 c03419b3 cba33d04 Mar 17 03:34:29 peace kernel: c0212028 cba33d6c c042d1e0 cba33d54 c012a7b7 cba33d60 c10786b8 c1078690 Mar 17 03:34:29 peace kernel: Call Trace: Mar 17 03:34:29 peace kernel: [vscnprintf+24/48] vscnprintf+0x18/0x30 Mar 17 03:34:29 peace kernel: [printk+359/1008] printk+0x167/0x3f0 Mar 17 03:34:29 peace kernel: [shrink_list+2259/2816] shrink_list+0x8d3/0xb00 Mar 17 03:34:29 peace kernel: [shrink_cache+574/1664] shrink_cache+0x23e/0x680 Mar 17 03:34:29 peace kernel: [balance_pgdat+401/528] balance_pgdat+0x191/0x210 Mar 17 03:34:29 peace kernel: [kswapd+220/240] kswapd+0xdc/0xf0 Mar 17 03:34:29 peace kernel: [autoremove_wake_function+0/64] autoremove_wake_function+0x0/0x40 Mar 17 03:34:29 peace kernel: [ret_from_fork+6/20] ret_from_fork+0x6/0x14 Mar 17 03:34:29 peace kernel: [autoremove_wake_function+0/64] 
autoremove_wake_function+0x0/0x40 Mar 17 03:34:29 peace kernel: [kswapd+0/240] kswapd+0x0/0xf0 Mar 17 03:34:29 peace kernel: [kernel_thread_helper+5/12] kernel_thread_helper+0x5/0xc Mar 17 03:34:29 peace kernel: Mar 17 03:34:29 peace kernel: Code: 80 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 83 e7 10 89 c3 75 Mar 17 03:34:29 peace kernel: <6>note: kswapd0[7] exited with preempt_count 2 Mar 17 03:34:29 peace kernel: Debug: sleeping function called from invalid context at include/linux/rwsem.h:43 Mar 17 03:34:29 peace kernel: in_atomic():1, irqs_disabled():0 Mar 17 03:34:29 peace kernel: Call Trace: Mar 17 03:34:29 peace kernel: [__might_sleep+172/224] __might_sleep+0xac/0xe0 Mar 17 03:34:29 peace kernel: [profile_exit_task+35/96] profile_exit_task+0x23/0x60 Mar 17 03:34:29 peace kernel: [do_exit+117/2480] do_exit+0x75/0x9b0 Mar 17 03:34:29 peace kernel: [die+594/608] die+0x252/0x260 Mar 17 03:34:29 peace kernel: [do_page_fault+0/1360] do_page_fault+0x0/0x550 Mar 17 03:34:29 peace kernel: [do_page_fault+485/1360] do_page_fault+0x1e5/0x550 Mar 17 03:34:29 peace kernel: [update_wall_time+22/64] update_wall_time+0x16/0x40 Mar 17 03:34:29 peace kernel: [do_timer+199/208] do_timer+0xc7/0xd0 Mar 17 03:34:29 peace kernel: [do_page_fault+0/1360] do_page_fault+0x0/0x550 Mar 17 03:34:29 peace kernel: [error_code+45/56] error_code+0x2d/0x38 Mar 17 03:34:29 peace kernel: [vsnprintf+799/1184] vsnprintf+0x31f/0x4a0 Mar 17 03:34:29 peace kernel: [vscnprintf+24/48] vscnprintf+0x18/0x30 Mar 17 03:34:29 peace kernel: [printk+359/1008] printk+0x167/0x3f0 Mar 17 03:34:29 peace kernel: [shrink_list+2259/2816] shrink_list+0x8d3/0xb00 Mar 17 03:34:29 peace kernel: [shrink_cache+574/1664] shrink_cache+0x23e/0x680 Mar 17 03:34:29 peace kernel: [balance_pgdat+401/528] balance_pgdat+0x191/0x210 Mar 17 03:34:29 peace kernel: [kswapd+220/240] kswapd+0xdc/0xf0 Mar 17 03:34:29 peace kernel: [autoremove_wake_function+0/64] autoremove_wake_function+0x0/0x40 Mar 17 03:34:29 peace kernel: 
[ret_from_fork+6/20] ret_from_fork+0x6/0x14 Mar 17 03:34:29 peace kernel: [autoremove_wake_function+0/64] autoremove_wake_function+0x0/0x40 Mar 17 03:34:29 peace kernel: [kswapd+0/240] kswapd+0x0/0xf0 Mar 17 03:34:29 peace kernel: [kernel_thread_helper+5/12] kernel_thread_helper+0x5/0xc Mar 17 03:34:29 peace kernel: [-- Attachment #3: Mar17.3.txt --] [-- Type: text/plain, Size: 6974 bytes --] Mar 17 06:25:01 peace /USR/SBIN/CRON[1153]: (root) CMD (test -e /usr/sbin/anacron || run-parts --report /etc/cron.daily) Mar 17 06:25:04 peace kernel: Unable to handle kernel paging request at virtual address 00a6be3c Mar 17 06:25:04 peace kernel: printing eip: Mar 17 06:25:04 peace kernel: c01a4650 Mar 17 06:25:04 peace kernel: *pde = 00000000 Mar 17 06:25:04 peace kernel: Oops: 0000 [#2] Mar 17 06:25:04 peace kernel: PREEMPT DEBUG_PAGEALLOC Mar 17 06:25:04 peace kernel: CPU: 0 Mar 17 06:25:04 peace kernel: EIP: 0060:[find_inode_fast+32/96] Not tainted Mar 17 06:25:04 peace kernel: EFLAGS: 00010206 Mar 17 06:25:04 peace kernel: EIP is at find_inode_fast+0x20/0x60 Mar 17 06:25:04 peace kernel: eax: c35b5e3c ebx: 000382ea ecx: 00a6be3c edx: 00a6be3c Mar 17 06:25:04 peace kernel: esi: cb7eebf8 edi: c11f1cac ebp: c616de24 esp: c616de18 Mar 17 06:25:04 peace kernel: ds: 007b es: 007b ss: 0068 Mar 17 06:25:04 peace kernel: Process find (pid: 1170, threadinfo=c616c000 task=c5d169e0) Mar 17 06:25:04 peace kernel: Stack: 000382ea 000382ea cb7eebf8 c616de58 c01a5aa0 c55612ec 00000000 00000000 Mar 17 06:25:04 peace kernel: c5f87f8c c616def8 cab3cef8 1d244b3c c11f1cac 000382ea c5f87ef8 cb7eebf8 Mar 17 06:25:04 peace kernel: c616de70 c01dde7c c8fae110 c0388a80 fffffff4 cacb3ebc c616de94 c0191a7e Mar 17 06:25:04 peace kernel: Call Trace: Mar 17 06:25:04 peace kernel: [iget_locked+176/672] iget_locked+0xb0/0x2a0 Mar 17 06:25:04 peace kernel: [ext3_lookup+92/176] ext3_lookup+0x5c/0xb0 Mar 17 06:25:04 peace kernel: [real_lookup+206/256] real_lookup+0xce/0x100 Mar 17 06:25:04 peace kernel: 
[do_lookup+117/128] do_lookup+0x75/0x80 Mar 17 06:25:04 peace kernel: [link_path_walk+2138/4224] link_path_walk+0x85a/0x1080 Mar 17 06:25:04 peace kernel: [kmem_cache_alloc+134/480] kmem_cache_alloc+0x86/0x1e0 Mar 17 06:25:04 peace kernel: [getname+126/192] getname+0x7e/0xc0 Mar 17 06:25:04 peace kernel: [__user_walk+61/80] __user_walk+0x3d/0x50 Mar 17 06:25:04 peace kernel: [vfs_lstat+29/80] vfs_lstat+0x1d/0x50 Mar 17 06:25:04 peace kernel: [sys_lstat64+22/48] sys_lstat64+0x16/0x30 Mar 17 06:25:04 peace kernel: [syscall_call+7/11] syscall_call+0x7/0xb Mar 17 06:25:04 peace kernel: Mar 17 06:25:04 peace kernel: Code: 8b 11 0f 18 02 90 39 59 18 89 c8 74 13 85 d2 89 d1 75 ed 31 Mar 17 06:25:04 peace kernel: <6>note: find[1170] exited with preempt_count 1 Mar 17 06:25:04 peace kernel: Debug: sleeping function called from invalid context at include/linux/rwsem.h:43 Mar 17 06:25:04 peace kernel: in_atomic():1, irqs_disabled():0 Mar 17 06:25:04 peace kernel: Call Trace: Mar 17 06:25:04 peace kernel: [__might_sleep+172/224] __might_sleep+0xac/0xe0 Mar 17 06:25:04 peace kernel: [profile_exit_task+35/96] profile_exit_task+0x23/0x60 Mar 17 06:25:04 peace kernel: [do_exit+117/2480] do_exit+0x75/0x9b0 Mar 17 06:25:04 peace kernel: [die+594/608] die+0x252/0x260 Mar 17 06:25:04 peace kernel: [do_page_fault+0/1360] do_page_fault+0x0/0x550 Mar 17 06:25:04 peace kernel: [do_page_fault+485/1360] do_page_fault+0x1e5/0x550 Mar 17 06:25:04 peace kernel: [__getblk+28/64] __getblk+0x1c/0x40 Mar 17 06:25:04 peace kernel: [ext3_getblk+119/576] ext3_getblk+0x77/0x240 Mar 17 06:25:04 peace kernel: [wake_up_buffer+9/48] wake_up_buffer+0x9/0x30 Mar 17 06:25:04 peace kernel: [ll_rw_block+92/144] ll_rw_block+0x5c/0x90 Mar 17 06:25:04 peace kernel: [do_page_fault+0/1360] do_page_fault+0x0/0x550 Mar 17 06:25:04 peace kernel: [error_code+45/56] error_code+0x2d/0x38 Mar 17 06:25:04 peace kernel: [find_inode_fast+32/96] find_inode_fast+0x20/0x60 Mar 17 06:25:04 peace kernel: [iget_locked+176/672] 
iget_locked+0xb0/0x2a0 Mar 17 06:25:04 peace kernel: [ext3_lookup+92/176] ext3_lookup+0x5c/0xb0 Mar 17 06:25:04 peace kernel: [real_lookup+206/256] real_lookup+0xce/0x100 Mar 17 06:25:04 peace kernel: [do_lookup+117/128] do_lookup+0x75/0x80 Mar 17 06:25:04 peace kernel: [link_path_walk+2138/4224] link_path_walk+0x85a/0x1080 Mar 17 06:25:04 peace kernel: [kmem_cache_alloc+134/480] kmem_cache_alloc+0x86/0x1e0 Mar 17 06:25:04 peace kernel: [getname+126/192] getname+0x7e/0xc0 Mar 17 06:25:04 peace kernel: [__user_walk+61/80] __user_walk+0x3d/0x50 Mar 17 06:25:04 peace kernel: [vfs_lstat+29/80] vfs_lstat+0x1d/0x50 Mar 17 06:25:04 peace kernel: [sys_lstat64+22/48] sys_lstat64+0x16/0x30 Mar 17 06:25:04 peace kernel: [syscall_call+7/11] syscall_call+0x7/0xb Mar 17 06:25:04 peace kernel: Mar 17 06:25:04 peace kernel: bad: scheduling while atomic! Mar 17 06:25:04 peace kernel: Call Trace: Mar 17 06:25:04 peace kernel: [schedule+2311/2320] schedule+0x907/0x910 Mar 17 06:25:04 peace kernel: [zap_pmd_range+68/96] zap_pmd_range+0x44/0x60 Mar 17 06:25:04 peace kernel: [unmap_page_range+70/128] unmap_page_range+0x46/0x80 Mar 17 06:25:04 peace kernel: [unmap_vmas+534/848] unmap_vmas+0x216/0x350 Mar 17 06:25:04 peace kernel: [__pagevec_lru_add_active+451/672] __pagevec_lru_add_active+0x1c3/0x2a0 Mar 17 06:25:04 peace kernel: [exit_mmap+199/688] exit_mmap+0xc7/0x2b0 Mar 17 06:25:04 peace kernel: [dump_stack+23/32] dump_stack+0x17/0x20 Mar 17 06:25:04 peace kernel: [mmput+173/288] mmput+0xad/0x120 Mar 17 06:25:04 peace kernel: [do_exit+482/2480] do_exit+0x1e2/0x9b0 Mar 17 06:25:04 peace kernel: [die+594/608] die+0x252/0x260 Mar 17 06:25:04 peace kernel: [do_page_fault+0/1360] do_page_fault+0x0/0x550 Mar 17 06:25:04 peace kernel: [do_page_fault+485/1360] do_page_fault+0x1e5/0x550 Mar 17 06:25:04 peace kernel: [__getblk+28/64] __getblk+0x1c/0x40 Mar 17 06:25:04 peace kernel: [ext3_getblk+119/576] ext3_getblk+0x77/0x240 Mar 17 06:25:04 peace kernel: [wake_up_buffer+9/48] 
wake_up_buffer+0x9/0x30 Mar 17 06:25:04 peace kernel: [ll_rw_block+92/144] ll_rw_block+0x5c/0x90 Mar 17 06:25:04 peace kernel: [do_page_fault+0/1360] do_page_fault+0x0/0x550 Mar 17 06:25:04 peace kernel: [error_code+45/56] error_code+0x2d/0x38 Mar 17 06:25:04 peace kernel: [find_inode_fast+32/96] find_inode_fast+0x20/0x60 Mar 17 06:25:04 peace kernel: [iget_locked+176/672] iget_locked+0xb0/0x2a0 Mar 17 06:25:04 peace kernel: [ext3_lookup+92/176] ext3_lookup+0x5c/0xb0 Mar 17 06:25:04 peace kernel: [real_lookup+206/256] real_lookup+0xce/0x100 Mar 17 06:25:04 peace kernel: [do_lookup+117/128] do_lookup+0x75/0x80 Mar 17 06:25:04 peace kernel: [link_path_walk+2138/4224] link_path_walk+0x85a/0x1080 Mar 17 06:25:04 peace kernel: [kmem_cache_alloc+134/480] kmem_cache_alloc+0x86/0x1e0 Mar 17 06:25:04 peace kernel: [getname+126/192] getname+0x7e/0xc0 Mar 17 06:25:04 peace kernel: [__user_walk+61/80] __user_walk+0x3d/0x50 Mar 17 06:25:04 peace kernel: [vfs_lstat+29/80] vfs_lstat+0x1d/0x50 Mar 17 06:25:04 peace kernel: [sys_lstat64+22/48] sys_lstat64+0x16/0x30 Mar 17 06:25:04 peace kernel: [syscall_call+7/11] syscall_call+0x7/0xb Mar 17 06:25:04 peace kernel: Mar 17 06:25:04 peace kernel: fs/fs-writeback.c:71: spin_lock(fs/inode.c:c0386770) already locked by fs/inode.c/798 ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-07 13:44 ` Chris Shoemaker @ 2004-08-07 18:49 ` Linus Torvalds 2004-08-07 19:01 ` Gene Heskett 1 sibling, 0 replies; 146+ messages in thread From: Linus Torvalds @ 2004-08-07 18:49 UTC (permalink / raw) To: Chris Shoemaker Cc: Gene Heskett, Kernel Mailing List, Andrew Morton, Ingo Molnar, vda, ak, William Lee Irwin III On Sat, 7 Aug 2004, Chris Shoemaker wrote: > > Well then, maybe you'd like more? I attached two more from the same > period. Please remember that these are 5 months old, and could > represent bugs already fixed. I think this was stock 2.6.4. These look like total memory corruption, they don't look anything like the prune_dcache things. > Perhaps due to CONFIG_REGPARM? I haven't used it for quite a while, but > back in March I was a bit bolder about config options marked > experimental. Entirely possible. gcc has historically had bugs in regparm (extra register pressure causing incorrect register re-use). It's supposed to be fixed in gcc-3+ Linus ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-07 13:44 ` Chris Shoemaker 2004-08-07 18:49 ` Linus Torvalds @ 2004-08-07 19:01 ` Gene Heskett 1 sibling, 0 replies; 146+ messages in thread From: Gene Heskett @ 2004-08-07 19:01 UTC (permalink / raw) To: linux-kernel Cc: Chris Shoemaker, Linus Torvalds, Andrew Morton, Ingo Molnar, vda, ak, William Lee Irwin III On Saturday 07 August 2004 09:44, Chris Shoemaker wrote: >On Fri, Aug 06, 2004 at 11:20:28PM -0700, Linus Torvalds wrote: >> On Fri, 6 Aug 2004, Chris Shoemaker wrote: >> > I _was_ able to find the attached oops, but I don't think I have >> > the corresponding object files, so I hope the decoding it >> > contains is good enough. >> >> It's fine. > >Well then, maybe you'd like more? I attached two more from the same >period. Please remember that these are 5 months old, and could >represent bugs already fixed. I think this was stock 2.6.4. > >> It oopses on >> >> inode->i_sb->s_op >> >> where "i_sb" is bad and contains the pointer "0x0b7eebf8" which is >> definitely not a valid kernel pointer. >> >> There's a few other strange details in your oops report too. One >> being that the inode pointer (in %ebx, apparently) doesn't show on >> the stack where I'd expect it to show. Hmm. That might be just a >> different compiler issue, though. > >Perhaps due to CONFIG_REGPARM? I haven't used it for quite a while, > but back in March I was a bit bolder about config options marked > experimental. Gene, are you using REGPARM? > >-chris No Chris. I think I may have had it on for maybe 10 minutes, in 2.6.7-mmsomething maybe. But it died without a trace, (on the old motherboard with an Athlon 1600XP on it) as I was starting X, so on the next reboot to 2.6.7, I turned it back off and haven't turned it back on since. IIRC that was before the video card took ill, so at that point I was blaming my problems, which were generally only post problems then, as symptoms of heat. TBE it had to warm up before it would post! 
By the time the video card wouldn't post, memtest86 was also finding bad memory (with a new card plugged in), hence the whole mobo got retired. >> Anyway, this does look somewhat like the ones Gene is seeing. If I >> had to guess, I'd guess that either the inode pointer is bad, or >> it's just stale from an inode that has already been free'd. Most >> likely because of prune_dcache() having had a corrupt LRU list >> with a stale/corrupt entry. >> >> That would blow the prefetch theory out of the water. >> >> Linus -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) 99.24% setiathome rank, not too shabby for a WV hillbilly Yahoo.com attorneys please note, additions to this message by Gene Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved. ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  [not found] ` <20040806073739.GA6617@elte.hu>
  [not found] ` <20040806004231.143c8bd2.akpm@osdl.org>
@ 2004-08-06 11:31 ` Andi Kleen
  2004-08-06 17:16 ` Linus Torvalds
  2 siblings, 0 replies; 146+ messages in thread
From: Andi Kleen @ 2004-08-06 11:31 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: torvalds, vda, gene.heskett, linux-kernel, akpm

On Fri, 6 Aug 2004 09:37:39 +0200
Ingo Molnar <mingo@elte.hu> wrote:

>
> ebx is 00000008, it came in from (%esi), which is (0xc20a7b30) - that
> looks like a valid pointer.
>
> to me this crash seems to imply prefetch.

Can you add the following patch and see if it triggers at all?

Maybe it is just the software prefetch fault handler that is somehow buggy.
There was a change there recently to handle NX, maybe that broke something.

Also testing with prefetch disabled (see my earlier patch) may also
be useful just to see if it triggers then too.

-Andi

diff -u linux-2.6.8rc2-update/arch/i386/mm/fault.c-o linux-2.6.8rc2-update/arch/i386/mm/fault.c
--- linux-2.6.8rc2-update/arch/i386/mm/fault.c-o	2004-07-28 02:23:24.000000000 +0200
+++ linux-2.6.8rc2-update/arch/i386/mm/fault.c	2004-08-05 22:20:02.000000000 +0200
@@ -21,6 +21,7 @@
 #include <linux/vt_kern.h>		/* For unblank_screen() */
 #include <linux/highmem.h>
 #include <linux/module.h>
+#include <linux/kallsyms.h>
 
 #include <asm/system.h>
 #include <asm/uaccess.h>
@@ -185,6 +186,12 @@
 			break;
 		}
 	}
+
+	if (prefetch) {
+		printk("corrected prefetch fault at %lx ", addr);
+		print_symbol("eip %s\n", regs->eip);
+	}
+
 	return prefetch;
 }
 
@@ -193,6 +200,9 @@
 {
 	if (unlikely(boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
 		     boot_cpu_data.x86 >= 6)) {
+		printk("possible prefetch fault at %lx ", addr);
+		print_symbol("eip %s\n", regs->eip);
+
 		/* Catch an obscure case of prefetch inside an NX page. */
 		if (nx_enabled && (error_code & 16))
 			return 0;

^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  [not found] ` <20040806073739.GA6617@elte.hu>
  [not found] ` <20040806004231.143c8bd2.akpm@osdl.org>
  2004-08-06 11:31 ` Andi Kleen
@ 2004-08-06 17:16 ` Linus Torvalds
  2 siblings, 0 replies; 146+ messages in thread
From: Linus Torvalds @ 2004-08-06 17:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denis Vlasenko, Gene Heskett, Kernel Mailing List, Andrew Morton, Andi Kleen

On Fri, 6 Aug 2004, Ingo Molnar wrote:
>
> last night i ran another overnight test: 2.6.8-rc3-vanilla with
> CONFIG_PREEMPT enabled and no other changes. I've also reduced the CPU's
> clock speed by 5% to reduce the chance of hw problems. The crash below
> triggered after roughly 12 hours of runtime. I've also attached the full
> disassembly of __d_lookup(). The crash happens in hlist_for_each():
>
> c01632f3: 8d b6 00 00 00 00       lea 0x0(%esi),%esi
> c01632f9: 8d bc 27 00 00 00 00    lea 0x0(%edi,1),%edi
> c0163300: 8b 03                   mov (%ebx),%eax <==== [*]
>
> the crashing instruction is preceded by two prefetch instructions (the
> disassembly has the alternate-insn NOP).

That's not right. The prefetchnta instruction is three or four bytes long
(four if it uses the ebp register that needs the "0(ebp)" modrm format).
We use a NOP4 for space in there, and the things you point to are a
NOP6+NOP7 pair.

Your two nop's are the ones gcc has inserted in order to start the loop at
a 16-byte boundary (ie c0163300 is the top of the loop).

The nop that gets replaced by a prefetch is the instruction _after_ the
one that faulted for you:

	8b 03          mov (%ebx),%eax
	8d 74 26 00    lea 0x0(%esi,1),%esi

I think.

> to me this crash seems to imply prefetch.

I don't think it's obvious yet. It's close to the prefetch, but it's the
instruction just before. Which in an OoO CPU doesn't necessarily mean
much, of course - or it could be that the prefetch caused some trouble
last time around the loop and we only see it now. Or it could be totally
prefetch-unrelated.

I do find the prefetch thing intriguing, though.

		Linus

^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-05 16:26 ` Linus Torvalds
  2004-08-05 18:06 ` Ingo Molnar
@ 2004-08-05 21:10 ` Chris Shoemaker
  2004-08-06 2:03 ` Gene Heskett
  2004-08-06 2:12 ` Gene Heskett
  3 siblings, 0 replies; 146+ messages in thread
From: Chris Shoemaker @ 2004-08-05 21:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Denis Vlasenko, Gene Heskett, Kernel Mailing List, Andrew Morton

On Thu, Aug 05, 2004 at 09:26:10AM -0700, Linus Torvalds wrote:
>
> Anyway, one other thing that makes me worry is the fact that Gene
> apparently has a K7. One of the things AMD has gotten wrong several times
> is prefetching, and it so happens that the dcache code is one of the users
> of the prefetch instruction. prune_dcache() in particular.
>
> So I'm also entertaining the notion that there's an actual prefetch data
> corruption, not just the known AMD bug with occasional spurious page
> faults. Who else has seen the problem? What CPU's are involved?
>
> Linus

Assuming that what I was seeing was the same problem...

chris@peace:~$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Celeron (Coppermine)
stepping        : 10
cpu MHz         : 1002.487
cache size      : 128 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips        : 1982.46

BTW, a recent oops from wli looked similar, but I don't think he's spoken
up in this thread. He seems busy tracking down other things.

-chris

^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-05 16:26 ` Linus Torvalds
  2004-08-05 18:06 ` Ingo Molnar
  2004-08-05 21:10 ` Chris Shoemaker
@ 2004-08-06 2:03 ` Gene Heskett
  2004-08-06 2:12 ` Gene Heskett
  3 siblings, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-06 2:03 UTC (permalink / raw)
  To: linux-kernel

On Thursday 05 August 2004 12:26, Linus Torvalds wrote:
>On Thu, 5 Aug 2004, Denis Vlasenko wrote:
>> It is not a BUG().
>
>Oh yes it is. The 2.6.8-rc2-mm2 report definitely was a BUG().
>
>Earlier ones may not have been, but on the other hand, earlier ones
> may not have had the BUG()-check for corrupted list_del() usage -
> it's not in the standard kernel, and I don't know when it was added
> to -mm. (We used to have it a _long_ time ago, but then we removed
> it because there were no reports of problems).
>
>> It's an oops (dereferencing a d_op pointer with value
>> 0x00000900+14 IIRC, Gene has complete disassembly with location of
>> that event).
>
>.. and that must be because of some kind of pointer corruption,
> where the dentry was either free'd twice or the dentry simply isn't
> a dentry at all, it just got to be used as such because of some
> bug.
>
>> It is not reproducible on request, but happens for him from time
>> to time in the same place with the same bogus value of d_op.
>
>I've followed the discussion. You may not have noticed that the last
> one was different. (And I _think_ it may have been the first time
> Gene did a -mm kernel, so I do believe that the list_del()
> debugging was the thing that caught it).
>
>Anyway, one other thing that makes me worry is the fact that Gene
>apparently has a K7. One of the things AMD has gotten wrong several
> times is prefetching, and it so happens that the dcache code is one
> of the users of the prefetch instruction. prune_dcache() in
> particular.
>
>So I'm also entertaining the notion that there's an actual prefetch
> data corruption, not just the known AMD bug with occasional
> spurious page faults. Who else has seen the problem? What CPU's are
> involved?
>
> Linus

Two things that may be of interest:

1, I do run the -mm kernels too as I figure the more they get
exercised, the quicker some fault will be found. This included the
whole chain of 2.6.7's & 2.6.8-rc1/2-mm1/2. In this case all hell
broke loose here while everyone was at the conventions. Not your
fault, but I was "grabbing a life jacket" here.

2. with the patch Linus sent, top is now showing 383 megs of free ram.
I suspect that without that patch, it might well be less than 15 megs
after a many (10+) hour uptime. So this patch is definitely more
aggressive in its memory housekeeping than without it.

And everything is on except the acpi stuffs, which has ever interested
me here. I do use apm, but only for shutdowns & rtc in utc.
PREEMPT, 4k stacks, page tables in high mem, frame pointers etc, it's
all on right now.

So far, I haven't even heard the first shoe drop. :)

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-05 16:26 ` Linus Torvalds
  ` (2 preceding siblings ...)
  2004-08-06 2:03 ` Gene Heskett
@ 2004-08-06 2:12 ` Gene Heskett
  3 siblings, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-06 2:12 UTC (permalink / raw)
  To: linux-kernel

On Thursday 05 August 2004 12:26, Linus Torvalds wrote:
[...]
>Anyway, one other thing that makes me worry is the fact that Gene
>apparently has a K7. One of the things AMD has gotten wrong several
> times is prefetching, and it so happens that the dcache code is one
> of the users of the prefetch instruction. prune_dcache() in
> particular.
>
>So I'm also entertaining the notion that there's an actual prefetch
> data corruption, not just the known AMD bug with occasional
> spurious page faults. Who else has seen the problem? What CPU's are
> involved?
>
> Linus

If we run it down to that, can I bounce it back at AMD as defective?
Or can it be coded around? If it's bugging my Athlon 2800XP, then I'd
have to assume (dmesg says it's stepping 00 FWIW) that I'm far from
alone. AMD is peddling these just like Orville R. sells popcorn.

And, from previous experience with this particular vendor, if I give
convincing proof the chip really is from a defective run, I'd suspect
a replacement would be in a fedex bag & headed my way before the day
is out. But I'd need proof of the problem.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-05 7:25 ` Linus Torvalds
  2004-08-05 7:31 ` Andrew Morton
  2004-08-05 8:33 ` Denis Vlasenko
@ 2004-08-06 2:50 ` Linus Torvalds
  2004-08-06 3:18 ` viro
  2 siblings, 1 reply; 146+ messages in thread
From: Linus Torvalds @ 2004-08-06 2:50 UTC (permalink / raw)
  To: Gene Heskett; +Cc: Kernel Mailing List, Andrew Morton

On Thu, 5 Aug 2004, Linus Torvalds wrote:
>
> I _suspect_ you hit the new "list_del-debug.patch" in Andrew's tree,
> because in my tree there are no BUG_ON's in prune_dcache() at all.

Hmm.. I'm starting to have a wild suspicion here.

Let's look at this: the d_lru list is used for dentries that have had
their count go down to zero in dput() (which is most of them, actually),
and we do _not_ move them away from the unused list when we increment
their count again, because we don't want to take the dcache lock in the
critical lookup region.

So what we do to determine whether they are on the list or not is not to
look at the count; instead we mark dentries that are _not_ on the LRU list
by making their d_lru list be empty.

This is why the dcache code uses the "careful" delete function
"list_del_init()" a lot - because when we remove the dentry from the
unused list, we really need to _mark_ it removed. Then the removal code
does

	if (!list_empty(&dentry->d_lru))
		..

Fine.

HOWEVER. Sometimes we use the plain "list_del()", because we know that
we're going to throw the dentry away. And in shrink_dcache_anon() we do it
because we expect to add it back to the dentry list.

BUT WE DON'T ALWAYS DO THAT!

So as far as I can tell, shrink_dcache_anon() will have _removed_ a dentry
from the unused_list, but still left the dentry with wild pointers
pointing to other dentries. Next time around we do a dput() on such a
dentry, we'll be screwed, because we'll try to remove it again. Boom.

Does anybody see why this isn't a serious dentry list corruption case? Or
am I just crazy?

But if I'm right, this particular bug should only hit you if you export a
filesystem through knfsd (I don't see how you'd get an anonymous dentry
any other way). Oh. XFS with some of the magic ioctls will do it too. But
I don't think Gene had either of those enabled, so..

But this may explain _some_ of the dcache problems, and maybe we have more
than one bug here.

Comments? Am I getting senile?

		Linus

----
===== fs/dcache.c 1.88 vs edited =====
--- 1.88/fs/dcache.c	2004-06-24 01:55:55 -07:00
+++ edited/fs/dcache.c	2004-08-05 19:35:03 -07:00
@@ -628,7 +628,7 @@
 			struct dentry *this = hlist_entry(lp, struct dentry, d_hash);
 			if (!list_empty(&this->d_lru)) {
 				dentry_stat.nr_unused--;
-				list_del(&this->d_lru);
+				list_del_init(&this->d_lru);
 			}
 
 			/*

^ permalink raw reply [flat|nested] 146+ messages in thread
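The "empty d_lru means not on the LRU" convention from the message above can be tried out in userspace. What follows is only a sketch under stated assumptions: the list helpers are simplified re-creations rather than the kernel's own (the real list_del() may additionally poison the stale pointers, which is not modelled here), and toy_dentry and still_looks_listed() are hypothetical names introduced for illustration.

```c
#include <stdbool.h>

/* Simplified circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* An entry that links only to itself counts as "empty", i.e. not on any list. */
static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Unlink only: the entry keeps its old next/prev pointers. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Unlink and re-initialise, so list_empty() on the entry becomes true. */
static void list_del_init(struct list_head *entry)
{
	list_del(entry);
	init_list_head(entry);
}

/* Hypothetical stand-in for struct dentry; only the LRU link matters here. */
struct toy_dentry {
	struct list_head d_lru;
};

/*
 * Put two toy dentries on an unused list, remove one of them with either
 * list_del() or list_del_init(), and report whether the removed entry would
 * still pass the "!list_empty(&dentry->d_lru)" test, i.e. still looks listed.
 */
static bool still_looks_listed(bool use_list_del_init)
{
	struct list_head unused;
	struct toy_dentry a, b;

	init_list_head(&unused);
	init_list_head(&a.d_lru);
	init_list_head(&b.d_lru);
	list_add(&a.d_lru, &unused);
	list_add(&b.d_lru, &unused);

	if (use_list_del_init)
		list_del_init(&a.d_lru);
	else
		list_del(&a.d_lru);

	return !list_empty(&a.d_lru);
}
```

In this sketch still_looks_listed(false) reports that the entry still appears to be on the list even though it has been unlinked, while still_looks_listed(true) does not, which is the difference the one-line patch relies on.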
* Re: Possible dcache BUG
  2004-08-06 2:50 ` Linus Torvalds
@ 2004-08-06 3:18 ` viro
  2004-08-06 3:24 ` Linus Torvalds
  0 siblings, 1 reply; 146+ messages in thread
From: viro @ 2004-08-06 3:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Gene Heskett, Kernel Mailing List, Andrew Morton

On Thu, Aug 05, 2004 at 07:50:28PM -0700, Linus Torvalds wrote:
> So as far as I can tell, shrink_dcache_anon() will have _removed_ a dentry
> from the unused_list, but still left the dentry with wild pointers
> pointing to other dentries. Next time around we do a dput() on such a
> dentry, we'll be screwed, because we'll try to remove it again. Boom.

It doesn't even take a dput(). Look: we do list_del(), then notice that
sucker still has positive refcount and leave it alone. Now think what
happens on the next pass. That's right, we hit that dentry *again*. And
see that list_empty() is false. And do list_del() one more time.

However, what used to be e.g. next dentry might very well be freed by
now. *BOOM*.

^ permalink raw reply [flat|nested] 146+ messages in thread
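The two-pass scenario in the message above can be replayed on a toy list. This is a sketch under assumptions, not the kernel code: the helpers are simplified re-creations of the list primitives, the names lru, busy, victim and second_pass_corrupts() are hypothetical, and "freed" is only simulated by taking the neighbour off the list.

```c
#include <stdbool.h>

/* Simplified circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

/* Unlink only: the entry keeps its old next/prev pointers. */
static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Unlink and re-initialise, so list_empty() on the entry becomes true. */
static void list_del_init(struct list_head *entry)
{
	list_del(entry);
	init_list_head(entry);
}

/*
 * Pass 1 unlinks `busy` (imagine its refcount is still positive, so it is
 * left alone afterwards). Its old neighbour `victim` then leaves the list
 * for real, standing in for a dentry that has been freed. Pass 2 finds
 * `busy` again and unlinks it a second time if it does not look empty.
 *
 * Returns true if the second pass scribbled on `victim` or made the list
 * head point at it, which is the corruption described in the thread.
 */
static bool second_pass_corrupts(bool use_list_del_init)
{
	struct list_head lru, busy, victim;

	init_list_head(&lru);
	init_list_head(&busy);
	init_list_head(&victim);
	list_add_tail(&busy, &lru);	/* lru <-> busy */
	list_add_tail(&victim, &lru);	/* lru <-> busy <-> victim */

	/* Pass 1: remove busy from the LRU. */
	if (use_list_del_init)
		list_del_init(&busy);
	else
		list_del(&busy);

	/* The neighbour goes away in the meantime. */
	list_del_init(&victim);

	/* Pass 2: the same check the removal code performs. */
	if (!list_empty(&busy))
		list_del(&busy);

	return lru.next != &lru || victim.prev != &victim;
}
```

With plain list_del() in pass 1 the second unlink writes through the stale pointers, so the empty list head ends up pointing at the departed neighbour; with list_del_init() the second pass is skipped and the list stays consistent.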
* Re: Possible dcache BUG
  2004-08-06 3:18 ` viro
@ 2004-08-06 3:24 ` Linus Torvalds
  2004-08-08 4:42 ` Gene Heskett
  2004-08-08 14:30 ` Gene Heskett
  0 siblings, 2 replies; 146+ messages in thread
From: Linus Torvalds @ 2004-08-06 3:24 UTC (permalink / raw)
  To: viro; +Cc: Gene Heskett, Kernel Mailing List, Andrew Morton

On Fri, 6 Aug 2004 viro@parcelfarce.linux.theplanet.co.uk wrote:
>
> It doesn't even take a dput(). Look: we do list_del(), then notice that
> sucker still has positive refcount and leave it alone. Now think what
> happens on the next pass. That's right, we hit that dentry *again*. And
> see that list_empty() is false. And do list_del() one more time.

Well, the sad part is that doing another list_del() won't even necessarily
go *boom*. Most of the time it might even leave the list as-is, but often
enough it should give list corruption.

> However, what used to be e.g. next dentry might very well be freed by
> now. *BOOM*.

Absolutely. It does look like a rather nasty bug.

It doesn't explain what Gene sees, though, unless you can explain how we'd
get an anon dentry without knfsd/xfs. Oh well.

I'll commit the obvious one-liner fix, since it might explain _some_
problems people have seen.

		Linus

^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-06 3:24 ` Linus Torvalds
@ 2004-08-08 4:42 ` Gene Heskett
  2004-08-08 14:30 ` Gene Heskett
  1 sibling, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-08 4:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds, viro, Andrew Morton

On Thursday 05 August 2004 23:24, Linus Torvalds wrote:
>On Fri, 6 Aug 2004 viro@parcelfarce.linux.theplanet.co.uk wrote:
>> It doesn't even take a dput(). Look: we do list_del(), then
>> notice that sucker still has positive refcount and leave it alone.
>> Now think what happens on the next pass. That's right, we hit
>> that dentry *again*. And see that list_empty() is false. And do
>> list_del() one more time.
>
>Well, the sad part is that doing another list_del() won't even
> necessarily go *boom*. Most of the time it might even leave the
> list as-is, but often enough it should give list corruption.
>
>> However, what used to be e.g. next dentry might very well be freed
>> by now. *BOOM*.
>
>Absolutely. It does look like a rather nasty bug.
>
>It doesn't explain what Gene sees, though, unless you can explain
> how we'd get an anon dentry without knfsd/xfs. Oh well.
>
>I'll commit the obvious one-liner fix, since it might explain _some_
>problems people have seen.
>
> Linus

I just had to reboot, after about an 8 hour uptime with the 'one
liner' only on top of 2.6.8-rc3.

Out of memory basically. tvtime and mozilla were casualties of what
must be the OOM killer. Nothing in the logs. I had seti@home, X,
kde3.3-beta2, and its kmail, plus top, tail, tvtime and mozilla. Moz
died first, or at least that's what I noticed first.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-06 3:24 ` Linus Torvalds 2004-08-08 4:42 ` Gene Heskett @ 2004-08-08 14:30 ` Gene Heskett 2004-08-08 18:39 ` Andrew Morton 1 sibling, 1 reply; 146+ messages in thread From: Gene Heskett @ 2004-08-08 14:30 UTC (permalink / raw) To: linux-kernel; +Cc: Linus Torvalds, viro, Andrew Morton On Thursday 05 August 2004 23:24, Linus Torvalds wrote: [...] >I'll commit the obvious one-liner fix, since it might explain _some_ >problems people have seen. > > Linus I had to reboot late last night, out of memory and things (like mozilla (1.7.2) were dying, but nothing in the logs. Nearly out again, now ~40megs free but so far its stable & nothing in swap. I'm getting the impression there is a memory leak somewhere. OOm hasn't killed anything I am using at this time anyway. Its running like an arthritic dog though, 3 units for seti yesterday, s/b 6 to 7. The gkrellm2 cpu usage display looks plumb normal, so I'm a bit puzzled as to why the slowdown, the rest of the system 'feels good'. This is with just the 'one liner' on top of rc3 & non-verbose-debug. The question is, is rc3-mm1 ready for *me* to try? I don't want to be the hangup, holding up forward progress, but it appears I (& maybe 1 or 2 others) may be exactly that with all this time sitting around waiting for the other shoe to drop. -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) 99.24% setiathome rank, not too shabby for a WV hillbilly Yahoo.com attorneys please note, additions to this message by Gene Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved. ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-08 14:30 ` Gene Heskett
@ 2004-08-08 18:39 ` Andrew Morton
  2004-08-10 4:12 ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: Andrew Morton @ 2004-08-08 18:39 UTC (permalink / raw)
  To: gene.heskett; +Cc: linux-kernel, torvalds, viro

Gene Heskett <gene.heskett@verizon.net> wrote:
>
> On Thursday 05 August 2004 23:24, Linus Torvalds wrote:
>
> [...]
>
> >I'll commit the obvious one-liner fix, since it might explain _some_
> >problems people have seen.
> >
> > Linus
>
> I had to reboot late last night, out of memory and things (like
> mozilla (1.7.2) were dying, but nothing in the logs.

Please wait for it to happen again, then send the contents of
/proc/meminfo, /proc/slabinfo and then do

	su
	dmesg -c
	echo m > /proc/sysrq-trigger
	dmesg > foo

and send foo as well.

^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-08 18:39 ` Andrew Morton
@ 2004-08-10 4:12 ` Gene Heskett
  2004-08-11 3:42 ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-10 4:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, torvalds, viro

On Sunday 08 August 2004 14:39, Andrew Morton wrote:
>Gene Heskett <gene.heskett@verizon.net> wrote:
>> On Thursday 05 August 2004 23:24, Linus Torvalds wrote:
>>
>> [...]
>>
>> >I'll commit the obvious one-liner fix, since it might explain
>> > _some_ problems people have seen.
>> >
>> > Linus
>>
>> I had to reboot late last night, out of memory and things (like
>> mozilla (1.7.2) were dying, but nothing in the logs.
>
>Please wait for it to happen again, then send the contents of
>/proc/meminfo, /proc/slabinfo and then do
>
> su
> dmesg -c
> echo m > /proc/sysrq-trigger
> dmesg > foo
>
>and send foo as well.

I just had to reboot again. Top was showing about 50 megs free, and
there was about 60 megs in the swap. Top wasn't showing anything
else of interest that I noted.

I've been gone all day, a long day at that, 12 hours. We had another
blowup in the hi voltage at the tv transmitter, and we'll be sometime
tomorrow getting things back to normal there. It's 40 years old, and
quite far up the far end of the "bathtub curve".

I left about 10:15 and came back in about 22:30. A friend had been
trying to reach me over an alsa problem, and I'd opened a shell and
was showing him how the new 2.6 modprobe.conf worked. When we were
done, I hit a q to quit less, and (surprise) the whole shell went
away, and I could not start another shell, each attempt being reported
as an error 5 on the kickstart panel at the bottom of the screen after
the new window opened and reclosed in about 100 milliseconds per
attempt.

I quit the top program to free that shell, and thinking maybe I was
being attacked, entered a 'w' at the prompt, and that shell went away
too, with the same error.

That left me with the tail on the log, which at that point still
wasn't showing me anything but a samba restart I do once daily else it
dies from a profound lack of interest anyway.

I right clicked on the screen and selected quit X. It quit, but then
a trap error was reported. I typed "reboot" and the machine reported
no more processes at this run level and was then DOA, requiring a tap
on the reset button to bring it back to life.

On the subsequent e2fsck's, /dev/hda8 had this error:

i_dir_acl for inode 654880 (/lib/local/ar_YE) is 42752 but s/b zero.

And then dropped me to a shell to run e2fsck without any options.
Which I did. Eventually it asked me if I wanted to clear that inode,
so I answered 'y' and it finished without any other errors, but when I
did the ctl-d to reboot, it still wanted to do an e2fsck on
everything, which passed.

So now I'm rebooted, but without anything of meaning (there is nothing
in the logs) to report. Any evidence of the debacle is now gone.

Also, during the reboot, I'm blind from "ok, booting the kernel" until
the line in something that sets the default font is executed, setting
it to "lat0_sun16" at which time I have readable info on the screen
again. I don't recall seeing that particular font mentioned in a make
xconfig, so I've no idea how to make it use it from square one so I
can read the dmesg as it goes by the first time. I have iso-8859-1
compiled in, along with codepage 437 for US usage, with everything
else as modular. How can I fix this "blind" time?

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-10 4:12 ` Gene Heskett
@ 2004-08-11 3:42 ` Gene Heskett
  2004-08-11 3:46 ` Linus Torvalds
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-11 3:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, torvalds, viro

On Tuesday 10 August 2004 00:12, Gene Heskett wrote:
>On Sunday 08 August 2004 14:39, Andrew Morton wrote:
>>Gene Heskett <gene.heskett@verizon.net> wrote:
>>> On Thursday 05 August 2004 23:24, Linus Torvalds wrote:
>>>
>>> [...]
>>>
>>> >I'll commit the obvious one-liner fix, since it might explain
>>> > _some_ problems people have seen.
>>> >
>>> > Linus

Linus, I hate to be a killjoy on this, but I just had to reboot again,
it was killing processes, even first the shells I had open then kmail
and X this time, but with nothing in the logs, and when X had quit, a
top in the launching shell reported nearly 250 megs free with nothing
in the swap.

So I'm not getting any useful data, the machine is dog slow:

real 17m51.460s
user 13m11.201s
sys 1m34.718s

That should have been 6 minutes maximum.

I got rc4 as the whole thing just now, maybe there was something wrong
with the 2.6.7 base I was using. That's rare since I quit getting the
.bz2's, switching to tar.gz's which seem to be the more dependable
format here.

>>> I had to reboot late last night, out of memory and things (like
>>> mozilla (1.7.2) were dying, but nothing in the logs.
>>
>>Please wait for it to happen again, then send the contents of
>>/proc/meminfo, /proc/slabinfo and then do
>>
>> su
>> dmesg -c
>> echo m > /proc/sysrq-trigger
>> dmesg > foo
>>
>>and send foo as well.

The above was not available (X wouldn't restart), and trying to print
from any kde app causes the app, and its launcher, to exit. So I
don't have a paper copy and my memory isn't photographic, please
accept my apologies on this. Maybe rc4 will also do it. We'll find
out I guess. Reboot time again.

[...]

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-11 3:42 ` Gene Heskett
@ 2004-08-11 3:46 ` Linus Torvalds
  2004-08-11 4:18 ` Udo A. Steinberg
  2004-08-11 4:47 ` Gene Heskett
  0 siblings, 2 replies; 146+ messages in thread
From: Linus Torvalds @ 2004-08-11 3:46 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, Andrew Morton, viro

On Tue, 10 Aug 2004, Gene Heskett wrote:
>
> Linus, I hate to be a killjoy on this, but I just had to reboot again,

Note that this is something else going on. The "obvious one-liner" can be
an issue only with certain special XFS stuff or knfsd, neither of which
you have.

> it was killing processes, even first the shells I had open then kmail
> and X this time, but with nothing in the logs, and when X had quit, a
> top in the launching shell reported nearly 250 megs free with nothing
> in the swap.

As Andrew already requested, the only way for us to figure out what is
wrong is to get output from you on where the memory has gone. Notably, the
output of "/proc/meminfo" and "/proc/slabinfo". "ps axm" helps too.

If it is slow, the above will still work. Just save them away and reboot.

		Linus

^ permalink raw reply [flat|nested] 146+ messages in thread
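Once a saved copy of /proc/slabinfo is at hand, one rough way to see "where the memory has gone" is to total the pages behind each cache. The sketch below is an illustration under assumptions: it expects the slabinfo version 2.0 column layout (name, active_objs, num_objs, objsize, objperslab, pagesperslab, tunables, slabdata), takes the page size as a caller-supplied value (4 kB would be the usual figure on i386), and struct slab_row, parse_slab_row() and slab_footprint_kb() are hypothetical helper names, not an existing tool or kernel interface.

```c
#include <stdio.h>

/* One parsed data row of /proc/slabinfo (version 2.0 layout assumed). */
struct slab_row {
	char name[64];
	unsigned long active_objs, num_objs, objsize, objperslab, pagesperslab;
	unsigned long active_slabs, num_slabs;
};

/*
 * Parse a single line. Returns 1 for a well-formed data row and 0 for
 * anything else, such as the "slabinfo - version" banner or the "# name"
 * column header. The three tunables values are read and discarded.
 */
static int parse_slab_row(const char *line, struct slab_row *row)
{
	int fields = sscanf(line,
		"%63s %lu %lu %lu %lu %lu : tunables %*u %*u %*u : slabdata %lu %lu",
		row->name, &row->active_objs, &row->num_objs, &row->objsize,
		&row->objperslab, &row->pagesperslab,
		&row->active_slabs, &row->num_slabs);

	return fields == 8;
}

/* Approximate memory held by a cache: slabs * pages per slab * page size. */
static unsigned long slab_footprint_kb(const struct slab_row *row,
				       unsigned long page_kb)
{
	return row->num_slabs * row->pagesperslab * page_kb;
}
```

A caller would read the saved file line by line (for example with fgets()), skip the lines for which parse_slab_row() returns 0, and print the caches with the largest slab_footprint_kb() first; comparing that list between a fresh boot and a slow, long-running system is one way to spot a cache that keeps growing.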
* Re: Possible dcache BUG
  2004-08-11 3:46 ` Linus Torvalds
@ 2004-08-11 4:18 ` Udo A. Steinberg
  2004-08-11 5:13 ` Linus Torvalds
  2004-08-11 4:47 ` Gene Heskett
  1 sibling, 1 reply; 146+ messages in thread
From: Udo A. Steinberg @ 2004-08-11 4:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton, viro

[-- Attachment #1: Type: text/plain, Size: 20973 bytes --]

On Tue, 10 Aug 2004 20:46:33 -0700 (PDT) Linus Torvalds (LT) wrote:

I'm currently using 2.6.8-rc4 and I'm seeing the same problem. Each day
the machine just gets slower and swappier, even though I'm always running
the same workload. Rebooting helps a lot. The machine has very little
memory (128MB).

LT> As Andrew already requested, the only way for us to figure out what is
LT> wrong is to get output from you on where the memory has gone. Notably, the
LT> output of "/proc/meminfo" and "/proc/slabinfo". "ps axm" helps too.

See below.

-Udo.

MemTotal:       125124 kB
MemFree:          1404 kB
Buffers:         19060 kB
Cached:          40484 kB
SwapCached:      33336 kB
Active:          70176 kB
Inactive:        41892 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       125124 kB
LowFree:          1404 kB
SwapTotal:      506512 kB
SwapFree:       455536 kB
Dirty:               4 kB
Writeback:           0 kB
Mapped:          65312 kB
Slab:             9068 kB
Committed_AS:    99576 kB
PageTables:        704 kB
VmallocTotal:   909268 kB
VmallocUsed:      8936 kB
VmallocChunk:   900312 kB

slabinfo - version: 2.0
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
rpc_buffers 8 8 2048 2 1 : tunables 24 12 0 : slabdata 4 4 0
rpc_tasks 8 25 160 25 1 : tunables 120 60 0 : slabdata 1 1 0
rpc_inode_cache 0 0 416 9 1 : tunables 54 27 0 : slabdata 0 0 0
xfrm6_tunnel_spi 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0
fib6_nodes 5 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0
ip6_dst_cache 5 18 224 18 1 : tunables 120 60 0 : slabdata 1 1 0
ndisc_cache 1 25 160 25 1 : tunables 120 60 0 : slabdata 1 1 0
raw6_sock 0 0 640 6 1 : tunables 54 27 0 : slabdata 0 0 0
udp6_sock 0 0 608 6 1 : tunables 54 27 0 : slabdata 0 0 0
tcp6_sock 6 7 1120 7 2 : tunables 24 12 0 : slabdata 1 1 0
unix_sock 42 50 384 10 1 : tunables 54 27 0 : slabdata 5 5 0
ip_conntrack 4 25 160 25 1 : tunables 120 60 0 : slabdata 1 1 0
tcp_tw_bucket 2 31 128 31 1 : tunables 120 60 0 : slabdata 1 1 0
tcp_bind_bucket 10 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0
tcp_open_request 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0
inet_peer_cache 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0
secpath_cache 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
xfrm_dst_cache 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
ip_fib_hash 9 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0
ip_dst_cache 10 30 256 15 1 : tunables 120 60 0 : slabdata 2 2 0
arp_cache 1 31 128 31 1 : tunables 120 60 0 : slabdata 1 1 0
raw4_sock 0 0 480 8 1 : tunables 54 27 0 : slabdata 0 0 0
udp_sock 1 7 512 7 1 : tunables 54 27 0 : slabdata 1 1 0
tcp_sock 11 16 1024 4 1 : tunables 54 27 0 : slabdata 4 4 0
flow_cache 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0
uhci_urb_priv 0 0 44 88 1 : tunables 120 60 0 : slabdata 0 0 0
ntfs_big_inode_cache 0 0 448 9 1 : tunables 54 27 0 : slabdata 0 0 0
ntfs_inode_cache 0 0 160 25 1 : tunables 120 60 0 : slabdata 0 0 0
ntfs_name_cache 0 0 512 8 1 : tunables 54 27 0 : slabdata 0 0 0
ntfs_attr_ctx_cache 0 0 32 119 1 : tunables 120 60 0 : slabdata 0 0 0
ntfs_index_ctx_cache 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0
smb_request 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
smb_inode_cache 0 0 320 12 1 : tunables 54 27 0 : slabdata 0 0 0
nfs_write_data 36 36 448 9 1 : tunables 54 27 0 : slabdata 4 4 0
nfs_read_data 32 36 416 9 1 : tunables 54 27 0 : slabdata 4 4 0
nfs_inode_cache 0 0 544 7 1 : tunables 54 27 0 : slabdata 0 0 0
nfs_page 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0
isofs_inode_cache 0 0 320 12 1 : tunables 54 27 0 : slabdata 0 0 0
fat_inode_cache 0 0 352 11 1 : tunables 54 27 0 : slabdata 0 0 0
ext2_inode_cache 0 0 416 9 1 : tunables 54 27 0 : slabdata 0 0 0
ext2_xattr 0 0 44 88 1 : tunables 120 60 0 : slabdata 0 0 0
journal_handle 16 135 28 135 1 : tunables 120 60 0 : slabdata 1 1 0
journal_head 46 162 48 81 1 : tunables 120 60 0 : slabdata 2 2 0
revoke_table 12 290 12 290 1 : tunables 120 60 0 : slabdata 1 1 0
revoke_record 0 0 16 226 1 : tunables 120 60 0 : slabdata 0 0 0
ext3_inode_cache 844 1359 448 9 1 : tunables 54 27 0 : slabdata 151 151 0
ext3_xattr 0 0 44 88 1 : tunables 120 60 0 : slabdata 0 0 0
eventpoll_pwq 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0
eventpoll_epi 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0
kioctx 0 0 160 25 1 : tunables 120 60 0 : slabdata 0 0 0
kiocb 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0
dnotify_cache 0 0 20 185 1 : tunables 120 60 0 : slabdata 0 0 0
file_lock_cache 1 43 92 43 1 : tunables 120 60 0 : slabdata 1 1 0
fasync_cache 2 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0
shmem_inode_cache 8 10 384 10 1 : tunables 54 27 0 : slabdata 1 1 0
posix_timers_cache 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0
uid_cache 2 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0
cfq_pool 64 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0
crq_pool 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0
deadline_drq 0 0 48 81 1 : tunables 120 60 0 : slabdata 0 0 0
as_arq 4 65 60 65 1 : tunables 120 60 0 : slabdata 1 1 0
blkdev_ioc 43 185 20 185 1 : tunables 120 60 0 : slabdata 1 1 0
blkdev_queue 9 9 448 9 1 : tunables 54 27 0 : slabdata 1 1 0
blkdev_requests 4 26 152 26 1 : tunables 120 60 0 : slabdata 1 1 0
biovec-(256) 60 60 3072 2 2 : tunables 24 12 0 : slabdata 30 30 0
biovec-128 121 125 1536 5 2 : tunables 24 12 0 : slabdata 25 25 0
biovec-64 242 245 768 5 1 : tunables 54 27 0 : slabdata 49 49 0
biovec-16 242 260 192 20 1 : tunables 120 60 0 : slabdata 13 13 0
biovec-4 242 244 64 61 1 : tunables 120 60 0 : slabdata 4 4 0
biovec-1 242 452 16 226 1 : tunables 120 60 0 : slabdata 2 2 0
bio 259 366
64 61 1 : tunables 120 60 0 : slabdata 6 6 0 sock_inode_cache 65 77 352 11 1 : tunables 54 27 0 : slabdata 7 7 0 skbuff_head_cache 520 580 192 20 1 : tunables 120 60 0 : slabdata 29 29 0 sock 4 12 320 12 1 : tunables 54 27 0 : slabdata 1 1 0 proc_inode_cache 360 360 320 12 1 : tunables 54 27 0 : slabdata 30 30 0 sigqueue 16 27 148 27 1 : tunables 120 60 0 : slabdata 1 1 0 radix_tree_node 1934 2044 276 14 1 : tunables 54 27 0 : slabdata 146 146 0 bdev_cache 10 18 416 9 1 : tunables 54 27 0 : slabdata 2 2 0 mnt_cache 23 41 96 41 1 : tunables 120 60 0 : slabdata 1 1 0 inode_cache 1069 1078 288 14 1 : tunables 54 27 0 : slabdata 77 77 0 dentry_cache 2432 4368 140 28 1 : tunables 120 60 0 : slabdata 156 156 0 filp 800 800 160 25 1 : tunables 120 60 0 : slabdata 32 32 0 names_cache 8 8 4096 1 1 : tunables 24 12 0 : slabdata 8 8 0 idr_layer_cache 84 87 136 29 1 : tunables 120 60 0 : slabdata 3 3 0 buffer_head 33719 36126 48 81 1 : tunables 120 60 0 : slabdata 446 446 0 mm_struct 56 56 512 7 1 : tunables 54 27 0 : slabdata 8 8 0 vm_area_struct 1619 1692 84 47 1 : tunables 120 60 0 : slabdata 36 36 0 fs_cache 59 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 files_cache 54 54 416 9 1 : tunables 54 27 0 : slabdata 6 6 0 signal_cache 95 123 96 41 1 : tunables 120 60 0 : slabdata 3 3 0 sighand_cache 63 63 1312 3 1 : tunables 24 12 0 : slabdata 21 21 0 task_struct 82 85 1424 5 2 : tunables 24 12 0 : slabdata 17 17 0 anon_vma 762 814 8 407 1 : tunables 120 60 0 : slabdata 2 2 0 pgd 46 46 4096 1 1 : tunables 24 12 0 : slabdata 46 46 0 size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0 size-131072 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0 size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0 size-65536 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0 size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0 size-32768 28 28 32768 1 8 : tunables 8 4 0 : slabdata 28 28 0 size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0 
size-16384 3 3 16384 1 4 : tunables 8 4 0 : slabdata 3 3 0 size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0 size-8192 84 84 8192 1 2 : tunables 8 4 0 : slabdata 84 84 0 size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 0 : slabdata 0 0 0 size-4096 53 53 4096 1 1 : tunables 24 12 0 : slabdata 53 53 0 size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 0 : slabdata 0 0 0 size-2048 118 118 2048 2 1 : tunables 24 12 0 : slabdata 59 59 0 size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 0 : slabdata 0 0 0 size-1024 184 184 1024 4 1 : tunables 54 27 0 : slabdata 46 46 0 size-512(DMA) 0 0 512 8 1 : tunables 54 27 0 : slabdata 0 0 0 size-512 383 568 512 8 1 : tunables 54 27 0 : slabdata 71 71 0 size-256(DMA) 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 size-256 252 480 256 15 1 : tunables 120 60 0 : slabdata 32 32 0 size-192(DMA) 0 0 192 20 1 : tunables 120 60 0 : slabdata 0 0 0 size-192 200 200 192 20 1 : tunables 120 60 0 : slabdata 10 10 0 size-128(DMA) 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 size-128 335 372 128 31 1 : tunables 120 60 0 : slabdata 12 12 0 size-96(DMA) 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 size-96 1655 1681 96 41 1 : tunables 120 60 0 : slabdata 41 41 0 size-64(DMA) 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0 size-64 1782 2013 64 61 1 : tunables 120 60 0 : slabdata 33 33 0 size-32(DMA) 0 0 32 119 1 : tunables 120 60 0 : slabdata 0 0 0 size-32 2677 2737 32 119 1 : tunables 120 60 0 : slabdata 23 23 0 kmem_cache 124 124 128 31 1 : tunables 120 60 0 : slabdata 4 4 0 PID TTY STAT TIME COMMAND 1 ? - 0:01 init [4] - - S 0:01 - 2 ? - 0:00 [ksoftirqd/0] - - SN 0:00 - 3 ? - 0:00 [events/0] - - S< 0:00 - 4 ? - 0:00 [khelper] - - S< 0:00 - 5 ? - 0:04 [kacpid] - - S< 0:04 - 22 ? - 0:00 [kblockd/0] - - S< 0:00 - 23 ? - 0:00 [khubd] - - S 0:00 - 37 ? - 0:00 [pdflush] - - S 0:00 - 40 ? - 0:00 [aio/0] - - S< 0:00 - 39 ? - 0:02 [kswapd0] - - S 0:02 - 142 ? - 0:00 [pccardd] - - S 0:00 - 144 ? - 0:00 [pccardd] - - S 0:00 - 152 ? 
- 0:00 [kseriod] - - S 0:00 - 171 ? - 0:00 [kjournald] - - S 0:00 - 330 ? - 0:00 [kjournald] - - S 0:00 - 331 ? - 0:24 [loop0] - - S< 0:24 - 332 ? - 0:00 [kjournald] - - S 0:00 - 333 ? - 0:00 [kjournald] - - S 0:00 - 334 ? - 0:00 [kjournald] - - S 0:00 - 335 ? - 0:00 [kjournald] - - S 0:00 - 497 ? - 0:00 /usr/sbin/syslogd -m 0 - - Ss 0:00 - 511 ? - 0:00 /usr/sbin/klogd -c 3 -x - - Ss 0:00 - 514 ? - 0:00 /sbin/cardmgr - - Ss 0:00 - 857 ? - 0:00 /sbin/rpc.portmap - - Ss 0:00 - 898 ? - 0:00 /usr/sbin/inetd - - Ss 0:00 - 904 ? - 0:00 /usr/local/sbin/sshd - - Ss 0:00 - 914 ? - 0:00 /usr/sbin/crond -l10 - - S 0:00 - 917 ? - 0:00 /usr/sbin/acpid - - Ss 0:00 - 930 ? - 0:00 /usr/sbin/gpm -m /dev/mouse -t ps2 - - Ss 0:00 - 933 ? - 0:00 /usr/sbin/smartd - - S 0:00 - 950 tty2 - 0:00 /sbin/agetty 38400 tty2 linux - - Ss+ 0:00 - 951 tty3 - 0:00 /sbin/agetty 38400 tty3 linux - - Ss+ 0:00 - 952 tty4 - 0:00 /sbin/agetty 38400 tty4 linux - - Ss+ 0:00 - 953 tty5 - 0:00 /sbin/agetty 38400 tty5 linux - - Ss+ 0:00 - 954 tty6 - 0:00 /sbin/agetty 38400 tty6 linux - - Ss+ 0:00 - 955 tty7 - 0:00 /sbin/agetty 38400 tty7 linux - - Ss+ 0:00 - 956 tty8 - 0:00 /sbin/agetty 38400 tty8 linux - - Ss+ 0:00 - 957 tty9 - 0:00 /sbin/agetty 38400 tty9 linux - - Ss+ 0:00 - 958 tty10 - 0:00 /sbin/agetty 38400 tty10 linux - - Ss+ 0:00 - 959 ? - 0:00 /usr/X11R6/bin/xdm -nodaemon - - Ss 0:00 - 1081 ? - 19:54 /usr/X11R6/bin/X -auth /usr/X11R6/lib/X11/xdm/authdir/authfiles/A:0-yl5ncw - - S 19:54 - 1082 ? - 0:00 -:0 - - S 0:00 - 1181 ? - 0:00 [eth1] - - S 0:00 - 1244 ? - 0:00 /sbin/dhcpcd -d eth1 - - Ss 0:00 - 1251 ? - 0:01 blackbox - - S 0:01 - 1280 ? - 6:18 /home/uas/bin/wmbatteries - - S 6:18 - 1282 ? - 0:02 /home/uas/bin/wmcpuload -a -n -lc rgb:ff/ff/33 - - S 0:02 - 1284 ? - 2:24 /home/uas/bin/wmnetload -n eth1 - - S 2:24 - 1286 ? - 0:01 /home/uas/bin/wmmemload -am -b -c -lc rgb:ff/80/30 - - S 0:01 - 1288 ? - 0:12 /home/uas/bin/wmtime -lc rgb:33/33/ff - - S 0:12 - 1292 ? 
- 0:00 /home/uas/bin/root-tail -f -g 350x10+5-10 -fn -schumacher-clean-medium-r-*-*-10-*-*-*-*-*-*-* -color rgb:cc/cc/ff /var/log/messages rgb:88/88/ff /var/log/syslog rgb:ff/88/ff /var/log/maillog - - Ss 0:00 - 1328 ? - 0:00 licq - - Ss 0:00 - 1331 ? - 0:00 licq - - S 0:00 - 1332 ? - 0:06 licq - - S 0:06 - 1333 ? - 0:00 licq - - S 0:00 - 1334 ? - 0:00 licq - - S 0:00 - 1335 ? - 0:04 licq - - S 0:04 - 1484 ? - 0:00 /home/uas/bin/aterm -geometry 80x25 - - Ss 0:00 - 1485 pts/1 - 0:00 -bash - - Rs 0:00 - 1508 tty1 - 0:00 /sbin/agetty 38400 tty1 linux - - Ss+ 0:00 - 4232 ? - 0:03 xmms - - Ss 0:03 - 4233 ? - 0:00 xmms - - S 0:00 - 4234 ? - 0:00 xmms - - S 0:00 - 4235 ? - 0:00 xmms - - S 0:00 - 4265 ? - 0:00 /bin/sh /usr/local/bin/firefox - - Ss 0:00 - 4277 ? - 0:00 /bin/sh /usr/local/firefox/run-mozilla.sh /usr/local/firefox/firefox-bin - - S 0:00 - 4282 ? - 0:30 /usr/local/firefox/firefox-bin - - S 0:30 - 4283 ? - 0:00 /usr/local/firefox/firefox-bin - - S 0:00 - 4284 ? - 0:00 /usr/local/firefox/firefox-bin - - S 0:00 - 4285 ? - 0:00 /usr/local/firefox/firefox-bin - - S 0:00 - 4307 ? - 0:00 [netstat] <defunct> - - Z 0:00 - 4335 ? - 0:00 [pdflush] - - S 0:00 - 4386 ? - 0:00 xmms - - S 0:00 - 4387 ? - 0:00 xmms - - S 0:00 - 4441 ? - 0:03 sylpheed - - Ss 0:03 - 4447 ? - 0:00 /home/uas/bin/aterm -geometry 80x25 - - Ss 0:00 - 4448 pts/2 - 0:00 -bash - - Ss+ 0:00 - 4474 ? - 0:00 /home/uas/bin/aterm -geometry 80x25 - - Ss 0:00 - 4475 pts/0 - 0:00 -bash - - Ss+ 0:00 - 4489 pts/1 - 0:00 ps axm - - R+ 0:00 - [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-11  4:18 ` Udo A. Steinberg
@ 2004-08-11  5:13   ` Linus Torvalds
  2004-08-11  5:15     ` Linus Torvalds
  2004-08-11  5:55     ` David S. Miller
  0 siblings, 2 replies; 146+ messages in thread
From: Linus Torvalds @ 2004-08-11 5:13 UTC (permalink / raw)
  To: Udo A. Steinberg; +Cc: linux-kernel, Andrew Morton, viro

On Tue, 10 Aug 2004, Udo A. Steinberg wrote:
>
> I'm currently using 2.6.8-rc4 and I'm seeing the same problem. Each day the
> machine just gets slower and swappier, even though I'm always running the same
> workload. Rebooting helps a lot. The machine has very little memory (128MB).

This is your slab-info sorted according to use:

	bytes used	slab
	----------	-----
	   128,000	filp
	   128,832	size-64
	   142,128	vm_area_struct
	   161,376	size-96
	   184,320	biovec-(256)
	   188,160	biovec-64
	   188,416	pgd
	   188,416	size-1024
	   192,000	biovec-128
	   217,088	size-4096
	   241,664	size-2048
	   290,816	size-512
	   310,464	inode_cache
	   564,144	radix_tree_node
	   608,832	ext3_inode_cache
	   611,520	dentry_cache
	   688,128	size-8192
	   917,504	size-32768
	 1,734,048	buffer_head

and that "buffer_head" thing really looks strange. I also wonder what the
hell is allocating so many 8kB and 32kB entries.

That said, the cumulative slabinfo usage seems to be no more than
8,924,868 bytes, so it doesn't seem to be slab that is the problem.

> MemTotal: 125124 kB
> MemFree: 1404 kB
> Buffers: 19060 kB
> Cached: 40484 kB
> SwapCached: 33336 kB
> Active: 70176 kB
> Inactive: 41892 kB

"active+inactive" adds up to ~111MB, which together with slab accounts for
pretty much all your memory. So there isn't anything unaccounted either.

So I suspect it's a balancing issue. Possibly just the slight change in
slab balancing to fix the highmem problems. Maybe we shrink slab _too_
aggressively or something.

		Linus

^ permalink raw reply [flat|nested] 146+ messages in thread
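[Editorial sketch] The "bytes used" figures in the table above match <num_objs> times <objsize> from the posted slabinfo (for example 36126 * 48 = 1,734,048 for buffer_head). A minimal Python sketch of that calculation follows. It assumes the "slabinfo - version: 2.0" column layout shown in this thread; the function names and the four-row sample are illustrative and are not the script Linus used.

```python
def parse_slabinfo(text):
    """Return {cache_name: (num_objs, objsize)} from slabinfo 2.0 text."""
    caches = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip the version banner, the column-header comment and blank lines.
        if not line or line.startswith("#") or line.startswith("slabinfo"):
            continue
        # Columns before the first ':' are:
        # name active_objs num_objs objsize objperslab pagesperslab
        fields = line.split(":")[0].split()
        if len(fields) < 6:
            continue
        caches[fields[0]] = (int(fields[2]), int(fields[3]))
    return caches


def bytes_used(caches):
    """Return (bytes, name) pairs sorted ascending, like the table above."""
    return sorted((num * size, name) for name, (num, size) in caches.items())


# Four rows copied from the slabinfo output posted earlier in the thread.
SAMPLE = """\
slabinfo - version: 2.0
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
filp 800 800 160 25 1 : tunables 120 60 0 : slabdata 32 32 0
dentry_cache 2432 4368 140 28 1 : tunables 120 60 0 : slabdata 156 156 0
buffer_head 33719 36126 48 81 1 : tunables 120 60 0 : slabdata 446 446 0
size-32768 28 28 32768 1 8 : tunables 8 4 0 : slabdata 28 28 0
"""

if __name__ == "__main__":
    for used, name in bytes_used(parse_slabinfo(SAMPLE)):
        print(f"{used:>12,}  {name}")
```

This counts object bytes only. The pages actually held by a cache would be closer to <num_slabs> times <pagesperslab> times the page size, which is somewhat larger for caches with many partially used slabs.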
* Re: Possible dcache BUG
  2004-08-11  5:13 ` Linus Torvalds
@ 2004-08-11  5:15   ` Linus Torvalds
  2004-08-11  5:33     ` Udo A. Steinberg
                       ` (2 more replies)
  2004-08-11  5:55   ` David S. Miller
  1 sibling, 3 replies; 146+ messages in thread
From: Linus Torvalds @ 2004-08-11 5:15 UTC (permalink / raw)
  To: Udo A. Steinberg; +Cc: linux-kernel, Andrew Morton, viro

On Tue, 10 Aug 2004, Linus Torvalds wrote:
>
> So I suspect it's a balancing issue. Possibly just the slight change in
> slab balancing to fix the highmem problems. Maybe we shrink slab _too_
> aggressively or something.

Udo, that's a simple thing to check. If it's the slab balancing changes,
then you should be able to test it with just a

	bk cset -x1.1830.4.3

if you have the current BK and are a BK user, or by just reverting the
patch here ("patch -R -p1" from inside your linux source tree) if you're
not a BK user..

		Linus

-----
# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/07/31 14:47:41-07:00 akpm@osdl.org
#   [PATCH] slab memory shrinking balancing fix
#
#   The logic in shrink_slab tries to balance the proportion of slab which it
#   scans against the proportion of pagecache which the caller scanned. Problem
#   is that with a large number of highmem LRU pages and a small number of lowmem
#   LRU pages, the amount of pagecache scanning appears to be very small, so we
#   don't push slab hard enough.
#
#   The patch changes things so that for, say, a GFP_KERNEL allocation attempt we
#   only consider ZONE_NORMAL and ZONE_DMA when calculating "what proportion of
#   the LRU did the caller just scan".
#
#   This will have the effect of shrinking slab harder in response to GFP_KERNEL
#   allocations than for GFP_HIGHMEM allocations.
#
#   Signed-off-by: Andrew Morton <akpm@osdl.org>
#   Signed-off-by: Linus Torvalds <torvalds@osdl.org>
#
# mm/vmscan.c
#   2004/07/31 14:35:31-07:00 akpm@osdl.org +23 -9
#   slab memory shrinking balancing fix
#
# mm/page_alloc.c
#   2004/07/31 14:02:21-07:00 akpm@osdl.org +0 -11
#   slab memory shrinking balancing fix
#
# include/linux/mm.h
#   2004/07/31 14:02:26-07:00 akpm@osdl.org +0 -2
#   slab memory shrinking balancing fix
#
diff -Nru a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h	2004-08-10 22:15:23 -07:00
+++ b/include/linux/mm.h	2004-08-10 22:15:23 -07:00
@@ -706,8 +706,6 @@
 
 extern struct vm_area_struct *find_extend_vma(struct mm_struct *mm, unsigned long addr);
 
-extern unsigned int nr_used_zone_pages(void);
-
 extern struct page * vmalloc_to_page(void *addr);
 extern struct page * follow_page(struct mm_struct *mm, unsigned long address,
 		int write);
diff -Nru a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c	2004-08-10 22:15:23 -07:00
+++ b/mm/page_alloc.c	2004-08-10 22:15:23 -07:00
@@ -825,17 +825,6 @@
 
 EXPORT_SYMBOL(nr_free_pages);
 
-unsigned int nr_used_zone_pages(void)
-{
-	unsigned int pages = 0;
-	struct zone *zone;
-
-	for_each_zone(zone)
-		pages += zone->nr_active + zone->nr_inactive;
-
-	return pages;
-}
-
 #ifdef CONFIG_NUMA
 unsigned int nr_free_pages_pgdat(pg_data_t *pgdat)
 {
diff -Nru a/mm/vmscan.c b/mm/vmscan.c
--- a/mm/vmscan.c	2004-08-10 22:15:23 -07:00
+++ b/mm/vmscan.c	2004-08-10 22:15:23 -07:00
@@ -169,22 +169,25 @@
  * slab to avoid swapping.
  *
  * We do weird things to avoid (scanned*seeks*entries) overflowing 32 bits.
+ *
+ * `lru_pages' represents the number of on-LRU pages in all the zones which
+ * are eligible for the caller's allocation attempt. It is used for balancing
+ * slab reclaim versus page reclaim.
  */
-static int shrink_slab(unsigned long scanned, unsigned int gfp_mask)
+static int shrink_slab(unsigned long scanned, unsigned int gfp_mask,
+			unsigned long lru_pages)
 {
 	struct shrinker *shrinker;
-	long pages;
 
 	if (down_trylock(&shrinker_sem))
 		return 0;
 
-	pages = nr_used_zone_pages();
 	list_for_each_entry(shrinker, &shrinker_list, list) {
 		unsigned long long delta;
 
 		delta = (4 * scanned) / shrinker->seeks;
 		delta *= (*shrinker->shrinker)(0, gfp_mask);
-		do_div(delta, pages + 1);
+		do_div(delta, lru_pages + 1);
 		shrinker->nr += delta;
 		if (shrinker->nr < 0)
 			shrinker->nr = LONG_MAX;	/* It wrapped! */
@@ -896,6 +899,7 @@
 	int total_scanned = 0, total_reclaimed = 0;
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	struct scan_control sc;
+	unsigned long lru_pages = 0;
 	int i;
 
 	sc.gfp_mask = gfp_mask;
@@ -903,8 +907,12 @@
 
 	inc_page_state(allocstall);
 
-	for (i = 0; zones[i] != 0; i++)
-		zones[i]->temp_priority = DEF_PRIORITY;
+	for (i = 0; zones[i] != NULL; i++) {
+		struct zone *zone = zones[i];
+
+		zone->temp_priority = DEF_PRIORITY;
+		lru_pages += zone->nr_active + zone->nr_inactive;
+	}
 
 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
 		sc.nr_mapped = read_page_state(nr_mapped);
@@ -912,7 +920,7 @@
 		sc.nr_reclaimed = 0;
 		sc.priority = priority;
 		shrink_caches(zones, &sc);
-		shrink_slab(sc.nr_scanned, gfp_mask);
+		shrink_slab(sc.nr_scanned, gfp_mask, lru_pages);
 		if (reclaim_state) {
 			sc.nr_reclaimed += reclaim_state->reclaimed_slab;
 			reclaim_state->reclaimed_slab = 0;
@@ -997,7 +1005,7 @@
 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
 		int all_zones_ok = 1;
 		int end_zone = 0;	/* Inclusive. 0 = ZONE_DMA */
-
+		unsigned long lru_pages = 0;
 
 		if (nr_pages == 0) {
 			/*
@@ -1021,6 +1029,12 @@
 			end_zone = pgdat->nr_zones - 1;
 		}
 scan:
+		for (i = 0; i <= end_zone; i++) {
+			struct zone *zone = pgdat->node_zones + i;
+
+			lru_pages += zone->nr_active + zone->nr_inactive;
+		}
+
 		/*
 		 * Now scan the zone in the dma->highmem direction, stopping
 		 * at the last zone which needs scanning.
@@ -1048,7 +1062,7 @@
 			sc.priority = priority;
 			shrink_zone(zone, &sc);
 			reclaim_state->reclaimed_slab = 0;
-			shrink_slab(sc.nr_scanned, GFP_KERNEL);
+			shrink_slab(sc.nr_scanned, GFP_KERNEL, lru_pages);
 			sc.nr_reclaimed += reclaim_state->reclaimed_slab;
 			total_reclaimed += sc.nr_reclaimed;
 			if (zone->all_unreclaimable)

^ permalink raw reply [flat|nested] 146+ messages in thread
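[Editorial sketch] The effect of the patch on slab pressure can be seen by replaying the three arithmetic lines of shrink_slab() in user space. The Python sketch below mirrors only the integer arithmetic visible in the diff (delta = 4 * scanned / seeks, multiplied by the shrinker's return value, divided by lru_pages + 1). The function name and every input number are hypothetical, and it assumes the shrinker callback returns the number of objects in its cache when called with 0.

```python
def scan_delta(scanned, seeks, cache_objects, lru_pages):
    """Replay the per-shrinker delta computed in shrink_slab().

    scanned       -- LRU pages the caller just scanned
    seeks         -- the shrinker's 'seeks' cost
    cache_objects -- what (*shrinker->shrinker)(0, gfp_mask) is assumed to return
    lru_pages     -- LRU pages in the zones counted by the caller
    """
    delta = (4 * scanned) // seeks
    delta *= cache_objects
    # do_div(delta, lru_pages + 1) is an integer division in the kernel.
    return delta // (lru_pages + 1)


if __name__ == "__main__":
    # Hypothetical highmem box: 250,000 LRU pages in total, of which
    # 50,000 are in lowmem (ZONE_DMA + ZONE_NORMAL).
    all_zones = scan_delta(scanned=32, seeks=2, cache_objects=35000,
                           lru_pages=250000)
    lowmem_only = scan_delta(scanned=32, seeks=2, cache_objects=35000,
                             lru_pages=50000)
    print("delta counting every zone :", all_zones)
    print("delta counting lowmem only:", lowmem_only)
```

With these made-up inputs the same amount of page scanning asks the cache to scan several times more objects once only the eligible zones are counted, which is the "push slab harder" behaviour the changeset comment describes.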
* Re: Possible dcache BUG
  2004-08-11  5:15 ` Linus Torvalds
@ 2004-08-11  5:33   ` Udo A. Steinberg
  2004-08-11 14:37   ` Gene Heskett
  2004-08-13  1:00   ` Udo A. Steinberg
  2 siblings, 0 replies; 146+ messages in thread
From: Udo A. Steinberg @ 2004-08-11 5:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton, viro

[-- Attachment #1: Type: text/plain, Size: 502 bytes --]

On Tue, 10 Aug 2004 22:15:34 -0700 (PDT) Linus Torvalds (LT) wrote:

LT> Udo, that's a simple thing to check. If it's the slab balancing changes,
LT> then you should be able to test it with just a
LT>
LT> 	bk cset -x1.1830.4.3
LT>
LT> if you have the current BK and are a BK user, or by just reverting the
LT> patch here ("patch -R -p1" from inside your linux source tree) if you're
LT> not a BK user..

Linus,

Thanks for the patch. I'll run it for a few days and see how things turn out.

-Udo.

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-11  5:15 ` Linus Torvalds
  2004-08-11  5:33   ` Udo A. Steinberg
@ 2004-08-11 14:37   ` Gene Heskett
  2004-08-12  1:26     ` Nick Piggin
  2004-08-13  1:00   ` Udo A. Steinberg
  2 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-11 14:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds, Udo A. Steinberg, Andrew Morton, viro

[-- Attachment #1: Type: text/plain, Size: 1653 bytes --]

On Wednesday 11 August 2004 01:15, Linus Torvalds wrote:
>On Tue, 10 Aug 2004, Linus Torvalds wrote:
>> So I suspect it's a balancing issue. Possibly just the slight
>> change in slab balancing to fix the highmem problems. Maybe we
>> shrink slab _too_ aggressively or something.
>
>Udo, that's a simple thing to check. If it's the slab balancing
> changes, then you should be able to test it with just a
>
>	bk cset -x1.1830.4.3
>
>if you have the current BK and are a BK user, or by just reverting
> the patch here ("patch -R -p1" from inside your linux source tree)
> if you're not a BK user..
>
>		Linus
>
With the previously attached patch reverted, a fresh kernel builds in:
real 7m18.296s
user 5m49.385s
sys 0m31.760s
which is a marked improvement, but still about 1m30 or so slow.

Is there anything in the /proc/slabinfo output I should watch
carefully in hopes I can see things going to hell before they
actually take the machine down?

The attachment is it with only about 20 minutes uptime. I had been
playing in the bios, turning off the 50% cpu throttle on overtemp,
and managed to kill the bios, so I just turned it off till daylight
again. This bios seriously needs a tutorial on the interactions
between various timing-related things.

[snip patch to revert]

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author) 99.24% setiathome rank, not too shabby for a WV hillbilly Yahoo.com attorneys please note, additions to this message by Gene Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved. [-- Attachment #2: slabinfo.new --] [-- Type: text/plain, Size: 11441 bytes --] slabinfo - version: 2.0 # name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail> unix_sock 157 160 384 10 1 : tunables 54 27 0 : slabdata 16 16 0 tcp_tw_bucket 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 tcp_bind_bucket 19 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0 tcp_open_request 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0 inet_peer_cache 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0 ip_fib_hash 10 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0 ip_dst_cache 7 15 256 15 1 : tunables 120 60 0 : slabdata 1 1 0 arp_cache 4 31 128 31 1 : tunables 120 60 0 : slabdata 1 1 0 raw4_sock 0 0 480 8 1 : tunables 54 27 0 : slabdata 0 0 0 udp_sock 2 8 480 8 1 : tunables 54 27 0 : slabdata 1 1 0 tcp_sock 31 32 1024 4 1 : tunables 54 27 0 : slabdata 8 8 0 flow_cache 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 mqueue_inode_cache 1 8 480 8 1 : tunables 54 27 0 : slabdata 1 1 0 udf_inode_cache 0 0 352 11 1 : tunables 54 27 0 : slabdata 0 0 0 smb_request 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 smb_inode_cache 2 12 320 12 1 : tunables 54 27 0 : slabdata 1 1 0 isofs_inode_cache 0 0 320 12 1 : tunables 54 27 0 : slabdata 0 0 0 fat_inode_cache 1 11 352 11 1 : tunables 54 27 0 : slabdata 1 1 0 ext2_inode_cache 0 0 416 9 1 : tunables 54 27 0 : slabdata 0 0 0 journal_handle 8 135 28 135 1 : tunables 120 60 0 : slabdata 1 1 0 journal_head 106 972 48 81 1 : tunables 120 60 0 : slabdata 12 12 0 revoke_table 12 290 12 290 1 : tunables 120 60 0 : slabdata 1 1 0 revoke_record 0 0 16 226 1 : tunables 120 60 0 : slabdata 0 0 0 ext3_inode_cache 23067 
23067 416 9 1 : tunables 54 27 0 : slabdata 2563 2563 0 eventpoll_pwq 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0 eventpoll_epi 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 kioctx 0 0 160 25 1 : tunables 120 60 0 : slabdata 0 0 0 kiocb 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 dnotify_cache 172 185 20 185 1 : tunables 120 60 0 : slabdata 1 1 0 file_lock_cache 43 43 92 43 1 : tunables 120 60 0 : slabdata 1 1 0 fasync_cache 2 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0 shmem_inode_cache 4 10 384 10 1 : tunables 54 27 0 : slabdata 1 1 0 posix_timers_cache 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 uid_cache 7 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 sgpool-128 32 32 2048 2 1 : tunables 24 12 0 : slabdata 16 16 0 sgpool-64 32 32 1024 4 1 : tunables 54 27 0 : slabdata 8 8 0 sgpool-32 32 32 512 8 1 : tunables 54 27 0 : slabdata 4 4 0 sgpool-16 32 45 256 15 1 : tunables 120 60 0 : slabdata 3 3 0 sgpool-8 32 62 128 31 1 : tunables 120 60 0 : slabdata 2 2 0 cfq_pool 80 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 crq_pool 68 107 36 107 1 : tunables 120 60 0 : slabdata 1 1 0 deadline_drq 0 0 48 81 1 : tunables 120 60 0 : slabdata 0 0 0 as_arq 0 0 60 65 1 : tunables 120 60 0 : slabdata 0 0 0 blkdev_ioc 61 185 20 185 1 : tunables 120 60 0 : slabdata 1 1 0 blkdev_queue 12 18 448 9 1 : tunables 54 27 0 : slabdata 2 2 0 blkdev_requests 54 78 152 26 1 : tunables 120 60 0 : slabdata 3 3 0 biovec-(256) 256 256 3072 2 2 : tunables 24 12 0 : slabdata 128 128 0 biovec-128 256 260 1536 5 2 : tunables 24 12 0 : slabdata 52 52 0 biovec-64 256 260 768 5 1 : tunables 54 27 0 : slabdata 52 52 0 biovec-16 256 260 192 20 1 : tunables 120 60 0 : slabdata 13 13 0 biovec-4 256 305 64 61 1 : tunables 120 60 0 : slabdata 5 5 0 biovec-1 329 452 16 226 1 : tunables 120 60 0 : slabdata 2 2 0 bio 341 366 64 61 1 : tunables 120 60 0 : slabdata 6 6 0 sock_inode_cache 194 198 352 11 1 : tunables 54 27 0 : slabdata 18 18 0 skbuff_head_cache 245 375 160 25 1 : 
tunables 120 60 0 : slabdata 15 15 0 sock 3 12 320 12 1 : tunables 54 27 0 : slabdata 1 1 0 proc_inode_cache 468 468 320 12 1 : tunables 54 27 0 : slabdata 39 39 0 sigqueue 27 27 148 27 1 : tunables 120 60 0 : slabdata 1 1 0 radix_tree_node 13342 13342 276 14 1 : tunables 54 27 0 : slabdata 953 953 0 bdev_cache 11 18 416 9 1 : tunables 54 27 0 : slabdata 2 2 0 mnt_cache 26 41 96 41 1 : tunables 120 60 0 : slabdata 1 1 0 inode_cache 2184 2184 288 14 1 : tunables 54 27 0 : slabdata 156 156 0 dentry_cache 35876 35896 140 28 1 : tunables 120 60 0 : slabdata 1282 1282 0 filp 1930 2050 160 25 1 : tunables 120 60 0 : slabdata 82 82 0 names_cache 19 19 4096 1 1 : tunables 24 12 0 : slabdata 19 19 0 idr_layer_cache 81 87 136 29 1 : tunables 120 60 0 : slabdata 3 3 0 buffer_head 37260 37260 48 81 1 : tunables 120 60 0 : slabdata 460 460 0 mm_struct 84 84 512 7 1 : tunables 54 27 0 : slabdata 12 12 0 vm_area_struct 7079 7332 84 47 1 : tunables 120 60 0 : slabdata 156 156 0 fs_cache 93 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 files_cache 81 81 416 9 1 : tunables 54 27 0 : slabdata 9 9 0 signal_cache 112 123 96 41 1 : tunables 120 60 0 : slabdata 3 3 0 sighand_cache 93 93 1312 3 1 : tunables 24 12 0 : slabdata 31 31 0 task_struct 100 100 1424 5 2 : tunables 24 12 0 : slabdata 20 20 0 anon_vma 1534 2035 8 407 1 : tunables 120 60 0 : slabdata 5 5 0 pgd 80 80 4096 1 1 : tunables 24 12 0 : slabdata 80 80 0 size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0 size-131072 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0 size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0 size-65536 1 1 65536 1 16 : tunables 8 4 0 : slabdata 1 1 0 size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0 size-32768 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0 size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0 size-16384 6 6 16384 1 4 : tunables 8 4 0 : slabdata 6 6 0 size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0 size-8192 10 10 
8192 1 2 : tunables 8 4 0 : slabdata 10 10 0 size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 0 : slabdata 0 0 0 size-4096 181 181 4096 1 1 : tunables 24 12 0 : slabdata 181 181 0 size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 0 : slabdata 0 0 0 size-2048 178 192 2048 2 1 : tunables 24 12 0 : slabdata 96 96 0 size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 0 : slabdata 0 0 0 size-1024 116 116 1024 4 1 : tunables 54 27 0 : slabdata 29 29 0 size-512(DMA) 0 0 512 8 1 : tunables 54 27 0 : slabdata 0 0 0 size-512 184 448 512 8 1 : tunables 54 27 0 : slabdata 56 56 0 size-256(DMA) 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 size-256 180 420 256 15 1 : tunables 120 60 0 : slabdata 28 28 0 size-192(DMA) 0 0 192 20 1 : tunables 120 60 0 : slabdata 0 0 0 size-192 100 100 192 20 1 : tunables 120 60 0 : slabdata 5 5 0 size-128(DMA) 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 size-128 1157 1209 128 31 1 : tunables 120 60 0 : slabdata 39 39 0 size-64(DMA) 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0 size-64 606 610 64 61 1 : tunables 120 60 0 : slabdata 10 10 0 size-32(DMA) 0 0 32 119 1 : tunables 120 60 0 : slabdata 0 0 0 size-32 1369 1428 32 119 1 : tunables 120 60 0 : slabdata 12 12 0 kmem_cache 124 124 128 31 1 : tunables 120 60 0 : slabdata 4 4 0 ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-11 14:37 ` Gene Heskett
@ 2004-08-12  1:26   ` Nick Piggin
  2004-08-12  2:23     ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: Nick Piggin @ 2004-08-12 1:26 UTC (permalink / raw)
  To: gene.heskett
  Cc: linux-kernel, Linus Torvalds, Udo A. Steinberg, Andrew Morton, viro

Gene Heskett wrote:

>On Wednesday 11 August 2004 01:15, Linus Torvalds wrote:
>
>>On Tue, 10 Aug 2004, Linus Torvalds wrote:
>>
>>>So I suspect it's a balancing issue. Possibly just the slight
>>>change in slab balancing to fix the highmem problems. Maybe we
>>>shrink slab _too_ aggressively or something.
>>>
>>Udo, that's a simple thing to check. If it's the slab balancing
>>changes, then you should be able to test it with just a
>>
>>	bk cset -x1.1830.4.3
>>
>>if you have the current BK and are a BK user, or by just reverting
>>the patch here ("patch -R -p1" from inside your linux source tree)
>>if you're not a BK user..
>>
>>		Linus
>>
>>
>With the previously attached patch reverted, a fresh kernel builds in:
>real 7m18.296s
>user 5m49.385s
>sys 0m31.760s
>which is a marked improvement, but still about 1m30 or so slow.
>
>

This could easily be from too much slab pressure. How much memory do you
have? Have you got highmem turned on?

The new slab pressure calculation is an improvement in that it won't let
slab get out of control and cause OOMs, however it can shrink the slab too
much. If you regularly need ZONE_DMA pages, for example. AFAIKS there isn't
much you can do about this except go to per-zone slab LRUs.

That said, your stability problems should be resolved first. If they are
fixed, and you would like to help track down the slowdown, run the kernel
compile about 3 times each with and without the patch, and save
cat /proc/vmstat before and after each compile. Try to keep all else
constant.

Thanks
Nick

^ permalink raw reply [flat|nested] 146+ messages in thread
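[Editorial sketch] The measurement procedure Nick describes (several builds per kernel, with /proc/vmstat saved before and after each one) can be scripted so that every run is captured the same way. The Python sketch below is one possible harness, not something posted in the thread. The function name, the file naming scheme and the build command in the usage note are assumptions.

```python
import subprocess
import time
from pathlib import Path


def timed_runs(command, runs, snapshot_path="/proc/vmstat",
               out_dir=".", label="run"):
    """Run `command` `runs` times, saving a counter snapshot around each run.

    Writes <label>-<n>-before.vmstat and <label>-<n>-after.vmstat into
    out_dir and returns the wall-clock time of each run in seconds.
    """
    out = Path(out_dir)
    source = Path(snapshot_path)
    elapsed = []
    for n in range(1, runs + 1):
        before = source.read_text()
        start = time.monotonic()
        # check=True stops the series if a build fails, so a broken run
        # is not mistaken for a fast one.
        subprocess.run(command, check=True)
        elapsed.append(time.monotonic() - start)
        after = source.read_text()
        (out / f"{label}-{n}-before.vmstat").write_text(before)
        (out / f"{label}-{n}-after.vmstat").write_text(after)
    return elapsed
```

A call such as timed_runs(["make", "bzImage"], runs=3, label="patched") from inside the kernel tree would be one way to use it. The build target is a guess, and returning the tree to the same state between runs (to "keep all else constant") is left to the caller.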
* Re: Possible dcache BUG
  2004-08-12  1:26 ` Nick Piggin
@ 2004-08-12  2:23   ` Gene Heskett
  2004-08-12  2:36     ` Nick Piggin
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-12 2:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Nick Piggin, Linus Torvalds, Udo A. Steinberg, Andrew Morton, viro

On Wednesday 11 August 2004 21:26, Nick Piggin wrote:
>Gene Heskett wrote:
>>On Wednesday 11 August 2004 01:15, Linus Torvalds wrote:
>>>On Tue, 10 Aug 2004, Linus Torvalds wrote:
>>>>So I suspect it's a balancing issue. Possibly just the slight
>>>>change in slab balancing to fix the highmem problems. Maybe we
>>>>shrink slab _too_ aggressively or something.
>>>
>>>Udo, that's a simple thing to check. If it's the slab balancing
>>>changes, then you should be able to test it with just a
>>>
>>>	bk cset -x1.1830.4.3
>>>
>>>if you have the current BK and are a BK user, or by just reverting
>>>the patch here ("patch -R -p1" from inside your linux source tree)
>>>if you're not a BK user..
>>>
>>>		Linus
>>
>>With the previously attached patch reverted, a fresh kernel builds
>> in: real 7m18.296s
>>user 5m49.385s
>>sys 0m31.760s
>>which is a marked improvement, but still about 1m30 or so slow.
>
>This could easily be from too much slab pressure. How much memory do
> you have?

1 GB in two 512 MB sticks of DDR400 RAM, which signs on in the bios as
dual channel. The sticks are in the first and third slots as
recommended by the mobo docs.

>Have you got highmem turned on?

Yes

>The new slab pressure calculation is an improvement in that it won't
> let slab
>get out of control and cause OOMs, however it can shrink the slab
> too much. If you regularly need ZONE_DMA pages, for example. AFAIKS
> there isn't much you
>can do about this except go to per-zone slab LRUs.

And how would an otherwise clueless user like me determine that?

>That said, your stability problems should be resolved first. If they
> are fixed,

Which as yet is an unknown, Nick. Uptime now at
 22:15:14 up 12:30, 5 users, load average: 1.03, 1.11, 1.05

>and you would like to help track down the slowdown, run the kernel
>compile about
>3 times each with and without the patch, and save cat /proc/vmstat
>before and
>after each compile. Try to keep all else constant.

I'll try that if I get to a 30+ hour uptime.

>Thanks
>Nick

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
From: Nick Piggin @ 2004-08-12 2:36 UTC
To: gene.heskett
Cc: linux-kernel, Linus Torvalds, Udo A. Steinberg, Andrew Morton, viro

Gene Heskett wrote:
>On Wednesday 11 August 2004 21:26, Nick Piggin wrote:
>
>>This could easily be from too much slab pressure. How much memory do
>>you have?
>
>1 Gb in 2 512Mb sticks of DDR400 ram which signs on in the bios as
>dual channel. The sticks are in the first and third slots as
>recommended by the mobo docs.
>
>>Have you got highmem turned on?
>
>Yes

OK, leave it on until you sort out the stability problem. When we look
at the performance problem, we'll probably start with highmem off.

I'll try to reproduce it here, but my highmem system has 4GB and is
allergic to mem=

>>The new slab pressure calculation is an improvement in that it won't
>>let slab get out of control and cause OOMs, however it can shrink the
>>slab too much. If you regularly need ZONE_DMA pages, for example.
>>AFAIKS there isn't much you can do about this except go to per-zone
>>slab LRUs.
>
>And how would an otherwise clueless user like me determine that?

We can look at deltas on some /proc/vmstat fields like pgpgin,
slabs_scanned, pgalloc_dma etc. before and after the kbuild. Look at
those deltas before and after Linus' patch, and see if we can work out
what is going on.

>>That said, your stability problems should be resolved first. If they
>>are fixed,
>
>Which as yet is an unknown, Nick. Uptime now at
> 22:15:14 up 12:30, 5 users, load average: 1.03, 1.11, 1.05
>
>>and you would like to help track down the slowdown, run the kernel
>>compile about 3 times each with and without the patch, and save
>>cat /proc/vmstat before and after each compile. Try to keep all else
>>constant.
>
>I'll try that if I get to a 30+ hour uptime.

Well, make sure it is stable first, then get back to me when you're
ready to tackle the performance problem.

Thanks.
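The before/after comparison Nick asks for can be sketched roughly as follows. This is only an illustration, assuming the snapshots were saved as plain text with "cat /proc/vmstat"; the function names and the sample numbers are made up for the example, and the field name slabs_scanned is the spelling that appears in the /proc/vmstat dump posted in this thread.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style 'name value' lines into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def vmstat_delta(before, after, fields=None):
    """Return after - before for each counter present in both snapshots."""
    names = fields if fields is not None else sorted(set(before) & set(after))
    return {n: after[n] - before[n] for n in names if n in before and n in after}


if __name__ == "__main__":
    # Hypothetical snapshot contents, not values from the thread.
    before = parse_vmstat("pgpgin 1000\nslabs_scanned 50\npgalloc_dma 7\n")
    after = parse_vmstat("pgpgin 1800\nslabs_scanned 950\npgalloc_dma 19\n")
    for name, delta in vmstat_delta(before, after).items():
        print(f"{name:>16} {delta:+d}")
```

Running one such comparison per kernel build, with and without the patch, gives a small set of per-field deltas that can be lined up side by side.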
* Re: Possible dcache BUG 2004-08-11 5:15 ` Linus Torvalds 2004-08-11 5:33 ` Udo A. Steinberg 2004-08-11 14:37 ` Gene Heskett @ 2004-08-13 1:00 ` Udo A. Steinberg 2004-08-13 1:31 ` Linus Torvalds 2 siblings, 1 reply; 146+ messages in thread From: Udo A. Steinberg @ 2004-08-13 1:00 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton, viro, Nick Piggin [-- Attachment #1: Type: text/plain, Size: 20266 bytes --] On Tue, 10 Aug 2004 22:15:34 -0700 (PDT) Linus Torvalds (LT) wrote: LT> > So I suspect it's a balancing issue. Possibly just the slight change in LT> > slab balancing to fix the highmem problems. Maybe we shrink slab _too_ LT> > aggressively or something. LT> LT> Udo, that's a simple thing to check. If it's the slab balancing changes, LT> then you should be able to test it with just a LT> LT> bk cset -x1.1830.4.3 Linus, After nearly 2 days of running 2.6.8-rc4 with above patch backed out, the machine has gone back into heavy swapping, being rather unresponsive for several minutes. At that time the only bigger applications running were X and my mailer, as can be seen in the ps output below. -Udo. 
MemTotal: 125124 kB MemFree: 1812 kB Buffers: 216 kB Cached: 2796 kB SwapCached: 10024 kB Active: 8352 kB Inactive: 4800 kB HighTotal: 0 kB HighFree: 0 kB LowTotal: 125124 kB LowFree: 1812 kB SwapTotal: 506512 kB SwapFree: 457116 kB Dirty: 0 kB Writeback: 904 kB Mapped: 8924 kB Slab: 107936 kB Committed_AS: 53088 kB PageTables: 500 kB VmallocTotal: 909268 kB VmallocUsed: 8936 kB VmallocChunk: 900312 kB slabinfo - version: 2.0 # name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail> rpc_buffers 8 8 2048 2 1 : tunables 24 12 0 : slabdata 4 4 0 rpc_tasks 8 25 160 25 1 : tunables 120 60 0 : slabdata 1 1 0 rpc_inode_cache 0 0 416 9 1 : tunables 54 27 0 : slabdata 0 0 0 xfrm6_tunnel_spi 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0 fib6_nodes 5 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 ip6_dst_cache 5 18 224 18 1 : tunables 120 60 0 : slabdata 1 1 0 ndisc_cache 1 25 160 25 1 : tunables 120 60 0 : slabdata 1 1 0 raw6_sock 0 0 640 6 1 : tunables 54 27 0 : slabdata 0 0 0 udp6_sock 0 0 608 6 1 : tunables 54 27 0 : slabdata 0 0 0 tcp6_sock 6 7 1120 7 2 : tunables 24 12 0 : slabdata 1 1 0 unix_sock 31 40 384 10 1 : tunables 54 27 0 : slabdata 4 4 0 ip_conntrack 1 25 160 25 1 : tunables 120 60 0 : slabdata 1 1 0 tcp_tw_bucket 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 tcp_bind_bucket 5 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0 tcp_open_request 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 inet_peer_cache 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0 secpath_cache 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 xfrm_dst_cache 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 ip_fib_hash 9 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0 ip_dst_cache 65 75 256 15 1 : tunables 120 60 0 : slabdata 5 5 0 arp_cache 1 31 128 31 1 : tunables 120 60 0 : slabdata 1 1 0 raw4_sock 0 0 480 8 1 : tunables 54 27 0 : slabdata 0 0 0 udp_sock 1 7 512 
7 1 : tunables 54 27 0 : slabdata 1 1 0 tcp_sock 8 8 1024 4 1 : tunables 54 27 0 : slabdata 2 2 0 flow_cache 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 uhci_urb_priv 0 0 44 88 1 : tunables 120 60 0 : slabdata 0 0 0 ntfs_big_inode_cache 0 0 448 9 1 : tunables 54 27 0 : slabdata 0 0 0 ntfs_inode_cache 0 0 160 25 1 : tunables 120 60 0 : slabdata 0 0 0 ntfs_name_cache 0 0 512 8 1 : tunables 54 27 0 : slabdata 0 0 0 ntfs_attr_ctx_cache 0 0 32 119 1 : tunables 120 60 0 : slabdata 0 0 0 ntfs_index_ctx_cache 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0 smb_request 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 smb_inode_cache 0 0 320 12 1 : tunables 54 27 0 : slabdata 0 0 0 nfs_write_data 36 36 448 9 1 : tunables 54 27 0 : slabdata 4 4 0 nfs_read_data 32 36 416 9 1 : tunables 54 27 0 : slabdata 4 4 0 nfs_inode_cache 0 0 544 7 1 : tunables 54 27 0 : slabdata 0 0 0 nfs_page 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 isofs_inode_cache 0 0 320 12 1 : tunables 54 27 0 : slabdata 0 0 0 fat_inode_cache 0 0 352 11 1 : tunables 54 27 0 : slabdata 0 0 0 ext2_inode_cache 0 0 416 9 1 : tunables 54 27 0 : slabdata 0 0 0 ext2_xattr 0 0 44 88 1 : tunables 120 60 0 : slabdata 0 0 0 journal_handle 4 135 28 135 1 : tunables 120 60 0 : slabdata 1 1 0 journal_head 36 324 48 81 1 : tunables 120 60 0 : slabdata 4 4 0 revoke_table 12 290 12 290 1 : tunables 120 60 0 : slabdata 1 1 0 revoke_record 1 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0 ext3_inode_cache 284 549 448 9 1 : tunables 54 27 0 : slabdata 61 61 0 ext3_xattr 0 0 44 88 1 : tunables 120 60 0 : slabdata 0 0 0 eventpoll_pwq 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0 eventpoll_epi 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 kioctx 0 0 160 25 1 : tunables 120 60 0 : slabdata 0 0 0 kiocb 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 dnotify_cache 0 0 20 185 1 : tunables 120 60 0 : slabdata 0 0 0 file_lock_cache 1 43 92 43 1 : tunables 120 60 0 : slabdata 1 1 0 fasync_cache 2 226 16 226 1 : tunables 
120 60 0 : slabdata 1 1 0 shmem_inode_cache 5 10 384 10 1 : tunables 54 27 0 : slabdata 1 1 0 posix_timers_cache 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 uid_cache 2 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 cfq_pool 64 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 crq_pool 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0 deadline_drq 0 0 48 81 1 : tunables 120 60 0 : slabdata 0 0 0 as_arq 130 195 60 65 1 : tunables 120 60 0 : slabdata 3 3 0 blkdev_ioc 34 185 20 185 1 : tunables 120 60 0 : slabdata 1 1 0 blkdev_queue 9 9 448 9 1 : tunables 54 27 0 : slabdata 1 1 0 blkdev_requests 122 182 152 26 1 : tunables 120 60 0 : slabdata 7 7 0 biovec-(256) 60 60 3072 2 2 : tunables 24 12 0 : slabdata 30 30 0 biovec-128 121 125 1536 5 2 : tunables 24 12 0 : slabdata 25 25 0 biovec-64 250 250 768 5 1 : tunables 54 27 0 : slabdata 50 50 0 biovec-16 251 260 192 20 1 : tunables 120 60 0 : slabdata 13 13 0 biovec-4 249 305 64 61 1 : tunables 120 60 0 : slabdata 5 5 0 biovec-1 512 678 16 226 1 : tunables 120 60 0 : slabdata 3 3 0 bio 510 1281 64 61 1 : tunables 120 60 0 : slabdata 21 21 0 sock_inode_cache 51 66 352 11 1 : tunables 54 27 0 : slabdata 6 6 0 skbuff_head_cache 600 740 192 20 1 : tunables 120 60 0 : slabdata 37 37 0 sock 4 12 320 12 1 : tunables 54 27 0 : slabdata 1 1 0 proc_inode_cache 52 288 320 12 1 : tunables 54 27 0 : slabdata 24 24 0 sigqueue 27 27 148 27 1 : tunables 120 60 0 : slabdata 1 1 0 radix_tree_node 419 1078 276 14 1 : tunables 54 27 0 : slabdata 77 77 0 bdev_cache 10 18 416 9 1 : tunables 54 27 0 : slabdata 2 2 0 mnt_cache 23 41 96 41 1 : tunables 120 60 0 : slabdata 1 1 0 inode_cache 1056 1078 288 14 1 : tunables 54 27 0 : slabdata 77 77 0 dentry_cache 1413 2436 140 28 1 : tunables 120 60 0 : slabdata 87 87 0 filp 441 750 160 25 1 : tunables 120 60 0 : slabdata 30 30 0 names_cache 13 13 4096 1 1 : tunables 24 12 0 : slabdata 13 13 0 idr_layer_cache 82 87 136 29 1 : tunables 120 60 0 : slabdata 3 3 0 buffer_head 152 891 48 81 1 : 
tunables 120 60 0 : slabdata 11 11 0 mm_struct 37 56 512 7 1 : tunables 54 27 0 : slabdata 8 8 0 vm_area_struct 929 1598 84 47 1 : tunables 120 60 0 : slabdata 34 34 0 fs_cache 38 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 files_cache 37 72 416 9 1 : tunables 54 27 0 : slabdata 8 8 0 signal_cache 60 123 96 41 1 : tunables 120 60 0 : slabdata 3 3 0 sighand_cache 53 60 1312 3 1 : tunables 24 12 0 : slabdata 20 20 0 task_struct 60 80 1424 5 2 : tunables 24 12 0 : slabdata 16 16 0 anon_vma 455 814 8 407 1 : tunables 120 60 0 : slabdata 2 2 0 pgd 37 37 4096 1 1 : tunables 24 12 0 : slabdata 37 37 0 size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0 size-131072 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0 size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0 size-65536 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0 size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0 size-32768 28 28 32768 1 8 : tunables 8 4 0 : slabdata 28 28 0 size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0 size-16384 3 3 16384 1 4 : tunables 8 4 0 : slabdata 3 3 0 size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0 size-8192 61 61 8192 1 2 : tunables 8 4 0 : slabdata 61 61 0 size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 0 : slabdata 0 0 0 size-4096 34 34 4096 1 1 : tunables 24 12 0 : slabdata 34 34 0 size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 0 : slabdata 0 0 0 size-2048 32 32 2048 2 1 : tunables 24 12 0 : slabdata 16 16 0 size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 0 : slabdata 0 0 0 size-1024 246 252 1024 4 1 : tunables 54 27 0 : slabdata 63 63 0 size-512(DMA) 0 0 512 8 1 : tunables 54 27 0 : slabdata 0 0 0 size-512 380 568 512 8 1 : tunables 54 27 0 : slabdata 71 71 0 size-256(DMA) 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 size-256 335 480 256 15 1 : tunables 120 60 0 : slabdata 32 32 0 size-192(DMA) 0 0 192 20 1 : tunables 120 60 0 : slabdata 0 0 0 size-192 220 220 192 20 1 : tunables 120 60 0 : slabdata 11 11 
0 size-128(DMA) 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 size-128 323 372 128 31 1 : tunables 120 60 0 : slabdata 12 12 0 size-96(DMA) 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 size-96 1629 1681 96 41 1 : tunables 120 60 0 : slabdata 41 41 0 size-64(DMA) 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0 size-64 1554463 1554463 64 61 1 : tunables 120 60 0 : slabdata 25483 25483 0 size-32(DMA) 0 0 32 119 1 : tunables 120 60 0 : slabdata 0 0 0 size-32 2618 2737 32 119 1 : tunables 120 60 0 : slabdata 23 23 0 kmem_cache 124 124 128 31 1 : tunables 120 60 0 : slabdata 4 4 0 nr_dirty 0 nr_writeback 0 nr_unstable 0 nr_page_table_pages 125 nr_mapped 2558 nr_slab 26996 pgpgin 6850970 pgpgout 2663539 pswpin 346485 pswpout 178555 pgalloc_high 0 pgalloc_normal 54477318 pgalloc_dma 7478366 pgfree 61956118 pgactivate 696231 pgdeactivate 1111221 pgfault 13656600 pgmajfault 138898 pgrefill_high 0 pgrefill_normal 11740872 pgrefill_dma 1307467 pgsteal_high 0 pgsteal_normal 1842048 pgsteal_dma 297724 pgscan_kswapd_high 0 pgscan_kswapd_normal 5756289 pgscan_kswapd_dma 1211228 pgscan_direct_high 0 pgscan_direct_normal 886314 pgscan_direct_dma 151569 pginodesteal 68 slabs_scanned 1428023 kswapd_steal 1655861 kswapd_inodesteal 805 pageoutrun 15255 allocstall 9684 pgrotated 177966 PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND 1 ? S 0:01 319 446 33 52 0.0 init [4] 2 ? SN 0:00 0 0 0 0 0.0 [ksoftirqd/0] 3 ? S< 0:00 0 0 0 0 0.0 [events/0] 4 ? S< 0:00 0 0 0 0 0.0 [khelper] 5 ? S< 0:02 0 0 0 0 0.0 [kacpid] 22 ? S< 0:01 0 0 0 0 0.0 [kblockd/0] 23 ? S 0:00 0 0 0 0 0.0 [khubd] 40 ? S< 0:00 0 0 0 0 0.0 [aio/0] 39 ? S 0:12 0 0 0 0 0.0 [kswapd0] 142 ? S 0:00 0 0 0 0 0.0 [pccardd] 144 ? S 0:00 0 0 0 0 0.0 [pccardd] 152 ? S 0:00 0 0 0 0 0.0 [kseriod] 171 ? S 0:00 0 0 0 0 0.0 [kjournald] 330 ? S 0:00 0 0 0 0 0.0 [kjournald] 331 ? S< 1:31 0 0 0 0 0.0 [loop0] 332 ? S 0:00 0 0 0 0 0.0 [kjournald] 333 ? S 0:00 0 0 0 0 0.0 [kjournald] 334 ? S 0:00 0 0 0 0 0.0 [kjournald] 335 ? 
S 0:00 0 0 0 0 0.0 [kjournald] 497 ? Ss 0:00 374 23 1384 80 0.0 /usr/sbin/syslogd -m 0 511 ? Ss 0:00 27 18 1325 0 0.0 /usr/sbin/klogd -c 3 -x 514 ? Ss 0:00 7 39 1444 0 0.0 /sbin/cardmgr 857 ? Ss 0:00 7 20 1459 0 0.0 /sbin/rpc.portmap 898 ? Ss 0:00 7 19 1360 0 0.0 /usr/sbin/inetd 904 ? Ss 0:00 7 899 3188 0 0.0 /usr/local/sbin/sshd 914 ? S 0:00 273 11 1448 184 0.1 /usr/sbin/crond -l10 917 ? Ss 0:00 6 15 1332 0 0.0 /usr/sbin/acpid 930 ? Ss 0:00 93 59 1324 60 0.0 /usr/sbin/gpm -m /dev/mouse -t ps2 933 ? S 0:00 168 158 1529 44 0.0 /usr/sbin/smartd 948 tty2 Ss+ 0:00 7 11 1324 0 0.0 /sbin/agetty 38400 tty2 linux 949 tty3 Ss+ 0:00 7 11 1324 0 0.0 /sbin/agetty 38400 tty3 linux 950 tty4 Ss+ 0:00 7 11 1324 0 0.0 /sbin/agetty 38400 tty4 linux 951 tty5 Ss+ 0:00 7 11 1324 0 0.0 /sbin/agetty 38400 tty5 linux 952 tty6 Ss+ 0:00 7 11 1324 0 0.0 /sbin/agetty 38400 tty6 linux 953 tty7 Ss+ 0:00 6 11 1324 0 0.0 /sbin/agetty 38400 tty7 linux 955 tty8 Ss+ 0:00 6 11 1324 0 0.0 /sbin/agetty 38400 tty8 linux 959 tty9 Ss+ 0:00 6 11 1324 0 0.0 /sbin/agetty 38400 tty9 linux 962 tty10 Ss+ 0:00 5 11 1324 0 0.0 /sbin/agetty 38400 tty10 linux 964 ? Ss 0:00 10 98 3069 0 0.0 /usr/X11R6/bin/xdm -nodaemon 1081 ? S 43:25 25126 1505 39590 1712 1.3 /usr/X11R6/bin/X -auth /usr/X11R6/lib/X11/xdm/authdir/authfiles/A:0-dhgq55 1082 ? S 0:00 9 98 3517 0 0.0 -:0 1159 ? S 0:00 0 0 0 0 0.0 [eth1] 1222 ? Ss 0:01 217 33 1338 140 0.1 /sbin/dhcpcd -d eth1 1229 ? S 0:04 2325 313 3858 476 0.3 blackbox 1258 ? S 10:40 245 38 2449 276 0.2 /home/uas/bin/wmbatteries 1260 ? S 0:06 87 21 2390 248 0.1 /home/uas/bin/wmcpuload -a -n -lc rgb:ff/ff/33 1262 ? S 4:16 84 31 2392 232 0.1 /home/uas/bin/wmnetload -n eth1 1264 ? S 0:04 56 19 2388 156 0.1 /home/uas/bin/wmmemload -am -b -c -lc rgb:ff/80/30 1266 ? S 0:24 104 28 2391 224 0.1 /home/uas/bin/wmtime -lc rgb:33/33/ff 1270 ? 
Ds 0:00 406 11 2252 220 0.1 /home/uas/bin/root-tail -f -g 350x10+5-10 -fn -schumacher-clean-medium-r-*-*-10-*-*-*-*-*-*-* -color rgb:cc/cc/ff /var/log/messages rgb:88/88/ff /var/log/syslog rgb:ff/88/ff /var/log/maillog 1294 tty1 Ss+ 0:00 5 11 1324 0 0.0 /sbin/agetty 38400 tty1 linux 4093 ? S 0:01 0 0 0 0 0.0 [pdflush] 4707 ? S 0:00 0 0 0 0 0.0 [pdflush] 5222 ? Ss 0:00 429 98 3733 568 0.4 /home/uas/bin/aterm -geometry 80x25 5223 pts/0 Ss 0:00 313 586 2089 632 0.5 -bash 5449 ? Ds 0:15 21414 1403 39548 5784 4.6 sylpheed 5501 ? D 0:00 14 11 1452 444 0.3 /usr/sbin/crond -l10 5502 pts/0 R+ 0:00 16 60 2087 652 0.5 ps axv [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
From: Linus Torvalds @ 2004-08-13 1:31 UTC
To: Udo A. Steinberg
Cc: linux-kernel, Andrew Morton, viro, Nick Piggin

On Thu, 12 Aug 2004, Udo A. Steinberg wrote:
>
> After nearly 2 days of running 2.6.8-rc4 with above patch backed out, the
> machine has gone back into heavy swapping, being rather unresponsive for
> several minutes. At that time the only bigger applications running were
> X and my mailer, as can be seen in the ps output below.

Your slab usage seems to be:

	 cumulative       usage  name
	 ==========       =====  ====
	      .....
	  2,021,428     151,552  pgd
	  2,182,804     161,376  size-96
	  2,367,124     184,320  biovec-(256)
	  2,559,124     192,000  biovec-128
	  2,751,124     192,000  biovec-64
	  2,997,076     245,952  ext3_inode_cache
	  3,255,124     258,048  size-1024
	  3,545,940     290,816  size-512
	  3,843,468     297,528  radix_tree_node
	  4,153,932     310,464  inode_cache
	  4,494,972     341,040  dentry_cache
	  4,994,684     499,712  size-8192
	  5,912,188     917,504  size-32768
	105,397,820  99,485,632  size-64

Something pretty much stands out.

What the _heck_ is doing 64-byte allocations and leaking them?

Can you figure out what triggers it for you? If nothing obvious comes to
mind, could you do something really silly like this

	--- 1.141/mm/slab.c	2004-07-11 01:52:48 -07:00
	+++ edited/mm/slab.c	2004-08-12 18:30:00 -07:00
	@@ -2360,6 +2360,11 @@
	 	 */
	 	BUG_ON(csizep->cs_cachep == NULL);
	 #endif
	+	if (csizep->cs_size == 64) {
	+		static unsigned count;
	+		if (!(4095 & ++count))
	+			dump_stack();
	+	}
	 	return __cache_alloc(flags & GFP_DMA ?
	 		 csizep->cs_dmacachep : csizep->cs_cachep, flags);
	 }

(totally whitespace-damaged) which should just print out a stack dump
every four thousand allocations, which should give a good clue (somebody
else might also be allocating those 64-byte things, but _likely_ we'll see
at least one of the leakers.. Maybe.)

		Linus
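The "usage" column in the table above matches num_objs multiplied by objsize for each cache in the /proc/slabinfo dump (for example 1554463 * 64 = 99,485,632 for size-64). A rough sketch of that calculation follows, assuming the slabinfo 2.0 column order shown in the dump; the helper names are made up for this example and are not part of any kernel tool.

```python
# Three rows copied from the slabinfo dump earlier in the thread.
SAMPLE = """\
slabinfo - version: 2.0
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
pgd 37 37 4096 1 1 : tunables 24 12 0 : slabdata 37 37 0
dentry_cache 1413 2436 140 28 1 : tunables 120 60 0 : slabdata 87 87 0
size-64 1554463 1554463 64 61 1 : tunables 120 60 0 : slabdata 25483 25483 0
"""


def slab_usage(slabinfo_text):
    """Return (name, bytes) pairs, smallest first, with bytes = num_objs * objsize."""
    rows = []
    for line in slabinfo_text.splitlines():
        fields = line.split()
        # Skip the version line, the column header and anything too short.
        if len(fields) < 4 or fields[0] in ("#", "slabinfo"):
            continue
        try:
            num_objs, objsize = int(fields[2]), int(fields[3])
        except ValueError:
            continue
        rows.append((fields[0], num_objs * objsize))
    return sorted(rows, key=lambda row: row[1])


def cumulative(rows):
    """Attach a running total to each (name, bytes) row."""
    total = 0
    out = []
    for name, usage in rows:
        total += usage
        out.append((total, usage, name))
    return out


if __name__ == "__main__":
    for total, usage, name in cumulative(slab_usage(SAMPLE)):
        print(f"{total:>13,} {usage:>12,}  {name}")
```

Sorting ascending puts the largest consumer on the last line, which is why size-64 stands out at the bottom of the table.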
* Re: Possible dcache BUG 2004-08-13 1:31 ` Linus Torvalds @ 2004-08-13 2:03 ` Gene Heskett 2004-08-13 2:27 ` Andreas Dilger ` (2 subsequent siblings) 3 siblings, 0 replies; 146+ messages in thread From: Gene Heskett @ 2004-08-13 2:03 UTC (permalink / raw) To: linux-kernel Cc: Linus Torvalds, Udo A. Steinberg, Andrew Morton, viro, Nick Piggin On Thursday 12 August 2004 21:31, Linus Torvalds wrote: >On Thu, 12 Aug 2004, Udo A. Steinberg wrote: >> After nearly 2 days of running 2.6.8-rc4 with above patch backed >> out, the machine has gone back into heavy swapping, being rather >> unresponsive for several minutes. At that time the only bigger >> applications running were X and my mailer, as can be seen in the >> ps output below. > >Your slab usage seems to be: > > cumulative usage name > ========= ====== ==== > ..... > 2,021,428 151,552 pgd > 2,182,804 161,376 size-96 > 2,367,124 184,320 biovec-(256) > 2,559,124 192,000 biovec-128 > 2,751,124 192,000 biovec-64 > 2,997,076 245,952 ext3_inode_cache > 3,255,124 258,048 size-1024 > 3,545,940 290,816 size-512 > 3,843,468 297,528 radix_tree_node > 4,153,932 310,464 inode_cache > 4,494,972 341,040 dentry_cache > 4,994,684 499,712 size-8192 > 5,912,188 917,504 size-32768 > 105,397,820 99,485,632 size-64 > >Something pretty much stands out. > >What the _heck_ is doing 64-byte allocations and leaking them? > >Can you figure out what triggers it for you? If nothing obvious > comes to mind, could you do something really silly like this > > --- 1.141/mm/slab.c 2004-07-11 01:52:48 -07:00 > +++ edited/mm/slab.c 2004-08-12 18:30:00 -07:00 > @@ -2360,6 +2360,11 @@ > */ > BUG_ON(csizep->cs_cachep == NULL); > #endif > + if (csizep->cs_size == 64) { > + static unsigned count; > + if (!(4095 & ++count)) > + dump_stack(); > + } > return __cache_alloc(flags & GFP_DMA ? 
> csizep->cs_dmacachep : csizep->cs_cachep, > flags); } > >(totally whitespace-damaged) which should just print out a stack > dump every four thoudand allocations, which should give a good clue > (somebody else might also be allocating those 64-byte things, but > _likely_ we'll see at least one of the leakers.. Maybe.) > > > Linus I'm not seeing any of that here Linus. Let me snip just the alloc sizes of my current, up a bit over 24 hours, /proc/slabinfo size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0 size-131072 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0 size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0 size-65536 1 1 65536 1 16 : tunables 8 4 0 : slabdata 1 1 0 size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0 size-32768 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0 size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0 size-16384 7 7 16384 1 4 : tunables 8 4 0 : slabdata 7 7 0 size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0 size-8192 10 10 8192 1 2 : tunables 8 4 0 : slabdata 10 10 0 size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 0 : slabdata 0 0 0 size-4096 182 182 4096 1 1 : tunables 24 12 0 : slabdata 182 182 0 size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 0 : slabdata 0 0 0 size-2048 170 198 2048 2 1 : tunables 24 12 0 : slabdata 99 99 0 size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 0 : slabdata 0 0 0 size-1024 124 124 1024 4 1 : tunables 54 27 0 : slabdata 31 31 0 size-512(DMA) 0 0 512 8 1 : tunables 54 27 0 : slabdata 0 0 0 size-512 184 448 512 8 1 : tunables 54 27 0 : slabdata 56 56 0 size-256(DMA) 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 size-256 165 450 256 15 1 : tunables 120 60 0 : slabdata 30 30 0 size-192(DMA) 0 0 192 20 1 : tunables 120 60 0 : slabdata 0 0 0 size-192 100 100 192 20 1 : tunables 120 60 0 : slabdata 5 5 0 size-128(DMA) 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 size-128 1205 1271 128 31 1 : tunables 120 60 0 : slabdata 41 41 0 size-64(DMA) 0 0 
64 61 1 : tunables 120 60 0 : slabdata 0 0 0 size-64 1850 2745 64 61 1 : tunables 120 60 0 : slabdata 45 45 0 size-32(DMA) 0 0 32 119 1 : tunables 120 60 0 : slabdata 0 0 0 size-32 1369 1428 32 119 1 : tunables 120 60 0 : slabdata 12 12 0 kmem_cache 124 124 128 31 1 : tunables 120 60 0 : slabdata 4 4 0 And I'm doing my usual activities here, browseing the web with mozilla, handling the email with kmail from kde3.3-beta2, and watching a little tv with tvtime or playing solitaire. So far (knock on wood) its purring right along at about 80% of what its normal speed would be, nothing unusual in the logs or in the top display. And it did 8 seti units in the last 24 before its 6am update, part of which was on the unpatched kernel. 5 since 6am, its 22:00 here now. Apparently Udo has something running I don't, and its a leaker. Lets hope your snooper patch will show it. -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) 99.24% setiathome rank, not too shabby for a WV hillbilly Yahoo.com attorneys please note, additions to this message by Gene Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved. ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
From: Andreas Dilger @ 2004-08-13 2:27 UTC
To: Linus Torvalds
Cc: Udo A. Steinberg, linux-kernel, Andrew Morton, viro, Nick Piggin

On Aug 12, 2004 18:31 -0700, Linus Torvalds wrote:
> Can you figure out what triggers it for you? If nothing obvious comes to
> mind, could you do something really silly like this
>
> 	--- 1.141/mm/slab.c	2004-07-11 01:52:48 -07:00
> 	+++ edited/mm/slab.c	2004-08-12 18:30:00 -07:00
> 	@@ -2360,6 +2360,11 @@
> 	 	 */
> 	 	BUG_ON(csizep->cs_cachep == NULL);
> 	 #endif
> 	+	if (csizep->cs_size == 64) {
> 	+		static unsigned count;
> 	+		if (!(4095 & ++count))
> 	+			dump_stack();
> 	+	}
> 	 	return __cache_alloc(flags & GFP_DMA ?
> 	 		 csizep->cs_dmacachep : csizep->cs_cachep, flags);
> 	 }

I don't know who suggested it first, but someone on l-k had a similar
problem and a more robust method of finding the offender was to
dump_stack() when the slab was grown instead of for each allocation.
That way you don't see frequent but harmless allocators that don't leak,
but rather the process that is causing the slab to be grown repeatedly.

So putting something like the above in cache_alloc_refill() is probably
the right thing.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://members.shaw.ca/adilger/
http://members.shaw.ca/golinux/
* Re: Possible dcache BUG
From: Linus Torvalds @ 2004-08-13 3:33 UTC
To: Andreas Dilger
Cc: Udo A. Steinberg, linux-kernel, Andrew Morton, viro, Nick Piggin

On Thu, 12 Aug 2004, Andreas Dilger wrote:
>
> So putting something like the above in cache_alloc_refill() is probably
> the right thing.

Yes, that sounds about right.

		Linus
* Re: Possible dcache BUG
From: Udo A. Steinberg @ 2004-08-20 7:02 UTC
To: Linus Torvalds
Cc: linux-kernel, Andrew Morton, viro, Nick Piggin

On Thu, 12 Aug 2004 18:31:31 -0700 (PDT) Linus Torvalds (LT) wrote:

LT> Your slab usage seems to be:
LT>
LT> 	 cumulative       usage  name
LT> 	 ==========       =====  ====
LT> 	      .....
LT> 	  2,021,428     151,552  pgd
LT> 	  2,182,804     161,376  size-96
LT> 	  2,367,124     184,320  biovec-(256)
LT> 	  2,559,124     192,000  biovec-128
LT> 	  2,751,124     192,000  biovec-64
LT> 	  2,997,076     245,952  ext3_inode_cache
LT> 	  3,255,124     258,048  size-1024
LT> 	  3,545,940     290,816  size-512
LT> 	  3,843,468     297,528  radix_tree_node
LT> 	  4,153,932     310,464  inode_cache
LT> 	  4,494,972     341,040  dentry_cache
LT> 	  4,994,684     499,712  size-8192
LT> 	  5,912,188     917,504  size-32768
LT> 	105,397,820  99,485,632  size-64
LT>
LT> Something pretty much stands out.
LT>
LT> What the _heck_ is doing 64-byte allocations and leaking them?
LT>
LT> Can you figure out what triggers it for you? If nothing obvious comes to
LT> mind, could you do something really silly like this [...]

Linus,

So far I have had serious trouble reproducing the slab misbehaviour quoted
above. However, I've just come across what appears to be a serious VM or
USB problem which may or may not be related to that, and I can reproduce it.

I've tried to download 700 MB of data from a digital camera via USB using
"gphoto2 --get-all-files" and I can repeatedly run my 128 MB box out of
memory using either Linux 2.4.26 or 2.6.8.1 for that.

2.4.26 fails with

Aug 19 23:02:05 laptop kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Aug 19 23:02:05 laptop kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Aug 19 23:02:05 laptop kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)

2.6.8.1 fails with

Aug 19 21:27:41 laptop kernel: usb 1-1: usbfs: interface 0 claimed while 'gphoto2' sets config #1
Aug 19 21:46:43 laptop kernel: oom-killer: gfp_mask=0x1d2
Aug 19 21:46:43 laptop kernel: DMA per-cpu:
Aug 19 21:46:43 laptop kernel: cpu 0 hot: low 2, high 6, batch 1
Aug 19 21:46:43 laptop kernel: cpu 0 cold: low 0, high 2, batch 1
Aug 19 21:46:43 laptop kernel: Normal per-cpu:
Aug 19 21:46:43 laptop kernel: cpu 0 hot: low 14, high 42, batch 7
Aug 19 21:46:43 laptop kernel: cpu 0 cold: low 0, high 14, batch 7
Aug 19 21:46:43 laptop kernel: HighMem per-cpu: empty
Aug 19 21:46:43 laptop kernel:
Aug 19 21:46:43 laptop kernel: Free pages: 1324kB (0kB HighMem)
Aug 19 21:46:43 laptop kernel: Active:1315 inactive:27343 dirty:0 writeback:0 unstable:0 free:331 slab:1606 mapped:1555 pagetables:241
Aug 19 21:46:43 laptop kernel: DMA free:704kB min:44kB low:88kB high:132kB active:0kB inactive:10720kB present:16384kB
Aug 19 21:46:43 laptop kernel: protections[]: 22 178 178
Aug 19 21:46:43 laptop kernel: Normal free:620kB min:312kB low:624kB high:936kB active:5260kB inactive:98652kB present:114688kB
Aug 19 21:46:43 laptop kernel: protections[]: 0 156 156
Aug 19 21:46:43 laptop kernel: HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
Aug 19 21:46:43 laptop kernel: protections[]: 0 0 0
Aug 19 21:46:43 laptop kernel: DMA: 0*4kB 2*8kB 5*16kB 5*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 704kB
Aug 19 21:46:43 laptop kernel: Normal: 1*4kB 3*8kB 3*16kB 1*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 620kB
Aug 19 21:46:43 laptop kernel: HighMem: empty
Aug 19 21:46:43 laptop kernel: Swap cache: add 366080, delete 339455, find 219744/259874, race 0+0
Aug 19 21:46:43 laptop kernel: Out of Memory: Killed process 10239 (gphoto2).

-Udo.
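The per-zone free-list lines in the log above ("DMA: 0*4kB 2*8kB ... = 704kB") list how many free blocks exist at each block size. As a quick sanity check when reading such dumps, the terms can be summed and compared with the reported total. The sketch below is only an illustration; the function name is made up and the parsing assumes the "count*sizekB" layout seen in this log.

```python
import re

# One "count*sizekB" term per block size, as printed in the log above.
TERM = re.compile(r"(\d+)\*(\d+)kB")


def buddy_summary(line):
    """Return (summed_kb, largest_free_block_kb, reported_kb) for one zone line."""
    terms = [(int(count), int(size)) for count, size in TERM.findall(line)]
    total = sum(count * size for count, size in terms)
    largest = max((size for count, size in terms if count > 0), default=0)
    match = re.search(r"=\s*(\d+)kB", line)
    reported = int(match.group(1)) if match else None
    return total, largest, reported


if __name__ == "__main__":
    dma = ("DMA: 0*4kB 2*8kB 5*16kB 5*32kB 1*64kB 1*128kB 1*256kB "
           "0*512kB 0*1024kB 0*2048kB 0*4096kB = 704kB")
    normal = ("Normal: 1*4kB 3*8kB 3*16kB 1*32kB 0*64kB 0*128kB 0*256kB "
              "1*512kB 0*1024kB 0*2048kB 0*4096kB = 620kB")
    for line in (dma, normal):
        print(line.split(":")[0], buddy_summary(line))
```

For the two lines in this report the sums agree with the printed totals, so the small amount of free memory is at least accounted for consistently.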
* Re: Possible dcache BUG
From: Andrew Morton @ 2004-08-20 7:11 UTC
To: Udo A. Steinberg
Cc: torvalds, linux-kernel, viro, nickpiggin

"Udo A. Steinberg" <us15@os.inf.tu-dresden.de> wrote:
>
> I've tried to download 700 MB of data from a digital camera via USB using
> "gphoto2 --get-all-files" and I can repeatedly run my 128 MB box out of
> memory using either Linux 2.4.26 or 2.6.8.1 for that.

whee. How much swap is online?

Not that it matters - you seem to have a bunch of reclaimable pagecache
just sitting there. Very odd.

Could gphoto2 be using mlock? Does it run as root?
* Re: Possible dcache BUG
From: Udo A. Steinberg @ 2004-08-20 7:19 UTC
To: Andrew Morton
Cc: torvalds, linux-kernel, viro, nickpiggin

On Fri, 20 Aug 2004 00:11:54 -0700 Andrew Morton (AM) wrote:

AM> "Udo A. Steinberg" <us15@os.inf.tu-dresden.de> wrote:
AM> >
AM> > I've tried to download 700 MB of data from a digital camera via USB using
AM> > "gphoto2 --get-all-files" and I can repeatedly run my 128 MB box out of
AM> > memory using either Linux 2.4.26 or 2.6.8.1 for that.
AM>
AM> whee. How much swap is online?

Something close to 512 MB.

Adding 506512k swap on /dev/hda2. Priority:-1 extents:1

AM> Not that it matters - you seem to have a bunch of reclaimable pagecache
AM> just sitting there. Very odd.
AM>
AM> Could gphoto2 be using mlock? Does it run as root?

No, gphoto2 was not running as root.

-Udo.
* Re: Possible dcache BUG
From: Nick Piggin @ 2004-08-20 7:49 UTC
To: Udo A. Steinberg
Cc: Andrew Morton, torvalds, linux-kernel, viro

Udo A. Steinberg wrote:
> On Fri, 20 Aug 2004 00:11:54 -0700 Andrew Morton (AM) wrote:
>
> AM> "Udo A. Steinberg" <us15@os.inf.tu-dresden.de> wrote:
> AM> >
> AM> > I've tried to download 700 MB of data from a digital camera via USB using
> AM> > "gphoto2 --get-all-files" and I can repeatedly run my 128 MB box out of
> AM> > memory using either Linux 2.4.26 or 2.6.8.1 for that.
> AM>
> AM> whee. How much swap is online?
>
> Something close to 512 MB.
>
> Adding 506512k swap on /dev/hda2. Priority:-1 extents:1
>
> AM> Not that it matters - you seem to have a bunch of reclaimable pagecache
> AM> just sitting there. Very odd.
> AM>
> AM> Could gphoto2 be using mlock? Does it run as root?
>
> No, gphoto2 was not running as root.
>
> -Udo.

Can you reproduce the OOM with the following patch please? Then
send the output.

Thanks

[Attachment: vm-unreclaimable-debug.patch]

---

 linux-2.6-npiggin/mm/page_alloc.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletion(-)

diff -puN mm/page_alloc.c~vm-unreclaimable-debug mm/page_alloc.c
--- linux-2.6/mm/page_alloc.c~vm-unreclaimable-debug	2004-08-20 17:44:45.000000000 +1000
+++ linux-2.6-npiggin/mm/page_alloc.c	2004-08-20 17:48:26.000000000 +1000
@@ -1182,6 +1182,8 @@ void show_free_areas(void)
 			" active:%lukB"
 			" inactive:%lukB"
 			" present:%lukB"
+			" pages_scanned:%lu"
+			" all_unreclaimable? %s"
 			"\n",
 			zone->name,
 			K(zone->free_pages),
@@ -1190,7 +1192,9 @@ void show_free_areas(void)
 			K(zone->pages_high),
 			K(zone->nr_active),
 			K(zone->nr_inactive),
-			K(zone->present_pages)
+			K(zone->present_pages),
+			zone->pages_scanned,
+			(zone->all_unreclaimable ? "yes" : "no")
 			);
 		printk("protections[]:");
 		for (i = 0; i < MAX_NR_ZONES; i++)

_
* Re: Possible dcache BUG 2004-08-20 7:49 ` Nick Piggin @ 2004-08-24 6:08 ` Udo A. Steinberg 2004-08-24 7:41 ` Nick Piggin 0 siblings, 1 reply; 146+ messages in thread From: Udo A. Steinberg @ 2004-08-24 6:08 UTC (permalink / raw) To: Nick Piggin; +Cc: Andrew Morton, torvalds, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1492 bytes --] On Fri, 20 Aug 2004 17:49:55 +1000 Nick Piggin (NP) wrote: NP> Can you reproduce the OOM with the following patch please? Then NP> send the output. I reproduced the problem using a slightly different setup to trigger the problem faster: 128 MB RAM, 188992 KB swap Here's the output of the OOM killer with your patch applied: oom-killer: gfp_mask=0x1d2 DMA per-cpu: cpu 0 hot: low 2, high 6, batch 1 cpu 0 cold: low 0, high 2, batch 1 Normal per-cpu: cpu 0 hot: low 14, high 42, batch 7 cpu 0 cold: low 0, high 14, batch 7 HighMem per-cpu: empty Free pages: 1316kB (0kB HighMem) Active:5281 inactive:23611 dirty:0 writeback:0 unstable:0 free:329 slab:1403 mapped:12232 pagetables:167 DMA free:712kB min:44kB low:88kB high:132kB active:5076kB inactive:5332kB present:16384kB pages_scanned:10112 all_unreclaimable? yes protections[]: 22 178 178 Normal free:604kB min:312kB low:624kB high:936kB active:16048kB inactive:89112kB present:114688kB pages_scanned:62432 all_unreclaimable? yes protections[]: 0 156 156 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no protections[]: 0 0 0 DMA: 0*4kB 3*8kB 13*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 712kB Normal: 1*4kB 1*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 604kB HighMem: empty Swap cache: add 90886, delete 74524, find 4659/4974, race 0+0 Out of Memory: Killed process 1217 (gphoto2). -Udo. [-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-24 6:08 ` Udo A. Steinberg @ 2004-08-24 7:41 ` Nick Piggin 2004-08-24 18:20 ` Marcelo Tosatti 0 siblings, 1 reply; 146+ messages in thread From: Nick Piggin @ 2004-08-24 7:41 UTC (permalink / raw) To: Udo A. Steinberg; +Cc: Andrew Morton, torvalds, linux-kernel Udo A. Steinberg wrote: >On Fri, 20 Aug 2004 17:49:55 +1000 Nick Piggin (NP) wrote: > >NP> Can you reproduce the OOM with the following patch please? Then >NP> send the output. > >I reproduced the problem using a slightly different setup to trigger the >problem faster: 128 MB RAM, 188992 KB swap > >Here's the output of the OOM killer with your patch applied: > >oom-killer: gfp_mask=0x1d2 >DMA per-cpu: >cpu 0 hot: low 2, high 6, batch 1 >cpu 0 cold: low 0, high 2, batch 1 >Normal per-cpu: >cpu 0 hot: low 14, high 42, batch 7 >cpu 0 cold: low 0, high 14, batch 7 >HighMem per-cpu: empty > >Free pages: 1316kB (0kB HighMem) >Active:5281 inactive:23611 dirty:0 writeback:0 unstable:0 free:329 slab:1403 mapped:12232 pagetables:167 >DMA free:712kB min:44kB low:88kB high:132kB active:5076kB inactive:5332kB present:16384kB pages_scanned:10112 all_unreclaimable? yes >protections[]: 22 178 178 >Normal free:604kB min:312kB low:624kB high:936kB active:16048kB inactive:89112kB present:114688kB pages_scanned:62432 all_unreclaimable? yes >protections[]: 0 156 156 >HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no >protections[]: 0 0 0 >DMA: 0*4kB 3*8kB 13*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 712kB >Normal: 1*4kB 1*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 604kB >HighMem: empty >Swap cache: add 90886, delete 74524, find 4659/4974, race 0+0 >Out of Memory: Killed process 1217 (gphoto2). > > OK, all_unreclaimable caused the scanner to virtually stop. 
If all_unreclaimable gets set, it throttles the scanning of that zone
right back, which in turn greatly lowers the chance that
all_unreclaimable will get cleared.

When we get to priority = 0 in try_to_free_pages (ie. close to OOM), it
might be worth clearing each zone's all_unreclaimable for this last time
'round the loop.

^ permalink raw reply	[flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-24 7:41 ` Nick Piggin @ 2004-08-24 18:20 ` Marcelo Tosatti 2004-08-24 20:00 ` Andrew Morton 0 siblings, 1 reply; 146+ messages in thread From: Marcelo Tosatti @ 2004-08-24 18:20 UTC (permalink / raw) To: Nick Piggin; +Cc: Udo A. Steinberg, Andrew Morton, torvalds, linux-kernel On Tue, Aug 24, 2004 at 05:41:07PM +1000, Nick Piggin wrote: > Udo A. Steinberg wrote: > > >On Fri, 20 Aug 2004 17:49:55 +1000 Nick Piggin (NP) wrote: > > > >NP> Can you reproduce the OOM with the following patch please? Then > >NP> send the output. > > > >I reproduced the problem using a slightly different setup to trigger the > >problem faster: 128 MB RAM, 188992 KB swap > > > >Here's the output of the OOM killer with your patch applied: > > > >oom-killer: gfp_mask=0x1d2 > >DMA per-cpu: > >cpu 0 hot: low 2, high 6, batch 1 > >cpu 0 cold: low 0, high 2, batch 1 > >Normal per-cpu: > >cpu 0 hot: low 14, high 42, batch 7 > >cpu 0 cold: low 0, high 14, batch 7 > >HighMem per-cpu: empty > > > >Free pages: 1316kB (0kB HighMem) > >Active:5281 inactive:23611 dirty:0 writeback:0 unstable:0 free:329 > >slab:1403 mapped:12232 pagetables:167 > >DMA free:712kB min:44kB low:88kB high:132kB active:5076kB inactive:5332kB > >present:16384kB pages_scanned:10112 all_unreclaimable? yes > >protections[]: 22 178 178 > >Normal free:604kB min:312kB low:624kB high:936kB active:16048kB > >inactive:89112kB present:114688kB pages_scanned:62432 all_unreclaimable? > >yes > >protections[]: 0 156 156 > >HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB > >present:0kB pages_scanned:0 all_unreclaimable? no > >protections[]: 0 0 0 > >DMA: 0*4kB 3*8kB 13*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB > >0*2048kB 0*4096kB = 712kB > >Normal: 1*4kB 1*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB > >0*2048kB 0*4096kB = 604kB > >HighMem: empty > >Swap cache: add 90886, delete 74524, find 4659/4974, race 0+0 > >Out of Memory: Killed process 1217 (gphoto2). 
> >
> >

Hi Nick,

> OK, all_unreclaimable caused the scanner to virtually stop. If
> all_unreclaimable
> gets set, it throttles the scanning of that zone right back, which in
> turn greatly
> lowers the chance that all_unreclaimable will get cleared.

Which is the logic to stop tasks from shrink_zone()ing zones which are
known to be heavily scanned by kswapd
(ie zone->pages_scanned > zone->present_pages * 2).

With that logic we want tasks doing direct free to
blk_congestion_wait(WRITE, HZ/10) instead of shrink_zone()ing
(and blk_congestion_wait(WRITE, HZ/50) on __alloc_pages()).

I don't fully understand the all_unreclaimable logic yet. AFAICS it was
added to prevent tasks from wasting excessive CPU time on shrinking the
lists. But at the same time it stops tasks from potentially throttling
on IO (on shrink_list -> pageout).

Is that a feature?

> When we get to priority = 0 in try_to_free_pages (ie. close to OOM), it
> might be
> worth clearing each zone's all_unreclaimable for this last time 'round
> the loop.

Or ignore all_unreclaimable when priority == 0 like this? It feels
hackish to me but will effectively work as clearing all_unreclaimable on
zero priority.

Against 2.6.9-rc1-bktoday.

Udo, one question: do you have swap space available when the OOM killer
triggers? I don't remember seeing any info wrt that.

--- mm/vmscan.c.orig	2004-08-24 16:48:09.467086840 -0300
+++ mm/vmscan.c	2004-08-24 16:51:55.304754296 -0300
@@ -878,7 +878,8 @@
 		if (zone->prev_priority > sc->priority)
 			zone->prev_priority = sc->priority;
 
-		if (zone->all_unreclaimable && sc->priority != DEF_PRIORITY)
+		if (zone->all_unreclaimable &&
+			(sc->priority < DEF_PRIORITY && sc->priority > 0))
 			continue;	/* Let kswapd poll it */
 
 		shrink_zone(zone, sc);
@@ -1054,7 +1055,8 @@
 		for (i = 0; i <= end_zone; i++) {
 			struct zone *zone = pgdat->node_zones + i;
 
-			if (zone->all_unreclaimable && priority != DEF_PRIORITY)
+			if (zone->all_unreclaimable &&
+				(priority < DEF_PRIORITY && priority > 0))
 				continue;
 
 			if (nr_pages == 0) {	/* Not software suspend */

^ permalink raw reply	[flat|nested] 146+ messages in thread
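The practical difference between the old and new skip conditions in the patch above can be checked in isolation. What follows is a minimal userspace sketch, not kernel code: skip_old() and skip_new() are hypothetical helper names that only restate the two conditions from the diff, and DEF_PRIORITY is assumed to be 12, as defined in 2.6-era include/linux/mmzone.h.

```c
#include <stdbool.h>

/* Assumption: DEF_PRIORITY is 12, as in 2.6-era include/linux/mmzone.h. */
#define DEF_PRIORITY 12

/*
 * Condition before the patch (hypothetical helper name):
 * an all_unreclaimable zone is skipped at every priority
 * except the very first pass.
 */
bool skip_old(bool all_unreclaimable, int priority)
{
	return all_unreclaimable && priority != DEF_PRIORITY;
}

/*
 * Condition after the patch (hypothetical helper name):
 * the zone is also scanned on the last pass, priority 0,
 * i.e. right before the OOM killer would be invoked.
 */
bool skip_new(bool all_unreclaimable, int priority)
{
	return all_unreclaimable &&
	       (priority < DEF_PRIORITY && priority > 0);
}
```

Under these assumptions the two conditions differ only at priority 0: the old test still skips the zone there, the new one lets it be scanned once more.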
* Re: Possible dcache BUG
  2004-08-24 18:20 ` Marcelo Tosatti
@ 2004-08-24 20:00   ` Andrew Morton
  2004-08-24 18:40     ` Marcelo Tosatti
  0 siblings, 1 reply; 146+ messages in thread
From: Andrew Morton @ 2004-08-24 20:00 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: nickpiggin, us15, torvalds, linux-kernel

Marcelo Tosatti <marcelo.tosatti@cyclades.com> wrote:
>
> I don't fully understand the all_unreclaimable logic yet.

1) bk revtool include/linux/mmzone.h
2) double-click on declaration of all_unreclaimable
3) read changelog ;)

> --- mm/vmscan.c.orig	2004-08-24 16:48:09.467086840 -0300
> +++ mm/vmscan.c	2004-08-24 16:51:55.304754296 -0300
> @@ -878,7 +878,8 @@
>  		if (zone->prev_priority > sc->priority)
>  			zone->prev_priority = sc->priority;
>
> -		if (zone->all_unreclaimable && sc->priority != DEF_PRIORITY)
> +		if (zone->all_unreclaimable &&
> +			(sc->priority < DEF_PRIORITY && sc->priority > 0))
>  			continue;	/* Let kswapd poll it */
>
>  		shrink_zone(zone, sc);
> @@ -1054,7 +1055,8 @@
>  		for (i = 0; i <= end_zone; i++) {
>  			struct zone *zone = pgdat->node_zones + i;
>
> -			if (zone->all_unreclaimable && priority != DEF_PRIORITY)
> +			if (zone->all_unreclaimable &&
> +				(priority < DEF_PRIORITY && priority > 0))
>  				continue;
>
>  			if (nr_pages == 0) {	/* Not software suspend */

Does anyone understand _why_ all_unreclaimable is getting set?

If not, it's too early to be writing patches...

^ permalink raw reply	[flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-24 20:00 ` Andrew Morton
@ 2004-08-24 18:40   ` Marcelo Tosatti
  2004-08-25  0:27     ` Marcelo Tosatti
  0 siblings, 1 reply; 146+ messages in thread
From: Marcelo Tosatti @ 2004-08-24 18:40 UTC (permalink / raw)
  To: Andrew Morton; +Cc: nickpiggin, us15, torvalds, linux-kernel

On Tue, Aug 24, 2004 at 01:00:27PM -0700, Andrew Morton wrote:
> Marcelo Tosatti <marcelo.tosatti@cyclades.com> wrote:
> >
> > I don't fully understand the all_unreclaimable logic yet.
>
> 1) bk revtool include/linux/mmzone.h
> 2) double-click on declaration of all_unreclaimable
> 3) read changelog ;)

Will do, but my question is still unanswered: why stop IO throttling
when all_unreclaimable is set? It doesn't make sense to me right now.

OK, will RTFS.

>
> > --- mm/vmscan.c.orig	2004-08-24 16:48:09.467086840 -0300
> > +++ mm/vmscan.c	2004-08-24 16:51:55.304754296 -0300
> > @@ -878,7 +878,8 @@
> >  		if (zone->prev_priority > sc->priority)
> >  			zone->prev_priority = sc->priority;
> >
> > -		if (zone->all_unreclaimable && sc->priority != DEF_PRIORITY)
> > +		if (zone->all_unreclaimable &&
> > +			(sc->priority < DEF_PRIORITY && sc->priority > 0))
> >  			continue;	/* Let kswapd poll it */
> >
> >  		shrink_zone(zone, sc);
> > @@ -1054,7 +1055,8 @@
> >  		for (i = 0; i <= end_zone; i++) {
> >  			struct zone *zone = pgdat->node_zones + i;
> >
> > -			if (zone->all_unreclaimable && priority != DEF_PRIORITY)
> > +			if (zone->all_unreclaimable &&
> > +				(priority < DEF_PRIORITY && priority > 0))
> >  				continue;
> >
> >  			if (nr_pages == 0) {	/* Not software suspend */
>
> Does anyone understand _why_ all_unreclaimable is getting set?
>
> If not, it's too early to be writing patches...

As I wrote in the first email, kswapd does

	if (zone->pages_scanned > zone->present_pages * 2)
		zone->all_unreclaimable = 1;

Sure, it makes perfect sense for that to happen when we can't reclaim
pages from the zone. It's not something hard to understand.

What is your point?

I suppose your question is not "_why_ all_unreclaimable is getting set?" but
"maybe it should not be getting set?".

Anyway, will RTFS.

^ permalink raw reply	[flat|nested] 146+ messages in thread
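The threshold quoted in the message above can be checked against the figures in Udo's OOM report earlier in this thread. The following is a minimal sketch under one stated assumption: pages are 4 kB (the buddy lists in the report start at 4kB), so a "present:NkB" figure corresponds to N / 4 pages. zone_would_be_unreclaimable() is a hypothetical helper for illustration, not a kernel function.

```c
/* Assumption: 4 kB pages, so "present:NkB" in the report is N / 4 pages. */
#define KB_PER_PAGE 4UL

/*
 * Userspace restatement of the kswapd test quoted above:
 *   zone->pages_scanned > zone->present_pages * 2
 * Returns 1 if the zone would be marked all_unreclaimable, 0 otherwise.
 */
int zone_would_be_unreclaimable(unsigned long pages_scanned,
				unsigned long present_kb)
{
	unsigned long present_pages = present_kb / KB_PER_PAGE;

	return pages_scanned > present_pages * 2;
}
```

With the numbers from that report, the DMA zone (present:16384kB, i.e. 4096 pages, pages_scanned:10112) and the Normal zone (present:114688kB, i.e. 28672 pages, pages_scanned:62432) both exceed twice their size (8192 and 57344 pages), which agrees with the "all_unreclaimable? yes" printed for both zones.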
* Re: Possible dcache BUG 2004-08-24 18:40 ` Marcelo Tosatti @ 2004-08-25 0:27 ` Marcelo Tosatti 0 siblings, 0 replies; 146+ messages in thread From: Marcelo Tosatti @ 2004-08-25 0:27 UTC (permalink / raw) To: Andrew Morton; +Cc: nickpiggin, us15, torvalds, linux-kernel [-- Attachment #1: Type: text/plain, Size: 3565 bytes --] I suppose your question is not "_why_ all_unreclaimable is getting set?" but > "maybe it should not be getting set?". Now I realize both are the same. Doh. > Anyway, will RTFS. Done some tests and I can only get zone->all_unreclaimable to be set near OOM condition, as expected. Udo, can you please confirm you are not hitting lack of swap space by applying the attached patch (which contains Nick's patch) on top of 2.6.9-rc1. I've found a different bug, however: On a 512MB box with 512MB swap running 2.6.9-rc1, the OOM kill triggers killing a task with swap space available (the task in case is quintela's fillmem). I can only make it happen after having the OOM killer trigger for real. ie: - run fillmem 1024 setting all_unreclaimable!! setting all_unreclaimable!! setting all_unreclaimable!! oom-killer: gfp_mask=0x1d2 DMA per-cpu: cpu 0 hot: low 2, high 6, batch 1 cpu 0 cold: low 0, high 2, batch 1 Normal per-cpu: cpu 0 hot: low 32, high 96, batch 16 cpu 0 cold: low 0, high 32, batch 16 HighMem per-cpu: empty Free pages: 2808kB (0kB HighMem) Active:63316 inactive:62992 dirty:0 writeback:0 unstable:0 free:702 slab:1051 mapped:126279 pagetables:287 DMA free:1440kB min:20kB low:40kB high:60kB active:5428kB inactive:5076kB present:16384kB pages_scanned:8416 all_unreclaimable? yes protections[]: 10 360 360 Normal free:1368kB min:700kB low:1400kB high:2100kB active:247836kB inactive:246892kB present:507888kB pages_scanned:950688 all_unreclaimable? yes protections[]: 0 350 350 protections[]: 0 350 350 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? 
no protections[]: 0 0 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1440kB Normal: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1368kB HighMem: empty nr_free_swap_pages: 0 Swap cache: add 131105, delete 131105, find 16/28, race 0+0 Out of Memory: Killed process 933 (fillmem). Perfect. Everything as expected. - run fillmem 800: setting all_unreclaimable!! oom-killer: gfp_mask=0x1d2 DMA per-cpu: cpu 0 hot: low 2, high 6, batch 1 cpu 0 cold: low 0, high 2, batch 1 Normal per-cpu: cpu 0 hot: low 32, high 96, batch 16 cpu 0 cold: low 0, high 32, batch 16 HighMem per-cpu: empty Free pages: 2808kB (0kB HighMem) Active:126301 inactive:17 dirty:0 writeback:0 unstable:0 free:702 slab:1024 mapped:126333 pagetables:280 DMA free:1440kB min:20kB low:40kB high:60kB active:10508kB inactive:0kB present:16384kB pages_scanned:1000 all_unreclaimable? no protections[]: 10 360 360 Normal free:1368kB min:700kB low:1400kB high:2100kB active:494696kB inactive:68kB present:507888kB pages_scanned:123 all_unreclaimable? no protections[]: 0 350 350 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no protections[]: 0 0 0 DMA: 0*4kB 2*8kB 3*16kB 3*32kB 0*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1440kB Normal: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1368kB HighMem: empty nr_free_swap_pages: 12167 Swap cache: add 1320161, delete 1319682, find 291280/333316, race 0+0 Out of Memory: Killed process 1010 (fillmem). Oops. Thats really bad. Will see if I discover something while trying to understand the fine source tomorrow morning. Maybe someone can figure out whats wrong before I try to... Bed time. 
[-- Attachment #2: vm-reclaim2.patch --] [-- Type: text/plain, Size: 1331 bytes --] --- mm/page_alloc.c.orig 2004-08-24 20:37:53.000000000 -0300 +++ mm/page_alloc.c 2004-08-24 22:51:49.498375608 -0300 @@ -1021,11 +1021,12 @@ void show_free_areas(void) { struct page_state ps; - int cpu, temperature; + int cpu, temperature, i; unsigned long active; unsigned long inactive; unsigned long free; struct zone *zone; + unsigned int swap_pages = 0; for_each_zone(zone) { show_node(zone); @@ -1086,6 +1087,8 @@ " active:%lukB" " inactive:%lukB" " present:%lukB" + " pages_scanned:%lu" + " all_unreclaimable? %s" "\n", zone->name, K(zone->free_pages), @@ -1094,7 +1097,9 @@ K(zone->pages_high), K(zone->nr_active), K(zone->nr_inactive), - K(zone->present_pages) + K(zone->present_pages), + zone->pages_scanned, + (zone->all_unreclaimable ? "yes" : "no") ); printk("protections[]:"); for (i = 0; i < MAX_NR_ZONES; i++) @@ -1125,6 +1130,18 @@ printk("= %lukB\n", K(total)); } + swap_list_lock(); + for (i = 0; i < nr_swapfiles; i++) { + if (!(swap_info[i].flags & SWP_USED) || + (swap_info[i].flags & SWP_WRITEOK)) + continue; + swap_pages += swap_info[i].inuse_pages; + } + swap_pages += nr_swap_pages; + swap_list_unlock(); + + printk("nr_free_swap_pages: %u\n", swap_pages); + show_swap_cache_info(); } ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-13 1:31 ` Linus Torvalds ` (2 preceding siblings ...) 2004-08-20 7:02 ` Udo A. Steinberg @ 2004-09-12 7:03 ` Udo A. Steinberg 2004-09-12 7:16 ` Andrew Morton 3 siblings, 1 reply; 146+ messages in thread From: Udo A. Steinberg @ 2004-09-12 7:03 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton, Len Brown [-- Attachment #1: Type: text/plain, Size: 4370 bytes --] On Thu, 12 Aug 2004 18:31:31 -0700 (PDT) Linus Torvalds (LT) wrote: LT> Your slab usage seems to be: LT> LT> cumulative usage name LT> ========= ====== ==== LT> ..... LT> 4,994,684 499,712 size-8192 LT> 5,912,188 917,504 size-32768 LT> 105,397,820 99,485,632 size-64 LT> LT> Something pretty much stands out. LT> LT> What the _heck_ is doing 64-byte allocations and leaking them? I think the offender is ACPI. I've been logging 64-byte slab allocations for a while now and this is what I came up with: The most frequent user of 64-byte allocations is: [<c013e98f>] __kmalloc+0x6f/0x80 [<c016649e>] sys_poll+0xbe/0x230 [<c0165201>] sys_ioctl+0x101/0x2a0 [<c0165940>] __pollwait+0x0/0xc0 [<c011f00c>] sys_gettimeofday+0x2c/0x70 [<c01040db>] syscall_call+0x7/0xb But that doesn't seem to leak, because I've had these happen for days before things started getting bad. 
However, then as slab usage skyrocketed after 3 days, I started logging
these:

 [<c013e98f>] __kmalloc+0x6f/0x80
 [<c0217af9>] acpi_os_allocate+0xa/0xb
 [<c022b9b6>] acpi_ut_callocate+0x30/0x7a
 [<c022b840>] acpi_ut_acquire_from_cache+0x9d/0xaa
 [<c022c7d8>] acpi_ut_create_generic_state+0xa/0x12
 [<c021b0b2>] acpi_ds_result_stack_push+0x8/0x25
 [<c021b268>] acpi_ds_create_walk_state+0x53/0x70
 [<c0227913>] acpi_ps_delete_parse_tree+0x20/0x89
 [<c0227238>] acpi_ps_parse_loop+0x550/0x7bb
 [<c02274f0>] acpi_ps_parse_aml+0x4d/0x1a1
 [<c0219dd4>] acpi_ds_call_control_method+0xd3/0x1b3
 [<c0227505>] acpi_ps_parse_aml+0x62/0x1a1
 [<c0227d1f>] acpi_psx_execute+0x13b/0x194
 [<c0225212>] acpi_ns_execute_control_method+0x3b/0x47
 [<c02251c0>] acpi_ns_evaluate_by_handle+0x6f/0x86
 [<c02250cd>] acpi_ns_evaluate_relative+0xa9/0xc3
 [<c02249c3>] acpi_evaluate_object+0xf3/0x1a0
 [<c0160f56>] link_path_walk+0xbe6/0xe70
 [<c022f496>] acpi_battery_get_status+0x68/0x102
 [<c022f9b6>] acpi_battery_read_state+0x88/0x275
 [<c018124b>] proc_file_read+0xbb/0x250
 [<c0152ea1>] vfs_read+0xd1/0x130
 [<c0153171>] sys_read+0x41/0x70
 [<c01040db>] syscall_call+0x7/0xb

 [<c013e98f>] __kmalloc+0x6f/0x80
 [<c0217af9>] acpi_os_allocate+0xa/0xb
 [<c022b9b6>] acpi_ut_callocate+0x30/0x7a
 [<c022b840>] acpi_ut_acquire_from_cache+0x9d/0xaa
 [<c022c7d8>] acpi_ut_create_generic_state+0xa/0x12
 [<c0227a31>] acpi_ps_push_scope+0xf/0x57
 [<c0227180>] acpi_ps_parse_loop+0x498/0x7bb
 [<c02274f0>] acpi_ps_parse_aml+0x4d/0x1a1
 [<c0227d1f>] acpi_psx_execute+0x13b/0x194
 [<c0225212>] acpi_ns_execute_control_method+0x3b/0x47
 [<c02251c0>] acpi_ns_evaluate_by_handle+0x6f/0x86
 [<c02250cd>] acpi_ns_evaluate_relative+0xa9/0xc3
 [<c02249c3>] acpi_evaluate_object+0xf3/0x1a0
 [<c022370f>] acpi_hw_low_level_read+0x56/0x94
 [<c0230949>] acpi_ec_gpe_query+0xd5/0xec
 [<c0218098>] acpi_os_execute_deferred+0xc/0x16
 [<c012a43e>] worker_thread+0x1ae/0x270
 [<c021808c>] acpi_os_execute_deferred+0x0/0x16
 [<c0117d70>] default_wake_function+0x0/0x10
 [<c0117db7>] __wake_up_common+0x37/0x70
 [<c0117d70>] default_wake_function+0x0/0x10
 [<c012a290>] worker_thread+0x0/0x270
 [<c012e266>] kthread+0x96/0xe0
 [<c012e1d0>] kthread+0x0/0xe0
 [<c010229d>] kernel_thread_helper+0x5/0x18

 [<c013e98f>] __kmalloc+0x6f/0x80
 [<c0217af9>] acpi_os_allocate+0xa/0xb
 [<c022b9b6>] acpi_ut_callocate+0x30/0x7a
 [<c022b840>] acpi_ut_acquire_from_cache+0x9d/0xaa
 [<c022c7d8>] acpi_ut_create_generic_state+0xa/0x12
 [<c0227a31>] acpi_ps_push_scope+0xf/0x57
 [<c0227180>] acpi_ps_parse_loop+0x498/0x7bb
 [<c02274f0>] acpi_ps_parse_aml+0x4d/0x1a1
 [<c0219dd4>] acpi_ds_call_control_method+0xd3/0x1b3
 [<c0227505>] acpi_ps_parse_aml+0x62/0x1a1
 [<c0227d1f>] acpi_psx_execute+0x13b/0x194
 [<c0225212>] acpi_ns_execute_control_method+0x3b/0x47
 [<c02251c0>] acpi_ns_evaluate_by_handle+0x6f/0x86
 [<c02250cd>] acpi_ns_evaluate_relative+0xa9/0xc3
 [<c02249c3>] acpi_evaluate_object+0xf3/0x1a0
 [<c02185ff>] acpi_evaluate_integer+0x2d/0x4b
 [<c0146a17>] do_mmap_pgoff+0x537/0x710
 [<c0234048>] acpi_thermal_get_temperature+0x24/0x31
 [<c0234808>] acpi_thermal_temp_seq_show+0x12/0x4d
 [<c017125e>] seq_read+0xbe/0x280
 [<c0152ea1>] vfs_read+0xd1/0x130
 [<c0153171>] sys_read+0x41/0x70
 [<c01040db>] syscall_call+0x7/0xb

The machine is now allocating 64-byte slabs at about 20 objects per second.
I'm currently running 2.6.9-rc1-bk12 here.

-Udo.

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 146+ messages in thread
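To put the reported allocation rate in perspective, the arithmetic can be sketched as below. This is only a back-of-the-envelope estimate under loud assumptions: that the rate stays constant, that none of the objects are ever freed, and that slab bookkeeping overhead is ignored. leak_bytes_per_day() is a hypothetical helper name.

```c
/* Seconds in one day. */
#define SECONDS_PER_DAY 86400UL

/*
 * Bytes consumed per day by a steady allocation stream that is never
 * freed. Assumption: constant rate, no slab overhead counted.
 */
unsigned long leak_bytes_per_day(unsigned long objs_per_sec,
				 unsigned long obj_size)
{
	return objs_per_sec * obj_size * SECONDS_PER_DAY;
}
```

For 20 objects per second of 64 bytes each, this gives 20 * 64 * 86400 = 110,592,000 bytes, roughly 105 MiB per day if nothing is released.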
* Re: Possible dcache BUG
  2004-09-12  7:03 ` Udo A. Steinberg
@ 2004-09-12  7:16   ` Andrew Morton
  2004-09-12  7:29     ` Udo A. Steinberg
  0 siblings, 1 reply; 146+ messages in thread
From: Andrew Morton @ 2004-09-12  7:16 UTC (permalink / raw)
  To: Udo A. Steinberg; +Cc: torvalds, linux-kernel, len.brown

"Udo A. Steinberg" <us15@os.inf.tu-dresden.de> wrote:
>
> However, then as slab usage skyrocketed after 3 days, I started logging
> these:
>
>  [<c013e98f>] __kmalloc+0x6f/0x80
>  [<c0217af9>] acpi_os_allocate+0xa/0xb
>  [<c022b9b6>] acpi_ut_callocate+0x30/0x7a
>  [<c022b840>] acpi_ut_acquire_from_cache+0x9d/0xaa
>  [<c022c7d8>] acpi_ut_create_generic_state+0xa/0x12
>  [<c021b0b2>] acpi_ds_result_stack_push+0x8/0x25
>  [<c021b268>] acpi_ds_create_walk_state+0x53/0x70
>  [<c0227913>] acpi_ps_delete_parse_tree+0x20/0x89
>  [<c0227238>] acpi_ps_parse_loop+0x550/0x7bb
>  [<c02274f0>] acpi_ps_parse_aml+0x4d/0x1a1
>  [<c0219dd4>] acpi_ds_call_control_method+0xd3/0x1b3
>  [<c0227505>] acpi_ps_parse_aml+0x62/0x1a1
>  [<c0227d1f>] acpi_psx_execute+0x13b/0x194
>  [<c0225212>] acpi_ns_execute_control_method+0x3b/0x47
>  [<c02251c0>] acpi_ns_evaluate_by_handle+0x6f/0x86
>  [<c02250cd>] acpi_ns_evaluate_relative+0xa9/0xc3
>  [<c02249c3>] acpi_evaluate_object+0xf3/0x1a0
>  [<c0160f56>] link_path_walk+0xbe6/0xe70
>  [<c022f496>] acpi_battery_get_status+0x68/0x102
>  [<c022f9b6>] acpi_battery_read_state+0x88/0x275
>  [<c018124b>] proc_file_read+0xbb/0x250
>  [<c0152ea1>] vfs_read+0xd1/0x130
>  [<c0153171>] sys_read+0x41/0x70
>  [<c01040db>] syscall_call+0x7/0xb

Great, thanks for working that out.

Random guess: acpi_evaluate_object() is returning an error but is
allocating memory anyway.

In acpi_battery_get_status():

	status = acpi_evaluate_object(battery->handle, "_BST", NULL, &buffer);
	if (ACPI_FAILURE(status)) {
		ACPI_DEBUG_PRINT((ACPI_DB_ERROR, "Error evaluating _BST\n"));
		return_VALUE(-ENODEV);
	}

Is that failure path being taken?

^ permalink raw reply	[flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-09-12  7:16 ` Andrew Morton
@ 2004-09-12  7:29   ` Udo A. Steinberg
  2004-09-12  7:48     ` Andrew Morton
  0 siblings, 1 reply; 146+ messages in thread
From: Udo A. Steinberg @ 2004-09-12  7:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: torvalds, linux-kernel, len.brown

[-- Attachment #1: Type: text/plain, Size: 544 bytes --]

On Sun, 12 Sep 2004 00:16:26 -0700 Andrew Morton (AM) wrote:

AM> Random guess: acpi_evaluate_object() is returning an error but is
AM> allocating memory anyway.
AM>
AM> In acpi_battery_get_status():
AM>
AM> 	status = acpi_evaluate_object(battery->handle, "_BST", NULL, &buffer);
AM> 	if (ACPI_FAILURE(status)) {
AM> 		ACPI_DEBUG_PRINT((ACPI_DB_ERROR, "Error evaluating _BST\n"));
AM> 		return_VALUE(-ENODEV);
AM> 	}
AM>
AM> Is that failure path being taken?

Is there a way for me to find that out without recompiling and rebooting?

-Udo.

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-09-12  7:29 ` Udo A. Steinberg
@ 2004-09-12  7:48   ` Andrew Morton
  2004-09-13  4:53     ` Len Brown
  0 siblings, 1 reply; 146+ messages in thread
From: Andrew Morton @ 2004-09-12  7:48 UTC (permalink / raw)
  To: Udo A. Steinberg; +Cc: torvalds, linux-kernel, len.brown

"Udo A. Steinberg" <us15@os.inf.tu-dresden.de> wrote:
>
> On Sun, 12 Sep 2004 00:16:26 -0700 Andrew Morton (AM) wrote:
>
> AM> Random guess: acpi_evaluate_object() is returning an error but is
> AM> allocating memory anyway.
> AM>
> AM> In acpi_battery_get_status():
> AM>
> AM> 	status = acpi_evaluate_object(battery->handle, "_BST", NULL, &buffer);
> AM> 	if (ACPI_FAILURE(status)) {
> AM> 		ACPI_DEBUG_PRINT((ACPI_DB_ERROR, "Error evaluating _BST\n"));
> AM> 		return_VALUE(-ENODEV);
> AM> 	}
> AM>
> AM> Is that failure path being taken?
>
> Is there a way for me to find that out without recompiling and rebooting?

Not sure.

Looks like you need to set CONFIG_ACPI_DEBUG and then put the
right number into /proc/acpi/debug_layer.

^ permalink raw reply	[flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-09-12  7:48 ` Andrew Morton
@ 2004-09-13  4:53   ` Len Brown
  0 siblings, 0 replies; 146+ messages in thread
From: Len Brown @ 2004-09-13  4:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Udo A. Steinberg, Linus Torvalds, linux-kernel, ACPI Developers

On Sun, 2004-09-12 at 03:48, Andrew Morton wrote:
> "Udo A. Steinberg" <us15@os.inf.tu-dresden.de> wrote:
> >
> > On Sun, 12 Sep 2004 00:16:26 -0700 Andrew Morton (AM) wrote:
> >
> > AM> Random guess: acpi_evaluate_object() is returning an error but is
> > AM> allocating memory anyway.
> > AM>
> > AM> In acpi_battery_get_status():
> > AM>
> > AM> 	status = acpi_evaluate_object(battery->handle, "_BST", NULL, &buffer);
> > AM> 	if (ACPI_FAILURE(status)) {
> > AM> 		ACPI_DEBUG_PRINT((ACPI_DB_ERROR, "Error evaluating _BST\n"));
> > AM> 		return_VALUE(-ENODEV);
> > AM> 	}
> > AM>
> > AM> Is that failure path being taken?
> >
> > Is there a way for me to find that out without recompiling and rebooting?
>
> Looks like you need to set CONFIG_ACPI_DEBUG and then put the
> right number into /proc/acpi/debug_layer.

For the battery module:

# echo 0x00040000 > /proc/acpi/debug_layer

and then to turn on everything about it:

# echo 0xffffffff > /proc/acpi/debug_level

These hooks exist only if the kernel is built with CONFIG_ACPI_DEBUG.

It would be interesting to know if you can examine the contents of
/proc/acpi/battery/*/*

thanks,
-Len

^ permalink raw reply	[flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-11  5:13 ` Linus Torvalds
  2004-08-11  5:15   ` Linus Torvalds
@ 2004-08-11  5:55   ` David S. Miller
  1 sibling, 0 replies; 146+ messages in thread
From: David S. Miller @ 2004-08-11  5:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: us15, linux-kernel, akpm, viro

On Tue, 10 Aug 2004 22:13:01 -0700 (PDT)
Linus Torvalds <torvalds@osdl.org> wrote:

> I also wonder what the
> hell is allocating so many 8kB and 32kB entries.

Loopback default MTU is 16K these days, which might explain the 32K
entries but not the 8KB ones. Perhaps the latter are being used for
page tables? Just a guess on that latter one.

^ permalink raw reply	[flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-11  3:46 ` Linus Torvalds
  2004-08-11  4:18   ` Udo A. Steinberg
@ 2004-08-11  4:47   ` Gene Heskett
  2004-08-11  4:59     ` Linus Torvalds
  1 sibling, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-11  4:47 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds, Andrew Morton, viro

[-- Attachment #1: Type: text/plain, Size: 3242 bytes --]

On Tuesday 10 August 2004 23:46, Linus Torvalds wrote:
>On Tue, 10 Aug 2004, Gene Heskett wrote:
>> Linus, I hate to be a killjoy on this, but I just had to reboot
>> again,
>
>Note that this is something else going on. The "obvious one-liner"
> can be an issue only with certain special XFS stuff or knfsd,
> neither of which you have.

I've come to the same conclusion.

>> it was killing processes, even first the shells I had open then
>> kmail and X this time, but with nothing in the logs, and when X
>> had quit, a top in the launching shell reported nearly 250 megs
>> free with nothing in the swap.
>
>As Andrew already requested, the only way for us to figure out what
> is wrong is to get output from you on where the memory has gone.
> Notably, the output of "/proc/meminfo" and "/proc/slabinfo". "ps
> axm" helps too.
>
I'll try and remember that.

>If it is slow, the above will still work. Just save them away and
> reboot.

And it appears to still be quite slow, on -rc4 now. I have a remake
running just to double-check. Also, the X startup was extremely
slow, taking over 3 minutes before gkrellm was able to load its theme
face, and kmail nearly 2 minutes to get started. This is also the
cfq scheduler, which I hadn't tried in a while.

Just to quantify the slow, here is the rebuild time on another copy of
2.6.8-rc4 while running 2.6.8-rc4:

real    27m50.631s
user    13m15.535s
sys     1m35.908s

That's nearly 10 minutes longer than the same build took on -rc2. And
over 22 minutes longer than it would take if I was running 2.6.7.
This shouldn't be more than 6 minutes, so where is all the tire smoke
I should see if it's spinning them that badly?

I must have something terribly, badly configured. So the .config is
attached. This is a Biostar M7NCD-Pro, nforce2 SPP northbridge, MCP
southbridge chipset, with an athlon 2800XP and a gig of ram, using the
onboard audio (intel8x0) and ethernet (forcedeth). Actual cpu clock
according to dmesg is 2079mhz. Bogomips=4100+.

If this crashes too, I'll send those files if I can keep it going
long enough to get them once I get rebooted, but the reboot will be
to 2.6.7; it at least leaves its muddy footprints all over the log
when it goes bye-bye. Unforch, when it goes, there's no chance to get
that data, it's total lockup.

Just for grins, /proc/cpuinfo:
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 10
model name      : AMD Athlon(tm) XP 2800+
stepping        : 0
cpu MHz         : 2079.940
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 4112.38

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

[-- Attachment #2: .config --] [-- Type: text/plain, Size: 29571 bytes --] # # Automatically generated make config: don't edit # CONFIG_X86=y CONFIG_MMU=y CONFIG_UID16=y CONFIG_GENERIC_ISA_DMA=y # # Code maturity level options # CONFIG_EXPERIMENTAL=y # CONFIG_CLEAN_COMPILE is not set CONFIG_BROKEN=y CONFIG_BROKEN_ON_SMP=y # # General setup # CONFIG_SWAP=y CONFIG_SYSVIPC=y CONFIG_POSIX_MQUEUE=y CONFIG_BSD_PROCESS_ACCT=y # CONFIG_BSD_PROCESS_ACCT_V3 is not set CONFIG_SYSCTL=y # CONFIG_AUDIT is not set CONFIG_LOG_BUF_SHIFT=14 CONFIG_HOTPLUG=y CONFIG_IKCONFIG=y CONFIG_IKCONFIG_PROC=y # CONFIG_EMBEDDED is not set CONFIG_KALLSYMS=y # CONFIG_KALLSYMS_EXTRA_PASS is not set CONFIG_FUTEX=y CONFIG_EPOLL=y CONFIG_IOSCHED_NOOP=y CONFIG_IOSCHED_AS=y CONFIG_IOSCHED_DEADLINE=y CONFIG_IOSCHED_CFQ=y # CONFIG_CC_OPTIMIZE_FOR_SIZE is not set # # Loadable module support # CONFIG_MODULES=y CONFIG_MODULE_UNLOAD=y CONFIG_MODULE_FORCE_UNLOAD=y CONFIG_OBSOLETE_MODPARM=y CONFIG_MODVERSIONS=y CONFIG_KMOD=y # # Processor type and features # CONFIG_X86_PC=y # CONFIG_X86_ELAN is not set # CONFIG_X86_VOYAGER is not set # CONFIG_X86_NUMAQ is not set # CONFIG_X86_SUMMIT is not set # CONFIG_X86_BIGSMP is not set # CONFIG_X86_VISWS is not set # CONFIG_X86_GENERICARCH is not set # CONFIG_X86_ES7000 is not set # CONFIG_M386 is not set # CONFIG_M486 is not set # CONFIG_M586 is not set # CONFIG_M586TSC is not set # CONFIG_M586MMX is not set # CONFIG_M686 is not set # CONFIG_MPENTIUMII is not set # CONFIG_MPENTIUMIII is not set # CONFIG_MPENTIUMM is not set # CONFIG_MPENTIUM4 is not set # CONFIG_MK6 is not set CONFIG_MK7=y # CONFIG_MK8 is not set # CONFIG_MCRUSOE is not set # CONFIG_MWINCHIPC6 is not set # CONFIG_MWINCHIP2 is not set # CONFIG_MWINCHIP3D is not set # CONFIG_MCYRIXIII is not set # CONFIG_MVIAC3_2 is not set # CONFIG_X86_GENERIC is not set CONFIG_X86_CMPXCHG=y CONFIG_X86_XADD=y CONFIG_X86_L1_CACHE_SHIFT=6 CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_X86_WP_WORKS_OK=y CONFIG_X86_INVLPG=y 
CONFIG_X86_BSWAP=y CONFIG_X86_POPAD_OK=y CONFIG_X86_GOOD_APIC=y CONFIG_X86_INTEL_USERCOPY=y CONFIG_X86_USE_PPRO_CHECKSUM=y CONFIG_X86_USE_3DNOW=y CONFIG_HPET_TIMER=y CONFIG_HPET_EMULATE_RTC=y # CONFIG_SMP is not set CONFIG_PREEMPT=y # CONFIG_X86_UP_APIC is not set CONFIG_X86_TSC=y # CONFIG_X86_MCE is not set # CONFIG_TOSHIBA is not set # CONFIG_I8K is not set # CONFIG_MICROCODE is not set # CONFIG_X86_MSR is not set CONFIG_X86_CPUID=y # # Firmware Drivers # # CONFIG_EDD is not set # CONFIG_NOHIGHMEM is not set CONFIG_HIGHMEM4G=y # CONFIG_HIGHMEM64G is not set CONFIG_HIGHMEM=y CONFIG_HIGHPTE=y # CONFIG_MATH_EMULATION is not set CONFIG_MTRR=y CONFIG_HAVE_DEC_LOCK=y # CONFIG_REGPARM is not set # # Power management options (ACPI, APM) # CONFIG_PM=y # CONFIG_SOFTWARE_SUSPEND is not set # CONFIG_PM_DISK is not set # # ACPI (Advanced Configuration and Power Interface) Support # # CONFIG_ACPI is not set CONFIG_ACPI_BOOT=y # # APM (Advanced Power Management) BIOS Support # CONFIG_APM=y # CONFIG_APM_IGNORE_USER_SUSPEND is not set # CONFIG_APM_DO_ENABLE is not set # CONFIG_APM_CPU_IDLE is not set # CONFIG_APM_DISPLAY_BLANK is not set CONFIG_APM_RTC_IS_GMT=y # CONFIG_APM_ALLOW_INTS is not set CONFIG_APM_REAL_MODE_POWER_OFF=y # # CPU Frequency scaling # # CONFIG_CPU_FREQ is not set # # Bus options (PCI, PCMCIA, EISA, MCA, ISA) # CONFIG_PCI=y # CONFIG_PCI_GOBIOS is not set # CONFIG_PCI_GOMMCONFIG is not set # CONFIG_PCI_GODIRECT is not set CONFIG_PCI_GOANY=y CONFIG_PCI_BIOS=y CONFIG_PCI_DIRECT=y CONFIG_PCI_MMCONFIG=y CONFIG_PCI_LEGACY_PROC=y CONFIG_PCI_NAMES=y CONFIG_ISA=y # CONFIG_EISA is not set # CONFIG_MCA is not set # CONFIG_SCx200 is not set # # PCMCIA/CardBus support # # CONFIG_PCMCIA is not set CONFIG_PCMCIA_PROBE=y # # PCI Hotplug Support # # CONFIG_HOTPLUG_PCI is not set # # Executable file formats # CONFIG_BINFMT_ELF=y CONFIG_BINFMT_AOUT=y CONFIG_BINFMT_MISC=y # # Device Drivers # # # Generic Driver Options # CONFIG_STANDALONE=y CONFIG_PREVENT_FIRMWARE_BUILD=y # 
CONFIG_FW_LOADER is not set # # Memory Technology Devices (MTD) # # CONFIG_MTD is not set # # Parallel port support # CONFIG_PARPORT=y CONFIG_PARPORT_PC=y CONFIG_PARPORT_PC_CML1=y # CONFIG_PARPORT_SERIAL is not set # CONFIG_PARPORT_PC_FIFO is not set # CONFIG_PARPORT_PC_SUPERIO is not set # CONFIG_PARPORT_OTHER is not set CONFIG_PARPORT_1284=y # # Plug and Play support # CONFIG_PNP=y CONFIG_PNP_DEBUG=y # # Protocols # CONFIG_ISAPNP=y CONFIG_PNPBIOS=y CONFIG_PNPBIOS_PROC_FS=y # # Block devices # CONFIG_BLK_DEV_FD=y # CONFIG_BLK_DEV_XD is not set # CONFIG_PARIDE is not set # CONFIG_BLK_CPQ_DA is not set # CONFIG_BLK_CPQ_CISS_DA is not set # CONFIG_BLK_DEV_DAC960 is not set # CONFIG_BLK_DEV_UMEM is not set CONFIG_BLK_DEV_LOOP=y # CONFIG_BLK_DEV_CRYPTOLOOP is not set # CONFIG_BLK_DEV_NBD is not set # CONFIG_BLK_DEV_SX8 is not set # CONFIG_BLK_DEV_RAM is not set # CONFIG_LBD is not set # # ATA/ATAPI/MFM/RLL support # CONFIG_IDE=y CONFIG_BLK_DEV_IDE=y # # Please see Documentation/ide.txt for help/info on IDE drives # # CONFIG_BLK_DEV_IDE_SATA is not set # CONFIG_BLK_DEV_HD_IDE is not set CONFIG_BLK_DEV_IDEDISK=y # CONFIG_IDEDISK_MULTI_MODE is not set CONFIG_BLK_DEV_IDECD=y # CONFIG_BLK_DEV_IDETAPE is not set # CONFIG_BLK_DEV_IDEFLOPPY is not set # CONFIG_BLK_DEV_IDESCSI is not set # CONFIG_IDE_TASK_IOCTL is not set # CONFIG_IDE_TASKFILE_IO is not set # # IDE chipset support/bugfixes # # CONFIG_IDE_GENERIC is not set # CONFIG_BLK_DEV_CMD640 is not set # CONFIG_BLK_DEV_IDEPNP is not set CONFIG_BLK_DEV_IDEPCI=y # CONFIG_IDEPCI_SHARE_IRQ is not set # CONFIG_BLK_DEV_OFFBOARD is not set # CONFIG_BLK_DEV_GENERIC is not set # CONFIG_BLK_DEV_OPTI621 is not set # CONFIG_BLK_DEV_RZ1000 is not set CONFIG_BLK_DEV_IDEDMA_PCI=y # CONFIG_BLK_DEV_IDEDMA_FORCED is not set CONFIG_IDEDMA_PCI_AUTO=y # CONFIG_IDEDMA_ONLYDISK is not set CONFIG_BLK_DEV_ADMA=y # CONFIG_BLK_DEV_AEC62XX is not set # CONFIG_BLK_DEV_ALI15X3 is not set CONFIG_BLK_DEV_AMD74XX=y # CONFIG_BLK_DEV_ATIIXP is not set # 
CONFIG_BLK_DEV_CMD64X is not set # CONFIG_BLK_DEV_TRIFLEX is not set # CONFIG_BLK_DEV_CY82C693 is not set # CONFIG_BLK_DEV_CS5520 is not set # CONFIG_BLK_DEV_CS5530 is not set # CONFIG_BLK_DEV_HPT34X is not set # CONFIG_BLK_DEV_HPT366 is not set # CONFIG_BLK_DEV_SC1200 is not set # CONFIG_BLK_DEV_PIIX is not set # CONFIG_BLK_DEV_NS87415 is not set # CONFIG_BLK_DEV_PDC202XX_OLD is not set # CONFIG_BLK_DEV_PDC202XX_NEW is not set # CONFIG_BLK_DEV_SVWKS is not set # CONFIG_BLK_DEV_SIIMAGE is not set # CONFIG_BLK_DEV_SIS5513 is not set # CONFIG_BLK_DEV_SLC90E66 is not set # CONFIG_BLK_DEV_TRM290 is not set # CONFIG_BLK_DEV_VIA82CXXX is not set # CONFIG_IDE_ARM is not set # CONFIG_IDE_CHIPSETS is not set CONFIG_BLK_DEV_IDEDMA=y # CONFIG_IDEDMA_IVB is not set CONFIG_IDEDMA_AUTO=y # CONFIG_BLK_DEV_HD is not set # # SCSI device support # CONFIG_SCSI=y CONFIG_SCSI_PROC_FS=y # # SCSI support type (disk, tape, CD-ROM) # CONFIG_BLK_DEV_SD=m CONFIG_CHR_DEV_ST=m # CONFIG_CHR_DEV_OSST is not set # CONFIG_BLK_DEV_SR is not set CONFIG_CHR_DEV_SG=m # # Some SCSI devices (e.g. 
CD jukebox) support multiple LUNs # CONFIG_SCSI_MULTI_LUN=y CONFIG_SCSI_CONSTANTS=y # CONFIG_SCSI_LOGGING is not set # # SCSI Transport Attributes # # CONFIG_SCSI_SPI_ATTRS is not set # CONFIG_SCSI_FC_ATTRS is not set # # SCSI low-level drivers # # CONFIG_BLK_DEV_3W_XXXX_RAID is not set # CONFIG_SCSI_3W_9XXX is not set # CONFIG_SCSI_7000FASST is not set # CONFIG_SCSI_ACARD is not set # CONFIG_SCSI_AHA152X is not set # CONFIG_SCSI_AHA1542 is not set # CONFIG_SCSI_AACRAID is not set # CONFIG_SCSI_AIC7XXX is not set # CONFIG_SCSI_AIC7XXX_OLD is not set # CONFIG_SCSI_AIC79XX is not set # CONFIG_SCSI_DPT_I2O is not set # CONFIG_SCSI_ADVANSYS is not set # CONFIG_SCSI_IN2000 is not set # CONFIG_SCSI_MEGARAID is not set # CONFIG_SCSI_SATA is not set # CONFIG_SCSI_BUSLOGIC is not set # CONFIG_SCSI_CPQFCTS is not set # CONFIG_SCSI_DMX3191D is not set # CONFIG_SCSI_DTC3280 is not set # CONFIG_SCSI_EATA is not set # CONFIG_SCSI_EATA_PIO is not set # CONFIG_SCSI_FUTURE_DOMAIN is not set # CONFIG_SCSI_GDTH is not set # CONFIG_SCSI_GENERIC_NCR5380 is not set # CONFIG_SCSI_GENERIC_NCR5380_MMIO is not set # CONFIG_SCSI_IPS is not set # CONFIG_SCSI_INITIO is not set # CONFIG_SCSI_INIA100 is not set # CONFIG_SCSI_PPA is not set # CONFIG_SCSI_IMM is not set # CONFIG_SCSI_NCR53C406A is not set # CONFIG_SCSI_SYM53C8XX_2 is not set # CONFIG_SCSI_IPR is not set # CONFIG_SCSI_PAS16 is not set # CONFIG_SCSI_PCI2000 is not set # CONFIG_SCSI_PCI2220I is not set # CONFIG_SCSI_PSI240I is not set # CONFIG_SCSI_QLOGIC_FAS is not set # CONFIG_SCSI_QLOGIC_ISP is not set # CONFIG_SCSI_QLOGIC_FC is not set # CONFIG_SCSI_QLOGIC_1280 is not set CONFIG_SCSI_QLA2XXX=y # CONFIG_SCSI_QLA21XX is not set # CONFIG_SCSI_QLA22XX is not set # CONFIG_SCSI_QLA2300 is not set # CONFIG_SCSI_QLA2322 is not set # CONFIG_SCSI_QLA6312 is not set # CONFIG_SCSI_QLA6322 is not set # CONFIG_SCSI_SEAGATE is not set # CONFIG_SCSI_SYM53C416 is not set # CONFIG_SCSI_DC395x is not set # CONFIG_SCSI_DC390T is not set # 
CONFIG_SCSI_T128 is not set # CONFIG_SCSI_U14_34F is not set # CONFIG_SCSI_ULTRASTOR is not set # CONFIG_SCSI_NSP32 is not set # CONFIG_SCSI_DEBUG is not set # # Old CD-ROM drivers (not SCSI, not IDE) # # CONFIG_CD_NO_IDESCSI is not set # # Multi-device support (RAID and LVM) # # CONFIG_MD is not set # # Fusion MPT device support # # CONFIG_FUSION is not set # # IEEE 1394 (FireWire) support # # CONFIG_IEEE1394 is not set # # I2O device support # # CONFIG_I2O is not set # # Networking support # CONFIG_NET=y # # Networking options # CONFIG_PACKET=y CONFIG_PACKET_MMAP=y CONFIG_NETLINK_DEV=y CONFIG_UNIX=y # CONFIG_NET_KEY is not set CONFIG_INET=y # CONFIG_IP_MULTICAST is not set # CONFIG_IP_ADVANCED_ROUTER is not set # CONFIG_IP_PNP is not set # CONFIG_NET_IPIP is not set # CONFIG_NET_IPGRE is not set # CONFIG_ARPD is not set # CONFIG_SYN_COOKIES is not set # CONFIG_INET_AH is not set # CONFIG_INET_ESP is not set # CONFIG_INET_IPCOMP is not set # CONFIG_IPV6 is not set # CONFIG_NETFILTER is not set # # SCTP Configuration (EXPERIMENTAL) # # CONFIG_IP_SCTP is not set # CONFIG_ATM is not set # CONFIG_BRIDGE is not set # CONFIG_VLAN_8021Q is not set # CONFIG_DECNET is not set # CONFIG_LLC2 is not set # CONFIG_IPX is not set # CONFIG_ATALK is not set # CONFIG_X25 is not set # CONFIG_LAPB is not set # CONFIG_NET_DIVERT is not set # CONFIG_ECONET is not set # CONFIG_WAN_ROUTER is not set # CONFIG_NET_FASTROUTE is not set # CONFIG_NET_HW_FLOWCONTROL is not set # # QoS and/or fair queueing # # CONFIG_NET_SCHED is not set # CONFIG_NET_CLS_ROUTE is not set # # Network testing # # CONFIG_NET_PKTGEN is not set # CONFIG_NETPOLL is not set # CONFIG_NET_POLL_CONTROLLER is not set # CONFIG_HAMRADIO is not set # CONFIG_IRDA is not set # CONFIG_BT is not set CONFIG_NETDEVICES=y CONFIG_DUMMY=y # CONFIG_BONDING is not set # CONFIG_EQUALIZER is not set # CONFIG_TUN is not set # CONFIG_ETHERTAP is not set # CONFIG_NET_SB1000 is not set # # ARCnet devices # # CONFIG_ARCNET is not set # # 
Ethernet (10 or 100Mbit) # CONFIG_NET_ETHERNET=y CONFIG_MII=y # CONFIG_HAPPYMEAL is not set # CONFIG_SUNGEM is not set # CONFIG_NET_VENDOR_3COM is not set # CONFIG_LANCE is not set # CONFIG_NET_VENDOR_SMC is not set # CONFIG_NET_VENDOR_RACAL is not set # # Tulip family network device support # # CONFIG_NET_TULIP is not set # CONFIG_AT1700 is not set # CONFIG_DEPCA is not set # CONFIG_HP100 is not set # CONFIG_NET_ISA is not set CONFIG_NET_PCI=y # CONFIG_PCNET32 is not set # CONFIG_AMD8111_ETH is not set # CONFIG_ADAPTEC_STARFIRE is not set # CONFIG_AC3200 is not set # CONFIG_APRICOT is not set # CONFIG_B44 is not set CONFIG_FORCEDETH=m # CONFIG_CS89x0 is not set # CONFIG_DGRS is not set # CONFIG_EEPRO100 is not set # CONFIG_E100 is not set # CONFIG_FEALNX is not set # CONFIG_NATSEMI is not set # CONFIG_NE2K_PCI is not set # CONFIG_8139CP is not set CONFIG_8139TOO=m # CONFIG_8139TOO_PIO is not set # CONFIG_8139TOO_TUNE_TWISTER is not set # CONFIG_8139TOO_8129 is not set # CONFIG_8139_OLD_RX_RESET is not set # CONFIG_SIS900 is not set # CONFIG_EPIC100 is not set # CONFIG_SUNDANCE is not set # CONFIG_TLAN is not set # CONFIG_VIA_RHINE is not set # CONFIG_VIA_VELOCITY is not set # CONFIG_NET_POCKET is not set # # Ethernet (1000 Mbit) # # CONFIG_ACENIC is not set # CONFIG_DL2K is not set # CONFIG_E1000 is not set # CONFIG_NS83820 is not set # CONFIG_HAMACHI is not set # CONFIG_YELLOWFIN is not set CONFIG_R8169=m # CONFIG_SK98LIN is not set # CONFIG_TIGON3 is not set # # Ethernet (10000 Mbit) # # CONFIG_IXGB is not set # CONFIG_S2IO is not set # # Token Ring devices # # CONFIG_TR is not set # # Wireless LAN (non-hamradio) # # CONFIG_NET_RADIO is not set # # Wan interfaces # # CONFIG_WAN is not set # CONFIG_FDDI is not set # CONFIG_HIPPI is not set # CONFIG_PLIP is not set # CONFIG_PPP is not set # CONFIG_SLIP is not set # CONFIG_NET_FC is not set # CONFIG_SHAPER is not set # CONFIG_NETCONSOLE is not set # # ISDN subsystem # # CONFIG_ISDN is not set # # Telephony Support 
# # CONFIG_PHONE is not set # # Input device support # CONFIG_INPUT=y # # Userland interfaces # CONFIG_INPUT_MOUSEDEV=y CONFIG_INPUT_MOUSEDEV_PSAUX=y CONFIG_INPUT_MOUSEDEV_SCREEN_X=1600 CONFIG_INPUT_MOUSEDEV_SCREEN_Y=1200 # CONFIG_INPUT_JOYDEV is not set # CONFIG_INPUT_TSDEV is not set CONFIG_INPUT_EVDEV=y # CONFIG_INPUT_EVBUG is not set # # Input I/O drivers # # CONFIG_GAMEPORT is not set CONFIG_SOUND_GAMEPORT=y CONFIG_SERIO=y CONFIG_SERIO_I8042=y # CONFIG_SERIO_SERPORT is not set # CONFIG_SERIO_CT82C710 is not set # CONFIG_SERIO_PARKBD is not set # CONFIG_SERIO_PCIPS2 is not set # # Input Device Drivers # CONFIG_INPUT_KEYBOARD=y CONFIG_KEYBOARD_ATKBD=y # CONFIG_KEYBOARD_SUNKBD is not set # CONFIG_KEYBOARD_LKKBD is not set # CONFIG_KEYBOARD_XTKBD is not set # CONFIG_KEYBOARD_NEWTON is not set CONFIG_INPUT_MOUSE=y CONFIG_MOUSE_PS2=y # CONFIG_MOUSE_SERIAL is not set # CONFIG_MOUSE_INPORT is not set # CONFIG_MOUSE_LOGIBM is not set # CONFIG_MOUSE_PC110PAD is not set # CONFIG_MOUSE_VSXXXAA is not set # CONFIG_INPUT_JOYSTICK is not set # CONFIG_INPUT_TOUCHSCREEN is not set CONFIG_INPUT_MISC=y CONFIG_INPUT_PCSPKR=y # CONFIG_INPUT_UINPUT is not set # # Character devices # CONFIG_VT=y CONFIG_VT_CONSOLE=y CONFIG_HW_CONSOLE=y # CONFIG_SERIAL_NONSTANDARD is not set # # Serial drivers # CONFIG_SERIAL_8250=y # CONFIG_SERIAL_8250_CONSOLE is not set CONFIG_SERIAL_8250_NR_UARTS=2 CONFIG_SERIAL_8250_EXTENDED=y # CONFIG_SERIAL_8250_MANY_PORTS is not set CONFIG_SERIAL_8250_SHARE_IRQ=y # CONFIG_SERIAL_8250_DETECT_IRQ is not set # CONFIG_SERIAL_8250_MULTIPORT is not set # CONFIG_SERIAL_8250_RSA is not set # # Non-8250 serial port support # CONFIG_SERIAL_CORE=y CONFIG_UNIX98_PTYS=y CONFIG_LEGACY_PTYS=y CONFIG_LEGACY_PTY_COUNT=256 CONFIG_PRINTER=y # CONFIG_LP_CONSOLE is not set # CONFIG_PPDEV is not set # CONFIG_TIPAR is not set # CONFIG_QIC02_TAPE is not set # # IPMI # CONFIG_IPMI_HANDLER=y # CONFIG_IPMI_PANIC_EVENT is not set CONFIG_IPMI_DEVICE_INTERFACE=y # CONFIG_IPMI_SI is not set 
# CONFIG_IPMI_WATCHDOG is not set # # Watchdog Cards # # CONFIG_WATCHDOG is not set CONFIG_HW_RANDOM=y # CONFIG_NVRAM is not set CONFIG_RTC=y # CONFIG_DTLK is not set # CONFIG_R3964 is not set # CONFIG_APPLICOM is not set # CONFIG_SONYPI is not set # # Ftape, the floppy tape device driver # # CONFIG_FTAPE is not set CONFIG_AGP=y # CONFIG_AGP_ALI is not set # CONFIG_AGP_ATI is not set # CONFIG_AGP_AMD is not set # CONFIG_AGP_AMD64 is not set # CONFIG_AGP_INTEL is not set # CONFIG_AGP_INTEL_MCH is not set CONFIG_AGP_NVIDIA=y # CONFIG_AGP_SIS is not set # CONFIG_AGP_SWORKS is not set # CONFIG_AGP_VIA is not set # CONFIG_AGP_EFFICEON is not set CONFIG_DRM=y # CONFIG_DRM_TDFX is not set # CONFIG_DRM_GAMMA is not set # CONFIG_DRM_R128 is not set CONFIG_DRM_RADEON=y # CONFIG_DRM_MGA is not set # CONFIG_DRM_SIS is not set # CONFIG_MWAVE is not set # CONFIG_RAW_DRIVER is not set # CONFIG_HANGCHECK_TIMER is not set # # I2C support # CONFIG_I2C=y CONFIG_I2C_CHARDEV=y # # I2C Algorithms # CONFIG_I2C_ALGOBIT=y # CONFIG_I2C_ALGOPCF is not set # # I2C Hardware Bus support # # CONFIG_I2C_ALI1535 is not set # CONFIG_I2C_ALI1563 is not set # CONFIG_I2C_ALI15X3 is not set # CONFIG_I2C_AMD756 is not set # CONFIG_I2C_AMD8111 is not set # CONFIG_I2C_ELEKTOR is not set # CONFIG_I2C_I801 is not set # CONFIG_I2C_I810 is not set CONFIG_I2C_ISA=y # CONFIG_I2C_NFORCE2 is not set # CONFIG_I2C_PARPORT is not set # CONFIG_I2C_PARPORT_LIGHT is not set # CONFIG_I2C_PIIX4 is not set # CONFIG_I2C_PROSAVAGE is not set # CONFIG_I2C_SAVAGE4 is not set # CONFIG_SCx200_ACB is not set # CONFIG_I2C_SIS5595 is not set # CONFIG_I2C_SIS630 is not set # CONFIG_I2C_SIS96X is not set # CONFIG_I2C_VIA is not set # CONFIG_I2C_VIAPRO is not set # CONFIG_I2C_VOODOO3 is not set # # Hardware Sensors Chip support # CONFIG_I2C_SENSOR=y # CONFIG_SENSORS_ADM1021 is not set # CONFIG_SENSORS_ADM1025 is not set # CONFIG_SENSORS_ADM1031 is not set # CONFIG_SENSORS_ASB100 is not set # CONFIG_SENSORS_DS1621 is not set # 
CONFIG_SENSORS_FSCHER is not set # CONFIG_SENSORS_GL518SM is not set # CONFIG_SENSORS_IT87 is not set # CONFIG_SENSORS_LM75 is not set # CONFIG_SENSORS_LM77 is not set # CONFIG_SENSORS_LM78 is not set # CONFIG_SENSORS_LM80 is not set # CONFIG_SENSORS_LM83 is not set # CONFIG_SENSORS_LM85 is not set # CONFIG_SENSORS_LM90 is not set # CONFIG_SENSORS_MAX1619 is not set # CONFIG_SENSORS_VIA686A is not set CONFIG_SENSORS_W83781D=y # CONFIG_SENSORS_W83L785TS is not set # CONFIG_SENSORS_W83627HF is not set # # Other I2C Chip support # CONFIG_SENSORS_EEPROM=m # CONFIG_SENSORS_PCF8574 is not set # CONFIG_SENSORS_PCF8591 is not set # CONFIG_SENSORS_RTC8564 is not set # CONFIG_I2C_DEBUG_CORE is not set # CONFIG_I2C_DEBUG_ALGO is not set # CONFIG_I2C_DEBUG_BUS is not set # CONFIG_I2C_DEBUG_CHIP is not set # # Dallas's 1-wire bus # # CONFIG_W1 is not set # # Misc devices # # CONFIG_IBM_ASM is not set # # Multimedia devices # CONFIG_VIDEO_DEV=y # # Video For Linux # # # Video Adapters # CONFIG_VIDEO_BT848=m # CONFIG_VIDEO_PMS is not set # CONFIG_VIDEO_BWQCAM is not set # CONFIG_VIDEO_CQCAM is not set # CONFIG_VIDEO_W9966 is not set # CONFIG_VIDEO_CPIA is not set # CONFIG_VIDEO_SAA5246A is not set # CONFIG_VIDEO_SAA5249 is not set # CONFIG_TUNER_3036 is not set # CONFIG_VIDEO_STRADIS is not set # CONFIG_VIDEO_ZORAN is not set # CONFIG_VIDEO_ZR36120 is not set # CONFIG_VIDEO_SAA7134 is not set # CONFIG_VIDEO_MXB is not set # CONFIG_VIDEO_DPC is not set # CONFIG_VIDEO_HEXIUM_ORION is not set # CONFIG_VIDEO_HEXIUM_GEMINI is not set # CONFIG_VIDEO_CX88 is not set # CONFIG_VIDEO_OVCAMCHIP is not set # # Radio Adapters # # CONFIG_RADIO_CADET is not set # CONFIG_RADIO_RTRACK is not set # CONFIG_RADIO_RTRACK2 is not set # CONFIG_RADIO_AZTECH is not set # CONFIG_RADIO_GEMTEK is not set # CONFIG_RADIO_GEMTEK_PCI is not set # CONFIG_RADIO_MAXIRADIO is not set # CONFIG_RADIO_MAESTRO is not set # CONFIG_RADIO_SF16FMI is not set # CONFIG_RADIO_SF16FMR2 is not set # CONFIG_RADIO_TERRATEC is not 
set # CONFIG_RADIO_TRUST is not set # CONFIG_RADIO_TYPHOON is not set # CONFIG_RADIO_ZOLTRIX is not set # # Digital Video Broadcasting Devices # # CONFIG_DVB is not set CONFIG_VIDEO_TUNER=m CONFIG_VIDEO_BUF=m CONFIG_VIDEO_BTCX=m CONFIG_VIDEO_IR=m # # Graphics support # CONFIG_FB=y # CONFIG_FB_CIRRUS is not set # CONFIG_FB_PM2 is not set # CONFIG_FB_CYBER2000 is not set # CONFIG_FB_ASILIANT is not set # CONFIG_FB_IMSTT is not set # CONFIG_FB_VGA16 is not set # CONFIG_FB_VESA is not set CONFIG_VIDEO_SELECT=y # CONFIG_FB_HGA is not set # CONFIG_FB_RIVA is not set # CONFIG_FB_MATROX is not set # CONFIG_FB_RADEON_OLD is not set CONFIG_FB_RADEON=y CONFIG_FB_RADEON_I2C=y # CONFIG_FB_RADEON_DEBUG is not set # CONFIG_FB_ATY128 is not set # CONFIG_FB_ATY is not set # CONFIG_FB_SIS is not set # CONFIG_FB_NEOMAGIC is not set # CONFIG_FB_KYRO is not set # CONFIG_FB_3DFX is not set # CONFIG_FB_VOODOO1 is not set # CONFIG_FB_TRIDENT is not set # CONFIG_FB_PM3 is not set # CONFIG_FB_VIRTUAL is not set # # Console display driver support # CONFIG_VGA_CONSOLE=y # CONFIG_MDA_CONSOLE is not set CONFIG_DUMMY_CONSOLE=y # CONFIG_FRAMEBUFFER_CONSOLE is not set # # Logo configuration # # CONFIG_LOGO is not set # # Sound # CONFIG_SOUND=y # # Advanced Linux Sound Architecture # CONFIG_SND=m CONFIG_SND_TIMER=m CONFIG_SND_PCM=m CONFIG_SND_RAWMIDI=m CONFIG_SND_SEQUENCER=m # CONFIG_SND_SEQ_DUMMY is not set CONFIG_SND_OSSEMUL=y CONFIG_SND_MIXER_OSS=m CONFIG_SND_PCM_OSS=m CONFIG_SND_SEQUENCER_OSS=y CONFIG_SND_RTCTIMER=m # CONFIG_SND_VERBOSE_PRINTK is not set # CONFIG_SND_DEBUG is not set # # Generic devices # CONFIG_SND_MPU401_UART=m CONFIG_SND_DUMMY=m CONFIG_SND_VIRMIDI=m CONFIG_SND_MTPAV=m # CONFIG_SND_SERIAL_U16550 is not set CONFIG_SND_MPU401=m # # ISA devices # # CONFIG_SND_AD1816A is not set # CONFIG_SND_AD1848 is not set # CONFIG_SND_CS4231 is not set # CONFIG_SND_CS4232 is not set # CONFIG_SND_CS4236 is not set # CONFIG_SND_ES968 is not set # CONFIG_SND_ES1688 is not set # CONFIG_SND_ES18XX 
is not set # CONFIG_SND_GUSCLASSIC is not set # CONFIG_SND_GUSEXTREME is not set # CONFIG_SND_GUSMAX is not set # CONFIG_SND_INTERWAVE is not set # CONFIG_SND_INTERWAVE_STB is not set # CONFIG_SND_OPTI92X_AD1848 is not set # CONFIG_SND_OPTI92X_CS4231 is not set # CONFIG_SND_OPTI93X is not set # CONFIG_SND_SB8 is not set # CONFIG_SND_SB16 is not set # CONFIG_SND_SBAWE is not set # CONFIG_SND_WAVEFRONT is not set # CONFIG_SND_ALS100 is not set # CONFIG_SND_AZT2320 is not set # CONFIG_SND_CMI8330 is not set # CONFIG_SND_DT019X is not set # CONFIG_SND_OPL3SA2 is not set # CONFIG_SND_SGALAXY is not set # CONFIG_SND_SSCAPE is not set # # PCI devices # CONFIG_SND_AC97_CODEC=m # CONFIG_SND_ALI5451 is not set # CONFIG_SND_ATIIXP is not set # CONFIG_SND_AU8810 is not set # CONFIG_SND_AU8820 is not set # CONFIG_SND_AU8830 is not set # CONFIG_SND_AZT3328 is not set CONFIG_SND_BT87X=m # CONFIG_SND_CS46XX is not set # CONFIG_SND_CS4281 is not set # CONFIG_SND_EMU10K1 is not set # CONFIG_SND_KORG1212 is not set # CONFIG_SND_MIXART is not set # CONFIG_SND_NM256 is not set # CONFIG_SND_RME32 is not set # CONFIG_SND_RME96 is not set # CONFIG_SND_RME9652 is not set # CONFIG_SND_HDSP is not set # CONFIG_SND_TRIDENT is not set # CONFIG_SND_YMFPCI is not set # CONFIG_SND_ALS4000 is not set # CONFIG_SND_CMIPCI is not set # CONFIG_SND_ENS1370 is not set # CONFIG_SND_ENS1371 is not set # CONFIG_SND_ES1938 is not set # CONFIG_SND_ES1968 is not set # CONFIG_SND_MAESTRO3 is not set # CONFIG_SND_FM801 is not set # CONFIG_SND_ICE1712 is not set # CONFIG_SND_ICE1724 is not set CONFIG_SND_INTEL8X0=m # CONFIG_SND_INTEL8X0M is not set # CONFIG_SND_SONICVIBES is not set # CONFIG_SND_VIA82XX is not set # CONFIG_SND_VX222 is not set # # ALSA USB devices # # CONFIG_SND_USB_AUDIO is not set # # Open Sound System # # CONFIG_SOUND_PRIME is not set # # USB support # CONFIG_USB=y # CONFIG_USB_DEBUG is not set # # Miscellaneous USB options # CONFIG_USB_DEVICEFS=y # CONFIG_USB_BANDWIDTH is not set # 
CONFIG_USB_DYNAMIC_MINORS is not set # # USB Host Controller Drivers # CONFIG_USB_EHCI_HCD=y # CONFIG_USB_EHCI_SPLIT_ISO is not set # CONFIG_USB_EHCI_ROOT_HUB_TT is not set CONFIG_USB_OHCI_HCD=y # CONFIG_USB_UHCI_HCD is not set # # USB Device Class drivers # # CONFIG_USB_AUDIO is not set # CONFIG_USB_BLUETOOTH_TTY is not set # CONFIG_USB_MIDI is not set # CONFIG_USB_ACM is not set CONFIG_USB_PRINTER=y CONFIG_USB_STORAGE=y # CONFIG_USB_STORAGE_DEBUG is not set # CONFIG_USB_STORAGE_RW_DETECT is not set # CONFIG_USB_STORAGE_DATAFAB is not set # CONFIG_USB_STORAGE_FREECOM is not set # CONFIG_USB_STORAGE_ISD200 is not set # CONFIG_USB_STORAGE_DPCM is not set # CONFIG_USB_STORAGE_HP8200e is not set # CONFIG_USB_STORAGE_SDDR09 is not set # CONFIG_USB_STORAGE_SDDR55 is not set # CONFIG_USB_STORAGE_JUMPSHOT is not set # # USB Human Interface Devices (HID) # CONFIG_USB_HID=y CONFIG_USB_HIDINPUT=y # CONFIG_HID_FF is not set CONFIG_USB_HIDDEV=y # CONFIG_USB_AIPTEK is not set # CONFIG_USB_WACOM is not set # CONFIG_USB_KBTAB is not set # CONFIG_USB_POWERMATE is not set # CONFIG_USB_MTOUCH is not set # CONFIG_USB_EGALAX is not set # CONFIG_USB_XPAD is not set # CONFIG_USB_ATI_REMOTE is not set # # USB Imaging devices # # CONFIG_USB_MDC800 is not set # CONFIG_USB_MICROTEK is not set # CONFIG_USB_HPUSBSCSI is not set # # USB Multimedia devices # # CONFIG_USB_DABUSB is not set # CONFIG_USB_VICAM is not set # CONFIG_USB_DSBR is not set # CONFIG_USB_IBMCAM is not set # CONFIG_USB_KONICAWC is not set # CONFIG_USB_OV511 is not set # CONFIG_USB_PWC is not set # CONFIG_USB_SE401 is not set # CONFIG_USB_SN9C102 is not set # CONFIG_USB_STV680 is not set # # USB Network adaptors # # CONFIG_USB_CATC is not set # CONFIG_USB_KAWETH is not set # CONFIG_USB_PEGASUS is not set # CONFIG_USB_RTL8150 is not set # CONFIG_USB_USBNET is not set # # USB port drivers # # CONFIG_USB_USS720 is not set # # USB Serial Converter support # CONFIG_USB_SERIAL=y # CONFIG_USB_SERIAL_CONSOLE is not set # 
CONFIG_USB_SERIAL_GENERIC is not set # CONFIG_USB_SERIAL_BELKIN is not set # CONFIG_USB_SERIAL_WHITEHEAT is not set # CONFIG_USB_SERIAL_DIGI_ACCELEPORT is not set # CONFIG_USB_SERIAL_EMPEG is not set # CONFIG_USB_SERIAL_FTDI_SIO is not set # CONFIG_USB_SERIAL_VISOR is not set # CONFIG_USB_SERIAL_IPAQ is not set # CONFIG_USB_SERIAL_IR is not set # CONFIG_USB_SERIAL_EDGEPORT is not set # CONFIG_USB_SERIAL_EDGEPORT_TI is not set # CONFIG_USB_SERIAL_KEYSPAN_PDA is not set # CONFIG_USB_SERIAL_KEYSPAN is not set # CONFIG_USB_SERIAL_KLSI is not set # CONFIG_USB_SERIAL_KOBIL_SCT is not set # CONFIG_USB_SERIAL_MCT_U232 is not set CONFIG_USB_SERIAL_PL2303=y # CONFIG_USB_SERIAL_SAFE is not set # CONFIG_USB_SERIAL_CYBERJACK is not set # CONFIG_USB_SERIAL_XIRCOM is not set # CONFIG_USB_SERIAL_OMNINET is not set # # USB Miscellaneous drivers # # CONFIG_USB_EMI62 is not set # CONFIG_USB_EMI26 is not set # CONFIG_USB_TIGL is not set # CONFIG_USB_AUERSWALD is not set # CONFIG_USB_RIO500 is not set # CONFIG_USB_LEGOTOWER is not set # CONFIG_USB_LCD is not set # CONFIG_USB_LED is not set # CONFIG_USB_CYTHERM is not set # CONFIG_USB_PHIDGETSERVO is not set # CONFIG_USB_TEST is not set # # USB Gadget Support # # CONFIG_USB_GADGET is not set # # File systems # CONFIG_EXT2_FS=y # CONFIG_EXT2_FS_XATTR is not set CONFIG_EXT3_FS=y # CONFIG_EXT3_FS_XATTR is not set CONFIG_JBD=y # CONFIG_JBD_DEBUG is not set # CONFIG_REISERFS_FS is not set # CONFIG_JFS_FS is not set # CONFIG_XFS_FS is not set # CONFIG_MINIX_FS is not set # CONFIG_ROMFS_FS is not set # CONFIG_QUOTA is not set # CONFIG_AUTOFS_FS is not set CONFIG_AUTOFS4_FS=y # # CD-ROM/DVD Filesystems # CONFIG_ISO9660_FS=y CONFIG_JOLIET=y CONFIG_ZISOFS=y CONFIG_ZISOFS_FS=y CONFIG_UDF_FS=y CONFIG_UDF_NLS=y # # DOS/FAT/NT Filesystems # CONFIG_FAT_FS=y CONFIG_MSDOS_FS=y CONFIG_VFAT_FS=y CONFIG_FAT_DEFAULT_CODEPAGE=437 CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1" # CONFIG_NTFS_FS is not set # # Pseudo filesystems # CONFIG_PROC_FS=y CONFIG_PROC_KCORE=y 
CONFIG_SYSFS=y # CONFIG_DEVFS_FS is not set # CONFIG_DEVPTS_FS_XATTR is not set # CONFIG_TMPFS is not set # CONFIG_HUGETLBFS is not set # CONFIG_HUGETLB_PAGE is not set CONFIG_RAMFS=y # # Miscellaneous filesystems # # CONFIG_ADFS_FS is not set # CONFIG_AFFS_FS is not set # CONFIG_HFS_FS is not set # CONFIG_HFSPLUS_FS is not set # CONFIG_BEFS_FS is not set # CONFIG_BFS_FS is not set # CONFIG_EFS_FS is not set # CONFIG_CRAMFS is not set # CONFIG_VXFS_FS is not set # CONFIG_HPFS_FS is not set # CONFIG_QNX4FS_FS is not set # CONFIG_SYSV_FS is not set # CONFIG_UFS_FS is not set # # Network File Systems # # CONFIG_NFS_FS is not set # CONFIG_NFSD is not set # CONFIG_EXPORTFS is not set CONFIG_SMB_FS=y # CONFIG_SMB_NLS_DEFAULT is not set # CONFIG_CIFS is not set # CONFIG_NCP_FS is not set # CONFIG_CODA_FS is not set # CONFIG_AFS_FS is not set # # Partition Types # # CONFIG_PARTITION_ADVANCED is not set CONFIG_MSDOS_PARTITION=y # # Native Language Support # CONFIG_NLS=y CONFIG_NLS_DEFAULT="iso8859-1" CONFIG_NLS_CODEPAGE_437=y CONFIG_NLS_CODEPAGE_737=m CONFIG_NLS_CODEPAGE_775=m CONFIG_NLS_CODEPAGE_850=m CONFIG_NLS_CODEPAGE_852=m CONFIG_NLS_CODEPAGE_855=m CONFIG_NLS_CODEPAGE_857=m CONFIG_NLS_CODEPAGE_860=m CONFIG_NLS_CODEPAGE_861=m CONFIG_NLS_CODEPAGE_862=m CONFIG_NLS_CODEPAGE_863=m CONFIG_NLS_CODEPAGE_864=m CONFIG_NLS_CODEPAGE_865=m CONFIG_NLS_CODEPAGE_866=m CONFIG_NLS_CODEPAGE_869=m CONFIG_NLS_CODEPAGE_936=m CONFIG_NLS_CODEPAGE_950=m CONFIG_NLS_CODEPAGE_932=m CONFIG_NLS_CODEPAGE_949=m CONFIG_NLS_CODEPAGE_874=m CONFIG_NLS_ISO8859_8=m CONFIG_NLS_CODEPAGE_1250=m CONFIG_NLS_CODEPAGE_1251=m # CONFIG_NLS_ASCII is not set CONFIG_NLS_ISO8859_1=y CONFIG_NLS_ISO8859_2=m CONFIG_NLS_ISO8859_3=m CONFIG_NLS_ISO8859_4=m CONFIG_NLS_ISO8859_5=m CONFIG_NLS_ISO8859_6=m CONFIG_NLS_ISO8859_7=m CONFIG_NLS_ISO8859_9=m CONFIG_NLS_ISO8859_13=m CONFIG_NLS_ISO8859_14=m CONFIG_NLS_ISO8859_15=y CONFIG_NLS_KOI8_R=m CONFIG_NLS_KOI8_U=m CONFIG_NLS_UTF8=y # # Profiling support # # CONFIG_PROFILING is not 
set # # Kernel hacking # # CONFIG_DEBUG_KERNEL is not set CONFIG_EARLY_PRINTK=y # CONFIG_DEBUG_SPINLOCK_SLEEP is not set CONFIG_FRAME_POINTER=y CONFIG_4KSTACKS=y # # Security options # CONFIG_SECURITY=y # CONFIG_SECURITY_NETWORK is not set CONFIG_SECURITY_CAPABILITIES=y # CONFIG_SECURITY_ROOTPLUG is not set # CONFIG_SECURITY_SELINUX is not set # # Cryptographic options # # CONFIG_CRYPTO is not set # # Library routines # CONFIG_CRC_CCITT=y CONFIG_CRC32=y CONFIG_LIBCRC32C=y CONFIG_ZLIB_INFLATE=y CONFIG_X86_BIOS_REBOOT=y CONFIG_PC=y ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-11 4:47 ` Gene Heskett
@ 2004-08-11 4:59 ` Linus Torvalds
  2004-08-11 8:05 ` Roger Luethi
  2004-08-13 4:27 ` Gene Heskett
  0 siblings, 2 replies; 146+ messages in thread
From: Linus Torvalds @ 2004-08-11 4:59 UTC
To: Gene Heskett; +Cc: Kernel Mailing List, Andrew Morton, Al Viro

I wrote:
> Notably, the output of "/proc/meminfo" and "/proc/slabinfo". "ps
> axm" helps too.

That should be "ps axv" of course. Just shows what a retard I am.

		Linus
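The three sources named above ("/proc/meminfo", "/proc/slabinfo" and "ps axv") can be captured in one pass, which helps when the machine may lock up before anyone gets to a shell. The following is only a minimal sketch and not anything posted in the thread: the function name `collect_snapshot` and the `==> ... <==` header layout are made up for illustration, and it assumes a procps-style `ps` that accepts BSD options.

```python
import subprocess
import time

# The sources asked for in this thread. Reading /proc/slabinfo may need root.
DEFAULT_FILES = ("/proc/meminfo", "/proc/slabinfo")
DEFAULT_COMMANDS = (("ps", "axv"),)


def collect_snapshot(files=DEFAULT_FILES, commands=DEFAULT_COMMANDS):
    """Return one text blob holding each file and command output under a header.

    collect_snapshot is a hypothetical helper name, not an existing tool.
    """
    parts = [time.strftime("snapshot taken %Y-%m-%d %H:%M:%S")]
    for path in files:
        parts.append(f"==> {path} <==")
        try:
            with open(path, "r", errors="replace") as fh:
                parts.append(fh.read().rstrip("\n"))
        except OSError as exc:
            # Keep going: a partial snapshot is still useful.
            parts.append(f"(could not read: {exc})")
    for argv in commands:
        parts.append("==> " + " ".join(argv) + " <==")
        try:
            done = subprocess.run(
                list(argv), capture_output=True, text=True, errors="replace"
            )
            parts.append(done.stdout.rstrip("\n"))
        except OSError as exc:
            parts.append(f"(could not run: {exc})")
    return "\n".join(parts) + "\n"
```

Writing the result of `collect_snapshot()` to a file from a periodic job would leave the last pre-crash state on disk; whether the final write survives a hard lockup depends on when the data reaches the disk, so treat it as best effort.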
* Re: Possible dcache BUG
  2004-08-11 4:59 ` Linus Torvalds
@ 2004-08-11 8:05 ` Roger Luethi
  2004-08-13 4:27 ` Gene Heskett
  1 sibling, 0 replies; 146+ messages in thread
From: Roger Luethi @ 2004-08-11 8:05 UTC
To: Linus Torvalds; +Cc: Gene Heskett, Kernel Mailing List, Andrew Morton, Al Viro

On Tue, 10 Aug 2004 21:59:48 -0700, Linus Torvalds wrote:
> That should be "ps axv" of course. Just shows what a retard I am.

Note that some of those columns won't work as advertised. I'd be
particularly suspicious about DRS and TRS.

Roger
* Re: Possible dcache BUG
  2004-08-11 4:59 ` Linus Torvalds
  2004-08-11 8:05 ` Roger Luethi
@ 2004-08-13 4:27 ` Gene Heskett
  2004-08-13 8:32 ` Gene Heskett
  2004-08-14 2:18 ` Marcelo Tosatti
  1 sibling, 2 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-13 4:27 UTC
To: linux-kernel; +Cc: Linus Torvalds, Andrew Morton, Al Viro

On Wednesday 11 August 2004 00:59, Linus Torvalds wrote:
>I wrote:
>> Notably, the output of "/proc/meminfo" and "/proc/slabinfo". "ps
>> axm" helps too.
>
>That should be "ps axv" of course. Just shows what a retard I am.
>
>  Linus

Acck! I just logged an Oops:

Aug 13 00:02:00 coyote kernel: kjournald starting. Commit interval 5 seconds
Aug 13 00:02:00 coyote kernel: EXT3 FS on hdb3, internal journal
Aug 13 00:02:00 coyote kernel: EXT3-fs: mounted filesystem with ordered data mode.
Aug 13 00:05:09 coyote kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000004
Aug 13 00:05:09 coyote kernel: printing eip:
Aug 13 00:05:09 coyote kernel: c014e0dc
Aug 13 00:05:09 coyote kernel: *pde = 00000000
Aug 13 00:05:09 coyote kernel: Oops: 0002 [#1]
Aug 13 00:05:09 coyote kernel: PREEMPT
Aug 13 00:05:09 coyote kernel: Modules linked in: eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd forcedeth sg
Aug 13 00:05:09 coyote kernel: CPU: 0
Aug 13 00:05:09 coyote kernel: EIP: 0060:[<c014e0dc>] Not tainted
Aug 13 00:05:09 coyote kernel: EFLAGS: 00010246 (2.6.8-rc4)
Aug 13 00:05:09 coyote kernel: EIP is at remove_inode_buffers+0x4c/0x90
Aug 13 00:05:09 coyote kernel: eax: 00000000 ebx: d7ff68b4 ecx: d7ffffb4 edx: 00000000
Aug 13 00:05:09 coyote kernel: esi: d7ff67e0 edi: 00000001 ebp: c198bed8 esp: c198bec8
Aug 13 00:05:09 coyote kernel: ds: 007b es: 007b ss: 0068
Aug 13 00:05:09 coyote kernel: Process kswapd0 (pid: 66, threadinfo=c198b000 task=c1978050)
Aug 13 00:05:09 coyote kernel: Stack: d7ff67e0 d7ff67e8 d7ff67e0 0000001e c198bf04 c0165242 d7ff67e0 c198b000
Aug 13 00:05:09 coyote kernel: 00000000 0000001e d7ff6988 ed3be928 00000080 00000000 c198b000 c198bf10
Aug 13 00:05:09 coyote kernel: c016532f 00000080 c198bf44 c013a32c 00000080 000000d0 0002cc1d 013b0a00
Aug 13 00:05:09 coyote kernel: Call Trace:
Aug 13 00:05:09 coyote kernel: [<c010476f>] show_stack+0x7f/0xa0
Aug 13 00:05:09 coyote kernel: [<c0104908>] show_registers+0x158/0x1b0
Aug 13 00:05:09 coyote kernel: [<c0104a89>] die+0x89/0x100
Aug 13 00:05:09 coyote kernel: [<c0111725>] do_page_fault+0x1f5/0x553
Aug 13 00:05:09 coyote kernel: [<c01043d9>] error_code+0x2d/0x38
Aug 13 00:05:09 coyote kernel: [<c0165242>] prune_icache+0x142/0x1f0
Aug 13 00:05:09 coyote kernel: [<c016532f>] shrink_icache_memory+0x3f/0x50
Aug 13 00:05:09 coyote kernel: [<c013a32c>] shrink_slab+0x14c/0x190
Aug 13 00:05:09 coyote kernel: [<c013b639>] balance_pgdat+0x1a9/0x1f0
Aug 13 00:05:09 coyote kernel: [<c013b73f>] kswapd+0xbf/0xd0
Aug 13 00:05:09 coyote kernel: [<c0102471>] kernel_thread_helper+0x5/0x14
Aug 13 00:05:09 coyote kernel: Code: 89 50 04 89 02 89 49 04 89 09 8b 03 39 d8 89 c1 75 e2 b8 00
Aug 13 00:05:09 coyote kernel: <6>note: kswapd0[66] exited with preempt_count 1

The first 3 entries are from a nightly run of rsync, which mounts a normally unmounted partition for the duration of its run.
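The numbers in that oops can be cross-checked by hand, and the sketch below does so under stated assumptions. It assumes the "Code:" bytes start at the faulting EIP (there is no marker in the log to confirm this), and it decodes only the simple `89 /r` store forms that appear in the first ten bytes. The helper names are invented for this illustration and it is in no way a general disassembler.

```python
REG32 = ("eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi")


def decode_fault_code(code):
    """Decode the low three bits of an i386 page-fault error code."""
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }


def decode_mov_store(insn):
    """Decode `89 /r` (store a 32-bit register to memory), simple forms only.

    Handles ModRM mod=00 (no displacement) and mod=01 (8-bit displacement);
    forms needing a SIB byte or a 32-bit displacement are rejected.
    Returns (AT&T-style text, instruction length in bytes).
    """
    if len(insn) < 2 or insn[0] != 0x89:
        raise ValueError("not an 89 /r store")
    modrm = insn[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if rm == 4 or (mod == 0 and rm == 5) or mod not in (0, 1):
        raise ValueError("addressing form not handled by this sketch")
    src, base = REG32[reg], REG32[rm]
    if mod == 0:
        return f"mov %{src},(%{base})", 2
    if len(insn) < 3:
        raise ValueError("truncated instruction")
    disp = int.from_bytes(insn[2:3], "little", signed=True)
    return f"mov %{src},{disp:#x}(%{base})", 3


def decode_stores(code_bytes):
    """Decode consecutive stores until something unhandled is reached."""
    out, pos = [], 0
    while pos < len(code_bytes):
        try:
            text, size = decode_mov_store(code_bytes[pos:])
        except ValueError:
            break
        out.append(text)
        pos += size
    return out


# "Oops: 0002" and the first ten bytes of the "Code:" line from the log above.
FAULT = decode_fault_code(0x0002)
STORES = decode_stores(bytes.fromhex("89 50 04 89 02 89 49 04 89 09"))
```

Under those assumptions the error code reads as a kernel-mode write to a non-present page, and the first store is `mov %edx,0x4(%eax)`. With eax: 00000000 in the register dump, that store targets address 0x4, which matches the reported "virtual address 00000004". The four stores together resemble a list unlink followed by re-initialising the removed entry, but that reading is a guess from the byte pattern and not something the log states.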
Now let's see if I can get meminfo and slabinfo.

meminfo:
[root@coyote themes]# cat /proc/meminfo
MemTotal: 1035844 kB
MemFree: 152884 kB
Buffers: 4896 kB
Cached: 41276 kB
SwapCached: 36740 kB
Active: 131792 kB
Inactive: 11792 kB
HighTotal: 131008 kB
HighFree: 52640 kB
LowTotal: 904836 kB
LowFree: 100244 kB
SwapTotal: 3857104 kB
SwapFree: 3731720 kB
Dirty: 16 kB
Writeback: 0 kB
Mapped: 121068 kB
Slab: 728876 kB
Committed_AS: 348500 kB
PageTables: 3480 kB
VmallocTotal: 114680 kB
VmallocUsed: 19644 kB
VmallocChunk: 94932 kB

slabinfo:
slabinfo - version: 2.0
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
unix_sock 171 190 384 10 1 : tunables 54 27 0 : slabdata 19 19 0
tcp_tw_bucket 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0
tcp_bind_bucket 19 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0
tcp_open_request 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0
inet_peer_cache 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0
ip_fib_hash 10 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0
ip_dst_cache 17 30 256 15 1 : tunables 120 60 0 : slabdata 2 2 0
arp_cache 4 31 128 31 1 : tunables 120 60 0 : slabdata 1 1 0
raw4_sock 0 0 480 8 1 : tunables 54 27 0 : slabdata 0 0 0
udp_sock 2 8 480 8 1 : tunables 54 27 0 : slabdata 1 1 0
tcp_sock 31 32 1024 4 1 : tunables 54 27 0 : slabdata 8 8 0
flow_cache 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0
mqueue_inode_cache 1 8 480 8 1 : tunables 54 27 0 : slabdata 1 1 0
udf_inode_cache 0 0 352 11 1 : tunables 54 27 0 : slabdata 0 0 0
smb_request 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
smb_inode_cache 2 12 320 12 1 : tunables 54 27 0 : slabdata 1 1 0
isofs_inode_cache 0 0 320 12 1 : tunables 54 27 0 : slabdata 0 0 0
fat_inode_cache 1 11 352 11 1 : tunables 54 27 0 : slabdata 1 1 0
ext2_inode_cache 0 0 416 9 1 : tunables 54 27 0 : slabdata 0 0 0
journal_handle 8 135 28 135 1 : tunables 120 60 0 :
slabdata 1 1 0 journal_head 889 2025 48 81 1 : tunables 120 60 0 : slabdata 25 25 0 revoke_table 14 290 12 290 1 : tunables 120 60 0 : slabdata 1 1 0 revoke_record 0 0 16 226 1 : tunables 120 60 0 : slabdata 0 0 0 ext3_inode_cache 1373751 1373751 416 9 1 : tunables 54 27 0 : slabdata 152639 152639 0 eventpoll_pwq 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0 eventpoll_epi 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 kioctx 0 0 160 25 1 : tunables 120 60 0 : slabdata 0 0 0 kiocb 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 dnotify_cache 172 370 20 185 1 : tunables 120 60 0 : slabdata 2 2 0 file_lock_cache 43 43 92 43 1 : tunables 120 60 0 : slabdata 1 1 0 fasync_cache 2 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0 shmem_inode_cache 5 10 384 10 1 : tunables 54 27 0 : slabdata 1 1 0 posix_timers_cache 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 uid_cache 7 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 sgpool-128 32 32 2048 2 1 : tunables 24 12 0 : slabdata 16 16 0 sgpool-64 32 32 1024 4 1 : tunables 54 27 0 : slabdata 8 8 0 sgpool-32 32 32 512 8 1 : tunables 54 27 0 : slabdata 4 4 0 sgpool-16 32 45 256 15 1 : tunables 120 60 0 : slabdata 3 3 0 sgpool-8 32 62 128 31 1 : tunables 120 60 0 : slabdata 2 2 0 cfq_pool 72 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 crq_pool 35 107 36 107 1 : tunables 120 60 0 : slabdata 1 1 0 deadline_drq 0 0 48 81 1 : tunables 120 60 0 : slabdata 0 0 0 as_arq 0 0 60 65 1 : tunables 120 60 0 : slabdata 0 0 0 blkdev_ioc 91 185 20 185 1 : tunables 120 60 0 : slabdata 1 1 0 blkdev_queue 12 18 448 9 1 : tunables 54 27 0 : slabdata 2 2 0 blkdev_requests 23 78 152 26 1 : tunables 120 60 0 : slabdata 3 3 0 biovec-(256) 256 256 3072 2 2 : tunables 24 12 0 : slabdata 128 128 0 biovec-128 256 260 1536 5 2 : tunables 24 12 0 : slabdata 52 52 0 biovec-64 256 260 768 5 1 : tunables 54 27 0 : slabdata 52 52 0 biovec-16 256 260 192 20 1 : tunables 120 60 0 : slabdata 13 13 0 biovec-4 257 305 64 61 1 : tunables 120 60 0 : 
slabdata 5 5 0 biovec-1 294 452 16 226 1 : tunables 120 60 0 : slabdata 2 2 0 bio 294 366 64 61 1 : tunables 120 60 0 : slabdata 6 6 0 sock_inode_cache 208 231 352 11 1 : tunables 54 27 0 : slabdata 21 21 0 skbuff_head_cache 225 625 160 25 1 : tunables 120 60 0 : slabdata 25 25 0 sock 3 12 320 12 1 : tunables 54 27 0 : slabdata 1 1 0 proc_inode_cache 353 504 320 12 1 : tunables 54 27 0 : slabdata 42 42 0 sigqueue 84 108 148 27 1 : tunables 120 60 0 : slabdata 4 4 0 radix_tree_node 1590 4046 276 14 1 : tunables 54 27 0 : slabdata 289 289 0 bdev_cache 12 18 416 9 1 : tunables 54 27 0 : slabdata 2 2 0 mnt_cache 26 41 96 41 1 : tunables 120 60 0 : slabdata 1 1 0 inode_cache 2179 2198 288 14 1 : tunables 54 27 0 : slabdata 157 157 0 dentry_cache 564066 764232 140 28 1 : tunables 120 60 0 : slabdata 27294 27294 0 filp 2045 2225 160 25 1 : tunables 120 60 0 : slabdata 89 89 0 names_cache 19 19 4096 1 1 : tunables 24 12 0 : slabdata 19 19 0 idr_layer_cache 80 87 136 29 1 : tunables 120 60 0 : slabdata 3 3 0 buffer_head 1467 9477 48 81 1 : tunables 120 60 0 : slabdata 117 117 0 mm_struct 98 98 512 7 1 : tunables 54 27 0 : slabdata 14 14 0 vm_area_struct 7667 8272 84 47 1 : tunables 120 60 0 : slabdata 176 176 0 fs_cache 102 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 files_cache 99 99 416 9 1 : tunables 54 27 0 : slabdata 11 11 0 signal_cache 122 123 96 41 1 : tunables 120 60 0 : slabdata 3 3 0 sighand_cache 105 105 1312 3 1 : tunables 24 12 0 : slabdata 35 35 0 task_struct 115 115 1424 5 2 : tunables 24 12 0 : slabdata 23 23 0 anon_vma 1839 2035 8 407 1 : tunables 120 60 0 : slabdata 5 5 0 pgd 89 89 4096 1 1 : tunables 24 12 0 : slabdata 89 89 0 size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0 size-131072 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0 size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0 size-65536 1 1 65536 1 16 : tunables 8 4 0 : slabdata 1 1 0 size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0 
size-32768 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0 size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0 size-16384 8 8 16384 1 4 : tunables 8 4 0 : slabdata 8 8 0 size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0 size-8192 10 10 8192 1 2 : tunables 8 4 0 : slabdata 10 10 0 size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 0 : slabdata 0 0 0 size-4096 190 190 4096 1 1 : tunables 24 12 0 : slabdata 190 190 0 size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 0 : slabdata 0 0 0 size-2048 170 192 2048 2 1 : tunables 24 12 0 : slabdata 96 96 0 size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 0 : slabdata 0 0 0 size-1024 132 132 1024 4 1 : tunables 54 27 0 : slabdata 33 33 0 size-512(DMA) 0 0 512 8 1 : tunables 54 27 0 : slabdata 0 0 0 size-512 184 448 512 8 1 : tunables 54 27 0 : slabdata 56 56 0 size-256(DMA) 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 size-256 180 435 256 15 1 : tunables 120 60 0 : slabdata 29 29 0 size-192(DMA) 0 0 192 20 1 : tunables 120 60 0 : slabdata 0 0 0 size-192 100 100 192 20 1 : tunables 120 60 0 : slabdata 5 5 0 size-128(DMA) 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 size-128 1176 1271 128 31 1 : tunables 120 60 0 : slabdata 41 41 0 size-64(DMA) 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0 size-64 1290 2440 64 61 1 : tunables 120 60 0 : slabdata 40 40 0 size-32(DMA) 0 0 32 119 1 : tunables 120 60 0 : slabdata 0 0 0 size-32 1368 1428 32 119 1 : tunables 120 60 0 : slabdata 12 12 0 kmem_cache 124 124 128 31 1 : tunables 120 60 0 : slabdata 4 4 0 dmesg >/foo: Linux version 2.6.8-rc4 (root@coyote.coyote.den) (gcc version 3.3.3 20040412 (Red Hat Linux 3.3.3-7)) #3 Wed Aug 11 04:58:21 EDT 2004 BIOS-provided physical RAM map: BIOS-e820: 0000000000000000 - 00000000000a0000 (usable) BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved) BIOS-e820: 0000000000100000 - 000000003fff0000 (usable) BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS) BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data) 
BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved) BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved) BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved) 127MB HIGHMEM available. 896MB LOWMEM available. On node 0 totalpages: 262128 DMA zone: 4096 pages, LIFO batch:1 Normal zone: 225280 pages, LIFO batch:16 HighMem zone: 32752 pages, LIFO batch:7 DMI 2.2 present. ACPI: RSDP (v000 Nvidia ) @ 0x000f7220 ACPI: RSDT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x3fff3000 ACPI: FADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x3fff3040 ACPI: MADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x3fff7dc0 ACPI: DSDT (v001 NVIDIA AWRDACPI 0x00001000 MSFT 0x0100000e) @ 0x00000000 Built 1 zonelists Kernel command line: ro root=/dev/hda7 elevator=cfq Initializing CPU#0 CPU 0 irqstacks, hard=c0408000 soft=c0407000 PID hash table entries: 4096 (order 12: 32768 bytes) Detected 2088.428 MHz processor. Using tsc for high-res timesource Console: colour VGA+ 80x25 Dentry cache hash table entries: 131072 (order: 7, 524288 bytes) Inode-cache hash table entries: 65536 (order: 6, 262144 bytes) Memory: 1035184k/1048512k available (2080k kernel code, 12424k reserved, 863k data, 140k init, 131008k highmem) Checking if this processor honours the WP bit even in supervisor mode... Ok. Calibrating delay loop... 4128.76 BogoMIPS Security Scaffold v1.0.0 initialized Capability LSM initialized Mount-cache hash table entries: 512 (order: 0, 4096 bytes) CPU: After generic identify, caps: 0383fbff c1c3fbff 00000000 00000000 CPU: After vendor identify, caps: 0383fbff c1c3fbff 00000000 00000000 CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line) CPU: L2 Cache: 512K (64 bytes/line) CPU: After all inits, caps: 0383fbff c1c3fbff 00000000 00000020 CPU: AMD Athlon(tm) XP 2800+ stepping 00 Enabling fast FPU save and restore... done. Enabling unmasked SIMD FPU exception support... done. Checking 'hlt' instruction... OK. 
NET: Registered protocol family 16 PCI: PCI BIOS revision 2.10 entry at 0xfb4c0, last bus=2 PCI: Using configuration type 1 mtrr: v2.0 (20020519) Linux Plug and Play Support v0.97 (c) Adam Belay pnp: the driver 'system' has been registered PnPBIOS: Scanning system for PnP BIOS support... PnPBIOS: Found PnP BIOS installation structure at 0xc00fbf30 PnPBIOS: PnP BIOS version 1.0, entry 0xf0000:0xbf60, dseg 0xf0000 pnp: match found with the PnP device '00:07' and the driver 'system' pnp: match found with the PnP device '00:08' and the driver 'system' PnPBIOS: 16 nodes reported by PnP BIOS; 16 recorded by driver SCSI subsystem initialized usbcore: registered new driver usbfs usbcore: registered new driver hub PCI: Probing PCI hardware PCI: Probing PCI hardware (bus 00) PCI: nForce2 C1 Halt Disconnect fixup PCI: Using IRQ router default [10de/01e0] at 0000:00:00.0 radeonfb: Found Intel x86 BIOS ROM Image radeonfb: Retreived PLL infos from BIOS radeonfb: Reference=27.00 MHz (RefDiv=12) Memory=200.00 Mhz, System=166.00 MHz radeonfb: Monitor 1 type CRT found radeonfb: Monitor 2 type no found radeonfb: ATI Radeon Yd DDR SGRAM 128 MB apm: BIOS version 1.2 Flags 0x07 (Driver version 1.16ac) highmem bounce pool size: 64 pages udf: registering filesystem isapnp: Scanning for PnP cards... 
isapnp: No Plug & Play device found lp: driver loaded but no devices found Real Time Clock Driver v1.12 Linux agpgart interface v0.100 (c) Dave Jones agpgart: Detected NVIDIA nForce2 chipset agpgart: Maximum main memory to use for agp memory: 941M agpgart: AGP aperture is 128M @ 0xc0000000 [drm] Initialized radeon 1.11.0 20020828 on minor 0: ATI Technologies Inc RV280 [Radeon 9200 SE] ipmi message handler version v32 ipmi device interface version v32 Serial: 8250/16550 driver $Revision: 1.90 $ 6 ports, IRQ sharing enabled ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A pnp: the driver 'serial' has been registered pnp: match found with the PnP device '00:0b' and the driver 'serial' pnp: match found with the PnP device '00:0f' and the driver 'serial' pnp: the driver 'parport_pc' has been registered pnp: match found with the PnP device '00:0d' and the driver 'parport_pc' parport: PnPBIOS parport detected. parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE] lp0: using parport0 (interrupt-driven). Using cfq io scheduler Floppy drive(s): fd0 is 1.44M, fd1 is 360K PC FDC 0 is a post-1991 82077 loop: loaded (max 8 devices) Linux video capture interface: v1.00 Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx NFORCE2: IDE controller at PCI slot 0000:00:09.0 NFORCE2: chipset revision 162 NFORCE2: not 100% native mode: will probe irqs later NFORCE2: BIOS didn't set cable bits correctly. Enabling workaround. 
NFORCE2: 0000:00:09.0 (rev a2) UDMA133 controller ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:DMA hda: Maxtor 6Y120P0, ATA DISK drive hdb: Maxtor 54610H6, ATA DISK drive ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 hdc: LITE-ON DVDRW LDW-451S, ATAPI CD/DVD-ROM drive ide1 at 0x170-0x177,0x376 on irq 15 hda: max request size: 128KiB hda: 240121728 sectors (122942 MB) w/7936KiB Cache, CHS=65535/16/63, UDMA(133) hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 hda8 > hdb: max request size: 128KiB hdb: 90045648 sectors (46103 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(100) hdb: hdb1 hdb2 hdb3 hdb4 hdc: ATAPI 40X DVD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33) Uniform CD-ROM driver Revision: 3.20 ehci_hcd 0000:00:02.2: nVidia Corporation nForce2 USB Controller PCI: Setting latency timer of device 0000:00:02.2 to 64 ehci_hcd 0000:00:02.2: irq 5, pci mem f985b000 ehci_hcd 0000:00:02.2: new USB bus registered, assigned bus number 1 PCI: cache line size of 64 is not supported by device 0000:00:02.2 ehci_hcd 0000:00:02.2: USB 2.0 enabled, EHCI 1.00, driver 2004-May-10 hub 1-0:1.0: USB hub found hub 1-0:1.0: 6 ports detected ohci_hcd: 2004 Feb 02 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI) ohci_hcd: block sizes: ed 64 td 64 ohci_hcd 0000:00:02.0: nVidia Corporation nForce2 USB Controller PCI: Setting latency timer of device 0000:00:02.0 to 64 ohci_hcd 0000:00:02.0: irq 12, pci mem f985d000 ohci_hcd 0000:00:02.0: new USB bus registered, assigned bus number 2 hub 2-0:1.0: USB hub found hub 2-0:1.0: 3 ports detected ohci_hcd 0000:00:02.1: nVidia Corporation nForce2 USB Controller (#2) PCI: Setting latency timer of device 0000:00:02.1 to 64 ohci_hcd 0000:00:02.1: irq 11, pci mem f985f000 ohci_hcd 0000:00:02.1: new USB bus registered, assigned bus number 3 hub 3-0:1.0: USB hub found hub 3-0:1.0: 3 ports detected usbcore: registered new driver usblp drivers/usb/class/usblp.c: v0.13: USB Printer Device Class driver 
Initializing USB Mass Storage driver... usbcore: registered new driver usb-storage USB Mass Storage support registered. usbcore: registered new driver hiddev usbcore: registered new driver usbhid drivers/usb/input/hid-core.c: v2.0:USB HID core driver usbcore: registered new driver usbserial drivers/usb/serial/usb-serial.c: USB Serial Driver core v2.0 drivers/usb/serial/usb-serial.c: USB Serial support registered for PL-2303 usbcore: registered new driver pl2303 drivers/usb/serial/pl2303.c: Prolific PL2303 USB to serial adaptor driver v0.11 mice: PS/2 mouse device common for all mice input: PC Speaker serio: i8042 AUX port at 0x60,0x64 irq 12 serio: i8042 KBD port at 0x60,0x64 irq 1 input: AT Translated Set 2 keyboard on isa0060/serio0 i2c /dev entries driver NET: Registered protocol family 2 IP: routing cache hash table of 8192 buckets, 64Kbytes TCP: Hash tables configured (established 262144 bind 65536) NET: Registered protocol family 1 NET: Registered protocol family 17 kjournald starting. Commit interval 5 seconds EXT3-fs: mounted filesystem with ordered data mode. VFS: Mounted root (ext3 filesystem) readonly. Freeing unused kernel memory: 140k freed ohci_hcd 0000:00:02.0: wakeup ohci_hcd 0000:00:02.1: wakeup usb 2-1: new full speed USB device using address 2 usb 2-2: new low speed USB device using address 3 input: USB HID v1.10 Mouse [Logitech USB-PS/2 Optical Mouse] on usb-0000:00:02.0-2 usb 3-1: new full speed USB device using address 2 drivers/usb/class/usblp.c: usblp0: USB Bidirectional printer dev 2 if 0 alt 0 proto 2 vid 0x04B8 pid 0x0005 usb 3-2: new full speed USB device using address 3 hub 3-2:1.0: USB hub found hub 3-2:1.0: 4 ports detected usb 3-3: new full speed USB device using address 4 drivers/usb/class/usblp.c: usblp0: USB Bidirectional printer dev 4 if 0 alt 0 proto 2 vid 0x04B8 pid 0x0005 usb 3-2.3: new full speed USB device using address 5 EXT3 FS on hda7, internal journal Adding 3857104k swap on /dev/hdb4. 
Priority:-1 extents:1 kjournald starting. Commit interval 5 seconds EXT3 FS on hda1, internal journal EXT3-fs: mounted filesystem with ordered data mode. kjournald starting. Commit interval 5 seconds EXT3 FS on hda3, internal journal EXT3-fs: mounted filesystem with ordered data mode. kjournald starting. Commit interval 5 seconds EXT3 FS on hda8, internal journal EXT3-fs: mounted filesystem with ordered data mode. kjournald starting. Commit interval 5 seconds EXT3 FS on hda5, internal journal EXT3-fs: mounted filesystem with ordered data mode. kjournald starting. Commit interval 5 seconds EXT3 FS on hda6, internal journal EXT3-fs: mounted filesystem with ordered data mode. 8139too Fast Ethernet driver 0.9.27 eth0: RealTek RTL8139 at 0xf9a39000, 00:50:ba:5d:eb:7d, IRQ 12 eth0: Identified 8139 chip type 'RTL-8139C' forcedeth: Unknown parameter `mem' forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.28. PCI: Setting latency timer of device 0000:00:04.0 to 64 eth1: forcedeth.c: subsystem: 01565:2301 bound to 0000:00:04.0 forcedeth: Unknown parameter `mem' forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.28. PCI: Setting latency timer of device 0000:00:04.0 to 64 eth0: forcedeth.c: subsystem: 01565:2301 bound to 0000:00:04.0 eth0: no link during initialization. eth0: link up. eth0: Promiscuous mode enabled. device eth0 entered promiscuous mode PCI: Setting latency timer of device 0000:00:06.0 to 64 intel8x0_measure_ac97_clock: measured 49436 usecs intel8x0: clocking to 47459 kjournald starting. Commit interval 5 seconds EXT3 FS on hdb3, internal journal EXT3-fs: mounted filesystem with ordered data mode. kjournald starting. Commit interval 5 seconds EXT3 FS on hdb3, internal journal EXT3-fs: mounted filesystem with ordered data mode. 
Unable to handle kernel NULL pointer dereference at virtual address 00000004 printing eip: c014e0dc *pde = 00000000 Oops: 0002 [#1] PREEMPT Modules linked in: eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd forcedeth sg CPU: 0 EIP: 0060:[<c014e0dc>] Not tainted EFLAGS: 00010246 (2.6.8-rc4) EIP is at remove_inode_buffers+0x4c/0x90 eax: 00000000 ebx: d7ff68b4 ecx: d7ffffb4 edx: 00000000 esi: d7ff67e0 edi: 00000001 ebp: c198bed8 esp: c198bec8 ds: 007b es: 007b ss: 0068 Process kswapd0 (pid: 66, threadinfo=c198b000 task=c1978050) Stack: d7ff67e0 d7ff67e8 d7ff67e0 0000001e c198bf04 c0165242 d7ff67e0 c198b000 00000000 0000001e d7ff6988 ed3be928 00000080 00000000 c198b000 c198bf10 c016532f 00000080 c198bf44 c013a32c 00000080 000000d0 0002cc1d 013b0a00 Call Trace: [<c010476f>] show_stack+0x7f/0xa0 [<c0104908>] show_registers+0x158/0x1b0 [<c0104a89>] die+0x89/0x100 [<c0111725>] do_page_fault+0x1f5/0x553 [<c01043d9>] error_code+0x2d/0x38 [<c0165242>] prune_icache+0x142/0x1f0 [<c016532f>] shrink_icache_memory+0x3f/0x50 [<c013a32c>] shrink_slab+0x14c/0x190 [<c013b639>] balance_pgdat+0x1a9/0x1f0 [<c013b73f>] kswapd+0xbf/0xd0 [<c0102471>] kernel_thread_helper+0x5/0x14 Code: 89 50 04 89 02 89 49 04 89 09 8b 03 39 d8 89 c1 75 e2 b8 00 <6>note: kswapd0[66] exited with preempt_count 1 And a 'ps axv': PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND 1 ? S 0:01 13 31 1440 288 0.0 init [3] 2 ? SWN 0:00 0 0 0 0 0.0 [ksoftirqd/0] 3 ? SW< 0:00 0 0 0 0 0.0 [events/0] 4 ? SW< 0:00 0 0 0 0 0.0 [khelper] 21 ? SW< 0:00 0 0 0 0 0.0 [kblockd/0] 22 ? SW 0:00 0 0 0 0 0.0 [khubd] 62 ? SW 0:00 0 0 0 0 0.0 [kapmd] 64 ? SW 0:00 0 0 0 0 0.0 [pdflush] 65 ? SW 0:00 0 0 0 0 0.0 [pdflush] 67 ? SW< 0:00 0 0 0 0 0.0 [aio/0] 194 ? SW 0:00 0 0 0 0 0.0 [kseriod] 231 ? SW 0:01 0 0 0 0 0.0 [kjournald] 1419 ? SW 0:00 0 0 0 0 0.0 [kjournald] 1420 ? 
SW 0:00 0 0 0 0 0.0 [kjournald] 1421 ? SW 0:00 0 0 0 0 0.0 [kjournald] 1422 ? SW 0:01 0 0 0 0 0.0 [kjournald] 1423 ? SW 0:00 0 0 0 0 0.0 [kjournald] 1874 ? S 0:00 5 27 1432 308 0.0 syslogd -m 0 1878 ? S 0:00 7 20 1383 316 0.0 klogd -x 1889 ? S 0:00 4 27 1508 272 0.0 portmap 1955 ? S 0:00 9 149 2918 376 0.0 arpwatch -u pcap -e root -s root (Arpwatch) 1964 ? S 0:00 34 237 8886 920 0.0 cupsd 2092 ? S 0:00 0 262 3385 440 0.0 /usr/sbin/sshd 2107 ? S 0:00 1 143 1872 356 0.0 xinetd -stayalive -pidfile /var/run/xinetd.pid 2129 ? S 0:00 1 690 6357 1948 0.1 sendmail: accepting connections 2140 ? S 0:00 0 690 5433 1544 0.1 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue 2150 ? S 0:00 0 78 1521 324 0.0 gpm -m /dev/input/mice -t imps2 2162 ? S 0:00 34 250 22985 5564 0.5 /usr/sbin/httpd 2215 ? S 0:00 0 133 1758 264 0.0 /usr/sbin/cannaserver -syslog -u bin 2227 ? S 0:00 0 23 1488 332 0.0 crond 2259 ? S 0:00 1 250 22985 5452 0.5 /usr/sbin/httpd 2260 ? S 0:00 1 250 22985 5452 0.5 /usr/sbin/httpd 2261 ? S 0:00 1 250 22985 5452 0.5 /usr/sbin/httpd 2262 ? S 0:00 1 250 22985 5452 0.5 /usr/sbin/httpd 2263 ? S 0:00 1 250 22985 5452 0.5 /usr/sbin/httpd 2264 ? S 0:00 1 250 22985 5452 0.5 /usr/sbin/httpd 2265 ? S 0:00 1 250 22985 5452 0.5 /usr/sbin/httpd 2266 ? S 0:00 1 250 22985 5452 0.5 /usr/sbin/httpd 2298 ? S 0:02 87 72 7675 2936 0.2 xfs -droppriv -daemon 2307 ? S 0:00 4 2727 7584 1584 0.1 smbd -D 2311 ? S 0:00 3 868 7183 1044 0.1 nmbd -D 2347 ? S 0:00 0 15 1484 320 0.0 /usr/sbin/atd 2364 ? S 0:00 0 230 1629 508 0.0 dbus-daemon-1 --system 2377 ttyS0 S 2:10 5 153 1810 756 0.0 /usr/local/bulldog/upsd 2482 ? S 0:00 0 554 3797 696 0.0 /bin/sh /root/bin/setibatch -run 2569 ? 
S 0:00 4 17 2594 424 0.0 login -- root 2570 tty2 S 0:00 0 8 1375 236 0.0 /sbin/mingetty tty2 2571 tty3 S 0:00 0 8 1375 236 0.0 /sbin/mingetty tty3 2572 tty4 S 0:00 0 8 1375 236 0.0 /sbin/mingetty tty4 2573 tty5 S 0:00 0 8 1375 236 0.0 /sbin/mingetty tty5 2574 tty6 S 0:00 0 8 1375 236 0.0 /sbin/mingetty tty6 2725 tty1 S 0:00 1 554 3801 372 0.0 -bash 2757 tty1 S 0:00 1 554 3741 276 0.0 /bin/sh /usr/X11R6/bin/startx 2768 tty1 S 0:00 4 8 2195 284 0.0 xinit /etc/X11/xinit/xinitrc -- 2769 ? S< 36:22 975 1498 186685 26996 2.6 X :0 2783 tty1 S 0:00 2 554 3745 296 0.0 /bin/sh /root/kde3.3-beta2/bin/startkde 2802 ? S 0:00 0 55 3252 280 0.0 ssh-agent /etc/X11/xinit/Xclients 2844 ? S 0:00 14 34 24313 2724 0.2 kdeinit: Running... 2847 ? S 0:00 18 34 23533 2280 0.2 kdeinit: dcopserver --nosid 2849 ? S 0:00 47 34 27793 3504 0.3 kdeinit: klauncher 2852 ? S 0:11 69 34 28977 4812 0.4 kdeinit: kded 2853 ? S 0:18 62 135 4160 2316 0.2 fam 2861 ? S 0:04 28 34 27033 4244 0.4 kdeinit: kxkb 2869 ? S 1:16 159 154 20441 6680 0.6 artsd -F 11 -S 4096 -a alsa -s 15 -m artsmessage -c drkonqi -l 3 -f 2871 ? S 0:04 33 34 33453 6632 0.6 kdeinit: knotify 2873 tty1 S 0:00 7 1 20546 4508 0.4 ksmserver 2874 ? S 0:13 58 34 36565 5776 0.5 kdeinit: kwin -session 11c0a80103000104545060000000016110000_1092106762_766963 2877 ? S 0:03 12 34 25225 4160 0.4 kdeinit: khotkeys 2878 ? S 0:23 99 34 28225 6604 0.6 kdeinit: kdesktop 2885 ? S 0:47 152 34 45721 8400 0.8 kdeinit: kicker 2888 ? S 0:03 53 34 42293 5768 0.5 kdeinit: klipper 2893 ? S 0:15 43 61 28138 6068 0.5 korgac --miniicon korganizer 2894 ? S 0:05 54 830 28705 6176 0.5 kgpg -session 11c0a84703000109133389800000023980007_1092106762_470713 2897 ? S 0:05 44 240 28987 5196 0.5 knotes -session 11c0a84703000107107498500000013490017_1092106762_471216 2899 ? S 0:11 33 34 28829 5200 0.5 kdeinit: kmix -session 11c0a84703000109118722300000023870010_1092106762_471781 2900 ? 
S 0:04 95 34 29497 5996 0.5 kdeinit: konsole -session 11c0a84703000109146439400000023670008_1092106762_472128 -name Qt-subapplication 2904 pts/1 S 0:00 0 554 3801 368 0.0 /bin/bash 2908 ? S 1:21 144 590 34129 4292 0.4 /usr/bin/gkrellm --sm-client-id 11c0a84703000109197323200000021320011 2922 ? S 7:24 1138 11 90820 39460 3.8 kmail -session 11c0a84703000109193895800000021320008_1092106762_468128 2924 ? S 0:46 15 1060 4947 1964 0.1 /usr/local/bulldog/monitor 2927 ? S 0:00 2 96 25219 2576 0.2 kalarmd --login 2946 pts/1 S 0:00 0 33 3934 264 0.0 tail -f /var/log/messages 2948 ? S 3:25 34 34 29369 5664 0.5 kdeinit: konsole 2949 pts/2 S 0:00 1 554 3801 368 0.0 /bin/bash 2967 pts/2 S 4:47 1 47 1776 668 0.0 top 2976 ? S 0:14 88 34 34297 6204 0.5 kdeinit: konsole 2977 pts/3 S 0:00 8 554 3805 880 0.0 /bin/bash 14824 ? S 0:04 56 34 28885 5824 0.5 kdeinit: kio_uiserver 32416 ? S 0:00 0 141 1566 748 0.0 /usr/sbin/smartd 14706 ? S 0:00 0 37 2482 700 0.0 /usr/bin/esd -terminate -nobeeps -as 2 -spawnfd 17 19677 ? S 0:00 24 34 27993 4084 0.3 kdeinit: kio_file file /tmp/ksocket-root/klauncher4bqXZb.slave-socket /tmp/ksocket-root/kioexecyNpOWa.slave-socket 20320 ? S 0:00 1 23 1492 380 0.0 CROND 20321 ? Z 0:00 0 0 0 0 0.0 [night-switch] <defunct> 20333 ttyS1 S 0:00 5 44 1567 524 0.0 heyu_relay ck 20335 ? S 0:00 1 690 5449 1812 0.1 /usr/sbin/sendmail -FCronDaemon -i -odi -oem root@coyote.coyote.den 20339 ? S 0:04 1 44 1571 480 0.0 heyu monitor 20353 ? S 0:00 0 19 1392 368 0.0 /usr/local/bin/xtend -f /etc/.xtendrc 20807 ? S 0:00 0 627 5232 1380 0.1 /sbin/mount.smbfs //gene.coyote.den/public /mnt/gene -o rw username root password XXXXXXXXX 20809 ? SW 0:00 0 0 0 0 0.0 [smbiod] 20812 ? S 0:00 1 627 5180 1308 0.1 /sbin/mount.smbfs //gene.coyote.den/dlds /mnt/dlds -o rw username root password XXXXXXXXX 4848 ? RN 40:32 18 131 16596 14448 1.3 /usr/local/bin/setiathome -stop_after_process -nice 19 5157 ? S 0:00 0 23 1488 396 0.0 CROND 5158 ? 
S 0:00 3 554 1449 736 0.0 /bin/sh /root/bin/backup-me-nightly
5160 ? SW 0:00 0 0 0 0 0.0 [kjournald]
5324 ? D 0:01 6 46 1377 356 0.0 umount /mnt/hdb3
5418 ? S 0:00 28 34 51109 7512 0.7 kdeinit: kio_pop3 pop3 /tmp/ksocket-root/klauncher4bqXZb.slave-socket /tmp/ksocket-root/kmailhxI3Ga.slave-socket
5532 pts/3 R 0:00 0 64 2143 576 0.0 ps axv

The system is still up, and I'll probably leave it till the rsync run is done, maybe another 30 minutes if it stays up. Humm, it's not running now according to top or a ps, but the partition /dev/hdb3 is still mounted according to /etc/mtab. So I assume it's safe to reboot if rsync isn't alive yet. I presume it will self-destruct eventually if kswapd isn't on duty.

Is this enough for an autopsy?

--
Cheers, Gene
"There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message by Gene Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply [flat|nested] 146+ messages in thread
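The slabinfo dump in the message above is easier to judge once each cache is converted to bytes. A rough sketch, assuming 4 KiB pages on this i386 box and the field order given in the dump's own header row (`<pagesperslab>` is the sixth field and `<num_slabs>` the fifteenth when a row is split on whitespace). Only three rows are copied from the first dump; `PAGE_SIZE`, `SAMPLE` and `slab_usage` are names made up for this illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages (i386)

# Three rows copied from the first slabinfo dump in the thread.
SAMPLE = """\
ext3_inode_cache 1373751 1373751 416 9 1 : tunables 54 27 0 : slabdata 152639 152639 0
dentry_cache 564066 764232 140 28 1 : tunables 120 60 0 : slabdata 27294 27294 0
buffer_head 1467 9477 48 81 1 : tunables 120 60 0 : slabdata 117 117 0
"""


def slab_usage(text):
    """Return {cache name: bytes held in slabs} for slabinfo 2.0 rows."""
    usage = {}
    for line in text.splitlines():
        # Skip blank lines, the version banner and the "# name ..." header.
        if not line.strip() or line.startswith(("#", "slabinfo")):
            continue
        fields = line.split()
        pages_per_slab = int(fields[5])
        num_slabs = int(fields[14])
        usage[fields[0]] = num_slabs * pages_per_slab * PAGE_SIZE
    return usage


usage = slab_usage(SAMPLE)
# Print the largest caches first, in MiB.
for name, nbytes in sorted(usage.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {nbytes / 2**20:8.1f} MiB")
```

On these rows, ext3_inode_cache comes to about 596 MiB and dentry_cache to about 107 MiB, so the two together account for roughly 700 MiB of the 728876 kB that meminfo reports as Slab.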
* Re: Possible dcache BUG 2004-08-13 4:27 ` Gene Heskett @ 2004-08-13 8:32 ` Gene Heskett 2004-08-14 2:18 ` Marcelo Tosatti 1 sibling, 0 replies; 146+ messages in thread From: Gene Heskett @ 2004-08-13 8:32 UTC (permalink / raw) To: linux-kernel; +Cc: Linus Torvalds, Andrew Morton, Al Viro On Friday 13 August 2004 00:27, Gene Heskett wrote: >On Wednesday 11 August 2004 00:59, Linus Torvalds wrote: >>I wrote: >>> Notably, the output of "/proc/meminfo" and "/proc/slabinfo". "ps >>> axm" helps too. >> >>That should be "ps axv" of course. Just shows what a retard I am. >> >> Linus > Acck! I just logged another Oops, this time with only: 04:22:50 up 3:52, 5 users, load average: 4.31, 2.95, 2.22 uptime. Aug 13 04:20:21 coyote kernel: Unable to handle kernel paging request at virtual address 00003614 Aug 13 04:20:21 coyote kernel: printing eip: Aug 13 04:20:21 coyote kernel: c01632ae Aug 13 04:20:21 coyote kernel: *pde = 00000000 Aug 13 04:20:21 coyote kernel: Oops: 0000 [#1] Aug 13 04:20:21 coyote kernel: PREEMPT Aug 13 04:20:21 coyote kernel: Modules linked in: eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd forcedeth sg Aug 13 04:20:21 coyote kernel: CPU: 0 Aug 13 04:20:21 coyote kernel: EIP: 0060:[<c01632ae>] Not tainted Aug 13 04:20:21 coyote kernel: EFLAGS: 00010206 (2.6.8-rc4) Aug 13 04:20:21 coyote kernel: EIP is at prune_dcache+0x14e/0x1c0 Aug 13 04:20:21 coyote kernel: eax: 00003600 ebx: dbbf3070 ecx: da707230 edx: da703430 Aug 13 04:20:21 coyote kernel: esi: da703420 edi: c198b000 ebp: c198bf04 esp: c198beec Aug 13 04:20:21 coyote kernel: ds: 007b es: 007b ss: 0068 Aug 13 04:20:21 coyote kernel: Process kswapd0 (pid: 66, threadinfo=c198b000 task=c1978050) Aug 13 04:20:21 coyote kernel: Stack: df5580fc c198bef0 00000046 00000080 00000000 c198b000 c198bf10 c0163770 Aug 13 04:20:21 coyote kernel: 00000080 c198bf44 c013a32c 
00000080 000000d0 0001f3cf 02277700 00000000 Aug 13 04:20:21 coyote kernel: 0000011a 00000000 f7ffea60 c035c624 00000002 0000000a c198bf8c c013b639 Aug 13 04:20:21 coyote kernel: Call Trace: Aug 13 04:20:21 coyote kernel: [<c010476f>] show_stack+0x7f/0xa0 Aug 13 04:20:21 coyote kernel: [<c0104908>] show_registers+0x158/0x1b0 Aug 13 04:20:21 coyote kernel: [<c0104a89>] die+0x89/0x100 Aug 13 04:20:21 coyote kernel: [<c0111725>] do_page_fault+0x1f5/0x553 Aug 13 04:20:21 coyote kernel: [<c01043d9>] error_code+0x2d/0x38 Aug 13 04:20:21 coyote kernel: [<c0163770>] shrink_dcache_memory+0x20/0x50 Aug 13 04:20:21 coyote kernel: [<c013a32c>] shrink_slab+0x14c/0x190 Aug 13 04:20:21 coyote kernel: [<c013b639>] balance_pgdat+0x1a9/0x1f0 Aug 13 04:20:21 coyote kernel: [<c013b73f>] kswapd+0xbf/0xd0 Aug 13 04:20:21 coyote kernel: [<c0102471>] kernel_thread_helper+0x5/0x14 Aug 13 04:20:21 coyote kernel: Code: 8b 50 14 85 d2 75 27 89 34 24 e8 83 2b 00 00 8b 73 0c 89 1c [root@coyote themes]# cat /proc/meminfo MemTotal: 1035844 kB MemFree: 4184 kB Buffers: 23072 kB Cached: 109932 kB SwapCached: 864 kB Active: 171624 kB Inactive: 113532 kB HighTotal: 131008 kB HighFree: 280 kB LowTotal: 904836 kB LowFree: 3904 kB SwapTotal: 3857104 kB SwapFree: 3827944 kB Dirty: 76 kB Writeback: 0 kB Mapped: 195660 kB Slab: 736384 kB Committed_AS: 315580 kB PageTables: 3200 kB VmallocTotal: 114680 kB VmallocUsed: 19644 kB VmallocChunk: 94932 kB But top says I'm 102140 K into the swap????? 
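As a cross-check on the swap question, the figure that meminfo itself implies is SwapTotal minus SwapFree. A small sketch using four values copied from the dump just above; `parse_meminfo` and `SAMPLE` are illustrative names, and this only computes meminfo's own number without trying to explain what top reports.

```python
# Four fields copied from the /proc/meminfo output above.
SAMPLE = """\
MemTotal: 1035844 kB
Slab: 736384 kB
SwapTotal: 3857104 kB
SwapFree: 3827944 kB
"""


def parse_meminfo(text):
    """Return {field: value in kB} for 'Name: value kB' lines."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if value.strip():
            fields[key.strip()] = int(value.split()[0])
    return fields


info = parse_meminfo(SAMPLE)
swap_used_kb = info["SwapTotal"] - info["SwapFree"]
slab_share = info["Slab"] / info["MemTotal"]

print(f"swap in use per meminfo: {swap_used_kb} kB")
print(f"slab share of MemTotal: {slab_share:.1%}")
```

By this arithmetic meminfo shows 29160 kB of swap in use, and slab caches hold a little over 71% of MemTotal.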
slabinfo - version: 2.0
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
unix_sock 164 170 384 10 1 : tunables 54 27 0 : slabdata 17 17 0
tcp_tw_bucket 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0
tcp_bind_bucket 19 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0
tcp_open_request 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0
inet_peer_cache 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0
ip_fib_hash 10 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0
ip_dst_cache 4 15 256 15 1 : tunables 120 60 0 : slabdata 1 1 0
arp_cache 3 31 128 31 1 : tunables 120 60 0 : slabdata 1 1 0
raw4_sock 0 0 480 8 1 : tunables 54 27 0 : slabdata 0 0 0
udp_sock 2 8 480 8 1 : tunables 54 27 0 : slabdata 1 1 0
tcp_sock 31 32 1024 4 1 : tunables 54 27 0 : slabdata 8 8 0
flow_cache 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0
mqueue_inode_cache 1 8 480 8 1 : tunables 54 27 0 : slabdata 1 1 0
udf_inode_cache 0 0 352 11 1 : tunables 54 27 0 : slabdata 0 0 0
smb_request 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
smb_inode_cache 2 12 320 12 1 : tunables 54 27 0 : slabdata 1 1 0
isofs_inode_cache 0 0 320 12 1 : tunables 54 27 0 : slabdata 0 0 0
fat_inode_cache 132 132 352 11 1 : tunables 54 27 0 : slabdata 12 12 0
ext2_inode_cache 0 0 416 9 1 : tunables 54 27 0 : slabdata 0 0 0
journal_handle 16 135 28 135 1 : tunables 120 60 0 : slabdata 1 1 0
journal_head 1357 2754 48 81 1 : tunables 120 60 0 : slabdata 34 34 0
revoke_table 12 290 12 290 1 : tunables 120 60 0 : slabdata 1 1 0
revoke_record 0 0 16 226 1 : tunables 120 60 0 : slabdata 0 0 0
ext3_inode_cache 1328526 1328526 416 9 1 : tunables 54 27 0 : slabdata 147614 147614 0
eventpoll_pwq 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0
eventpoll_epi 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0
kioctx 0 0 160 25 1 : tunables 120 60 0 : slabdata 0 0 0
kiocb 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0
dnotify_cache 172 185 20 185 1 : tunables 120 60 0 : slabdata 1 1 0
file_lock_cache 43 43 92 43 1 : tunables 120 60 0 : slabdata 1 1 0
fasync_cache 2 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0
shmem_inode_cache 5 10 384 10 1 : tunables 54 27 0 : slabdata 1 1 0
posix_timers_cache 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0
uid_cache 9 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0
sgpool-128 32 32 2048 2 1 : tunables 24 12 0 : slabdata 16 16 0
sgpool-64 32 32 1024 4 1 : tunables 54 27 0 : slabdata 8 8 0
sgpool-32 32 32 512 8 1 : tunables 54 27 0 : slabdata 4 4 0
sgpool-16 32 45 256 15 1 : tunables 120 60 0 : slabdata 3 3 0
sgpool-8 32 62 128 31 1 : tunables 120 60 0 : slabdata 2 2 0
cfq_pool 64 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0
crq_pool 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0
deadline_drq 0 0 48 81 1 : tunables 120 60 0 : slabdata 0 0 0
as_arq 130 195 60 65 1 : tunables 120 60 0 : slabdata 3 3 0
blkdev_ioc 76 185 20 185 1 : tunables 120 60 0 : slabdata 1 1 0
blkdev_queue 12 18 448 9 1 : tunables 54 27 0 : slabdata 2 2 0
blkdev_requests 96 156 152 26 1 : tunables 120 60 0 : slabdata 6 6 0
biovec-(256) 256 256 3072 2 2 : tunables 24 12 0 : slabdata 128 128 0
biovec-128 256 260 1536 5 2 : tunables 24 12 0 : slabdata 52 52 0
biovec-64 265 265 768 5 1 : tunables 54 27 0 : slabdata 53 53 0
biovec-16 280 280 192 20 1 : tunables 120 60 0 : slabdata 14 14 0
biovec-4 272 305 64 61 1 : tunables 120 60 0 : slabdata 5 5 0
biovec-1 364 678 16 226 1 : tunables 120 60 0 : slabdata 3 3 0
bio 369 549 64 61 1 : tunables 120 60 0 : slabdata 9 9 0
sock_inode_cache 202 209 352 11 1 : tunables 54 27 0 : slabdata 19 19 0
skbuff_head_cache 245 400 160 25 1 : tunables 120 60 0 : slabdata 16 16 0
sock 3 12 320 12 1 : tunables 54 27 0 : slabdata 1 1 0
proc_inode_cache 337 408 320 12 1 : tunables 54 27 0 : slabdata 34 34 0
sigqueue 27 27 148 27 1 : tunables 120 60 0 : slabdata 1 1 0
radix_tree_node 5371 14994 276 14 1 : tunables 54 27 0 : slabdata 1071 1071 0
bdev_cache 11 18 416 9 1 : tunables 54 27 0 : slabdata 2 2 0
mnt_cache 26 41 96 41 1 : tunables 120 60 0 : slabdata 1 1 0
inode_cache 2179 2212 288 14 1 : tunables 54 27 0 : slabdata 158 158 0
dentry_cache 1061172 1061172 140 28 1 : tunables 120 60 0 : slabdata 37899 37899 0
filp 1970 2150 160 25 1 : tunables 120 60 0 : slabdata 86 86 0
names_cache 9 9 4096 1 1 : tunables 24 12 0 : slabdata 9 9 0
idr_layer_cache 81 87 136 29 1 : tunables 120 60 0 : slabdata 3 3 0
buffer_head 11414 40014 48 81 1 : tunables 120 60 0 : slabdata 494 494 0
mm_struct 98 98 512 7 1 : tunables 54 27 0 : slabdata 14 14 0
vm_area_struct 7442 7802 84 47 1 : tunables 120 60 0 : slabdata 166 166 0
fs_cache 88 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0
files_cache 88 99 416 9 1 : tunables 54 27 0 : slabdata 11 11 0
signal_cache 108 123 96 41 1 : tunables 120 60 0 : slabdata 3 3 0
sighand_cache 102 105 1312 3 1 : tunables 24 12 0 : slabdata 35 35 0
task_struct 110 115 1424 5 2 : tunables 24 12 0 : slabdata 23 23 0
anon_vma 1674 2035 8 407 1 : tunables 120 60 0 : slabdata 5 5 0
pgd 89 89 4096 1 1 : tunables 24 12 0 : slabdata 89 89 0
size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-131072 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
size-65536 1 1 65536 1 16 : tunables 8 4 0 : slabdata 1 1 0
size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
size-32768 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-16384 4 4 16384 1 4 : tunables 8 4 0 : slabdata 4 4 0
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 11 11 8192 1 2 : tunables 8 4 0 : slabdata 11 11 0
size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 0 : slabdata 0 0 0
size-4096 184 184 4096 1 1 : tunables 24 12 0 : slabdata 184 184 0
size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 0 : slabdata 0 0 0
size-2048 166 170 2048 2 1 : tunables 24 12 0 : slabdata 85 85 0
size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 0 : slabdata 0 0 0
size-1024 120 120 1024 4 1 : tunables 54 27 0 : slabdata 30 30 0
size-512(DMA) 0 0 512 8 1 : tunables 54 27 0 : slabdata 0 0 0
size-512 169 448 512 8 1 : tunables 54 27 0 : slabdata 56 56 0
size-256(DMA) 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
size-256 165 420 256 15 1 : tunables 120 60 0 : slabdata 28 28 0
size-192(DMA) 0 0 192 20 1 : tunables 120 60 0 : slabdata 0 0 0
size-192 98 100 192 20 1 : tunables 120 60 0 : slabdata 5 5 0
size-128(DMA) 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
size-128 1231 1240 128 31 1 : tunables 120 60 0 : slabdata 40 40 0
size-64(DMA) 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0
size-64 49226 49227 64 61 1 : tunables 120 60 0 : slabdata 807 807 0
size-32(DMA) 0 0 32 119 1 : tunables 120 60 0 : slabdata 0 0 0
size-32 1309 1309 32 119 1 : tunables 120 60 0 : slabdata 11 11 0
kmem_cache 124 124 128 31 1 : tunables 120 60 0 : slabdata 4 4 0

But dentry_cache 1061172 ?

PID TTY STAT TIME MAJFL TRS DRS RSS %MEM COMMAND
1 ? S 0:01 12 31 1440 480 0.0 init [3]
2 ? SWN 0:00 0 0 0 0 0.0 [ksoftirqd/0]
3 ? SW< 0:00 0 0 0 0 0.0 [events/0]
4 ? SW< 0:00 0 0 0 0 0.0 [khelper]
21 ? SW< 0:01 0 0 0 0 0.0 [kblockd/0]
22 ? SW 0:00 0 0 0 0 0.0 [khubd]
62 ? SW 0:00 0 0 0 0 0.0 [kapmd]
64 ? SW 0:00 0 0 0 0 0.0 [pdflush]
65 ? SW 0:00 0 0 0 0 0.0 [pdflush]
67 ? SW< 0:00 0 0 0 0 0.0 [aio/0]
194 ? SW 0:00 0 0 0 0 0.0 [kseriod]
231 ? SW 0:00 0 0 0 0 0.0 [kjournald]
1415 ? SW 0:00 0 0 0 0 0.0 [kjournald]
1416 ? SW 0:00 0 0 0 0 0.0 [kjournald]
1417 ? SW 0:00 0 0 0 0 0.0 [kjournald]
1418 ? SW 0:00 0 0 0 0 0.0 [kjournald]
1419 ? SW 0:00 0 0 0 0 0.0 [kjournald]
1886 ? S 0:00 0 27 1432 580 0.0 syslogd -m 0
1890 ? S 0:00 0 20 1383 448 0.0 klogd -x
1901 ? S 0:00 1 27 1508 568 0.0 portmap
1911 ? S 0:00 0 141 1566 752 0.0 /usr/sbin/smartd
1977 ? S 0:00 1 149 2918 996 0.0 arpwatch -u pcap -e root -s root (Arpwatch)
1986 ? S 0:00 9 237 8886 3336 0.3 cupsd
2114 ? S 0:00 0 262 3385 1460 0.1 /usr/sbin/sshd
2129 ? S 0:00 1 143 1872 916 0.0 xinetd -stayalive -pidfile /var/run/xinetd.pid
2151 ? S 0:00 1 690 6357 2808 0.2 sendmail: accepting connections
2162 ? S 0:00 0 690 5433 2364 0.2 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
2172 ? S 0:00 0 78 1521 460 0.0 gpm -m /dev/input/mice -t imps2
2184 ? S 0:00 34 250 22985 10396 1.0 /usr/sbin/httpd
2237 ? S 0:00 0 133 1758 1056 0.1 /usr/sbin/cannaserver -syslog -u bin
2249 ? S 0:00 1 23 1488 652 0.0 crond
2261 ? S 0:00 0 250 22985 10412 1.0 /usr/sbin/httpd
2262 ? S 0:00 0 250 22985 10408 1.0 /usr/sbin/httpd
2263 ? S 0:00 0 250 22985 10408 1.0 /usr/sbin/httpd
2264 ? S 0:00 0 250 22985 10408 1.0 /usr/sbin/httpd
2265 ? S 0:00 0 250 22985 10408 1.0 /usr/sbin/httpd
2266 ? S 0:00 0 250 22985 10408 1.0 /usr/sbin/httpd
2267 ? S 0:00 0 250 22985 10408 1.0 /usr/sbin/httpd
2268 ? S 0:00 0 250 22985 10408 1.0 /usr/sbin/httpd
2320 ? S 0:00 18 72 7383 6012 0.5 xfs -droppriv -daemon
2329 ? S 0:00 4 2727 7584 2676 0.2 smbd -D
2333 ? S 0:00 1 868 7183 2004 0.1 nmbd -D
2347 ? S 0:00 0 627 7324 2072 0.2 /sbin/mount.smbfs //gene.coyote.den/public /mnt/gene -o rw username root password XXXXXXXXX
2349 ? SW 0:00 0 0 0 0 0.0 [smbiod]
2352 ? S 0:00 0 627 7272 1980 0.1 /sbin/mount.smbfs //gene.coyote.den/dlds /mnt/dlds -o rw username root password XXXXXXXXX
2369 ? S 0:00 0 15 1484 604 0.0 /usr/sbin/atd
2386 ? S 0:00 0 230 1629 816 0.0 dbus-daemon-1 --system
2399 ttyS0 S 0:13 4 153 1810 900 0.0 /usr/local/bulldog/upsd
2498 ttyS1 S 0:00 0 44 1567 564 0.0 heyu_relay ck
2499 ? S 0:01 0 44 1571 556 0.0 heyu monitor
2503 ? S 0:00 0 19 1392 424 0.0 xtend -f /etc/.xtendrc
2504 ? S 0:00 0 554 3797 1176 0.1 /bin/sh /root/bin/setibatch -run
2615 ? S 0:00 3 17 2594 1072 0.1 login -- root
2616 tty2 S 0:00 0 8 1375 340 0.0 /sbin/mingetty tty2
2617 tty3 S 0:00 0 8 1375 340 0.0 /sbin/mingetty tty3
2618 tty4 S 0:00 0 8 1375 340 0.0 /sbin/mingetty tty4
2619 tty5 S 0:00 0 8 1375 340 0.0 /sbin/mingetty tty5
2620 tty6 S 0:00 0 8 1375 340 0.0 /sbin/mingetty tty6
2771 tty1 S 0:00 0 554 3801 1332 0.1 -bash
2803 tty1 S 0:00 0 554 3741 1012 0.0 /bin/sh /usr/X11R6/bin/startx
2814 tty1 S 0:00 3 8 2195 508 0.0 xinit /etc/X11/xinit/xinitrc --
2815 ? S< 5:39 63 1498 153113 18040 1.7 X :0
2829 tty1 S 0:00 1 554 3745 896 0.0 /bin/sh /root/kde3.3-beta2/bin/startkde
2848 ? S 0:00 0 55 3252 560 0.0 ssh-agent /etc/X11/xinit/Xclients
2890 ? S 0:00 15 34 24313 4156 0.4 kdeinit: Running...
2893 ? S 0:00 6 34 23521 3608 0.3 kdeinit: dcopserver --nosid
2895 ? S 0:00 47 34 26377 4532 0.4 kdeinit: klauncher
2898 ? S 0:01 50 34 28673 6096 0.5 kdeinit: kded
2899 ? S 0:00 12 135 3136 1740 0.1 fam
2907 ? S 0:00 5 34 27025 5772 0.5 kdeinit: kxkb
2915 ? S 0:06 72 154 19217 2588 0.2 artsd -F 11 -S 4096 -a alsa -s 15 -m artsmessage -c drkonqi -l 3 -f
2917 ? S 0:00 11 34 33445 5748 0.5 kdeinit: knotify
2918 tty1 S 0:00 25 1 20538 4560 0.4 ksmserver
2919 ? S 0:01 35 34 27273 6612 0.6 kdeinit: kwin -session 11c0a80103000104545060000000016110000_1092371276_861353
2922 ? S 0:00 2 34 25225 5220 0.5 kdeinit: khotkeys
2923 ? S 0:03 39 34 30149 7712 0.7 kdeinit: kdesktop
2926 ? S 0:03 90 34 32925 8216 0.7 kdeinit: kicker
2928 ? S 0:00 33 34 42277 6432 0.6 kdeinit: klipper
2931 ? S 0:01 23 61 30106 6588 0.6 korgac --miniicon korganizer
2935 ? S 0:00 6 830 28705 6520 0.6 kgpg -session 11c0a84703000109133389800000023980007_1092371276_68550
2937 ? S 0:00 11 240 30979 6068 0.5 knotes -session 11c0a84703000107107498500000013490017_1092371275_892454
2939 ? S 0:01 4 34 28801 7048 0.6 kdeinit: kmix -session 11c0a84703000109118722300000023870010_1092371275_970301
2940 ? S 0:01 17 34 31385 6768 0.6 kdeinit: konsole -session 11c0a84703000109146439400000023670008_1092371275_892612 -name Qt-subapplication
2941 ? S 0:06 49 590 31101 5060 0.4 /usr/bin/gkrellm --sm-client-id 11c0a84703000109197323200000021320011
2944 pts/1 S 0:00 0 554 3801 1120 0.1 /bin/bash
2952 ? S 0:19 3 34 31369 6752 0.6 kdeinit: konsole -session 11c0a84703000109223264800000028730008_1092371276_29035 -name Qt-subapplication
2953 ? S 0:01 16 34 31377 7276 0.7 kdeinit: konsole -session 11c0a84703000109223268000000028730009_1092371275_892929 -name Qt-subapplication
2955 ? S 0:04 15 1060 4947 1608 0.1 /usr/local/bulldog/monitor
2956 ? S 0:00 1 96 25219 3616 0.3 kalarmd --login
2965 pts/2 S 0:00 5 554 3805 1308 0.1 /bin/bash
2976 pts/3 S 0:00 0 554 3801 1120 0.1 /bin/bash
3005 ? S 2:53 474 11 134332 76968 7.4 kmail
3093 pts/1 S 0:00 0 33 3934 504 0.0 tail -f /var/log/messages
3095 pts/3 S 0:26 1 47 1776 912 0.0 top
7587 ? S 0:00 0 37 2482 720 0.0 /usr/bin/esd -terminate -nobeeps -as 2 -spawnfd 17
7923 ? RN 16:18 7 131 17624 15104 1.4 /usr/local/bin/setiathome -stop_after_process -nice 19
8081 ? S 0:00 0 23 1492 656 0.0 CROND
8082 ? S 0:00 0 554 1449 824 0.0 /bin/bash /usr/bin/run-parts /etc/cron.daily
8700 ? S 0:00 0 690 5433 2432 0.2 /usr/sbin/sendmail -FCronDaemon -i -odi -oem root
10304 ? SN 0:00 0 554 1449 788 0.0 /bin/sh /etc/cron.daily/slocate.cron
10305 ? S 0:00 0 245 1558 496 0.0 awk -v progname=/etc/cron.daily/slocate.cron progname {????? print progname ":\n"????? progname="";???? }???? { print; }
10307 ? DN 0:12 0 27 1508 712 0.0 /usr/bin/updatedb
10320 ? S 0:00 3 34 51105 4676 0.4 kdeinit: kio_pop3 pop3 /tmp/ksocket-root/klauncherilvgna.slave-socket /tmp/ksocket-root/kmailkizaqa.slave-socket
10359 ? S 0:00 8 34 26645 8180 0.7 kdeinit: kio_file file /tmp/ksocket-root/klauncherilvgna.slave-socket /tmp/ksocket-root/kmailWoeNac.slave-socket
10362 ? S 0:00 48 34 28797 11940 1.1 kdeinit: kio_uiserver
10371 pts/2 R 0:00 1 64 2143 576 0.0 ps axv

I won't repeat the dmesg as it, except for the Oops, will be the same.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.
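
To put the slabinfo figures above in perspective, the page memory held by a cache can be estimated as num_slabs x pagesperslab x page size. The sketch below does that arithmetic. It assumes 4 KiB pages (the usual i386 page size); the names slab_footprint_kib and report_cache are made up for this note and are not kernel interfaces.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: 4 KiB pages, as on i386. */
#define PAGE_KIB 4UL

/* Hypothetical helper: KiB of page memory held by one slab cache,
 * computed from the <num_slabs> and <pagesperslab> columns. */
unsigned long slab_footprint_kib(unsigned long num_slabs,
                                 unsigned long pages_per_slab)
{
    /* Every row in /proc/slabinfo uses at least one page per slab. */
    assert(pages_per_slab > 0);
    return num_slabs * pages_per_slab * PAGE_KIB;
}

/* Prints one summary line for a cache; the numbers are supplied by the caller. */
void report_cache(const char *name, unsigned long num_slabs,
                  unsigned long pages_per_slab)
{
    printf("%-18s %8lu KiB\n", name,
           slab_footprint_kib(num_slabs, pages_per_slab));
}
```

Fed the rows above, report_cache("ext3_inode_cache", 147614, 1) works out to 590456 KiB (about 577 MiB) and report_cache("dentry_cache", 37899, 1) to 151596 KiB (about 148 MiB). Against the MemTotal of 1035852 kB that Gene reports later in the thread, the ext3 inode cache alone would account for more than half of RAM.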
* Re: Possible dcache BUG
From: Marcelo Tosatti @ 2004-08-14 2:18 UTC
To: Gene Heskett; +Cc: linux-kernel, Linus Torvalds, Andrew Morton, Al Viro

On Fri, Aug 13, 2004 at 12:27:24AM -0400, Gene Heskett wrote:
> On Wednesday 11 August 2004 00:59, Linus Torvalds wrote:
> >I wrote:
> >> Notably, the output of "/proc/meminfo" and "/proc/slabinfo". "ps
> >> axm" helps too.
> >
> >That should be "ps axv" of course. Just shows what a retard I am.
> >
> > Linus
>
> Acck! I just logged an Oops:
> Aug 13 00:02:00 coyote kernel: kjournald starting. Commit interval 5 seconds
> Aug 13 00:02:00 coyote kernel: EXT3 FS on hdb3, internal journal
> Aug 13 00:02:00 coyote kernel: EXT3-fs: mounted filesystem with ordered data mode.
> Aug 13 00:05:09 coyote kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000004
> Aug 13 00:05:09 coyote kernel: printing eip:
> Aug 13 00:05:09 coyote kernel: c014e0dc
> Aug 13 00:05:09 coyote kernel: *pde = 00000000
> Aug 13 00:05:09 coyote kernel: Oops: 0002 [#1]
> Aug 13 00:05:09 coyote kernel: PREEMPT
> Aug 13 00:05:09 coyote kernel: Modules linked in: eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd forcedeth sg
> Aug 13 00:05:09 coyote kernel: CPU: 0
> Aug 13 00:05:09 coyote kernel: EIP: 0060:[<c014e0dc>] Not tainted
> Aug 13 00:05:09 coyote kernel: EFLAGS: 00010246 (2.6.8-rc4)
> Aug 13 00:05:09 coyote kernel: EIP is at remove_inode_buffers+0x4c/0x90
> Aug 13 00:05:09 coyote kernel: eax: 00000000 ebx: d7ff68b4 ecx: d7ffffb4 edx: 00000000
> Aug 13 00:05:09 coyote kernel: esi: d7ff67e0 edi: 00000001 ebp: c198bed8 esp: c198bec8
> Aug 13 00:05:09 coyote kernel: ds: 007b es: 007b ss: 0068
> Aug 13 00:05:09 coyote kernel: Process kswapd0 (pid: 66, threadinfo=c198b000 task=c1978050)
> Aug 13 00:05:09 coyote kernel: Stack: d7ff67e0 d7ff67e8 d7ff67e0 0000001e c198bf04 c0165242 d7ff67e0 c198b000
> Aug 13 00:05:09 coyote kernel:        00000000 0000001e d7ff6988 ed3be928 00000080 00000000 c198b000 c198bf10
> Aug 13 00:05:09 coyote kernel:        c016532f 00000080 c198bf44 c013a32c 00000080 000000d0 0002cc1d 013b0a00
> Aug 13 00:05:09 coyote kernel: Call Trace:
> Aug 13 00:05:09 coyote kernel: [<c010476f>] show_stack+0x7f/0xa0
> Aug 13 00:05:09 coyote kernel: [<c0104908>] show_registers+0x158/0x1b0
> Aug 13 00:05:09 coyote kernel: [<c0104a89>] die+0x89/0x100
> Aug 13 00:05:09 coyote kernel: [<c0111725>] do_page_fault+0x1f5/0x553
> Aug 13 00:05:09 coyote kernel: [<c01043d9>] error_code+0x2d/0x38
> Aug 13 00:05:09 coyote kernel: [<c0165242>] prune_icache+0x142/0x1f0
> Aug 13 00:05:09 coyote kernel: [<c016532f>] shrink_icache_memory+0x3f/0x50
> Aug 13 00:05:09 coyote kernel: [<c013a32c>] shrink_slab+0x14c/0x190
> Aug 13 00:05:09 coyote kernel: [<c013b639>] balance_pgdat+0x1a9/0x1f0
> Aug 13 00:05:09 coyote kernel: [<c013b73f>] kswapd+0xbf/0xd0
> Aug 13 00:05:09 coyote kernel: [<c0102471>] kernel_thread_helper+0x5/0x14
> Aug 13 00:05:09 coyote kernel: Code: 89 50 04 89 02 89 49 04 89 09 8b 03 39 d8 89 c1 75 e2 b8 00
> Aug 13 00:05:09 coyote kernel: <6>note: kswapd0[66] exited with preempt_count 1
>
> The first 3 entries are from a nightly run of rsync, which mounts a
> normally unmounted partition for the duration of its run.

Hi fellows,

I've taken some time to look at these oopses, and I truly believe we
are facing real corruption.

The symptom is that an inode's (blockdev) i_mapping->private_list gets
corrupted: one of its buffer_heads contains a b_assoc_buffers list_head
with NULL pointers.

And this is not an SMP race, because Gene is not running SMP.

Gene's oops happens when remove_inode_buffers calls __remove_assoc_queue(bh).

Ingo's oops happens while remove_inode_buffers does

	struct buffer_head *bh = BH_ENTRY(list->next);

which is

	mov ffffffd8(%ecx), (%somewhere)

%ecx is zero, so...

There is a bug somewhere.

--- a/fs/buffer.c.original	2004-08-14 00:19:55.000000000 -0300
+++ b/fs/buffer.c	2004-08-14 00:34:57.000000000 -0300
@@ -802,6 +802,8 @@
  */
 static inline void __remove_assoc_queue(struct buffer_head *bh)
 {
+	BUG_ON(bh->b_assoc_buffers.next == NULL);
+	BUG_ON(bh->b_assoc_buffers.prev == NULL);
 	list_del_init(&bh->b_assoc_buffers);
 }
 
@@ -1073,6 +1075,7 @@
 
 	spin_lock(&buffer_mapping->private_lock);
 	while (!list_empty(list)) {
+		BUG_ON(list->next == NULL);
 		struct buffer_head *bh = BH_ENTRY(list->next);
 		if (buffer_dirty(bh)) {
 			ret = 0;

Ingo oops for reference:

Unable to handle kernel paging request at virtual address ffffffd8
 printing eip:
c016a3d0
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP
Modules linked in:
CPU:    0
EIP:    0060:[<c016a3d0>]    Not tainted VLI
EFLAGS: 00010217   (2.6.8-rc2-mm2)
EIP is at remove_inode_buffers+0x60/0xe0
eax: 00000000   ebx: c03ba9dc   ecx: 00000000   edx: c03ba8d0
esi: c03ba8d0   edi: c0379b2a   ebp: c4115ec4   esp: c4115eac
ds: 007b   es: 007b   ss: 0068
Process kswapd0 (pid: 39, threadinfo=c4114000 task=c40aa070)
Stack: c03ba8d0 c0379b76 00000001 c03ba8d8 c03ba8d0 00000000 c4115ef8 c0186c4c
       c03ba8d0 00000077 c4114000 00000000 0000004d 00000000 c4115ee4 c4115ee4
       c4114000 c07fd6a0 00004e09 c4115f04 c0186df5 00000080 c4115f38 c014f4b3
Call Trace:
 [<c01059ff>] show_stack+0x8f/0xb0
 [<c0105bb3>] show_registers+0x163/0x1d0
 [<c0105dc6>] die+0xe6/0x1c0
 [<c0117773>] do_page_fault+0x213/0x6c0
 [<c0105674>] exception_start+0x6/0xe
 [<c0186c4c>] prune_icache+0x20c/0x390
 [<c0186df5>] shrink_icache_memory+0x25/0x50
 [<c014f4b3>] shrink_slab+0x123/0x1d0
 [<c01511ee>] balance_pgdat+0x24e/0x2a0
 [<c015130c>] kswapd+0xcc/0xe0
 [<c0102899>] kernel_thread_helper+0x5/0xc
Code: 00 e0 ff ff 21 e0 ff 40 14 8d 47 4c 89 45 ec 31 c0 86 47 4c 84 c0 0f 8e 79 00 \
00 00 8b 86 0c 01 00 00 39 d8 74 23 89 c1 8d 76 00 <8b> 41 d8 a8 02 75 5a 8b 01 8b 51 \
04 89 02 89 09 89 50 04 8b 03
 <6>note: kswapd0[39] exited with preempt_count 1
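
Both faulting addresses in the oopses above fall out of simple pointer arithmetic on a NULL list pointer, which is what makes the "list_head with NULL pointers" reading plausible. The sketch below models 32-bit addresses as uint32_t. The 0x28 member offset is an assumption read off the faulting instruction in Ingo's oops (<8b> 41 d8, a load from -0x28(%ecx)), not taken from any particular struct buffer_head layout, and the helper names are invented for this note.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: offset of the list_head inside its containing struct,
 * as implied by the -0x28 displacement in the faulting instruction. */
#define ASSOC_OFFSET 0x28u

/* On i386 a list_head is { next, prev } with 4-byte pointers,
 * so the prev field sits 4 bytes into the structure. */
#define LIST_PREV_OFFSET 4u

/* What a list_entry()/BH_ENTRY()-style conversion computes:
 * the member's address minus the member's offset (wraps modulo 2^32). */
uint32_t entry_from_member(uint32_t member_addr)
{
    return member_addr - ASSOC_OFFSET;
}

/* Address written by the "next->prev = prev" step of a list deletion. */
uint32_t prev_slot_of(uint32_t next_addr)
{
    return next_addr + LIST_PREV_OFFSET;
}

/* Reproduces the two faulting addresses from a NULL (zero) list pointer. */
void check_oops_addresses(void)
{
    assert(entry_from_member(0u) == 0xffffffd8u); /* Ingo: read fault  */
    assert(prev_slot_of(0u) == 0x00000004u);      /* Gene: write fault */
}
```

A NULL list->next turned into a containing-struct pointer lands at ffffffd8, and unlinking an entry whose next pointer is NULL writes to address 00000004, which is consistent with the "89 50 04" store (mov %edx,0x4(%eax) with eax zero) at the start of the Code line in Gene's trace.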
* Re: Possible dcache BUG
From: Gene Heskett @ 2004-08-14 5:19 UTC
To: linux-kernel; +Cc: Marcelo Tosatti, Linus Torvalds, Andrew Morton, Al Viro

On Friday 13 August 2004 22:18, Marcelo Tosatti wrote:
>On Fri, Aug 13, 2004 at 12:27:24AM -0400, Gene Heskett wrote:
>> On Wednesday 11 August 2004 00:59, Linus Torvalds wrote:

[...]

>
>Hi fellows,
>
>I've taken some time to look at these oopses, and I truly believe we
>are facing real corruption.
>
>The symptom is that an inode's (blockdev) i_mapping->private_list
>gets corrupted: one of its buffer_heads contains a b_assoc_buffers
>list_head with NULL pointers.
>
>And this is not an SMP race, because Gene is not running SMP.
>
>Gene's oops happens when remove_inode_buffers calls
>__remove_assoc_queue(bh).
>
>Ingo's oops happens while remove_inode_buffers does
>
>	struct buffer_head *bh = BH_ENTRY(list->next);
>
>which is
>
>	mov ffffffd8(%ecx), (%somewhere)
>
>%ecx is zero, so...
>
>There is a bug somewhere.
>
>--- a/fs/buffer.c.original	2004-08-14 00:19:55.000000000 -0300
>+++ b/fs/buffer.c	2004-08-14 00:34:57.000000000 -0300
>@@ -802,6 +802,8 @@
>  */
> static inline void __remove_assoc_queue(struct buffer_head *bh)
> {
>+	BUG_ON(bh->b_assoc_buffers.next == NULL);
>+	BUG_ON(bh->b_assoc_buffers.prev == NULL);
> 	list_del_init(&bh->b_assoc_buffers);
> }
>
>@@ -1073,6 +1075,7 @@
>
> 	spin_lock(&buffer_mapping->private_lock);
> 	while (!list_empty(list)) {
>+		BUG_ON(list->next == NULL);
> 		struct buffer_head *bh = BH_ENTRY(list->next);
> 		if (buffer_dirty(bh)) {
> 			ret = 0;
>

Marcelo, I've put in the patch that disables the prefetch, and that's
been running OK so far, but uptime is still pretty short, measured in
hours. If it eventually does Oops on me, the reboot will bring this
patch in too; it's building right now.

-- 
Cheers, Gene
* Re: Possible dcache BUG
From: Gene Heskett @ 2004-08-14 5:50 UTC
To: linux-kernel; +Cc: Marcelo Tosatti, Linus Torvalds, Andrew Morton, Al Viro

On Friday 13 August 2004 22:18, Marcelo Tosatti wrote:

[...]

>
>Hi fellows,
>
>I've taken some time to look at these oopses, and I truly believe we
>are facing real corruption.
>
>The symptom is that an inode's (blockdev) i_mapping->private_list
>gets corrupted: one of its buffer_heads contains a b_assoc_buffers
>list_head with NULL pointers.
>
>And this is not an SMP race, because Gene is not running SMP.
>
>Gene's oops happens when remove_inode_buffers calls
>__remove_assoc_queue(bh).
>
>Ingo's oops happens while remove_inode_buffers does
>
>	struct buffer_head *bh = BH_ENTRY(list->next);
>
>which is
>
>	mov ffffffd8(%ecx), (%somewhere)
>
>%ecx is zero, so...
>
>There is a bug somewhere.
>
>--- a/fs/buffer.c.original	2004-08-14 00:19:55.000000000 -0300
>+++ b/fs/buffer.c	2004-08-14 00:34:57.000000000 -0300
>@@ -802,6 +802,8 @@
>  */
> static inline void __remove_assoc_queue(struct buffer_head *bh)
> {
>+	BUG_ON(bh->b_assoc_buffers.next == NULL);
>+	BUG_ON(bh->b_assoc_buffers.prev == NULL);
> 	list_del_init(&bh->b_assoc_buffers);
> }
>
>@@ -1073,6 +1075,7 @@
>
> 	spin_lock(&buffer_mapping->private_lock);
> 	while (!list_empty(list)) {
>+		BUG_ON(list->next == NULL);
> 		struct buffer_head *bh = BH_ENTRY(list->next);

During the compile, the above line produced this warning:

fs/buffer.c: In function `remove_inode_buffers':
fs/buffer.c:1079: warning: ISO C90 forbids mixed declarations and code

Did the compiler do the right thing? Or is this perchance the bug?

> 		if (buffer_dirty(bh)) {
> 			ret = 0;

In any event, it's getting sleepy out, good night all.

-- 
Cheers, Gene
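
On the warning Gene asks about: gcc accepts a declaration that follows a statement (standard in C99, a GNU extension in C90 mode), and the message is a style diagnostic, so the added BUG_ON line should behave the same as if the declaration had come first. It is therefore unlikely to be the bug being chased. A small userspace sketch with made-up function names, showing the two equivalent spellings:

```c
#include <assert.h>

/* Mixed form: a statement (the assert) precedes a declaration.
 * gcc reports "ISO C90 forbids mixed declarations and code" for this
 * when -Wdeclaration-after-statement or -std=c90 -pedantic is in effect. */
int first_plus_one_mixed(const int *list)
{
    assert(list != 0);
    int value = list[0];
    return value + 1;
}

/* C90-clean form: declaration first, then the check, then the assignment. */
int first_plus_one_c90(const int *list)
{
    int value;

    assert(list != 0);
    value = list[0];
    return value + 1;
}
```

Both functions return the same result for the same input; in the patch, the warning would go away if the BUG_ON were placed after the "struct buffer_head *bh" declaration, at the cost of performing the check after list->next has already been used.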
* Re: Possible dcache BUG
From: Gene Heskett @ 2004-08-14 8:17 UTC
To: linux-kernel; +Cc: Marcelo Tosatti, Linus Torvalds, Andrew Morton, Al Viro

On Friday 13 August 2004 22:18, Marcelo Tosatti wrote:
>On Fri, Aug 13, 2004 at 12:27:24AM -0400, Gene Heskett wrote:
>> On Wednesday 11 August 2004 00:59, Linus Torvalds wrote:
>> >I wrote:
>> >> Notably, the output of "/proc/meminfo" and "/proc/slabinfo".
>> >> "ps axv" helps too.

[...]

>
>Hi fellows,
>
>I've taken some time to look at these oopses, and I truly believe we
>are facing real corruption.
>
>The symptom is that an inode's (blockdev) i_mapping->private_list
>gets corrupted: one of its buffer_heads contains a b_assoc_buffers
>list_head with NULL pointers.
>
>And this is not an SMP race, because Gene is not running SMP.
>
>Gene's oops happens when remove_inode_buffers calls
>__remove_assoc_queue(bh).
>
>Ingo's oops happens while remove_inode_buffers does
>
>	struct buffer_head *bh = BH_ENTRY(list->next);
>
>which is
>
>	mov ffffffd8(%ecx), (%somewhere)
>
>%ecx is zero, so...
>
>There is a bug somewhere.
>
>--- a/fs/buffer.c.original	2004-08-14 00:19:55.000000000 -0300
>+++ b/fs/buffer.c	2004-08-14 00:34:57.000000000 -0300
>@@ -802,6 +802,8 @@
>  */
> static inline void __remove_assoc_queue(struct buffer_head *bh)
> {
>+	BUG_ON(bh->b_assoc_buffers.next == NULL);
>+	BUG_ON(bh->b_assoc_buffers.prev == NULL);
> 	list_del_init(&bh->b_assoc_buffers);
> }
>
>@@ -1073,6 +1075,7 @@
>
> 	spin_lock(&buffer_mapping->private_lock);
> 	while (!list_empty(list)) {
>+		BUG_ON(list->next == NULL);
> 		struct buffer_head *bh = BH_ENTRY(list->next);
> 		if (buffer_dirty(bh)) {
> 			ret = 0;
>

Just for grins I occasionally do the up-arrow bit and re-run that
slabinfo sorter line Linus gave me, watching the size of the
dentry_cache line in particular. I believe I just saw a first: the
size was reported as being slightly smaller than the last run an hour
ago. Previously it had done nothing but grow. This is a kernel with
two patches on top of -rc4, one being the list_del thing, the other
being the one-liner that presumably forces the fetch rather than
depending on the prefetch in this chip, which conjecture says might
not be working 100%.

Also, top is showing a relatively large amount of free memory even
though a small amount is now in swap.

/proc/meminfo:
MemTotal:      1035852 kB
MemFree:        130452 kB
Buffers:         70664 kB
Cached:         420512 kB
SwapCached:        400 kB
Active:         384008 kB
Inactive:       271184 kB
HighTotal:      131008 kB
HighFree:          308 kB
LowTotal:       904844 kB
LowFree:        130144 kB
SwapTotal:     3857104 kB
SwapFree:      3856452 kB
Dirty:             136 kB
Writeback:           0 kB
Mapped:         222000 kB
Slab:           239816 kB
Committed_AS:   302408 kB
PageTables:       3232 kB
VmallocTotal:   114680 kB
VmallocUsed:     19900 kB
VmallocChunk:    94604 kB

This is with an uptime approaching 18 hours. With only the list_del
patch, by now I would be down to 3-5 megs free, and 20-100 megs in
swap.

The 4am stuff just started; this was the killer yesterday morning.
No probs at the 15 minute mark, looks good.

-- 
Cheers, Gene
* Re: Possible dcache BUG
From: Gene Heskett @ 2004-08-15 4:09 UTC
To: linux-kernel; +Cc: Marcelo Tosatti, Linus Torvalds, Andrew Morton, Al Viro

On Saturday 14 August 2004 04:17, Gene Heskett wrote:
>On Friday 13 August 2004 22:18, Marcelo Tosatti wrote:
>>On Fri, Aug 13, 2004 at 12:27:24AM -0400, Gene Heskett wrote:
>>> On Wednesday 11 August 2004 00:59, Linus Torvalds wrote:
>>> >I wrote:
>>> >> Notably, the output of "/proc/meminfo" and "/proc/slabinfo".
>>> >> "ps axv" helps too.
>
>[...]
>
>>Hi fellows,
>>
>>I've taken some time to look at these oopses, and I truly believe we
>>are facing real corruption.
>>
>>The symptom is that an inode's (blockdev) i_mapping->private_list
>>gets corrupted: one of its buffer_heads contains a b_assoc_buffers
>>list_head with NULL pointers.
>>
>>And this is not an SMP race, because Gene is not running SMP.
>>
>>Gene's oops happens when remove_inode_buffers calls
>>__remove_assoc_queue(bh).
>>
>>Ingo's oops happens while remove_inode_buffers does
>>
>>	struct buffer_head *bh = BH_ENTRY(list->next);
>>
>>which is
>>
>>	mov ffffffd8(%ecx), (%somewhere)
>>
>>%ecx is zero, so...
>>
>>There is a bug somewhere.
>>
>>--- a/fs/buffer.c.original	2004-08-14 00:19:55.000000000 -0300
>>+++ b/fs/buffer.c	2004-08-14 00:34:57.000000000 -0300
>>@@ -802,6 +802,8 @@
>>  */
>> static inline void __remove_assoc_queue(struct buffer_head *bh)
>> {
>>+	BUG_ON(bh->b_assoc_buffers.next == NULL);
>>+	BUG_ON(bh->b_assoc_buffers.prev == NULL);
>> 	list_del_init(&bh->b_assoc_buffers);
>> }
>>
>>@@ -1073,6 +1075,7 @@
>>
>> 	spin_lock(&buffer_mapping->private_lock);
>> 	while (!list_empty(list)) {
>>+		BUG_ON(list->next == NULL);
>> 		struct buffer_head *bh = BH_ENTRY(list->next);
>> 		if (buffer_dirty(bh)) {
>> 			ret = 0;
>
>Just for grins I occasionally do the up-arrow bit and re-run that
>slabinfo sorter line Linus gave me, watching the size of the
>dentry_cache line in particular. I believe I just saw a first: the
>size was reported as being slightly smaller than the last run an
>hour ago. Previously it had done nothing but grow. This is a
>kernel with two patches on top of -rc4, one being the list_del thing,
>the other being the one-liner that presumably forces the fetch rather
>than depending on the prefetch in this chip, which conjecture says
>might not be working 100%.

I spoke too soon, and I am now rebooted to this patch in addition to
the 2 noted previously.

It lasted 35 hours this time. I was looking at sendmail.mc with vim,
trying to see if I could spot a reason that local mail to root only
gets posted when the sendmail buffer needs to be flushed, often
resulting in messages from 5am local time finally making it into
kmail at 10pm!

Not finding anything obvious, I did a :q to quit. At that point
everything froze, including the clock in the lower right corner of
the launch bar.

The only unexplained entries in the log are:
-----------------
Aug 14 22:44:04 coyote bonobo-activation-server (root-27863): iid OAFIID:BrokenNoType:20000808 has a NULL type
Aug 14 22:44:04 coyote bonobo-activation-server (root-27863): invalid character '#' in iid 'OAFIID:This#!!%$iid%^$% _|~!OAFIID_ContainsBadChars'
Aug 14 22:44:34 coyote gconfd (root-27861): GConf server is not in use, shutting down.
Aug 14 22:44:34 coyote gconfd (root-27861): Exiting

And of course the hardware clock was wrong since a normal shutdown
wasn't done, just a tap on the reset button.

Aug 15 03:37:17 coyote syslogd 1.4.1: restart.
----------------------

So I am, as usual, puzzled. Or up that famous creek with no visible
means of locomotion, apply as required.

The only thing I've noted in the slabinfo reports is the ext3_cache
was well into 6 digits in kilobytes. Now it's only 15,000 of its
normal units (whatever they are) after the reboot.

But now we have the BUG_ON stuff from above installed; maybe that
will disclose something we can use. That would brighten my mood,
which needs lots of help after watching my Shelty die today with no
vet help available. We will miss him, he was part of the family for
11 years.

-- 
Cheers, Gene
* Re: Possible dcache BUG 2004-08-15 4:09 ` Gene Heskett @ 2004-08-15 8:48 ` viro 2004-08-15 9:42 ` Gene Heskett ` (3 more replies) 0 siblings, 4 replies; 146+ messages in thread From: viro @ 2004-08-15 8:48 UTC (permalink / raw) To: Gene Heskett; +Cc: linux-kernel, Marcelo Tosatti, Linus Torvalds, Andrew Morton On Sun, Aug 15, 2004 at 12:09:44AM -0400, Gene Heskett wrote: > The only thing I've noted in the slabinfo reports is the ext3_cache > was well into 6 digits in kilobytes. Now its only 15,000 of its > normal units (whatever they are) after the reboot. What did dcache numbers look like at that time? Anyway, we could try the patch below and see what shows in /proc/fs/ext3 with it [NOTE: patch is completely untested]. It should show major:minor:inumber:mode for all currently allocated ext3 inodes. It won't be 100% accurate (we can miss some entries/get some twice if cache shrinks or grows at the time), but if the leak is so massive, we ought to see a *lot* of duplicates in there. Seeing what kind of inodes really leaks could narrow the things down. 
See if cat /proc/fs/ext3 | sort | uniq -c | sort -nr gives anything
interesting when leak happens (and check it right after boot to see if it
works at all and doesn't oops, obviously ;-)

diff -urN RC8-current/fs/ext3/super.c RC8-leak/fs/ext3/super.c
--- RC8-current/fs/ext3/super.c	Sat Aug 14 05:35:37 2004
+++ RC8-leak/fs/ext3/super.c	Sun Aug 15 04:41:09 2004
@@ -35,6 +35,8 @@
 #include <linux/mount.h>
 #include <linux/namei.h>
 #include <linux/quotaops.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
 #include <asm/uaccess.h>
 #include "xattr.h"
 #include "acl.h"
@@ -438,6 +440,9 @@
 
 static kmem_cache_t *ext3_inode_cachep;
 
+static LIST_HEAD(ext3_list);
+static spinlock_t ext3_list_lock = SPIN_LOCK_UNLOCKED;
+
 /*
  * Called inside transaction, so use GFP_NOFS
  */
@@ -453,11 +458,17 @@
 	ei->i_default_acl = EXT3_ACL_NOT_CACHED;
 #endif
 	ei->vfs_inode.i_version = 1;
+	spin_lock(&ext3_list_lock);
+	list_add(&ei->list, &ext3_list);
+	spin_unlock(&ext3_list_lock);
 	return &ei->vfs_inode;
 }
 
 static void ext3_destroy_inode(struct inode *inode)
 {
+	spin_lock(&ext3_list_lock);
+	list_del_init(&EXT3_I(inode)->list);
+	spin_unlock(&ext3_list_lock);
 	kmem_cache_free(ext3_inode_cachep, EXT3_I(inode));
 }
 
@@ -475,20 +486,82 @@
 		inode_init_once(&ei->vfs_inode);
 	}
 }
+
+static void *ext3_cache_start(struct seq_file *m, loff_t *pos)
+{
+	struct list_head *p;
+	loff_t l = *pos;
+
+	spin_lock(&ext3_list_lock);
+	list_for_each(p, &ext3_list)
+		if (!l--)
+			return list_entry(p, struct ext3_inode_info, list);
+	return NULL;
+}
+
+static void *ext3_cache_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct list_head *p = ((struct ext3_inode_info *)v)->list.next;
+	(*pos)++;
+	return p==&ext3_list ? NULL : list_entry(p, struct ext3_inode_info, list);
+}
+
+static void ext3_cache_stop(struct seq_file *m, void *v)
+{
+	spin_unlock(&ext3_list_lock);
+}
+
+static int ext3_cache_show(struct seq_file *m, void *v)
+{
+	struct ext3_inode_info *ei = v;
+	struct inode *inode = &ei->vfs_inode;
+	seq_printf(m, "%d:%d:%lu:%o",
+		MAJOR(inode->i_sb->s_dev),
+		MINOR(inode->i_sb->s_dev),
+		inode->i_ino,
+		inode->i_mode);
+	return 0;
+}
+
+static struct seq_operations ext3_cache_op = {
+	.start = ext3_cache_start,
+	.next = ext3_cache_next,
+	.stop = ext3_cache_stop,
+	.show = ext3_cache_show
+};
+
+static int ext3_cache_open(struct inode *inode, struct file *file)
+{
+	return seq_open(file, &ext3_cache_op);
+}
+
+static struct file_operations ext3_cache_operations = {
+	.open = ext3_cache_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = seq_release,
+};
 
 static int init_inodecache(void)
 {
+	struct proc_dir_entry *p;
 	ext3_inode_cachep = kmem_cache_create("ext3_inode_cache",
 					     sizeof(struct ext3_inode_info),
 					     0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT,
 					     init_once, NULL);
 	if (ext3_inode_cachep == NULL)
 		return -ENOMEM;
+	p = create_proc_entry("fs/ext3", S_IRUGO, NULL);
+	if (p) {
+		p->owner = THIS_MODULE;
+		p->proc_fops = &ext3_cache_operations;
+	}
 	return 0;
 }
 
 static void destroy_inodecache(void)
 {
+	remove_proc_entry("fs/ext3", NULL);
 	if (kmem_cache_destroy(ext3_inode_cachep))
 		printk(KERN_INFO "ext3_inode_cache: not all structures were freed\n");
 }
diff -urN RC8-current/include/linux/ext3_fs_i.h RC8-leak/include/linux/ext3_fs_i.h
--- RC8-current/include/linux/ext3_fs_i.h	Thu Oct  9 17:34:54 2003
+++ RC8-leak/include/linux/ext3_fs_i.h	Sun Aug 15 04:11:03 2004
@@ -107,6 +107,7 @@
 	 * by other means, so we have truncate_sem.
 	 */
 	struct semaphore truncate_sem;
+	struct list_head list;
 	struct inode vfs_inode;
 };

^ permalink raw reply	[flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-15  8:48 ` viro
@ 2004-08-15  9:42   ` Gene Heskett
  2004-08-15 17:31     ` Andrew Morton
  2004-08-15  9:50   ` Gene Heskett
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-15  9:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Sunday 15 August 2004 04:48, viro@parcelfarce.linux.theplanet.co.uk wrote:
>On Sun, Aug 15, 2004 at 12:09:44AM -0400, Gene Heskett wrote:
>> The only thing I've noted in the slabinfo reports is the
>> ext3_cache was well into 6 digits in kilobytes.  Now its only
>> 15,000 of its normal units (whatever they are) after the reboot.
>
>What did dcache numbers look like at that time?

IIRC the last time I checked before it locked up, dcache was in the
57xxx kilobytes area.  Right now, after about 5 6 hours uptime, that
line in raw format is:
dentry_cache 731159 772632
and:
ext3_inode_cache 1024365 1055817

Now, this morning's logwatch told me I should go look at the
logs again, and I found this had occurred several hours earlier:
-----------
Aug 14 18:53:24 coyote kernel: Unable to handle kernel paging request at virtual address 0058af03
Aug 14 18:53:24 coyote kernel:  printing eip:
Aug 14 18:53:24 coyote kernel: c01648bc
Aug 14 18:53:24 coyote kernel: *pde = 00000000
Aug 14 18:53:24 coyote kernel: Oops: 0002 [#1]
Aug 14 18:53:24 coyote kernel: PREEMPT
Aug 14 18:53:24 coyote kernel: Modules linked in: tuner tvaudio bttv video_buf btcx_risc eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd forcedeth sg
Aug 14 18:53:24 coyote kernel: CPU:    0
Aug 14 18:53:24 coyote kernel: EIP:    0060:[<c01648bc>]    Not tainted
Aug 14 18:53:24 coyote kernel: EFLAGS: 00010202   (2.6.8-rc4)
Aug 14 18:53:24 coyote kernel: EIP is at dispose_list+0x1c/0xa0
Aug 14 18:53:24 coyote kernel: eax: 0058aeff   ebx: ddfc9140   ecx: ddfc9148   edx: c198bef0
Aug 14 18:53:24 coyote kernel: esi: c198bef0   edi: 00000075   ebp: c198bed8   esp: c198bec0
Aug 14 18:53:24 coyote kernel: ds: 007b   es: 007b   ss: 0068
Aug 14 18:53:24 coyote kernel: Process kswapd0 (pid: 66, threadinfo=c198b000 task=c1978050)
Aug 14 18:53:24 coyote kernel: Stack: ddfc92e0 c198bec4 c198bec4 c198b000 cb2be2a0 00000080 c198bf04 c0164c37
Aug 14 18:53:24 coyote kernel:        c198bef0 c198b000 00000000 00000080 ddfc9148 cdf9b668 00000080 00000000
Aug 14 18:53:24 coyote kernel:        c198b000 c198bf10 c0164daf 00000080 c198bf44 c0139fd4 00000080 000000d0
Aug 14 18:53:24 coyote kernel: Call Trace:
Aug 14 18:53:24 coyote kernel:  [<c010476f>] show_stack+0x7f/0xa0
Aug 14 18:53:25 coyote kernel:  [<c0104908>] show_registers+0x158/0x1b0
Aug 14 18:53:25 coyote kernel:  [<c0104a89>] die+0x89/0x100
Aug 14 18:53:25 coyote kernel:  [<c0111725>] do_page_fault+0x1f5/0x553
Aug 14 18:53:25 coyote kernel:  [<c01043d9>] error_code+0x2d/0x38
Aug 14 18:53:25 coyote kernel:  [<c0164c37>] prune_icache+0xb7/0x1f0
Aug 14 18:53:25 coyote kernel:  [<c0164daf>] shrink_icache_memory+0x3f/0x50
Aug 14 18:53:25 coyote kernel:  [<c0139fd4>] shrink_slab+0x134/0x170
Aug 14 18:53:25 coyote kernel:  [<c013b25d>] balance_pgdat+0x1ad/0x1f0
Aug 14 18:53:25 coyote kernel:  [<c013b35f>] kswapd+0xbf/0xd0
Aug 14 18:53:25 coyote kernel:  [<c0102471>] kernel_thread_helper+0x5/0x14
Aug 14 18:53:25 coyote kernel: Code: 89 50 04 89 02 c7 41 04 00 02 20 00 c7 01 00 01 10 00 8b 83
-----------------
which was about 5 hours before the lockup.

>Anyway, we could try the patch below and see what shows in
> /proc/fs/ext3 with it [NOTE: patch is completely untested].  It
> should show major:minor:inumber:mode
>for all currently allocated ext3 inodes.  It won't be 100% accurate
> (we can miss some entries/get some twice if cache shrinks or grows
> at the time), but if the leak is so massive, we ought to see a
> *lot* of duplicates in there.  Seeing what kind of inodes really
> leaks could narrow the things down.
>
>See if cat /proc/fs/ext3 | sort | uniq -c | sort -nr gives anything
> interesting when leak happens (and check it right after boot to see
> if it works at all and doesn't oops, obviously ;-)

I'll put this in right now.  Thanks.

-- 
Cheers, Gene

^ permalink raw reply	[flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-15  9:42   ` Gene Heskett
@ 2004-08-15 17:31     ` Andrew Morton
  2004-08-15 17:58       ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: Andrew Morton @ 2004-08-15 17:31 UTC (permalink / raw)
  To: gene.heskett; +Cc: linux-kernel, viro, marcelo.tosatti, torvalds

Gene Heskett <gene.heskett@verizon.net> wrote:
>
> ...
>
>  Now, this mornings logwatch told me I should go look at the
>  logs again, and I found this had occurred several hours earlier:
>  -----------
>  Aug 14 18:53:24 coyote kernel: Unable to handle kernel paging request at virtual address 0058af03

This oops is the _cause_ of the out-of-memory condition.  The oopsing
process exited while holding shrinker_sem, so slab will never again be
shrunk.

Any observed behaviour after an oops is almost always uninteresting, and
usually misleading.

^ permalink raw reply	[flat|nested] 146+ messages in thread
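The failure mode described in the message above (a lock holder dies without
releasing, so every later try-lock fails and the guarded work never runs
again) can be sketched in userspace C.  This is a toy model under stated
assumptions, not the kernel's code: shrinker_lock, shrink_once() and
shrink_passes are hypothetical names, and a C11 atomic_flag stands in for
the real shrinker_sem.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for shrinker_sem: flag set means "held". */
static atomic_flag shrinker_lock = ATOMIC_FLAG_INIT;

/* Number of shrink passes that actually completed. */
static int shrink_passes;

/*
 * Try to run one shrink pass.  Returns true only if the pass completed.
 * With die_while_holding set, the caller "oopses" after taking the lock:
 * it returns without releasing, which is the situation described above.
 */
static bool shrink_once(bool die_while_holding)
{
	assert(shrink_passes >= 0);

	if (atomic_flag_test_and_set(&shrinker_lock))
		return false;		/* lock is held: skip this pass */

	if (die_while_holding)
		return false;		/* simulated oops: lock is leaked */

	shrink_passes++;		/* the actual cache shrinking would go here */
	atomic_flag_clear(&shrinker_lock);
	return true;
}
```

In this model, once shrink_once(true) has run, every later shrink_once(false)
returns false and shrink_passes stops growing, which matches the "slab will
never again be shrunk" observation.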
* Re: Possible dcache BUG 2004-08-15 17:31 ` Andrew Morton @ 2004-08-15 17:58 ` Gene Heskett 0 siblings, 0 replies; 146+ messages in thread From: Gene Heskett @ 2004-08-15 17:58 UTC (permalink / raw) To: linux-kernel; +Cc: Andrew Morton, viro, marcelo.tosatti, torvalds On Sunday 15 August 2004 13:31, Andrew Morton wrote: >Gene Heskett <gene.heskett@verizon.net> wrote: >> ... >> >> Now, this mornings logwatch told me I should go look at the >> logs again, and I found this had occurred several hours earlier: >> ----------- >> Aug 14 18:53:24 coyote kernel: Unable to handle kernel paging >> request at virtual address 0058af03 > >This oops is the _cause_ of the out-of-memory condition. The > oopsing process exitted while holding shrinker_sem, so slab will > never again be shrunk. > >Any observed behaviour after an oops is almost always uninteresting, > and usually misleading. Okaaaay, now what. See my post of 10 minutes ago, this top "top" took a SIGABRT exit. I posted the Oops, but now here is meminfo: [root@coyote linux-2.6.8-rc4]# cat /proc/meminfo MemTotal: 1035848 kB MemFree: 238016 kB Buffers: 98756 kB Cached: 491324 kB SwapCached: 0 kB Active: 343340 kB Inactive: 416908 kB HighTotal: 131008 kB HighFree: 252 kB LowTotal: 904840 kB LowFree: 237764 kB SwapTotal: 3857104 kB SwapFree: 3857104 kB Dirty: 56 kB Writeback: 0 kB Mapped: 229924 kB Slab: 27416 kB Committed_AS: 333992 kB PageTables: 3292 kB VmallocTotal: 114680 kB VmallocUsed: 19636 kB VmallocChunk: 94936 kB And slabinfo: [root@coyote linux-2.6.8-rc4]# cat /proc/slabinfo slabinfo - version: 2.0 # name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail> unix_sock 173 180 384 10 1 : tunables 54 27 0 : slabdata 18 18 0 tcp_tw_bucket 2 41 96 41 1 : tunables 120 60 0 : slabdata 1 1 0 tcp_bind_bucket 21 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0 tcp_open_request 0 0 64 61 1 : tunables 120 60 0 : 
slabdata 0 0 0 inet_peer_cache 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0 ip_fib_hash 10 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0 ip_dst_cache 12 30 256 15 1 : tunables 120 60 0 : slabdata 2 2 0 arp_cache 3 31 128 31 1 : tunables 120 60 0 : slabdata 1 1 0 raw4_sock 0 0 480 8 1 : tunables 54 27 0 : slabdata 0 0 0 udp_sock 2 8 480 8 1 : tunables 54 27 0 : slabdata 1 1 0 tcp_sock 31 32 1024 4 1 : tunables 54 27 0 : slabdata 8 8 0 flow_cache 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 mqueue_inode_cache 1 8 480 8 1 : tunables 54 27 0 : slabdata 1 1 0 udf_inode_cache 0 0 352 11 1 : tunables 54 27 0 : slabdata 0 0 0 smb_request 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 smb_inode_cache 2 12 320 12 1 : tunables 54 27 0 : slabdata 1 1 0 isofs_inode_cache 0 0 320 12 1 : tunables 54 27 0 : slabdata 0 0 0 fat_inode_cache 1 11 352 11 1 : tunables 54 27 0 : slabdata 1 1 0 ext2_inode_cache 0 0 416 9 1 : tunables 54 27 0 : slabdata 0 0 0 journal_handle 4 135 28 135 1 : tunables 120 60 0 : slabdata 1 1 0 journal_head 95 243 48 81 1 : tunables 120 60 0 : slabdata 3 3 0 revoke_table 12 290 12 290 1 : tunables 120 60 0 : slabdata 1 1 0 revoke_record 0 0 16 226 1 : tunables 120 60 0 : slabdata 0 0 0 ext3_inode_cache 20079 20088 448 9 1 : tunables 54 27 0 : slabdata 2232 2232 0 eventpoll_pwq 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0 eventpoll_epi 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 kioctx 0 0 160 25 1 : tunables 120 60 0 : slabdata 0 0 0 kiocb 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 dnotify_cache 172 370 20 185 1 : tunables 120 60 0 : slabdata 2 2 0 file_lock_cache 43 43 92 43 1 : tunables 120 60 0 : slabdata 1 1 0 fasync_cache 2 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0 shmem_inode_cache 5 10 384 10 1 : tunables 54 27 0 : slabdata 1 1 0 posix_timers_cache 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 uid_cache 7 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 sgpool-128 32 32 2048 2 1 : tunables 24 12 0 : slabdata 
16 16 0 sgpool-64 32 32 1024 4 1 : tunables 54 27 0 : slabdata 8 8 0 sgpool-32 32 32 512 8 1 : tunables 54 27 0 : slabdata 4 4 0 sgpool-16 32 45 256 15 1 : tunables 120 60 0 : slabdata 3 3 0 sgpool-8 32 62 128 31 1 : tunables 120 60 0 : slabdata 2 2 0 cfq_pool 64 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 crq_pool 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0 deadline_drq 0 0 48 81 1 : tunables 120 60 0 : slabdata 0 0 0 as_arq 65 65 60 65 1 : tunables 120 60 0 : slabdata 1 1 0 blkdev_ioc 63 185 20 185 1 : tunables 120 60 0 : slabdata 1 1 0 blkdev_queue 12 18 448 9 1 : tunables 54 27 0 : slabdata 2 2 0 blkdev_requests 52 52 152 26 1 : tunables 120 60 0 : slabdata 2 2 0 biovec-(256) 256 256 3072 2 2 : tunables 24 12 0 : slabdata 128 128 0 biovec-128 256 260 1536 5 2 : tunables 24 12 0 : slabdata 52 52 0 biovec-64 256 260 768 5 1 : tunables 54 27 0 : slabdata 52 52 0 biovec-16 256 260 192 20 1 : tunables 120 60 0 : slabdata 13 13 0 biovec-4 256 305 64 61 1 : tunables 120 60 0 : slabdata 5 5 0 biovec-1 318 452 16 226 1 : tunables 120 60 0 : slabdata 2 2 0 bio 342 366 64 61 1 : tunables 120 60 0 : slabdata 6 6 0 sock_inode_cache 210 220 352 11 1 : tunables 54 27 0 : slabdata 20 20 0 skbuff_head_cache 250 325 160 25 1 : tunables 120 60 0 : slabdata 13 13 0 sock 3 12 320 12 1 : tunables 54 27 0 : slabdata 1 1 0 proc_inode_cache 600 600 320 12 1 : tunables 54 27 0 : slabdata 50 50 0 sigqueue 117 135 148 27 1 : tunables 120 60 0 : slabdata 5 5 0 radix_tree_node 9073 9604 276 14 1 : tunables 54 27 0 : slabdata 686 686 0 bdev_cache 11 18 416 9 1 : tunables 54 27 0 : slabdata 2 2 0 mnt_cache 26 41 96 41 1 : tunables 120 60 0 : slabdata 1 1 0 inode_cache 2198 2198 288 14 1 : tunables 54 27 0 : slabdata 157 157 0 dentry_cache 33792 33796 140 28 1 : tunables 120 60 0 : slabdata 1207 1207 0 filp 2030 2125 160 25 1 : tunables 120 60 0 : slabdata 85 85 0 names_cache 16 16 4096 1 1 : tunables 24 12 0 : slabdata 16 16 0 idr_layer_cache 81 87 136 29 1 : tunables 120 60 0 : 
slabdata 3 3 0 buffer_head 76638 76707 48 81 1 : tunables 120 60 0 : slabdata 947 947 0 mm_struct 91 91 512 7 1 : tunables 54 27 0 : slabdata 13 13 0 vm_area_struct 7703 7896 84 47 1 : tunables 120 60 0 : slabdata 168 168 0 fs_cache 100 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 files_cache 90 90 416 9 1 : tunables 54 27 0 : slabdata 10 10 0 signal_cache 119 123 96 41 1 : tunables 120 60 0 : slabdata 3 3 0 sighand_cache 102 102 1312 3 1 : tunables 24 12 0 : slabdata 34 34 0 task_struct 110 110 1424 5 2 : tunables 24 12 0 : slabdata 22 22 0 anon_vma 1619 2035 8 407 1 : tunables 120 60 0 : slabdata 5 5 0 pgd 87 87 4096 1 1 : tunables 24 12 0 : slabdata 87 87 0 size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0 size-131072 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0 size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0 size-65536 1 1 65536 1 16 : tunables 8 4 0 : slabdata 1 1 0 size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0 size-32768 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0 size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0 size-16384 4 4 16384 1 4 : tunables 8 4 0 : slabdata 4 4 0 size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0 size-8192 11 11 8192 1 2 : tunables 8 4 0 : slabdata 11 11 0 size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 0 : slabdata 0 0 0 size-4096 180 180 4096 1 1 : tunables 24 12 0 : slabdata 180 180 0 size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 0 : slabdata 0 0 0 size-2048 162 186 2048 2 1 : tunables 24 12 0 : slabdata 93 93 0 size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 0 : slabdata 0 0 0 size-1024 124 124 1024 4 1 : tunables 54 27 0 : slabdata 31 31 0 size-512(DMA) 0 0 512 8 1 : tunables 54 27 0 : slabdata 0 0 0 size-512 184 448 512 8 1 : tunables 54 27 0 : slabdata 56 56 0 size-256(DMA) 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 size-256 180 450 256 15 1 : tunables 120 60 0 : slabdata 30 30 0 size-192(DMA) 0 0 192 20 1 : tunables 120 60 0 : slabdata 0 
0 0 size-192 100 100 192 20 1 : tunables 120 60 0 : slabdata 5 5 0 size-128(DMA) 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 size-128 1174 1209 128 31 1 : tunables 120 60 0 : slabdata 39 39 0 size-64(DMA) 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0 size-64 915 915 64 61 1 : tunables 120 60 0 : slabdata 15 15 0 size-32(DMA) 0 0 32 119 1 : tunables 120 60 0 : slabdata 0 0 0 size-32 1369 1428 32 119 1 : tunables 120 60 0 : slabdata 12 12 0 kmem_cache 124 124 128 31 1 : tunables 120 60 0 : slabdata 4 4 0 And a dmesg after a dmesg -c: returns an empty file. Next please? -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) 99.24% setiathome rank, not too shabby for a WV hillbilly Yahoo.com attorneys please note, additions to this message by Gene Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved. ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-15  8:48 ` viro
  2004-08-15  9:42   ` Gene Heskett
@ 2004-08-15  9:50   ` Gene Heskett
  2004-08-15 10:36     ` viro
  2004-08-15 10:10   ` Gene Heskett
  2004-08-16 22:52   ` Gene Heskett
  3 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-15  9:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Sunday 15 August 2004 04:48, viro@parcelfarce.linux.theplanet.co.uk wrote:
>On Sun, Aug 15, 2004 at 12:09:44AM -0400, Gene Heskett wrote:
>> The only thing I've noted in the slabinfo reports is the
>> ext3_cache was well into 6 digits in kilobytes.  Now its only
>> 15,000 of its normal units (whatever they are) after the reboot.
>

And I just noticed this go by during the build:
----------
fs/buffer.c: In function `remove_inode_buffers':
fs/buffer.c:1079: warning: ISO C90 forbids mixed declarations and code
----------
Do we need to address this?  It's a line immediately below the BUG_ON
patch that Marcelo had me put in most recently, and has probably been
there all along.

-- 
Cheers, Gene

^ permalink raw reply	[flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-15  9:50   ` Gene Heskett
@ 2004-08-15 10:36     ` viro
  0 siblings, 0 replies; 146+ messages in thread
From: viro @ 2004-08-15 10:36 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Sun, Aug 15, 2004 at 05:50:26AM -0400, Gene Heskett wrote:
> fs/buffer.c: In function `remove_inode_buffers':
> fs/buffer.c:1079: warning: ISO C90 forbids mixed declarations and code
> ----------
> Do we need to address this?  Its a line immediately below the BUG-ON
> patch that Marcelo had me put in most recently, and has probably been
> there all along.

No, it appeared when Marcelo put BUG_ON() before the declaration of a
local variable.  Not acceptable for merge into the tree, but OK for a
debugging patch.

^ permalink raw reply	[flat|nested] 146+ messages in thread
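The warning discussed in the message above is easy to reproduce outside the
kernel.  A minimal sketch, assuming gcc invoked with -std=c90 -pedantic (or
with -Wdeclaration-after-statement); CHECK() is a hypothetical stand-in for
BUG_ON(), and both functions are made up for illustration rather than taken
from fs/buffer.c:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's BUG_ON(): abort if cond is true. */
#define CHECK(cond) assert(!(cond))

/*
 * The check sits before a declaration.  A C90 compiler warns about the
 * declaration that follows it; C99 and later accept this silently.
 */
static size_t count_nonzero_mixed(const int *items, size_t n)
{
	CHECK(items == NULL);
	size_t count = 0;	/* declaration after a statement */
	size_t i;

	for (i = 0; i < n; i++)
		if (items[i] != 0)
			count++;
	return count;
}

/* Same logic with every declaration first: clean under C90 as well. */
static size_t count_nonzero_c90(const int *items, size_t n)
{
	size_t count = 0;
	size_t i;

	CHECK(items == NULL);
	for (i = 0; i < n; i++)
		if (items[i] != 0)
			count++;
	return count;
}
```

Both functions behave identically; only the position of the check relative to
the declarations differs, which is why the warning is harmless in a
throwaway debugging patch.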
* Re: Possible dcache BUG
  2004-08-15  8:48 ` viro
  2004-08-15  9:42   ` Gene Heskett
  2004-08-15  9:50   ` Gene Heskett
@ 2004-08-15 10:10   ` Gene Heskett
  2004-08-15 10:37     ` viro
  2004-08-16 22:52   ` Gene Heskett
  3 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-15 10:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Sunday 15 August 2004 04:48, viro@parcelfarce.linux.theplanet.co.uk wrote:
>On Sun, Aug 15, 2004 at 12:09:44AM -0400, Gene Heskett wrote:
>> The only thing I've noted in the slabinfo reports is the
>> ext3_cache was well into 6 digits in kilobytes.  Now its only
>> 15,000 of its normal units (whatever they are) after the reboot.
>
>What did dcache numbers look like at that time?
>
>Anyway, we could try the patch below and see what shows in
> /proc/fs/ext3 with it [NOTE: patch is completely untested].  It
> should show major:minor:inumber:mode
>for all currently allocated ext3 inodes.  It won't be 100% accurate
> (we can miss some entries/get some twice if cache shrinks or grows
> at the time), but if the leak is so massive, we ought to see a
> *lot* of duplicates in there.  Seeing what kind of inodes really
> leaks could narrow the things down.
>
>See if cat /proc/fs/ext3 | sort | uniq -c | sort -nr gives anything
> interesting when leak happens (and check it right after boot to see
> if it works at all and doesn't oops, obviously ;-)

It doesn't Oops when I use that line, but with 15,000+ entries spit out
all in one line of text, it's a bit hard to locate real duplicates.
But I think I see some right now!  Can this line be modified to spit
them out, one entry per line with all dups sorted to be adjacent?

-- 
Cheers, Gene

^ permalink raw reply	[flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-15 10:10   ` Gene Heskett
@ 2004-08-15 10:37     ` viro
  2004-08-15 10:42       ` Gene Heskett
       [not found]       ` <200408150704.49312.gene.heskett@verizon.net>
  0 siblings, 2 replies; 146+ messages in thread
From: viro @ 2004-08-15 10:37 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Sun, Aug 15, 2004 at 06:10:28AM -0400, Gene Heskett wrote:
> all in one line of text, its a bit hard to locate real duplicates.
> But I think I see some right now!  Can this line be modified to spit
> them out, one entry per line with all dups sorted to be adjacent?

Sure, just add \n in format here.  Sorry, hadn't noticed that...

> >+	seq_printf(m, "%d:%d:%lu:%o",

^ permalink raw reply	[flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-15 10:37     ` viro
@ 2004-08-15 10:42       ` Gene Heskett
  2004-08-15 11:00         ` viro
       [not found]       ` <200408150704.49312.gene.heskett@verizon.net>
  1 sibling, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-15 10:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Sunday 15 August 2004 06:37, viro@parcelfarce.linux.theplanet.co.uk wrote:
>On Sun, Aug 15, 2004 at 06:10:28AM -0400, Gene Heskett wrote:
>> all in one line of text, its a bit hard to locate real duplicates.
>> But I think I see some right now!  Can this line be modified to
>> spit them out, one entry per line with all dups sorted to be
>> adjacent?
>
>Sure, just add \n in format here.  Sorry, hadn't noticed that...
>
>> >+	seq_printf(m, "%d:%d:%lu:%o",

Can do, assume it would then be seq_printf(m, "%d:%d:%lu:%o\n"?

-- 
Cheers, Gene

^ permalink raw reply	[flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-15 10:42       ` Gene Heskett
@ 2004-08-15 11:00         ` viro
  0 siblings, 0 replies; 146+ messages in thread
From: viro @ 2004-08-15 11:00 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Sun, Aug 15, 2004 at 06:42:11AM -0400, Gene Heskett wrote:
> On Sunday 15 August 2004 06:37, viro@parcelfarce.linux.theplanet.co.uk
> wrote:
> >On Sun, Aug 15, 2004 at 06:10:28AM -0400, Gene Heskett wrote:
> >> all in one line of text, its a bit hard to locate real duplicates.
> >> But I think I see some right now!  Can this line be modified to
> >> spit them out, one entry per line with all dups sorted to be
> >> adjacent?
> >
> >Sure, just add \n in format here.  Sorry, hadn't noticed that...
> >
> >> >+	seq_printf(m, "%d:%d:%lu:%o",
>
> Can do, assume it would then be seq_printf(m, "%d:%d:%lu:%o\n"?

Yes

^ permalink raw reply	[flat|nested] 146+ messages in thread
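For reference, the record layout agreed on in the exchange above
(major:minor:inumber:mode, one record per line) can be sketched in userspace
C.  format_entry() is a hypothetical helper, not part of the patch, and it
uses snprintf() with unsigned arguments instead of the kernel's seq_printf():

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format one record as "major:minor:inumber:mode\n", with the mode in
 * octal, mirroring the corrected format string from the thread.
 * Returns the number of characters written (excluding the terminating
 * NUL), or -1 if the buffer is too small.
 */
static int format_entry(char *buf, size_t len, unsigned int major,
			unsigned int minor, unsigned long ino,
			unsigned int mode)
{
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "%u:%u:%lu:%o\n", major, minor, ino, mode);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```

The trailing newline is what makes the output usable with line-oriented tools
such as sort and uniq; without it, all records run together on one line.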
[parent not found: <200408150704.49312.gene.heskett@verizon.net>]
* Re: Possible dcache BUG
       [not found] ` <200408150704.49312.gene.heskett@verizon.net>
@ 2004-08-15 11:26 ` viro
  2004-08-15 17:47 ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: viro @ 2004-08-15 11:26 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Sun, Aug 15, 2004 at 07:04:49AM -0400, Gene Heskett wrote:
> On Sunday 15 August 2004 06:37, viro@parcelfarce.linux.theplanet.co.uk
> wrote:
> >On Sun, Aug 15, 2004 at 06:10:28AM -0400, Gene Heskett wrote:
> >> all in one line of text, it's a bit hard to locate real duplicates.
> >> But I think I see some right now! Can this line be modified to
> >> spit them out, one entry per line with all dups sorted to be
> >> adjacent?
> >
> >Sure, just add \n in format here. Sorry, hadn't noticed that...
> >
> >> >+ seq_printf(m, "%d:%d:%lu:%o\n",
>
> And here it is right after starting x on the reboot. (I take it the
> first number is the number of dups?)

Yes - uniq -c merges duplicates and puts the number of copies in front of
each line, so sort | uniq -c | sort -nr will sort by frequency and print each
line with the number of times it occurred.

You don't have any duplicates so far and the output looks OK...
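For readers without the shell pipeline at hand, the counting that sort | uniq -c | sort -nr performs can be sketched in Python. This is only an illustration, not part of the patch discussed in the thread: the function name `count_duplicates` and the sample lines are made up, and the input is assumed to be the major:minor:inumber:mode lines described earlier.

```python
from collections import Counter


def count_duplicates(lines):
    """Return (count, line) pairs, most frequent first.

    Roughly what `sort | uniq -c | sort -nr` reports: each distinct
    line together with the number of times it occurred.
    """
    counts = Counter(line.strip() for line in lines if line.strip())
    # Sort by descending count; break ties by line text so the order is stable.
    return sorted(((n, line) for line, n in counts.items()),
                  key=lambda pair: (-pair[0], pair[1]))


# Hypothetical sample in the major:minor:inumber:mode format.
sample = [
    "3:8:8227974:100644",
    "3:8:6360398:40755",
    "3:8:8227974:100644",
]

for n, line in count_duplicates(sample):
    print(f"{n:7d} {line}")
```

Any line whose count is greater than 1 is a duplicate in the sense used in this thread. The tie order here is by line text, which may differ from what sort -nr prints for equal counts.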
* Re: Possible dcache BUG 2004-08-15 11:26 ` viro @ 2004-08-15 17:47 ` Gene Heskett [not found] ` <200408152257.04773.vda@port.imtp.ilyichevsk.odessa.ua> 0 siblings, 1 reply; 146+ messages in thread From: Gene Heskett @ 2004-08-15 17:47 UTC (permalink / raw) To: linux-kernel; +Cc: viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton On Sunday 15 August 2004 07:26, viro@parcelfarce.linux.theplanet.co.uk wrote: >On Sun, Aug 15, 2004 at 07:04:49AM -0400, Gene Heskett wrote: >> On Sunday 15 August 2004 06:37, >> viro@parcelfarce.linux.theplanet.co.uk >> >> wrote: >> >On Sun, Aug 15, 2004 at 06:10:28AM -0400, Gene Heskett wrote: >> >> all in one line of text, its a bit hard to locate real >> >> duplicates. But I think I see some right now! Can this line be >> >> modified to spit them out, one entry per line with all dups >> >> sorted to be adjacent? >> > >> >Sure, just add \n in format here. Sorry, hadn't noticed that... >> > >> >> >+ seq_printf(m, "%d:%d:%lu:%o\n", >> >> And here it is right after starting x on the reboot. (I take it >> the first number is the number of dups?) > >Yes - uniq -c merges duplicates and puts the number of copies in > front of line, so sort | uniq -c | sort -nr will sort by frequency > and print each line with number of times it had occured. > >You don't have any duplicates so far and the output looks OK... And I still don't have any dups, but I AAARRRRGGGGGggg! 
do have this:

--------------
Aug 15 09:33:02 coyote kernel: Unable to handle kernel paging request at virtual address 5f746573
Aug 15 09:33:02 coyote kernel: printing eip:
Aug 15 09:33:02 coyote kernel: 5f746573
Aug 15 09:33:02 coyote kernel: *pde = 00000000
Aug 15 09:33:02 coyote kernel: Oops: 0000 [#1]
Aug 15 09:33:02 coyote kernel: PREEMPT
Aug 15 09:33:02 coyote kernel: Modules linked in: eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd forcedeth sg
Aug 15 09:33:02 coyote kernel: CPU: 0
Aug 15 09:33:02 coyote kernel: EIP: 0060:[<5f746573>] Not tainted
Aug 15 09:33:02 coyote kernel: EFLAGS: 00210006 (2.6.8-rc4)
Aug 15 09:33:02 coyote kernel: EIP is at 0x5f746573
Aug 15 09:33:02 coyote kernel: eax: f0679a18 ebx: 20262620 ecx: 00000000 edx: 00000001
Aug 15 09:33:02 coyote kernel: esi: 63617266 edi: 00000001 ebp: ee62dcf8 esp: ee62dcd8
Aug 15 09:33:02 coyote kernel: ds: 007b es: 007b ss: 0068
Aug 15 09:33:02 coyote kernel: Process top (pid: 2439, threadinfo=ee62d000 task=ed68c3b0)
Aug 15 09:33:02 coyote kernel: Stack: c0113378 f0679a18 00000001 00000000 00000000 ee62d000 00000000 00200286
Aug 15 09:33:02 coyote kernel: ee62dd20 c01133db f0678924 00000001 00000001 00000000 00000000 f0678000
Aug 15 09:33:02 coyote kernel: ee62dea8 ee52df3e ee62de0c c01ef5fe 00000000 0000001d 00020001 ffffffff
Aug 15 09:33:02 coyote kernel: Call Trace:
Aug 15 09:33:02 coyote kernel: [<c010476f>] show_stack+0x7f/0xa0
Aug 15 09:33:02 coyote kernel: [<c0104908>] show_registers+0x158/0x1b0
Aug 15 09:33:02 coyote kernel: [<c0104a89>] die+0x89/0x100
Aug 15 09:33:02 coyote kernel: [<c0111725>] do_page_fault+0x1f5/0x553
Aug 15 09:33:02 coyote kernel: [<c01043d9>] error_code+0x2d/0x38
Aug 15 09:33:02 coyote kernel: [<c01133db>] __wake_up+0x3b/0x70
Aug 15 09:33:02 coyote kernel: [<c01ef5fe>] n_tty_receive_buf+0x20e/0xf20
Aug 15 09:33:02 coyote kernel: [<c01f1e3a>] pty_write+0x12a/0x130
Aug 15 09:33:02 coyote kernel: [<c01eec7b>] opost_block+0xeb/0x1a0
Aug 15 09:33:02 coyote kernel: [<c01f0efc>] write_chan+0x18c/0x220
Aug 15 09:33:02 coyote kernel: [<c01eb9e7>] tty_write+0x1b7/0x250
Aug 15 09:33:02 coyote kernel: [<c014b7ca>] vfs_write+0xca/0x140
Aug 15 09:33:02 coyote kernel: [<c014b90b>] sys_write+0x4b/0x80
Aug 15 09:33:02 coyote kernel: [<c01041dd>] sysenter_past_esp+0x52/0x71
Aug 15 09:33:02 coyote kernel: Code: Bad EIP value.
Aug 15 09:33:02 coyote kernel: <6>note: top[2439] exited with preempt_count 2
Aug 15 09:33:02 coyote kernel: bad: scheduling while atomic!
Aug 15 09:33:02 coyote kernel: [<c01047ae>] dump_stack+0x1e/0x20
Aug 15 09:33:02 coyote kernel: [<c0305578>] schedule+0x478/0x480
Aug 15 09:33:02 coyote kernel: [<c013d209>] unmap_vmas+0x199/0x1b0
Aug 15 09:33:02 coyote kernel: [<c0141471>] exit_mmap+0x81/0x160
Aug 15 09:33:02 coyote kernel: [<c0114895>] mmput+0x65/0x90
Aug 15 09:33:02 coyote kernel: [<c0118ad3>] do_exit+0x153/0x430
Aug 15 09:33:02 coyote kernel: [<c0104af9>] die+0xf9/0x100
Aug 15 09:33:02 coyote kernel: [<c0111725>] do_page_fault+0x1f5/0x553
Aug 15 09:33:02 coyote kernel: [<c01043d9>] error_code+0x2d/0x38
Aug 15 09:33:02 coyote kernel: [<c01133db>] __wake_up+0x3b/0x70
Aug 15 09:33:02 coyote kernel: [<c01ef5fe>] n_tty_receive_buf+0x20e/0xf20
Aug 15 09:33:02 coyote kernel: [<c01f1e3a>] pty_write+0x12a/0x130
Aug 15 09:33:02 coyote kernel: [<c01eec7b>] opost_block+0xeb/0x1a0
Aug 15 09:33:02 coyote kernel: [<c01f0efc>] write_chan+0x18c/0x220
Aug 15 09:33:02 coyote kernel: [<c01eb9e7>] tty_write+0x1b7/0x250
Aug 15 09:33:02 coyote kernel: [<c014b7ca>] vfs_write+0xca/0x140
Aug 15 09:33:02 coyote kernel: [<c014b90b>] sys_write+0x4b/0x80
Aug 15 09:33:02 coyote kernel: [<c01041dd>] sysenter_past_esp+0x52/0x71
-------------------

And the shell I had a "top" running in on xwindow #2 had crashed with a SIGABRT.
This was about 10 minutes after I had gone out to make some more cement
blocks, which takes around 3 hours.

I was able to restart the shell, and the top. The system "feels" normal.

I'm going to call tcwo tomorrow and see what I can get in new hardware.
This is fscking ridiculous. I get a cpu/cooler/fan that runs 40C cooler
than the old one and it's doing nothing but crashing. The absolute longest
uptime so far was the recent run of nearly 37 hours.

--
Cheers, Gene
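One quick check sometimes made on an oops like the one above, where EIP itself is a wild address, is to see whether the bogus values read as ASCII text. The sketch below is an illustration added for that purpose, not something anyone in the thread ran. The helper name `as_ascii` is made up, and the byte order is assumed to be little-endian, as on the i386 machine in the report.

```python
def as_ascii(value, width=4):
    """Return the little-endian bytes of `value` as text, or None
    if any byte falls outside the printable ASCII range."""
    raw = value.to_bytes(width, "little")
    if all(0x20 <= b <= 0x7e for b in raw):
        return raw.decode("ascii")
    return None


# Register values copied from the oops above.
registers = {
    "eip": 0x5f746573,
    "ebx": 0x20262620,
    "esi": 0x63617266,
    "eax": 0xf0679a18,
}

for name, value in registers.items():
    print(f"{name}: {value:08x} -> {as_ascii(value)!r}")
```

Under that assumption eip, ebx and esi all decode to printable text while eax does not. That would be consistent with string data having been written over a pointer, but on its own it says nothing about what did the overwriting.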
[parent not found: <200408152257.04773.vda@port.imtp.ilyichevsk.odessa.ua>]
* Re: Possible dcache BUG [not found] ` <200408152257.04773.vda@port.imtp.ilyichevsk.odessa.ua> @ 2004-08-15 20:33 ` Gene Heskett [not found] ` <200408160803.15206.vda@port.imtp.ilyichevsk.odessa.ua> 0 siblings, 1 reply; 146+ messages in thread From: Gene Heskett @ 2004-08-15 20:33 UTC (permalink / raw) To: linux-kernel Cc: Denis Vlasenko, viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton On Sunday 15 August 2004 15:57, Denis Vlasenko wrote: >> And I still don't have any dups, but I AAARRRRGGGGGggg! do have >> this: >> >> -------------- >> Aug 15 09:33:02 coyote kernel: Unable to handle kernel paging >> request at virtual address 5f746573 Aug 15 09:33:02 coyote kernel: >> printing eip: Aug 15 09:33:02 coyote kernel: 5f746573 >> Aug 15 09:33:02 coyote kernel: *pde = 00000000 >> Aug 15 09:33:02 coyote kernel: Oops: 0000 [#1] >> Aug 15 09:33:02 coyote kernel: PREEMPT > > ^^^^^^^ > >> Aug 15 09:33:02 coyote kernel: Modules linked in: eeprom >> snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss >> snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm snd_timer >> snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd >> forcedeth > >Gene, you should have stopped using preempt/smp and sound modules >in an attempt to narrow down the bug. We already kinda determined >that you are experiencing random memory corruption, but hardware >was tested and seems to be ok. It's software, then. Preempt/smp bug >or buggy driver are prime suspects. Ok, non-preempt is building. Will reboot to it when the build is done. >> I was able to restart the shell, and the top. The system "feels" >> normal. >> >> I'm going to call tcwo tomorrow and see what I can get in new >> hardware. > >Very likely this won't help. I'm not quite as sure. This could be a mobo with a flakey buffer latch or something. I also had, many years ago, a z-80 that would not reliably switch its foreground/background register set. And guess what? 
By the time I'd diagnosed it, Zilog wasn't interested in replacing an
obviously flakey chip. Out of warranty according to the date stamps.
Not my problem it laid on some distrib's shelf for a frigging year plus...

>--
>vda

--
Cheers, Gene
[parent not found: <200408160803.15206.vda@port.imtp.ilyichevsk.odessa.ua>]
* Re: Possible dcache BUG [not found] ` <200408160803.15206.vda@port.imtp.ilyichevsk.odessa.ua> @ 2004-08-16 6:32 ` Gene Heskett 2004-08-16 14:13 ` Gene Heskett 0 siblings, 1 reply; 146+ messages in thread From: Gene Heskett @ 2004-08-16 6:32 UTC (permalink / raw) To: linux-kernel; +Cc: Denis Vlasenko, viro, Marcelo Tosatti On Monday 16 August 2004 01:03, Denis Vlasenko wrote: >On Sunday 15 August 2004 23:33, Gene Heskett wrote: >> On Sunday 15 August 2004 15:57, Denis Vlasenko wrote: >> >> And I still don't have any dups, but I AAARRRRGGGGGggg! do have >> >> this: >> >> >> >> -------------- >> >> Aug 15 09:33:02 coyote kernel: Unable to handle kernel paging >> >> request at virtual address 5f746573 Aug 15 09:33:02 coyote >> >> kernel: printing eip: Aug 15 09:33:02 coyote kernel: 5f746573 >> >> Aug 15 09:33:02 coyote kernel: *pde = 00000000 >> >> Aug 15 09:33:02 coyote kernel: Oops: 0000 [#1] >> >> Aug 15 09:33:02 coyote kernel: PREEMPT >> > >> > ^^^^^^^ >> > >> >> Aug 15 09:33:02 coyote kernel: Modules linked in: eeprom >> >> snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss >> >> snd_mixer_oss snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm >> >> snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi >> >> snd_seq_device snd forcedeth >> > >> >Gene, you should have stopped using preempt/smp and sound modules >> >in an attempt to narrow down the bug. We already kinda determined >> >that you are experiencing random memory corruption, but hardware >> >was tested and seems to be ok. It's software, then. Preempt/smp >> > bug or buggy driver are prime suspects. >> >> Ok, non-preempt is building. Will reboot to it when the build is >> done. > >Do not load sound modules too please, unless you absolutely need > sound. One thing at a time I think. Thats major surgery on modprobe.conf to disable that, plus a chkconfig alsasound off. 
I've noticed that with preempt off, my kde cursor motions are back to
using the mouse if I want to move it more than a word or so to hit a
typu and fix it. It's an effect that comes and goes, often in the same
message reply. X is running at -1 I think. Other than that (knock on
wood) it's running ok so far, but only 9h50m uptime.

>> >> I was able to restart the shell, and the top. The system
>> >> "feels" normal.
>> >>
>> >> I'm going to call tcwo tomorrow and see what I can get in new
>> >> hardware.
>> >
>> >Very likely this won't help.
>>
>> I'm not quite as sure. This could be a mobo with a flakey buffer
>> latch or something. I also had, many years ago, a z-80 that would
>
>GCC is likely to sometimes catch sig11 on such flakey hardware.
>You did not report anything like that, that's why I'm thinking
>hardware is ok.
>
>> not reliably switch its foreground/background register set. And
>> guess what? By the time I'd diagnosed it, Zilog wasn't interested
>> in replacing an obviously flakey chip. Out of warranty according
>> to the date stamps. Not my problem it laid on some distrib's shelf
>> for a frigging year plus...
>
>--
>vda

--
Cheers, Gene
* Re: Possible dcache BUG 2004-08-16 6:32 ` Gene Heskett @ 2004-08-16 14:13 ` Gene Heskett [not found] ` <200408161749.23663.vda@port.imtp.ilyichevsk.odessa.ua> 0 siblings, 1 reply; 146+ messages in thread From: Gene Heskett @ 2004-08-16 14:13 UTC (permalink / raw) To: linux-kernel; +Cc: Denis Vlasenko, viro, Marcelo Tosatti On Monday 16 August 2004 02:32, Gene Heskett wrote: >On Monday 16 August 2004 01:03, Denis Vlasenko wrote: >>On Sunday 15 August 2004 23:33, Gene Heskett wrote: >>> On Sunday 15 August 2004 15:57, Denis Vlasenko wrote: [...] >>> >Gene, you should have stopped using preempt/smp and sound >>> > modules in an attempt to narrow down the bug. We already kinda >>> > determined that you are experiencing random memory corruption, >>> > but hardware was tested and seems to be ok. It's software, >>> > then. Preempt/smp bug or buggy driver are prime suspects. >>> >>> Ok, non-preempt is building. Will reboot to it when the build is >>> done. >> >>Do not load sound modules too please, unless you absolutely need >> sound. > >One thing at a time I think. Thats major surgery on modprobe.conf > to disable that, plus a chkconfig alsasound off. > >I've noticed that with preempt off, my kde curser motions are back > to using the mouse if I want to move it more than a word or so to > hit a typu and fix it. Its an effect that comes and goes, often in > the same message reply. X is running at -1 I think. Other than > that (knock on wood) its running ok so far, but only 9h50m uptime. [...] With PREEMPT off, and a 16 hour uptime, I am suddenly nearly out of memory again. As an additional tool, I had started ksysguard for its gfx memory display and set it for a 1 minute update interval. When I awoke again, the memory panel was 100% blue since some major event, I assume logrotate by cron, ran but hadn't quite scrolled off screen. However, there is no swapping yet, and nothing unusual in the log. 
Here are /proc/meminfo: MemTotal: 1035956 kB MemFree: 14036 kB Buffers: 181044 kB Cached: 114024 kB SwapCached: 0 kB Active: 277684 kB Inactive: 148840 kB HighTotal: 131008 kB HighFree: 9408 kB LowTotal: 904948 kB LowFree: 4628 kB SwapTotal: 3857104 kB SwapFree: 3857104 kB Dirty: 12 kB Writeback: 0 kB Mapped: 202108 kB Slab: 584876 kB Committed_AS: 276216 kB PageTables: 3340 kB VmallocTotal: 114680 kB VmallocUsed: 19876 kB VmallocChunk: 94640 kB and /proc/slabinfo: slabinfo - version: 2.0 # name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail> unix_sock 200 200 384 10 1 : tunables 54 27 0 : slabdata 20 20 0 tcp_tw_bucket 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 tcp_bind_bucket 35 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0 tcp_open_request 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0 inet_peer_cache 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0 ip_fib_hash 10 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0 ip_dst_cache 15 15 256 15 1 : tunables 120 60 0 : slabdata 1 1 0 arp_cache 3 31 128 31 1 : tunables 120 60 0 : slabdata 1 1 0 raw4_sock 0 0 480 8 1 : tunables 54 27 0 : slabdata 0 0 0 udp_sock 2 8 480 8 1 : tunables 54 27 0 : slabdata 1 1 0 tcp_sock 32 32 1024 4 1 : tunables 54 27 0 : slabdata 8 8 0 flow_cache 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 mqueue_inode_cache 1 8 480 8 1 : tunables 54 27 0 : slabdata 1 1 0 udf_inode_cache 0 0 352 11 1 : tunables 54 27 0 : slabdata 0 0 0 smb_request 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 smb_inode_cache 2 12 320 12 1 : tunables 54 27 0 : slabdata 1 1 0 isofs_inode_cache 0 0 320 12 1 : tunables 54 27 0 : slabdata 0 0 0 fat_inode_cache 4 22 352 11 1 : tunables 54 27 0 : slabdata 2 2 0 ext2_inode_cache 0 0 416 9 1 : tunables 54 27 0 : slabdata 0 0 0 journal_handle 8 135 28 135 1 : tunables 120 60 0 : slabdata 1 1 0 journal_head 1114 3888 48 81 1 : tunables 120 60 0 : 
slabdata 48 48 0 revoke_table 12 290 12 290 1 : tunables 120 60 0 : slabdata 1 1 0 revoke_record 0 0 16 226 1 : tunables 120 60 0 : slabdata 0 0 0 ext3_inode_cache 1000246 1020249 448 9 1 : tunables 54 27 0 : slabdata 113361 113361 0 eventpoll_pwq 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0 eventpoll_epi 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 kioctx 0 0 160 25 1 : tunables 120 60 0 : slabdata 0 0 0 kiocb 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 dnotify_cache 172 185 20 185 1 : tunables 120 60 0 : slabdata 1 1 0 file_lock_cache 43 43 92 43 1 : tunables 120 60 0 : slabdata 1 1 0 fasync_cache 2 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0 shmem_inode_cache 6 10 384 10 1 : tunables 54 27 0 : slabdata 1 1 0 posix_timers_cache 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0 uid_cache 7 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 sgpool-128 32 32 2048 2 1 : tunables 24 12 0 : slabdata 16 16 0 sgpool-64 32 32 1024 4 1 : tunables 54 27 0 : slabdata 8 8 0 sgpool-32 32 32 512 8 1 : tunables 54 27 0 : slabdata 4 4 0 sgpool-16 32 45 256 15 1 : tunables 120 60 0 : slabdata 3 3 0 sgpool-8 32 62 128 31 1 : tunables 120 60 0 : slabdata 2 2 0 cfq_pool 64 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 crq_pool 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0 deadline_drq 0 0 48 81 1 : tunables 120 60 0 : slabdata 0 0 0 as_arq 65 65 60 65 1 : tunables 120 60 0 : slabdata 1 1 0 blkdev_ioc 73 185 20 185 1 : tunables 120 60 0 : slabdata 1 1 0 blkdev_queue 12 18 448 9 1 : tunables 54 27 0 : slabdata 2 2 0 blkdev_requests 52 52 152 26 1 : tunables 120 60 0 : slabdata 2 2 0 biovec-(256) 256 256 3072 2 2 : tunables 24 12 0 : slabdata 128 128 0 biovec-128 256 260 1536 5 2 : tunables 24 12 0 : slabdata 52 52 0 biovec-64 256 260 768 5 1 : tunables 54 27 0 : slabdata 52 52 0 biovec-16 256 260 192 20 1 : tunables 120 60 0 : slabdata 13 13 0 biovec-4 256 305 64 61 1 : tunables 120 60 0 : slabdata 5 5 0 biovec-1 320 452 16 226 1 : tunables 120 60 0 : slabdata 
2 2 0 bio 319 366 64 61 1 : tunables 120 60 0 : slabdata 6 6 0 sock_inode_cache 242 242 352 11 1 : tunables 54 27 0 : slabdata 22 22 0 skbuff_head_cache 235 450 160 25 1 : tunables 120 60 0 : slabdata 18 18 0 sock 3 12 320 12 1 : tunables 54 27 0 : slabdata 1 1 0 proc_inode_cache 571 600 320 12 1 : tunables 54 27 0 : slabdata 50 50 0 sigqueue 108 108 148 27 1 : tunables 120 60 0 : slabdata 4 4 0 radix_tree_node 10212 21182 276 14 1 : tunables 54 27 0 : slabdata 1513 1513 0 bdev_cache 11 18 416 9 1 : tunables 54 27 0 : slabdata 2 2 0 mnt_cache 26 41 96 41 1 : tunables 120 60 0 : slabdata 1 1 0 inode_cache 2371 2380 288 14 1 : tunables 54 27 0 : slabdata 170 170 0 dentry_cache 718370 781704 140 28 1 : tunables 120 60 0 : slabdata 27918 27918 0 filp 2145 2300 160 25 1 : tunables 120 60 0 : slabdata 92 92 0 names_cache 17 17 4096 1 1 : tunables 24 12 0 : slabdata 17 17 0 idr_layer_cache 81 87 136 29 1 : tunables 120 60 0 : slabdata 3 3 0 buffer_head 51836 80919 48 81 1 : tunables 120 60 0 : slabdata 999 999 0 mm_struct 98 98 512 7 1 : tunables 54 27 0 : slabdata 14 14 0 vm_area_struct 7852 8272 84 47 1 : tunables 120 60 0 : slabdata 176 176 0 fs_cache 103 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0 files_cache 99 99 416 9 1 : tunables 54 27 0 : slabdata 11 11 0 signal_cache 123 123 96 41 1 : tunables 120 60 0 : slabdata 3 3 0 sighand_cache 111 111 1312 3 1 : tunables 24 12 0 : slabdata 37 37 0 task_struct 115 120 1424 5 2 : tunables 24 12 0 : slabdata 24 24 0 anon_vma 1796 2035 8 407 1 : tunables 120 60 0 : slabdata 5 5 0 pgd 90 90 4096 1 1 : tunables 24 12 0 : slabdata 90 90 0 size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0 size-131072 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0 size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0 size-65536 1 1 65536 1 16 : tunables 8 4 0 : slabdata 1 1 0 size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0 size-32768 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0 size-16384(DMA) 
0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0 size-16384 10 10 16384 1 4 : tunables 8 4 0 : slabdata 10 10 0 size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0 size-8192 9 9 8192 1 2 : tunables 8 4 0 : slabdata 9 9 0 size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 0 : slabdata 0 0 0 size-4096 191 191 4096 1 1 : tunables 24 12 0 : slabdata 191 191 0 size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 0 : slabdata 0 0 0 size-2048 172 192 2048 2 1 : tunables 24 12 0 : slabdata 96 96 0 size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 0 : slabdata 0 0 0 size-1024 145 164 1024 4 1 : tunables 54 27 0 : slabdata 41 41 0 size-512(DMA) 0 0 512 8 1 : tunables 54 27 0 : slabdata 0 0 0 size-512 184 448 512 8 1 : tunables 54 27 0 : slabdata 56 56 0 size-256(DMA) 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0 size-256 165 435 256 15 1 : tunables 120 60 0 : slabdata 29 29 0 size-192(DMA) 0 0 192 20 1 : tunables 120 60 0 : slabdata 0 0 0 size-192 120 120 192 20 1 : tunables 120 60 0 : slabdata 6 6 0 size-128(DMA) 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0 size-128 1231 1271 128 31 1 : tunables 120 60 0 : slabdata 41 41 0 size-64(DMA) 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0 size-64 32409 33123 64 61 1 : tunables 120 60 0 : slabdata 543 543 0 size-32(DMA) 0 0 32 119 1 : tunables 120 60 0 : slabdata 0 0 0 size-32 1428 1428 32 119 1 : tunables 120 60 0 : slabdata 12 12 0 kmem_cache 124 124 128 31 1 : tunables 120 60 0 : slabdata 4 4 0 Note the size-64, dentry_cache and ext3_inode_cache lines Now if I can remember that shell line to check /proc/fs/ext3 for dups: Unforch, that doesn't want to work, cat is using 90% of the cpu, and the command line: cat /proc/fs/ext3|sort|uniq -c|sort -nr is hung. But it will ctl-c. Humm, cat /proc/fs/ext3 by itself is running, its just got so much data that I ctl-c'd it after 1 minute. This may be an interesting report IF it ever gets done. But at 10 megs for shell history I may have to redo it directed to a file! 
Yes, at least 10 megs scrolled off the end of the scrollback buffer.
However, as I watched it scrolling, I never saw the first digit change
to a non-1 value.

Odd effect, the cpu temp is falling, by about 5C. And with only 7 megs
free according to top, it's still not swapping!

The file is just short of 24 megs. Now to grep it for errors.
Aha! There are some non-1 first digit values in that file!

[root@coyote linux-2.6.8-rc4]# grep ' 2 ' /ext3-allocs
      2 3:8:8227974:100644
      2 3:8:8227973:100644
      2 3:8:8227972:100644
      2 3:8:8227971:100644
      2 3:8:8193936:100644
      2 3:8:8193935:100644
      2 3:8:8193934:100644
      2 3:8:8193738:100644
      2 3:8:7834144:100644
      2 3:8:7834143:100644
      2 3:8:7684604:100644
      2 3:8:7521425:100644
      2 3:8:7521411:100644
      2 3:8:6360398:40755
      2 3:8:6013120:40755
      2 3:8:6013101:40755
      2 3:8:5982111:40755
      2 3:8:5982098:40755
      2 3:8:5982088:40775
      2 3:8:5949697:40777
      2 3:8:5949683:40777
      2 3:8:5947892:42755
      2 3:8:5947890:42755
      2 3:8:5915386:42755
      2 3:8:5915379:42755
      2 3:8:5901299:42755
      2 3:8:5901289:42755
      2 3:8:5835169:42777
      2 3:8:5835162:40755
      2 3:8:5835159:40755
      2 3:8:1250790:100644
      2 3:8:1250789:100644

However, that's the end of it:

[root@coyote linux-2.6.8-rc4]# grep ' 3 ' /ext3-allocs
[root@coyote linux-2.6.8-rc4]# grep ' 4 ' /ext3-allocs
[root@coyote linux-2.6.8-rc4]# grep ' 5 ' /ext3-allocs
[root@coyote linux-2.6.8-rc4]# grep ' 6 ' /ext3-allocs
[root@coyote linux-2.6.8-rc4]# grep ' 7 ' /ext3-allocs
[root@coyote linux-2.6.8-rc4]# grep ' 8 ' /ext3-allocs
[root@coyote linux-2.6.8-rc4]# grep ' 9 ' /ext3-allocs
[root@coyote linux-2.6.8-rc4]# grep ' 10 ' /ext3-allocs

So now we have an odor of a problem, the question is what does
it smell like? What can I do next to shine a light on this?

--
Cheers, Gene
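To put the three slabinfo lines singled out above next to the Slab figure from /proc/meminfo, the memory held by a cache can be estimated as num_slabs times pagesperslab times the page size. The sketch below is only an illustration: the helper name `slab_kib` is made up, the field positions follow the "slabinfo - version: 2.0" header quoted in the message, and a 4 KiB page size is assumed for this i386 machine.

```python
PAGE_KIB = 4  # assumed page size in KiB (i386)


def slab_kib(row, page_kib=PAGE_KIB):
    """Return (cache name, KiB held) for one slabinfo 2.0 data row."""
    fields = row.split()
    name = fields[0]
    pages_per_slab = int(fields[5])   # <pagesperslab>
    num_slabs = int(fields[14])       # <num_slabs>
    return name, num_slabs * pages_per_slab * page_kib


# Rows copied from the /proc/slabinfo dump above.
rows = [
    "ext3_inode_cache 1000246 1020249 448 9 1 : tunables 54 27 0 : slabdata 113361 113361 0",
    "dentry_cache 718370 781704 140 28 1 : tunables 120 60 0 : slabdata 27918 27918 0",
    "size-64 32409 33123 64 61 1 : tunables 120 60 0 : slabdata 543 543 0",
]

SLAB_TOTAL_KIB = 584876  # the "Slab:" line from /proc/meminfo above

usage = dict(slab_kib(r) for r in rows)
for name, kib in usage.items():
    print(f"{name:18s} {kib:8d} KiB  ({100 * kib / SLAB_TOTAL_KIB:.1f}% of Slab)")
```

On these assumptions ext3_inode_cache and dentry_cache together account for the bulk of the Slab figure, and size-64 is small by comparison. The calculation only shows where the memory sits, not whether anything is leaking.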
[parent not found: <200408161749.23663.vda@port.imtp.ilyichevsk.odessa.ua>]
* Re: Possible dcache BUG
       [not found] ` <200408161749.23663.vda@port.imtp.ilyichevsk.odessa.ua>
@ 2004-08-16 15:25 ` Gene Heskett
  0 siblings, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-16 15:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Denis Vlasenko, viro, Marcelo Tosatti

On Monday 16 August 2004 10:49, Denis Vlasenko wrote:
>> >>> Ok, non-preempt is building. Will reboot to it when the build
>> >>> is done.
>> >>
>> >>Do not load sound modules too please, unless you absolutely need
>> >> sound.
>> >
>> >One thing at a time I think. That's major surgery on
>> > modprobe.conf to disable that, plus a chkconfig alsasound off.
>> >
>> >I've noticed that with preempt off, my kde cursor motions are
>> > back to using the mouse if I want to move it more than a word or
>> > so to hit a typu and fix it. It's an effect that comes and goes,
>> > often in the same message reply. X is running at -1 I think.
>> > Other than that (knock on wood) it's running ok so far, but only
>> > 9h50m uptime.
>>
>> [...]
>> With PREEMPT off, and a 16 hour uptime, I am suddenly nearly out
>> of memory again. As an additional tool, I had started ksysguard
>> for its
>
>That depends on what you call "out of memory". It's normal for Linux
>to have very little free memory. top shows on my 256Mb home box:
>
>224 processes: 223 sleeping, 1 running, 0 zombie, 0 stopped
>CPU states: 2.9% user 13.8% system 0.0% nice 0.1% iowait 82.9% idle
>Mem:  254936k av, 252736k used, 2200k free, 0k shrd, 38872k buff
>                                ^^^^^^^^^^
>      197796k active, 31396k inactive
>Swap: 262136k av, 0k used, 262136k free 95868k cached
>                                        ^^^^^^^^^^^^^
>
>Of course, when you're fresh after reboot, you do have
>tons of free memory. How quickly cache will fill your RAM
>depends on RAM amount and your usage pattern.
>With 1Gig of RAM and mild usage it can take e.g. 16 hours or so. ;)
>
>> gfx memory display and set it for a 1 minute update interval.
>> When I awoke again, the memory panel was 100% blue since some
>> major event, I assume logrotate by cron, ran but hadn't quite
>> scrolled off screen.
>
>Quite possibly. Reboot, run "grep -rF 'something' ." in a kernel
> tree and see your RAM quickly filled with cache.
>
>> However, there is no swapping yet, and nothing unusual in the log.
>> Here are /proc/meminfo:
>> MemTotal: 1035956 kB
>> Buffers: 181044 kB
>
>I'm not sure. Maybe this is a bit high. Other values look ok.
>
>> and /proc/slabinfo:
>> slabinfo - version: 2.0
>> # name <active_objs> <num_objs> <objsize> <objperslab>
>> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> :
>> slabdata <active_slabs> <num_slabs> <sharedavail>
>>
>> Note the size-64, dentry_cache and ext3_inode_cache lines
>
>Yes, dentry_cache and ext3_inode_cache are filesystem cache.
>So far, nothing looks wrong.
>
>> Now if I can remember that shell line to check /proc/fs/ext3
>> for dups:
>
>Sorry, I am all-reiserfs now. ;]

I believe it was Viro that sent me a patch that instrumented the inode
handling of ext3. These are the results of that, otherwise
/proc/fs/ext3 doesn't exist.

>> [root@coyote linux-2.6.8-rc4]# grep ' 2 ' /ext3-allocs
>> 2 3:8:8227974:100644
>> 2 3:8:8227973:100644
>> 2 3:8:8227972:100644
>> 2 3:8:8227971:100644
>> 2 3:8:8193936:100644
>> 2 3:8:8193935:100644
>> 2 3:8:8193934:100644
>> 2 3:8:8193738:100644
>> 2 3:8:7834144:100644
>> 2 3:8:7834143:100644
>> 2 3:8:7684604:100644
>> 2 3:8:7521425:100644
>> 2 3:8:7521411:100644
>> 2 3:8:6360398:40755
>> 2 3:8:6013120:40755
>> 2 3:8:6013101:40755
>> 2 3:8:5982111:40755
>> 2 3:8:5982098:40755
>> 2 3:8:5982088:40775
>> 2 3:8:5949697:40777
>> 2 3:8:5949683:40777
>> 2 3:8:5947892:42755
>> 2 3:8:5947890:42755
>> 2 3:8:5915386:42755
>> 2 3:8:5915379:42755
>> 2 3:8:5901299:42755
>> 2 3:8:5901289:42755
>> 2 3:8:5835169:42777
>> 2 3:8:5835162:40755
>> 2 3:8:5835159:40755
>> 2 3:8:1250790:100644
>> 2 3:8:1250789:100644

The first ' 2 ' represents an error in that only one process should
have allocated that block of ram _1_ time only for that particular
inode. This could, but hasn't yet that I know of, lead to disk
corruption I'm sure. These are the most thoroughly e2fsck'd drives on
the planet I'd think. Somewhat more than an average of once daily now
for several weeks. I'd also bet a cold one that if I typed reboot, I
would end up having to use the reset button to finish the job at some
point, it's a 75% sure thing.

And, wonder of wonders, this list has self-shortened!: And still no
Oops either. The number of leading 2's had gone down the next time I
ran that line. I'd taken a shower between runs but didn't know it would
clean this up too :-)

Then reading up on grep, I used this command line the next time:

#>cat /proc/fs/ext3|sort|uniq -c|sort -nr|grep -v ' 1 ' >/ext3-allocs-bad;cat /ext3-allocs-bad

and got this, a much shorter list:

      2 3:8:8405754:40775
      2 3:8:7850178:100644
      2 3:8:7816153:100644
      2 3:8:7816152:100644
      2 3:8:7803727:100644
      2 3:8:7803726:100644
      2 3:8:7803033:100644
      2 3:8:7684502:100644
      2 3:8:7407284:100644

But I still don't know enough about this to point any fingers at
anything.

[... old list]

>> So now we have an odor of a problem, the question is what does
>> it smell like? What can I do next to shine a light on this?

Questions still valid IMO.

>--
>vda

--
Cheers & thanks Denis, Gene
* Re: Possible dcache BUG 2004-08-15 8:48 ` viro ` (2 preceding siblings ...) 2004-08-15 10:10 ` Gene Heskett @ 2004-08-16 22:52 ` Gene Heskett 2004-08-16 23:01 ` viro 3 siblings, 1 reply; 146+ messages in thread From: Gene Heskett @ 2004-08-16 22:52 UTC (permalink / raw) To: linux-kernel; +Cc: viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton On Sunday 15 August 2004 04:48, viro@parcelfarce.linux.theplanet.co.uk wrote: >On Sun, Aug 15, 2004 at 12:09:44AM -0400, Gene Heskett wrote: >> The only thing I've noted in the slabinfo reports is the >> ext3_cache was well into 6 digits in kilobytes. Now its only >> 15,000 of its normal units (whatever they are) after the reboot. > >What did dcache numbers look like at that time? > >Anyway, we could try the patch below and see what shows in > /proc/fs/ext3 with it [NOTE: patch is completely untested]. It > should show major:minor:inumber:mode >for all currently allocated ext3 inodes. It won't be 100% accurate > (we can miss some entries/get some twice if cache shrinks or grows > at the time), but if the leak is so massive, we ought to see a > *lot* of duplicates in there. Seeing what kind of inodes really > leaks could narrow the things down. Well, I am seing some dups, but they are so volatile that no two runs will report the same allocations as dups, and its never more than 2 using /proc/fs/ext3 | sort | uniq -c | sort -nr |grep -v ' 1 ' Consecutive runs will show anywhere from 3 to 10 or 12 dups, but never is an address repeated between runs. How is this to be interpreted? FWIW, I'm now up 25 hours, with PREEMPT off. No Oops's yet. -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) 99.24% setiathome rank, not too shabby for a WV hillbilly Yahoo.com attorneys please note, additions to this message by Gene Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved. 
* Re: Possible dcache BUG
  2004-08-16 22:52 ` Gene Heskett
@ 2004-08-16 23:01 ` viro
  2004-08-17  4:44   ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: viro @ 2004-08-16 23:01 UTC
  To: Gene Heskett; +Cc: linux-kernel, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Mon, Aug 16, 2004 at 06:52:50PM -0400, Gene Heskett wrote:
> Well, I am seeing some dups, but they are so volatile that no two runs
> will report the same allocations as dups, and it's never more than 2
> using cat /proc/fs/ext3 | sort | uniq -c | sort -nr | grep -v ' 1 '
>
> Consecutive runs will show anywhere from 3 to 10 or 12 dups, but never
> is an address repeated between runs.
>
> How is this to be interpreted?

That's OK.  Keep in mind that you have a *lot* of these guys and your
cat(1) makes a lot of read(2) calls.  So what you see is

<starting to read>
<see inode #n that is about to be evicted>
<read some more>
<inode #n gets evicted, quite possibly - due to memory pressure from cat(1) or sort(1)>
<read more>
<somebody wants the same inode again>
<read more>
<see the inode #n we'd just had read from disk again>

So a few duplicates are all right.
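The distinction drawn above (a handful of transient duplicates versus a real leak) can be checked mechanically by comparing snapshots. What follows is only a minimal sketch: it assumes each snapshot is a list of `major:minor:inumber:mode` lines as produced by the untested /proc/fs/ext3 patch, the two sample runs are made up for illustration (the entries are borrowed from the list posted earlier in the thread), and the helper names are hypothetical. It counts by exact match, which is the intent of the `sort | uniq -c | grep -v ' 1 '` pipeline rather than a literal port of it.

```python
from collections import Counter


def duplicate_entries(lines):
    """Return {entry: count} for entries that appear more than once."""
    counts = Counter(line.strip() for line in lines if line.strip())
    return {entry: n for entry, n in counts.items() if n > 1}


def persistent_duplicates(*snapshots):
    """Return the entries that are duplicated in every snapshot."""
    dup_sets = [set(duplicate_entries(s)) for s in snapshots]
    return set.intersection(*dup_sets) if dup_sets else set()


# Hypothetical snapshots: each run has one duplicate, but a different one.
run1 = [
    "3:8:7850178:100644",
    "3:8:7816153:100644",
    "3:8:7850178:100644",
    "3:8:8405754:40775",
]
run2 = [
    "3:8:7850178:100644",
    "3:8:7816153:100644",
    "3:8:7816153:100644",
    "3:8:8405754:40775",
]

print(duplicate_entries(run1))
print(persistent_duplicates(run1, run2))
```

If the set of persistent duplicates stays empty across runs and no count rises above 2, that matches the benign eviction-and-reread race described above; a leak of the kind being hunted would more plausibly show the same entries repeated run after run, with large counts.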
* Re: Possible dcache BUG
  2004-08-16 23:01 ` viro
@ 2004-08-17  4:44 ` Gene Heskett
  2004-08-17  4:58   ` Nick Piggin
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-17  4:44 UTC
  To: linux-kernel; +Cc: viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Monday 16 August 2004 19:01, viro@parcelfarce.linux.theplanet.co.uk wrote:
>On Mon, Aug 16, 2004 at 06:52:50PM -0400, Gene Heskett wrote:
>> Well, I am seeing some dups, but they are so volatile that no two
>> runs will report the same allocations as dups, and it's never more
>> than 2 using cat /proc/fs/ext3 | sort | uniq -c | sort -nr | grep -v ' 1 '
>>
>> Consecutive runs will show anywhere from 3 to 10 or 12 dups, but
>> never is an address repeated between runs.
>>
>> How is this to be interpreted?
>
>That's OK.  Keep in mind that you have a *lot* of these guys and
>your cat(1) makes a lot of read(2) calls.  So what you see is
>
><starting to read>
><see inode #n that is about to be evicted>
><read some more>
><inode #n gets evicted, quite possibly - due to memory pressure from cat(1) or sort(1)>
><read more>
><somebody wants the same inode again>
><read more>
><see the inode #n we'd just had read from disk again>
>
>So a few duplicates are all right.

I hope so.  I've got a real hoodoozy here: with memory nearly
exhausted (well, maybe 30 megs left) when my nightly run of rsync
started, everything came to a grinding halt.  I couldn't even get to
the screen the tail -f on the log was running in, but after walking
away for 10 minutes, I can once again.

However, things seem to be partially functional, so I'm going to see
if I can do some cut-n-paste from the log screen to here, but I
probably can't send it, as sendmail was one of the items the OOM
killer killed.

According to top, I'm about 250 megs into the swap, very suddenly.
No swap was in use at 23:55 local.

-------
Aug 17 00:02:00 coyote kernel: kjournald starting.
Commit interval 5 seconds Aug 17 00:02:00 coyote kernel: EXT3 FS on hdb3, internal journal Aug 17 00:02:00 coyote kernel: EXT3-fs: mounted filesystem with ordered data mode. Aug 17 00:11:55 coyote kernel: oom-killer: gfp_mask=0xd0 Aug 17 00:11:55 coyote kernel: DMA per-cpu: Aug 17 00:11:55 coyote kernel: cpu 0 hot: low 2, high 6, batch 1 Aug 17 00:11:55 coyote kernel: cpu 0 cold: low 0, high 2, batch 1 Aug 17 00:11:55 coyote kernel: Normal per-cpu: Aug 17 00:11:55 coyote kernel: cpu 0 hot: low 32, high 96, batch 16 Aug 17 00:11:55 coyote kernel: cpu 0 cold: low 0, high 32, batch 16 Aug 17 00:11:55 coyote kernel: HighMem per-cpu: Aug 17 00:11:55 coyote kernel: cpu 0 hot: low 14, high 42, batch 7 Aug 17 00:11:55 coyote kernel: cpu 0 cold: low 0, high 14, batch 7 Aug 17 00:11:55 coyote kernel: Aug 17 00:11:55 coyote kernel: Free pages: 4308kB (532kB HighMem) Aug 17 00:11:55 coyote kernel: Active:31159 inactive:1039 dirty:0 writeback:28 unstable:0 free:1077 slab:222946 mapped:30766 pagetables:944 Aug 17 00:11:55 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB Aug 17 00:11:56 coyote kernel: protections[]: 8 476 540 Aug 17 00:11:56 coyote kernel: Normal free:1872kB min:936kB low:1872kB high:2808kB active:0kB inactive:2420kB present:901120kB Aug 17 00:11:56 coyote kernel: protections[]: 0 468 532 Aug 17 00:11:56 coyote kernel: HighMem free:532kB min:128kB low:256kB high:384kB active:124636kB inactive:1736kB present:131008kB Aug 17 00:11:56 coyote kernel: protections[]: 0 0 64 Aug 17 00:12:00 coyote kernel: DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB Aug 17 00:12:00 coyote kernel: Normal: 12*4kB 2*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB0*4096kB = 1872kB Aug 17 00:12:00 coyote kernel: HighMem: 51*4kB 11*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 532kB Aug 17 00:12:01 coyote kernel: Swap cache: add 94539, 
delete 86334, find 14429/21141, race 0+0 Aug 17 00:12:01 coyote kernel: Out of Memory: Killed process 2239 (httpd). Aug 17 00:12:01 coyote kernel: oom-killer: gfp_mask=0xd0 Aug 17 00:12:01 coyote kernel: DMA per-cpu: Aug 17 00:12:01 coyote kernel: cpu 0 hot: low 2, high 6, batch 1 Aug 17 00:12:01 coyote kernel: cpu 0 cold: low 0, high 2, batch 1 Aug 17 00:12:01 coyote kernel: Normal per-cpu: Aug 17 00:12:01 coyote kernel: cpu 0 hot: low 32, high 96, batch 16 Aug 17 00:12:01 coyote kernel: cpu 0 cold: low 0, high 32, batch 16 Aug 17 00:12:01 coyote kernel: HighMem per-cpu: Aug 17 00:12:01 coyote kernel: cpu 0 hot: low 14, high 42, batch 7 Aug 17 00:12:01 coyote kernel: cpu 0 cold: low 0, high 14, batch 7 Aug 17 00:12:01 coyote kernel: Aug 17 00:12:01 coyote kernel: Free pages: 4280kB (504kB HighMem) Aug 17 00:12:01 coyote kernel: Active:31668 inactive:498 dirty:0 writeback:0 unstable:0 free:1070 slab:222978 mapped:31113 pagetables:935 Aug 17 00:12:01 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB Aug 17 00:12:01 coyote kernel: protections[]: 8 476 540 Aug 17 00:12:02 coyote kernel: Normal free:1872kB min:936kB low:1872kB high:2808kB active:1192kB inactive:1104kB present:901120kB Aug 17 00:12:02 coyote kernel: protections[]: 0 468 532 Aug 17 00:12:02 coyote kernel: HighMem free:504kB min:128kB low:256kB high:384kB active:125480kB inactive:888kB present:131008kB Aug 17 00:12:02 coyote kernel: protections[]: 0 0 64 Aug 17 00:12:02 coyote kernel: DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB Aug 17 00:12:02 coyote kernel: Normal: 6*4kB 5*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1872kB Aug 17 00:12:02 coyote kernel: HighMem: 10*4kB 28*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 504kB Aug 17 00:12:02 coyote kernel: Swap cache: add 95383, delete 87073, find 14612/21472, race 0+0 Aug 17 00:12:02 coyote 
kernel: Out of Memory: Killed process 2240 (httpd). Aug 17 00:12:05 coyote kernel: oom-killer: gfp_mask=0xd0 Aug 17 00:12:05 coyote kernel: DMA per-cpu: Aug 17 00:12:05 coyote kernel: cpu 0 hot: low 2, high 6, batch 1 Aug 17 00:12:05 coyote kernel: cpu 0 cold: low 0, high 2, batch 1 Aug 17 00:12:05 coyote kernel: Normal per-cpu: Aug 17 00:12:05 coyote kernel: cpu 0 hot: low 32, high 96, batch 16 Aug 17 00:12:05 coyote kernel: cpu 0 cold: low 0, high 32, batch 16 Aug 17 00:12:05 coyote kernel: HighMem per-cpu: Aug 17 00:12:05 coyote kernel: cpu 0 hot: low 14, high 42, batch 7 Aug 17 00:12:05 coyote kernel: cpu 0 cold: low 0, high 14, batch 7 Aug 17 00:12:05 coyote kernel: Aug 17 00:12:05 coyote kernel: Free pages: 4224kB (448kB HighMem) Aug 17 00:12:05 coyote kernel: Active:31803 inactive:378 dirty:0 writeback:0 unstable:0 free:1056 slab:222988 mapped:31394 pagetables:926 Aug 17 00:13:37 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB Aug 17 00:15:12 coyote kernel: protections[]: 8 476 540 Aug 17 00:16:28 coyote kernel: Normal free:1872kB min:936kB low:1872kB high:2808kB active:1144kB inactive:1160kB present:901120kB Aug 17 00:16:28 coyote kernel: protections[]: 0 468 532 Aug 17 00:16:29 coyote kernel: HighMem free:448kB min:128kB low:256kB high:384kB active:126068kB inactive:352kB present:131008kB Aug 17 00:16:29 coyote kernel: protections[]: 0 0 64 Aug 17 00:16:30 coyote kernel: DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB Aug 17 00:16:30 coyote kernel: Normal: 0*4kB 4*8kB 3*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1872kB Aug 17 00:16:30 coyote kernel: HighMem: 40*4kB 6*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 448kB Aug 17 00:16:30 coyote kernel: Swap cache: add 96127, delete 87885, find 14691/21706, race 0+0 Aug 17 00:16:30 coyote kernel: Out of Memory: Killed process 2241 (httpd). 
Aug 17 00:16:30 coyote kernel: unstable:0 free:8799 slab:223005 mapped:19246 pagetables:850 Aug 17 00:16:31 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB Aug 17 00:16:31 coyote kernel: protections[]: 8 476 540 Aug 17 00:16:31 coyote kernel: Normal free:1848kB min:936kB low:1872kB high:2808kB active:1140kB inactive:1252kB present:901120kB Aug 17 00:16:31 coyote kernel: protections[]: 0 468 532 Aug 17 00:16:31 coyote kernel: HighMem free:31444kB min:128kB low:256kB high:384kB active:80988kB inactive:14548kBpresent:131008kB Aug 17 00:16:31 coyote kernel: protections[]: 0 0 64 Aug 17 00:16:32 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB Aug 17 00:16:32 coyote kernel: Normal: 0*4kB 1*8kB 3*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1848kB Aug 17 00:16:32 coyote kernel: HighMem: 2411*4kB 1633*8kB 356*16kB 55*32kB 6*64kB 5*128kB 1*256kB 0*512kB 0*1024kB0*2048kB 0*4096kB = 31444kB Aug 17 00:16:32 coyote kernel: Swap cache: add 102685, delete 93636, find 17082/24609, race 0+0 Aug 17 00:16:32 coyote kernel: Out of Memory: Killed process 1803 (httpd). 
Aug 17 00:16:32 coyote kernel: oom-killer: gfp_mask=0xd0 Aug 17 00:16:32 coyote kernel: DMA per-cpu: Aug 17 00:16:32 coyote kernel: cpu 0 hot: low 2, high 6, batch 1 Aug 17 00:16:32 coyote kernel: cpu 0 cold: low 0, high 2, batch 1 Aug 17 00:16:32 coyote kernel: Normal per-cpu: Aug 17 00:16:32 coyote kernel: cpu 0 hot: low 32, high 96, batch 16 Aug 17 00:16:32 coyote kernel: cpu 0 cold: low 0, high 32, batch 16 Aug 17 00:16:32 coyote kernel: HighMem per-cpu: Aug 17 00:16:33 coyote kernel: cpu 0 hot: low 14, high 42, batch 7 Aug 17 00:16:33 coyote kernel: cpu 0 cold: low 0, high 14, batch 7 Aug 17 00:16:33 coyote kernel: Aug 17 00:16:33 coyote kernel: Free pages: 35392kB (31640kB HighMem) Aug 17 00:16:33 coyote kernel: Active:20556 inactive:3885 dirty:0 writeback:0 unstable:0 free:8848 slab:222999 mapped:19205 pagetables:841 Aug 17 00:16:33 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB Aug 17 00:16:33 coyote kernel: protections[]: 8 476 540 Aug 17 00:16:33 coyote kernel: Normal free:1848kB min:936kB low:1872kB high:2808kB active:1400kB inactive:992kB present:901120kB Aug 17 00:16:33 coyote kernel: protections[]: 0 468 532 Aug 17 00:16:33 coyote kernel: HighMem free:31640kB min:128kB low:256kB high:384kB active:80824kB inactive:14548kBpresent:131008kB Aug 17 00:16:34 coyote kernel: protections[]: 0 0 64 Aug 17 00:16:34 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB Aug 17 00:16:34 coyote kernel: Normal: 0*4kB 1*8kB 3*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1848kB Aug 17 00:16:34 coyote kernel: HighMem: 2460*4kB 1633*8kB 356*16kB 55*32kB 6*64kB 5*128kB 1*256kB 0*512kB 0*1024kB0*2048kB 0*4096kB = 31640kB Aug 17 00:16:34 coyote kernel: Swap cache: add 102685, delete 93636, find 17082/24609, race 0+0 Aug 17 00:16:34 coyote kernel: Out of Memory: Killed process 1804 (httpd). 
Aug 17 00:16:34 coyote kernel: oom-killer: gfp_mask=0xd0 Aug 17 00:16:34 coyote kernel: DMA per-cpu: Aug 17 00:16:34 coyote kernel: cpu 0 hot: low 2, high 6, batch 1 Aug 17 00:16:34 coyote kernel: cpu 0 cold: low 0, high 2, batch 1 Aug 17 00:16:34 coyote kernel: Normal per-cpu: Aug 17 00:16:34 coyote kernel: cpu 0 hot: low 32, high 96, batch 16 Aug 17 00:16:34 coyote kernel: cpu 0 cold: low 0, high 32, batch 16 Aug 17 00:16:34 coyote kernel: HighMem per-cpu: Aug 17 00:16:34 coyote kernel: cpu 0 hot: low 14, high 42, batch 7 Aug 17 00:16:34 coyote kernel: cpu 0 cold: low 0, high 14, batch 7 Aug 17 00:16:34 coyote kernel: Aug 17 00:16:34 coyote kernel: Free pages: 35588kB (31836kB HighMem) Aug 17 00:16:34 coyote kernel: Active:20469 inactive:3931 dirty:0 writeback:0 unstable:0 free:8897 slab:222993 mapped:19164 pagetables:832 Aug 17 00:16:35 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB Aug 17 00:16:35 coyote kernel: protections[]: 8 476 540 Aug 17 00:16:35 coyote kernel: Normal free:1848kB min:936kB low:1872kB high:2808kB active:1216kB inactive:1176kB present:901120kB Aug 17 00:16:35 coyote kernel: protections[]: 0 468 532 Aug 17 00:16:35 coyote kernel: HighMem free:31836kB min:128kB low:256kB high:384kB active:80660kB inactive:14548kBpresent:131008kB Aug 17 00:16:35 coyote kernel: protections[]: 0 0 64 Aug 17 00:16:35 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB Aug 17 00:16:35 coyote kernel: Normal: 0*4kB 1*8kB 3*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1848kB Aug 17 00:16:35 coyote kernel: HighMem: 2509*4kB 1633*8kB 356*16kB 55*32kB 6*64kB 5*128kB 1*256kB 0*512kB 0*1024kB0*2048kB 0*4096kB = 31836kB Aug 17 00:16:35 coyote kernel: Swap cache: add 102685, delete 93636, find 17082/24609, race 0+0 Aug 17 00:16:35 coyote kernel: Out of Memory: Killed process 1805 (httpd). 
Aug 17 00:16:35 coyote kernel: oom-killer: gfp_mask=0xd0 Aug 17 00:16:35 coyote kernel: DMA per-cpu: Aug 17 00:16:35 coyote kernel: cpu 0 hot: low 2, high 6, batch 1 Aug 17 00:16:35 coyote kernel: cpu 0 cold: low 0, high 2, batch 1 Aug 17 00:16:35 coyote kernel: Normal per-cpu: Aug 17 00:16:35 coyote kernel: cpu 0 hot: low 32, high 96, batch 16 Aug 17 00:16:35 coyote kernel: cpu 0 cold: low 0, high 32, batch 16 Aug 17 00:16:36 coyote kernel: HighMem per-cpu: Aug 17 00:16:36 coyote kernel: cpu 0 hot: low 14, high 42, batch 7 Aug 17 00:16:36 coyote kernel: cpu 0 cold: low 0, high 14, batch 7 Aug 17 00:16:36 coyote kernel: Aug 17 00:16:36 coyote kernel: Free pages: 35784kB (32032kB HighMem) Aug 17 00:16:36 coyote kernel: Active:20404 inactive:3954 dirty:0 writeback:0 unstable:0 free:8946 slab:222987 mapped:19038 pagetables:823 Aug 17 00:16:36 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB Aug 17 00:16:36 coyote kernel: protections[]: 8 476 540 Aug 17 00:16:36 coyote kernel: Normal free:1848kB min:936kB low:1872kB high:2808kB active:1124kB inactive:1268kB present:901120kB Aug 17 00:16:36 coyote kernel: protections[]: 0 468 532 Aug 17 00:16:36 coyote kernel: HighMem free:32032kB min:128kB low:256kB high:384kB active:80492kB inactive:14548kBpresent:131008kB Aug 17 00:16:36 coyote kernel: protections[]: 0 0 64 Aug 17 00:16:36 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB Aug 17 00:16:36 coyote kernel: Normal: 0*4kB 1*8kB 3*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1848kB Aug 17 00:16:36 coyote kernel: HighMem: 2558*4kB 1633*8kB 356*16kB 55*32kB 6*64kB 5*128kB 1*256kB 0*512kB 0*1024kB0*2048kB 0*4096kB = 32032kB Aug 17 00:16:36 coyote kernel: Swap cache: add 102685, delete 93636, find 17082/24609, race 0+0 Aug 17 00:16:36 coyote kernel: Out of Memory: Killed process 2153 (sendmail). 
Aug 17 00:16:36 coyote kernel: oom-killer: gfp_mask=0xd0 Aug 17 00:16:36 coyote kernel: DMA per-cpu: Aug 17 00:16:36 coyote kernel: cpu 0 hot: low 2, high 6, batch 1 Aug 17 00:16:36 coyote kernel: cpu 0 cold: low 0, high 2, batch 1 Aug 17 00:16:36 coyote kernel: Normal per-cpu: Aug 17 00:16:36 coyote kernel: cpu 0 hot: low 32, high 96, batch 16 Aug 17 00:16:36 coyote kernel: cpu 0 cold: low 0, high 32, batch 16 Aug 17 00:16:36 coyote kernel: HighMem per-cpu: Aug 17 00:16:36 coyote kernel: cpu 0 hot: low 14, high 42, batch 7 Aug 17 00:16:36 coyote kernel: cpu 0 cold: low 0, high 14, batch 7 Aug 17 00:16:36 coyote kernel: Aug 17 00:16:36 coyote kernel: Free pages: 35812kB (32060kB HighMem) Aug 17 00:16:36 coyote kernel: Active:20381 inactive:3976 dirty:0 writeback:0 unstable:0 free:8953 slab:222986 mapped:19037 pagetables:818 Aug 17 00:16:36 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB Aug 17 00:16:36 coyote kernel: protections[]: 8 476 540 Aug 17 00:16:36 coyote kernel: Normal free:1848kB min:936kB low:1872kB high:2808kB active:1036kB inactive:1356kB present:901120kB Aug 17 00:16:36 coyote kernel: protections[]: 0 468 532 Aug 17 00:16:36 coyote kernel: HighMem free:32060kB min:128kB low:256kB high:384kB active:80488kB inactive:14548kBpresent:131008kB Aug 17 00:16:36 coyote kernel: protections[]: 0 0 64 Aug 17 00:16:36 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB Aug 17 00:16:36 coyote kernel: Normal: 0*4kB 1*8kB 3*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1848kB Aug 17 00:16:36 coyote kernel: HighMem: 2565*4kB 1633*8kB 356*16kB 55*32kB 6*64kB 5*128kB 1*256kB 0*512kB 0*1024kB0*2048kB 0*4096kB = 32060kB Aug 17 00:16:36 coyote kernel: Swap cache: add 102685, delete 93636, find 17082/24609, race 0+0 Aug 17 00:16:36 coyote kernel: Out of Memory: Killed process 21567 (kdeinit). 
Aug 17 00:16:36 coyote kernel: oom-killer: gfp_mask=0xd0 Aug 17 00:16:36 coyote kernel: DMA per-cpu: Aug 17 00:16:36 coyote kernel: cpu 0 hot: low 2, high 6, batch 1 Aug 17 00:16:36 coyote kernel: cpu 0 cold: low 0, high 2, batch 1 Aug 17 00:16:36 coyote kernel: Normal per-cpu: Aug 17 00:16:36 coyote kernel: cpu 0 hot: low 32, high 96, batch 16 Aug 17 00:16:36 coyote kernel: cpu 0 cold: low 0, high 32, batch 16 Aug 17 00:16:36 coyote kernel: HighMem per-cpu: Aug 17 00:16:36 coyote kernel: cpu 0 hot: low 14, high 42, batch 7 Aug 17 00:16:37 coyote kernel: cpu 0 cold: low 0, high 14, batch 7 Aug 17 00:16:37 coyote kernel: Aug 17 00:16:37 coyote kernel: Free pages: 36120kB (32368kB HighMem) Aug 17 00:16:37 coyote kernel: Active:20284 inactive:4018 dirty:0 writeback:0 unstable:0 free:9030 slab:222984 mapped:18935 pagetables:800 Aug 17 00:16:37 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB Aug 17 00:16:37 coyote kernel: protections[]: 8 476 540 Aug 17 00:16:37 coyote kernel: Normal free:1848kB min:936kB low:1872kB high:2808kB active:952kB inactive:1444kB present:901120kB Aug 17 00:16:37 coyote kernel: protections[]: 0 468 532 Aug 17 00:16:37 coyote kernel: HighMem free:32368kB min:128kB low:256kB high:384kB active:80184kB inactive:14628kBpresent:131008kB Aug 17 00:16:37 coyote kernel: protections[]: 0 0 64 Aug 17 00:16:37 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB Aug 17 00:16:37 coyote kernel: Normal: 0*4kB 1*8kB 3*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1848kB Aug 17 00:16:37 coyote kernel: HighMem: 2580*4kB 1642*8kB 365*16kB 56*32kB 6*64kB 5*128kB 1*256kB 0*512kB 0*1024kB0*2048kB 0*4096kB = 32368kB Aug 17 00:16:37 coyote kernel: Swap cache: add 102702, delete 93647, find 17218/24748, race 0+0 Aug 17 00:16:37 coyote kernel: Out of Memory: Killed process 1809 (httpd). 
Aug 17 00:16:37 coyote kernel: oom-killer: gfp_mask=0xd0 Aug 17 00:16:37 coyote kernel: DMA per-cpu: Aug 17 00:16:37 coyote kernel: cpu 0 hot: low 2, high 6, batch 1 Aug 17 00:16:37 coyote kernel: cpu 0 cold: low 0, high 2, batch 1 Aug 17 00:16:37 coyote kernel: Normal per-cpu: Aug 17 00:16:37 coyote kernel: cpu 0 hot: low 32, high 96, batch 16 Aug 17 00:16:37 coyote kernel: cpu 0 cold: low 0, high 32, batch 16 Aug 17 00:16:37 coyote kernel: HighMem per-cpu: Aug 17 00:16:37 coyote kernel: cpu 0 hot: low 14, high 42, batch 7 Aug 17 00:16:37 coyote kernel: cpu 0 cold: low 0, high 14, batch 7 Aug 17 00:16:37 coyote kernel: Aug 17 00:16:37 coyote kernel: Free pages: 36120kB (32368kB HighMem) Aug 17 00:16:37 coyote kernel: Active:20263 inactive:4039 dirty:0 writeback:0 unstable:0 free:9030 slab:222984 mapped:18935 pagetables:800 Aug 17 00:16:37 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB Aug 17 00:16:37 coyote kernel: protections[]: 8 476 540 Aug 17 00:16:37 coyote kernel: Normal free:1848kB min:936kB low:1872kB high:2808kB active:868kB inactive:1528kB present:901120kB Aug 17 00:16:37 coyote kernel: protections[]: 0 468 532 Aug 17 00:16:37 coyote kernel: HighMem free:32368kB min:128kB low:256kB high:384kB active:80184kB inactive:14628kBpresent:131008kB Aug 17 00:16:37 coyote kernel: protections[]: 0 0 64 Aug 17 00:16:37 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB Aug 17 00:16:37 coyote kernel: Normal: 0*4kB 1*8kB 3*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1848kB Aug 17 00:16:37 coyote kernel: HighMem: 2580*4kB 1642*8kB 365*16kB 56*32kB 6*64kB 5*128kB 1*256kB 0*512kB 0*1024kB0*2048kB 0*4096kB = 32368kB Aug 17 00:16:37 coyote kernel: Swap cache: add 102702, delete 93647, find 17218/24748, race 0+0 Aug 17 00:16:37 coyote kernel: Out of Memory: Killed process 1968 (arpwatch). 
Aug 17 00:16:37 coyote kernel: device eth0 left promiscuous mode Aug 17 00:16:37 coyote kernel: oom-killer: gfp_mask=0xd0 Aug 17 00:16:37 coyote kernel: DMA per-cpu: Aug 17 00:16:37 coyote kernel: cpu 0 hot: low 2, high 6, batch 1 Aug 17 00:16:37 coyote kernel: cpu 0 cold: low 0, high 2, batch 1 Aug 17 00:16:37 coyote kernel: Normal per-cpu: Aug 17 00:16:37 coyote kernel: cpu 0 hot: low 32, high 96, batch 16 Aug 17 00:16:37 coyote kernel: cpu 0 cold: low 0, high 32, batch 16 Aug 17 00:16:37 coyote kernel: HighMem per-cpu: Aug 17 00:16:37 coyote kernel: cpu 0 hot: low 14, high 42, batch 7 Aug 17 00:16:37 coyote kernel: cpu 0 cold: low 0, high 14, batch 7 Aug 17 00:16:37 coyote kernel: Aug 17 00:16:37 coyote kernel: Free pages: 36232kB (32480kB HighMem) Aug 17 00:16:37 coyote kernel: Active:20222 inactive:4053 dirty:0 writeback:0 unstable:0 free:9058 slab:222983 mapped:18921 pagetables:795 Aug 17 00:16:37 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB Aug 17 00:16:37 coyote kernel: protections[]: 8 476 540 Aug 17 00:16:37 coyote kernel: Normal free:1848kB min:936kB low:1872kB high:2808kB active:792kB inactive:1604kB present:901120kB Aug 17 00:16:37 coyote kernel: protections[]: 0 468 532 Aug 17 00:16:37 coyote kernel: HighMem free:32480kB min:128kB low:256kB high:384kB active:80096kB inactive:14608kBpresent:131008kB Aug 17 00:16:37 coyote kernel: protections[]: 0 0 64 Aug 17 00:16:37 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB Aug 17 00:16:37 coyote kernel: Normal: 0*4kB 1*8kB 3*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1848kB Aug 17 00:16:37 coyote kernel: HighMem: 2598*4kB 1645*8kB 366*16kB 56*32kB 6*64kB 5*128kB 1*256kB 0*512kB 0*1024kB0*2048kB 0*4096kB = 32480kB Aug 17 00:16:37 coyote kernel: Swap cache: add 102702, delete 93673, find 17218/24748, race 0+0 Aug 17 00:16:37 coyote kernel: Out of Memory: Killed 
process 10755 (kdeinit). Aug 17 00:16:37 coyote kernel: oom-killer: gfp_mask=0xd0 Aug 17 00:16:37 coyote kernel: DMA per-cpu: Aug 17 00:16:37 coyote kernel: cpu 0 hot: low 2, high 6, batch 1 Aug 17 00:16:37 coyote kernel: cpu 0 cold: low 0, high 2, batch 1 Aug 17 00:16:37 coyote kernel: Normal per-cpu: Aug 17 00:16:37 coyote kernel: cpu 0 hot: low 32, high 96, batch 16 Aug 17 00:16:37 coyote kernel: cpu 0 cold: low 0, high 32, batch 16 Aug 17 00:16:37 coyote kernel: HighMem per-cpu: Aug 17 00:16:37 coyote kernel: cpu 0 hot: low 14, high 42, batch 7 Aug 17 00:16:37 coyote kernel: cpu 0 cold: low 0, high 14, batch 7 Aug 17 00:16:37 coyote kernel: Aug 17 00:16:37 coyote kernel: Free pages: 25392kB (21616kB HighMem) Aug 17 00:16:37 coyote kernel: Active:21664 inactive:5363 dirty:0 writeback:0 unstable:0 free:6348 slab:223017 mapped:19400 pagetables:798 Aug 17 00:16:37 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB Aug 17 00:16:37 coyote kernel: protections[]: 8 476 540 Aug 17 00:16:37 coyote kernel: Normal free:1872kB min:936kB low:1872kB high:2808kB active:1132kB inactive:1268kB present:901120kB Aug 17 00:16:37 coyote kernel: protections[]: 0 468 532 Aug 17 00:16:37 coyote kernel: HighMem free:21616kB min:128kB low:256kB high:384kB active:85524kB inactive:20184kBpresent:131008kB Aug 17 00:16:37 coyote kernel: protections[]: 0 0 64 Aug 17 00:16:37 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB Aug 17 00:16:37 coyote kernel: Normal: 16*4kB 6*8kB 0*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB0*4096kB = 1872kB Aug 17 00:16:38 coyote kernel: HighMem: 0*4kB 1536*8kB 373*16kB 57*32kB 8*64kB 6*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 21616kB Aug 17 00:16:39 coyote kernel: Swap cache: add 103622, delete 93673, find 17556/25253, race 0+0 Aug 17 00:16:39 coyote kernel: Out of Memory: Killed process 1812 (httpd). 
Aug 17 00:16:40 coyote kernel: oom-killer: gfp_mask=0xd0 Aug 17 00:16:40 coyote kernel: DMA per-cpu: Aug 17 00:16:40 coyote kernel: cpu 0 hot: low 2, high 6, batch 1 Aug 17 00:16:40 coyote kernel: cpu 0 cold: low 0, high 2, batch 1 Aug 17 00:16:40 coyote kernel: Normal per-cpu: Aug 17 00:16:40 coyote kernel: cpu 0 hot: low 32, high 96, batch 16 Aug 17 00:16:40 coyote kernel: cpu 0 cold: low 0, high 32, batch 16 Aug 17 00:16:40 coyote kernel: HighMem per-cpu: Aug 17 00:16:40 coyote kernel: cpu 0 hot: low 14, high 42, batch 7 Aug 17 00:16:40 coyote kernel: cpu 0 cold: low 0, high 14, batch 7 Aug 17 00:16:40 coyote kernel: Aug 17 00:16:40 coyote kernel: Free pages: 26540kB (22764kB HighMem) Aug 17 00:16:40 coyote kernel: Active:21447 inactive:5282 dirty:0 writeback:0 unstable:0 free:6635 slab:223012 mapped:19102 pagetables:789 Aug 17 00:16:40 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB Aug 17 00:16:40 coyote kernel: protections[]: 8 476 540 Aug 17 00:16:40 coyote kernel: Normal free:1872kB min:936kB low:1872kB high:2808kB active:1456kB inactive:944kB present:901120kB Aug 17 00:16:40 coyote kernel: protections[]: 0 468 532 Aug 17 00:16:40 coyote kernel: HighMem free:22764kB min:128kB low:256kB high:384kB active:84332kB inactive:20184kBpresent:131008kB Aug 17 00:16:40 coyote kernel: protections[]: 0 0 64 Aug 17 00:16:40 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB Aug 17 00:16:40 coyote kernel: Normal: 16*4kB 6*8kB 0*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB0*4096kB = 1872kB Aug 17 00:16:40 coyote kernel: HighMem: 221*4kB 1569*8kB 373*16kB 57*32kB 8*64kB 6*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 22764kB Aug 17 00:16:40 coyote kernel: Swap cache: add 103622, delete 93673, find 17556/25253, race 0+0 Aug 17 00:16:40 coyote kernel: Out of Memory: Killed process 1813 (httpd). 
Aug 17 00:16:40 coyote kernel: oom-killer: gfp_mask=0xd0 Aug 17 00:16:40 coyote kernel: DMA per-cpu: Aug 17 00:16:40 coyote kernel: cpu 0 hot: low 2, high 6, batch 1 Aug 17 00:16:40 coyote kernel: cpu 0 cold: low 0, high 2, batch 1 Aug 17 00:16:40 coyote kernel: Normal per-cpu: Aug 17 00:16:40 coyote kernel: cpu 0 hot: low 32, high 96, batch 16 Aug 17 00:16:40 coyote kernel: cpu 0 cold: low 0, high 32, batch 16 Aug 17 00:16:40 coyote kernel: HighMem per-cpu: Aug 17 00:16:40 coyote kernel: cpu 0 hot: low 14, high 42, batch 7 Aug 17 00:16:40 coyote kernel: cpu 0 cold: low 0, high 14, batch 7 Aug 17 00:16:40 coyote kernel: Aug 17 00:16:40 coyote kernel: Free pages: 25784kB (22008kB HighMem) Aug 17 00:16:40 coyote kernel: Active:21640 inactive:5304 dirty:0 writeback:0 unstable:0 free:6446 slab:223008 mapped:19233 pagetables:780 Aug 17 00:16:40 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB Aug 17 00:16:40 coyote kernel: protections[]: 8 476 540 Aug 17 00:16:40 coyote kernel: Normal free:1872kB min:936kB low:1872kB high:2808kB active:1368kB inactive:1032kB present:901120kB Aug 17 00:16:40 coyote kernel: protections[]: 0 468 532 Aug 17 00:16:40 coyote kernel: HighMem free:22008kB min:128kB low:256kB high:384kB active:85192kB inactive:20184kBpresent:131008kB Aug 17 00:16:40 coyote kernel: protections[]: 0 0 64 Aug 17 00:16:40 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB Aug 17 00:16:40 coyote kernel: Normal: 16*4kB 6*8kB 0*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB0*4096kB = 1872kB Aug 17 00:16:40 coyote kernel: HighMem: 28*4kB 1571*8kB 373*16kB 57*32kB 8*64kB 6*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 22008kB Aug 17 00:16:40 coyote kernel: Swap cache: add 103622, delete 93673, find 17556/25253, race 0+0 Aug 17 00:16:40 coyote kernel: Out of Memory: Killed process 1810 (httpd). 
Aug 17 00:16:40 coyote kernel: oom-killer: gfp_mask=0xd0 Aug 17 00:16:40 coyote kernel: DMA per-cpu: Aug 17 00:16:40 coyote kernel: cpu 0 hot: low 2, high 6, batch 1 Aug 17 00:16:40 coyote kernel: cpu 0 cold: low 0, high 2, batch 1 Aug 17 00:16:40 coyote kernel: Normal per-cpu: Aug 17 00:16:40 coyote kernel: cpu 0 hot: low 32, high 96, batch 16 Aug 17 00:16:40 coyote kernel: cpu 0 cold: low 0, high 32, batch 16 Aug 17 00:16:40 coyote kernel: HighMem per-cpu: Aug 17 00:16:40 coyote kernel: cpu 0 hot: low 14, high 42, batch 7 Aug 17 00:16:40 coyote kernel: cpu 0 cold: low 0, high 14, batch 7 Aug 17 00:16:40 coyote kernel: Aug 17 00:16:40 coyote kernel: Free pages: 25952kB (22176kB HighMem) Aug 17 00:16:40 coyote kernel: Active:21621 inactive:5296 dirty:0 writeback:0 unstable:0 free:6488 slab:223006 mapped:19162 pagetables:771 Aug 17 00:16:40 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB Aug 17 00:16:40 coyote kernel: protections[]: 8 476 540 Aug 17 00:16:40 coyote kernel: Normal free:1872kB min:936kB low:1872kB high:2808kB active:1400kB inactive:1000kB present:901120kB Aug 17 00:16:41 coyote kernel: protections[]: 0 468 532 Aug 17 00:16:41 coyote kernel: HighMem free:22176kB min:128kB low:256kB high:384kB active:85084kB inactive:20184kBpresent:131008kB Aug 17 00:16:41 coyote kernel: protections[]: 0 0 64 Aug 17 00:16:41 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB Aug 17 00:16:41 coyote kernel: Normal: 16*4kB 6*8kB 0*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB0*4096kB = 1872kB Aug 17 00:16:41 coyote kernel: HighMem: 70*4kB 1571*8kB 373*16kB 57*32kB 8*64kB 6*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 22176kB Aug 17 00:16:41 coyote kernel: Swap cache: add 103622, delete 93673, find 17740/25437, race 0+0 Aug 17 00:16:41 coyote kernel: Out of Memory: Killed process 3119 (kdeinit). 
Aug 17 00:16:41 coyote kernel: oom-killer: gfp_mask=0xd0 Aug 17 00:16:41 coyote kernel: DMA per-cpu: Aug 17 00:16:41 coyote kernel: cpu 0 hot: low 2, high 6, batch 1 Aug 17 00:16:41 coyote kernel: cpu 0 cold: low 0, high 2, batch 1 Aug 17 00:16:41 coyote kernel: Normal per-cpu: Aug 17 00:16:41 coyote kernel: cpu 0 hot: low 32, high 96, batch 16 Aug 17 00:16:41 coyote kernel: cpu 0 cold: low 0, high 32, batch 16 Aug 17 00:16:41 coyote kernel: HighMem per-cpu: Aug 17 00:16:41 coyote kernel: cpu 0 hot: low 14, high 42, batch 7 Aug 17 00:16:41 coyote kernel: cpu 0 cold: low 0, high 14, batch 7 Aug 17 00:16:41 coyote kernel: Aug 17 00:16:41 coyote kernel: Free pages: 26820kB (23044kB HighMem) Aug 17 00:16:41 coyote kernel: Active:21387 inactive:5311 dirty:0 writeback:0 unstable:0 free:6705 slab:223005 mapped:18948 pagetables:754 Aug 17 00:16:41 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB Aug 17 00:16:41 coyote kernel: protections[]: 8 476 540 Aug 17 00:16:41 coyote kernel: Normal free:1872kB min:936kB low:1872kB high:2808kB active:1340kB inactive:1060kB present:901120kB Aug 17 00:16:41 coyote kernel: protections[]: 0 468 532 Aug 17 00:16:41 coyote kernel: HighMem free:23044kB min:128kB low:256kB high:384kB active:84208kB inactive:20184kBpresent:131008kB Aug 17 00:16:41 coyote kernel: protections[]: 0 0 64 Aug 17 00:16:41 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB Aug 17 00:16:41 coyote kernel: Normal: 16*4kB 6*8kB 0*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB0*4096kB = 1872kB Aug 17 00:16:41 coyote kernel: HighMem: 273*4kB 1576*8kB 374*16kB 57*32kB 8*64kB 6*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 23044kB Aug 17 00:16:41 coyote kernel: Swap cache: add 103622, delete 93761, find 17740/25437, race 0+0 Aug 17 00:16:41 coyote kernel: Out of Memory: Killed process 3133 (kdeinit). 
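The numbers in the OOM dumps above can be cross-checked with a little arithmetic. The sketch below is an illustration only, with hypothetical helper names: it sums the count*size terms of a per-zone free-list line and converts page counts from the summary line into kB. The 4 kB page size is an assumption, though it is consistent with the first dump, where free:1077 pages is reported as 4308kB. The sample strings are copied from the first dump, including the run-together `0*2048kB0*4096kB`, which the regular expression tolerates.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages (1077 pages * 4 kB == the reported 4308kB)


def buddy_total_kb(line):
    """Return (computed_kB, reported_kB) for a per-zone free-list line."""
    body, _, reported = line.partition("=")
    computed = sum(
        int(count) * int(size)
        for count, size in re.findall(r"(\d+)\*(\d+)kB", body)
    )
    return computed, int(re.search(r"(\d+)kB", reported).group(1))


def pages_to_kb(summary_line, field):
    """Convert one page-count field of the summary line to kB."""
    match = re.search(rf"\b{field}:(\d+)", summary_line)
    return int(match.group(1)) * PAGE_KB


dma = ("DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB "
       "1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB")
normal = ("Normal: 12*4kB 2*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB "
          "1*512kB 1*1024kB 0*2048kB0*4096kB = 1872kB")
summary = ("Active:31159 inactive:1039 dirty:0 writeback:28 unstable:0 "
           "free:1077 slab:222946 mapped:30766 pagetables:944")

print(buddy_total_kb(dma))
print(buddy_total_kb(normal))
print(pages_to_kb(summary, "slab"), "kB of slab")
```

Under the 4 kB assumption, slab:222946 works out to 891784 kB, against 901120 kB present in the Normal zone, so nearly all of low memory is held by slab caches while the OOM killer is picking off httpd and friends.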
[root@coyote xsane-0.90]# cat /proc/meminfo
MemTotal: 1035956 kB
MemFree: 5524 kB
Buffers: 15816 kB
Cached: 80116 kB
SwapCached: 57788 kB
Active: 134848 kB
Inactive: 51592 kB
HighTotal: 131008 kB
HighFree: 532 kB
LowTotal: 904948 kB
LowFree: 4992 kB
SwapTotal: 3857104 kB
SwapFree: 3752500 kB
Dirty: 164 kB
Writeback: 0 kB
Mapped: 115268 kB
Slab: 833184 kB
Committed_AS: 295784 kB
PageTables: 3424 kB
VmallocTotal: 114680 kB
VmallocUsed: 19876 kB
VmallocChunk: 94640 kB
[root@coyote xsane-0.90]# cat /proc/slabinfo
slabinfo - version: 2.0
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
unix_sock 200 200 384 10 1 : tunables 54 27 0 : slabdata 20 20 0
tcp_tw_bucket 4 41 96 41 1 : tunables 120 60 0 : slabdata 1 1 0
tcp_bind_bucket 27 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0
tcp_open_request 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0
inet_peer_cache 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0
ip_fib_hash 10 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0
ip_dst_cache 16 30 256 15 1 : tunables 120 60 0 : slabdata 2 2 0
arp_cache 4 31 128 31 1 : tunables 120 60 0 : slabdata 1 1 0
raw4_sock 0 0 480 8 1 : tunables 54 27 0 : slabdata 0 0 0
udp_sock 2 8 480 8 1 : tunables 54 27 0 : slabdata 1 1 0
tcp_sock 32 32 1024 4 1 : tunables 54 27 0 : slabdata 8 8 0
flow_cache 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0
mqueue_inode_cache 1 8 480 8 1 : tunables 54 27 0 : slabdata 1 1 0
udf_inode_cache 0 0 352 11 1 : tunables 54 27 0 : slabdata 0 0 0
smb_request 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
smb_inode_cache 1 12 320 12 1 : tunables 54 27 0 : slabdata 1 1 0
isofs_inode_cache 0 0 320 12 1 : tunables 54 27 0 : slabdata 0 0 0
fat_inode_cache 4 22 352 11 1 : tunables 54 27 0 : slabdata 2 2 0
ext2_inode_cache 0 0 416 9 1 : tunables 54 27 0 : slabdata 0 0 0
journal_handle 25 135 28 135 1 : tunables 120 60 0 : slabdata 1 1 0
journal_head 607 2835 48 81 1 : tunables 120 60 0 : slabdata 35 35 0
revoke_table 14 290 12 290 1 : tunables 120 60 0 : slabdata 1 1 0
revoke_record 0 0 16 226 1 : tunables 120 60 0 : slabdata 0 0 0
ext3_inode_cache 1488612 1488618 448 9 1 : tunables 54 27 0 : slabdata 165402 165402 0
eventpoll_pwq 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0
eventpoll_epi 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0
kioctx 0 0 160 25 1 : tunables 120 60 0 : slabdata 0 0 0
kiocb 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0
dnotify_cache 222 370 20 185 1 : tunables 120 60 0 : slabdata 2 2 0
file_lock_cache 19 43 92 43 1 : tunables 120 60 0 : slabdata 1 1 0
fasync_cache 2 226 16 226 1 : tunables 120 60 0 : slabdata 1 1 0
shmem_inode_cache 5 10 384 10 1 : tunables 54 27 0 : slabdata 1 1 0
posix_timers_cache 0 0 96 41 1 : tunables 120 60 0 : slabdata 0 0 0
uid_cache 5 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0
sgpool-128 32 32 2048 2 1 : tunables 24 12 0 : slabdata 16 16 0
sgpool-64 32 32 1024 4 1 : tunables 54 27 0 : slabdata 8 8 0
sgpool-32 32 32 512 8 1 : tunables 54 27 0 : slabdata 4 4 0
sgpool-16 32 45 256 15 1 : tunables 120 60 0 : slabdata 3 3 0
sgpool-8 32 62 128 31 1 : tunables 120 60 0 : slabdata 2 2 0
cfq_pool 64 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0
crq_pool 0 0 36 107 1 : tunables 120 60 0 : slabdata 0 0 0
deadline_drq 0 0 48 81 1 : tunables 120 60 0 : slabdata 0 0 0
as_arq 101 130 60 65 1 : tunables 120 60 0 : slabdata 2 2 0
blkdev_ioc 80 185 20 185 1 : tunables 120 60 0 : slabdata 1 1 0
blkdev_queue 12 18 448 9 1 : tunables 54 27 0 : slabdata 2 2 0
blkdev_requests 80 104 152 26 1 : tunables 120 60 0 : slabdata 4 4 0
biovec-(256) 256 256 3072 2 2 : tunables 24 12 0 : slabdata 128 128 0
biovec-128 256 260 1536 5 2 : tunables 24 12 0 : slabdata 52 52 0
biovec-64 256 260 768 5 1 : tunables 54 27 0 : slabdata 52 52 0
biovec-16 256 260 192 20 1 : tunables 120 60 0 : slabdata 13 13 0
biovec-4 256 305 64 61 1 : tunables 120 60 0 : slabdata 5 5 0
biovec-1 368 452 16 226 1 : tunables 120 60 0 : slabdata 2 2 0
bio 366 366 64 61 1 : tunables 120 60 0 : slabdata 6 6 0
sock_inode_cache 234 242 352 11 1 : tunables 54 27 0 : slabdata 22 22 0
skbuff_head_cache 251 475 160 25 1 : tunables 120 60 0 : slabdata 19 19 0
sock 2 12 320 12 1 : tunables 54 27 0 : slabdata 1 1 0
proc_inode_cache 610 612 320 12 1 : tunables 54 27 0 : slabdata 51 51 0
sigqueue 66 81 148 27 1 : tunables 120 60 0 : slabdata 3 3 0
radix_tree_node 2565 3276 276 14 1 : tunables 54 27 0 : slabdata 234 234 0
bdev_cache 12 18 416 9 1 : tunables 54 27 0 : slabdata 2 2 0
mnt_cache 25 41 96 41 1 : tunables 120 60 0 : slabdata 1 1 0
inode_cache 2354 2380 288 14 1 : tunables 54 27 0 : slabdata 170 170 0
dentry_cache 1115280 1116752 140 28 1 : tunables 120 60 0 : slabdata 39884 39884 0
filp 2060 2300 160 25 1 : tunables 120 60 0 : slabdata 92 92 0
names_cache 17 17 4096 1 1 : tunables 24 12 0 : slabdata 17 17 0
idr_layer_cache 81 87 136 29 1 : tunables 120 60 0 : slabdata 3 3 0
buffer_head 4151 8424 48 81 1 : tunables 120 60 0 : slabdata 104 104 0
mm_struct 98 98 512 7 1 : tunables 54 27 0 : slabdata 14 14 0
vm_area_struct 8554 8554 84 47 1 : tunables 120 60 0 : slabdata 182 182 0
fs_cache 94 119 32 119 1 : tunables 120 60 0 : slabdata 1 1 0
files_cache 93 99 416 9 1 : tunables 54 27 0 : slabdata 11 11 0
signal_cache 116 123 96 41 1 : tunables 120 60 0 : slabdata 3 3 0
sighand_cache 111 111 1312 3 1 : tunables 24 12 0 : slabdata 37 37 0
task_struct 121 130 1424 5 2 : tunables 24 12 0 : slabdata 26 26 0
anon_vma 1770 2035 8 407 1 : tunables 120 60 0 : slabdata 5 5 0
pgd 94 94 4096 1 1 : tunables 24 12 0 : slabdata 94 94 0
size-131072(DMA) 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-131072 0 0 131072 1 32 : tunables 8 4 0 : slabdata 0 0 0
size-65536(DMA) 0 0 65536 1 16 : tunables 8 4 0 : slabdata 0 0 0
size-65536 1 1 65536 1 16 : tunables 8 4 0 : slabdata 1 1 0
size-32768(DMA) 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
size-32768 0 0 32768 1 8 : tunables 8 4 0 : slabdata 0 0 0
size-16384(DMA) 0 0 16384 1 4 : tunables 8 4 0 : slabdata 0 0 0
size-16384 5 9 16384 1 4 : tunables 8 4 0 : slabdata 5 9 0
size-8192(DMA) 0 0 8192 1 2 : tunables 8 4 0 : slabdata 0 0 0
size-8192 11 11 8192 1 2 : tunables 8 4 0 : slabdata 11 11 0
size-4096(DMA) 0 0 4096 1 1 : tunables 24 12 0 : slabdata 0 0 0
size-4096 184 184 4096 1 1 : tunables 24 12 0 : slabdata 184 184 0
size-2048(DMA) 0 0 2048 2 1 : tunables 24 12 0 : slabdata 0 0 0
size-2048 174 194 2048 2 1 : tunables 24 12 0 : slabdata 97 97 0
size-1024(DMA) 0 0 1024 4 1 : tunables 54 27 0 : slabdata 0 0 0
size-1024 157 180 1024 4 1 : tunables 54 27 0 : slabdata 45 45 0
size-512(DMA) 0 0 512 8 1 : tunables 54 27 0 : slabdata 0 0 0
size-512 197 448 512 8 1 : tunables 54 27 0 : slabdata 56 56 0
size-256(DMA) 0 0 256 15 1 : tunables 120 60 0 : slabdata 0 0 0
size-256 213 420 256 15 1 : tunables 120 60 0 : slabdata 28 28 0
size-192(DMA) 0 0 192 20 1 : tunables 120 60 0 : slabdata 0 0 0
size-192 120 120 192 20 1 : tunables 120 60 0 : slabdata 6 6 0
size-128(DMA) 0 0 128 31 1 : tunables 120 60 0 : slabdata 0 0 0
size-128 1243 1302 128 31 1 : tunables 120 60 0 : slabdata 42 42 0
size-64(DMA) 0 0 64 61 1 : tunables 120 60 0 : slabdata 0 0 0
size-64 47735 48251 64 61 1 : tunables 120 60 0 : slabdata 791 791 0
size-32(DMA) 0 0 32 119 1 : tunables 120 60 0 : slabdata 0 0 0
size-32 1368 1428 32 119 1 : tunables 120 60 0 : slabdata 12 12 0
kmem_cache 124 124 128 31 1 : tunables 120 60 0 : slabdata 4 4 0

I cannot start any new shells, as before. Is there any usable dna in this sample?

Reboot time I guess :(((

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.
* Re: Possible dcache BUG 2004-08-17 4:44 ` Gene Heskett @ 2004-08-17 4:58 ` Nick Piggin 2004-08-17 5:26 ` Gene Heskett 0 siblings, 1 reply; 146+ messages in thread From: Nick Piggin @ 2004-08-17 4:58 UTC (permalink / raw) To: gene.heskett Cc: linux-kernel, viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton Gene Heskett wrote: >On Monday 16 August 2004 19:01, viro@parcelfarce.linux.theplanet.co.uk wrote: > >>On Mon, Aug 16, 2004 at 06:52:50PM -0400, Gene Heskett wrote: >> >>>Well, I am seing some dups, but they are so volatile that no two >>>runs will report the same allocations as dups, and its never more >>>than 2 using /proc/fs/ext3 | sort | uniq -c | sort -nr |grep -v ' >>>1 ' >>> >>>Consecutive runs will show anywhere from 3 to 10 or 12 dups, but >>>never is an address repeated between runs. >>> >>>How is this to be interpreted? >>> >>That's OK. Keep in mind that you have a *lot* of these guys and >>your cat(1) makes a lot of read(2) calls. So what you see is >> >><starting to read> >><see inode #n that is about to be evicted> >><read some more> >><inode #n gets evicted, quite possibly - due to memory pressure from >>cat(1) or sort(1)> >><read more> >><somebody wants the same inode again> >><read more> >><see the inode #n we'd just had read from disk again> >> >>So few duplicates are all right. >> > >I hope so. I've got a real hoodoozy here, being out of memory (well, >maybe 30 megs left) when my nightly run of rsync started, everything >came to a grinding halt. I couldn't even get to the screen the >tail -f on the log was running in, but after walking away for 10 minutes. >I can once again. However, things seem to be partially functional so >I'm going to see if I can do some cut-n-paste from the log screen to >here, but I probably can't send it as sendmail was one of the items the >OOM killer killed. According to top, I'm about 250 megs into the >swap, very suddenly. No swap was in use at 23:55 local. > > snip > >I cannot start any new shells, as before. 
Is there any usable dna in this sample? > >Reboot time I guess :((( > > All your low memory has been used by dentry and inode caches. This isn't very interesting because this would be no doubt caused by something oopsing while holding the shrinker semaphore as Andrew pointed out. What is interesting is that first Oops message (I wonder if you don't have bad hardware though, I don't think anyone else is seeing it). ^ permalink raw reply [flat|nested] 146+ messages in thread
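Nick's reading of the slabinfo dump can be checked with a little arithmetic: in the version 2.0 format Gene posted, each cache holds num_slabs times pagesperslab pages. The following is only a minimal sketch, assuming 4 kB pages (the usual i386 page size; the thread does not state it). `slab_kb` is a hypothetical helper written for this illustration, and the three rows are copied from Gene's dump.

```python
PAGE_KB = 4  # assumption: 4 kB pages, the usual size on i386

# Rows copied from the /proc/slabinfo dump (slabinfo version 2.0).
SLABINFO_ROWS = """\
ext3_inode_cache 1488612 1488618 448 9 1 : tunables 54 27 0 : slabdata 165402 165402 0
dentry_cache 1115280 1116752 140 28 1 : tunables 120 60 0 : slabdata 39884 39884 0
size-64 47735 48251 64 61 1 : tunables 120 60 0 : slabdata 791 791 0
"""


def slab_kb(row):
    """Return (cache name, kB of pages held) for one slabinfo 2.0 row."""
    # A row has three colon-separated sections: object stats, tunables, slabdata.
    head, _tunables, slabdata = row.split(":")
    name, _active, _num_objs, _objsize, _objperslab, pages_per_slab = head.split()
    _label, _active_slabs, num_slabs, _shared = slabdata.split()
    return name, int(num_slabs) * int(pages_per_slab) * PAGE_KB


usage = dict(slab_kb(row) for row in SLABINFO_ROWS.splitlines())
for name, kb in sorted(usage.items(), key=lambda item: -item[1]):
    print(f"{name:18s} {kb:8d} kB")

inode_and_dentry = usage["ext3_inode_cache"] + usage["dentry_cache"]
print(f"inode + dentry caches: {inode_and_dentry} kB")
```

Under that page-size assumption the two caches together hold 821144 kB, which is nearly all of the 833184 kB reported as Slab in /proc/meminfo and most of the 904948 kB LowTotal.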
* Re: Possible dcache BUG 2004-08-17 4:58 ` Nick Piggin @ 2004-08-17 5:26 ` Gene Heskett 2004-08-17 11:57 ` Nick Piggin 0 siblings, 1 reply; 146+ messages in thread From: Gene Heskett @ 2004-08-17 5:26 UTC (permalink / raw) To: linux-kernel Cc: Nick Piggin, viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton On Tuesday 17 August 2004 00:58, Nick Piggin wrote: >Gene Heskett wrote: >>On Monday 16 August 2004 19:01, viro@parcelfarce.linux.theplanet.co.uk wrote: >>>On Mon, Aug 16, 2004 at 06:52:50PM -0400, Gene Heskett wrote: >>>>Well, I am seing some dups, but they are so volatile that no two >>>>runs will report the same allocations as dups, and its never more >>>>than 2 using /proc/fs/ext3 | sort | uniq -c | sort -nr |grep -v ' >>>>1 ' >>>> >>>>Consecutive runs will show anywhere from 3 to 10 or 12 dups, but >>>>never is an address repeated between runs. >>>> >>>>How is this to be interpreted? >>> >>>That's OK. Keep in mind that you have a *lot* of these guys and >>>your cat(1) makes a lot of read(2) calls. So what you see is >>> >>><starting to read> >>><see inode #n that is about to be evicted> >>><read some more> >>><inode #n gets evicted, quite possibly - due to memory pressure >>> from cat(1) or sort(1)> >>><read more> >>><somebody wants the same inode again> >>><read more> >>><see the inode #n we'd just had read from disk again> >>> >>>So few duplicates are all right. >> >>I hope so. I've got a real hoodoozy here, being out of memory >> (well, maybe 30 megs left) when my nightly run of rsync started, >> everything came to a grinding halt. I couldn't even get to the >> screen the tail -f on the log was running in, but after walking >> away for 10 minutes. I can once again. However, things seem to be >> partially functional so I'm going to see if I can do some >> cut-n-paste from the log screen to here, but I probably can't send >> it as sendmail was one of the items the OOM killer killed. >> According to top, I'm about 250 megs into the swap, very suddenly. 
>> No swap was in use at 23:55 local.
>
>snip
>
>>I cannot start any new shells, as before. Is there any usable dna
>> in this sample?
>>
>>Reboot time I guess :(((
>
>All your low memory has been used by dentry and inode caches. This
> isn't very
>interesting because this would be no doubt caused by something
> oopsing while holding the shrinker semaphore as Andrew pointed out.
>
>What is interesting is that first Oops message (I wonder if you
> don't have bad hardware though, I don't think anyone else is seeing
> it).

What 'first Oops message'? One I posted before?

That comment caused me to go back in the log to well above where I had
been channel surfing with tvtime, and I did find an Oops:

Aug 16 21:15:46 coyote kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Aug 16 21:15:46 coyote kernel: printing eip:
Aug 16 21:15:46 coyote kernel: c015c8db
Aug 16 21:15:46 coyote kernel: *pde = 00000000
Aug 16 21:15:46 coyote kernel: Oops: 0002 [#1]
Aug 16 21:15:46 coyote kernel: Modules linked in: tuner tvaudio bttv video_buf btcx_risc eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd forcedeth sg
Aug 16 21:15:46 coyote kernel: CPU: 0
Aug 16 21:15:46 coyote kernel: EIP: 0060:[<c015c8db>] Not tainted
Aug 16 21:15:46 coyote kernel: EFLAGS: 00210206 (2.6.8-rc4)
Aug 16 21:15:46 coyote kernel: EIP is at prune_icache+0x6b/0x1b0
Aug 16 21:15:46 coyote kernel: eax: 00000000 ebx: dffe0fd0 ecx: d3eb8b80 edx: c0341660
Aug 16 21:15:46 coyote kernel: esi: dffe0fc8 edi: 0000005a ebp: d3eb8b94 esp: d3eb8b74
Aug 16 21:15:46 coyote kernel: ds: 007b es: 007b ss: 0068
Aug 16 21:15:46 coyote kernel: Process yum (pid: 30892, threadinfo=d3eb8000 task=cf6bf7b0)
Aug 16 21:15:46 coyote kernel: Stack: dffe0448 00000000 00000059 dffe0450 df58d0d0 00000080 00000000 d3eb8000
Aug 16 21:15:46 coyote kernel: d3eb8ba0 c015ca5f 00000080 d3eb8bd4 c0135b14 00000080 000000d2 0108bf00
Aug 16 21:15:46 coyote kernel: 00000000 00021087 00000080 00000000 f7ffea20 0000000a d3eb8c50 00000000
Aug 16 21:15:46 coyote kernel: Call Trace:
Aug 16 21:15:46 coyote kernel: [<c01044ef>] show_stack+0x7f/0xa0
Aug 16 21:15:46 coyote kernel: [<c0104688>] show_registers+0x158/0x1b0
Aug 16 21:15:46 coyote kernel: [<c01047e6>] die+0x66/0xd0
Aug 16 21:15:46 coyote kernel: [<c01109de>] do_page_fault+0x28e/0x548
Aug 16 21:15:46 coyote kernel: [<c010415d>] error_code+0x2d/0x38
Aug 16 21:15:46 coyote kernel: [<c015ca5f>] shrink_icache_memory+0x3f/0x50
Aug 16 21:15:46 coyote kernel: [<c0135b14>] shrink_slab+0x134/0x170
Aug 16 21:15:46 coyote kernel: [<c0136954>] try_to_free_pages+0xa4/0x160
Aug 16 21:15:46 coyote kernel: [<c012fc23>] __alloc_pages+0x1b3/0x320
Aug 16 21:15:46 coyote kernel: [<c0139a8f>] do_anonymous_page+0x5f/0x180
Aug 16 21:15:46 coyote kernel: [<c0139c11>] do_no_page+0x61/0x310
Aug 16 21:15:46 coyote kernel: [<c013a097>] handle_mm_fault+0xd7/0x160
Aug 16 21:15:46 coyote kernel: [<c01108a0>] do_page_fault+0x150/0x548
Aug 16 21:15:46 coyote kernel: [<c010415d>] error_code+0x2d/0x38
Aug 16 21:15:46 coyote kernel: [<c012c279>] do_generic_mapping_read+0x129/0x430
Aug 16 21:15:46 coyote kernel: [<c012c836>] __generic_file_aio_read+0x1b6/0x1f0
Aug 16 21:15:46 coyote kernel: [<c012c8c2>] generic_file_aio_read+0x52/0x70
Aug 16 21:15:46 coyote kernel: [<c0145898>] do_sync_read+0x78/0xa0
Aug 16 21:15:46 coyote kernel: [<c014598a>] vfs_read+0xca/0x140
Aug 16 21:15:46 coyote kernel: [<c0145c2b>] sys_read+0x4b/0x80
Aug 16 21:15:46 coyote kernel: [<c0103f61>] sysenter_past_esp+0x52/0x71
Aug 16 21:15:46 coyote kernel: Code: 89 10 a1 60 16 34 c0 89 58 04 89 03 c7 43 04 60 16 34 c0 89

yum did a segfault about that time. yum is nice code, when
it fscking works, which is maybe half the time on 2 different
FC2 machines here now.

So we're back to the dentry_cache thing...
Duh, NO!, this is in prune_icache, not prune_dcache, presumably slightly different. As far as bad hardware is concerned, warranty time is running out. I need something plausible to take back to tcwo as a good reason for requesting a 'blanket rma' on the whole thing, would they please send me another. Preferably an AMD Athlon 2800XP that wasn't stepping 00. Or are the bug lists constant across these processors? -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) 99.24% setiathome rank, not too shabby for a WV hillbilly Yahoo.com attorneys please note, additions to this message by Gene Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved. ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-17 5:26 ` Gene Heskett @ 2004-08-17 11:57 ` Nick Piggin 2004-08-19 9:41 ` Gene Heskett 0 siblings, 1 reply; 146+ messages in thread From: Nick Piggin @ 2004-08-17 11:57 UTC (permalink / raw) To: gene.heskett Cc: linux-kernel, viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton Gene Heskett wrote: > On Tuesday 17 August 2004 00:58, Nick Piggin wrote: > >>Gene Heskett wrote: >>>Reboot time I guess :((( >> >>All your low memory has been used by dentry and inode caches. This >>isn't very >>interesting because this would be no doubt caused by something >>oopsing while holding the shrinker semaphore as Andrew pointed out. >> >>What is interesting is that first Oops message (I wonder if you >>don't have bad hardware though, I don't think anyone else is seeing >>it). > > > What 'first Oops message'? One I posted before? > Well, the first Oops that your running kernel raises. Usually you don't bother about subsequent oopses and misbehaviour because the first one can cause the system to go into a funny state - this is a prime example. 
> That comment caused me to go back in the log to well above where I had > been channel surfing with tvtime, and I did find an Oops: > > Aug 16 21:15:46 coyote kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000 > Aug 16 21:15:46 coyote kernel: printing eip: > Aug 16 21:15:46 coyote kernel: c015c8db > Aug 16 21:15:46 coyote kernel: *pde = 00000000 > Aug 16 21:15:46 coyote kernel: Oops: 0002 [#1] > Aug 16 21:15:46 coyote kernel: Modules linked in: tuner tvaudio bttv video_buf btcx_risc eeprom snd_seq_oss snd_seq > _midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_allo > c snd_mpu401_uart snd_rawmidi snd_seq_device snd forcedeth sg > Aug 16 21:15:46 coyote kernel: CPU: 0 > Aug 16 21:15:46 coyote kernel: EIP: 0060:[<c015c8db>] Not tainted > Aug 16 21:15:46 coyote kernel: EFLAGS: 00210206 (2.6.8-rc4) > Aug 16 21:15:46 coyote kernel: EIP is at prune_icache+0x6b/0x1b0 > Aug 16 21:15:46 coyote kernel: eax: 00000000 ebx: dffe0fd0 ecx: d3eb8b80 edx: c0341660 > Aug 16 21:15:46 coyote kernel: esi: dffe0fc8 edi: 0000005a ebp: d3eb8b94 esp: d3eb8b74 > Aug 16 21:15:46 coyote kernel: ds: 007b es: 007b ss: 0068 > Aug 16 21:15:46 coyote kernel: Process yum (pid: 30892, threadinfo=d3eb8000 task=cf6bf7b0) > Aug 16 21:15:46 coyote kernel: Stack: dffe0448 00000000 00000059 dffe0450 df58d0d0 00000080 00000000 d3eb8000 > Aug 16 21:15:46 coyote kernel: d3eb8ba0 c015ca5f 00000080 d3eb8bd4 c0135b14 00000080 000000d2 0108bf00 > Aug 16 21:15:46 coyote kernel: 00000000 00021087 00000080 00000000 f7ffea20 0000000a d3eb8c50 00000000 > Aug 16 21:15:46 coyote kernel: Call Trace: > Aug 16 21:15:46 coyote kernel: [<c01044ef>] show_stack+0x7f/0xa0 > Aug 16 21:15:46 coyote kernel: [<c0104688>] show_registers+0x158/0x1b0 > Aug 16 21:15:46 coyote kernel: [<c01047e6>] die+0x66/0xd0 > Aug 16 21:15:46 coyote kernel: [<c01109de>] do_page_fault+0x28e/0x548 > Aug 16 21:15:46 coyote kernel: [<c010415d>] 
error_code+0x2d/0x38 > Aug 16 21:15:46 coyote kernel: [<c015ca5f>] shrink_icache_memory+0x3f/0x50 > Aug 16 21:15:46 coyote kernel: [<c0135b14>] shrink_slab+0x134/0x170 > Aug 16 21:15:46 coyote kernel: [<c0136954>] try_to_free_pages+0xa4/0x160 > Aug 16 21:15:46 coyote kernel: [<c012fc23>] __alloc_pages+0x1b3/0x320 > Aug 16 21:15:46 coyote kernel: [<c0139a8f>] do_anonymous_page+0x5f/0x180 > Aug 16 21:15:46 coyote kernel: [<c0139c11>] do_no_page+0x61/0x310 > Aug 16 21:15:46 coyote kernel: [<c013a097>] handle_mm_fault+0xd7/0x160 > Aug 16 21:15:46 coyote kernel: [<c01108a0>] do_page_fault+0x150/0x548 > Aug 16 21:15:46 coyote kernel: [<c010415d>] error_code+0x2d/0x38 > Aug 16 21:15:46 coyote kernel: [<c012c279>] do_generic_mapping_read+0x129/0x430 > Aug 16 21:15:46 coyote kernel: [<c012c836>] __generic_file_aio_read+0x1b6/0x1f0 > Aug 16 21:15:46 coyote kernel: [<c012c8c2>] generic_file_aio_read+0x52/0x70 > Aug 16 21:15:46 coyote kernel: [<c0145898>] do_sync_read+0x78/0xa0 > Aug 16 21:15:46 coyote kernel: [<c014598a>] vfs_read+0xca/0x140 > Aug 16 21:15:46 coyote kernel: [<c0145c2b>] sys_read+0x4b/0x80 > Aug 16 21:15:46 coyote kernel: [<c0103f61>] sysenter_past_esp+0x52/0x71 > Aug 16 21:15:46 coyote kernel: Code: 89 10 a1 60 16 34 c0 89 58 04 89 03 c7 43 04 60 16 34 c0 89 > > yum did a segfault about that time. yum is nice code, when > it fscking works, which is maybe half the time on 2 different > FC2 machines here now. > Although an Oops is always the kernel's (or bad hardware's) fault. So in this case you can let yum off the hook :) > So we're back to the dentry_cache thing... Duh, NO!, this is in > prune_icache, not prune_dcache, presumably slightly different. > Yeah, both are going to cause cache shrinking to stop working. > As far as bad hardware is concerned, warranty time is running out. > I need something plausible to take back to tcwo as a good reason > for requesting a 'blanket rma' on the whole thing, would they > please send me another. > Not too sure really. 
At this stage keep trying patches that you get sent :P ^ permalink raw reply [flat|nested] 146+ messages in thread
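Nick's rule of thumb (look at the first oops a running kernel raises, and treat later ones as fallout) is easy to apply mechanically to a syslog. The following is a minimal sketch, assuming the log lines are in chronological order. The marker strings are the ones that open the kernel reports quoted in this thread, and `first_oops` is a hypothetical helper for illustration, not an existing tool.

```python
import re

# Strings that open the kernel reports quoted in this thread.
OOPS_MARKERS = re.compile(r"Unable to handle kernel|kernel BUG at|Oops:")

# A few lines taken from the logs posted in the thread.
SAMPLE_LOG = """\
Aug 16 21:15:46 coyote kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Aug 16 21:15:46 coyote kernel: Oops: 0002 [#1]
Aug 16 21:15:46 coyote kernel: EIP is at prune_icache+0x6b/0x1b0
Aug 17 00:16:40 coyote kernel: oom-killer: gfp_mask=0xd0
Aug 17 00:16:41 coyote kernel: Out of Memory: Killed process 3119 (kdeinit).
"""


def first_oops(lines):
    """Return (line number, text) of the first oops-like line, or None."""
    for number, line in enumerate(lines, start=1):
        if OOPS_MARKERS.search(line):
            return number, line.rstrip("\n")
    return None


hit = first_oops(SAMPLE_LOG.splitlines())
if hit is None:
    print("no oops found")
else:
    print(f"first oops at line {hit[0]}: {hit[1]}")
```

On the sample, the prune_icache report from 21:15:46 is found hours before the OOM killer output, which matches the order of events Nick describes. The same function accepts any iterable of lines, such as an open log file; where syslog keeps kernel messages varies by distribution.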
* Re: Possible dcache BUG 2004-08-17 11:57 ` Nick Piggin @ 2004-08-19 9:41 ` Gene Heskett 2004-08-19 18:36 ` Marcelo Tosatti 0 siblings, 1 reply; 146+ messages in thread From: Gene Heskett @ 2004-08-19 9:41 UTC (permalink / raw) To: linux-kernel Cc: Nick Piggin, viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton On Tuesday 17 August 2004 07:57, Nick Piggin wrote: >Gene Heskett wrote: >> On Tuesday 17 August 2004 00:58, Nick Piggin wrote: >>>Gene Heskett wrote: >>>>Reboot time I guess :((( >>> >>>All your low memory has been used by dentry and inode caches. This >>>isn't very >>>interesting because this would be no doubt caused by something >>>oopsing while holding the shrinker semaphore as Andrew pointed >>> out. >>> >>>What is interesting is that first Oops message (I wonder if you >>>don't have bad hardware though, I don't think anyone else is >>> seeing it). >> >> What 'first Oops message'? One I posted before? > >Well, the first Oops that your running kernel raises. Usually you >don't bother about subsequent oopses and misbehaviour because the >first one can cause the system to go into a funny state - this is >a prime example. 
> >> That comment caused me to go back in the log to well above where I >> had been channel surfing with tvtime, and I did find an Oops: >> >> Aug 16 21:15:46 coyote kernel: Unable to handle kernel NULL >> pointer dereference at virtual address 00000000 Aug 16 21:15:46 >> coyote kernel: printing eip: >> Aug 16 21:15:46 coyote kernel: c015c8db >> Aug 16 21:15:46 coyote kernel: *pde = 00000000 >> Aug 16 21:15:46 coyote kernel: Oops: 0002 [#1] >> Aug 16 21:15:46 coyote kernel: Modules linked in: tuner tvaudio >> bttv video_buf btcx_risc eeprom snd_seq_oss snd_seq _midi_event >> snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x snd_intel8x0 >> snd_ac97_codec snd_pcm snd_timer snd_page_allo c snd_mpu401_uart >> snd_rawmidi snd_seq_device snd forcedeth sg Aug 16 21:15:46 coyote >> kernel: CPU: 0 >> Aug 16 21:15:46 coyote kernel: EIP: 0060:[<c015c8db>] Not >> tainted Aug 16 21:15:46 coyote kernel: EFLAGS: 00210206 >> (2.6.8-rc4) Aug 16 21:15:46 coyote kernel: EIP is at >> prune_icache+0x6b/0x1b0 Aug 16 21:15:46 coyote kernel: eax: >> 00000000 ebx: dffe0fd0 ecx: d3eb8b80 edx: c0341660 Aug 16 >> 21:15:46 coyote kernel: esi: dffe0fc8 edi: 0000005a ebp: >> d3eb8b94 esp: d3eb8b74 Aug 16 21:15:46 coyote kernel: ds: 007b >> es: 007b ss: 0068 Aug 16 21:15:46 coyote kernel: Process yum >> (pid: 30892, threadinfo=d3eb8000 task=cf6bf7b0) Aug 16 21:15:46 >> coyote kernel: Stack: dffe0448 00000000 00000059 dffe0450 df58d0d0 >> 00000080 00000000 d3eb8000 Aug 16 21:15:46 coyote kernel: >> d3eb8ba0 c015ca5f 00000080 d3eb8bd4 c0135b14 00000080 000000d2 >> 0108bf00 Aug 16 21:15:46 coyote kernel: 00000000 00021087 >> 00000080 00000000 f7ffea20 0000000a d3eb8c50 00000000 Aug 16 >> 21:15:46 coyote kernel: Call Trace: >> Aug 16 21:15:46 coyote kernel: [<c01044ef>] show_stack+0x7f/0xa0 >> Aug 16 21:15:46 coyote kernel: [<c0104688>] >> show_registers+0x158/0x1b0 Aug 16 21:15:46 coyote kernel: >> [<c01047e6>] die+0x66/0xd0 Aug 16 21:15:46 coyote kernel: >> [<c01109de>] do_page_fault+0x28e/0x548 Aug 16 
21:15:46 coyote >> kernel: [<c010415d>] error_code+0x2d/0x38 Aug 16 21:15:46 coyote >> kernel: [<c015ca5f>] shrink_icache_memory+0x3f/0x50 Aug 16 >> 21:15:46 coyote kernel: [<c0135b14>] shrink_slab+0x134/0x170 Aug >> 16 21:15:46 coyote kernel: [<c0136954>] >> try_to_free_pages+0xa4/0x160 Aug 16 21:15:46 coyote kernel: >> [<c012fc23>] __alloc_pages+0x1b3/0x320 Aug 16 21:15:46 coyote >> kernel: [<c0139a8f>] do_anonymous_page+0x5f/0x180 Aug 16 21:15:46 >> coyote kernel: [<c0139c11>] do_no_page+0x61/0x310 Aug 16 21:15:46 >> coyote kernel: [<c013a097>] handle_mm_fault+0xd7/0x160 Aug 16 >> 21:15:46 coyote kernel: [<c01108a0>] do_page_fault+0x150/0x548 >> Aug 16 21:15:46 coyote kernel: [<c010415d>] error_code+0x2d/0x38 >> Aug 16 21:15:46 coyote kernel: [<c012c279>] >> do_generic_mapping_read+0x129/0x430 Aug 16 21:15:46 coyote kernel: >> [<c012c836>] __generic_file_aio_read+0x1b6/0x1f0 Aug 16 21:15:46 >> coyote kernel: [<c012c8c2>] generic_file_aio_read+0x52/0x70 Aug >> 16 21:15:46 coyote kernel: [<c0145898>] do_sync_read+0x78/0xa0 >> Aug 16 21:15:46 coyote kernel: [<c014598a>] vfs_read+0xca/0x140 >> Aug 16 21:15:46 coyote kernel: [<c0145c2b>] sys_read+0x4b/0x80 >> Aug 16 21:15:46 coyote kernel: [<c0103f61>] >> sysenter_past_esp+0x52/0x71 Aug 16 21:15:46 coyote kernel: Code: >> 89 10 a1 60 16 34 c0 89 58 04 89 03 c7 43 04 60 16 34 c0 89 >> >> yum did a segfault about that time. yum is nice code, when >> it fscking works, which is maybe half the time on 2 different >> FC2 machines here now. > >Although an Oops is always the kernel's (or bad hardware's) fault. >So in this case you can let yum off the hook :) > >> So we're back to the dentry_cache thing... Duh, NO!, this is in >> prune_icache, not prune_dcache, presumably slightly different. > >Yeah, both are going to cause cache shrinking to stop working. > >> As far as bad hardware is concerned, warranty time is running out. 
>> I need something plausible to take back to tcwo as a good reason
>> for requesting a 'blanket rma' on the whole thing, would they
>> please send me another.
>
>Not too sure really. At this stage keep trying patches that you get
>sent :P

I just had another, but this one's a bit different:

Aug 19 04:22:11 coyote kernel: ------------[ cut here ]------------
Aug 19 04:22:11 coyote kernel: kernel BUG at fs/buffer.c:805!
Aug 19 04:22:11 coyote kernel: invalid operand: 0000 [#1]
Aug 19 04:22:11 coyote kernel: Modules linked in: eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd forcedeth sg
Aug 19 04:22:11 coyote kernel: CPU: 0
Aug 19 04:22:11 coyote kernel: EIP: 0060:[<c0147d77>] Not tainted
Aug 19 04:22:11 coyote kernel: EFLAGS: 00010246 (2.6.8-rc4)
Aug 19 04:22:11 coyote kernel: EIP is at remove_inode_buffers+0x77/0x90
Aug 19 04:22:11 coyote kernel: eax: 00000000 ebx: d7de519c ecx: d7deb99c edx: d7deb974
Aug 19 04:22:11 coyote kernel: esi: d7de50c8 edi: 00000001 ebp: c198bedc esp: c198becc
Aug 19 04:22:11 coyote kernel: ds: 007b es: 007b ss: 0068
Aug 19 04:22:11 coyote kernel: Process kswapd0 (pid: 66, threadinfo=c198b000 task=c1978050)
Aug 19 04:22:11 coyote kernel: Stack: d7de50c8 d7de50d0 d7de50c8 00000057 c198bf04 c015c985 d7de50c8 00000000
Aug 19 04:22:11 coyote kernel: 00000057 d7de5290 e50ac0d0 00000080 00000000 c198b000 c198bf10 c015ca5f
Aug 19 04:22:11 coyote kernel: 00000080 c198bf44 c0135b14 00000080 000000d0 01779600 00000000 0002d1f3
Aug 19 04:22:11 coyote kernel: Call Trace:
Aug 19 04:22:11 coyote kernel: [<c01044ef>] show_stack+0x7f/0xa0
Aug 19 04:22:11 coyote kernel: [<c0104688>] show_registers+0x158/0x1b0
Aug 19 04:22:11 coyote kernel: [<c01047e6>] die+0x66/0xd0
Aug 19 04:22:12 coyote kernel: [<c0104bc3>] do_invalid_op+0xb3/0xc0
Aug 19 04:22:12 coyote kernel: [<c010415d>] error_code+0x2d/0x38
Aug 19 04:22:12 coyote kernel: [<c015c985>] prune_icache+0x115/0x1b0
Aug 19 04:22:12 coyote kernel: [<c015ca5f>] shrink_icache_memory+0x3f/0x50
Aug 19 04:22:12 coyote kernel: [<c0135b14>] shrink_slab+0x134/0x170
Aug 19 04:22:12 coyote kernel: [<c0136bb9>] balance_pgdat+0x1a9/0x1f0
Aug 19 04:22:12 coyote kernel: [<c0136cbf>] kswapd+0xbf/0xd0
Aug 19 04:22:12 coyote kernel: [<c01023f1>] kernel_thread_helper+0x5/0x14
Aug 19 04:22:12 coyote kernel: Code: 0f 0b 25 03 e5 0b 30 c0 eb c4 31 ff eb de 0f 0b 36 04 e5 0b

The system is still up, but it's 100 megs into swap, so I'm going to
reboot without changing anything. Is this one traceable?

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.
* Re: Possible dcache BUG
  2004-08-19 9:41 ` Gene Heskett
@ 2004-08-19 18:36 ` Marcelo Tosatti
  2004-08-20 2:38 ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: Marcelo Tosatti @ 2004-08-19 18:36 UTC (permalink / raw)
To: Gene Heskett
Cc: linux-kernel, Nick Piggin, viro, Linus Torvalds, Andrew Morton

Gene,

That is:

/*
 * The buffer's backing address_space's private_lock must be held
 */
static inline void __remove_assoc_queue(struct buffer_head *bh)
{
        BUG_ON(bh->b_assoc_buffers.next == NULL);       <----------
        BUG_ON(bh->b_assoc_buffers.prev == NULL);
        list_del_init(&bh->b_assoc_buffers);
}

Viro, Linus, Andrew, don't you have any idea what could cause such
mapping->b_assoc_mapping corruption?

I can't see how that could be caused by flaky hardware.

Maybe we should include those BUGs into the official kernel, or -mm's tree?

On Thu, Aug 19, 2004 at 05:41:13AM -0400, Gene Heskett wrote:
> On Tuesday 17 August 2004 07:57, Nick Piggin wrote:
> >Gene Heskett wrote:
> >> On Tuesday 17 August 2004 00:58, Nick Piggin wrote:
> >>>Gene Heskett wrote:
> >>>>Reboot time I guess :(((
> >>>
> >>>All your low memory has been used by dentry and inode caches. This
> >>>isn't very
> >>>interesting because this would be no doubt caused by something
> >>>oopsing while holding the shrinker semaphore as Andrew pointed
> >>> out.
> >>>
> >>>What is interesting is that first Oops message (I wonder if you
> >>>don't have bad hardware though, I don't think anyone else is
> >>> seeing it).
> >>
> >> What 'first Oops message'? One I posted before?
> >
> >Well, the first Oops that your running kernel raises. Usually you
> >don't bother about subsequent oopses and misbehaviour because the
> >first one can cause the system to go into a funny state - this is
> >a prime example.
> > > >> That comment caused me to go back in the log to well above where I > >> had been channel surfing with tvtime, and I did find an Oops: > >> > >> Aug 16 21:15:46 coyote kernel: Unable to handle kernel NULL > >> pointer dereference at virtual address 00000000 Aug 16 21:15:46 > >> coyote kernel: printing eip: > >> Aug 16 21:15:46 coyote kernel: c015c8db > >> Aug 16 21:15:46 coyote kernel: *pde = 00000000 > >> Aug 16 21:15:46 coyote kernel: Oops: 0002 [#1] > >> Aug 16 21:15:46 coyote kernel: Modules linked in: tuner tvaudio > >> bttv video_buf btcx_risc eeprom snd_seq_oss snd_seq _midi_event > >> snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x snd_intel8x0 > >> snd_ac97_codec snd_pcm snd_timer snd_page_allo c snd_mpu401_uart > >> snd_rawmidi snd_seq_device snd forcedeth sg Aug 16 21:15:46 coyote > >> kernel: CPU: 0 > >> Aug 16 21:15:46 coyote kernel: EIP: 0060:[<c015c8db>] Not > >> tainted Aug 16 21:15:46 coyote kernel: EFLAGS: 00210206 > >> (2.6.8-rc4) Aug 16 21:15:46 coyote kernel: EIP is at > >> prune_icache+0x6b/0x1b0 Aug 16 21:15:46 coyote kernel: eax: > >> 00000000 ebx: dffe0fd0 ecx: d3eb8b80 edx: c0341660 Aug 16 > >> 21:15:46 coyote kernel: esi: dffe0fc8 edi: 0000005a ebp: > >> d3eb8b94 esp: d3eb8b74 Aug 16 21:15:46 coyote kernel: ds: 007b > >> es: 007b ss: 0068 Aug 16 21:15:46 coyote kernel: Process yum > >> (pid: 30892, threadinfo=d3eb8000 task=cf6bf7b0) Aug 16 21:15:46 > >> coyote kernel: Stack: dffe0448 00000000 00000059 dffe0450 df58d0d0 > >> 00000080 00000000 d3eb8000 Aug 16 21:15:46 coyote kernel: > >> d3eb8ba0 c015ca5f 00000080 d3eb8bd4 c0135b14 00000080 000000d2 > >> 0108bf00 Aug 16 21:15:46 coyote kernel: 00000000 00021087 > >> 00000080 00000000 f7ffea20 0000000a d3eb8c50 00000000 Aug 16 > >> 21:15:46 coyote kernel: Call Trace: > >> Aug 16 21:15:46 coyote kernel: [<c01044ef>] show_stack+0x7f/0xa0 > >> Aug 16 21:15:46 coyote kernel: [<c0104688>] > >> show_registers+0x158/0x1b0 Aug 16 21:15:46 coyote kernel: > >> [<c01047e6>] die+0x66/0xd0 Aug 16 
21:15:46 coyote kernel: > >> [<c01109de>] do_page_fault+0x28e/0x548 Aug 16 21:15:46 coyote > >> kernel: [<c010415d>] error_code+0x2d/0x38 Aug 16 21:15:46 coyote > >> kernel: [<c015ca5f>] shrink_icache_memory+0x3f/0x50 Aug 16 > >> 21:15:46 coyote kernel: [<c0135b14>] shrink_slab+0x134/0x170 Aug > >> 16 21:15:46 coyote kernel: [<c0136954>] > >> try_to_free_pages+0xa4/0x160 Aug 16 21:15:46 coyote kernel: > >> [<c012fc23>] __alloc_pages+0x1b3/0x320 Aug 16 21:15:46 coyote > >> kernel: [<c0139a8f>] do_anonymous_page+0x5f/0x180 Aug 16 21:15:46 > >> coyote kernel: [<c0139c11>] do_no_page+0x61/0x310 Aug 16 21:15:46 > >> coyote kernel: [<c013a097>] handle_mm_fault+0xd7/0x160 Aug 16 > >> 21:15:46 coyote kernel: [<c01108a0>] do_page_fault+0x150/0x548 > >> Aug 16 21:15:46 coyote kernel: [<c010415d>] error_code+0x2d/0x38 > >> Aug 16 21:15:46 coyote kernel: [<c012c279>] > >> do_generic_mapping_read+0x129/0x430 Aug 16 21:15:46 coyote kernel: > >> [<c012c836>] __generic_file_aio_read+0x1b6/0x1f0 Aug 16 21:15:46 > >> coyote kernel: [<c012c8c2>] generic_file_aio_read+0x52/0x70 Aug > >> 16 21:15:46 coyote kernel: [<c0145898>] do_sync_read+0x78/0xa0 > >> Aug 16 21:15:46 coyote kernel: [<c014598a>] vfs_read+0xca/0x140 > >> Aug 16 21:15:46 coyote kernel: [<c0145c2b>] sys_read+0x4b/0x80 > >> Aug 16 21:15:46 coyote kernel: [<c0103f61>] > >> sysenter_past_esp+0x52/0x71 Aug 16 21:15:46 coyote kernel: Code: > >> 89 10 a1 60 16 34 c0 89 58 04 89 03 c7 43 04 60 16 34 c0 89 > >> > >> yum did a segfault about that time. yum is nice code, when > >> it fscking works, which is maybe half the time on 2 different > >> FC2 machines here now. > > > >Although an Oops is always the kernel's (or bad hardware's) fault. > >So in this case you can let yum off the hook :) > > > >> So we're back to the dentry_cache thing... Duh, NO!, this is in > >> prune_icache, not prune_dcache, presumably slightly different. > > > >Yeah, both are going to cause cache shrinking to stop working. 
> > > >> As far as bad hardware is concerned, warranty time is running out. > >> I need something plausible to take back to tcwo as a good reason > >> for requesting a 'blanket rma' on the whole thing, would they > >> please send me another. > > > >Not too sure really. At this stage keep trying patches that you get > >sent :P > > I just had another but this ones a bit different: > > Aug 19 04:22:11 coyote kernel: ------------[ cut here ]------------ > Aug 19 04:22:11 coyote kernel: kernel BUG at fs/buffer.c:805! > Aug 19 04:22:11 coyote kernel: invalid operand: 0000 [#1] > Aug 19 04:22:11 coyote kernel: Modules linked in: eeprom snd_seq_oss > snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x > snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc > snd_mpu401_uart snd_rawmidi snd_seq_device snd forcedeth sg > Aug 19 04:22:11 coyote kernel: CPU: 0 > Aug 19 04:22:11 coyote kernel: EIP: 0060:[<c0147d77>] Not > tainted > Aug 19 04:22:11 coyote kernel: EFLAGS: 00010246 (2.6.8-rc4) > Aug 19 04:22:11 coyote kernel: EIP is at > remove_inode_buffers+0x77/0x90 > Aug 19 04:22:11 coyote kernel: eax: 00000000 ebx: d7de519c ecx: > d7deb99c edx: d7deb974 > Aug 19 04:22:11 coyote kernel: esi: d7de50c8 edi: 00000001 ebp: > c198bedc esp: c198becc > Aug 19 04:22:11 coyote kernel: ds: 007b es: 007b ss: 0068 > Aug 19 04:22:11 coyote kernel: Process kswapd0 (pid: 66, > threadinfo=c198b000 task=c1978050) > Aug 19 04:22:11 coyote kernel: Stack: d7de50c8 d7de50d0 d7de50c8 > 00000057 c198bf04 c015c985 d7de50c8 00000000 > Aug 19 04:22:11 coyote kernel: 00000057 d7de5290 e50ac0d0 > 00000080 00000000 c198b000 c198bf10 c015ca5f > Aug 19 04:22:11 coyote kernel: 00000080 c198bf44 c0135b14 > 00000080 000000d0 01779600 00000000 0002d1f3 > Aug 19 04:22:11 coyote kernel: Call Trace: > Aug 19 04:22:11 coyote kernel: [<c01044ef>] show_stack+0x7f/0xa0 > Aug 19 04:22:11 coyote kernel: [<c0104688>] > show_registers+0x158/0x1b0 > Aug 19 04:22:11 coyote kernel: [<c01047e6>] die+0x66/0xd0 
> Aug 19 04:22:12 coyote kernel: [<c0104bc3>] do_invalid_op+0xb3/0xc0 > Aug 19 04:22:12 coyote kernel: [<c010415d>] error_code+0x2d/0x38 > Aug 19 04:22:12 coyote kernel: [<c015c985>] prune_icache+0x115/0x1b0 > Aug 19 04:22:12 coyote kernel: [<c015ca5f>] > shrink_icache_memory+0x3f/0x50 > Aug 19 04:22:12 coyote kernel: [<c0135b14>] shrink_slab+0x134/0x170 > Aug 19 04:22:12 coyote kernel: [<c0136bb9>] balance_pgdat+0x1a9/0x1f0 > Aug 19 04:22:12 coyote kernel: [<c0136cbf>] kswapd+0xbf/0xd0 > Aug 19 04:22:12 coyote kernel: [<c01023f1>] > kernel_thread_helper+0x5/0x14 > Aug 19 04:22:12 coyote kernel: Code: 0f 0b 25 03 e5 0b 30 c0 eb c4 31 > ff eb de 0f 0b 36 04 e5 0b > > The system is still up but its 100 megs into swap so I'm going to > reboot without changing anything. Is this one traceable? ^ permalink raw reply [flat|nested] 146+ messages in thread
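The check Marcelo quotes can be illustrated outside the kernel. What follows is only a minimal userspace sketch, assuming a circular doubly linked list in the style of the kernel's list_head; the names list_node, list_init, list_add and checked_unlink are hypothetical and are not the kernel's API. Where the kernel would hit BUG_ON, the sketch returns -1 instead of halting.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* An empty (or unlinked) node points at itself, never at NULL. */
static void list_init(struct list_node *n)
{
    n->next = n;
    n->prev = n;
}

/* Insert n right after head. */
static void list_add(struct list_node *n, struct list_node *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/*
 * Unlink n and re-initialise it. A NULL link can never occur on a
 * healthy list, so seeing one means the node was corrupted; report
 * that with -1 rather than dereferencing the bad pointer.
 */
static int checked_unlink(struct list_node *n)
{
    if (n->next == NULL || n->prev == NULL)
        return -1;
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
    return 0;
}
```

The point of the check is that a properly initialised node always has non-NULL links, so a NULL there indicates that something overwrote the structure, whether through a software bug or bad hardware.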
* Re: Possible dcache BUG
  2004-08-19 18:36 ` Marcelo Tosatti
@ 2004-08-20 2:38 ` Gene Heskett
  2004-08-20 7:33 ` Marcelo Tosatti
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-20 2:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Marcelo Tosatti, Nick Piggin, viro, Linus Torvalds, Andrew Morton

On Thursday 19 August 2004 14:36, Marcelo Tosatti wrote:
>Gene,
>
>That is:
>
>/*
> * The buffer's backing address_space's private_lock must be held
> */
>static inline void __remove_assoc_queue(struct buffer_head *bh)
>{
> BUG_ON(bh->b_assoc_buffers.next == NULL); <----------
> BUG_ON(bh->b_assoc_buffers.prev == NULL);
> list_del_init(&bh->b_assoc_buffers);
>}
>
>Viro, Linus, Andrew, dont you have any idea what could cause such
> mapping->b_assoc_mapping corruption?
>
>I can't see how that could be caused by flaky hardware.

There is still that possibility, Marcelo. Someone recommended I get
cpuburn and memburn, and before fixing the scanf statement (it was
broken) in memburn, I had compiled it for a 512 meg test the first
time, and a 768 meg test the next couple of runs.

All exited with errors like this:
Passed round 133, elapsed 4827.19.
FAILED at round 134/14208927: got ff00, expected 0!!!

REREAD: ff00, ff00, ff00!!!

[root@coyote memburn]# vim memburn.c
[root@coyote memburn]# gcc -o memburn memburn.c
[root@coyote memburn]# ./memburn
Starting test with size 768 megs..

Passed round 0, elapsed 44.36.
Passed round 1, elapsed 74.13.
Passed round 2, elapsed 105.12.
FAILED at round 3/25777183: got 2b00, expected 0!!!

REREAD: 2b00, 2b00, 2b00!!!

I've now rebuilt it with a better printf format string, and it's
running over 768 megs again. But this time the round counter is up to
90 and still going...

Interesting too is that memburn has now allocated a 768 meg wide block
5 times, and still no Oops. Over a hundred megs in swap, but it's still
running.

I lost the BUG_ON patches in fs/buffer.c, this is now 2.6.8.1-mm2 (but
I can go back if this fails of course).

Or can I just copy that 2.6.8-rc4/fs/buffer.c file over this one?

--
Cheers, Gene

^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-20 2:38 ` Gene Heskett @ 2004-08-20 7:33 ` Marcelo Tosatti 2004-08-20 15:06 ` Gene Heskett 0 siblings, 1 reply; 146+ messages in thread From: Marcelo Tosatti @ 2004-08-20 7:33 UTC (permalink / raw) To: Gene Heskett, mingo Cc: linux-kernel, Nick Piggin, viro, Linus Torvalds, Andrew Morton On Thu, Aug 19, 2004 at 10:38:19PM -0400, Gene Heskett wrote: > On Thursday 19 August 2004 14:36, Marcelo Tosatti wrote: > >Gene, > > > >That is: > > > >/* > > * The buffer's backing address_space's private_lock must be held > > */ > >static inline void __remove_assoc_queue(struct buffer_head *bh) > >{ > > BUG_ON(bh->b_assoc_buffers.next == NULL); <---------- > > BUG_ON(bh->b_assoc_buffers.prev == NULL); > > list_del_init(&bh->b_assoc_buffers); > >} > > > >Viro, Linus, Andrew, dont you have any idea what could cause such > > mapping->b_assoc_mapping corruption? > > > >I can't see how that could be caused by flaky hardware. > > There is still that possibility Marcelo. Someone recommended I get > cpuburn and memburn, and before fixing the scanf statement (it was > broken) in memburn, I had compiled it for a 512 meg test the first > time, and a 768 meg test the next couple of runs. > > All exited with errors like this: > Passed round 133, elapsed 4827.19. > FAILED at round 134/14208927: got ff00, expected 0!!! > > REREAD: ff00, ff00, ff00!!! > > [root@coyote memburn]# vim memburn.c > [root@coyote memburn]# gcc -o memburn memburn.c > [root@coyote memburn]# ./memburn > Starting test with size 768 megs.. > > Passed round 0, elapsed 44.36. > Passed round 1, elapsed 74.13. > Passed round 2, elapsed 105.12. > FAILED at round 3/25777183: got 2b00, expected 0!!! > > REREAD: 2b00, 2b00, 2b00!!! > > I've now rebuilt it with a better printf format string, and its > running over 768 megs again. But this time the round counter is up > to 90 and still going... > > Interesting too is that memburn has now allocated a 768 meg wide block > 5 times, and still no Oops. 
Over a hundred megs in swap, but its > still running. > > I lost the BUG_ON patches in fs/buffer.c, this is now 2.6.8.1-mm2 (but > I can go back if this fails of course) > > Or can I just copy that 2.6.8-rc4/fs/buffer.c file over this one? You can just copy it, _I think_. If you have problems just add the BUG_ON's by hand. Now Ingo also hit the same problem, Ingo can you reproduce that remove_inode_buffers()? ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-20 7:33 ` Marcelo Tosatti
@ 2004-08-20 15:06 ` Gene Heskett
  2004-08-20 15:43 ` V13
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-20 15:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Marcelo Tosatti, mingo, Nick Piggin, viro, Linus Torvalds, Andrew Morton

On Friday 20 August 2004 03:33, Marcelo Tosatti wrote:
[...]
>> >I can't see how that could be caused by flaky hardware.
>>
>> There is still that possibility Marcelo. Someone recommended I
>> get cpuburn and memburn, and before fixing the scanf statement (it
>> was broken) in memburn, I had compiled it for a 512 meg test the
>> first time, and a 768 meg test the next couple of runs.
>>
>> All exited with errors like this:
>> Passed round 133, elapsed 4827.19.
>> FAILED at round 134/14208927: got ff00, expected 0!!!
>>
>> REREAD: ff00, ff00, ff00!!!
>>
>> [root@coyote memburn]# vim memburn.c
>> [root@coyote memburn]# gcc -o memburn memburn.c
>> [root@coyote memburn]# ./memburn
>> Starting test with size 768 megs..
>>
>> Passed round 0, elapsed 44.36.
>> Passed round 1, elapsed 74.13.
>> Passed round 2, elapsed 105.12.
>> FAILED at round 3/25777183: got 2b00, expected 0!!!
>>
>> REREAD: 2b00, 2b00, 2b00!!!

The latest output of memburn after a bit of format hacking:

FAILED at round 78/165714207: got 0000ff00, expected 00000000!!!
REREAD: 0000ff00, 0000ff00, 0000ff00!!!

and

FAILED at round 160/200780831: got 02025302, expected 02020202!!!
REREAD: 02025302, 02025302, 02025302!!!

So it appears that it's the third byte of 4 each time that's fubar'd.
I'll run it a few more times to confirm. Is memory byte wide per chip
on these things today?

>> I've now rebuilt it with a better printf format string, and its
>> running over 768 megs again. But this time the round counter is
>> up to 90 and still going...
>>
>> Interesting too is that memburn has now allocated a 768 meg wide
>> block 5 times, and still no Oops. Over a hundred megs in swap,
>> but its still running.
>>
>> I lost the BUG_ON patches in fs/buffer.c, this is now 2.6.8.1-mm2
>> (but I can go back if this fails of course)
>>
>> Or can I just copy that 2.6.8-rc4/fs/buffer.c file over this one?
>
>You can just copy it, _I think_. If you have problems just add the
> BUG_ON's by hand.

Looks like I'll have to, the newer one is about 600 bytes bigger
already, so there are lots of changes. OTOH, I'm now up 21 hours, and
the memory management so far is surviving on 2.6.8.1-mm2. memburn may
be hitting the errors, keeping them from taking down the os maybe?
Sillier things have happened.

>Now Ingo also hit the same problem, Ingo can you reproduce that
>remove_inode_buffers()?

--
Cheers, Gene

^ permalink raw reply [flat|nested] 146+ messages in thread
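The "third byte of 4" observation above can be checked mechanically by XORing the value read back with the value expected. The following is a minimal sketch and not part of memburn; differing_lanes is a hypothetical helper name. It numbers byte lanes from the least significant byte (lane 0), so the third byte from the left of the printed word is lane 1. How a lane maps to a physical chip depends on the memory layout and is not something this sketch can tell.

```c
#include <stdint.h>

/*
 * Return a bitmask of the byte lanes in which `got` differs from
 * `expected`. Bit 0 is the least significant byte, bit 3 the most
 * significant byte of the 32-bit word.
 */
static unsigned differing_lanes(uint32_t got, uint32_t expected)
{
    uint32_t diff = got ^ expected;   /* set bits mark flipped bits */
    unsigned mask = 0;

    for (unsigned lane = 0; lane < 4; lane++) {
        if ((diff >> (8 * lane)) & 0xffu)
            mask |= 1u << lane;
    }
    return mask;
}
```

Applied to the two failures reported here (got 0000ff00 with expected 00000000, and got 02025302 with expected 02020202), the helper returns the same mask both times, which is what "the same byte each time" means in practice.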
* Re: Possible dcache BUG
  2004-08-20 15:06 ` Gene Heskett
@ 2004-08-20 15:43 ` V13
  2004-08-20 17:29 ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: V13 @ 2004-08-20 15:43 UTC (permalink / raw)
  To: gene.heskett
  Cc: linux-kernel, Marcelo Tosatti, mingo, Nick Piggin, viro, Linus Torvalds, Andrew Morton

On Friday 20 August 2004 18:06, Gene Heskett wrote:
> On Friday 20 August 2004 03:33, Marcelo Tosatti wrote:
> [...]
>
> >> >I can't see how that could be caused by flaky hardware.
> >>
> >> There is still that possibility Marcelo. Someone recommended I
> >> get cpuburn and memburn, and before fixing the scanf statement (it
> >> was broken) in memburn, I had compiled it for a 512 meg test the
> >> first time, and a 768 meg test the next couple of runs.
> >>
> >> All exited with errors like this:
> >> Passed round 133, elapsed 4827.19.
> >> FAILED at round 134/14208927: got ff00, expected 0!!!
> >>
> >> REREAD: ff00, ff00, ff00!!!
> >>
> >> [root@coyote memburn]# vim memburn.c
> >> [root@coyote memburn]# gcc -o memburn memburn.c
> >> [root@coyote memburn]# ./memburn
> >> Starting test with size 768 megs..
> >>
> >> Passed round 0, elapsed 44.36.
> >> Passed round 1, elapsed 74.13.
> >> Passed round 2, elapsed 105.12.
> >> FAILED at round 3/25777183: got 2b00, expected 0!!!
> >>
> >> REREAD: 2b00, 2b00, 2b00!!!
>
> The latest output of memburn after a bit of format hacking:
>
> FAILED at round 78/165714207: got 0000ff00, expected 00000000!!!
> REREAD: 0000ff00, 0000ff00, 0000ff00!!!
>
> and
>
> FAILED at round 160/200780831: got 02025302, expected 02020202!!!
> REREAD: 02025302, 02025302, 02025302!!!
>
> So it appears that its the third byte of 4 each time thats fubar'd.
> I'l run it a few more times to confirm. Is memory byte wide per chip
> on these things today?

I had a similar problem some years ago. I had core dumps and gcc errors
all the time but memtest could not find a thing. 99% it was a CPU
problem and not a memory problem. It seemed that there were errors at
random times even when there was no cpu load.

I believe it was a cache problem. I made a simple prog (like memburn)
that allocated memory blocks and then did some read/write on them
(alloc+write 5 blocks, check 1, free 1, alloc+write 6, check 2, free 2,
alloc+write 7....). After that, whenever the program encountered an
error it looped on this block forever.

The errors occurred after a random period of time (from 1 block
allocation to more than an hour) and were never reproduced after a
stop/start. When this test program was running and looping on the bad
block, gcc never displayed errors. The problem was fixed when I
replaced the CPU and I'm still using the same DIMMs without problems.
I also did a lot of checks before replacing the CPU, like changing the
position of the DIMMs, removing one of them, changing their timing, and
much more without success. Even removed all the PCI cards.

Disabling the CPU cache or replacing it can be a good test.

<<V13>>

^ permalink raw reply [flat|nested] 146+ messages in thread
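The kind of test V13 describes (write a known pattern into a block, check it later, and keep re-reading a word that came back wrong) might look roughly like the sketch below. This is not V13's program and not memburn, whose sources are not shown in the thread; fill_block, check_block and reread_mismatches are hypothetical names, and the block rotation (alloc 5, check 1, free 1, ...) is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Write the same 32-bit pattern into every word of the block. */
static void fill_block(uint32_t *block, size_t words, uint32_t pattern)
{
    for (size_t i = 0; i < words; i++)
        block[i] = pattern;
}

/*
 * Return the index of the first word that no longer holds the pattern,
 * or -1 if the whole block still matches. The volatile qualifier asks
 * the compiler to really perform every read.
 */
static long check_block(const volatile uint32_t *block, size_t words,
                        uint32_t pattern)
{
    for (size_t i = 0; i < words; i++) {
        if (block[i] != pattern)
            return (long)i;
    }
    return -1;
}

/*
 * Re-read one suspicious word `times` times and count how often it is
 * still wrong. A stable wrong value, as in the REREAD lines quoted
 * above, gives a count equal to `times`.
 */
static unsigned reread_mismatches(const volatile uint32_t *word,
                                  uint32_t pattern, unsigned times)
{
    unsigned bad = 0;

    for (unsigned i = 0; i < times; i++) {
        if (*word != pattern)
            bad++;
    }
    return bad;
}
```

Note that a re-read served from the CPU cache will keep returning the same value whether the fault happened in RAM or on the way into the cache, so a stable wrong value does not by itself say which component is at fault.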
* Re: Possible dcache BUG 2004-08-20 15:43 ` V13 @ 2004-08-20 17:29 ` Gene Heskett 2004-08-20 18:13 ` Marc Ballarin 2004-08-20 20:11 ` R. J. Wysocki 0 siblings, 2 replies; 146+ messages in thread From: Gene Heskett @ 2004-08-20 17:29 UTC (permalink / raw) To: linux-kernel Cc: V13, Marcelo Tosatti, mingo, Nick Piggin, viro, Linus Torvalds, Andrew Morton On Friday 20 August 2004 11:43, V13 wrote: >On Friday 20 August 2004 18:06, Gene Heskett wrote: >> On Friday 20 August 2004 03:33, Marcelo Tosatti wrote: >> [...] >> >> >> >I can't see how that could be caused by flaky hardware. >> >> >> >> There is still that possibility Marcelo. Someone recommended I >> >> get cpuburn and memburn, and before fixing the scanf statement >> >> (it was broken) in memburn, I had compiled it for a 512 meg >> >> test the first time, and a 768 meg test the next couple of >> >> runs. >> >> >> >> All exited with errors like this: >> >> Passed round 133, elapsed 4827.19. >> >> FAILED at round 134/14208927: got ff00, expected 0!!! >> >> >> >> REREAD: ff00, ff00, ff00!!! >> >> >> >> [root@coyote memburn]# vim memburn.c >> >> [root@coyote memburn]# gcc -o memburn memburn.c >> >> [root@coyote memburn]# ./memburn >> >> Starting test with size 768 megs.. >> >> >> >> Passed round 0, elapsed 44.36. >> >> Passed round 1, elapsed 74.13. >> >> Passed round 2, elapsed 105.12. >> >> FAILED at round 3/25777183: got 2b00, expected 0!!! >> >> >> >> REREAD: 2b00, 2b00, 2b00!!! >> >> The latest output of memburn after a bit of format hacking: >> >> FAILED at round 78/165714207: got 0000ff00, expected 00000000!!! >> REREAD: 0000ff00, 0000ff00, 0000ff00!!! >> >> and >> >> FAILED at round 160/200780831: got 02025302, expected 02020202!!! >> REREAD: 02025302, 02025302, 02025302!!! >> >> So it appears that its the third byte of 4 each time thats >> fubar'd. I'l run it a few more times to confirm. Is memory byte >> wide per chip on these things today? > >I had a simillar problem some years ago. 
I had core dumps and gcc > errors all the time but memtest could not find a thing. 99% it was > a CPU problem and not a memory problem. It seemed that there were > errors at random times even when there was no cpu load. > >I believe it was a cache problem. I made a simple prog (like > memburn) that allocated memory blocks and then did some read/write > on them (alloc+write 5 blocks, check 1, free 1, alloc+write 6, > check 2, free 2 alloc+write 7....). After that whenever the program > encountered an error it looped on this block forever. > >The errors occured after a random period of time (from 1 block > allocation to more than an hour) and were never reproduced after a > stop/start. When this test program was running and looping on the > bad block, gcc never displayed errors. The problem was fixed when I > replaced the CPU and I'm still using the same DIMMs without > problems. I also did a lot of checks before replacing the CPU, like > changing the position of the DIMMs, removing one of them, change > their timing, and much more without success. Even removed all the > PCI cards. > >Disabling the CPU cache or replacing it can be a good test. > ><<V13>> I tried disabling it in the bios and the machine became unusable for all practical purposes. But it did run about half a day that way. I'd estimate its speed was similar to a 33 mhz 386sx with only 8 megs of ram though. I could type a full sentence ahead of the screen display in kmail for instance. Had it been usable, I might have been tempted to let it run a couple of days just for grins. On the next reboot, I'm going to switch the stick around, and see if the errors move to an even address. If they do, then I'd be convinced its memory and not cache. The question then becomes which stick in a dual channel setup is even addresses, and which is odd addresses. Probably best to just go buy another half gigger and swap it in for one of these one at a time. And hope its better! 
Yup, memburn stopped again, at an odd address, showing the same
failure pattern in byte 3 of 4.

FAILED at round 63/20669951: got 0000ff00, expected 00000000!!!
REREAD: 0000ff00, 0000ff00, 0000ff00!!!

I guess I'm going to town.

--
Cheers, Gene

^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-20 17:29 ` Gene Heskett
@ 2004-08-20 18:13 ` Marc Ballarin
  2004-08-20 20:08 ` Gene Heskett
  2004-08-20 20:11 ` R. J. Wysocki
  1 sibling, 1 reply; 146+ messages in thread
From: Marc Ballarin @ 2004-08-20 18:13 UTC (permalink / raw)
  To: gene.heskett; +Cc: linux-kernel, v13

On Fri, 20 Aug 2004 13:29:05 -0400
Gene Heskett <gene.heskett@verizon.net> wrote:

>
> I tried disabling it in the bios and the machine became unusable for
> all practical purposes.

Is ECC checking for L2 cache enabled in your BIOS?

BTW: I trimmed the CC list somewhat

Regards

^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-20 18:13 ` Marc Ballarin
@ 2004-08-20 20:08 ` Gene Heskett
  2004-08-21 9:25 ` Barry K. Nathan
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-20 20:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: Marc Ballarin, v13

On Friday 20 August 2004 14:13, Marc Ballarin wrote:
>On Fri, 20 Aug 2004 13:29:05 -0400
>
>Gene Heskett <gene.heskett@verizon.net> wrote:
>> I tried disabling it in the bios and the machine became unusable
>> for all practical purposes.
>
>Is ECC checking for L2 cache enabled in your BIOS?

There isn't a switch for that and as near as I can tell, no L2 cache
on this board, only the L1 in the cpu. If there is an L2, then
memtest86 can't find it, and I don't see any chips that look like
separate memory. Memtest86 may not know how to enable it if it's an
nforce2 option. Whatever cache is shown as switchable in the bios,
turning it off makes a very sick bird out of the machine, like a
33mhz 386sx?

I've located the bios docs on the Biostar site, and was set to print
them when it locked up the last time. So I'll restart that project
shortly.

But it does run with it off and for the short time I left it that way,
no errors.

>BTW: I trimmed the CC list somewhat
>
>Regards

--
Cheers, Gene

^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-20 20:08 ` Gene Heskett
@ 2004-08-21 9:25 ` Barry K. Nathan
  2004-08-21 18:31 ` V13
  0 siblings, 1 reply; 146+ messages in thread
From: Barry K. Nathan @ 2004-08-21 9:25 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, Marc Ballarin, v13

On Fri, Aug 20, 2004 at 04:08:50PM -0400, Gene Heskett wrote:
> On Friday 20 August 2004 14:13, Marc Ballarin wrote:
[snip]
> >Is ECC checking for L2 cache enabled in your BIOS?
>
> There isn't a switch for that and as near as I can tell, no L2 cache
> on this board, only the L1 in the cpu. If there is an L2, then
> memtest86 can't find it, and I don't see any chips that look like
> seperate memory.

The L2 cache is *on the CPU chip itself*. Any CPU recent enough to
physically fit into an nForce board has the L2 cache on the CPU itself.
I think the last Athlons to have separate L2 cache chips were the Slot A
models, and even then, the L2 cache chips were still on the CPU module
and not the motherboard.

> Memtest86 may not know howto enable it if its an
> nforce2 option. Whatever cache shown as switchable in the bios,
> turning it off makes a very sick bird out of the machine, like a
> 33mhz 386sx?

Yeah, disabling the L2 cache on a modern CPU makes it really slow. But,
it's still a useful troubleshooting option...

-Barry K. Nathan <barryn@pobox.com>

^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-21 9:25 ` Barry K. Nathan
@ 2004-08-21 18:31 ` V13
  2004-08-21 18:55 ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: V13 @ 2004-08-21 18:31 UTC (permalink / raw)
  To: Barry K. Nathan; +Cc: Gene Heskett, linux-kernel, Marc Ballarin

On Saturday 21 August 2004 12:25, Barry K. Nathan wrote:
> > Memtest86 may not know howto enable it if its an
> > nforce2 option. Whatever cache shown as switchable in the bios,
> > turning it off makes a very sick bird out of the machine, like a
> > 33mhz 386sx?
>
> Yeah, disabling the L2 cache on a modern CPU makes it really slow. But,
> it's still a useful troubleshooting option...

When I had the problem described in my previous mail I came to the
conclusion that it was related to the cache *BUT* it seemed that the
cache was just caching wrong data. Disabling the cache would just
reduce the problem.

One reason for this is that when the program detected errors in a
buffer (i.e. 0x1234 instead of 0x1111) then they would NOT go away if
the program was reading from this buffer all the time. This means that
the cache always returned the same data. The error was 'gone' every
time the program was suspended for a while or when something else used
a lot of memory (i.e. another instance of this program).

So, I'm not suggesting that his cache is faulty but that there can be
a CPU (or even a M/B) problem that corrupts data when they are
transferred from memory to the processor.

> -Barry K. Nathan <barryn@pobox.com>

<<V13>>

^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-21 18:31 ` V13 @ 2004-08-21 18:55 ` Gene Heskett 2004-08-22 11:04 ` Helge Hafting 0 siblings, 1 reply; 146+ messages in thread From: Gene Heskett @ 2004-08-21 18:55 UTC (permalink / raw) To: linux-kernel; +Cc: V13, Barry K. Nathan, Marc Ballarin On Saturday 21 August 2004 14:31, V13 wrote: >On Saturday 21 August 2004 12:25, Barry K. Nathan wrote: >> > Memtest86 may not know howto enable it if its an >> > nforce2 option. Whatever cache shown as switchable in the bios, >> > turning it off makes a very sick bird out of the machine, like a >> > 33mhz 386sx? >> >> Yeah, disabling the L2 cache on a modern CPU makes it really slow. >> But, it's still a useful troubleshooting option... > >When I had the problem described in my previous mail I came to the > conclussion that it was related with cache *BUT* it seemed that the > cache was just caching wrong data. Disabling the cache would just > reduce the problem. > >One reason for this is that when the program detected errors in a > buffer (i.e. 0x1234 instead of 0x1111) then they would NOT go away > if the program was reading from this buffer all the time. This > means that the cache always returned the same data. The error was > 'gone' every time the program was suspended for a while or when > something else used a lot of memory (i.e. another instance of this > program). > >So, I'm not suggesting that his cache is faulty but that there can > be a CPU (or even a M/B) problem that corrupts data when they are > transfered from memory to the processor. > >> -Barry K. Nathan <barryn@pobox.com> > ><<V13>> Latest memburn results here, this after swapping the memory sticks for each other, running over 512 megs, half my ram: Passed round 2308, elapsed 41225.98. FAILED at round 2309/40220063: got ff000000, expected 00000000!!! REREAD: ff000000, ff000000, ff000000!!! 
So not only has the problem moved from the 2nd LSB to the MSB of the
fetch, but it is a lot more severe in terms of the amount of time to
catch one error, now nearly 17 hours. I'm now up 25 hours and the
machine feels good, no Oops so far and I've restarted memburn in
addition to konstruct working on kde-3.3 final. I'm over 100 megs into
the swap, and 2.6.8.1-mm2 seems to be handling the situation admirably
so far. That knocking sound? That's me, knocking on wood for good
luck. :-)

--
Cheers, Gene

^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  2004-08-21 18:55 ` Gene Heskett
@ 2004-08-22 11:04 ` Helge Hafting
  2004-08-22 11:40 ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: Helge Hafting @ 2004-08-22 11:04 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, V13, Barry K. Nathan, Marc Ballarin

On Sat, Aug 21, 2004 at 02:55:13PM -0400, Gene Heskett wrote:
> On Saturday 21 August 2004 14:31, V13 wrote:
>
> So not only has the problem moved from the 2nd LSB to the MSB of the
> fetch, but it is a lot more severe in terms of the amount of time to
> catch one error, now nearly 17 hours. I'm now up 25 hours and the
> machine feels good, no Oops so far and I've restarted memburn in
> addition to konstruct working on kde-3.3 final. I'm over 100 megs
> into the swap, and 2.6.8.1-mm2 seems to handling the situation
> admirably so far. That knocking sound? Thats me, knocking on wood
> for good luck. :-)

Seems it is the memory, then.
Things getting *better* when moving memory may mean:
* slight timing problem - in that case the memory might be fine
  at a slower setting. (Reason for complaints if you must go below spec.)
* Moving memory around rubs dirt, dust and oxide off the contacts,
  both on the memory sticks and the mainboard connectors. This gives
  better contact and may improve things. Consider cleaning
  the connectors further. Also look for dust and hair lying
  in the mainboard connectors. It happens, especially when
  some slots are free for a long time until memory is added.

Helge Hafting

^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-22 11:04 ` Helge Hafting @ 2004-08-22 11:40 ` Gene Heskett 0 siblings, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-22 11:40 UTC (permalink / raw)
To: linux-kernel; +Cc: Helge Hafting, V13, Barry K. Nathan, Marc Ballarin

On Sunday 22 August 2004 07:04, Helge Hafting wrote:
>On Sat, Aug 21, 2004 at 02:55:13PM -0400, Gene Heskett wrote:
>> On Saturday 21 August 2004 14:31, V13 wrote:
>>
>> So not only has the problem moved from the 2nd LSB to the MSB of
>> the fetch, but it is a lot less frequent in terms of the amount of
>> time to catch one error, now nearly 17 hours. I'm now up 25 hours
>> and the machine feels good, no Oops so far, and I've restarted
>> memburn in addition to konstruct working on kde-3.3 final. I'm
>> over 100 megs into the swap, and 2.6.8.1-mm2 seems to be handling
>> the situation admirably so far. That knocking sound? That's me,
>> knocking on wood for good luck. :-)
>
>Seems it is the memory, then.
>Things getting *better* when moving memory may mean:
>* slight timing problem - in that case the memory might be fine
> at a slower setting. (Reason for complaints if you must go below
> spec.)

I'd discount this, as it made no difference to run it at half speed in a bios setting, making a 1400 out of this 2800 athlon, at the same time the bios signed the ram on as DDR200 dual channel ram.

> * Moving memory around rubs dirt, dust and oxide off the
> contacts, both on the memory sticks and the mainboard connectors.
> This gives better contact and may improve things. Consider
> cleaning the connectors further. Also look for dust and hair lying
> in the mainboard connectors. It happens, especially when some
> slots are free for a long time until memory is added.

I think now that this is the scenario in effect. The next time it Oops's, I'll spend some time and reseat both sticks several more times.

As this vendor is in Tampa FL, could the storage environment there for new mainboards in their retail packaging box be a factor? With the turnover rate Dan has, I wouldn't think so, but then I've NDI where they may sit between their assembly in .tw land and going on his shelves in Tampa. The retail box from Biostar has the board in the usual pink bubble-wrap static bag, but it isn't sealed other than the end folded over and taped shut. Ditto for the ram, but I think that's hand packed per order in the usual grey anti-static, way too big, bag.

Right now, memburn hasn't errored again, but konstruct bailed out trying to make liboggvorbis, and there is over 830 megs in swap. I should be able to do a swapoff and restart, leaving X/kde, memburn and seti running, I'd think. I'll send this and check a swapoff for grins. All this used to run in 512 megs without using any great amount of swap. :-]

>
>Helge Hafting

Thanks Helge

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.
* Re: Possible dcache BUG 2004-08-20 17:29 ` Gene Heskett 2004-08-20 18:13 ` Marc Ballarin @ 2004-08-20 20:11 ` R. J. Wysocki 2004-08-20 20:17 ` Gene Heskett 1 sibling, 1 reply; 146+ messages in thread From: R. J. Wysocki @ 2004-08-20 20:11 UTC (permalink / raw) To: gene.heskett; +Cc: linux-kernel On Friday 20 of August 2004 19:29, Gene Heskett wrote: > On Friday 20 August 2004 11:43, V13 wrote: > >On Friday 20 August 2004 18:06, Gene Heskett wrote: > >> On Friday 20 August 2004 03:33, Marcelo Tosatti wrote: > >> [...] > >> > >> >> >I can't see how that could be caused by flaky hardware. > >> >> > >> >> There is still that possibility Marcelo. Someone recommended I > >> >> get cpuburn and memburn, and before fixing the scanf statement > >> >> (it was broken) in memburn, I had compiled it for a 512 meg > >> >> test the first time, and a 768 meg test the next couple of > >> >> runs. > >> >> > >> >> All exited with errors like this: > >> >> Passed round 133, elapsed 4827.19. > >> >> FAILED at round 134/14208927: got ff00, expected 0!!! > >> >> > >> >> REREAD: ff00, ff00, ff00!!! > >> >> > >> >> [root@coyote memburn]# vim memburn.c > >> >> [root@coyote memburn]# gcc -o memburn memburn.c > >> >> [root@coyote memburn]# ./memburn > >> >> Starting test with size 768 megs.. > >> >> > >> >> Passed round 0, elapsed 44.36. > >> >> Passed round 1, elapsed 74.13. > >> >> Passed round 2, elapsed 105.12. > >> >> FAILED at round 3/25777183: got 2b00, expected 0!!! > >> >> > >> >> REREAD: 2b00, 2b00, 2b00!!! > >> > >> The latest output of memburn after a bit of format hacking: > >> > >> FAILED at round 78/165714207: got 0000ff00, expected 00000000!!! > >> REREAD: 0000ff00, 0000ff00, 0000ff00!!! > >> > >> and > >> > >> FAILED at round 160/200780831: got 02025302, expected 02020202!!! > >> REREAD: 02025302, 02025302, 02025302!!! > >> > >> So it appears that its the third byte of 4 each time thats > >> fubar'd. I'l run it a few more times to confirm. 
Is memory byte > >> wide per chip on these things today? > > > >I had a simillar problem some years ago. I had core dumps and gcc > > errors all the time but memtest could not find a thing. 99% it was > > a CPU problem and not a memory problem. It seemed that there were > > errors at random times even when there was no cpu load. > > > >I believe it was a cache problem. I made a simple prog (like > > memburn) that allocated memory blocks and then did some read/write > > on them (alloc+write 5 blocks, check 1, free 1, alloc+write 6, > > check 2, free 2 alloc+write 7....). After that whenever the program > > encountered an error it looped on this block forever. > > > >The errors occured after a random period of time (from 1 block > > allocation to more than an hour) and were never reproduced after a > > stop/start. When this test program was running and looping on the > > bad block, gcc never displayed errors. The problem was fixed when I > > replaced the CPU and I'm still using the same DIMMs without > > problems. I also did a lot of checks before replacing the CPU, like > > changing the position of the DIMMs, removing one of them, change > > their timing, and much more without success. Even removed all the > > PCI cards. > > > >Disabling the CPU cache or replacing it can be a good test. > > > ><<V13>> > > I tried disabling it in the bios and the machine became unusable for > all practical purposes. But it did run about half a day that way. > I'd estimate its speed was similar to a 33 mhz 386sx with only 8 megs > of ram though. I could type a full sentence ahead of the screen > display in kmail for instance. Had it been usable, I might have been > tempted to let it run a couple of days just for grins. On the next > reboot, I'm going to switch the stick around, and see if the errors > move to an even address. If they do, then I'd be convinced its > memory and not cache. 
The question then becomes which stick in a > dual channel setup is even addresses, and which is odd addresses. > > Probably best to just go buy another half gigger and swap it in for > one of these one at a time. And hope its better! > > Yup, memburn stopped again, at an odd address, showing the same > failure pattern in byte 3 of 4. > > FAILED at round 63/20669951: got 0000ff00, expected 00000000!!! > REREAD: 0000ff00, 0000ff00, 0000ff00!!! > > I guess i'm going to town. There's a simple test you can do unless your DIMMs must go in pairs (I don't remember if it's required by nforce2): remove one of them and see what happens. If you can reproduce the same symptoms on each of them separately, I'd bet on a cache problem. Greetings, -- Rafael J. Wysocki ---------------------------- For a successful technology, reality must take precedence over public relations, for nature cannot be fooled. -- Richard P. Feynman ^ permalink raw reply [flat|nested] 146+ messages in thread
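For readers who have not seen memburn or V13's test program, the write/verify/reread loop they describe can be sketched roughly as below. This is only an illustration, not memburn's actual code: `verify_block`, `run_round` and the `inject_fault` hook are hypothetical names, and the fault is simulated in software because a script like this cannot provoke a real DRAM or cache error.

```python
from typing import Callable, List, Optional, Tuple

# (index of the bad byte, value read, values seen on the rereads)
Failure = Tuple[int, int, List[int]]


def verify_block(block: bytearray, pattern: int, rereads: int = 3) -> Optional[Failure]:
    """Scan the block; on the first mismatch, read the same cell again a few times."""
    for index, value in enumerate(block):
        if value != pattern:
            reread = [block[index] for _ in range(rereads)]
            return index, value, reread
    return None


def run_round(size: int, pattern: int,
              inject_fault: Optional[Callable[[bytearray], None]] = None) -> Optional[Failure]:
    """Fill a block with the pattern, optionally corrupt it, then verify it."""
    block = bytearray([pattern]) * size
    if inject_fault is not None:
        inject_fault(block)  # stands in for a hardware fault in this sketch
    return verify_block(block, pattern)


if __name__ == "__main__":
    def stuck_byte(block: bytearray) -> None:
        block[1] = 0xFF  # simulated corruption of one byte

    for round_no, fault in enumerate([None, None, stuck_byte]):
        failure = run_round(4096, 0x00, fault)
        if failure is None:
            print(f"Passed round {round_no}.")
        else:
            index, got, reread = failure
            print(f"FAILED at round {round_no}/{index}: got {got:02x}, expected 00")
            print("REREAD: " + ", ".join(f"{v:02x}" for v in reread))
            break
```

A reread that keeps returning the same wrong value, as in the REREAD lines quoted above, suggests the bad value is actually being held somewhere (DRAM or cache) rather than being a one-off glitch on a single read; it does not by itself say which of the two is at fault.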
* Re: Possible dcache BUG 2004-08-20 20:11 ` R. J. Wysocki @ 2004-08-20 20:17 ` Gene Heskett 2004-08-22 5:05 ` Gene Heskett 0 siblings, 1 reply; 146+ messages in thread From: Gene Heskett @ 2004-08-20 20:17 UTC (permalink / raw) To: linux-kernel; +Cc: R. J. Wysocki On Friday 20 August 2004 16:11, R. J. Wysocki wrote:[...] >There's a simple test you can do unless your DIMMs must go in pairs > (I don't remember if it's required by nforce2): remove one of them > and see what happens. To get dual channel DDR, they have to be in a pair. Since this post, they've been swapped one for the other, and I'll be curious to see if the address goes to an even address when it errors, which it hasn't yet. > If you can reproduce the same symptoms on > each of them separately, I'd bet on a cache problem. > That makes sense, so I can try that too. I hadn't thought of that, duh! >Greetings, Someone else asked if ECC was on, but this board doesn't have it, and the memory has a blank pattern where the parity chip would be. So I think its safe to say no :) -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) 99.24% setiathome rank, not too shabby for a WV hillbilly Yahoo.com attorneys please note, additions to this message by Gene Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved. ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-20 20:17 ` Gene Heskett @ 2004-08-22 5:05 ` Gene Heskett 2004-08-22 11:42 ` R. J. Wysocki 2004-08-24 2:34 ` Tom Vier 0 siblings, 2 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-22 5:05 UTC (permalink / raw)
To: linux-kernel; +Cc: R. J. Wysocki

On Friday 20 August 2004 16:17, Gene Heskett wrote:
>On Friday 20 August 2004 16:11, R. J. Wysocki wrote:[...]
>
>>There's a simple test you can do unless your DIMMs must go in pairs
>> (I don't remember if it's required by nforce2): remove one of them
>> and see what happens.
>
>To get dual channel DDR, they have to be in a pair. Since this
> post, they've been swapped one for the other, and I'll be curious
> to see if the address goes to an even address when it errors, which
> it hasn't yet.
>
It has errored, one time in 35 hours now. The problem is considerably reduced.

Whereas the error was always at an odd address, and in the 2nd LSbyte, now it's still an odd address but the error has moved to the MSB of a 32 bit fetch:

[root@coyote memburn]# ./memburn 512
Starting test with size 512 megs..
Passed round 2308, elapsed 41225.98.
FAILED at round 2309/40220063: got ff000000, expected 00000000!!!
REREAD: ff000000, ff000000, ff000000!!!
[root@coyote memburn]# ./memburn 512
Starting test with size 512 megs..
Passed round 2636, elapsed 60944.15.

As can be seen, I restarted it, and it has now run even more loops without error. There has been no more Oops, but with memburn eating 512 megs, half my ram, and kde-3.3 under construction by konstruct, I've peaked at nearly a gig of swap, and there are 754 megs in swap right now. Sure, it's a bit laggy, but not unusable.

So now the question is, since the error address is always odd, which stick is it? Or do I need to sanitize the dimm sockets somehow? They sure seem to slip in and out easily enough for a socket with that many contacts. Not over 3 pounds on each end will seat them, and if the clips are re-opened they virtually fall out into your hand. I'm rather more used to having to press 5 to 10 pounds on each end to seat them. Next time I have to reboot, I'm going to 'exercise' them in and out a few times just to polish the oxide from the contacts.

>> If you can reproduce the same symptoms on
>> each of them separately, I'd bet on a cache problem.
>
>That makes sense, so I can try that too. I hadn't thought of that,
>duh!
>
>>Greetings,
>
>Someone else asked if ECC was on, but this board doesn't have it,
> and the memory has a blank pattern where the parity chip would be.
> So I think its safe to say no :)

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.
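The "which byte is bad" reasoning in this message can be checked mechanically by XOR-ing the value memburn got with the value it expected. A minimal sketch, assuming the got/expected pairs are 32-bit words exactly as printed in the thread; `corrupted_bytes` is a hypothetical helper, not part of memburn:

```python
from typing import List


def corrupted_bytes(got: int, expected: int, width: int = 4) -> List[int]:
    """Return the indices of the bytes that differ (0 = least significant byte)."""
    diff = got ^ expected
    return [i for i in range(width) if (diff >> (8 * i)) & 0xFF]


if __name__ == "__main__":
    samples = [
        (0x0000FF00, 0x00000000),  # reported before the sticks were swapped
        (0x02025302, 0x02020202),  # reported before the sticks were swapped
        (0xFF000000, 0x00000000),  # reported after the sticks were swapped
    ]
    for got, expected in samples:
        bad = corrupted_bytes(got, expected)
        print(f"got {got:08x}, expected {expected:08x}: bad byte index(es) {bad}")
```

Byte index 1 is the "2nd LSbyte" (the third of four bytes as printed, reading left to right) and index 3 is the MSB, which matches the move described above.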
* Re: Possible dcache BUG 2004-08-22 5:05 ` Gene Heskett @ 2004-08-22 11:42 ` R. J. Wysocki 2004-08-24 2:34 ` Tom Vier 1 sibling, 0 replies; 146+ messages in thread From: R. J. Wysocki @ 2004-08-22 11:42 UTC (permalink / raw) To: gene.heskett; +Cc: linux-kernel On Sunday 22 of August 2004 07:05, Gene Heskett wrote: > On Friday 20 August 2004 16:17, Gene Heskett wrote: > >On Friday 20 August 2004 16:11, R. J. Wysocki wrote:[...] > > > >>There's a simple test you can do unless your DIMMs must go in pairs > >> (I don't remember if it's required by nforce2): remove one of them > >> and see what happens. > > > >To get dual channel DDR, they have to be in a pair. Since this > > post, they've been swapped one for the other, and I'll be curious > > to see if the address goes to an even address when it errors, which > > it hasn't yet. > > It has, one time in 35 hours now. The problem is considerably > reduced. > > Whereas the error was always at an odd address, and in the 2nd LSbyte, > now its still an odd address but the error has moved to the MSB of a > 32 bit fetch: > > [root@coyote memburn]# ./memburn 512 > Starting test with size 512 megs.. > Passed round 2308, elapsed 41225.98. > FAILED at round 2309/40220063: got ff000000, expected 00000000!!! > REREAD: ff000000, ff000000, ff000000!!! > [root@coyote memburn]# ./memburn 512 > Starting test with size 512 megs.. > Passed round 2636, elapsed 60944.15. > > As can be seen, I restarted it, and its ran quite even more loops now > without error. There has been no more Oops, but with memburn eating > 512 megs, half my ram, and kde-3.3 under construction by konstruct, > I've peaked at nearly a gig of swap, and 754 megs in swap right now. > Sure, its a bit laggy, but not unusable. > > So now the question is since the error address is always odd, which > stick is it? Hard to tell. I think the memory controller is interleaving them for efficiency but the question remains which one is regarded as the first. 
BTW, as it indicates that DRAM is to blame, you can try to fiddle a bit with its timings (provided the board setup allows you to do this). For example, you can set them to 3-3-3 or equivalent (generally, push them up) and check if this affects the memburn results and how. Just an idea, you know. ;-) Greetings, -- Rafael J. Wysocki ---------------------------- For a successful technology, reality must take precedence over public relations, for nature cannot be fooled. -- Richard P. Feynman ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-22 5:05 ` Gene Heskett 2004-08-22 11:42 ` R. J. Wysocki @ 2004-08-24 2:34 ` Tom Vier 2004-08-24 3:08 ` Gene Heskett 1 sibling, 1 reply; 146+ messages in thread From: Tom Vier @ 2004-08-24 2:34 UTC (permalink / raw) To: Gene Heskett; +Cc: linux-kernel On Sun, Aug 22, 2004 at 01:05:25AM -0400, Gene Heskett wrote: > Whereas the error was always at an odd address, and in the 2nd LSbyte, > now its still an odd address but the error has moved to the MSB of a > 32 bit fetch: are you translating virt->phys? -- Tom Vier <tmv@comcast.net> DSA Key ID 0x15741ECE ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-24 2:34 ` Tom Vier @ 2004-08-24 3:08 ` Gene Heskett 2004-08-25 1:49 ` Tom Vier 0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-24 3:08 UTC (permalink / raw)
To: linux-kernel, Tom Vier

On Monday 23 August 2004 22:34, Tom Vier wrote:
>On Sun, Aug 22, 2004 at 01:05:25AM -0400, Gene Heskett wrote:
>> Whereas the error was always at an odd address, and in the 2nd
>> LSbyte, now its still an odd address but the error has moved to
>> the MSB of a 32 bit fetch:
>
>are you translating virt->phys?

No, this is straight out of the memburn output (after I'd fixed the printf formatting strings to actually print full 8-character hexadecimal; the address of the error is not affected, that's still printed in decimal). I don't know enough about this to nail it to a physical address, unforch.

And right now I have one of the two sticks pulled, trying to figure out which one has the tummy ache, but himem is still compiled in and cc1plus is going crazy, eating all ram and 500 megs of swap trying to build the libsmoke stuff in the new 3.3 kde. So I'm about to reboot to a kernel without himem support, since I only have half a gig with just one stick installed, and see if that fixes cc1plus.

Thanks for asking. I appreciate it.

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.
* Re: Possible dcache BUG 2004-08-24 3:08 ` Gene Heskett @ 2004-08-25 1:49 ` Tom Vier 2004-08-25 2:33 ` Gene Heskett 2004-08-25 6:13 ` Denis Vlasenko 0 siblings, 2 replies; 146+ messages in thread From: Tom Vier @ 2004-08-25 1:49 UTC (permalink / raw) To: Gene Heskett; +Cc: linux-kernel On Mon, Aug 23, 2004 at 11:08:41PM -0400, Gene Heskett wrote: > >are you translating virt->phys? > > No, this is straight out of the memburn output (after I'd fixed the that's weird that you're finding that pattern in virtual addresses. i wouldn't expect that. even if you're booting to single user, certain variables might change during boot and cause different physical pages to be mapped. maybe single user is more deterministic than i think, though. -- Tom Vier <tmv@comcast.net> DSA Key ID 0x15741ECE ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-25 1:49 ` Tom Vier @ 2004-08-25 2:33 ` Gene Heskett 2004-08-25 14:55 ` Martin J. Bligh 2004-08-27 14:01 ` Gene Heskett 2004-08-25 6:13 ` Denis Vlasenko 1 sibling, 2 replies; 146+ messages in thread From: Gene Heskett @ 2004-08-25 2:33 UTC (permalink / raw) To: linux-kernel, Tom Vier On Tuesday 24 August 2004 21:49, Tom Vier wrote: >On Mon, Aug 23, 2004 at 11:08:41PM -0400, Gene Heskett wrote: >> >are you translating virt->phys? >> >> No, this is straight out of the memburn output (after I'd fixed >> the > >that's weird that you're finding that pattern in virtual addresses. > i wouldn't expect that. even if you're booting to single user, > certain variables might change during boot and cause different > physical pages to be mapped. maybe single user is more > deterministic than i think, though. Well, FWIW, and not knowing a hell of a lot about it, I would assume (there's *that* word again) that even the virtual addresses would be long word aligned with reality even if otherwise totally bogus. I mean you'd really have to go out of your way to make it otherwise on x86 hardware wouldn't you? ATM I'm running on one stick, with memburn hacking away at 128 megs worth of it, Passed round 5683, elapsed 23530.67 at the moment. And about 100 megs into swap, darnit. And it isn't running anything else unusual, x/kde/kmail/mozilla & an occasional game of sol. If it runs till tommorrow morning, I'll assume this stick is good, and put the other one in the same socket for a similar test. If it passes, then I try the other socket one stick at a time, but first I have to get my finger healed up, I somehow drew a bit of blood on the end of my little finger using it to lever open the socket clips the last time. A nasty little paper cut type slice I never felt happen till I saw the blood. -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." 
-Ed Howdershelt (Author) 99.24% setiathome rank, not too shabby for a WV hillbilly Yahoo.com attorneys please note, additions to this message by Gene Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved. ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-25 2:33 ` Gene Heskett @ 2004-08-25 14:55 ` Martin J. Bligh 2004-08-25 17:23 ` Ryan Cumming 2004-08-27 14:01 ` Gene Heskett 1 sibling, 1 reply; 146+ messages in thread From: Martin J. Bligh @ 2004-08-25 14:55 UTC (permalink / raw) To: linux-kernel This whole thread makes me think ... if we oops, shouldn't we check if we're holding any spinlocks or semaphores, and just panic the whole machine if so? Not sure how expensive it would be to hold that state, but ... M. ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG 2004-08-25 14:55 ` Martin J. Bligh @ 2004-08-25 17:23 ` Ryan Cumming 2004-08-25 17:36 ` Martin J. Bligh 0 siblings, 1 reply; 146+ messages in thread
From: Ryan Cumming @ 2004-08-25 17:23 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: linux-kernel

On Wednesday 25 August 2004 07:55, Martin J. Bligh wrote:
> This whole thread makes me think ... if we oops, shouldn't we check if
> we're holding any spinlocks or semaphores, and just panic the whole
> machine if so? Not sure how expensive it would be to hold that state,
> but ...

On preempt, wouldn't it just be a matter of checking preempt_count?

-Ryan
* Re: Possible dcache BUG 2004-08-25 17:23 ` Ryan Cumming @ 2004-08-25 17:36 ` Martin J. Bligh 0 siblings, 0 replies; 146+ messages in thread
From: Martin J. Bligh @ 2004-08-25 17:36 UTC (permalink / raw)
To: Ryan Cumming; +Cc: linux-kernel

--Ryan Cumming <ryan@spitfire.gotdns.org> wrote (on Wednesday, August 25, 2004 10:23:29 -0700):

> On Wednesday 25 August 2004 07:55, Martin J. Bligh wrote:
>> This whole thread makes me think ... if we oops, shouldn't we check if
>> we're holding any spinlocks or semaphores, and just panic the whole
>> machine if so? Not sure how expensive it would be to hold that state,
>> but ...
>
> On preempt, wouldn't it just be a matter of checking preempt_count?

Spinlocks, with or without preempt, can probably do something like this. But I don't think that works for sems.

M.
* Re: Possible dcache BUG 2004-08-25 2:33 ` Gene Heskett 2004-08-25 14:55 ` Martin J. Bligh @ 2004-08-27 14:01 ` Gene Heskett 1 sibling, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-27 14:01 UTC (permalink / raw)
To: linux-kernel; +Cc: Tom Vier

On Tuesday 24 August 2004 22:33, Gene Heskett wrote:

(going for the longest running thread on lkml)

>On Tuesday 24 August 2004 21:49, Tom Vier wrote:
>>On Mon, Aug 23, 2004 at 11:08:41PM -0400, Gene Heskett wrote:
>>> >are you translating virt->phys?
>>>
>>> No, this is straight out of the memburn output (after I'd fixed
>>> the
>>
>>that's weird that you're finding that pattern in virtual addresses.
>> i wouldn't expect that. even if you're booting to single user,
>> certain variables might change during boot and cause different
>> physical pages to be mapped. maybe single user is more
>> deterministic than i think, though.
>
>Well, FWIW, and not knowing a hell of a lot about it, I would assume
>(there's *that* word again) that even the virtual addresses would be
>long word aligned with reality even if otherwise totally bogus. I
>mean you'd really have to go out of your way to make it otherwise on
>x86 hardware wouldn't you?
>
>ATM I'm running on one stick, with memburn hacking away at 128 megs
>worth of it, Passed round 5683, elapsed 23530.67 at the moment. And
>about 100 megs into swap, darnit. And it isn't running anything
> else unusual, x/kde/kmail/mozilla & an occasional game of sol.
>
>If it runs till tomorrow morning, I'll assume this stick is good,
> and put the other one in the same socket for a similar test. If it
> passes, then I try the other socket one stick at a time.

Ok, I've now shuffled both sticks thru both "B" sockets on this mobo, and neither one could run memburn more than 20 minutes, and again, the errors are all in the xx of nnnnxxnn in hex display formats.

So, I've put both sticks back in, in the A and B2 sockets ATM. That ran about 25 minutes before memburn got a tummy ache. In the meantime I'd rebuilt 2.6.9-rc1-mm1 with hi-mem support again, and on the last reboot I took a detour thru the bios and turned the memory voltage up 100 mV to 2.6 volts. Memburn, run against 400 megs of it, has now been going for around 40 minutes.

So, my question for the hardware folks is: What's the proper voltage to run a bank of DDR400 dimms in Dual Channel mode?

Humm, I spoke too soon, memburn has exited, with this:

Ahh, fudge, one cannot copy/paste from a virtual term. Suffice to say that the address is odd, and that the error is in the usual 3rd byte of 4 position within the 32 bit read.

Since it appears that TCWO is gone, bankrupt or whatever, and I like the features of this board otherwise, can someone suggest how to go about getting a warranty replacement direct from Biostar? I'll go visit the web page again, but I don't recall seeing any suitable links the last time I was there checking on updated bios files.

--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.
* Re: Possible dcache BUG 2004-08-25 1:49 ` Tom Vier 2004-08-25 2:33 ` Gene Heskett @ 2004-08-25 6:13 ` Denis Vlasenko 2004-08-29 13:48 ` Gene Heskett 1 sibling, 1 reply; 146+ messages in thread
From: Denis Vlasenko @ 2004-08-25 6:13 UTC (permalink / raw)
To: Tom Vier, Gene Heskett; +Cc: linux-kernel

On Wednesday 25 August 2004 04:49, Tom Vier wrote:
> On Mon, Aug 23, 2004 at 11:08:41PM -0400, Gene Heskett wrote:
> > >are you translating virt->phys?
> >
> > No, this is straight out of the memburn output (after I'd fixed the
>
> that's weird that you're finding that pattern in virtual addresses. i
> wouldn't expect that. even if you're booting to single user, certain
> variables might change during boot and cause different physical pages to be
> mapped. maybe single user is more deterministic than i think, though.

On x86, pages are aligned at 4k. The lower 12 bits of a virtual address match the lower 12 bits of the corresponding real (physical) address.

So, yes, if you hit a bad RAM cell, you see a random virtual address, but its last three hex digits must always be the same.
--
vda
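The point about the low 12 bits can be made concrete with a little arithmetic. The sketch below assumes 4 KiB pages as stated above; the sample addresses are made up for illustration and are not taken from any memburn run, and `page_offset`/`could_map_to` are hypothetical helper names.

```python
PAGE_SHIFT = 12                 # 4 KiB pages
PAGE_SIZE = 1 << PAGE_SHIFT     # 4096
OFFSET_MASK = PAGE_SIZE - 1     # 0xfff: the bits that paging leaves untouched


def page_offset(addr: int) -> int:
    """Offset of a byte address within its page (identical for virtual and physical)."""
    return addr & OFFSET_MASK


def could_map_to(virt: int, phys: int) -> bool:
    """A virtual address can only map to a physical one with the same page offset."""
    return page_offset(virt) == page_offset(phys)


if __name__ == "__main__":
    virt = 0x0804D82B   # hypothetical virtual address
    phys = 0x1F3A582B   # hypothetical physical address with the same offset
    print(f"offset of virt: {page_offset(virt):#05x}")
    print(f"offset of phys: {page_offset(phys):#05x}")
    print("consistent mapping:", could_map_to(virt, phys))
    # Bit 0 is part of the offset, so an odd virtual byte address
    # is necessarily an odd physical byte address as well.
    print("virt odd:", bool(virt & 1), "phys odd:", bool(phys & 1))
```

So if the number a test program reports is a byte address, its odd/even parity and its last three hex digits are meaningful even without a virtual-to-physical translation; anything above bit 11 is not.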
* Re: Possible dcache BUG 2004-08-25 6:13 ` Denis Vlasenko @ 2004-08-29 13:48 ` Gene Heskett 2004-08-29 14:34 ` Possible dcache BUG [u] Martin Schlemmer [c] 2004-08-29 15:21 ` Possible dcache BUG Rafael J. Wysocki 0 siblings, 2 replies; 146+ messages in thread From: Gene Heskett @ 2004-08-29 13:48 UTC (permalink / raw) To: linux-kernel; +Cc: Denis Vlasenko, Tom Vier On Wednesday 25 August 2004 02:13, Denis Vlasenko wrote: >On Wednesday 25 August 2004 04:49, Tom Vier wrote: >> On Mon, Aug 23, 2004 at 11:08:41PM -0400, Gene Heskett wrote: >> > >are you translating virt->phys? >> > >> > No, this is straight out of the memburn output (after I'd fixed >> > the >> >> that's weird that you're finding that pattern in virtual >> addresses. i wouldn't expect that. even if you're booting to >> single user, certain variables might change during boot and cause >> different physical pages to be mapped. maybe single user is more >> deterministic than i think, though. > >On x86, pages are aligned at 4k. Lower 12 bits of virtual address >match lower 12 bits of corresponding real address. > >So, yes, if you hit bad RAM cell, you see random virtual address, > but three last digits of it (in hex) must be the same. I think, based on the last 25 hours of running both memburn and setiathome at a -nice 19, and there have been no errors, that I might have stumbled onto a fix. It seems the dram is marked DDR400, so I was trying to run it that way. Unforch, on checking the invoice for the umpteenth time, it finally dawned on me that this particular AMD 2800XP is supposedly a 333mhz FSB chip, and not rated for use with DDR400 memory. Switching the bios setting for the memory to 'auto' from 'spd' seems to effect this particular item, and the memory now signs in as DDR333 Dual Channel. And after 25 hours, no errors, nothing unusual in the logs. I guess I should go paint my face with egg or something... 
My apologies to those who spent a considerable amount of time and brain power auditing code because of my stupidity. -- Cheers, Gene "There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author) 99.24% setiathome rank, not too shabby for a WV hillbilly Yahoo.com attorneys please note, additions to this message by Gene Heskett are: Copyright 2004 by Maurice Eugene Heskett, all rights reserved. ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG [u] 2004-08-29 13:48 ` Gene Heskett @ 2004-08-29 14:34 ` Martin Schlemmer [c] 2004-08-29 15:21 ` Possible dcache BUG Rafael J. Wysocki 1 sibling, 0 replies; 146+ messages in thread
From: Martin Schlemmer [c] @ 2004-08-29 14:34 UTC (permalink / raw)
To: gene.heskett; +Cc: Linux Kernel Mailing Lists, Denis Vlasenko, Tom Vier

On Sun, 2004-08-29 at 15:48, Gene Heskett wrote:

> I think, based on the last 25 hours of running both memburn and
> setiathome at a -nice 19, and there have been no errors, that I might
> have stumbled onto a fix.
>
> It seems the dram is marked DDR400, so I was trying to run it that
> way. Unforch, on checking the invoice for the umpteenth time, it
> finally dawned on me that this particular AMD 2800XP is supposedly a
> 333mhz FSB chip, and not rated for use with DDR400 memory. Switching
> the bios setting for the memory to 'auto' from 'spd' seems to effect
> this particular item, and the memory now signs in as DDR333 Dual
> Channel.
>
> And after 25 hours, no errors, nothing unusual in the logs.
>

I work for a supplier here in ZA, and in my experience memory compatibility can be a vast gray area. For instance:

1) You have exactly the same chipset (say nforce2 400's or whatever), but different vendors that assemble the board (say Asus/MSI and Gigabyte). You take PC3200 CL3 sticks, and they work fine on the Asus and MSI, but don't work on the Gigabyte (only one of the long list of memory issues Gigabyte boards have, in my experience). It has a lot to do with how the vendor does the timings, etc.

2) You have 4 sticks of Hynix memory, and all four have exactly the same chips on them. Two have the older pc board layout, and the other two have the newer. The older ones give intermittent issues on D865GBR (Bayfield boards, I can't remember the exact code) if you try to run them in dual channel mode, but work fine with only one stick. The board works fine in dual channel mode with the new revision pcb sticks.

3) P4 SiS chipsets have a bad habit of only running two sticks together (non-dual channel chipsets ... 645, 650, 651, with identical sticks) if you clock the memory down to the bus of the cpu (a 400mhz cpu only runs fine with memory at 200 fsb, and a 533 with memory at 266. Remember, it's the true speed of the cpu/memory, not the '4x pumped' one Intel likes to advertise with, or the 'double data rate' speed memory is advertised with). With a single stick, it usually runs fine at 333mhz on a 533mhz fsb cpu; I can't remember with a 400mhz cpu.

Those were just some examples to show that vendor/revision/config can make a huge difference, and cause lots of headaches.

In your case, here are a few points you could look at. In general, the boards I worked with worked fine with a 333fsb cpu running memory at 400mhz. Last I checked, these might be issues:

1) All nforce2 chipsets had a certain erratum that caused timing issues with ddr400 memory with a CAS latency of 2. You typically had to downclock the memory to 333mhz, or set the CAS latency up to 2.5 or 3. A good example is the popular Kingston Hyper-X sticks. I am not sure if they might have sorted it out on later chipsets.

2) Hynix memory typically did not work too well, especially in dual channel mode. The best memory to use was usually the Samsung PC3200 CL3 sticks if you did not want to fork out too much (except if you had some of the Gigabyte boards customers brought to us when they only got the memory from us. Do they ever learn not to shop around when it comes to board and memory?)

3) *Sometimes* a bios update did help.

Anyhow, just a few things I ran into that you might have a look at. Sorry it's a bit late in this thread.

--
Martin Schlemmer
* Re: Possible dcache BUG
  2004-08-29 13:48 ` Gene Heskett
  2004-08-29 14:34 ` Possible dcache BUG [u] Martin Schlemmer [c]
@ 2004-08-29 15:21 ` Rafael J. Wysocki
  2004-08-29 17:23 ` Denis Vlasenko
  1 sibling, 1 reply; 146+ messages in thread
From: Rafael J. Wysocki @ 2004-08-29 15:21 UTC
To: gene.heskett, linux-kernel

On Sunday 29 of August 2004 15:48, Gene Heskett wrote:
> On Wednesday 25 August 2004 02:13, Denis Vlasenko wrote:
> >On Wednesday 25 August 2004 04:49, Tom Vier wrote:
> >> On Mon, Aug 23, 2004 at 11:08:41PM -0400, Gene Heskett wrote:
> >> > >are you translating virt->phys?
> >> >
> >> > No, this is straight out of the memburn output (after I'd fixed
> >> > the
> >>
> >> that's weird that you're finding that pattern in virtual
> >> addresses. i wouldn't expect that. even if you're booting to
> >> single user, certain variables might change during boot and cause
> >> different physical pages to be mapped. maybe single user is more
> >> deterministic than i think, though.
> >
> >On x86, pages are aligned at 4k. Lower 12 bits of virtual address
> >match lower 12 bits of corresponding real address.
> >
> >So, yes, if you hit bad RAM cell, you see random virtual address,
> > but three last digits of it (in hex) must be the same.
>
> I think, based on the last 25 hours of running both memburn and
> setiathome at a -nice 19, and there have been no errors, that I might
> have stumbled onto a fix.
>
> It seems the dram is marked DDR400, so I was trying to run it that
> way. Unforch, on checking the invoice for the umpteenth time, it
> finally dawned on me that this particular AMD 2800XP is supposedly a
> 333mhz FSB chip, and not rated for use with DDR400 memory. Switching
> the bios setting for the memory to 'auto' from 'spd' seems to effect
> this particular item, and the memory now signs in as DDR333 Dual
> Channel.
>
> And after 25 hours, no errors, nothing unusual in the logs.
>
> I guess I should go paint my face with egg or something...

Not necessarily. :-) Some mobos based on the nforce2 chipsets should be
able to clock FSB and memory asynchronously. The very fact that you can
set the memory clock separately in the BIOS indicates that your mobo is
one of these. So, if it runs well at synchronous FSB and memory clock
rates, but causes problems otherwise, the northbridge is probably fishy.
Or the memory is not up to the spec. Anyway, the symptoms are quite
"interesting" and it's good to know what they are.

Regards,
RJW

-- 
For a successful technology, reality must take precedence over public
relations, for nature cannot be fooled.
		-- Richard P. Feynman
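Denis Vlasenko's point quoted above (pages are 4k aligned on x86, so the
low 12 bits of an address survive virtual-to-physical translation) can
be illustrated with a short C sketch. The helper names and the sample
addresses in the note below are made up for illustration; only the 4k
page size comes from the thread.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT       12u                                 /* 4k pages */
#define PAGE_OFFSET_MASK ((UINT32_C(1) << PAGE_SHIFT) - 1u)  /* 0xfff */

/* Offset of an address within its page: the last three hex digits. */
static uint32_t page_offset(uint32_t addr)
{
    return addr & PAGE_OFFSET_MASK;
}

/* Hypothetical translation step: combine a page-aligned physical frame
 * address with the in-page offset of a virtual address. Translation
 * replaces only the upper 20 bits, so the offset is carried over. */
static uint32_t to_physical(uint32_t vaddr, uint32_t frame_base)
{
    assert(page_offset(frame_base) == 0);
    return frame_base | page_offset(vaddr);
}
```

For example, a made-up virtual address 0x40a3b394 mapped to a made-up
frame at 0x1f2c5000 lands on physical 0x1f2c5394: the trailing 394 is
unchanged. A matching offset across failures is a necessary sign of one
bad cell, not a sufficient one, since different frames share offsets.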
* Re: Possible dcache BUG
  2004-08-29 15:21 ` Possible dcache BUG Rafael J. Wysocki
@ 2004-08-29 17:23 ` Denis Vlasenko
  2004-08-29 22:25 ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: Denis Vlasenko @ 2004-08-29 17:23 UTC
To: Rafael J. Wysocki, gene.heskett, linux-kernel

> > I think, based on the last 25 hours of running both memburn and
> > setiathome at a -nice 19, and there have been no errors, that I might
> > have stumbled onto a fix.
> >
> > It seems the dram is marked DDR400, so I was trying to run it that
> > way. Unforch, on checking the invoice for the umpteenth time, it
> > finally dawned on me that this particular AMD 2800XP is supposedly a
> > 333mhz FSB chip, and not rated for use with DDR400 memory. Switching
> > the bios setting for the memory to 'auto' from 'spd' seems to effect
> > this particular item, and the memory now signs in as DDR333 Dual
> > Channel.
> >
> > And after 25 hours, no errors, nothing unusual in the logs.
> >
> > I guess I should go paint my face with egg or something...
>
> Not necessarily. :-) Some mobos based on the nforce2 chipsets should be
> able to clock FSB and memory asynchronously. The very fact that you can
> set the memory clock separately in the BIOS indicates that your mobo is one
> of these. So, if it runs well at synchronous FSB and memory clock rates,
> but causes problems otherwise, the northbridge is probably fishy. Or the
> memory is not up to the spec. Anyway, the symptoms are quite "interesting"
> and it's good to know what they are.

The best thing is, we got another RAM test program which seems to be
better than memtest86 in some cases!
--
vda
* Re: Possible dcache BUG
  2004-08-29 17:23 ` Denis Vlasenko
@ 2004-08-29 22:25 ` Gene Heskett
  0 siblings, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-29 22:25 UTC
To: linux-kernel; +Cc: Denis Vlasenko, Rafael J. Wysocki

On Sunday 29 August 2004 13:23, Denis Vlasenko wrote:
[...]
>> Not necessarily. :-) Some mobos based on the nforce2 chipsets
>> should be able to clock FSB and memory asynchronously. The very
>> fact that you can set the memory clock separately in the BIOS
>> indicates that your mobo is one of these. So, if it runs well at
>> synchronous FSB and memory clock rates, but causes problems
>> otherwise, the northbridge is probably fishy. Or the memory is
>> not up to the spec. Anyway, the symptoms are quite "interesting"
>> and it's good to know what they are.

Take your pick, unless you'd rather use a shovel. :-) The bios has
provisions, but the nforce2 chipset doesn't seem to want to tolerate
what must be an occasional timing error on the write phase. An
inadequate amount of buffering available would be my best guess. I
don't believe the reads are defective in this case, just the writes go
tits up on a very, very narrow case that's only hit maybe once an hour.
That's damned hard for a logic analyzer to catch.

>The best thing is, we got another RAM test program which seems to be
> better than memtest86 in some cases!

I've been thinking of that myself, and I've come to the conclusion that
because memtest86 probably doesn't know anything about an nforce2
chipset, it says right up front it's not using the cache. And that may
well be the key right there. Turn off the cache and there's no problem.
I tried that here just for grins, but it turned the machine into a very
sick dog, going from 8 or 9 seti units a day down to about 1.5, and
everything else was swimming in cold molasses. I could easily type a
whole line ahead of kmail's display updates, and I'm not a touch typist,
topping out at maybe 10-15 wpm, not counting the time spent backing up
and fixing typu's. Ancient fingers don't always hit the key cleanly.

So you are correct in that memtest86 ground away on this machine for
something like 36 hours total run time, and never found an error. I
fired up memburn and had an error within a half hour. Therefore, to me,
it's proven to be a valuable tool, thank you Ville Herva.

>--
>vda

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.
* Possible dcache BUG @ 2004-08-05 14:54 Brett Charbeneau 0 siblings, 0 replies; 146+ messages in thread From: Brett Charbeneau @ 2004-08-05 14:54 UTC (permalink / raw) To: linux-kernel Greetings, I am getting the oops below - twice since 7/26, but I haven't a clue what's causing it. I am not a subscriber, so any replies directed to me would be gratefully received. Thank you for your hard work on this! -- Brett Charbeneau, Network Administrator Tel: 757-259-7750 Williamsburg Regional Library FAX: 757-259-7798 7770 Croaker Road brett@wrl.org Williamsburg, VA 23188-7064 http://www.wrl.org ksymoops 2.4.9 on i686 2.4.26. Options used -V (default) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.4.26/ (default) -m /boot/System.map (specified) 1151MB HIGHMEM available. 3c59x: Donald Becker and others. www.scyld.com/network/vortex.html kernel BUG at dcache.c:345! invalid operand: 0000 CPU: 0 EIP: 0010:[<c014322d>] Not tainted Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010206 eax: 00040000 ebx: eb8d7c70 ecx: c281b394 edx: e5636700 esi: eb8d7c58 edi: c281b394 ebp: d2b15f34 esp: d2b15f08 ds: 0018 es: 0018 ss: 0018 Process umount (pid: 14814, stackpage=d2b15000) Stack: c0128f81 c281b49c c281f000 00000246 d2b15f34 f721e1a0 00000466 f721e178 f721e178 f721e178 c02991c0 d2b15f44 c01435a6 00000150 f7b6f400 d2b15f5c c013714f f721e178 d2b15f88 08052179 0804d82b d2b15f7c c013afea f7b6f400 Call Trace: [<c0128f81>] [<c01435a6>] [<c013714f>] [<c013afea>] [<c01472d0>] [<c01472ee>] [<c0106d93>] Code: 0f 0b 59 01 1e d6 25 c0 8d 56 10 8b 4a 04 8b 46 10 89 48 04 >>EIP; c014322d <prune_dcache+5d/140> <===== >>ebx; eb8d7c70 <_end+2b5bb734/384f6ac4> >>ecx; c281b394 <_end+24fee58/384f6ac4> >>edx; e5636700 <_end+2531a1c4/384f6ac4> >>esi; eb8d7c58 <_end+2b5bb71c/384f6ac4> >>edi; c281b394 <_end+24fee58/384f6ac4> >>ebp; d2b15f34 <_end+127f99f8/384f6ac4> >>esp; d2b15f08 <_end+127f99cc/384f6ac4> Trace; c0128f81 <kmem_cache_free+1c1/270> Trace; c01435a6 
<shrink_dcache_parent+16/30> Trace; c013714f <kill_super+5f/f0> Trace; c013afea <path_release+2a/40> Trace; c01472d0 <sys_umount+80/90> Trace; c01472ee <sys_oldumount+e/20> Trace; c0106d93 <system_call+33/38> Code; c014322d <prune_dcache+5d/140> 00000000 <_EIP>: Code; c014322d <prune_dcache+5d/140> <===== 0: 0f 0b ud2a <===== Code; c014322f <prune_dcache+5f/140> 2: 59 pop %ecx Code; c0143230 <prune_dcache+60/140> 3: 01 1e add %ebx,(%esi) Code; c0143232 <prune_dcache+62/140> 5: d6 (bad) Code; c0143233 <prune_dcache+63/140> 6: 25 c0 8d 56 10 and $0x10568dc0,%eax Code; c0143238 <prune_dcache+68/140> b: 8b 4a 04 mov 0x4(%edx),%ecx Code; c014323b <prune_dcache+6b/140> e: 8b 46 10 mov 0x10(%esi),%eax Code; c014323e <prune_dcache+6e/140> 11: 89 48 04 mov %ecx,0x4(%eax) kernel BUG at dcache.c:345! invalid operand: 0000 CPU: 0 EIP: 0010:[<c014322d>] Not tainted EFLAGS: 00010206 eax: 00040000 ebx: ea612c70 ecx: c281b394 edx: dd1f64bc esi: ea612c58 edi: c281b394 ebp: c2825f00 esp: c2825ed4 ds: 0018 es: 0018 ss: 0018 Process kswapd (pid: 4, stackpage=c2825000) Stack: 00000187 00000003 c2825ef4 c0128525 c281b418 d8728000 c281b418 00000006 00000000 c233bfb0 00000003 c2825f0c c01435e2 00000d1d c2825f4c c012a284 00000006 000001d0 c2824000 ffffffff 00012199 000001d0 c02970d0 c2825f50 Call Trace: [<c0128525>] [<c01435e2>] [<c012a284>] [<c012a462>] [<c012a501>] [<c012a580>] [<c012a739>] [<c012a7b6>] [<c012a8ff>] [<c012a860>] [<c0105000>] [<c01055b6>] [<c012a860>] Code: 0f 0b 59 01 1e d6 25 c0 8d 56 10 8b 4a 04 8b 46 10 89 48 04 >>EIP; c014322d <prune_dcache+5d/140> <===== >>ebx; ea612c70 <_end+2a2f6734/384f6ac4> >>ecx; c281b394 <_end+24fee58/384f6ac4> >>edx; dd1f64bc <_end+1ced9f80/384f6ac4> >>esi; ea612c58 <_end+2a2f671c/384f6ac4> >>edi; c281b394 <_end+24fee58/384f6ac4> >>ebp; c2825f00 <_end+25099c4/384f6ac4> >>esp; c2825ed4 <_end+2509998/384f6ac4> Trace; c0128525 <__kmem_cache_shrink_locked+45/70> Trace; c01435e2 <shrink_dcache_memory+22/40> Trace; c012a284 <shrink_cache+294/370> 
Trace; c012a462 <refill_inactive+102/170> Trace; c012a501 <shrink_caches+31/40> Trace; c012a580 <try_to_free_pages_zone+70/f0> Trace; c012a739 <kswapd_balance_pgdat+59/b0> Trace; c012a7b6 <kswapd_balance+26/40> Trace; c012a8ff <kswapd+9f/c0> Trace; c012a860 <kswapd+0/c0> Trace; c0105000 <_stext+0/0> Trace; c01055b6 <arch_kernel_thread+26/40> Trace; c012a860 <kswapd+0/c0> Code; c014322d <prune_dcache+5d/140> 00000000 <_EIP>: Code; c014322d <prune_dcache+5d/140> <===== 0: 0f 0b ud2a <===== Code; c014322f <prune_dcache+5f/140> 2: 59 pop %ecx Code; c0143230 <prune_dcache+60/140> 3: 01 1e add %ebx,(%esi) Code; c0143232 <prune_dcache+62/140> 5: d6 (bad) Code; c0143233 <prune_dcache+63/140> 6: 25 c0 8d 56 10 and $0x10568dc0,%eax Code; c0143238 <prune_dcache+68/140> b: 8b 4a 04 mov 0x4(%edx),%ecx Code; c014323b <prune_dcache+6b/140> e: 8b 46 10 mov 0x10(%esi),%eax Code; c014323e <prune_dcache+6e/140> 11: 89 48 04 mov %ecx,0x4(%eax) ^ permalink raw reply [flat|nested] 146+ messages in thread
* Re: Possible dcache BUG
  [not found] ` <2rUjF-Od-11@gated-at.bofh.it>
@ 2004-08-11 12:32 ` Andi Kleen
  0 siblings, 0 replies; 146+ messages in thread
From: Andi Kleen @ 2004-08-11 12:32 UTC
To: David S. Miller; +Cc: linux-kernel, us15

"David S. Miller" <davem@redhat.com> writes:

> On Tue, 10 Aug 2004 22:13:01 -0700 (PDT)
> Linus Torvalds <torvalds@osdl.org> wrote:
>
>> I also wonder what the
>> hell is allocating so many 8kB and 32kB entries.
>
> Loopback default MTU is 16K these days, might explain
> the 32K entries but not the 8KB ones. Perhaps the
> later are being used for page tables? Just a guess
> on that latter one.

Kernel stacks more likely. 200 processes = 200 8K entries.
Unless he used suic^w4K stack mode.

-Andi
* Re: Possible dcache BUG
@ 2004-08-20 8:08 Daniel Blueman
  0 siblings, 0 replies; 146+ messages in thread
From: Daniel Blueman @ 2004-08-20 8:08 UTC
To: gene.heskett, linux-kernel

I find that memtest86 [1] does a great job of checking memory,
especially since you can boot the available ISO image. Perhaps worth a
try here?

--- [1]

http://www.memtest86.com/

---

There is still that possibility Marcelo. Someone recommended I get
cpuburn and memburn, and before fixing the scanf statement (it was
broken) in memburn, I had compiled it for a 512 meg test the first
time, and a 768 meg test the next couple of runs. All exited with
errors like this:

Passed round 133, elapsed 4827.19.
FAILED at round 134/14208927: got ff00, expected 0!!!
REREAD: ff00, ff00, ff00!!!
[root@coyote memburn]# vim memburn.c
[root@coyote memburn]# gcc -o memburn memburn.c
[root@coyote memburn]# ./memburn
Starting test with size 768 megs..
Passed round 0, elapsed 44.36.
Passed round 1, elapsed 74.13.
Passed round 2, elapsed 105.12.
FAILED at round 3/25777183: got 2b00, expected 0!!!
REREAD: 2b00, 2b00, 2b00!!!

I've now rebuilt it with a better printf format string, and it's
running over 768 megs again. But this time the round counter is up to
90 and still going...

Interesting too is that memburn has now allocated a 768 meg wide block
5 times, and still no Oops. Over a hundred megs in swap, but it's still
running.

I lost the BUG_ON patches in fs/buffer.c, this is now 2.6.8.1-mm2 (but I
can go back if this fails of course). Or can I just copy that
2.6.8-rc4/fs/buffer.c file over this one?

-- 
Daniel J Blueman
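The "got ..., expected 0" and "REREAD" lines in the log above suggest a
fill, verify and reread cycle. The C sketch below shows one way such a
check could look. It is not memburn's actual source: the function names,
the struct, and the 16-bit word width are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of the first bad word found in a buffer. */
struct mismatch {
    size_t   index;      /* position of the first word that differed */
    uint16_t got;        /* value seen on the first read */
    uint16_t reread[3];  /* three further reads of the same word */
};

/* Write the same value to every word. volatile keeps the compiler from
 * optimising the stores (and the later loads) away. */
static void fill_words(volatile uint16_t *buf, size_t n, uint16_t value)
{
    assert(buf != NULL);
    for (size_t i = 0; i < n; i++)
        buf[i] = value;
}

/* Returns 1 and fills *out at the first mismatch, 0 if all n words
 * still hold the expected value. */
static int verify_words(volatile uint16_t *buf, size_t n,
                        uint16_t expected, struct mismatch *out)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t got = buf[i];
        if (got != expected) {
            out->index = i;
            out->got = got;
            for (int k = 0; k < 3; k++)
                out->reread[k] = buf[i];
            return 1;
        }
    }
    return 0;
}
```

In a check like this, three rereads that all return the same wrong
value mean the bad value is stable rather than a one-off read glitch,
which is at least consistent with the later guess in this thread that
the writes, not the reads, are what fail.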