linux-kernel.vger.kernel.org archive mirror
* Possible dcache BUG
@ 2004-08-02 13:14 Brett Charbeneau
  2004-08-05  2:16 ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: Brett Charbeneau @ 2004-08-02 13:14 UTC (permalink / raw)
  To: linux-kernel

Greetings,

	I am getting the oops below - twice since 7/26, but I haven't a 
clue what's causing it.
	I am not a subscriber, so any replies directed to me would be 
gratefully received.
	Thank you for your hard work on this!

-- 

Brett Charbeneau, Network Administrator         Tel: 757-259-7750
Williamsburg Regional Library                   FAX: 757-259-7798
7770 Croaker Road                               brett@wrl.org
Williamsburg, VA 23188-7064                     http://www.wrl.org


ksymoops 2.4.9 on i686 2.4.26.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.26/ (default)
     -m /boot/System.map (specified)

1151MB HIGHMEM available.
3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
kernel BUG at dcache.c:345!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c014322d>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010206
eax: 00040000   ebx: eb8d7c70   ecx: c281b394   edx: e5636700
esi: eb8d7c58   edi: c281b394   ebp: d2b15f34   esp: d2b15f08
ds: 0018   es: 0018   ss: 0018
Process umount (pid: 14814, stackpage=d2b15000)
Stack: c0128f81 c281b49c c281f000 00000246 d2b15f34 f721e1a0 00000466 f721e178 
       f721e178 f721e178 c02991c0 d2b15f44 c01435a6 00000150 f7b6f400 d2b15f5c 
       c013714f f721e178 d2b15f88 08052179 0804d82b d2b15f7c c013afea f7b6f400 
Call Trace:    [<c0128f81>] [<c01435a6>] [<c013714f>] [<c013afea>] [<c01472d0>]
  [<c01472ee>] [<c0106d93>]
Code: 0f 0b 59 01 1e d6 25 c0 8d 56 10 8b 4a 04 8b 46 10 89 48 04 


>>EIP; c014322d <prune_dcache+5d/140>   <=====

>>ebx; eb8d7c70 <_end+2b5bb734/384f6ac4>
>>ecx; c281b394 <_end+24fee58/384f6ac4>
>>edx; e5636700 <_end+2531a1c4/384f6ac4>
>>esi; eb8d7c58 <_end+2b5bb71c/384f6ac4>
>>edi; c281b394 <_end+24fee58/384f6ac4>
>>ebp; d2b15f34 <_end+127f99f8/384f6ac4>
>>esp; d2b15f08 <_end+127f99cc/384f6ac4>

Trace; c0128f81 <kmem_cache_free+1c1/270>
Trace; c01435a6 <shrink_dcache_parent+16/30>
Trace; c013714f <kill_super+5f/f0>
Trace; c013afea <path_release+2a/40>
Trace; c01472d0 <sys_umount+80/90>
Trace; c01472ee <sys_oldumount+e/20>
Trace; c0106d93 <system_call+33/38>

Code;  c014322d <prune_dcache+5d/140>
00000000 <_EIP>:
Code;  c014322d <prune_dcache+5d/140>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c014322f <prune_dcache+5f/140>
   2:   59                        pop    %ecx
Code;  c0143230 <prune_dcache+60/140>
   3:   01 1e                     add    %ebx,(%esi)
Code;  c0143232 <prune_dcache+62/140>
   5:   d6                        (bad)  
Code;  c0143233 <prune_dcache+63/140>
   6:   25 c0 8d 56 10            and    $0x10568dc0,%eax
Code;  c0143238 <prune_dcache+68/140>
   b:   8b 4a 04                  mov    0x4(%edx),%ecx
Code;  c014323b <prune_dcache+6b/140>
   e:   8b 46 10                  mov    0x10(%esi),%eax
Code;  c014323e <prune_dcache+6e/140>
  11:   89 48 04                  mov    %ecx,0x4(%eax)

kernel BUG at dcache.c:345!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c014322d>]    Not tainted
EFLAGS: 00010206
eax: 00040000   ebx: ea612c70   ecx: c281b394   edx: dd1f64bc
esi: ea612c58   edi: c281b394   ebp: c2825f00   esp: c2825ed4
ds: 0018   es: 0018   ss: 0018
Process kswapd (pid: 4, stackpage=c2825000)
Stack: 00000187 00000003 c2825ef4 c0128525 c281b418 d8728000 c281b418 00000006 
       00000000 c233bfb0 00000003 c2825f0c c01435e2 00000d1d c2825f4c c012a284 
       00000006 000001d0 c2824000 ffffffff 00012199 000001d0 c02970d0 c2825f50 
Call Trace:    [<c0128525>] [<c01435e2>] [<c012a284>] [<c012a462>] [<c012a501>]
  [<c012a580>] [<c012a739>] [<c012a7b6>] [<c012a8ff>] [<c012a860>] [<c0105000>]
  [<c01055b6>] [<c012a860>]
Code: 0f 0b 59 01 1e d6 25 c0 8d 56 10 8b 4a 04 8b 46 10 89 48 04 


>>EIP; c014322d <prune_dcache+5d/140>   <=====

>>ebx; ea612c70 <_end+2a2f6734/384f6ac4>
>>ecx; c281b394 <_end+24fee58/384f6ac4>
>>edx; dd1f64bc <_end+1ced9f80/384f6ac4>
>>esi; ea612c58 <_end+2a2f671c/384f6ac4>
>>edi; c281b394 <_end+24fee58/384f6ac4>
>>ebp; c2825f00 <_end+25099c4/384f6ac4>
>>esp; c2825ed4 <_end+2509998/384f6ac4>

Trace; c0128525 <__kmem_cache_shrink_locked+45/70>
Trace; c01435e2 <shrink_dcache_memory+22/40>
Trace; c012a284 <shrink_cache+294/370>
Trace; c012a462 <refill_inactive+102/170>
Trace; c012a501 <shrink_caches+31/40>
Trace; c012a580 <try_to_free_pages_zone+70/f0>
Trace; c012a739 <kswapd_balance_pgdat+59/b0>
Trace; c012a7b6 <kswapd_balance+26/40>
Trace; c012a8ff <kswapd+9f/c0>
Trace; c012a860 <kswapd+0/c0>
Trace; c0105000 <_stext+0/0>
Trace; c01055b6 <arch_kernel_thread+26/40>
Trace; c012a860 <kswapd+0/c0>

Code;  c014322d <prune_dcache+5d/140>
00000000 <_EIP>:
Code;  c014322d <prune_dcache+5d/140>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c014322f <prune_dcache+5f/140>
   2:   59                        pop    %ecx
Code;  c0143230 <prune_dcache+60/140>
   3:   01 1e                     add    %ebx,(%esi)
Code;  c0143232 <prune_dcache+62/140>
   5:   d6                        (bad)  
Code;  c0143233 <prune_dcache+63/140>
   6:   25 c0 8d 56 10            and    $0x10568dc0,%eax
Code;  c0143238 <prune_dcache+68/140>
   b:   8b 4a 04                  mov    0x4(%edx),%ecx
Code;  c014323b <prune_dcache+6b/140>
   e:   8b 46 10                  mov    0x10(%esi),%eax
Code;  c014323e <prune_dcache+6e/140>
  11:   89 48 04                  mov    %ecx,0x4(%eax)





* Re: Possible dcache BUG
  2004-08-05  4:31     ` Gene Heskett
@ 2004-08-05  0:44       ` Chris Shoemaker
  2004-08-05  8:35         ` Denis Vlasenko
  2004-08-05 13:48         ` Gene Heskett
  2004-08-05  8:33       ` Denis Vlasenko
  1 sibling, 2 replies; 146+ messages in thread
From: Chris Shoemaker @ 2004-08-05  0:44 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel

On Thu, Aug 05, 2004 at 12:31:21AM -0400, Gene Heskett wrote:
> On Wednesday 04 August 2004 23:46, Andrew Morton wrote:
> >Gene Heskett <gene.heskett@verizon.net> wrote:
> >>
> >>  The attachment this gentleman included specifically points to
> >>  prune_dcache().  That's nice.  It also means I'm not alone.  See
> >> the 'prune_dcache() Oops, the saga continues' thread.
> >
> >Except he's running a 2.4 kernel.
> >
> >Is there any reason why I'm wrong in thinking that you have dodgy
> >hardware?
> 
> Well, it has, in the past week, run memtest86-3a for 12 full passes 
> over the whole gig of ram with no errors.  This was the longest test; 
> I gave it a 2 hour, 5 pass test before I ever booted linux the first 
> time on this motherboard over 2 weeks ago now, a new Biostar 
> M7NCD-Pro, with an nforce2(3?) chipset.  I did that because I was 
> coming from an older board whose memory had been overstressed by a 
> failing video card and I wanted to make sure this new memory, nearly 
> $210 worth of it, was good. I gave it another, probably 4 hour test 
> after the first couple of crashes, which it also passed.  And it got 
> worse as the kernel versions incremented from 2.6.7.  I can have the 
> same fault in prune_dcache() while running a 2.6.7 kernel without an 
> instant lockup, but it will eventually die, maybe half an hour later.  
> Move to 2.6.7-mm1, which has a patch to fs/dcache.c that remains 
> untouched thru 2.6.8-rc2, and those kernels, if they lock up, do it 
> totally, often with nothing in the logs at all.  That was the case 
> today, on 2.6.8-rc3, which has a new dcache.c patch in it if I read 
> the release notes correctly.
> 
> If this is dodgy hardware, give me something to take to tcwo.com when 
> I ask for an rma.  Not having M$ windows of any kind here, I frankly 
> haven't had the inclination to look at the cd's that came with the 
> board.  Should I?
> 
> Or does linux have a hardware test suite I've not heard about?

Gene,
	I sympathize with you.  Back in March and April I was seeing
oopses in prune_dcache() once every few days.  After tracing the asm
down for a few of them, I found one that looked like a 3 bit flip and
then one that looked like a single bit flip.  I memtested my RAM for
days with no failure.  I tried cpuburn.  I looped over kernel compiles.
I couldn't make it fail, but every day or two, as long as I wasn't
trying, I'd get an oops, and more than 50% were in prune_dcache.  I
believed that there was a correspondence with low memory conditions, but
I never proved this.  I _added_ a memory module (keeping everything I
had) and I compiled 2.6.7-rc3 on Jun 10th.  I haven't oopsed since.  (I
think I may also have turned off PREEMPT around this time, so that's why
I suggested it earlier.)
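
One way to test the "looks like a bit flip" idea against an oops is to XOR the bad value from the register dump with the value it should have held and count the set bits. This is only a sketch: the function name is hypothetical, and the expected value has to come from reading the code and data structures, which is the hard part.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count how many bits differ between the value seen in the oops and the
 * value that was expected there. A result of 1 to 3 on an otherwise
 * plausible pointer is the pattern described above.
 */
int bits_flipped(uint32_t observed, uint32_t expected)
{
    uint32_t diff = observed ^ expected;
    int count = 0;

    while (diff) {
        diff &= diff - 1;  /* clear the lowest set bit */
        count++;
    }
    return count;
}
```

A small count does not prove bad hardware by itself, since a software bug that sets or clears a flag in the wrong word produces the same signature.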

	FWIW, I've seen no fewer than 4 independent reports that looked
suspiciously like yours and mine over the past 3 months.  Maybe we all
have bad hardware, and memtest86 just isn't stressful enough to show it.
The alternative is that there's some bug that has affected several
versions of 2.6 (and maybe 2.4) that seems to hit in low memory
conditions (e.g. as a result of a 4am cron.daily, or a large rsync).

	If you're curious, search google groups for "+oops +prune_dcache
group:linux.kernel", sort by date and look through the first 3 or 4
pages.  You'll see the same story with the same oopses over and over.
I know the few single bit flips are _probably_ bad hardware, but the more
similarities I see, the more I wonder.

	But, since my problems have completely gone away by adding more RAM, 
I haven't been motivated to track it down anymore.

	Sorry I can't be more helpful.  Good luck.

-chris
> 


* Re: Possible dcache BUG
  2004-08-02 13:14 Possible dcache BUG Brett Charbeneau
@ 2004-08-05  2:16 ` Gene Heskett
  2004-08-05  3:46   ` Andrew Morton
  2004-08-05  7:25   ` Linus Torvalds
  0 siblings, 2 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-05  2:16 UTC (permalink / raw)
  To: linux-kernel

On Monday 02 August 2004 09:14, Brett Charbeneau wrote:
>Greetings,
>
>	I am getting the oops below - twice since 7/26, but I haven't a
>clue what's causing it.
>	I am not a subscriber, so any replies directed to me would be
>gratefully received.
>	Thank you for your hard work on this!

The attachment this gentleman included specifically points to 
prune_dcache().  That's nice.  It also means I'm not alone.  See the 
'prune_dcache() Oops, the saga continues' thread.

I got in about 9pm after spending the afternoon inside a tv 
transmitter, having left the house about 1ish.  Black screen, 
keyboard LEDs out.  The usual.  Last log entry was at 14:49 EDT this 
afternoon, some message about a file fam couldn't find.  Whenever it 
went down, it went so fast there was no logged trace.  The next entry 
is syslogd restarting after I'd hit the reset button.

So whatever took it down, did it all by itself as the only non-system 
processes running were setiathome, X and kmail (from kde3.2, 
kde3.2.3, and kde3.3-beta2, makes no diff, all fail in 
prune_dcache() ) making an every 10 minute run to get the mail.

I *thought* I had PREEMPT turned off, but when I did a make xconfig, 
it was turned on.  So it's now off, and a new 2.6.8-rc3 is building.
It was frame pointers I had turned on for the last build, still on for 
this one underway now.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.


* Re: Possible dcache BUG
  2004-08-05  2:16 ` Gene Heskett
@ 2004-08-05  3:46   ` Andrew Morton
  2004-08-05  4:31     ` Gene Heskett
  2004-08-05  7:25   ` Linus Torvalds
  1 sibling, 1 reply; 146+ messages in thread
From: Andrew Morton @ 2004-08-05  3:46 UTC (permalink / raw)
  To: gene.heskett; +Cc: linux-kernel

Gene Heskett <gene.heskett@verizon.net> wrote:
>
> On Monday 02 August 2004 09:14, Brett Charbeneau wrote:
>  >Greetings,
>  >
>  >	I am getting the oops below - twice since 7/26, but I haven't a
>  >clue what's causing it.
>  >	I am not a subscriber, so any replies directed to me would be
>  >gratefully received.
>  >	Thank you for your hard work on this!
> 
>  The attachment this gentleman included specifically points to 
>  prune_dcache().  That's nice.  It also means I'm not alone.  See the 
>  'prune_dcache() Oops, the saga continues' thread.

Except he's running a 2.4 kernel.

Is there any reason why I'm wrong in thinking that you have dodgy
hardware?


* Re: Possible dcache BUG
  2004-08-05  3:46   ` Andrew Morton
@ 2004-08-05  4:31     ` Gene Heskett
  2004-08-05  0:44       ` Chris Shoemaker
  2004-08-05  8:33       ` Denis Vlasenko
  0 siblings, 2 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-05  4:31 UTC (permalink / raw)
  To: linux-kernel

On Wednesday 04 August 2004 23:46, Andrew Morton wrote:
>Gene Heskett <gene.heskett@verizon.net> wrote:
>> On Monday 02 August 2004 09:14, Brett Charbeneau wrote:
>>  >Greetings,
>>  >
>>  >	I am getting the oops below - twice since 7/26, but I haven't a
>>  >clue what's causing it.
>>  >	I am not a subscriber, so any replies directed to me would be
>>  >gratefully received.
>>  >	Thank you for your hard work on this!
>>
>>  The attachment this gentleman included specifically points to
>>  prune_dcache().  That's nice.  It also means I'm not alone.  See
>> the 'prune_dcache() Oops, the saga continues' thread.
>
>Except he's running a 2.4 kernel.
>
>Is there any reason why I'm wrong in thinking that you have dodgy
>hardware?

Well, it has, in the past week, run memtest86-3a for 12 full passes 
over the whole gig of ram with no errors.  This was the longest test; 
I gave it a 2 hour, 5 pass test before I ever booted linux the first 
time on this motherboard over 2 weeks ago now, a new Biostar 
M7NCD-Pro, with an nforce2(3?) chipset.  I did that because I was 
coming from an older board whose memory had been overstressed by a 
failing video card and I wanted to make sure this new memory, nearly 
$210 worth of it, was good. I gave it another, probably 4 hour test 
after the first couple of crashes, which it also passed.  And it got 
worse as the kernel versions incremented from 2.6.7.  I can have the 
same fault in prune_dcache() while running a 2.6.7 kernel without an 
instant lockup, but it will eventually die, maybe half an hour later.  
Move to 2.6.7-mm1, which has a patch to fs/dcache.c that remains 
untouched thru 2.6.8-rc2, and those kernels, if they lock up, do it 
totally, often with nothing in the logs at all.  That was the case 
today, on 2.6.8-rc3, which has a new dcache.c patch in it if I read 
the release notes correctly.

If this is dodgy hardware, give me something to take to tcwo.com when 
I ask for an rma.  Not having M$ windows of any kind here, I frankly 
haven't had the inclination to look at the cd's that came with the 
board.  Should I?

Or does linux have a hardware test suite I've not heard about?

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.


* Re: Possible dcache BUG
  2004-08-05  2:16 ` Gene Heskett
  2004-08-05  3:46   ` Andrew Morton
@ 2004-08-05  7:25   ` Linus Torvalds
  2004-08-05  7:31     ` Andrew Morton
                       ` (2 more replies)
  1 sibling, 3 replies; 146+ messages in thread
From: Linus Torvalds @ 2004-08-05  7:25 UTC (permalink / raw)
  To: Gene Heskett; +Cc: Kernel Mailing List, Andrew Morton



On Wed, 4 Aug 2004, Gene Heskett wrote:
> 
> I *thought* I had PREEMPT turned off, but when I did a make xconfig, 
> it was turned on.  So it's now off, and a new 2.6.8-rc3 is building.
> It was frame pointers I had turned on for the last build, still on for 
> this one underway now.

Your latest bug report definitely had preempt on, you could see the 
preempt code in the oops output when disassembled. 

Also, could you please enable CONFIG_DEBUG_BUGVERBOSE by hand if you use
the -mm tree, since you definitely hit a BUG() in there somewhere, but in
the -mm tree, the BUG()  message is totally unreadable unless you enable
BUGVERBOSE (and it's not in the config file).

(Andrew - I think you should drop that patch, or at least enable 
BUGVERBOSE on x86 - it looks like it's disabled and with no way to enable 
it in the current -mm tree..)

I _suspect_ you hit the new "list_del-debug.patch" in Andrew's tree, 
because in my tree there are no BUG_ON's in prune_dcache() at all. 

If so, I think the last oops you had was

	BUG_ON(entry->next->prev != entry);

in list_del(), but the fact is, the _interesting_ part in prune_dcache() 
ends up being the "list_del_init()" at the top, which is _not_ 
instrumented by the list_del-debug patch.

So what I'd actually _like_ you to do is:
 - test 2.6.8-rc3, but with the "list_del-debug.patch" applied (appended).
   That way the BUG message will actually be readable.
 - add the same two BUG_ON() to "list_del_init()" too, for better 
   coverage. Most of the dcache uses the "init" version.
 - keep PREEMPT on, since it is quite possible (likely) that this is a 
   preempt problem.

I'd love to see if you can hit the BUG() that way.

		Linus

----

From Manfred Spraul

A list_del debugging check.

Signed-off-by: Andrew Morton <akpm@osdl.org>
---

 25-akpm/include/linux/list.h |    3 +++
 1 files changed, 3 insertions(+)

diff -puN include/linux/list.h~list_del-debug include/linux/list.h
--- 25/include/linux/list.h~list_del-debug	Mon Jun 14 16:44:07 2004
+++ 25-akpm/include/linux/list.h	Mon Jun 14 16:51:27 2004
@@ -6,6 +6,7 @@
 #include <linux/stddef.h>
 #include <linux/prefetch.h>
 #include <asm/system.h>
+#include <asm/bug.h>
 
 /*
  * These are non-NULL pointers that will result in page faults
@@ -160,6 +161,8 @@ static inline void __list_del(struct lis
  */
 static inline void list_del(struct list_head *entry)
 {
+	BUG_ON(entry->prev->next != entry);
+	BUG_ON(entry->next->prev != entry);
 	__list_del(entry->prev, entry->next);
 	entry->next = LIST_POISON1;
 	entry->prev = LIST_POISON2;
_
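
The second item in the list above, adding the same two checks to list_del_init(), could look roughly like the following. This is a userspace sketch under stated assumptions, not the kernel's list.h: BUG_ON() is mapped to assert(), the list helpers are re-declared locally so the block compiles alone, and `list_entry_is_consistent` is a hypothetical helper that reports the same invariant without aborting.

```c
#include <assert.h>

/* Local stand-ins so the sketch is self-contained (assumption: not the kernel headers). */
struct list_head {
    struct list_head *next, *prev;
};

#define BUG_ON(cond) assert(!(cond))

void INIT_LIST_HEAD(struct list_head *list)
{
    list->next = list;
    list->prev = list;
}

/* Insert entry right after head. */
void list_add(struct list_head *entry, struct list_head *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Returns 1 if both neighbours still point back at entry, 0 otherwise. */
int list_entry_is_consistent(const struct list_head *entry)
{
    return entry->prev->next == entry && entry->next->prev == entry;
}

/* list_del_init() with the same two checks the patch adds to list_del(). */
void list_del_init(struct list_head *entry)
{
    BUG_ON(entry->prev->next != entry);
    BUG_ON(entry->next->prev != entry);
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    INIT_LIST_HEAD(entry);
}
```

The checks fire at the moment a corrupted entry is unlinked, which is earlier than the later, harder to interpret oops that a stale pointer would otherwise cause.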


* Re: Possible dcache BUG
  2004-08-05  7:25   ` Linus Torvalds
@ 2004-08-05  7:31     ` Andrew Morton
  2004-08-05  8:33     ` Denis Vlasenko
  2004-08-06  2:50     ` Linus Torvalds
  2 siblings, 0 replies; 146+ messages in thread
From: Andrew Morton @ 2004-08-05  7:31 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: gene.heskett, linux-kernel

Linus Torvalds <torvalds@osdl.org> wrote:
>
> (Andrew - I think you should drop that patch, or at least enable 
>  BUGVERBOSE on x86 - it looks like it's disabled and with no way to enable 
>  it in the current -mm tree..)

ah, OK.  I'll put it back to `#if 1', thanks.


* Re: Possible dcache BUG
  2004-08-05  4:31     ` Gene Heskett
  2004-08-05  0:44       ` Chris Shoemaker
@ 2004-08-05  8:33       ` Denis Vlasenko
  2004-08-05 14:19         ` Gene Heskett
  2004-08-05 21:26         ` Chris Shoemaker
  1 sibling, 2 replies; 146+ messages in thread
From: Denis Vlasenko @ 2004-08-05  8:33 UTC (permalink / raw)
  To: gene.heskett, linux-kernel

> Well, it has, in the past week, run memtest86-3a for 12 full passes
> over the whole gig of ram with no errors.  This was the longest test;
> I gave it a 2 hour, 5 pass test before I ever booted linux the first
> time on this motherboard over 2 weeks ago now, a new Biostar
> M7NCD-Pro, with an nforce2(3?) chipset.  I did that because I was
> coming from an older board whose memory had been overstressed by a
> failing video card and I wanted to make sure this new memory, nearly
> $210 worth of it, was good. I gave it another, probably 4 hour test
> after the first couple of crashes, which it also passed.  And it got

You may use cpuburn to test RAM/CPU too.

That said, I have a memory module which, when clocked a bit too high,
passes both memtest86 and cpuburn for extended periods of time,
yet large compile runs sometimes die with sig11. Clocking it a tiny
bit less aggressively helped. :)
-- 
vda


* Re: Possible dcache BUG
  2004-08-05  7:25   ` Linus Torvalds
  2004-08-05  7:31     ` Andrew Morton
@ 2004-08-05  8:33     ` Denis Vlasenko
  2004-08-05 14:55       ` Gene Heskett
  2004-08-05 16:26       ` Linus Torvalds
  2004-08-06  2:50     ` Linus Torvalds
  2 siblings, 2 replies; 146+ messages in thread
From: Denis Vlasenko @ 2004-08-05  8:33 UTC (permalink / raw)
  To: Linus Torvalds, Gene Heskett; +Cc: Kernel Mailing List, Andrew Morton

Hi Linus,

On Thursday 05 August 2004 10:25, Linus Torvalds wrote:
> On Wed, 4 Aug 2004, Gene Heskett wrote:
> > I *thought* I had PREEMPT turned off, but when I did a make xconfig,
> > it was turned on.  So it's now off, and a new 2.6.8-rc3 is building.
> > It was frame pointers I had turned on for the last build, still on for
> > this one underway now.
>
> Your latest bug report definitely had preempt on, you could see the
> preempt code in the oops output when disassembled.
>
> Also, could you please enable CONFIG_DEBUG_BUGVERBOSE by hand if you use
> the -mm tree, since you definitely hit a BUG() in there somewhere, but in
> the -mm tree, the BUG()  message is totally unreadable unless you enable
> BUGVERBOSE (and it's not in the config file).

It is not a BUG().

It's an oops (dereferencing a d_op pointer with value 0x00000900+14
IIRC, Gene has complete disassembly with location of that event).

It is not reproducible on request, but happens for him from time
to time in the same place with the same bogus value of d_op.
-- 
vda


* Re: Possible dcache BUG
  2004-08-05  0:44       ` Chris Shoemaker
@ 2004-08-05  8:35         ` Denis Vlasenko
  2004-08-05 14:14           ` Gene Heskett
  2004-08-05 13:48         ` Gene Heskett
  1 sibling, 1 reply; 146+ messages in thread
From: Denis Vlasenko @ 2004-08-05  8:35 UTC (permalink / raw)
  To: Chris Shoemaker, Gene Heskett; +Cc: linux-kernel

> 	FWIW, I've seen no fewer than 4 independent reports that looked
> suspiciously like yours and mine over the past 3 months.  Maybe we all
> have bad hardware, and memtest86 just isn't stressful enough to show it.
> The alternative is that there's some bug that has affected several
> versions of 2.6 (and maybe 2.4) that seems to hit in low memory
> conditions (e.g. as a result of a 4am cron.daily, or a large rsync).
>
> 	If you're curious, search google groups for "+oops +prune_dcache
> group:linux.kernel", sort by date and look through the first 3 or 4
> pages.  You'll see the same story with the same oopses over and over.
> I know the few single bit flips are _probably_ bad hardware, but the more
> similarities I see, the more I wonder.
>
> 	But, since my problems have completely gone away by adding more RAM,
> I haven't been motivated to track it down anymore.

Let's rule out PREEMPT first

> 	Sorry I can't be more helpful.  Good luck.

Maybe turn PREEMPT back on?
-- 
vda


* Re: Possible dcache BUG
  2004-08-05  0:44       ` Chris Shoemaker
  2004-08-05  8:35         ` Denis Vlasenko
@ 2004-08-05 13:48         ` Gene Heskett
       [not found]           ` <200408210118.02011.vda@port.imtp.ilyichevsk.odessa.ua>
  1 sibling, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-05 13:48 UTC (permalink / raw)
  To: linux-kernel

On Wednesday 04 August 2004 20:44, Chris Shoemaker wrote:
>On Thu, Aug 05, 2004 at 12:31:21AM -0400, Gene Heskett wrote:
>> On Wednesday 04 August 2004 23:46, Andrew Morton wrote:
>> >Gene Heskett <gene.heskett@verizon.net> wrote:
>> >>  The attachment this gentleman included specifically points to
>> >>  prune_dcache().  That's nice.  It also means I'm not alone. 
>> >> See the 'prune_dcache() Oops, the saga continues' thread.
>> >
>> >Except he's running a 2.4 kernel.

I didn't take note of that. What triggered my response was the 
prune_dcache() problem.  In my case we've taken a couple of them 
apart and the Oops is actually in the _dput() statement where 
the eax register contains a very small, but non-zero value, like 
maybe 0x00000820.  That's a bit difficult as some of this code is 
marked as __inline, and can reach over 130 bytes between the 
labels we put into the srcs.  IMO, that's too much to inline 
if it's used more than once, and it is.

And guess what, both prune_dcache() and _dput() are inlined...

>> >Is there any reason why I'm wrong in thinking that you have dodgy
>> >hardware?
>>
>> Well, it has, in the past week, run memtest86-3a for 12 full
>> passes over the whole gig of ram with no errors.  This was the
>> longest test; I gave it a 2 hour, 5 pass test before I ever booted
>> linux the first time on this motherboard over 2 weeks ago now, a
>> new Biostar M7NCD-Pro, with an nforce2(3?) chipset.  I did that
>> because I was coming from an older board whose memory had been
>> overstressed by a failing video card and I wanted to make sure
>> this new memory, nearly $210 worth of it, was good. I gave it
>> another, probably 4 hour test after the first couple of crashes,
>> which it also passed.  And it got worse as the kernel versions
>> incremented from 2.6.7.  I can have the same fault in
>> prune_dcache() while running a 2.6.7 kernel without an instant
>> lockup, but it will eventually die, maybe half an hour later. Move
>> to 2.6.7-mm1, which has a patch to fs/dcache.c that remains
>> untouched thru 2.6.8-rc2, and those kernels, if they lock up, do
>> it totally, often with nothing in the logs at all.  That was the
>> case today, on 2.6.8-rc3, which has a new dcache.c patch in it if
>> I read the release notes correctly.
>>
>> If this is dodgy hardware, give me something to take to tcwo.com
>> when I ask for an rma.  Not having M$ windows of any kind here, I
>> frankly haven't had the inclination to look at the cd's that came
>> with the board.  Should I?
>>
>> Or does linux have a hardware test suite I've not heard about?
>
>Gene,
>	I sympathize with you.  Back in March and April I was seeing
>oopses in prune_dcache() once every few days.  After tracing the asm
>down for a few of them, I found one that looked like a 3 bit flip
> and then one that looked like a single bit flip.  I memtested my
> RAM for days with no failure.  I tried cpuburn.  I looped over
> kernel compiles. I couldn't make it fail, but every day or two, as
> long as I wasn't trying, I'd get an oops, and more than 50% were in
> prune_dcache.  I believed that there was a correspondence with low
> memory conditions, but I never proved this.  I _added_ a memory
> module (keeping everything I had) and I compiled 2.6.7-rc3 on Jun
> 10th.  I haven't oopsed since.  (I think I may also have turned off
> PREEMPT around this time, so that's why I suggested it earlier.)
>
>	FWIW, I've seen no fewer than 4 independent reports that looked
>suspiciously like yours and mine over the past 3 months.  Maybe we
> all have bad hardware, and memtest86 just isn't stressful enough to
> show it. The alternative is that there's some bug that has affected
> several versions of 2.6 (and maybe 2.4) that seems to hit in low
> memory conditions (e.g. as a result of a 4am cron.daily, or a large
> rsync).

That does seem to correlate slightly, but yesterday's was in the 
middle of the afternoon while I was elsewhere, and very little 
is cron related at that time of day.

>	If you're curious, search google groups for "+oops +prune_dcache
>group:linux.kernel", sort by date and look through the first 3 or 4
>pages.  You'll see the same story with the same oopses over and
> over. I know the few single bit flips are _probably_ bad hardware,
> but the more similarities I see, the more I wonder.

Me too, says he in a plaintive voice.

>	But, since my problems have completely gone away by adding more
> RAM, I haven't been motivated to track it down anymore.
>
>	Sorry I can't be more helpful.  Good luck.

This is a '3 slots for ram' board, and according to the docs, the 
Dual Channel DDR 400 banking scheme only works if the ram is in 
the 1st and 3rd slots, so that's where I put it.  memtest86 reports 
a ram bandwidth of around 1.2 Gb/sec, and an L1 cache bandwidth of 
around 12Gb/sec.  No L2 present.

I might add that the first time I ran memtest86, the bios was 
misconfigured, at factory defaults, and was running the athlon 2800 
at 3200, and the memory bus at over 450 MHz.  No problem, but I did 
find an FSB and multiplier setting that gave a 400Mb bus, and which 
says the athlon-XP is a 2800+, so I figure that should be correct.

The defaults didn't always want to post, but gave no other problems 
once it had.

If I put another stick in the last, center slot, how does the 
hardware accept that?  I'd have to go get one as the 256's in the 
old board are known dodgy.  Would this incipient OOM condition not 
be handled correctly?  If that's the cause, then maybe that portion 
of the code needs to be looked at, by experts the likes of which I don't 
pretend to be.  But with a Gig of ram, I don't recall it ever 
using any swap.  But it's perilously close to that right now 
according to top:

top - 08:58:30 up  9:59,  5 users,  load average: 1.21, 1.31, 1.16
Tasks: 100 total,   2 running,  97 sleeping,   0 stopped,   1 zombie
Cpu(s):  5.6% us,  2.6% sy, 91.4% ni,  0.0% id,  0.0% wa,  0.3% hi,  0.0% si
Mem:   1036020k total,  1018644k used,    17376k free,   230960k buffers
Swap:  3857104k total,        0k used,  3857104k free,   119552k cached
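
A note on reading the figures above: the "used" number includes buffers and page cache, which the kernel can usually give back under pressure, so 17376k free does not by itself mean the machine is about to swap. The sketch below does the arithmetic on the values from this top output; the struct and function names are hypothetical, and treating all of buffers and cached as reclaimable is a simplifying assumption (dirty pages have to be written back first).

```c
#include <assert.h>

/* Values as printed by top, in kilobytes. "cached" is shown on the Swap line but is a memory figure. */
struct mem_kb {
    long total, used, free, buffers, cached;
};

/* Memory in use once buffers and page cache are set aside. */
long used_excluding_cache_kb(const struct mem_kb *m)
{
    return m->used - m->buffers - m->cached;
}

/* Memory that is free or, under the assumption above, could be reclaimed. */
long free_plus_cache_kb(const struct mem_kb *m)
{
    return m->free + m->buffers + m->cached;
}
```

With the numbers shown, 1018644 - 230960 - 119552 = 668132k is in use apart from buffers and cache, and 17376 + 230960 + 119552 = 367888k is free or reclaimable.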

mmm, I wonder who the zombie is.  Ahh, it's ~/bin/its-daylight.  
It's a script that cron triggered, and which changes the mode of 
the heyu/xtend stuff for daytime operations.  It's (a bash script)
apparently hung looking for a response it didn't get.  I have 3 
of those at various times of the day and I've never gotten email 
from that one.  The mode change does occur though...  FWIW heyu 
has been fixed, the distro version has a severe scope problem 
from a missing '}' which was not caught by the compiler, but by 
a tool I wrote years ago for os9 that I've ported to linux!   The 
heyu author ): didn't seem to be interested in fixing it either.

I'll go take a look at it after I've sent this, but it does bring 
up a sore point.  linux doesn't get this right, os9 did.  zombies 
are killable by os9, it simply takes it out of the execution queue, 
and reclaims all resources used back into the free pool, no 
questions asked or expected.  We shouldn't have to reboot just to 
kill a fscking zombie...
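
For what it's worth, a zombie on Linux has already exited and released its memory; what remains is a process-table entry holding the exit status until the parent collects it with wait() or waitpid(). If the parent never does, ending the parent normally lets init adopt and reap the zombie, so a reboot should not be needed. A minimal sketch of the reaping step, assuming a POSIX system; the function name `spawn_and_reap` is hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits at once with exit_code, then reap it.
 * Between the child's exit and the waitpid() call the child is a zombie;
 * waitpid() removes the process-table entry.
 * Returns the child's exit status, or -1 on error.
 */
int spawn_and_reap(int exit_code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(exit_code);  /* child: exit without running the parent's atexit handlers */

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (!WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A cron-launched script stuck as a zombie therefore points at its parent not calling wait, rather than at something that has to be killed.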

In any event, PREEMPT is now off; if this takes a dump, then the 
hi-mem support gets turned off and PREEMPT back on.  One thing at a time.

Thanks for the discussion, it was "enlightening" :-)

>-chris

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.


* Re: Possible dcache BUG
  2004-08-05  8:35         ` Denis Vlasenko
@ 2004-08-05 14:14           ` Gene Heskett
  0 siblings, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-05 14:14 UTC (permalink / raw)
  To: linux-kernel

On Thursday 05 August 2004 04:35, Denis Vlasenko wrote:
>
>Let's rule out PREEMPT first
>
>> 	Sorry I can't be more helpful.  Good luck.
>
>Maybe turn PREEMPT back on?

I found it was on when I checked last night, and turned it off for 
this build.  About 9 hours uptime now.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-05  8:33       ` Denis Vlasenko
@ 2004-08-05 14:19         ` Gene Heskett
       [not found]           ` <200408070203.35268.vda@port.imtp.ilyichevsk.odessa.ua>
  2004-08-05 21:26         ` Chris Shoemaker
  1 sibling, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-05 14:19 UTC (permalink / raw)
  To: linux-kernel

On Thursday 05 August 2004 04:33, Denis Vlasenko wrote:
>> Well, it has, in the past week, ran memtest86-3a for 12 full
>> passes over the whole gig of ram with no errors.  This was the
>> longest test, I gave it a 2 hour, 5 pass test before I ever booted
>> linux the first time on this motherboard over 2 weeks ago now, a
>> new Biostar M7NCD-Pro, with an nforce2(3?) chipset.  I did that
>> because I was coming from an older board whose memory had been
>> overstressed by a failing video card and I wanted to make sure
>> this new memory, nearly $210 worth of it, was good. I gave it
>> another, probably 4 hour test after the first couple of crashes,
>> which it also passed.  And it got
>
>You may use cpuburn to test RAM/CPU too.

Setiathome should be doing a pretty good job of that, the cpu is at 
100% 99.99% of the time.  Only going down for a few seconds as its 
managing script switches the link to a new data packet directory when 
it's done with the current one.  I keep 101 packets cached here. :)

>Although I have a memory which, when clocked a bit too high,
>pass both memtest86 and cpuburn for extended periods of time,
>yet large compile runs die with sig11 sometimes. Using a tiny
>bit less aggressive clocking helped. :)

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-05  8:33     ` Denis Vlasenko
@ 2004-08-05 14:55       ` Gene Heskett
  2004-08-05 16:26       ` Linus Torvalds
  1 sibling, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-05 14:55 UTC (permalink / raw)
  To: linux-kernel

On Thursday 05 August 2004 04:33, Denis Vlasenko wrote:
>Hi Linus,
>
>On Thursday 05 August 2004 10:25, Linus Torvalds wrote:
>> On Wed, 4 Aug 2004, Gene Heskett wrote:
>> > I *thought* I had PREEMPT turned off, but when I did a make
>> > xconfig, it was turned on.  So its now off, and a new 2.6.8-rc3
>> > is building. It was frame pointers I had turned on for the last
>> > build, still on for this one underway now.
>>
>> Your latest bug report definitely had preempt on, you could see
>> the preempt code in the oops output when disassembled.
>>
>> Also, could you please enable CONFIG_DEBUG_BUGVERBOSE by hand if
>> you use the -mm tree, since you definitely hit a BUG() in there
>> somewhere, but in the -mm tree, the BUG()  message is totally
>> unreadable unless you enable BUGVERBOSE (and it's not in the
>> config file).
>
>It is not a BUG().
>
>It's an oops (dereferencing a d_op pointer with value 0x00000900+14
>IIRC, Gene has complete disassembly with location of that event).

Unforch Denis, this is 2.6.8-rc3, the stuff we dissed was from 2.6.7, 
where it can be hit without (usually that is) killing the machine 
instantly.  From 2.6.7-mm1 on, the death seems generally sudden and 
instant, generally no logs get written at all.

>It is not reproducible on request, but happens for him from time
>to time in the same place with the same bogus value of d_op.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-05  8:33     ` Denis Vlasenko
  2004-08-05 14:55       ` Gene Heskett
@ 2004-08-05 16:26       ` Linus Torvalds
  2004-08-05 18:06         ` Ingo Molnar
                           ` (3 more replies)
  1 sibling, 4 replies; 146+ messages in thread
From: Linus Torvalds @ 2004-08-05 16:26 UTC (permalink / raw)
  To: Denis Vlasenko; +Cc: Gene Heskett, Kernel Mailing List, Andrew Morton



On Thu, 5 Aug 2004, Denis Vlasenko wrote:
> 
> It is not a BUG().

Oh yes it is. The 2.6.8-rc2-mm2 report definitely was a BUG().

Earlier ones may not have been, but on the other hand, earlier ones may
not have had the BUG()-check for corrupted list_del() usage - it's not in
the standard kernel, and I don't know when it was added to -mm. (We used
to have it a _long_ time ago, but then we removed it because there were no
reports of problems).

> It's an oops (dereferencing a d_op pointer with value 0x00000900+14
> IIRC, Gene has complete disassembly with location of that event).

.. and that must be because of some kind of pointer corruption, where the 
dentry was either free'd twice or the dentry simply isn't a dentry at all, 
it just got to be used as such because of some bug.

> It is not reproducible on request, but happens for him from time
> to time in the same place with the same bogus value of d_op.

I've followed the discussion. You may not have noticed that the last one 
was different. (And I _think_ it may have been the first time Gene did a 
-mm kernel, so I do believe that the list_del() debugging was the thing 
that caught it).

Anyway, one other thing that makes me worry is the fact that Gene 
apparently has a K7. One of the things AMD has gotten wrong several times 
is prefetching, and it so happens that the dcache code is one of the users 
of the prefetch instruction. prune_dcache() in particular.

So I'm also entertaining the notion that there's an actual prefetch data 
corruption, not just the known AMD bug with occasional spurious page 
faults. Who else has seen the problem? What CPU's are involved?

		Linus

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-05 16:26       ` Linus Torvalds
@ 2004-08-05 18:06         ` Ingo Molnar
  2004-08-05 18:50           ` Linus Torvalds
  2004-08-05 21:10         ` Chris Shoemaker
                           ` (2 subsequent siblings)
  3 siblings, 1 reply; 146+ messages in thread
From: Ingo Molnar @ 2004-08-05 18:06 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Denis Vlasenko, Gene Heskett, Kernel Mailing List, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 1952 bytes --]


* Linus Torvalds <torvalds@osdl.org> wrote:

> Anyway, one other thing that makes me worry is the fact that Gene
> apparently has a K7. One of the things AMD has gotten wrong several
> times is prefetching, and it so happens that the dcache code is one of
> the users of the prefetch instruction. prune_dcache() in particular.

hm, i too happen to have an Athlon64 box (running the x86 kernel) where
i can reproduce dcache pruning crashes after a few hours of testing
using a near-vanilla kernel. The crash is triggered by two infinite
loops of:

        while true; do du /; done
        while true; do
            dd if=/dev/zero of=/tmp/bigfile bs=1000000 count=500
            sync
            sleep 30
        done

using FC2, stock normal ext3, 1GB of RAM, single-disk IDE and nothing
else.

NOTE: i discovered these crashes while working on the voluntary-preempt
stuff, so it's not a pristine kernel.

But i reproduced it using 2.6.8-rc2 plus voluntary-preempt=1 (i.e. no
softirq or hardirq redirection to process context) - so it does nothing
that CONFIG_PREEMPT wouldnt do. (i had CONFIG_PREEMPT on but
kernel_preemption=0.) I've attached 3 oopses.

this patch does introduce a conditional reschedule in prune_icache:

--- linux/fs/inode.c.orig	
+++ linux/fs/inode.c	
@@ -428,6 +429,8 @@ static void prune_icache(int nr_to_scan)
 	for (nr_scanned = 0; nr_scanned < nr_to_scan; nr_scanned++) {
 		struct inode *inode;
 
+		voluntary_resched_lock(&inode_lock);
+
 		if (list_empty(&inode_unused))
 			break;

but it should be perfectly fine to do that there.

NOTE2: i tried hard but couldnt reproduce the problem using the very
same kernel and the same workload on a PIII box. Once i ran it overnight
to check. Only the Athlon64 box does it. It could also be a hardware
problem - albeit the box withstood days of memtest86.

NOTE3: there's no history of instability on this box otherwise, but i
only started doing this test 1-2 weeks ago.

	Ingo

[-- Attachment #2: 11 --]
[-- Type: text/plain, Size: 1425 bytes --]

Unable to handle kernel paging request at virtual address ffffffd8
 printing eip:
c016a3d0
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP 
Modules linked in:
CPU:    0
EIP:    0060:[<c016a3d0>]    Not tainted VLI
EFLAGS: 00010217   (2.6.8-rc2-mm2) 
EIP is at remove_inode_buffers+0x60/0xe0
eax: 00000000   ebx: c03ba9dc   ecx: 00000000   edx: c03ba8d0
esi: c03ba8d0   edi: c0379b2a   ebp: c4115ec4   esp: c4115eac
ds: 007b   es: 007b   ss: 0068
Process kswapd0 (pid: 39, threadinfo=c4114000 task=c40aa070)
Stack: c03ba8d0 c0379b76 00000001 c03ba8d8 c03ba8d0 00000000 c4115ef8 c0186c4c 
       c03ba8d0 00000077 c4114000 00000000 0000004d 00000000 c4115ee4 c4115ee4 
       c4114000 c07fd6a0 00004e09 c4115f04 c0186df5 00000080 c4115f38 c014f4b3 
Call Trace:
 [<c01059ff>] show_stack+0x8f/0xb0
 [<c0105bb3>] show_registers+0x163/0x1d0
 [<c0105dc6>] die+0xe6/0x1c0
 [<c0117773>] do_page_fault+0x213/0x6c0
 [<c0105674>] exception_start+0x6/0xe
 [<c0186c4c>] prune_icache+0x20c/0x390
 [<c0186df5>] shrink_icache_memory+0x25/0x50
 [<c014f4b3>] shrink_slab+0x123/0x1d0
 [<c01511ee>] balance_pgdat+0x24e/0x2a0
 [<c015130c>] kswapd+0xcc/0xe0
 [<c0102899>] kernel_thread_helper+0x5/0xc
Code: 00 e0 ff ff 21 e0 ff 40 14 8d 47 4c 89 45 ec 31 c0 86 47 4c 84 c0 0f 8e 79 00 00 00 8b 86 0c 01 00 00 39 d8 74 23 89 c1 8d 76 00 <8b> 41 d8 a8 02 75 5a 8b 01 8b 51 04 89 02 89 09 89 50 04 8b 03 
 <6>note: kswapd0[39] exited with preempt_count 1

[-- Attachment #3: 12 --]
[-- Type: text/plain, Size: 1500 bytes --]

Unable to handle kernel NULL pointer dereference at virtual address 00000104
 printing eip:
c014c0d1
*pde = 36c9c001
*pte = 00000000
Oops: 0002 [#1]
PREEMPT SMP 
Modules linked in:
CPU:    0
EIP:    0060:[<c014c0d1>]    Not tainted
EFLAGS: 00010016   (2.6.8-rc2) 
EIP is at free_block+0x51/0xe0
eax: 00000100   ebx: e7d580c8   ecx: e7d58100   edx: e7d58100
esi: c40e5040   edi: 00000014   ebp: c413be38   esp: c413be1c
ds: 007b   es: 007b   ss: 0068
Process kswapd0 (pid: 39, threadinfo=c413a000 task=c40a8070)
Stack: c40e5040 eb864100 c40e5068 c40e5078 c4160050 00000282 e61addc0 c413be64 
       c014c1d0 c40e5040 c4160050 0000001b c4160050 c40e50a0 0000001b c4160040 
       00000282 e61addc0 c413be80 c014c792 c40e5040 c4160040 e61ade5c c413bee4 
Call Trace:
 [<c0105a0f>] show_stack+0x8f/0xb0
 [<c0105bc3>] show_registers+0x163/0x1c0
 [<c0105d97>] die+0xb7/0x180
 [<c0116fb3>] do_page_fault+0x213/0x6c9
 [<c0105684>] exception_start+0x6/0xe
 [<c014c1d0>] cache_flusharray+0x70/0x140
 [<c014c792>] kmem_cache_free+0x52/0x60
 [<c01b8094>] ext3_destroy_inode+0x24/0x30
 [<c018713b>] destroy_inode+0x3b/0x60
 [<c0187479>] dispose_list+0x59/0x110
 [<c0187927>] prune_icache+0x127/0x3a0
 [<c0187be8>] shrink_icache_memory+0x48/0x50
 [<c014f4ec>] shrink_slab+0x15c/0x1d0
 [<c0151237>] balance_pgdat+0x217/0x270
 [<c015135c>] kswapd+0xcc/0xe0
 [<c0102859>] kernel_thread_helper+0x5/0xc
Code: 89 50 04 89 02 8b 43 0c 31 d2 c7 03 00 01 10 00 c7 43 04 00 
 <6>note: kswapd0[39] exited with preempt_count 1

[-- Attachment #4: 13 --]
[-- Type: text/plain, Size: 1258 bytes --]

Unable to handle kernel NULL pointer dereference at virtual address 0000000c
 printing eip:
c019a4e1
*pde = 0ddbe001
*pte = 00000000
Oops: 0002 [#1]
PREEMPT 
Modules linked in:
CPU:    0
EIP:    0060:[<c019a4e1>]    Not tainted
EFLAGS: 00010202   (2.6.8-rc2) 
EIP is at prune_icache+0x431/0x600
eax: 00000008   ebx: c0538b3c   ecx: c0538b44   edx: f1b3d17c
esi: 00000029   edi: c03f0790   ebp: f7ee9f04   esp: f7ee9ec8
ds: 007b   es: 007b   ss: 0068
Process kswapd0 (pid: 38, threadinfo=f7ee8000 task=f7eb0670)
Stack: c0538b3c 00000077 f7ee9f10 c01b63a1 c17590a0 c17590c0 c17590e0 f7ee8000 
       00000000 00000029 c0538dc4 c07d6884 00000080 00000000 f7ee8000 f7ee9f10 
       c019a6f8 00000080 f7ee9f44 c015450c 00000080 000000d0 00017a89 9384c800 
Call Trace:
 [<c0105d6f>] show_stack+0x7f/0xa0
 [<c0105f1e>] show_registers+0x15e/0x1c0
 [<c0106127>] die+0xe7/0x240
 [<c0116113>] do_page_fault+0x213/0x6c8
 [<c0105a01>] error_code+0x2d/0x38
 [<c019a6f8>] shrink_icache_memory+0x48/0x50
 [<c015450c>] shrink_slab+0x15c/0x1a0
 [<c015657e>] balance_pgdat+0x1ce/0x210
 [<c015667f>] kswapd+0xbf/0xd0
 [<c0102795>] kernel_thread_helper+0x5/0x10
Code: 89 50 04 c7 03 00 00 00 00 c7 43 04 00 00 00 00 8d 53 08 8b 
 <6>note: kswapd0[38] exited with preempt_count 1

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-05 18:06         ` Ingo Molnar
@ 2004-08-05 18:50           ` Linus Torvalds
  2004-08-05 20:29             ` Andi Kleen
       [not found]             ` <20040806073739.GA6617@elte.hu>
  0 siblings, 2 replies; 146+ messages in thread
From: Linus Torvalds @ 2004-08-05 18:50 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denis Vlasenko, Gene Heskett, Kernel Mailing List, Andrew Morton,
	Andi Kleen



On Thu, 5 Aug 2004, Ingo Molnar wrote:
> 
> * Linus Torvalds <torvalds@osdl.org> wrote:
> 
> > Anyway, one other thing that makes me worry is the fact that Gene
> > apparently has a K7. One of the things AMD has gotten wrong several
> > times is prefetching, and it so happens that the dcache code is one of
> > the users of the prefetch instruction. prune_dcache() in particular.
> 
> hm, i too happen to have an Athlon64 box (running the x86 kernel) where
> i can reproduce dcache pruning crashes after a few hours of testing
> using a near-vanilla kernel.

Very interesting.

The K8 core (aka Opteron or Athlon64) has exactly the same prefetch page
fault bugs that the K7 core has. This, coupled with your observation

> NOTE2: i tried hard but couldnt reproduce the problem using the very
> same kernel and the same workload on a PIII box. Once i ran it overnight
> to check. Only the Athlon64 box does it. It could also be a hardware
> problem - albeit the box withstood days of memtest86.

really makes me wonder..

NOTE! Almost every time we've wondered about a CPU bug, it really wasn't. 
It usually ends up being something really subtle with memory ordering, 
with TLB updates, or something. So I'm putting the prefetch issue up on 
the table as just a wild theory. It would be interesting to see if we can 
get a bigger set of boxes with this crash.

Andi, I think you were the contact for the AMD prefetch bug. Can you ask 
around the same people whether there might be other problems in this area? 
No point in putting a lot of effort into it, but just as one thing to 
check for..

		Linus

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-05 18:50           ` Linus Torvalds
@ 2004-08-05 20:29             ` Andi Kleen
       [not found]             ` <20040806073739.GA6617@elte.hu>
  1 sibling, 0 replies; 146+ messages in thread
From: Andi Kleen @ 2004-08-05 20:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: mingo, vda, gene.heskett, linux-kernel, akpm

On Thu, 5 Aug 2004 11:50:33 -0700 (PDT)
Linus Torvalds <torvalds@osdl.org> wrote:

> 
> 
> On Thu, 5 Aug 2004, Ingo Molnar wrote:
> > 
> > * Linus Torvalds <torvalds@osdl.org> wrote:
> > 
> > > Anyway, one other thing that makes me worry is the fact that Gene
> > > apparently has a K7. One of the things AMD has gotten wrong several
> > > times is prefetching, and it so happens that the dcache code is one of
> > > the users of the prefetch instruction. prune_dcache() in particular.
> > 
> > hm, i too happen to have an Athlon64 box (running the x86 kernel) where
> > i can reproduce dcache pruning crashes after a few hours of testing
> > using a near-vanilla kernel.
> 
> Very interesting.
> 
> The K8 core (aka Opteron or Athlon64) has exactly the same prefetch page
> fault bugs that the K7 core has. This, coupled with your observation

Yep, but they should be handled. Of course in theory it could be a subtle
bug in the prefetch handler. But normally even when that goes wrong you
just get an obvious oops on the prefetch instruction itself.

When you disable the use of prefetch does it still happen? 

diff -u linux-2.6.8rc2-update/include/asm-i386/processor.h-o linux-2.6.8rc2-update/include/asm-i386/processor.h
--- linux-2.6.8rc2-update/include/asm-i386/processor.h-o	2004-07-28 02:23:44.000000000 +0200
+++ linux-2.6.8rc2-update/include/asm-i386/processor.h	2004-08-05 22:25:46.000000000 +0200
@@ -612,6 +612,7 @@
 
 #define ASM_NOP_MAX 8
 
+#if 0
 /* Prefetch instructions for Pentium III and AMD Athlon */
 /* It's not worth to care about 3dnow! prefetches for the K6
    because they are microcoded there and very slow.
@@ -640,6 +641,7 @@
 			  "r" (x));
 }
 #define spin_lock_prefetch(x)	prefetchw(x)
+#endif
 
 extern void select_idle_routine(const struct cpuinfo_x86 *c);
 


> > NOTE2: i tried hard but couldnt reproduce the problem using the very
> > same kernel and the same workload on a PIII box. Once i ran it overnight
> > to check. Only the Athlon64 box does it. It could also be a hardware
> > problem - albeit the box withstood days of memtest86.

Both K8/K7 are usually a lot faster and a lot more aggressive in out of order 
execution than the P3 box. A P4 would be a better comparison.

> Andi, I think you were the contact for the AMD prefetch bug. Can you ask 
> around the same people whether there might be other problems in this area? 
> No point in putting a lot of effort into it, but just as one thing to 
> check for..

A bigger sample size that shows it really only happens on AMD first would be useful.

-Andi

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-05 16:26       ` Linus Torvalds
  2004-08-05 18:06         ` Ingo Molnar
@ 2004-08-05 21:10         ` Chris Shoemaker
  2004-08-06  2:03         ` Gene Heskett
  2004-08-06  2:12         ` Gene Heskett
  3 siblings, 0 replies; 146+ messages in thread
From: Chris Shoemaker @ 2004-08-05 21:10 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Denis Vlasenko, Gene Heskett, Kernel Mailing List, Andrew Morton

On Thu, Aug 05, 2004 at 09:26:10AM -0700, Linus Torvalds wrote:
> 
> Anyway, one other thing that makes me worry is the fact that Gene 
> apparently has a K7. One of the things AMD has gotten wrong several times 
> is prefetching, and it so happens that the dcache code is one of the users 
> of the prefetch instruction. prune_dcache() in particular.
> 
> So I'm also entertaining the notion that there's an actual prefetch data 
> corruption, not just the known AMD bug with occasional spurious page 
> faults. Who else has seen the problem? What CPU's are involved?
> 
> 		Linus

Assuming that what I was seeing was the same problem...

chris@peace:~$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 8
model name      : Celeron (Coppermine)
stepping        : 10
cpu MHz         : 1002.487
cache size      : 128 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de tsc msr pae mce cx8 sep mtrr pge mca cmov
pat pse36 mmx fxsr sse
bogomips        : 1982.46


BTW, a recent oops from wli looked similar, but I don't think he's
spoken up in this thread.

He seems busy tracking down other things.

-chris


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-05  8:33       ` Denis Vlasenko
  2004-08-05 14:19         ` Gene Heskett
@ 2004-08-05 21:26         ` Chris Shoemaker
  1 sibling, 0 replies; 146+ messages in thread
From: Chris Shoemaker @ 2004-08-05 21:26 UTC (permalink / raw)
  To: Denis Vlasenko; +Cc: gene.heskett, linux-kernel

On Thu, Aug 05, 2004 at 11:33:44AM +0300, Denis Vlasenko wrote:
> 
> You may use cpuburn to test RAM/CPU too.
> 
> Although I have a memory which, when clocked a bit too high,
> pass both memtest86 and cpuburn for extended periods of time,
> yet large compile runs die with sig11 sometimes. Using a tiny
> bit less aggressive clocking helped. :)
> -- 
> vda

Oh yes, now I remember that it was you who recommended cpuburn to me back
in April/May or so.  I also was suspicious that neither memtest86 nor
cpuburn were really stressful enough, but the large-compiles-in-a-loop
weren't any better for me.  I would _love_ to just have some confident
test to say "yep, your hardware is bad, go buy a shiny new box"  :)

I've seen memtest86 actually find bad RAM on a machine before, so I know
it works _sometimes_.  Can anyone say the same for cpuburn?  What does
a failure look like, and were there correlated symptoms like kernel oopses?

-chris
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-05 16:26       ` Linus Torvalds
  2004-08-05 18:06         ` Ingo Molnar
  2004-08-05 21:10         ` Chris Shoemaker
@ 2004-08-06  2:03         ` Gene Heskett
  2004-08-06  2:12         ` Gene Heskett
  3 siblings, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-06  2:03 UTC (permalink / raw)
  To: linux-kernel

On Thursday 05 August 2004 12:26, Linus Torvalds wrote:
>On Thu, 5 Aug 2004, Denis Vlasenko wrote:
>> It is not a BUG().
>
>Oh yes it is. The 2.6.8-rc2-mm2 report definitely was a BUG().
>
>Earlier ones may not have been, but on the other hand, earlier ones
> may not have had the BUG()-check for corrupted list_del() usage -
> it's not in the standard kernel, and I don't know when it was added
> to -mm. (We used to have it a _long_ time ago, but then we removed
> it because there were no reports of problems).
>
>> It's an oops (dereferencing a d_op pointer with value
>> 0x00000900+14 IIRC, Gene has complete disassembly with location of
>> that event).
>
>.. and that must be because of some kind of pointer corruption,
> where the dentry was either free'd twice or the dentry simply isn't
> a dentry at all, it just got to be used as such because of some
> bug.
>
>> It is not reproducible on request, but happens for him from time
>> to time in the same place with the same bogus value of d_op.
>
>I've followed the discussion. You may not have noticed that the last
> one was different. (And I _think_ it may have been the first time
> Gene did a -mm kernel, so I do believe that the list_del()
> debugging was the thing that caught it).
>
>Anyway, one other thing that makes me worry is the fact that Gene
>apparently has a K7. One of the things AMD has gotten wrong several
> times is prefetching, and it so happens that the dcache code is one
> of the users of the prefetch instruction. prune_dcache() in
> particular.
>
>So I'm also entertaining the notion that there's an actual prefetch
> data corruption, not just the known AMD bug with occasional
> spurious page faults. Who else has seen the problem? What CPU's are
> involved?
>
>		Linus

Two things that may be of interest: 1, I do run the -mm kernels too as 
I figure the more they get exercised, the quicker some fault will be 
found.  This included the whole chain of 2.6.7's & 2.6.8-rc1/2-mm1/2.  
In this case all hell broke loose here while everyone was at the 
conventions.  Not your fault, but I was "grabbing a life jacket" 
here.

2. with the patch Linus sent, top is now showing 383 megs of free ram.  
I suspect that without that patch, it might well be less than 15 megs 
after a many (10+) hour uptime.  So this patch is definitely more 
aggressive in its memory housekeeping than without it.

And everything is on except the acpi stuff, which has never interested 
me here.  I do use apm, but only for shutdowns & rtc in utc.
PREEMPT, 4k stacks, page tables in high mem, frame pointers etc, its 
all on right now. So far, I haven't even heard the first shoe 
drop. :)

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-05 16:26       ` Linus Torvalds
                           ` (2 preceding siblings ...)
  2004-08-06  2:03         ` Gene Heskett
@ 2004-08-06  2:12         ` Gene Heskett
  3 siblings, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-06  2:12 UTC (permalink / raw)
  To: linux-kernel

On Thursday 05 August 2004 12:26, Linus Torvalds wrote:
[...]
>Anyway, one other thing that makes me worry is the fact that Gene
>apparently has a K7. One of the things AMD has gotten wrong several
> times is prefetching, and it so happens that the dcache code is one
> of the users of the prefetch instruction. prune_dcache() in
> particular.
>
>So I'm also entertaining the notion that there's an actual prefetch
> data corruption, not just the known AMD bug with occasional
> spurious page faults. Who else has seen the problem? What CPU's are
> involved?
>
>		Linus

If we run it down to that, can I bounce it back at AMD as defective?  
Or can it be coded around?  If it's bugging my Athlon 2800XP, then I'd 
have to assume (dmesg says its stepping 00 FWIW) that I'm far from 
alone.  AMD is peddling these just like Orville R. sells popcorn.  
And, from previous experience with this particular vendor, if I give 
convincing proof the chip really is from a defective run, I'd suspect 
a replacement would be in a fedex bag & headed my way before the day 
is out.  But I'd need proof of the problem.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-05  7:25   ` Linus Torvalds
  2004-08-05  7:31     ` Andrew Morton
  2004-08-05  8:33     ` Denis Vlasenko
@ 2004-08-06  2:50     ` Linus Torvalds
  2004-08-06  3:18       ` viro
  2 siblings, 1 reply; 146+ messages in thread
From: Linus Torvalds @ 2004-08-06  2:50 UTC (permalink / raw)
  To: Gene Heskett; +Cc: Kernel Mailing List, Andrew Morton



On Thu, 5 Aug 2004, Linus Torvalds wrote:
> 
> I _suspect_ you hit the new "list_del-debug.patch" in Andrew's tree, 
> because in my tree there are no BUG_ON's in prune_dcache() at all. 

Hmm.. I'm starting to have a wild suspicion here.

Let's look at this: the d_lru list is used for dentries that have had 
their count go down to zero in dput() (which is most of them, actually), 
and we do _not_ move them away from the unused list when we increment 
their count again, because we don't want to take the dcache lock in the 
critical lookup region.

So what we do to determine whether they are on the list or not is not to
look at the count, but instead we mark dentries that are _not_ on the LRU list
by making their d_lru list be empty. This is why the dcache code uses the
"careful"  delete function "list_del_init()" a lot - because when we
remove the dentry from the unused list, we really need to _mark_ it
removed. Then the removal code does

	if (!list_empty(&dentry->d_lru)) ..

Fine. HOWEVER. Sometimes we use the plain "list_del()", because we know 
that we're going to throw the dentry away. And in shrink_dcache_anon() we 
do it because we expect to add it back to the dentry list.

BUT WE DON'T ALWAYS DO THAT!

So as far as I can tell, shrink_dcache_anon() will have _removed_ a dentry 
from the unused_list, but still left the dentry with wild pointers 
pointing to other dentries. Next time around we do a dput() on such a 
dentry, we'll be screwed, because we'll try to remove it again. Boom.

Does anybody see why this isn't a serious dentry list corruption case? Or 
am I just crazy?

But if I'm right, this particular bug should only hit you if you export a
filesystem through knfsd (I don't see how you'd get an anonymous dentry 
any other way). Oh. XFS with some of the magic ioctls will do it too. But 
I don't think Gene had either of those enabled, so..

But this may explain _some_ of the dcache problems, and maybe we have 
more than one bug here. Comments? Am I getting senile?
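
For anyone who wants to poke at the marking convention described above, here
is a minimal userspace sketch. It is not the kernel's list.h: the helpers are
simplified re-implementations, and still_looks_on_lru() is a hypothetical name
made up for illustration. It assumes a plain list_del() that leaves the removed
entry's own pointers untouched (the "wild pointers" case; some kernels may
poison them instead, which still does not make the entry look empty).

```c
#include <stdbool.h>

/* Simplified stand-in for the kernel's struct list_head (assumption). */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* An entry that points at itself is "not on any list". */
static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Plain delete: fixes up the neighbours, leaves entry's pointers stale. */
static void list_del(struct list_head *entry)
{
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
}

/* Careful delete: unlink, then mark the entry as being on no list. */
static void list_del_init(struct list_head *entry)
{
	list_del(entry);
	INIT_LIST_HEAD(entry);
}

/*
 * Hypothetical helper: put one d_lru-like entry on an "unused" list,
 * remove it, and report whether the !list_empty(&d_lru) test would
 * still treat it as being on the list.
 */
bool still_looks_on_lru(bool use_del_init)
{
	struct list_head unused, d_lru;

	INIT_LIST_HEAD(&unused);
	INIT_LIST_HEAD(&d_lru);
	list_add(&d_lru, &unused);

	if (use_del_init)
		list_del_init(&d_lru);
	else
		list_del(&d_lru);

	return !list_empty(&d_lru);
}
```

still_looks_on_lru(false) models the plain list_del() case, where the entry is
off the list but the emptiness check still says otherwise;
still_looks_on_lru(true) models the list_del_init() variant.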

		Linus

----
===== fs/dcache.c 1.88 vs edited =====
--- 1.88/fs/dcache.c	2004-06-24 01:55:55 -07:00
+++ edited/fs/dcache.c	2004-08-05 19:35:03 -07:00
@@ -628,7 +628,7 @@
 			struct dentry *this = hlist_entry(lp, struct dentry, d_hash);
 			if (!list_empty(&this->d_lru)) {
 				dentry_stat.nr_unused--;
-				list_del(&this->d_lru);
+				list_del_init(&this->d_lru);
 			}
 
 			/* 

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-06  2:50     ` Linus Torvalds
@ 2004-08-06  3:18       ` viro
  2004-08-06  3:24         ` Linus Torvalds
  0 siblings, 1 reply; 146+ messages in thread
From: viro @ 2004-08-06  3:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Gene Heskett, Kernel Mailing List, Andrew Morton

On Thu, Aug 05, 2004 at 07:50:28PM -0700, Linus Torvalds wrote:
 
> So as far as I can tell, shrink_dcache_anon() will have _removed_ a dentry 
> from the unused_list, but still left the dentry with wild pointers 
> pointing to other dentries. Next time around we do a dput() on such a 
> dentry, we'll be screwed, because we'll try to remove it again. Boom.

It doesn't even take a dput().  Look: we do list_del(), then notice that
sucker still has positive refcount and leave it alone.  Now think what
happens on the next pass.  That's right, we hit that dentry *again*.
And see that list_empty() is false.  And do list_del() one more time.

However, what used to be e.g. next dentry might very well be freed by
now.  *BOOM*.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-06  3:18       ` viro
@ 2004-08-06  3:24         ` Linus Torvalds
  2004-08-08  4:42           ` Gene Heskett
  2004-08-08 14:30           ` Gene Heskett
  0 siblings, 2 replies; 146+ messages in thread
From: Linus Torvalds @ 2004-08-06  3:24 UTC (permalink / raw)
  To: viro; +Cc: Gene Heskett, Kernel Mailing List, Andrew Morton



On Fri, 6 Aug 2004 viro@parcelfarce.linux.theplanet.co.uk wrote:
> 
> It doesn't even take a dput().  Look: we do list_del(), then notice that
> sucker still has positive refcount and leave it alone.  Now think what
> happens on the next pass.  That's right, we hit that dentry *again*. And
> see that list_empty() is false.  And do list_del() one more time.

Well, the sad part is that doing another list_del() won't even necessarily 
go *boom*. Most of the time it might even leave the list as-is, but often 
enough it should give list corruption.

> However, what used to be e.g. next dentry might very well be freed by
> now.  *BOOM*.

Absolutely. It does look like a rather nasty bug.

It doesn't explain what Gene sees, though, unless you can explain how we'd 
get an anon dentry without knfsd/xfs. Oh well.

I'll commit the obvious one-liner fix, since it might explain _some_ 
problems people have seen.

		Linus

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
       [not found]               ` <20040806004231.143c8bd2.akpm@osdl.org>
@ 2004-08-06  8:27                 ` Ingo Molnar
  2004-08-06 11:51                 ` Gene Heskett
  1 sibling, 0 replies; 146+ messages in thread
From: Ingo Molnar @ 2004-08-06  8:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: torvalds, vda, gene.heskett, linux-kernel, ak


* Andrew Morton <akpm@osdl.org> wrote:

> Ingo Molnar <mingo@elte.hu> wrote:
> >
> > [btw., it would be nice to dump
> >  instructions prior to the crash point so that we could know precisely what
> >  prefetch instruction the kernel included.]
> 
> I've had a patch (from Keith) to do that in -mm for over a year, and
> ksymoops has supported it for that long.  But I think Linus has some
> problem-which-I-never-understood with the whole idea.

There were some more naive patches around previously i believe and those
problems are solved in this patch: the dump splits the pre-crash and
post-crash instruction stream decoding, so crash-EIP decoding is never
unreliable.

>  25-akpm/arch/i386/kernel/traps.c |   18 ++++++++++--------
>  1 files changed, 10 insertions(+), 8 deletions(-)

a strong ack from me.

Signed-off-by: Ingo Molnar <mingo@elte.hu>

	Ingo

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
       [not found]             ` <20040806073739.GA6617@elte.hu>
       [not found]               ` <20040806004231.143c8bd2.akpm@osdl.org>
@ 2004-08-06 11:31               ` Andi Kleen
  2004-08-06 17:16               ` Linus Torvalds
  2 siblings, 0 replies; 146+ messages in thread
From: Andi Kleen @ 2004-08-06 11:31 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: torvalds, vda, gene.heskett, linux-kernel, akpm

On Fri, 6 Aug 2004 09:37:39 +0200
Ingo Molnar <mingo@elte.hu> wrote:


> 
> ebx is 00000008, it came in from (%esi), which is (0xc20a7b30) - that
> looks like a valid pointer.
> 
> to me this crash seems to imply prefetch.


Can you add the following patch and see if it triggers at all? 

Maybe it is just the software prefetch fault handler that is somehow buggy.
There was a change there recently to handle NX, maybe that broke something.

Also testing with prefetch disabled (see my earlier patch) may also be useful
just to see if it triggers then too.

-Andi

diff -u linux-2.6.8rc2-update/arch/i386/mm/fault.c-o linux-2.6.8rc2-update/arch/i386/mm/fault.c
--- linux-2.6.8rc2-update/arch/i386/mm/fault.c-o	2004-07-28 02:23:24.000000000 +0200
+++ linux-2.6.8rc2-update/arch/i386/mm/fault.c	2004-08-05 22:20:02.000000000 +0200
@@ -21,6 +21,7 @@
 #include <linux/vt_kern.h>		/* For unblank_screen() */
 #include <linux/highmem.h>
 #include <linux/module.h>
+#include <linux/kallsyms.h>
 
 #include <asm/system.h>
 #include <asm/uaccess.h>
@@ -185,6 +186,12 @@
 			break;
 		} 
 	}
+
+	if (prefetch) {		
+		printk("corrected prefetch fault at %lx ", addr);
+		print_symbol("eip %s\n", regs->eip);
+	} 
+
 	return prefetch;
 }
 
@@ -193,6 +200,9 @@
 {
 	if (unlikely(boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
 		     boot_cpu_data.x86 >= 6)) {
+		printk("possible prefetch fault at %lx ", addr);
+		print_symbol("eip %s\n", regs->eip);
+
 		/* Catch an obscure case of prefetch inside an NX page. */
 		if (nx_enabled && (error_code & 16))
 			return 0;





^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
       [not found]               ` <20040806004231.143c8bd2.akpm@osdl.org>
  2004-08-06  8:27                 ` Ingo Molnar
@ 2004-08-06 11:51                 ` Gene Heskett
  2004-08-06 16:58                   ` Linus Torvalds
  1 sibling, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-06 11:51 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, Ingo Molnar, torvalds, vda, ak

On Friday 06 August 2004 03:42, Andrew Morton wrote:
>Ingo Molnar <mingo@elte.hu> wrote:
>> [btw., it would be nice to dump
>>  instructions prior to the crash point so that we could know
>> precisely what prefetch instruction the kernel included.]
>
>I've had a patch (from Keith) to do that in -mm for over a year, and
>ksymoops has supported it for that long.  But I think Linus has some
>problem-which-I-never-understood with the whole idea.
>
>
>
>
>This teaches the i386 oops dumper to dump opcodes preceding and
> after the offending EIP.  Supporting code against ksymoops has been
> tested and produces output like the below.
>
>Support for this was added to ksymoops-2.4.9.
>
>Note that ksymoops will guarantee that the disassembly after the
> <eip> value is always in sync - if the disassembly from the start
> of the Code: line does not sync up with the EIP address ksymoops
> will perform the resync.
>
>
>Warning (merge_maps): no symbols in merged map
>Mar 18 23:47:36 vmm kernel: kernel BUG at fs/open.c:802!
>Mar 18 23:47:36 vmm kernel: invalid operand: 0000 [#1]
>Mar 18 23:47:36 vmm kernel: CPU:    0
>Mar 18 23:47:36 vmm kernel: EIP:    0060:[<c014fedf>] VLI    Not tainted
>Using defaults from ksymoops -t elf32-i386 -a i386
>Mar 18 23:47:36 vmm kernel: EFLAGS: 00010246
>Mar 18 23:47:36 vmm kernel: eax: ccdfb900   ebx: 4001020d   ecx: 00000000   edx: 0000007b
>Mar 18 23:47:36 vmm kernel: esi: 00000000   edi: bfffdd70   ebp: ccdfdfbc   esp: ccdfdfb0
>Mar 18 23:47:36 vmm kernel: ds: 007b   es: 007b   ss: 0068
>Mar 18 23:47:36 vmm kernel: Stack: 4001020d 00000000 bfffdd70 ccdfc000 c0109213 4001020d 00000000 00000003
>Mar 18 23:47:36 vmm kernel:        00000000 bfffdd70 bfffdc88 00000005 0000007b 0000007b 00000005 4000ef94
>Mar 18 23:47:36 vmm kernel:        00000073 00000206 bfffdbd8 0000007b
>Mar 18 23:47:36 vmm kernel: Call Trace:
>Mar 18 23:47:36 vmm kernel:  [<c0109213>] syscall_call+0x7/0xb
>Mar 18 23:47:36 vmm kernel: Code: 14 98 f0 81 41 04 00 00 00 01 5b 89 ec 5d c3 90 b8 00 e0 ff ff 21 e0 55 89 e5 57 56 53 8b 00 81 b8 e4 01 00 00 0f 27 00 00 75 08 <0f> 0b 22 03 85 18 2f c0 8b 45 08 50 e8 30 d4 00 00 89 c7 83 c4
>
>>>EIP; c014fedf No symbols available   <=====
>
>Trace; c0109213 No symbols available
>
>This architecture has variable length instructions, decoding before
> eip is unreliable, take these instructions with a pinch of salt.
>
>Code;  c014feb4 No symbols available
>00000000 <_EIP>:
>Code;  c014feb4 No symbols available
>   0:   14 98                     adc    $0x98,%al
>Code;  c014feb6 No symbols available
>   2:   f0 81 41 04 00 00 00      lock addl $0x1000000,0x4(%ecx)
>Code;  c014febd No symbols available
>   9:   01
>Code;  c014febe No symbols available
>   a:   5b                        pop    %ebx
>Code;  c014febf No symbols available
>   b:   89 ec                     mov    %ebp,%esp
>Code;  c014fec1 No symbols available
>   d:   5d                        pop    %ebp
>Code;  c014fec2 No symbols available
>   e:   c3                        ret
>Code;  c014fec3 No symbols available
>   f:   90                        nop
>Code;  c014fec4 No symbols available
>  10:   b8 00 e0 ff ff            mov    $0xffffe000,%eax
>Code;  c014fec9 No symbols available
>  15:   21 e0                     and    %esp,%eax
>Code;  c014fecb No symbols available
>  17:   55                        push   %ebp
>Code;  c014fecc No symbols available
>  18:   89 e5                     mov    %esp,%ebp
>Code;  c014fece No symbols available
>  1a:   57                        push   %edi
>Code;  c014fecf No symbols available
>  1b:   56                        push   %esi
>Code;  c014fed0 No symbols available
>  1c:   53                        push   %ebx
>Code;  c014fed1 No symbols available
>  1d:   8b 00                     mov    (%eax),%eax
>Code;  c014fed3 No symbols available
>  1f:   81 b8 e4 01 00 00 0f      cmpl   $0x270f,0x1e4(%eax)
>Code;  c014feda No symbols available
>  26:   27 00 00
>Code;  c014fedd No symbols available
>  29:   75 08                     jne    33 <_EIP+0x33> c014fee7 No symbols available
>
>This decode from eip onwards should be reliable
>
>Code;  c014fedf No symbols available
>00000000 <_EIP>:
>Code;  c014fedf No symbols available   <=====
>   0:   0f 0b                     ud2a      <=====
>Code;  c014fee1 No symbols available
>   2:   22 03                     and    (%ebx),%al
>Code;  c014fee3 No symbols available
>   4:   85 18                     test   %ebx,(%eax)
>Code;  c014fee5 No symbols available
>   6:   2f                        das
>Code;  c014fee6 No symbols available
>   7:   c0 8b 45 08 50 e8 30      rorb   $0x30,0xe8500845(%ebx)
>Code;  c014feed No symbols available
>   e:   d4 00                     aam    $0x0
>Code;  c014feef No symbols available
>  10:   00                        .byte 0x0
>Code;  c014fef0 No symbols available
>  11:   89 c7                     mov    %eax,%edi
>Code;  c014fef2 No symbols available
>  13:   83                        .byte 0x83
>Code;  c014fef3 No symbols available
>  14:   c4                        .byte 0xc4
>
>
>
>Signed-off-by: Andrew Morton <akpm@osdl.org>
>---
>
> 25-akpm/arch/i386/kernel/traps.c |   18 ++++++++++--------
> 1 files changed, 10 insertions(+), 8 deletions(-)
>
>diff -puN arch/i386/kernel/traps.c~oops-dump-preceding-code arch/i386/kernel/traps.c
>--- 25/arch/i386/kernel/traps.c~oops-dump-preceding-code	2004-06-28 00:47:26.807038944 -0700
>+++ 25-akpm/arch/i386/kernel/traps.c	2004-06-28 00:47:26.812038184 -0700
>@@ -250,7 +250,7 @@ void show_registers(struct pt_regs *regs
> 		ss = regs->xss & 0xffff;
> 	}
> 	print_modules();
>-	printk("CPU:    %d\nEIP:    %04x:[<%08lx>]    %s\nEFLAGS: %08lx"
>+	printk("CPU:    %d\nEIP:    %04x:[<%08lx>]    %s VLI\nEFLAGS: %08lx"
> 			"   (%s) \n",
> 		smp_processor_id(), 0xffff & regs->xcs, regs->eip,
> 		print_tainted(), regs->eflags, UTS_RELEASE);
>@@ -268,23 +268,25 @@ void show_registers(struct pt_regs *regs
> 	 * time of the fault..
> 	 */
> 	if (in_kernel) {
>+		u8 *eip;
>
> 		printk("\nStack: ");
> 		show_stack(NULL, (unsigned long*)esp);
>
> 		printk("Code: ");
>-		if(regs->eip < PAGE_OFFSET)
>-			goto bad;
>
>-		for(i=0;i<20;i++)
>-		{
>+		eip = (u8 *)regs->eip - 43;
>+		for (i = 0; i < 64; i++, eip++) {
> 			unsigned char c;
>-			if(__get_user(c, &((unsigned char*)regs->eip)[i])) {
>-bad:
>+
>+			if (eip < (u8 *)PAGE_OFFSET || __get_user(c, eip)) {
> 				printk(" Bad EIP value.");
> 				break;
> 			}
>-			printk("%02x ", c);
>+			if (eip == (u8 *)regs->eip)
>+				printk("<%02x> ", c);
>+			else
>+				printk("%02x ", c);
> 		}
> 	}
> 	printk("\n");
>_
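
The byte-dump loop in the quoted patch (start 43 bytes before EIP, print 64 bytes, bracket the byte at EIP) can be sketched in userspace roughly as follows. This is only an illustration: format_code() and its parameters are hypothetical names, it formats into a caller-supplied buffer instead of calling printk(), and it does none of the PAGE_OFFSET or __get_user() checking that the kernel code needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format len bytes from buf as hex, wrapping the byte at fault_off in
 * <> the way the patched oops dumper marks the byte at EIP.
 * Returns the number of characters written (excluding the NUL), or -1
 * if out is too small.
 */
static int format_code(const unsigned char *buf, size_t len, size_t fault_off,
		       char *out, size_t outsz)
{
	size_t used = 0;

	assert(fault_off < len);

	for (size_t i = 0; i < len; i++) {
		const char *fmt = (i == fault_off) ? "<%02x> " : "%02x ";
		int n = snprintf(out + used, outsz - used, fmt, buf[i]);

		/* snprintf reports the length it wanted; treat truncation as failure. */
		if (n < 0 || (size_t)n >= outsz - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```

Called on the four bytes 75 08 0f 0b with fault_off set to 2, it produces "75 08 <0f> 0b ", matching the shape of the Code: line in the sample output above. In the patch itself the marker would land at offset 43 of the 64 bytes.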

Veddy veddy Interestink.

Linus, Andrew, should I apply this patch too at the next remake?

FWIW, I'm still up (20:38) this morning, and showing plenty (127+ 
megs) of free memory.  No crash, no odd log (other than samba 
squawking about some option that's been changed & I haven't fixed the 
smb.conf) so far.

I'm beginning to like this test patch, Linus, thanks :)

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-06 11:51                 ` Gene Heskett
@ 2004-08-06 16:58                   ` Linus Torvalds
  2004-08-06 17:16                     ` Gene Heskett
  2004-08-06 23:09                     ` Chris Shoemaker
  0 siblings, 2 replies; 146+ messages in thread
From: Linus Torvalds @ 2004-08-06 16:58 UTC (permalink / raw)
  To: Gene Heskett
  Cc: Kernel Mailing List, Andrew Morton, Ingo Molnar, vda, ak,
	Chris Shoemaker, William Lee Irwin III



On Fri, 6 Aug 2004, Gene Heskett wrote:
> 
> Linus, Andrew, should I apply this patch too at the next remake?

Might be worth it, but it's more important to see any oops at all, or lack 
of oopses..

> FWIW, I'm still up (20:38) this morning, and showing plenty (127+ 
> megs) of free memory.  No crash, no odd log (other than samba 
> squawking about some option that's been changed & I haven't fixed the 
> smb.conf) so far.
> 
> I'm beginning to like this test patch, Linus, thanks :)

If the only thing you have done is add the list_del_init() debugging 
patch, then the only thing that has changed is really the access patterns 
to uncached memory.

The original list_del_init() tries to only do a few single _writes_ to the 
dentries around it. The added debugging will do _reads_ (and thus bring it 
into the cache) of the dentry pointers of the dentries around it.
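
The debugging patch itself is not shown in this part of the thread, so the following is only a guess at its general shape: a checked unlink that reads the neighbours' pointers before writing to them. All names here (struct list_node, unlink_plain, unlink_checked, demo_checked_unlink) are hypothetical, and the real patch may well report the inconsistency differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified userspace stand-in for a doubly linked list node. */
struct list_node {
	struct list_node *next, *prev;
};

/* Plain unlink: two stores into the neighbours, no loads from them. */
static void unlink_plain(struct list_node *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/*
 * Checked unlink: first *read* the neighbours' pointers and verify
 * that they still point back at e, then do the same two stores.
 * Returns false (and touches nothing) if the list looks inconsistent.
 */
static bool unlink_checked(struct list_node *e)
{
	if (e->prev->next != e || e->next->prev != e)
		return false;
	unlink_plain(e);
	return true;
}

/* Hypothetical demo: the second unlink of the same node is rejected. */
static bool demo_checked_unlink(void)
{
	struct list_node head, a;
	bool first, second;

	head.next = &a;
	head.prev = &a;
	a.next = &head;
	a.prev = &head;

	first = unlink_checked(&a);	/* consistent, so it unlinks */
	second = unlink_checked(&a);	/* a's stale pointers no longer match */

	assert(head.next == &head && head.prev == &head);
	return first && !second;
}
```

The extra loads are the point: they pull the neighbouring entries into the cache before the stores happen, which is the change in access pattern described above.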

If that change makes a real difference, I really only see two 
possibilities:
 - there really is a prefetch bug (or possibly, there's a bug in our 
   prefetch fixup code, and the known prefetch bug just triggers the 
   problem indirectly)
 - it just changes the timing enough that whatever bug you hit went away.

Now, Chris Shoemaker reported dentry problems on an Intel CPU and said that 
wli had seen something too, but I'm wondering whether Chris and wli might 
have been seeing the knfsd/xfs-related dentry bug that I found yesterday.  
So I think the prefetch theory is still alive, but we should check with 
Chris. Chris?

		Linus

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-06 16:58                   ` Linus Torvalds
@ 2004-08-06 17:16                     ` Gene Heskett
  2004-08-06 17:26                       ` William Lee Irwin III
  2004-08-06 23:09                     ` Chris Shoemaker
  1 sibling, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-06 17:16 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Andrew Morton, Ingo Molnar, vda, ak,
	Chris Shoemaker, William Lee Irwin III

On Friday 06 August 2004 12:58, Linus Torvalds wrote:
>On Fri, 6 Aug 2004, Gene Heskett wrote:
>> Linus, Andrew, should I apply this patch too at the next remake?
>
>Might be worth it, but it's more important to see any oops at all,
> or lack of oopses..
>
>> FWIW, I'm still up (20:38) this morning, and showing plenty (127+
>> megs) of free memory.  No crash, no odd log (other than samba
>> squawking about some option that's been changed & I haven't fixed
>> the smb.conf) so far.
>>
>> I'm beginning to like this test patch, Linus, thanks :)
>
>If the only thing you have done is add the list_del_init() debugging
>patch, then the only thing that has changed is really the access
> patterns to uncached memory.
>
>The original list_del_init() tries to only do a few single _writes_
> to the dentries around it. The added debugging will do _reads_ (and
> thus bring it into the cache) of the dentry pointers of the
> dentries around it.
>
>If that change makes a real difference, I really only see two
>possibilities:
> - there really is a prefetch bug (or possibly, there's a bug in our
>   prefetch fixup code, and the known prefetch bug just triggers the
>   problem indirectly)
> - it just changes the timing enough that whatever bug you hit went
> away.
>
>Now, Chris Shoemaker reported dentry problems on an Intel CPU and
> said that wli had seen something too, but I'm wondering whether
> Chris and wli might have been seeing the knfsd/xfs-related dentry
> bug that I found yesterday. So I think the prefetch theory is still
> alive, but we should check with Chris. Chris?
>
>		Linus

I'm still up, a bit over 24 hours now. :)  Free memory is slowly going 
away, I ran mozilla for a while which got rid of about 60 megs, and 
now I see I'm down to 23 free, whereas at the 11 hour up marker I had 
nearly 130 megs free yet.  I've got to go to town, so that will leave 
seti and kmail doing their thing till I get back.  If it goes down, 
hopefully it will record something, unlike the last couple of times.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
       [not found]             ` <20040806073739.GA6617@elte.hu>
       [not found]               ` <20040806004231.143c8bd2.akpm@osdl.org>
  2004-08-06 11:31               ` Andi Kleen
@ 2004-08-06 17:16               ` Linus Torvalds
  2 siblings, 0 replies; 146+ messages in thread
From: Linus Torvalds @ 2004-08-06 17:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Denis Vlasenko, Gene Heskett, Kernel Mailing List, Andrew Morton,
	Andi Kleen



On Fri, 6 Aug 2004, Ingo Molnar wrote:
> 
> last night i ran another overnight test: 2.6.8-rc3-vanilla with
> CONFIG_PREEMPT enabled and no other changes. I've also reduced the CPU's
> clock speed by 5% to reduce the chance of hw problems. The crash below
> triggered after roughly 12 hours of runtime. I've also attached the full
> disassembly of __d_lookup(). The crash happens in hlist_for_each():
> 
>  c01632f3:	8d b6 00 00 00 00    	lea    0x0(%esi),%esi
>  c01632f9:	8d bc 27 00 00 00 00 	lea    0x0(%edi,1),%edi
>  c0163300:	8b 03                	mov    (%ebx),%eax <==== [*]
> 
> the crashing instruction is preceded by two prefetch instructions (the
> disassembly has the alternate-insn NOP).

That's not right.

The prefetchnta instruction is three or four bytes long (four if it uses 
the ebp register that needs the "0(ebp)" modrm format).

We use a NOP4 for space in there, and the things you point to are a 
NOP6+NOP7 pair.

Your two nop's are the ones gcc has inserted in order to start the loop at 
a 16-byte boundary (ie c0163300 is the top of the loop). The nop that gets 
replaced by a prefetch is the instruction _after_ the one that faulted for 
you:

	8b 03                   mov    (%ebx),%eax
	8d 74 26 00             lea    0x0(%esi,1),%esi

I think.
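
The alignment argument can be checked with a little arithmetic. The sketch below assumes the addresses and instruction lengths from the quoted disassembly (a 6-byte lea at c01632f3, a 7-byte lea at c01632f9, loop top at c0163300); pad_to_align() is a hypothetical helper, not anything gcc or the kernel provides.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of padding bytes needed to advance addr to the next multiple
 * of align. align must be a power of two.
 */
static uint32_t pad_to_align(uint32_t addr, uint32_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (align - (addr & (align - 1))) & (align - 1);
}
```

For addr = 0xc01632f3 and align = 16 this gives 13, which is exactly 6 + 7: the two lea instructions fill the gap up to the 16-byte boundary at c0163300 and leave no room for them to be prefetch placeholders as well.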

> to me this crash seems to imply prefetch.

I don't think it's obvious yet. It's close to the prefetch, but it's the 
instruction just before. Which in an OoO CPU doesn't necessarily mean 
much, of course - or it could be that the prefetch caused some trouble 
last time around the loop and we only see it now.

Or it could be totally prefetch-unrelated. I do find the prefetch thing 
intriguing, though.

		Linus

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-06 17:16                     ` Gene Heskett
@ 2004-08-06 17:26                       ` William Lee Irwin III
  2004-08-06 23:19                         ` Chris Shoemaker
  0 siblings, 1 reply; 146+ messages in thread
From: William Lee Irwin III @ 2004-08-06 17:26 UTC (permalink / raw)
  To: Gene Heskett
  Cc: linux-kernel, Linus Torvalds, Andrew Morton, Ingo Molnar, vda,
	ak, Chris Shoemaker

On Friday 06 August 2004 12:58, Linus Torvalds wrote:
>> Now, Chris Shoemaker reported dentry problems on an Intel CPU and
>> said that wli had seen something too, but I'm wondering whether
>> Chris and wli might have been seeing the knfsd/xfs-related dentry
>> bug that I found yesterday. So I think the prefetch theory is still
>> alive, but we should check with Chris. Chris?

On Fri, Aug 06, 2004 at 01:16:24PM -0400, Gene Heskett wrote:
> I'm still up, a bit over 24 hours now. :)  Free memory is slowly going 
> away, I ran mozilla for a while which got rid of about 60 megs, and 
> now I see I'm down to 23 free, whereas at the 11 hour up marker I had 
> nearly 130 megs free yet.  I've got to go to town, so that will leave 
> seti and kmail doing their thing till I get back.  If it goes down, 
> hopefully it will record something, unlike the last couple of times.

I've not had issues around the dcache for quite some time, I think not
since the 2.5.65 timeframe. IIRC maneesh and dipankar had some fixes
that resolved all my issues not long afterward. So unfortunately I have
nothing strictly dcache-related to report. Chris may have been
referring to some potentially pathological NFS behavior I've seen for a
long time centered around extended periods of knfsd unresponsiveness.

-- wli

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-06 16:58                   ` Linus Torvalds
  2004-08-06 17:16                     ` Gene Heskett
@ 2004-08-06 23:09                     ` Chris Shoemaker
  2004-08-07  6:20                       ` Linus Torvalds
  1 sibling, 1 reply; 146+ messages in thread
From: Chris Shoemaker @ 2004-08-06 23:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Gene Heskett, Kernel Mailing List, Andrew Morton, Ingo Molnar,
	vda, ak, William Lee Irwin III

[-- Attachment #1: Type: text/plain, Size: 929 bytes --]

On Fri, Aug 06, 2004 at 09:58:35AM -0700, Linus Torvalds wrote:
> 
> Now, Chris Shoemaker reported dentry problems on an Intel CPU and said that 
> wli had seen something too, but I'm wondering whether Chris and wli might 
> have been seeing the knfsd/xfs-related dentry bug that I found yesterday.  
> So I think the prefetch theory is still alive, but we should check with 
> Chris. Chris?
> 
> 		Linus

My oopses were not related to nfs or xfs.  I don't use either of these
on this box.

In the interest of contributing more than conspiracy theories, I'm
trying to dig up some records of the dcache problems I was having.
Unfortunately, a period of low free disk space led to some aggressive
"cleaning" on my part since then.  :(

I _was_ able to find the attached oops, but I don't think I have the
corresponding object files, so I hope the decoding it contains is
good enough. 

Just ask if you need some more info.

-chris


[-- Attachment #2: Mar17.4.txt --]
[-- Type: text/plain, Size: 2190 bytes --]

Mar 17 16:42:01 peace kernel: Unable to handle kernel paging request at virtual address 0b7eec1c
Mar 17 16:42:01 peace kernel:  printing eip:
Mar 17 16:42:01 peace kernel: c01a6667
Mar 17 16:42:01 peace kernel: *pde = 00000000
Mar 17 16:42:01 peace kernel: Oops: 0000 [#1]
Mar 17 16:42:01 peace kernel: PREEMPT DEBUG_PAGEALLOC
Mar 17 16:42:01 peace kernel: CPU:    0
Mar 17 16:42:01 peace kernel: EIP:    0060:[iput+23/112]    Not tainted
Mar 17 16:42:01 peace kernel: EFLAGS: 00010202
Mar 17 16:42:01 peace kernel: EIP is at iput+0x17/0x70
Mar 17 16:42:01 peace kernel: eax: 0b7eebf8   ebx: c33fee3c   ecx: c33fee4c   edx: c33fee4c
Mar 17 16:42:01 peace kernel: esi: c33f2ef8   edi: cba32000   ebp: cba33e54   esp: cba33e50
Mar 17 16:42:01 peace kernel: ds: 007b   es: 007b   ss: 0068
Mar 17 16:42:01 peace kernel: Process kswapd0 (pid: 7, threadinfo=cba32000 task=cba559e0)
Mar 17 16:42:01 peace kernel: Stack: c33fee3c cba33e88 c019f540 00000066 cba33e60 cba33e60 00000000 00000001 
Mar 17 16:42:01 peace kernel:        00000000 c11bcc40 0000003c 00000080 cba32000 0000009c cba33e90 c01a06d7 
Mar 17 16:42:01 peace kernel:        cba33ec4 c01612d8 000e1048 00000000 000079d9 0000001d 00000000 cbffb654 
Mar 17 16:42:01 peace kernel: Call Trace:
Mar 17 16:42:01 peace kernel:  [prune_dcache+1120/1952] prune_dcache+0x460/0x7a0
Mar 17 16:42:01 peace kernel:  [shrink_dcache_memory+23/32] shrink_dcache_memory+0x17/0x20
Mar 17 16:42:01 peace kernel:  [shrink_slab+280/368] shrink_slab+0x118/0x170
Mar 17 16:42:01 peace kernel:  [balance_pgdat+492/528] balance_pgdat+0x1ec/0x210
Mar 17 16:42:01 peace kernel:  [kswapd+220/240] kswapd+0xdc/0xf0
Mar 17 16:42:01 peace kernel:  [autoremove_wake_function+0/64] autoremove_wake_function+0x0/0x40
Mar 17 16:42:01 peace kernel:  [ret_from_fork+6/20] ret_from_fork+0x6/0x14
Mar 17 16:42:01 peace kernel:  [autoremove_wake_function+0/64] autoremove_wake_function+0x0/0x40
Mar 17 16:42:01 peace kernel:  [kswapd+0/240] kswapd+0x0/0xf0
Mar 17 16:42:01 peace kernel:  [kernel_thread_helper+5/12] kernel_thread_helper+0x5/0xc
Mar 17 16:42:01 peace kernel: 
Mar 17 16:42:01 peace kernel: Code: 8b 40 24 74 4a 85 c0 74 07 8b 50 14 85 d2 75 39 8d 43 1c ba 

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-06 17:26                       ` William Lee Irwin III
@ 2004-08-06 23:19                         ` Chris Shoemaker
  2004-08-07  4:15                           ` William Lee Irwin III
  0 siblings, 1 reply; 146+ messages in thread
From: Chris Shoemaker @ 2004-08-06 23:19 UTC (permalink / raw)
  To: William Lee Irwin III, Gene Heskett, linux-kernel,
	Linus Torvalds, Andrew Morton, Ingo Molnar, vda, ak

On Fri, Aug 06, 2004 at 10:26:07AM -0700, William Lee Irwin III wrote:
> On Friday 06 August 2004 12:58, Linus Torvalds wrote:
> >> Now, Chris Shoemaker reported dentry problems on an Intel CPU and
> >> said that wli had seen something too, but I'm wondering whether
> >> Chris and wli might have been seeing the knfsd/xfs-related dentry
> >> bug that I found yesterday. So I think the prefetch theory is still
> >> alive, but we should check with Chris. Chris?
> 
> I've not had issues around the dcache for quite some time, I think not
> since the 2.5.65 timeframe. IIRC maneesh and dipankar had some fixes
> that resolved all my issues not long afterward. So unfortunately I have
> nothing strictly dcache-related to report. Chris may have been
> referring to some potentially pathological NFS behavior I've seen for a
> long time centered around extended periods of knfsd unresponsiveness.
> 
> -- wli

I was referring to:
http://www.ussg.iu.edu/hypermail/linux/kernel/0406.2/0410.html

...doesn't look NFS-related to me.  OTOH, it does bear some resemblance
to some other oopses floating around.  Did you solve this one?

-chris


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-07  4:15                           ` William Lee Irwin III
@ 2004-08-07  0:05                             ` Chris Shoemaker
  2004-08-07  5:50                               ` William Lee Irwin III
  0 siblings, 1 reply; 146+ messages in thread
From: Chris Shoemaker @ 2004-08-07  0:05 UTC (permalink / raw)
  To: William Lee Irwin III, Gene Heskett, linux-kernel,
	Linus Torvalds, Andrew Morton, Ingo Molnar, vda, ak

On Fri, Aug 06, 2004 at 09:15:50PM -0700, William Lee Irwin III wrote:
> On Fri, Aug 06, 2004 at 10:26:07AM -0700, William Lee Irwin III wrote:
> >> I've not had issues around the dcache for quite some time, I think not
> >> since the 2.5.65 timeframe. IIRC maneesh and dipankar had some fixes
> >> that resolved all my issues not long afterward. So unfortunately I have
> >> nothing strictly dcache-related to report. Chris may have been
> >> referring to some potentially pathological NFS behavior I've seen for a
> >> long time centered around extended periods of knfsd unresponsiveness.
> 
> On Fri, Aug 06, 2004 at 07:19:02PM -0400, Chris Shoemaker wrote:
> > I was referring to:
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0406.2/0410.html
> > ...doesn't look NFS-related to me.  OTOH, it does bear some resemblance
> > to some other oopses floating around.  Did you solve this one?
> 
> I've not seen this ever again after some point, and don't recall enough
> of the context/etc. to say much about what was going on with it.
> 
> -- wli

I know what you mean.  Sometimes I don't know which bothers me more, the
oopses that inexplicably DON'T come back, or the ones that DO.

Perchance, have you added RAM since the oops, or changed the machine's
memory-related behavior?

-chris




^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
       [not found]           ` <200408070203.35268.vda@port.imtp.ilyichevsk.odessa.ua>
@ 2004-08-07  1:28             ` Gene Heskett
  0 siblings, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-07  1:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: Denis Vlasenko

On Friday 06 August 2004 19:03, Denis Vlasenko wrote:
>Hi Gene,
>
>Please do not remove my address from To or CC
>fields, I can miss your emails otherwise.
>
Denis:
Mmm, sorry.  I was in the habit of using a button on kmail that 
replies only to the mailing list, thinking that then I wasn't 
bombarding everyone with 2 or more copies of my replies.  I've now 
re-arranged it so that I have a "reply all" button, and will use that 
one from now on unless the subject is really OT.

Linus:
One comment re the patch, I'm seeing a huge slowdown in the seti 
processing, it's only done about 2.5 units since 6am local, and it 
should be well into the 4th by now.

Anybody:
Speaking of somewhat OT, what is the command I should use to actually 
turn on the PREEMPT option in the kernel?  It's on in the compile, but 
I think I read someplace where I had to do an "echo 1 >someplace 
in /proc" to actually enable it.  I've survived over 24 hours now 
with the patch Linus sent, and I thought maybe I'd get some exercise 
pushing my luck :)

[...]

-- 
Cheers Denis, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-06 23:19                         ` Chris Shoemaker
@ 2004-08-07  4:15                           ` William Lee Irwin III
  2004-08-07  0:05                             ` Chris Shoemaker
  0 siblings, 1 reply; 146+ messages in thread
From: William Lee Irwin III @ 2004-08-07  4:15 UTC (permalink / raw)
  To: Chris Shoemaker
  Cc: Gene Heskett, linux-kernel, Linus Torvalds, Andrew Morton,
	Ingo Molnar, vda, ak

On Fri, Aug 06, 2004 at 10:26:07AM -0700, William Lee Irwin III wrote:
>> I've not had issues around the dcache for quite some time, I think not
>> since the 2.5.65 timeframe. IIRC maneesh and dipankar had some fixes
>> that resolved all my issues not long afterward. So unfortunately I have
>> nothing strictly dcache-related to report. Chris may have been
>> referring to some potentially pathological NFS behavior I've seen for a
>> long time centered around extended periods of knfsd unresponsiveness.

On Fri, Aug 06, 2004 at 07:19:02PM -0400, Chris Shoemaker wrote:
> I was referring to:
> http://www.ussg.iu.edu/hypermail/linux/kernel/0406.2/0410.html
> ...doesn't look NFS-related to me.  OTOH, it does bear some resemblance
> to some other oopses floating around.  Did you solve this one?

I've not seen this ever again after some point, and don't recall enough
of the context/etc. to say much about what was going on with it.


-- wli

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-07  0:05                             ` Chris Shoemaker
@ 2004-08-07  5:50                               ` William Lee Irwin III
  0 siblings, 0 replies; 146+ messages in thread
From: William Lee Irwin III @ 2004-08-07  5:50 UTC (permalink / raw)
  To: Chris Shoemaker
  Cc: Gene Heskett, linux-kernel, Linus Torvalds, Andrew Morton,
	Ingo Molnar, vda, ak

On Fri, Aug 06, 2004 at 09:15:50PM -0700, William Lee Irwin III wrote:
>> I've not seen this ever again after some point, and don't recall enough
>> of the context/etc. to say much about what was going on with it.

On Fri, Aug 06, 2004 at 08:05:21PM -0400, Chris Shoemaker wrote:
> I know what you mean.  Sometimes I don't know which bothers me more, the
> oopses that inexplicably DON'T come back, or the ones that DO.
> Perchance, have you added RAM since the oops, or changed the machine's
> memory-related behavior?

Neither. Only the kernel has changed. Upon closer inspection, local
changes with direct impact on the inode cache are likely suspects.


-- wli


* Re: Possible dcache BUG
  2004-08-06 23:09                     ` Chris Shoemaker
@ 2004-08-07  6:20                       ` Linus Torvalds
  2004-08-07 12:38                         ` Gene Heskett
  2004-08-07 13:44                         ` Chris Shoemaker
  0 siblings, 2 replies; 146+ messages in thread
From: Linus Torvalds @ 2004-08-07  6:20 UTC (permalink / raw)
  To: Chris Shoemaker
  Cc: Gene Heskett, Kernel Mailing List, Andrew Morton, Ingo Molnar,
	vda, ak, William Lee Irwin III



On Fri, 6 Aug 2004, Chris Shoemaker wrote:
> 
> I _was_ able to find the attached oops, but I don't think I have the
> corresponding object files, so I hope the decoding it contains is
> good enough. 

It's fine.

It oopses on

	inode->i_sb->s_op

where "i_sb" is bad and contains the pointer "0x0b7eebf8" which is 
definitely not a valid kernel pointer.

There's a few other strange details in your oops report too. One being 
that the inode pointer (in %ebx, apparently) doesn't show on the stack 
where I'd expect it to show. Hmm. That might be just a different compiler 
issue, though.

Anyway, this does look somewhat like the ones Gene is seeing. If I had to 
guess, I'd guess that either the inode pointer is bad, or it's just stale 
from an inode that has already been free'd. Most likely because of 
prune_dcache() having had a corrupt LRU list with a stale/corrupt entry.

That would blow the prefetch theory out of the water. 

		Linus
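
A minimal sketch of the sanity check implied above, assuming the
default i386 3G/1G split, where kernel virtual addresses start at
PAGE_OFFSET = 0xC0000000 (an assumption: the oops does not state how
the machine was configured). The helper name is hypothetical, and
Python is used purely for illustration:

```python
# Assumption: default i386 3G/1G split, so kernel virtual addresses
# start at PAGE_OFFSET. A kernel built with a different split would
# need a different value here.
PAGE_OFFSET = 0xC0000000


def looks_like_kernel_pointer(addr: int) -> bool:
    """Return True if addr falls in the 32-bit kernel address range."""
    return PAGE_OFFSET <= addr <= 0xFFFFFFFF


# The i_sb value from the oops discussed above.
bad_i_sb = 0x0B7EEBF8
print(hex(bad_i_sb), looks_like_kernel_pointer(bad_i_sb))
```

A value that passes this range check can of course still be stale or
corrupt; the check only catches pointers that are obviously outside
kernel space, as this one is.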


* Re: Possible dcache BUG
  2004-08-07  6:20                       ` Linus Torvalds
@ 2004-08-07 12:38                         ` Gene Heskett
  2004-08-07 13:44                         ` Chris Shoemaker
  1 sibling, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-07 12:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Chris Shoemaker, Andrew Morton, Ingo Molnar, vda,
	ak, William Lee Irwin III

On Saturday 07 August 2004 02:20, Linus Torvalds wrote:
>On Fri, 6 Aug 2004, Chris Shoemaker wrote:
>> I _was_ able to find the attached oops, but I don't think I have
>> the corresponding object files, so I hope the decoding it contains
>> is good enough.
>
>It's fine.
>
>It oopses on
>
>	inode->i_sb->s_op
>
>where "i_sb" is bad and contains the pointer "0x0b7eebf8" which is
>definitely not a valid kernel pointer.
>
>There's a few other strange details in your oops report too. One
> being that the inode pointer (in %ebx, apparently) doesn't show on
> the stack where I'd expect it to show. Hmm. That might be just a
> different compiler issue, though.
>
>Anyway, this does look somewhat like the ones Gene is seeing. If I
> had to guess, I'd guess that either the inode pointer is bad, or
> it's just stale from an inode that has already been free'd. Most
> likely because of prune_dcache() having had a corrupt LRU list with
> a stale/corrupt entry.
>
>That would blow the prefetch theory out of the water.
>
>		Linus

And I'm still up, no Oops yet.

08:34:07 up 1 day, 21:25,  4 users,  load average: 1.10, 1.08, 1.03

I've also only done 3 seti units since yesterday morning, about 40% to
50% of my usual production even with the crashes.  In other words, the
system seems stable, but old-dog slow too.  And that's with top showing
seti getting 97-99% of the CPU.  Ouch!

-- 
Cheers, Gene

* Re: Possible dcache BUG
  2004-08-07  6:20                       ` Linus Torvalds
  2004-08-07 12:38                         ` Gene Heskett
@ 2004-08-07 13:44                         ` Chris Shoemaker
  2004-08-07 18:49                           ` Linus Torvalds
  2004-08-07 19:01                           ` Gene Heskett
  1 sibling, 2 replies; 146+ messages in thread
From: Chris Shoemaker @ 2004-08-07 13:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Gene Heskett, Kernel Mailing List, Andrew Morton, Ingo Molnar,
	vda, ak, William Lee Irwin III

[-- Attachment #1: Type: text/plain, Size: 1457 bytes --]

On Fri, Aug 06, 2004 at 11:20:28PM -0700, Linus Torvalds wrote:
> 
> 
> On Fri, 6 Aug 2004, Chris Shoemaker wrote:
> > 
> > I _was_ able to find the attached oops, but I don't think I have the
> > corresponding object files, so I hope the decoding it contains is
> > good enough. 
> 
> It's fine.

Well then, maybe you'd like more?  I attached two more from the same
period.  Please remember that these are 5 months old, and could
represent bugs already fixed.  I think this was stock 2.6.4.

> 
> It oopses on
> 
> 	inode->i_sb->s_op
> 
> where "i_sb" is bad and contains the pointer "0x0b7eebf8" which is 
> definitely not a valid kernel pointer.
> 
> There's a few other strange details in your oops report too. One being 
> that the inode pointer (in %ebx, apparently) doesn't show on the stack 
> where I'd expect it to show. Hmm. That might be just a different compiler 
> issue, though.

Perhaps due to CONFIG_REGPARM?  I haven't used it for quite a while, but
back in March I was a bit bolder about config options marked
experimental.  Gene, are you using REGPARM?

-chris

> 
> Anyway, this does look somewhat like the ones Gene is seeing. If I had to 
> guess, I'd guess that either the inode pointer is bad, or it's just stale 
> from an inode that has already been free'd. Most likely because of 
> prune_dcache() having had a corrupt LRU list with a stale/corrupt entry.
> 
> That would blow the prefetch theory out of the water. 
> 
> 		Linus

[-- Attachment #2: Mar17.2.txt --]
[-- Type: text/plain, Size: 4271 bytes --]

Mar 17 03:34:28 peace kernel: Unable to handle kernel paging request at virtual address 0034779d
Mar 17 03:34:28 peace kernel:  printing eip:
Mar 17 03:34:28 peace kernel: c0211e8f
Mar 17 03:34:28 peace kernel: *pde = 00000000
Mar 17 03:34:28 peace kernel: Oops: 0000 [#1]
Mar 17 03:34:28 peace kernel: PREEMPT DEBUG_PAGEALLOC
Mar 17 03:34:28 peace kernel: CPU:    0
Mar 17 03:34:28 peace kernel: EIP:    0060:[vsnprintf+799/1184]    Not tainted
Mar 17 03:34:28 peace kernel: EFLAGS: 00010097
Mar 17 03:34:28 peace kernel: EIP is at vsnprintf+0x31f/0x4a0
Mar 17 03:34:28 peace kernel: eax: 0034779d   ebx: 0000000a   ecx: 0034779d   edx: fffffffe
Mar 17 03:34:28 peace kernel: esi: c042d1fb   edi: 00000000   ebp: cba33cf4   esp: cba33cb8
Mar 17 03:34:28 peace kernel: ds: 007b   es: 007b   ss: 0068
Mar 17 03:34:29 peace kernel: Process kswapd0 (pid: 7, threadinfo=cba32000 task=cba559e0)
Mar 17 03:34:29 peace kernel: Stack: 000001a0 00000000 0000000a ffffffff 00000002 00000002 ffffffff ffffffff 
Mar 17 03:34:29 peace kernel:        c042d5df 00000400 c042d1e0 c033a3f2 00000400 00000246 c03419b3 cba33d04 
Mar 17 03:34:29 peace kernel:        c0212028 cba33d6c c042d1e0 cba33d54 c012a7b7 cba33d60 c10786b8 c1078690 
Mar 17 03:34:29 peace kernel: Call Trace:
Mar 17 03:34:29 peace kernel:  [vscnprintf+24/48] vscnprintf+0x18/0x30
Mar 17 03:34:29 peace kernel:  [printk+359/1008] printk+0x167/0x3f0
Mar 17 03:34:29 peace kernel:  [shrink_list+2259/2816] shrink_list+0x8d3/0xb00
Mar 17 03:34:29 peace kernel:  [shrink_cache+574/1664] shrink_cache+0x23e/0x680
Mar 17 03:34:29 peace kernel:  [balance_pgdat+401/528] balance_pgdat+0x191/0x210
Mar 17 03:34:29 peace kernel:  [kswapd+220/240] kswapd+0xdc/0xf0
Mar 17 03:34:29 peace kernel:  [autoremove_wake_function+0/64] autoremove_wake_function+0x0/0x40
Mar 17 03:34:29 peace kernel:  [ret_from_fork+6/20] ret_from_fork+0x6/0x14
Mar 17 03:34:29 peace kernel:  [autoremove_wake_function+0/64] autoremove_wake_function+0x0/0x40
Mar 17 03:34:29 peace kernel:  [kswapd+0/240] kswapd+0x0/0xf0
Mar 17 03:34:29 peace kernel:  [kernel_thread_helper+5/12] kernel_thread_helper+0x5/0xc
Mar 17 03:34:29 peace kernel: 
Mar 17 03:34:29 peace kernel: Code: 80 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 83 e7 10 89 c3 75 
Mar 17 03:34:29 peace kernel:  <6>note: kswapd0[7] exited with preempt_count 2
Mar 17 03:34:29 peace kernel: Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
Mar 17 03:34:29 peace kernel: in_atomic():1, irqs_disabled():0
Mar 17 03:34:29 peace kernel: Call Trace:
Mar 17 03:34:29 peace kernel:  [__might_sleep+172/224] __might_sleep+0xac/0xe0
Mar 17 03:34:29 peace kernel:  [profile_exit_task+35/96] profile_exit_task+0x23/0x60
Mar 17 03:34:29 peace kernel:  [do_exit+117/2480] do_exit+0x75/0x9b0
Mar 17 03:34:29 peace kernel:  [die+594/608] die+0x252/0x260
Mar 17 03:34:29 peace kernel:  [do_page_fault+0/1360] do_page_fault+0x0/0x550
Mar 17 03:34:29 peace kernel:  [do_page_fault+485/1360] do_page_fault+0x1e5/0x550
Mar 17 03:34:29 peace kernel:  [update_wall_time+22/64] update_wall_time+0x16/0x40
Mar 17 03:34:29 peace kernel:  [do_timer+199/208] do_timer+0xc7/0xd0
Mar 17 03:34:29 peace kernel:  [do_page_fault+0/1360] do_page_fault+0x0/0x550
Mar 17 03:34:29 peace kernel:  [error_code+45/56] error_code+0x2d/0x38
Mar 17 03:34:29 peace kernel:  [vsnprintf+799/1184] vsnprintf+0x31f/0x4a0
Mar 17 03:34:29 peace kernel:  [vscnprintf+24/48] vscnprintf+0x18/0x30
Mar 17 03:34:29 peace kernel:  [printk+359/1008] printk+0x167/0x3f0
Mar 17 03:34:29 peace kernel:  [shrink_list+2259/2816] shrink_list+0x8d3/0xb00
Mar 17 03:34:29 peace kernel:  [shrink_cache+574/1664] shrink_cache+0x23e/0x680
Mar 17 03:34:29 peace kernel:  [balance_pgdat+401/528] balance_pgdat+0x191/0x210
Mar 17 03:34:29 peace kernel:  [kswapd+220/240] kswapd+0xdc/0xf0
Mar 17 03:34:29 peace kernel:  [autoremove_wake_function+0/64] autoremove_wake_function+0x0/0x40
Mar 17 03:34:29 peace kernel:  [ret_from_fork+6/20] ret_from_fork+0x6/0x14
Mar 17 03:34:29 peace kernel:  [autoremove_wake_function+0/64] autoremove_wake_function+0x0/0x40
Mar 17 03:34:29 peace kernel:  [kswapd+0/240] kswapd+0x0/0xf0
Mar 17 03:34:29 peace kernel:  [kernel_thread_helper+5/12] kernel_thread_helper+0x5/0xc
Mar 17 03:34:29 peace kernel: 

[-- Attachment #3: Mar17.3.txt --]
[-- Type: text/plain, Size: 6974 bytes --]

Mar 17 06:25:01 peace /USR/SBIN/CRON[1153]: (root) CMD (test -e /usr/sbin/anacron || run-parts --report /etc/cron.daily)
Mar 17 06:25:04 peace kernel: Unable to handle kernel paging request at virtual address 00a6be3c
Mar 17 06:25:04 peace kernel:  printing eip:
Mar 17 06:25:04 peace kernel: c01a4650
Mar 17 06:25:04 peace kernel: *pde = 00000000
Mar 17 06:25:04 peace kernel: Oops: 0000 [#2]
Mar 17 06:25:04 peace kernel: PREEMPT DEBUG_PAGEALLOC
Mar 17 06:25:04 peace kernel: CPU:    0
Mar 17 06:25:04 peace kernel: EIP:    0060:[find_inode_fast+32/96]    Not tainted
Mar 17 06:25:04 peace kernel: EFLAGS: 00010206
Mar 17 06:25:04 peace kernel: EIP is at find_inode_fast+0x20/0x60
Mar 17 06:25:04 peace kernel: eax: c35b5e3c   ebx: 000382ea   ecx: 00a6be3c   edx: 00a6be3c
Mar 17 06:25:04 peace kernel: esi: cb7eebf8   edi: c11f1cac   ebp: c616de24   esp: c616de18
Mar 17 06:25:04 peace kernel: ds: 007b   es: 007b   ss: 0068
Mar 17 06:25:04 peace kernel: Process find (pid: 1170, threadinfo=c616c000 task=c5d169e0)
Mar 17 06:25:04 peace kernel: Stack: 000382ea 000382ea cb7eebf8 c616de58 c01a5aa0 c55612ec 00000000 00000000 
Mar 17 06:25:04 peace kernel:        c5f87f8c c616def8 cab3cef8 1d244b3c c11f1cac 000382ea c5f87ef8 cb7eebf8 
Mar 17 06:25:04 peace kernel:        c616de70 c01dde7c c8fae110 c0388a80 fffffff4 cacb3ebc c616de94 c0191a7e 
Mar 17 06:25:04 peace kernel: Call Trace:
Mar 17 06:25:04 peace kernel:  [iget_locked+176/672] iget_locked+0xb0/0x2a0
Mar 17 06:25:04 peace kernel:  [ext3_lookup+92/176] ext3_lookup+0x5c/0xb0
Mar 17 06:25:04 peace kernel:  [real_lookup+206/256] real_lookup+0xce/0x100
Mar 17 06:25:04 peace kernel:  [do_lookup+117/128] do_lookup+0x75/0x80
Mar 17 06:25:04 peace kernel:  [link_path_walk+2138/4224] link_path_walk+0x85a/0x1080
Mar 17 06:25:04 peace kernel:  [kmem_cache_alloc+134/480] kmem_cache_alloc+0x86/0x1e0
Mar 17 06:25:04 peace kernel:  [getname+126/192] getname+0x7e/0xc0
Mar 17 06:25:04 peace kernel:  [__user_walk+61/80] __user_walk+0x3d/0x50
Mar 17 06:25:04 peace kernel:  [vfs_lstat+29/80] vfs_lstat+0x1d/0x50
Mar 17 06:25:04 peace kernel:  [sys_lstat64+22/48] sys_lstat64+0x16/0x30
Mar 17 06:25:04 peace kernel:  [syscall_call+7/11] syscall_call+0x7/0xb
Mar 17 06:25:04 peace kernel: 
Mar 17 06:25:04 peace kernel: Code: 8b 11 0f 18 02 90 39 59 18 89 c8 74 13 85 d2 89 d1 75 ed 31 
Mar 17 06:25:04 peace kernel:  <6>note: find[1170] exited with preempt_count 1
Mar 17 06:25:04 peace kernel: Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
Mar 17 06:25:04 peace kernel: in_atomic():1, irqs_disabled():0
Mar 17 06:25:04 peace kernel: Call Trace:
Mar 17 06:25:04 peace kernel:  [__might_sleep+172/224] __might_sleep+0xac/0xe0
Mar 17 06:25:04 peace kernel:  [profile_exit_task+35/96] profile_exit_task+0x23/0x60
Mar 17 06:25:04 peace kernel:  [do_exit+117/2480] do_exit+0x75/0x9b0
Mar 17 06:25:04 peace kernel:  [die+594/608] die+0x252/0x260
Mar 17 06:25:04 peace kernel:  [do_page_fault+0/1360] do_page_fault+0x0/0x550
Mar 17 06:25:04 peace kernel:  [do_page_fault+485/1360] do_page_fault+0x1e5/0x550
Mar 17 06:25:04 peace kernel:  [__getblk+28/64] __getblk+0x1c/0x40
Mar 17 06:25:04 peace kernel:  [ext3_getblk+119/576] ext3_getblk+0x77/0x240
Mar 17 06:25:04 peace kernel:  [wake_up_buffer+9/48] wake_up_buffer+0x9/0x30
Mar 17 06:25:04 peace kernel:  [ll_rw_block+92/144] ll_rw_block+0x5c/0x90
Mar 17 06:25:04 peace kernel:  [do_page_fault+0/1360] do_page_fault+0x0/0x550
Mar 17 06:25:04 peace kernel:  [error_code+45/56] error_code+0x2d/0x38
Mar 17 06:25:04 peace kernel:  [find_inode_fast+32/96] find_inode_fast+0x20/0x60
Mar 17 06:25:04 peace kernel:  [iget_locked+176/672] iget_locked+0xb0/0x2a0
Mar 17 06:25:04 peace kernel:  [ext3_lookup+92/176] ext3_lookup+0x5c/0xb0
Mar 17 06:25:04 peace kernel:  [real_lookup+206/256] real_lookup+0xce/0x100
Mar 17 06:25:04 peace kernel:  [do_lookup+117/128] do_lookup+0x75/0x80
Mar 17 06:25:04 peace kernel:  [link_path_walk+2138/4224] link_path_walk+0x85a/0x1080
Mar 17 06:25:04 peace kernel:  [kmem_cache_alloc+134/480] kmem_cache_alloc+0x86/0x1e0
Mar 17 06:25:04 peace kernel:  [getname+126/192] getname+0x7e/0xc0
Mar 17 06:25:04 peace kernel:  [__user_walk+61/80] __user_walk+0x3d/0x50
Mar 17 06:25:04 peace kernel:  [vfs_lstat+29/80] vfs_lstat+0x1d/0x50
Mar 17 06:25:04 peace kernel:  [sys_lstat64+22/48] sys_lstat64+0x16/0x30
Mar 17 06:25:04 peace kernel:  [syscall_call+7/11] syscall_call+0x7/0xb
Mar 17 06:25:04 peace kernel: 
Mar 17 06:25:04 peace kernel: bad: scheduling while atomic!
Mar 17 06:25:04 peace kernel: Call Trace:
Mar 17 06:25:04 peace kernel:  [schedule+2311/2320] schedule+0x907/0x910
Mar 17 06:25:04 peace kernel:  [zap_pmd_range+68/96] zap_pmd_range+0x44/0x60
Mar 17 06:25:04 peace kernel:  [unmap_page_range+70/128] unmap_page_range+0x46/0x80
Mar 17 06:25:04 peace kernel:  [unmap_vmas+534/848] unmap_vmas+0x216/0x350
Mar 17 06:25:04 peace kernel:  [__pagevec_lru_add_active+451/672] __pagevec_lru_add_active+0x1c3/0x2a0
Mar 17 06:25:04 peace kernel:  [exit_mmap+199/688] exit_mmap+0xc7/0x2b0
Mar 17 06:25:04 peace kernel:  [dump_stack+23/32] dump_stack+0x17/0x20
Mar 17 06:25:04 peace kernel:  [mmput+173/288] mmput+0xad/0x120
Mar 17 06:25:04 peace kernel:  [do_exit+482/2480] do_exit+0x1e2/0x9b0
Mar 17 06:25:04 peace kernel:  [die+594/608] die+0x252/0x260
Mar 17 06:25:04 peace kernel:  [do_page_fault+0/1360] do_page_fault+0x0/0x550
Mar 17 06:25:04 peace kernel:  [do_page_fault+485/1360] do_page_fault+0x1e5/0x550
Mar 17 06:25:04 peace kernel:  [__getblk+28/64] __getblk+0x1c/0x40
Mar 17 06:25:04 peace kernel:  [ext3_getblk+119/576] ext3_getblk+0x77/0x240
Mar 17 06:25:04 peace kernel:  [wake_up_buffer+9/48] wake_up_buffer+0x9/0x30
Mar 17 06:25:04 peace kernel:  [ll_rw_block+92/144] ll_rw_block+0x5c/0x90
Mar 17 06:25:04 peace kernel:  [do_page_fault+0/1360] do_page_fault+0x0/0x550
Mar 17 06:25:04 peace kernel:  [error_code+45/56] error_code+0x2d/0x38
Mar 17 06:25:04 peace kernel:  [find_inode_fast+32/96] find_inode_fast+0x20/0x60
Mar 17 06:25:04 peace kernel:  [iget_locked+176/672] iget_locked+0xb0/0x2a0
Mar 17 06:25:04 peace kernel:  [ext3_lookup+92/176] ext3_lookup+0x5c/0xb0
Mar 17 06:25:04 peace kernel:  [real_lookup+206/256] real_lookup+0xce/0x100
Mar 17 06:25:04 peace kernel:  [do_lookup+117/128] do_lookup+0x75/0x80
Mar 17 06:25:04 peace kernel:  [link_path_walk+2138/4224] link_path_walk+0x85a/0x1080
Mar 17 06:25:04 peace kernel:  [kmem_cache_alloc+134/480] kmem_cache_alloc+0x86/0x1e0
Mar 17 06:25:04 peace kernel:  [getname+126/192] getname+0x7e/0xc0
Mar 17 06:25:04 peace kernel:  [__user_walk+61/80] __user_walk+0x3d/0x50
Mar 17 06:25:04 peace kernel:  [vfs_lstat+29/80] vfs_lstat+0x1d/0x50
Mar 17 06:25:04 peace kernel:  [sys_lstat64+22/48] sys_lstat64+0x16/0x30
Mar 17 06:25:04 peace kernel:  [syscall_call+7/11] syscall_call+0x7/0xb
Mar 17 06:25:04 peace kernel: 
Mar 17 06:25:04 peace kernel: fs/fs-writeback.c:71: spin_lock(fs/inode.c:c0386770) already locked by fs/inode.c/798
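
Each call-trace entry in the attachments above gives the same
symbol+offset/size twice: once in decimal inside the brackets and once
in hex after them (for example 32/96 and 0x20/0x60). A small sketch,
assuming that exact line layout, which parses an entry and cross-checks
the two forms; the function and pattern names are hypothetical:

```python
import re

# Matches entries such as:
#   [find_inode_fast+32/96] find_inode_fast+0x20/0x60
ENTRY = re.compile(
    r"\[(?P<sym>\w+)\+(?P<doff>\d+)/(?P<dsize>\d+)\]\s+"
    r"(?P<sym2>\w+)\+0x(?P<hoff>[0-9a-f]+)/0x(?P<hsize>[0-9a-f]+)"
)


def parse_trace_entry(line):
    """Return (symbol, offset, size), or None if the line has no entry.

    Raises ValueError when the decimal and hex forms disagree.
    """
    m = ENTRY.search(line)
    if m is None:
        return None
    dec = (m["sym"], int(m["doff"]), int(m["dsize"]))
    hexed = (m["sym2"], int(m["hoff"], 16), int(m["hsize"], 16))
    if dec != hexed:
        raise ValueError(f"inconsistent entry: {line!r}")
    return dec


line = ("Mar 17 06:25:04 peace kernel:  "
        "[find_inode_fast+32/96] find_inode_fast+0x20/0x60")
print(parse_trace_entry(line))
```

The offset is how far into the function the address lies and the size
is the function's length, so find_inode_fast+0x20/0x60 is 32 bytes
into a 96-byte function.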


* Re: Possible dcache BUG
  2004-08-07 13:44                         ` Chris Shoemaker
@ 2004-08-07 18:49                           ` Linus Torvalds
  2004-08-07 19:01                           ` Gene Heskett
  1 sibling, 0 replies; 146+ messages in thread
From: Linus Torvalds @ 2004-08-07 18:49 UTC (permalink / raw)
  To: Chris Shoemaker
  Cc: Gene Heskett, Kernel Mailing List, Andrew Morton, Ingo Molnar,
	vda, ak, William Lee Irwin III



On Sat, 7 Aug 2004, Chris Shoemaker wrote:
> 
> Well then, maybe you'd like more?  I attached two more from the same
> period.  Please remember that these are 5 months old, and could
> represent bugs already fixed.  I think this was stock 2.6.4.

These look like total memory corruption; they don't look anything like
the prune_dcache things.

> Perhaps due to CONFIG_REGPARM?  I haven't used it for quite a while, but
> back in March I was a bit bolder about config options marked
> experimental.

Entirely possible. gcc has historically had bugs in regparm (extra
register pressure causing incorrect register re-use). It's supposed to be
fixed in gcc-3+.

		Linus


* Re: Possible dcache BUG
  2004-08-07 13:44                         ` Chris Shoemaker
  2004-08-07 18:49                           ` Linus Torvalds
@ 2004-08-07 19:01                           ` Gene Heskett
  1 sibling, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-07 19:01 UTC (permalink / raw)
  To: linux-kernel
  Cc: Chris Shoemaker, Linus Torvalds, Andrew Morton, Ingo Molnar, vda,
	ak, William Lee Irwin III

On Saturday 07 August 2004 09:44, Chris Shoemaker wrote:
>On Fri, Aug 06, 2004 at 11:20:28PM -0700, Linus Torvalds wrote:
>> On Fri, 6 Aug 2004, Chris Shoemaker wrote:
>> > I _was_ able to find the attached oops, but I don't think I have
>> > the corresponding object files, so I hope the decoding it
>> > contains is good enough.
>>
>> It's fine.
>
>Well then, maybe you'd like more?  I attached two more from the same
>period.  Please remember that these are 5 months old, and could
>represent bugs already fixed.  I think this was stock 2.6.4.
>
>> It oopses on
>>
>> 	inode->i_sb->s_op
>>
>> where "i_sb" is bad and contains the pointer "0x0b7eebf8" which is
>> definitely not a valid kernel pointer.
>>
>> There's a few other strange details in your oops report too. One
>> being that the inode pointer (in %ebx, apparently) doesn't show on
>> the stack where I'd expect it to show. Hmm. That might be just a
>> different compiler issue, though.
>
>Perhaps due to CONFIG_REGPARM?  I haven't used it for quite a while,
> but back in March I was a bit bolder about config options marked
> experimental.  Gene, are you using REGPARM?
>
>-chris

No, Chris.  I think I may have had it on for maybe 10 minutes, in
2.6.7-mmsomething maybe.  But it died without a trace (on the old
motherboard with an Athlon 1600XP on it) as I was starting X, so on
the next reboot to 2.6.7 I turned it back off and haven't turned it
back on since.

IIRC that was before the video card took ill, so at that point I was
blaming my problems, which were generally only POST problems then, on
heat.  TBE it had to warm up before it would POST!  By the time the
video card wouldn't POST, memtest86 was also finding bad memory (with
a new card plugged in), hence the whole mobo got retired.

>> Anyway, this does look somewhat like the ones Gene is seeing. If I
>> had to guess, I'd guess that either the inode pointer is bad, or
>> it's just stale from an inode that has already been free'd. Most
>> likely because of prune_dcache() having had a corrupt LRU list
>> with a stale/corrupt entry.
>>
>> That would blow the prefetch theory out of the water.
>>
>> 		Linus

-- 
Cheers, Gene

* Re: Possible dcache BUG
  2004-08-06  3:24         ` Linus Torvalds
@ 2004-08-08  4:42           ` Gene Heskett
  2004-08-08 14:30           ` Gene Heskett
  1 sibling, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-08  4:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds, viro, Andrew Morton

On Thursday 05 August 2004 23:24, Linus Torvalds wrote:
>On Fri, 6 Aug 2004 viro@parcelfarce.linux.theplanet.co.uk wrote:
>> It doesn't even take a dput().  Look: we do list_del(), then
>> notice that sucker still has positive refcount and leave it alone.
>>  Now think what happens on the next pass.  That's right, we hit
>> that dentry *again*. And see that list_empty() is false.  And do
>> list_del() one more time.
>
>Well, the sad part is that doing another list_del() won't even
> necessarily go *boom*. Most of the time it might even leave the
> list as-is, but often enough it should give list corruption.
>
>> However, what used to be e.g. next dentry might very well be freed
>> by now.  *BOOM*.
>
>Absolutely. It does look like a rather nasty bug.
>
>It doesn't explain what Gene sees, though, unless you can explain
> how we'd get an anon dentry without knfsd/xfs. Oh well.
>
>I'll commit the obvious one-liner fix, since it might explain _some_
>problems people have seen.
>
>		Linus
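
The double list_del() described in the quoted text above can be
modelled outside the kernel. This is only a sketch in Python: Node
stands in for struct list_head, and list_del() here is a simplified
version that leaves the removed entry's own pointers untouched (an
assumption of the sketch). list_del_init() is shown as the usual way to
make a repeated removal harmless; whether the one-liner fix mentioned
above takes that form is not stated here.

```python
class Node:
    """Minimal stand-in for the kernel's struct list_head."""

    def __init__(self, name):
        self.name = name
        self.prev = self
        self.next = self


def list_add_tail(new, head):
    """Link new in just before head (at the tail of the list)."""
    new.prev, new.next = head.prev, head
    head.prev.next = new
    head.prev = new


def list_del(entry):
    """Unlink entry but leave its own prev/next untouched (stale)."""
    entry.prev.next = entry.next
    entry.next.prev = entry.prev


def list_del_init(entry):
    """Unlink entry and point it at itself, so it reads as empty."""
    list_del(entry)
    entry.prev = entry.next = entry


def list_empty(entry):
    """True when the entry points at itself."""
    return entry.next is entry


def names(head):
    """Walk forward from head and collect the entry names."""
    out, cur = [], head.next
    while cur is not head:
        out.append(cur.name)
        cur = cur.next
    return out


def build():
    """Return a fresh list head -> a -> b -> c."""
    head = Node("head")
    a, b, c = Node("a"), Node("b"), Node("c")
    for n in (a, b, c):
        list_add_tail(n, head)
    return head, a, b, c


# Case 1: neighbours unchanged, so the second list_del() is harmless.
head, a, b, c = build()
list_del(b)
still_looks_linked = not list_empty(b)   # stale pointers remain
list_del(b)
harmless = names(head)

# Case 2: a neighbour went away in between; the second list_del()
# writes through b's stale pointers and links the dead entry back in.
head, a, b, c = build()
list_del(b)
list_del(c)          # pretend c is freed after this
list_del(b)          # stale b.next still points at c
corrupted = names(head)

# Case 3: with list_del_init() a removed entry points at itself, so a
# repeated removal touches only that entry and the list stays intact.
head, a, b, c = build()
list_del_init(b)
list_del_init(c)
list_del_init(b)
safe = names(head)

print(harmless, corrupted, safe)
```

In case 2 the forward walk reaches c again even though c was already
removed, which is the "next dentry might very well be freed" situation;
in the kernel that entry's memory could have been reused by then.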

I just had to reboot, after about 8 hours of uptime with only the
'one liner' on top of 2.6.8-rc3.  Out of memory, basically.  tvtime
and mozilla were casualties of what must be the OOM killer.  Nothing
in the logs.  I was running seti@home, X, kde3.3-beta2 and its kmail,
plus top, tail, tvtime and mozilla.  Moz died first, or at least
that's what I noticed first.

-- 
Cheers, Gene

* Re: Possible dcache BUG
  2004-08-06  3:24         ` Linus Torvalds
  2004-08-08  4:42           ` Gene Heskett
@ 2004-08-08 14:30           ` Gene Heskett
  2004-08-08 18:39             ` Andrew Morton
  1 sibling, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-08 14:30 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds, viro, Andrew Morton

On Thursday 05 August 2004 23:24, Linus Torvalds wrote:

[...]

>I'll commit the obvious one-liner fix, since it might explain _some_
>problems people have seen.
>
>		Linus

I had to reboot late last night: out of memory, and things (like
mozilla 1.7.2) were dying, but nothing in the logs.  Nearly out
again now, ~40 megs free, but so far it's stable & nothing in swap.
I'm getting the impression there is a memory leak somewhere.  The OOM
killer hasn't killed anything I am using at this time anyway.

It's running like an arthritic dog though, 3 units for seti yesterday,
should be 6 to 7.  The gkrellm2 cpu usage display looks plumb normal,
so I'm a bit puzzled as to why the slowdown; the rest of the system
'feels good'.

This is with just the 'one liner' on top of rc3 & non-verbose-debug.  
The question is, is rc3-mm1 ready for *me* to try?

I don't want to be the hangup, holding up forward progress, but it 
appears I (& maybe 1 or 2 others) may be exactly that with all this 
time sitting around waiting for the other shoe to drop.

-- 
Cheers, Gene

* Re: Possible dcache BUG
  2004-08-08 14:30           ` Gene Heskett
@ 2004-08-08 18:39             ` Andrew Morton
  2004-08-10  4:12               ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: Andrew Morton @ 2004-08-08 18:39 UTC (permalink / raw)
  To: gene.heskett; +Cc: linux-kernel, torvalds, viro

Gene Heskett <gene.heskett@verizon.net> wrote:
>
> On Thursday 05 August 2004 23:24, Linus Torvalds wrote:
> 
> [...]
> 
> >I'll commit the obvious one-liner fix, since it might explain _some_
> >problems people have seen.
> >
> >		Linus
> 
> I had to reboot late last night, out of memory and things (like 
> mozilla (1.7.2) were dying, but nothing in the logs.

Please wait for it to happen again, then send the contents of
/proc/meminfo, /proc/slabinfo and then do

	su
	dmesg -c
	echo m > /proc/sysrq-trigger
	dmesg > foo

and send foo as well.
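
For anyone wanting to keep an eye on the numbers before things start
dying, a rough sketch of reading /proc/meminfo-style text follows. It
assumes the "Key:   value kB" line layout; the function names are
hypothetical and the sample values are made up for illustration, not
taken from any machine in this thread:

```python
def parse_meminfo(text):
    """Parse 'Key:   value kB' lines into a dict of integer values."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Lines without a colon, or whose first field is not a plain
        # number, are skipped rather than guessed at.
        if not sep or not fields or not fields[0].isdigit():
            continue
        info[key.strip()] = int(fields[0])
    return info


def free_mb(info):
    """Return MemFree in whole megabytes (0 if the key is missing)."""
    return info.get("MemFree", 0) // 1024


# Hypothetical sample, NOT real data from this thread.
sample = """\
MemTotal:      1035000 kB
MemFree:         40000 kB
SwapTotal:     2000000 kB
SwapFree:      2000000 kB
"""
info = parse_meminfo(sample)
print(info["MemFree"], free_mb(info))
```

On a live system the same function could be fed the text read from
/proc/meminfo, logged periodically, so that the last values survive in
a file even if the shells die.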


* Re: Possible dcache BUG
  2004-08-08 18:39             ` Andrew Morton
@ 2004-08-10  4:12               ` Gene Heskett
  2004-08-11  3:42                 ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-10  4:12 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, torvalds, viro

On Sunday 08 August 2004 14:39, Andrew Morton wrote:
>Gene Heskett <gene.heskett@verizon.net> wrote:
>> On Thursday 05 August 2004 23:24, Linus Torvalds wrote:
>>
>> [...]
>>
>> >I'll commit the obvious one-liner fix, since it might explain
>> > _some_ problems people have seen.
>> >
>> >		Linus
>>
>> I had to reboot late last night, out of memory and things (like
>> mozilla (1.7.2) were dying, but nothing in the logs.
>
>Please wait for it to happen again, then send the contents of
>/proc/meminfo, /proc/slabinfo and then do
>
>	su
>	dmesg -c
>	echo m > /proc/sysrq-trigger
>	dmesg > foo
>
>and send foo as well.

I just had to reboot again.  Top was showing about 50 megs free, and 
there was about 60 megs in the swap.  Top wasn't showing anything 
else of interest that I noted.

I've been gone all day, a long day at that, 12 hours.  We had another
blowup in the high voltage at the tv transmitter, and it'll be sometime
tomorrow before we get things back to normal there.  It's 40 years old,
and quite far up the far end of the "bathtub curve".

I left about 10:15 and came back in about 22:30.  A friend had been 
trying to reach me over an alsa problem, and I'd opened a shell and 
was showing him how the new 2.6 modprobe.conf worked.  When we were 
done, I hit a q to quit less, and (surprise) the whole shell went 
away, and I could not start another shell, each attempt being 
reported as an error 5 on the kickstart panel at the bottom of the 
screen after the new window opened and reclosed in about 100 
milliseconds per attempt.  I quit the top program to free that shell, 
and thinking maybe I was being attacked, entered a 'w' at the prompt, 
and that shell went away too, with the same error.  That left me with 
the tail on the log, which at that point still wasn't showing me 
anything but a samba restart I do once daily else it dies from a 
profound lack of interest anyway.

I right clicked on the screen and selected quit X.

It quit, but then a trap error was reported.

I typed "reboot" and the machine reported no more processes at this
run level and was then DOA, requiring a tap on the reset button to
bring it back to life.  On the subsequent e2fsck's, /dev/hda8 had
this error:

i_dir_acl for inode 654880 (/lib/local/ar_YE is 42752 but s/b zero.

It then dropped me to a shell to run e2fsck without any options,
which I did.  Eventually it asked me if I wanted to clear that inode,
so I answered 'y' and it finished without any other errors, but when
I did the ctrl-d to reboot, it still wanted to do an e2fsck on
everything, which passed.  So now I'm rebooted, but without anything
of meaning (there is nothing in the logs) to report.  Any evidence of
the debacle is now gone.

Also, during the reboot, I'm blind from "ok, booting the kernel" until
whatever sets the default font runs and sets it to "lat0_sun16", at
which point I have readable info on the screen again.  I don't recall
seeing that particular font mentioned in a make xconfig, so I've no
idea how to make the kernel use it from square one so I can read the
dmesg as it goes by the first time.  I have iso-8859-1 compiled in,
along with codepage 437 for US usage, with everything else as modular.

How can I fix this "blind" time?

-- 
Cheers, Gene

* Re: Possible dcache BUG
  2004-08-10  4:12               ` Gene Heskett
@ 2004-08-11  3:42                 ` Gene Heskett
  2004-08-11  3:46                   ` Linus Torvalds
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-11  3:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, torvalds, viro

On Tuesday 10 August 2004 00:12, Gene Heskett wrote:
>On Sunday 08 August 2004 14:39, Andrew Morton wrote:
>>Gene Heskett <gene.heskett@verizon.net> wrote:
>>> On Thursday 05 August 2004 23:24, Linus Torvalds wrote:
>>>
>>> [...]
>>>
>>> >I'll commit the obvious one-liner fix, since it might explain
>>> > _some_ problems people have seen.
>>> >
>>> >		Linus

Linus, I hate to be a killjoy on this, but I just had to reboot again, 
it was killing processes, even first the shells I had open then kmail 
and X this time, but with nothing in the logs, and when X had quit, a 
top in the launching shell reported nearly 250 megs free with nothing 
in the swap.

So I'm not getting any useful data, and the machine is dog slow:
real    17m51.460s
user    13m11.201s
sys     1m34.718s

That should have been 6 minutes maximum.

I got rc4 as the whole thing just now; maybe there was something wrong 
with the 2.6.7 base I was using.  That's rare since I quit getting 
the .bz2's and switched to tar.gz's, which seem to be the more 
dependable format here.

>>> I had to reboot late last night, out of memory and things (like
>>> mozilla (1.7.2) were dying, but nothing in the logs.
>>
>>Please wait for it to happen again, then send the contents of
>>/proc/meminfo, /proc/slabinfo and then do
>>
>>	su
>>	dmesg -c
>>	echo m > /proc/sysrq-trigger
>>	dmesg > foo
>>
>>and send foo as well.

The above was not available (X wouldn't restart), and trying to print 
from any kde app causes the app, and its launcher, to exit.  So I 
don't have a paper copy and my memory isn't photographic, please 
accept my apologies on this.  Maybe rc4 will also do it.  We'll find 
out I guess.  Reboot time again.

[...]

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

* Re: Possible dcache BUG
  2004-08-11  3:42                 ` Gene Heskett
@ 2004-08-11  3:46                   ` Linus Torvalds
  2004-08-11  4:18                     ` Udo A. Steinberg
  2004-08-11  4:47                     ` Gene Heskett
  0 siblings, 2 replies; 146+ messages in thread
From: Linus Torvalds @ 2004-08-11  3:46 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, Andrew Morton, viro



On Tue, 10 Aug 2004, Gene Heskett wrote:
> 
> Linus, I hate to be a killjoy on this, but I just had to reboot again, 

Note that this is something else going on. The "obvious one-liner" can be 
an issue only with certain special XFS stuff or knfsd, neither of which 
you have.

> it was killing processes, even first the shells I had open then kmail 
> and X this time, but with nothing in the logs, and when X had quit, a 
> top in the launching shell reported nearly 250 megs free with nothing 
> in the swap.

As Andrew already requested, the only way for us to figure out what is 
wrong is to get output from you on where the memory has gone. Notably, the 
output of "/proc/meminfo" and "/proc/slabinfo". "ps axm" helps too.

If it is slow, the above will still work. Just save them away and reboot. 

		Linus
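
The request above (save /proc/meminfo, /proc/slabinfo and "ps axm", then
reboot) can be scripted in advance so it is a single command on a machine
that is already crawling. What follows is only a minimal sketch: the
function name snapshot(), the "memsnap-" directory prefix and the choice
of output directory are illustrative assumptions, not anything from this
thread.

```python
import subprocess
import time
from pathlib import Path

# The two /proc files requested in the thread.
PROC_FILES = ("/proc/meminfo", "/proc/slabinfo")


def snapshot(out_dir, files=PROC_FILES, commands=(("ps", "axm"),)):
    """Copy each file and each command's output into a timestamped directory.

    The function name and directory layout are hypothetical; only the
    file names and the "ps axm" command come from the thread.
    """
    dest = Path(out_dir) / time.strftime("memsnap-%Y%m%d-%H%M%S")
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for name in files:
        target = dest / Path(name).name
        try:
            target.write_text(Path(name).read_text())
        except OSError as err:
            # /proc/slabinfo may be readable by root only; note it and go on.
            target.write_text(f"could not read {name}: {err}\n")
        saved.append(target)
    for argv in commands:
        target = dest / ("_".join(argv) + ".txt")
        try:
            result = subprocess.run(argv, capture_output=True, text=True)
            target.write_text(result.stdout + result.stderr)
        except OSError as err:
            target.write_text(f"could not run {' '.join(argv)}: {err}\n")
        saved.append(target)
    return saved


# Example (run as root so /proc/slabinfo is readable):
#     for path in snapshot("/var/tmp"):
#         print(path)
```

Writing to a local directory rather than printing keeps the data across
the reboot even if X and the terminal emulators have already died.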

* Re: Possible dcache BUG
  2004-08-11  3:46                   ` Linus Torvalds
@ 2004-08-11  4:18                     ` Udo A. Steinberg
  2004-08-11  5:13                       ` Linus Torvalds
  2004-08-11  4:47                     ` Gene Heskett
  1 sibling, 1 reply; 146+ messages in thread
From: Udo A. Steinberg @ 2004-08-11  4:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton, viro

[-- Attachment #1: Type: text/plain, Size: 20973 bytes --]

On Tue, 10 Aug 2004 20:46:33 -0700 (PDT) Linus Torvalds (LT) wrote:

I'm currently using 2.6.8-rc4 and I'm seeing the same problem. Each day the
machine just gets slower and swappier, even though I'm always running the same
workload. Rebooting helps a lot. The machine has very little memory (128MB).

LT> As Andrew already requested, the only way for us to figure out what is 
LT> wrong is to get output from you on where the memory has gone. Notably, the 
LT> output of "/proc/meminfo" and "/proc/slabinfo". "ps axm" helps too.

See below.

-Udo.


MemTotal:       125124 kB
MemFree:          1404 kB
Buffers:         19060 kB
Cached:          40484 kB
SwapCached:      33336 kB
Active:          70176 kB
Inactive:        41892 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       125124 kB
LowFree:          1404 kB
SwapTotal:      506512 kB
SwapFree:       455536 kB
Dirty:               4 kB
Writeback:           0 kB
Mapped:          65312 kB
Slab:             9068 kB
Committed_AS:    99576 kB
PageTables:        704 kB
VmallocTotal:   909268 kB
VmallocUsed:      8936 kB
VmallocChunk:   900312 kB
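
For readers who want to work with the figures above rather than eyeball
them, here is a small sketch that turns /proc/meminfo text into a
dictionary and derives two numbers from it. The helper names
parse_meminfo() and summarize() are made up for illustration, and "memory
not explained by free pages, buffers or page cache" is just one possible
way to slice the data, not a diagnosis.

```python
def parse_meminfo(text):
    """Map each "Key:   value kB" line to an integer (the unit is dropped)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            values[key.strip()] = int(fields[0])
    return values


def summarize(mem):
    """Derive swap in use and memory not covered by free/buffers/cache."""
    return {
        "swap_used_kb": mem["SwapTotal"] - mem["SwapFree"],
        "non_cache_kb": (mem["MemTotal"] - mem["MemFree"]
                         - mem["Buffers"] - mem["Cached"]),
    }


# A few of the values reported above, copied verbatim.
SAMPLE = """\
MemTotal:       125124 kB
MemFree:          1404 kB
Buffers:         19060 kB
Cached:          40484 kB
SwapTotal:      506512 kB
SwapFree:       455536 kB
"""

if __name__ == "__main__":
    print(summarize(parse_meminfo(SAMPLE)))
```

With the sample values this gives 50976 kB of swap in use and 64176 kB of
RAM that is neither free, buffers, nor page cache.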

slabinfo - version: 2.0
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
rpc_buffers            8      8   2048    2    1 : tunables   24   12    0 : slabdata      4      4      0
rpc_tasks              8     25    160   25    1 : tunables  120   60    0 : slabdata      1      1      0
rpc_inode_cache        0      0    416    9    1 : tunables   54   27    0 : slabdata      0      0      0
xfrm6_tunnel_spi       0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
fib6_nodes             5    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
ip6_dst_cache          5     18    224   18    1 : tunables  120   60    0 : slabdata      1      1      0
ndisc_cache            1     25    160   25    1 : tunables  120   60    0 : slabdata      1      1      0
raw6_sock              0      0    640    6    1 : tunables   54   27    0 : slabdata      0      0      0
udp6_sock              0      0    608    6    1 : tunables   54   27    0 : slabdata      0      0      0
tcp6_sock              6      7   1120    7    2 : tunables   24   12    0 : slabdata      1      1      0
unix_sock             42     50    384   10    1 : tunables   54   27    0 : slabdata      5      5      0
ip_conntrack           4     25    160   25    1 : tunables  120   60    0 : slabdata      1      1      0
tcp_tw_bucket          2     31    128   31    1 : tunables  120   60    0 : slabdata      1      1      0
tcp_bind_bucket       10    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
tcp_open_request       0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
inet_peer_cache        0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
secpath_cache          0      0    128   31    1 : tunables  120   60    0 : slabdata      0      0      0
xfrm_dst_cache         0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
ip_fib_hash            9    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
ip_dst_cache          10     30    256   15    1 : tunables  120   60    0 : slabdata      2      2      0
arp_cache              1     31    128   31    1 : tunables  120   60    0 : slabdata      1      1      0
raw4_sock              0      0    480    8    1 : tunables   54   27    0 : slabdata      0      0      0
udp_sock               1      7    512    7    1 : tunables   54   27    0 : slabdata      1      1      0
tcp_sock              11     16   1024    4    1 : tunables   54   27    0 : slabdata      4      4      0
flow_cache             0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
uhci_urb_priv          0      0     44   88    1 : tunables  120   60    0 : slabdata      0      0      0
ntfs_big_inode_cache      0      0    448    9    1 : tunables   54   27    0 : slabdata      0      0      0
ntfs_inode_cache       0      0    160   25    1 : tunables  120   60    0 : slabdata      0      0      0
ntfs_name_cache        0      0    512    8    1 : tunables   54   27    0 : slabdata      0      0      0
ntfs_attr_ctx_cache      0      0     32  119    1 : tunables  120   60    0 : slabdata      0      0      0
ntfs_index_ctx_cache      0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
smb_request            0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
smb_inode_cache        0      0    320   12    1 : tunables   54   27    0 : slabdata      0      0      0
nfs_write_data        36     36    448    9    1 : tunables   54   27    0 : slabdata      4      4      0
nfs_read_data         32     36    416    9    1 : tunables   54   27    0 : slabdata      4      4      0
nfs_inode_cache        0      0    544    7    1 : tunables   54   27    0 : slabdata      0      0      0
nfs_page               0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
isofs_inode_cache      0      0    320   12    1 : tunables   54   27    0 : slabdata      0      0      0
fat_inode_cache        0      0    352   11    1 : tunables   54   27    0 : slabdata      0      0      0
ext2_inode_cache       0      0    416    9    1 : tunables   54   27    0 : slabdata      0      0      0
ext2_xattr             0      0     44   88    1 : tunables  120   60    0 : slabdata      0      0      0
journal_handle        16    135     28  135    1 : tunables  120   60    0 : slabdata      1      1      0
journal_head          46    162     48   81    1 : tunables  120   60    0 : slabdata      2      2      0
revoke_table          12    290     12  290    1 : tunables  120   60    0 : slabdata      1      1      0
revoke_record          0      0     16  226    1 : tunables  120   60    0 : slabdata      0      0      0
ext3_inode_cache     844   1359    448    9    1 : tunables   54   27    0 : slabdata    151    151      0
ext3_xattr             0      0     44   88    1 : tunables  120   60    0 : slabdata      0      0      0
eventpoll_pwq          0      0     36  107    1 : tunables  120   60    0 : slabdata      0      0      0
eventpoll_epi          0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
kioctx                 0      0    160   25    1 : tunables  120   60    0 : slabdata      0      0      0
kiocb                  0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
dnotify_cache          0      0     20  185    1 : tunables  120   60    0 : slabdata      0      0      0
file_lock_cache        1     43     92   43    1 : tunables  120   60    0 : slabdata      1      1      0
fasync_cache           2    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
shmem_inode_cache      8     10    384   10    1 : tunables   54   27    0 : slabdata      1      1      0
posix_timers_cache      0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
uid_cache              2    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
cfq_pool              64    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
crq_pool               0      0     36  107    1 : tunables  120   60    0 : slabdata      0      0      0
deadline_drq           0      0     48   81    1 : tunables  120   60    0 : slabdata      0      0      0
as_arq                 4     65     60   65    1 : tunables  120   60    0 : slabdata      1      1      0
blkdev_ioc            43    185     20  185    1 : tunables  120   60    0 : slabdata      1      1      0
blkdev_queue           9      9    448    9    1 : tunables   54   27    0 : slabdata      1      1      0
blkdev_requests        4     26    152   26    1 : tunables  120   60    0 : slabdata      1      1      0
biovec-(256)          60     60   3072    2    2 : tunables   24   12    0 : slabdata     30     30      0
biovec-128           121    125   1536    5    2 : tunables   24   12    0 : slabdata     25     25      0
biovec-64            242    245    768    5    1 : tunables   54   27    0 : slabdata     49     49      0
biovec-16            242    260    192   20    1 : tunables  120   60    0 : slabdata     13     13      0
biovec-4             242    244     64   61    1 : tunables  120   60    0 : slabdata      4      4      0
biovec-1             242    452     16  226    1 : tunables  120   60    0 : slabdata      2      2      0
bio                  259    366     64   61    1 : tunables  120   60    0 : slabdata      6      6      0
sock_inode_cache      65     77    352   11    1 : tunables   54   27    0 : slabdata      7      7      0
skbuff_head_cache    520    580    192   20    1 : tunables  120   60    0 : slabdata     29     29      0
sock                   4     12    320   12    1 : tunables   54   27    0 : slabdata      1      1      0
proc_inode_cache     360    360    320   12    1 : tunables   54   27    0 : slabdata     30     30      0
sigqueue              16     27    148   27    1 : tunables  120   60    0 : slabdata      1      1      0
radix_tree_node     1934   2044    276   14    1 : tunables   54   27    0 : slabdata    146    146      0
bdev_cache            10     18    416    9    1 : tunables   54   27    0 : slabdata      2      2      0
mnt_cache             23     41     96   41    1 : tunables  120   60    0 : slabdata      1      1      0
inode_cache         1069   1078    288   14    1 : tunables   54   27    0 : slabdata     77     77      0
dentry_cache        2432   4368    140   28    1 : tunables  120   60    0 : slabdata    156    156      0
filp                 800    800    160   25    1 : tunables  120   60    0 : slabdata     32     32      0
names_cache            8      8   4096    1    1 : tunables   24   12    0 : slabdata      8      8      0
idr_layer_cache       84     87    136   29    1 : tunables  120   60    0 : slabdata      3      3      0
buffer_head        33719  36126     48   81    1 : tunables  120   60    0 : slabdata    446    446      0
mm_struct             56     56    512    7    1 : tunables   54   27    0 : slabdata      8      8      0
vm_area_struct      1619   1692     84   47    1 : tunables  120   60    0 : slabdata     36     36      0
fs_cache              59    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
files_cache           54     54    416    9    1 : tunables   54   27    0 : slabdata      6      6      0
signal_cache          95    123     96   41    1 : tunables  120   60    0 : slabdata      3      3      0
sighand_cache         63     63   1312    3    1 : tunables   24   12    0 : slabdata     21     21      0
task_struct           82     85   1424    5    2 : tunables   24   12    0 : slabdata     17     17      0
anon_vma             762    814      8  407    1 : tunables  120   60    0 : slabdata      2      2      0
pgd                   46     46   4096    1    1 : tunables   24   12    0 : slabdata     46     46      0
size-131072(DMA)       0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-131072            0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-65536(DMA)        0      0  65536    1   16 : tunables    8    4    0 : slabdata      0      0      0
size-65536             0      0  65536    1   16 : tunables    8    4    0 : slabdata      0      0      0
size-32768(DMA)        0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
size-32768            28     28  32768    1    8 : tunables    8    4    0 : slabdata     28     28      0
size-16384(DMA)        0      0  16384    1    4 : tunables    8    4    0 : slabdata      0      0      0
size-16384             3      3  16384    1    4 : tunables    8    4    0 : slabdata      3      3      0
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
size-8192             84     84   8192    1    2 : tunables    8    4    0 : slabdata     84     84      0
size-4096(DMA)         0      0   4096    1    1 : tunables   24   12    0 : slabdata      0      0      0
size-4096             53     53   4096    1    1 : tunables   24   12    0 : slabdata     53     53      0
size-2048(DMA)         0      0   2048    2    1 : tunables   24   12    0 : slabdata      0      0      0
size-2048            118    118   2048    2    1 : tunables   24   12    0 : slabdata     59     59      0
size-1024(DMA)         0      0   1024    4    1 : tunables   54   27    0 : slabdata      0      0      0
size-1024            184    184   1024    4    1 : tunables   54   27    0 : slabdata     46     46      0
size-512(DMA)          0      0    512    8    1 : tunables   54   27    0 : slabdata      0      0      0
size-512             383    568    512    8    1 : tunables   54   27    0 : slabdata     71     71      0
size-256(DMA)          0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
size-256             252    480    256   15    1 : tunables  120   60    0 : slabdata     32     32      0
size-192(DMA)          0      0    192   20    1 : tunables  120   60    0 : slabdata      0      0      0
size-192             200    200    192   20    1 : tunables  120   60    0 : slabdata     10     10      0
size-128(DMA)          0      0    128   31    1 : tunables  120   60    0 : slabdata      0      0      0
size-128             335    372    128   31    1 : tunables  120   60    0 : slabdata     12     12      0
size-96(DMA)           0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
size-96             1655   1681     96   41    1 : tunables  120   60    0 : slabdata     41     41      0
size-64(DMA)           0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
size-64             1782   2013     64   61    1 : tunables  120   60    0 : slabdata     33     33      0
size-32(DMA)           0      0     32  119    1 : tunables  120   60    0 : slabdata      0      0      0
size-32             2677   2737     32  119    1 : tunables  120   60    0 : slabdata     23     23      0
kmem_cache           124    124    128   31    1 : tunables  120   60    0 : slabdata      4      4      0
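
The slabinfo table above is easier to read once it is ranked by how much
memory each cache holds. The sketch below does that for the "version: 2.0"
layout shown in the header line, taking memory per cache as
num_slabs x pagesperslab x page size. The 4 KiB page size is an assumption
(it holds on i386), and the function names are illustrative only.

```python
PAGE_KIB = 4  # assumption: 4 KiB pages, as on i386


def parse_slabinfo(text):
    """Return {cache name: KiB held in slabs} for slabinfo 2.0 text."""
    usage = {}
    for line in text.splitlines():
        # Skip the version banner, the column header and blank lines.
        if line.startswith(("slabinfo", "#")) or not line.strip():
            continue
        head, _tunables, slabdata = line.split(":")
        fields = head.split()
        name, pages_per_slab = fields[0], int(fields[5])
        num_slabs = int(slabdata.split()[2])
        usage[name] = num_slabs * pages_per_slab * PAGE_KIB
    return usage


def top_caches(usage, count=5):
    """Largest caches first, as (name, KiB) pairs."""
    ranked = sorted(usage.items(), key=lambda item: item[1], reverse=True)
    return ranked[:count]


# Three rows copied from the table above.
SAMPLE = """\
slabinfo - version: 2.0
dentry_cache        2432   4368    140   28    1 : tunables  120   60    0 : slabdata    156    156      0
buffer_head        33719  36126     48   81    1 : tunables  120   60    0 : slabdata    446    446      0
size-32768            28     28  32768    1    8 : tunables    8    4    0 : slabdata     28     28      0
"""

if __name__ == "__main__":
    for name, kib in top_caches(parse_slabinfo(SAMPLE)):
        print(f"{name:20s} {kib:6d} KiB")
```

For the three sample rows this works out to 1784 KiB for buffer_head, 896
KiB for size-32768 and 624 KiB for dentry_cache, which can be compared
against the "Slab: 9068 kB" line in the meminfo output.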

  PID TTY      STAT   TIME COMMAND
    1 ?        -      0:01 init [4]             
    - -        S      0:01 -
    2 ?        -      0:00 [ksoftirqd/0]
    - -        SN     0:00 -
    3 ?        -      0:00 [events/0]
    - -        S<     0:00 -
    4 ?        -      0:00 [khelper]
    - -        S<     0:00 -
    5 ?        -      0:04 [kacpid]
    - -        S<     0:04 -
   22 ?        -      0:00 [kblockd/0]
    - -        S<     0:00 -
   23 ?        -      0:00 [khubd]
    - -        S      0:00 -
   37 ?        -      0:00 [pdflush]
    - -        S      0:00 -
   40 ?        -      0:00 [aio/0]
    - -        S<     0:00 -
   39 ?        -      0:02 [kswapd0]
    - -        S      0:02 -
  142 ?        -      0:00 [pccardd]
    - -        S      0:00 -
  144 ?        -      0:00 [pccardd]
    - -        S      0:00 -
  152 ?        -      0:00 [kseriod]
    - -        S      0:00 -
  171 ?        -      0:00 [kjournald]
    - -        S      0:00 -
  330 ?        -      0:00 [kjournald]
    - -        S      0:00 -
  331 ?        -      0:24 [loop0]
    - -        S<     0:24 -
  332 ?        -      0:00 [kjournald]
    - -        S      0:00 -
  333 ?        -      0:00 [kjournald]
    - -        S      0:00 -
  334 ?        -      0:00 [kjournald]
    - -        S      0:00 -
  335 ?        -      0:00 [kjournald]
    - -        S      0:00 -
  497 ?        -      0:00 /usr/sbin/syslogd -m 0
    - -        Ss     0:00 -
  511 ?        -      0:00 /usr/sbin/klogd -c 3 -x
    - -        Ss     0:00 -
  514 ?        -      0:00 /sbin/cardmgr
    - -        Ss     0:00 -
  857 ?        -      0:00 /sbin/rpc.portmap
    - -        Ss     0:00 -
  898 ?        -      0:00 /usr/sbin/inetd
    - -        Ss     0:00 -
  904 ?        -      0:00 /usr/local/sbin/sshd
    - -        Ss     0:00 -
  914 ?        -      0:00 /usr/sbin/crond -l10
    - -        S      0:00 -
  917 ?        -      0:00 /usr/sbin/acpid
    - -        Ss     0:00 -
  930 ?        -      0:00 /usr/sbin/gpm -m /dev/mouse -t ps2
    - -        Ss     0:00 -
  933 ?        -      0:00 /usr/sbin/smartd
    - -        S      0:00 -
  950 tty2     -      0:00 /sbin/agetty 38400 tty2 linux
    - -        Ss+    0:00 -
  951 tty3     -      0:00 /sbin/agetty 38400 tty3 linux
    - -        Ss+    0:00 -
  952 tty4     -      0:00 /sbin/agetty 38400 tty4 linux
    - -        Ss+    0:00 -
  953 tty5     -      0:00 /sbin/agetty 38400 tty5 linux
    - -        Ss+    0:00 -
  954 tty6     -      0:00 /sbin/agetty 38400 tty6 linux
    - -        Ss+    0:00 -
  955 tty7     -      0:00 /sbin/agetty 38400 tty7 linux
    - -        Ss+    0:00 -
  956 tty8     -      0:00 /sbin/agetty 38400 tty8 linux
    - -        Ss+    0:00 -
  957 tty9     -      0:00 /sbin/agetty 38400 tty9 linux
    - -        Ss+    0:00 -
  958 tty10    -      0:00 /sbin/agetty 38400 tty10 linux
    - -        Ss+    0:00 -
  959 ?        -      0:00 /usr/X11R6/bin/xdm -nodaemon
    - -        Ss     0:00 -
 1081 ?        -     19:54 /usr/X11R6/bin/X -auth /usr/X11R6/lib/X11/xdm/authdir/authfiles/A:0-yl5ncw
    - -        S     19:54 -
 1082 ?        -      0:00 -:0                         
    - -        S      0:00 -
 1181 ?        -      0:00 [eth1]
    - -        S      0:00 -
 1244 ?        -      0:00 /sbin/dhcpcd -d eth1
    - -        Ss     0:00 -
 1251 ?        -      0:01 blackbox
    - -        S      0:01 -
 1280 ?        -      6:18 /home/uas/bin/wmbatteries
    - -        S      6:18 -
 1282 ?        -      0:02 /home/uas/bin/wmcpuload -a -n -lc rgb:ff/ff/33
    - -        S      0:02 -
 1284 ?        -      2:24 /home/uas/bin/wmnetload -n eth1
    - -        S      2:24 -
 1286 ?        -      0:01 /home/uas/bin/wmmemload -am -b -c -lc rgb:ff/80/30
    - -        S      0:01 -
 1288 ?        -      0:12 /home/uas/bin/wmtime -lc rgb:33/33/ff
    - -        S      0:12 -
 1292 ?        -      0:00 /home/uas/bin/root-tail -f -g 350x10+5-10 -fn -schumacher-clean-medium-r-*-*-10-*-*-*-*-*-*-* -color rgb:cc/cc/ff /var/log/messages rgb:88/88/ff /var/log/syslog rgb:ff/88/ff /var/log/maillog
    - -        Ss     0:00 -
 1328 ?        -      0:00 licq
    - -        Ss     0:00 -
 1331 ?        -      0:00 licq
    - -        S      0:00 -
 1332 ?        -      0:06 licq
    - -        S      0:06 -
 1333 ?        -      0:00 licq
    - -        S      0:00 -
 1334 ?        -      0:00 licq
    - -        S      0:00 -
 1335 ?        -      0:04 licq
    - -        S      0:04 -
 1484 ?        -      0:00 /home/uas/bin/aterm -geometry 80x25
    - -        Ss     0:00 -
 1485 pts/1    -      0:00 -bash
    - -        Rs     0:00 -
 1508 tty1     -      0:00 /sbin/agetty 38400 tty1 linux
    - -        Ss+    0:00 -
 4232 ?        -      0:03 xmms
    - -        Ss     0:03 -
 4233 ?        -      0:00 xmms
    - -        S      0:00 -
 4234 ?        -      0:00 xmms
    - -        S      0:00 -
 4235 ?        -      0:00 xmms
    - -        S      0:00 -
 4265 ?        -      0:00 /bin/sh /usr/local/bin/firefox
    - -        Ss     0:00 -
 4277 ?        -      0:00 /bin/sh /usr/local/firefox/run-mozilla.sh /usr/local/firefox/firefox-bin
    - -        S      0:00 -
 4282 ?        -      0:30 /usr/local/firefox/firefox-bin
    - -        S      0:30 -
 4283 ?        -      0:00 /usr/local/firefox/firefox-bin
    - -        S      0:00 -
 4284 ?        -      0:00 /usr/local/firefox/firefox-bin
    - -        S      0:00 -
 4285 ?        -      0:00 /usr/local/firefox/firefox-bin
    - -        S      0:00 -
 4307 ?        -      0:00 [netstat] <defunct>
    - -        Z      0:00 -
 4335 ?        -      0:00 [pdflush]
    - -        S      0:00 -
 4386 ?        -      0:00 xmms
    - -        S      0:00 -
 4387 ?        -      0:00 xmms
    - -        S      0:00 -
 4441 ?        -      0:03 sylpheed
    - -        Ss     0:03 -
 4447 ?        -      0:00 /home/uas/bin/aterm -geometry 80x25
    - -        Ss     0:00 -
 4448 pts/2    -      0:00 -bash
    - -        Ss+    0:00 -
 4474 ?        -      0:00 /home/uas/bin/aterm -geometry 80x25
    - -        Ss     0:00 -
 4475 pts/0    -      0:00 -bash
    - -        Ss+    0:00 -
 4489 pts/1    -      0:00 ps axm
    - -        R+     0:00 -

* Re: Possible dcache BUG
  2004-08-11  3:46                   ` Linus Torvalds
  2004-08-11  4:18                     ` Udo A. Steinberg
@ 2004-08-11  4:47                     ` Gene Heskett
  2004-08-11  4:59                       ` Linus Torvalds
  1 sibling, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-11  4:47 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds, Andrew Morton, viro

[-- Attachment #1: Type: text/plain, Size: 3242 bytes --]

On Tuesday 10 August 2004 23:46, Linus Torvalds wrote:
>On Tue, 10 Aug 2004, Gene Heskett wrote:
>> Linus, I hate to be a killjoy on this, but I just had to reboot
>> again,
>
>Note that this is something else going on. The "obvious one-liner"
> can be an issue only with certain special XFS stuff or knfsd,
> neither of which you have.

I've come to the same conclusion.

>> it was killing processes, even first the shells I had open then
>> kmail and X this time, but with nothing in the logs, and when X
>> had quit, a top in the launching shell reported nearly 250 megs
>> free with nothing in the swap.
>
>As Andrew already requested, the only way for us to figure out what
> is wrong is to get output from you on where the memory has gone.
> Notably, the output of "/proc/meminfo" and "/proc/slabinfo". "ps
> axm" helps too.
>
I'll try and remember that.

>If it is slow, the above will still work. Just save them away and
> reboot.

And it appears to still be quite slow, on -rc4 now. I have a remake 
running just to double-check.  Also, the X startup was extremely 
slow, taking over 3 minutes before gkrellm was able to load its theme 
face, and kmail nearly 2 minutes to get started.  This is also the 
cfq scheduler which I hadn't tried in a while.

Just to quantify the slow, here is the rebuild time on another copy of 
2.6.8-rc4 while running 2.6.8-rc4:

real    27m50.631s
user    13m15.535s
sys     1m35.908s

That's nearly 10 minutes longer than the same build took on -rc2.  And 
over 22 minutes longer than it would take if I was running 2.6.7.
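
One way to read the timings in this thread is to subtract user + sys from
real, which gives the wall-clock time the build was not burning CPU. The
sketch below does only that arithmetic on the `time` output format shown
above. The helper names are made up, and reading the gap as "time spent
waiting" is an interpretation, not something the numbers prove.

```python
import re


def to_seconds(stamp):
    """Convert a shell `time` value such as "27m50.631s" to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def wait_seconds(real, user, sys_time):
    """Wall-clock seconds not accounted for by user + sys CPU time."""
    return to_seconds(real) - (to_seconds(user) + to_seconds(sys_time))


if __name__ == "__main__":
    # The two builds reported in the thread.
    runs = {
        "earlier build": ("17m51.460s", "13m11.201s", "1m34.718s"),
        "2.6.8-rc4 build": ("27m50.631s", "13m15.535s", "1m35.908s"),
    }
    for label, (real, user, sys_time) in runs.items():
        gap = wait_seconds(real, user, sys_time)
        print(f"{label}: {gap:.1f} s outside user+sys")
```

The CPU time is nearly identical in both runs (about 14m46s and 14m51s),
while the gap grows from roughly 3 minutes to roughly 13 minutes.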

This shouldn't be more than 6 minutes, so where is all the tire smoke 
I should see if it's spinning them that badly?  I must have something 
terribly, badly configured.  So the .config is attached.  This is a 
Biostar M7NCD-Pro, nforce2 SPP northbridge, MCP southbridge chipset, 
with an Athlon XP 2800+ and a gig of RAM, using the onboard audio 
(intel8x0) and ethernet (forcedeth).  Actual CPU clock according to 
dmesg is 2079 MHz.  Bogomips=4100+.

If this crashes too, I'll send those files if I can keep it going long 
enough to get them once I get rebooted, but the reboot will be to 
2.6.7, which at least leaves its muddy footprints all over the log 
when it goes bye-bye.  Unfortunately, when it goes, there's no chance 
to get that data; it's a total lockup.

Just for grins, /proc/cpuinfo:
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 10
model name      : AMD Athlon(tm) XP 2800+
stepping        : 0
cpu MHz         : 2079.940
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 4112.38

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

[-- Attachment #2: .config --]
[-- Type: text/plain, Size: 29571 bytes --]

#
# Automatically generated make config: don't edit
#
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_UID16=y
CONFIG_GENERIC_ISA_DMA=y

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
# CONFIG_CLEAN_COMPILE is not set
CONFIG_BROKEN=y
CONFIG_BROKEN_ON_SMP=y

#
# General setup
#
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_BSD_PROCESS_ACCT=y
# CONFIG_BSD_PROCESS_ACCT_V3 is not set
CONFIG_SYSCTL=y
# CONFIG_AUDIT is not set
CONFIG_LOG_BUF_SHIFT=14
CONFIG_HOTPLUG=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_EMBEDDED is not set
CONFIG_KALLSYMS=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=y
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
CONFIG_OBSOLETE_MODPARM=y
CONFIG_MODVERSIONS=y
CONFIG_KMOD=y

#
# Processor type and features
#
CONFIG_X86_PC=y
# CONFIG_X86_ELAN is not set
# CONFIG_X86_VOYAGER is not set
# CONFIG_X86_NUMAQ is not set
# CONFIG_X86_SUMMIT is not set
# CONFIG_X86_BIGSMP is not set
# CONFIG_X86_VISWS is not set
# CONFIG_X86_GENERICARCH is not set
# CONFIG_X86_ES7000 is not set
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
CONFIG_MK7=y
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_X86_GENERIC is not set
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_INTEL_USERCOPY=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_USE_3DNOW=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
# CONFIG_SMP is not set
CONFIG_PREEMPT=y
# CONFIG_X86_UP_APIC is not set
CONFIG_X86_TSC=y
# CONFIG_X86_MCE is not set
# CONFIG_TOSHIBA is not set
# CONFIG_I8K is not set
# CONFIG_MICROCODE is not set
# CONFIG_X86_MSR is not set
CONFIG_X86_CPUID=y

#
# Firmware Drivers
#
# CONFIG_EDD is not set
# CONFIG_NOHIGHMEM is not set
CONFIG_HIGHMEM4G=y
# CONFIG_HIGHMEM64G is not set
CONFIG_HIGHMEM=y
CONFIG_HIGHPTE=y
# CONFIG_MATH_EMULATION is not set
CONFIG_MTRR=y
CONFIG_HAVE_DEC_LOCK=y
# CONFIG_REGPARM is not set

#
# Power management options (ACPI, APM)
#
CONFIG_PM=y
# CONFIG_SOFTWARE_SUSPEND is not set
# CONFIG_PM_DISK is not set

#
# ACPI (Advanced Configuration and Power Interface) Support
#
# CONFIG_ACPI is not set
CONFIG_ACPI_BOOT=y

#
# APM (Advanced Power Management) BIOS Support
#
CONFIG_APM=y
# CONFIG_APM_IGNORE_USER_SUSPEND is not set
# CONFIG_APM_DO_ENABLE is not set
# CONFIG_APM_CPU_IDLE is not set
# CONFIG_APM_DISPLAY_BLANK is not set
CONFIG_APM_RTC_IS_GMT=y
# CONFIG_APM_ALLOW_INTS is not set
CONFIG_APM_REAL_MODE_POWER_OFF=y

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set

#
# Bus options (PCI, PCMCIA, EISA, MCA, ISA)
#
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
# CONFIG_PCI_GOMMCONFIG is not set
# CONFIG_PCI_GODIRECT is not set
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_LEGACY_PROC=y
CONFIG_PCI_NAMES=y
CONFIG_ISA=y
# CONFIG_EISA is not set
# CONFIG_MCA is not set
# CONFIG_SCx200 is not set

#
# PCMCIA/CardBus support
#
# CONFIG_PCMCIA is not set
CONFIG_PCMCIA_PROBE=y

#
# PCI Hotplug Support
#
# CONFIG_HOTPLUG_PCI is not set

#
# Executable file formats
#
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_AOUT=y
CONFIG_BINFMT_MISC=y

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
# CONFIG_FW_LOADER is not set

#
# Memory Technology Devices (MTD)
#
# CONFIG_MTD is not set

#
# Parallel port support
#
CONFIG_PARPORT=y
CONFIG_PARPORT_PC=y
CONFIG_PARPORT_PC_CML1=y
# CONFIG_PARPORT_SERIAL is not set
# CONFIG_PARPORT_PC_FIFO is not set
# CONFIG_PARPORT_PC_SUPERIO is not set
# CONFIG_PARPORT_OTHER is not set
CONFIG_PARPORT_1284=y

#
# Plug and Play support
#
CONFIG_PNP=y
CONFIG_PNP_DEBUG=y

#
# Protocols
#
CONFIG_ISAPNP=y
CONFIG_PNPBIOS=y
CONFIG_PNPBIOS_PROC_FS=y

#
# Block devices
#
CONFIG_BLK_DEV_FD=y
# CONFIG_BLK_DEV_XD is not set
# CONFIG_PARIDE is not set
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
CONFIG_BLK_DEV_LOOP=y
# CONFIG_BLK_DEV_CRYPTOLOOP is not set
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_SX8 is not set
# CONFIG_BLK_DEV_RAM is not set
# CONFIG_LBD is not set

#
# ATA/ATAPI/MFM/RLL support
#
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y

#
# Please see Documentation/ide.txt for help/info on IDE drives
#
# CONFIG_BLK_DEV_IDE_SATA is not set
# CONFIG_BLK_DEV_HD_IDE is not set
CONFIG_BLK_DEV_IDEDISK=y
# CONFIG_IDEDISK_MULTI_MODE is not set
CONFIG_BLK_DEV_IDECD=y
# CONFIG_BLK_DEV_IDETAPE is not set
# CONFIG_BLK_DEV_IDEFLOPPY is not set
# CONFIG_BLK_DEV_IDESCSI is not set
# CONFIG_IDE_TASK_IOCTL is not set
# CONFIG_IDE_TASKFILE_IO is not set

#
# IDE chipset support/bugfixes
#
# CONFIG_IDE_GENERIC is not set
# CONFIG_BLK_DEV_CMD640 is not set
# CONFIG_BLK_DEV_IDEPNP is not set
CONFIG_BLK_DEV_IDEPCI=y
# CONFIG_IDEPCI_SHARE_IRQ is not set
# CONFIG_BLK_DEV_OFFBOARD is not set
# CONFIG_BLK_DEV_GENERIC is not set
# CONFIG_BLK_DEV_OPTI621 is not set
# CONFIG_BLK_DEV_RZ1000 is not set
CONFIG_BLK_DEV_IDEDMA_PCI=y
# CONFIG_BLK_DEV_IDEDMA_FORCED is not set
CONFIG_IDEDMA_PCI_AUTO=y
# CONFIG_IDEDMA_ONLYDISK is not set
CONFIG_BLK_DEV_ADMA=y
# CONFIG_BLK_DEV_AEC62XX is not set
# CONFIG_BLK_DEV_ALI15X3 is not set
CONFIG_BLK_DEV_AMD74XX=y
# CONFIG_BLK_DEV_ATIIXP is not set
# CONFIG_BLK_DEV_CMD64X is not set
# CONFIG_BLK_DEV_TRIFLEX is not set
# CONFIG_BLK_DEV_CY82C693 is not set
# CONFIG_BLK_DEV_CS5520 is not set
# CONFIG_BLK_DEV_CS5530 is not set
# CONFIG_BLK_DEV_HPT34X is not set
# CONFIG_BLK_DEV_HPT366 is not set
# CONFIG_BLK_DEV_SC1200 is not set
# CONFIG_BLK_DEV_PIIX is not set
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_PDC202XX_OLD is not set
# CONFIG_BLK_DEV_PDC202XX_NEW is not set
# CONFIG_BLK_DEV_SVWKS is not set
# CONFIG_BLK_DEV_SIIMAGE is not set
# CONFIG_BLK_DEV_SIS5513 is not set
# CONFIG_BLK_DEV_SLC90E66 is not set
# CONFIG_BLK_DEV_TRM290 is not set
# CONFIG_BLK_DEV_VIA82CXXX is not set
# CONFIG_IDE_ARM is not set
# CONFIG_IDE_CHIPSETS is not set
CONFIG_BLK_DEV_IDEDMA=y
# CONFIG_IDEDMA_IVB is not set
CONFIG_IDEDMA_AUTO=y
# CONFIG_BLK_DEV_HD is not set

#
# SCSI device support
#
CONFIG_SCSI=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=m
CONFIG_CHR_DEV_ST=m
# CONFIG_CHR_DEV_OSST is not set
# CONFIG_BLK_DEV_SR is not set
CONFIG_CHR_DEV_SG=m

#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
CONFIG_SCSI_MULTI_LUN=y
CONFIG_SCSI_CONSTANTS=y
# CONFIG_SCSI_LOGGING is not set

#
# SCSI Transport Attributes
#
# CONFIG_SCSI_SPI_ATTRS is not set
# CONFIG_SCSI_FC_ATTRS is not set

#
# SCSI low-level drivers
#
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_7000FASST is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AHA152X is not set
# CONFIG_SCSI_AHA1542 is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC7XXX_OLD is not set
# CONFIG_SCSI_AIC79XX is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_ADVANSYS is not set
# CONFIG_SCSI_IN2000 is not set
# CONFIG_SCSI_MEGARAID is not set
# CONFIG_SCSI_SATA is not set
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_SCSI_CPQFCTS is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_DTC3280 is not set
# CONFIG_SCSI_EATA is not set
# CONFIG_SCSI_EATA_PIO is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
# CONFIG_SCSI_GDTH is not set
# CONFIG_SCSI_GENERIC_NCR5380 is not set
# CONFIG_SCSI_GENERIC_NCR5380_MMIO is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_PPA is not set
# CONFIG_SCSI_IMM is not set
# CONFIG_SCSI_NCR53C406A is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_PAS16 is not set
# CONFIG_SCSI_PCI2000 is not set
# CONFIG_SCSI_PCI2220I is not set
# CONFIG_SCSI_PSI240I is not set
# CONFIG_SCSI_QLOGIC_FAS is not set
# CONFIG_SCSI_QLOGIC_ISP is not set
# CONFIG_SCSI_QLOGIC_FC is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
CONFIG_SCSI_QLA2XXX=y
# CONFIG_SCSI_QLA21XX is not set
# CONFIG_SCSI_QLA22XX is not set
# CONFIG_SCSI_QLA2300 is not set
# CONFIG_SCSI_QLA2322 is not set
# CONFIG_SCSI_QLA6312 is not set
# CONFIG_SCSI_QLA6322 is not set
# CONFIG_SCSI_SEAGATE is not set
# CONFIG_SCSI_SYM53C416 is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_T128 is not set
# CONFIG_SCSI_U14_34F is not set
# CONFIG_SCSI_ULTRASTOR is not set
# CONFIG_SCSI_NSP32 is not set
# CONFIG_SCSI_DEBUG is not set

#
# Old CD-ROM drivers (not SCSI, not IDE)
#
# CONFIG_CD_NO_IDESCSI is not set

#
# Multi-device support (RAID and LVM)
#
# CONFIG_MD is not set

#
# Fusion MPT device support
#
# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#
# CONFIG_IEEE1394 is not set

#
# I2O device support
#
# CONFIG_I2O is not set

#
# Networking support
#
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_NETLINK_DEV=y
CONFIG_UNIX=y
# CONFIG_NET_KEY is not set
CONFIG_INET=y
# CONFIG_IP_MULTICAST is not set
# CONFIG_IP_ADVANCED_ROUTER is not set
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_ARPD is not set
# CONFIG_SYN_COOKIES is not set
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
# CONFIG_IPV6 is not set
# CONFIG_NETFILTER is not set

#
# SCTP Configuration (EXPERIMENTAL)
#
# CONFIG_IP_SCTP is not set
# CONFIG_ATM is not set
# CONFIG_BRIDGE is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_NET_DIVERT is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_NET_FASTROUTE is not set
# CONFIG_NET_HW_FLOWCONTROL is not set

#
# QoS and/or fair queueing
#
# CONFIG_NET_SCHED is not set
# CONFIG_NET_CLS_ROUTE is not set

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_NETPOLL is not set
# CONFIG_NET_POLL_CONTROLLER is not set
# CONFIG_HAMRADIO is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
CONFIG_NETDEVICES=y
CONFIG_DUMMY=y
# CONFIG_BONDING is not set
# CONFIG_EQUALIZER is not set
# CONFIG_TUN is not set
# CONFIG_ETHERTAP is not set
# CONFIG_NET_SB1000 is not set

#
# ARCnet devices
#
# CONFIG_ARCNET is not set

#
# Ethernet (10 or 100Mbit)
#
CONFIG_NET_ETHERNET=y
CONFIG_MII=y
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNGEM is not set
# CONFIG_NET_VENDOR_3COM is not set
# CONFIG_LANCE is not set
# CONFIG_NET_VENDOR_SMC is not set
# CONFIG_NET_VENDOR_RACAL is not set

#
# Tulip family network device support
#
# CONFIG_NET_TULIP is not set
# CONFIG_AT1700 is not set
# CONFIG_DEPCA is not set
# CONFIG_HP100 is not set
# CONFIG_NET_ISA is not set
CONFIG_NET_PCI=y
# CONFIG_PCNET32 is not set
# CONFIG_AMD8111_ETH is not set
# CONFIG_ADAPTEC_STARFIRE is not set
# CONFIG_AC3200 is not set
# CONFIG_APRICOT is not set
# CONFIG_B44 is not set
CONFIG_FORCEDETH=m
# CONFIG_CS89x0 is not set
# CONFIG_DGRS is not set
# CONFIG_EEPRO100 is not set
# CONFIG_E100 is not set
# CONFIG_FEALNX is not set
# CONFIG_NATSEMI is not set
# CONFIG_NE2K_PCI is not set
# CONFIG_8139CP is not set
CONFIG_8139TOO=m
# CONFIG_8139TOO_PIO is not set
# CONFIG_8139TOO_TUNE_TWISTER is not set
# CONFIG_8139TOO_8129 is not set
# CONFIG_8139_OLD_RX_RESET is not set
# CONFIG_SIS900 is not set
# CONFIG_EPIC100 is not set
# CONFIG_SUNDANCE is not set
# CONFIG_TLAN is not set
# CONFIG_VIA_RHINE is not set
# CONFIG_VIA_VELOCITY is not set
# CONFIG_NET_POCKET is not set

#
# Ethernet (1000 Mbit)
#
# CONFIG_ACENIC is not set
# CONFIG_DL2K is not set
# CONFIG_E1000 is not set
# CONFIG_NS83820 is not set
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
CONFIG_R8169=m
# CONFIG_SK98LIN is not set
# CONFIG_TIGON3 is not set

#
# Ethernet (10000 Mbit)
#
# CONFIG_IXGB is not set
# CONFIG_S2IO is not set

#
# Token Ring devices
#
# CONFIG_TR is not set

#
# Wireless LAN (non-hamradio)
#
# CONFIG_NET_RADIO is not set

#
# Wan interfaces
#
# CONFIG_WAN is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PLIP is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
# CONFIG_NET_FC is not set
# CONFIG_SHAPER is not set
# CONFIG_NETCONSOLE is not set

#
# ISDN subsystem
#
# CONFIG_ISDN is not set

#
# Telephony Support
#
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1600
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=1200
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_TSDEV is not set
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input I/O drivers
#
# CONFIG_GAMEPORT is not set
CONFIG_SOUND_GAMEPORT=y
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
# CONFIG_SERIO_SERPORT is not set
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PARKBD is not set
# CONFIG_SERIO_PCIPS2 is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_XTKBD is not set
# CONFIG_KEYBOARD_NEWTON is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
# CONFIG_MOUSE_SERIAL is not set
# CONFIG_MOUSE_INPORT is not set
# CONFIG_MOUSE_LOGIBM is not set
# CONFIG_MOUSE_PC110PAD is not set
# CONFIG_MOUSE_VSXXXAA is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
CONFIG_INPUT_MISC=y
CONFIG_INPUT_PCSPKR=y
# CONFIG_INPUT_UINPUT is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
# CONFIG_SERIAL_NONSTANDARD is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
# CONFIG_SERIAL_8250_CONSOLE is not set
CONFIG_SERIAL_8250_NR_UARTS=2
CONFIG_SERIAL_8250_EXTENDED=y
# CONFIG_SERIAL_8250_MANY_PORTS is not set
CONFIG_SERIAL_8250_SHARE_IRQ=y
# CONFIG_SERIAL_8250_DETECT_IRQ is not set
# CONFIG_SERIAL_8250_MULTIPORT is not set
# CONFIG_SERIAL_8250_RSA is not set

#
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=y
CONFIG_UNIX98_PTYS=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256
CONFIG_PRINTER=y
# CONFIG_LP_CONSOLE is not set
# CONFIG_PPDEV is not set
# CONFIG_TIPAR is not set
# CONFIG_QIC02_TAPE is not set

#
# IPMI
#
CONFIG_IPMI_HANDLER=y
# CONFIG_IPMI_PANIC_EVENT is not set
CONFIG_IPMI_DEVICE_INTERFACE=y
# CONFIG_IPMI_SI is not set
# CONFIG_IPMI_WATCHDOG is not set

#
# Watchdog Cards
#
# CONFIG_WATCHDOG is not set
CONFIG_HW_RANDOM=y
# CONFIG_NVRAM is not set
CONFIG_RTC=y
# CONFIG_DTLK is not set
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
# CONFIG_SONYPI is not set

#
# Ftape, the floppy tape device driver
#
# CONFIG_FTAPE is not set
CONFIG_AGP=y
# CONFIG_AGP_ALI is not set
# CONFIG_AGP_ATI is not set
# CONFIG_AGP_AMD is not set
# CONFIG_AGP_AMD64 is not set
# CONFIG_AGP_INTEL is not set
# CONFIG_AGP_INTEL_MCH is not set
CONFIG_AGP_NVIDIA=y
# CONFIG_AGP_SIS is not set
# CONFIG_AGP_SWORKS is not set
# CONFIG_AGP_VIA is not set
# CONFIG_AGP_EFFICEON is not set
CONFIG_DRM=y
# CONFIG_DRM_TDFX is not set
# CONFIG_DRM_GAMMA is not set
# CONFIG_DRM_R128 is not set
CONFIG_DRM_RADEON=y
# CONFIG_DRM_MGA is not set
# CONFIG_DRM_SIS is not set
# CONFIG_MWAVE is not set
# CONFIG_RAW_DRIVER is not set
# CONFIG_HANGCHECK_TIMER is not set

#
# I2C support
#
CONFIG_I2C=y
CONFIG_I2C_CHARDEV=y

#
# I2C Algorithms
#
CONFIG_I2C_ALGOBIT=y
# CONFIG_I2C_ALGOPCF is not set

#
# I2C Hardware Bus support
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
# CONFIG_I2C_AMD756 is not set
# CONFIG_I2C_AMD8111 is not set
# CONFIG_I2C_ELEKTOR is not set
# CONFIG_I2C_I801 is not set
# CONFIG_I2C_I810 is not set
CONFIG_I2C_ISA=y
# CONFIG_I2C_NFORCE2 is not set
# CONFIG_I2C_PARPORT is not set
# CONFIG_I2C_PARPORT_LIGHT is not set
# CONFIG_I2C_PIIX4 is not set
# CONFIG_I2C_PROSAVAGE is not set
# CONFIG_I2C_SAVAGE4 is not set
# CONFIG_SCx200_ACB is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
# CONFIG_I2C_SIS96X is not set
# CONFIG_I2C_VIA is not set
# CONFIG_I2C_VIAPRO is not set
# CONFIG_I2C_VOODOO3 is not set

#
# Hardware Sensors Chip support
#
CONFIG_I2C_SENSOR=y
# CONFIG_SENSORS_ADM1021 is not set
# CONFIG_SENSORS_ADM1025 is not set
# CONFIG_SENSORS_ADM1031 is not set
# CONFIG_SENSORS_ASB100 is not set
# CONFIG_SENSORS_DS1621 is not set
# CONFIG_SENSORS_FSCHER is not set
# CONFIG_SENSORS_GL518SM is not set
# CONFIG_SENSORS_IT87 is not set
# CONFIG_SENSORS_LM75 is not set
# CONFIG_SENSORS_LM77 is not set
# CONFIG_SENSORS_LM78 is not set
# CONFIG_SENSORS_LM80 is not set
# CONFIG_SENSORS_LM83 is not set
# CONFIG_SENSORS_LM85 is not set
# CONFIG_SENSORS_LM90 is not set
# CONFIG_SENSORS_MAX1619 is not set
# CONFIG_SENSORS_VIA686A is not set
CONFIG_SENSORS_W83781D=y
# CONFIG_SENSORS_W83L785TS is not set
# CONFIG_SENSORS_W83627HF is not set

#
# Other I2C Chip support
#
CONFIG_SENSORS_EEPROM=m
# CONFIG_SENSORS_PCF8574 is not set
# CONFIG_SENSORS_PCF8591 is not set
# CONFIG_SENSORS_RTC8564 is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# CONFIG_I2C_DEBUG_CHIP is not set

#
# Dallas's 1-wire bus
#
# CONFIG_W1 is not set

#
# Misc devices
#
# CONFIG_IBM_ASM is not set

#
# Multimedia devices
#
CONFIG_VIDEO_DEV=y

#
# Video For Linux
#

#
# Video Adapters
#
CONFIG_VIDEO_BT848=m
# CONFIG_VIDEO_PMS is not set
# CONFIG_VIDEO_BWQCAM is not set
# CONFIG_VIDEO_CQCAM is not set
# CONFIG_VIDEO_W9966 is not set
# CONFIG_VIDEO_CPIA is not set
# CONFIG_VIDEO_SAA5246A is not set
# CONFIG_VIDEO_SAA5249 is not set
# CONFIG_TUNER_3036 is not set
# CONFIG_VIDEO_STRADIS is not set
# CONFIG_VIDEO_ZORAN is not set
# CONFIG_VIDEO_ZR36120 is not set
# CONFIG_VIDEO_SAA7134 is not set
# CONFIG_VIDEO_MXB is not set
# CONFIG_VIDEO_DPC is not set
# CONFIG_VIDEO_HEXIUM_ORION is not set
# CONFIG_VIDEO_HEXIUM_GEMINI is not set
# CONFIG_VIDEO_CX88 is not set
# CONFIG_VIDEO_OVCAMCHIP is not set

#
# Radio Adapters
#
# CONFIG_RADIO_CADET is not set
# CONFIG_RADIO_RTRACK is not set
# CONFIG_RADIO_RTRACK2 is not set
# CONFIG_RADIO_AZTECH is not set
# CONFIG_RADIO_GEMTEK is not set
# CONFIG_RADIO_GEMTEK_PCI is not set
# CONFIG_RADIO_MAXIRADIO is not set
# CONFIG_RADIO_MAESTRO is not set
# CONFIG_RADIO_SF16FMI is not set
# CONFIG_RADIO_SF16FMR2 is not set
# CONFIG_RADIO_TERRATEC is not set
# CONFIG_RADIO_TRUST is not set
# CONFIG_RADIO_TYPHOON is not set
# CONFIG_RADIO_ZOLTRIX is not set

#
# Digital Video Broadcasting Devices
#
# CONFIG_DVB is not set
CONFIG_VIDEO_TUNER=m
CONFIG_VIDEO_BUF=m
CONFIG_VIDEO_BTCX=m
CONFIG_VIDEO_IR=m

#
# Graphics support
#
CONFIG_FB=y
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_VGA16 is not set
# CONFIG_FB_VESA is not set
CONFIG_VIDEO_SELECT=y
# CONFIG_FB_HGA is not set
# CONFIG_FB_RIVA is not set
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON_OLD is not set
CONFIG_FB_RADEON=y
CONFIG_FB_RADEON_I2C=y
# CONFIG_FB_RADEON_DEBUG is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_SIS is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_VIRTUAL is not set

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
# CONFIG_MDA_CONSOLE is not set
CONFIG_DUMMY_CONSOLE=y
# CONFIG_FRAMEBUFFER_CONSOLE is not set

#
# Logo configuration
#
# CONFIG_LOGO is not set

#
# Sound
#
CONFIG_SOUND=y

#
# Advanced Linux Sound Architecture
#
CONFIG_SND=m
CONFIG_SND_TIMER=m
CONFIG_SND_PCM=m
CONFIG_SND_RAWMIDI=m
CONFIG_SND_SEQUENCER=m
# CONFIG_SND_SEQ_DUMMY is not set
CONFIG_SND_OSSEMUL=y
CONFIG_SND_MIXER_OSS=m
CONFIG_SND_PCM_OSS=m
CONFIG_SND_SEQUENCER_OSS=y
CONFIG_SND_RTCTIMER=m
# CONFIG_SND_VERBOSE_PRINTK is not set
# CONFIG_SND_DEBUG is not set

#
# Generic devices
#
CONFIG_SND_MPU401_UART=m
CONFIG_SND_DUMMY=m
CONFIG_SND_VIRMIDI=m
CONFIG_SND_MTPAV=m
# CONFIG_SND_SERIAL_U16550 is not set
CONFIG_SND_MPU401=m

#
# ISA devices
#
# CONFIG_SND_AD1816A is not set
# CONFIG_SND_AD1848 is not set
# CONFIG_SND_CS4231 is not set
# CONFIG_SND_CS4232 is not set
# CONFIG_SND_CS4236 is not set
# CONFIG_SND_ES968 is not set
# CONFIG_SND_ES1688 is not set
# CONFIG_SND_ES18XX is not set
# CONFIG_SND_GUSCLASSIC is not set
# CONFIG_SND_GUSEXTREME is not set
# CONFIG_SND_GUSMAX is not set
# CONFIG_SND_INTERWAVE is not set
# CONFIG_SND_INTERWAVE_STB is not set
# CONFIG_SND_OPTI92X_AD1848 is not set
# CONFIG_SND_OPTI92X_CS4231 is not set
# CONFIG_SND_OPTI93X is not set
# CONFIG_SND_SB8 is not set
# CONFIG_SND_SB16 is not set
# CONFIG_SND_SBAWE is not set
# CONFIG_SND_WAVEFRONT is not set
# CONFIG_SND_ALS100 is not set
# CONFIG_SND_AZT2320 is not set
# CONFIG_SND_CMI8330 is not set
# CONFIG_SND_DT019X is not set
# CONFIG_SND_OPL3SA2 is not set
# CONFIG_SND_SGALAXY is not set
# CONFIG_SND_SSCAPE is not set

#
# PCI devices
#
CONFIG_SND_AC97_CODEC=m
# CONFIG_SND_ALI5451 is not set
# CONFIG_SND_ATIIXP is not set
# CONFIG_SND_AU8810 is not set
# CONFIG_SND_AU8820 is not set
# CONFIG_SND_AU8830 is not set
# CONFIG_SND_AZT3328 is not set
CONFIG_SND_BT87X=m
# CONFIG_SND_CS46XX is not set
# CONFIG_SND_CS4281 is not set
# CONFIG_SND_EMU10K1 is not set
# CONFIG_SND_KORG1212 is not set
# CONFIG_SND_MIXART is not set
# CONFIG_SND_NM256 is not set
# CONFIG_SND_RME32 is not set
# CONFIG_SND_RME96 is not set
# CONFIG_SND_RME9652 is not set
# CONFIG_SND_HDSP is not set
# CONFIG_SND_TRIDENT is not set
# CONFIG_SND_YMFPCI is not set
# CONFIG_SND_ALS4000 is not set
# CONFIG_SND_CMIPCI is not set
# CONFIG_SND_ENS1370 is not set
# CONFIG_SND_ENS1371 is not set
# CONFIG_SND_ES1938 is not set
# CONFIG_SND_ES1968 is not set
# CONFIG_SND_MAESTRO3 is not set
# CONFIG_SND_FM801 is not set
# CONFIG_SND_ICE1712 is not set
# CONFIG_SND_ICE1724 is not set
CONFIG_SND_INTEL8X0=m
# CONFIG_SND_INTEL8X0M is not set
# CONFIG_SND_SONICVIBES is not set
# CONFIG_SND_VIA82XX is not set
# CONFIG_SND_VX222 is not set

#
# ALSA USB devices
#
# CONFIG_SND_USB_AUDIO is not set

#
# Open Sound System
#
# CONFIG_SOUND_PRIME is not set

#
# USB support
#
CONFIG_USB=y
# CONFIG_USB_DEBUG is not set

#
# Miscellaneous USB options
#
CONFIG_USB_DEVICEFS=y
# CONFIG_USB_BANDWIDTH is not set
# CONFIG_USB_DYNAMIC_MINORS is not set

#
# USB Host Controller Drivers
#
CONFIG_USB_EHCI_HCD=y
# CONFIG_USB_EHCI_SPLIT_ISO is not set
# CONFIG_USB_EHCI_ROOT_HUB_TT is not set
CONFIG_USB_OHCI_HCD=y
# CONFIG_USB_UHCI_HCD is not set

#
# USB Device Class drivers
#
# CONFIG_USB_AUDIO is not set
# CONFIG_USB_BLUETOOTH_TTY is not set
# CONFIG_USB_MIDI is not set
# CONFIG_USB_ACM is not set
CONFIG_USB_PRINTER=y
CONFIG_USB_STORAGE=y
# CONFIG_USB_STORAGE_DEBUG is not set
# CONFIG_USB_STORAGE_RW_DETECT is not set
# CONFIG_USB_STORAGE_DATAFAB is not set
# CONFIG_USB_STORAGE_FREECOM is not set
# CONFIG_USB_STORAGE_ISD200 is not set
# CONFIG_USB_STORAGE_DPCM is not set
# CONFIG_USB_STORAGE_HP8200e is not set
# CONFIG_USB_STORAGE_SDDR09 is not set
# CONFIG_USB_STORAGE_SDDR55 is not set
# CONFIG_USB_STORAGE_JUMPSHOT is not set

#
# USB Human Interface Devices (HID)
#
CONFIG_USB_HID=y
CONFIG_USB_HIDINPUT=y
# CONFIG_HID_FF is not set
CONFIG_USB_HIDDEV=y
# CONFIG_USB_AIPTEK is not set
# CONFIG_USB_WACOM is not set
# CONFIG_USB_KBTAB is not set
# CONFIG_USB_POWERMATE is not set
# CONFIG_USB_MTOUCH is not set
# CONFIG_USB_EGALAX is not set
# CONFIG_USB_XPAD is not set
# CONFIG_USB_ATI_REMOTE is not set

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_MICROTEK is not set
# CONFIG_USB_HPUSBSCSI is not set

#
# USB Multimedia devices
#
# CONFIG_USB_DABUSB is not set
# CONFIG_USB_VICAM is not set
# CONFIG_USB_DSBR is not set
# CONFIG_USB_IBMCAM is not set
# CONFIG_USB_KONICAWC is not set
# CONFIG_USB_OV511 is not set
# CONFIG_USB_PWC is not set
# CONFIG_USB_SE401 is not set
# CONFIG_USB_SN9C102 is not set
# CONFIG_USB_STV680 is not set

#
# USB Network adaptors
#
# CONFIG_USB_CATC is not set
# CONFIG_USB_KAWETH is not set
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_RTL8150 is not set
# CONFIG_USB_USBNET is not set

#
# USB port drivers
#
# CONFIG_USB_USS720 is not set

#
# USB Serial Converter support
#
CONFIG_USB_SERIAL=y
# CONFIG_USB_SERIAL_CONSOLE is not set
# CONFIG_USB_SERIAL_GENERIC is not set
# CONFIG_USB_SERIAL_BELKIN is not set
# CONFIG_USB_SERIAL_WHITEHEAT is not set
# CONFIG_USB_SERIAL_DIGI_ACCELEPORT is not set
# CONFIG_USB_SERIAL_EMPEG is not set
# CONFIG_USB_SERIAL_FTDI_SIO is not set
# CONFIG_USB_SERIAL_VISOR is not set
# CONFIG_USB_SERIAL_IPAQ is not set
# CONFIG_USB_SERIAL_IR is not set
# CONFIG_USB_SERIAL_EDGEPORT is not set
# CONFIG_USB_SERIAL_EDGEPORT_TI is not set
# CONFIG_USB_SERIAL_KEYSPAN_PDA is not set
# CONFIG_USB_SERIAL_KEYSPAN is not set
# CONFIG_USB_SERIAL_KLSI is not set
# CONFIG_USB_SERIAL_KOBIL_SCT is not set
# CONFIG_USB_SERIAL_MCT_U232 is not set
CONFIG_USB_SERIAL_PL2303=y
# CONFIG_USB_SERIAL_SAFE is not set
# CONFIG_USB_SERIAL_CYBERJACK is not set
# CONFIG_USB_SERIAL_XIRCOM is not set
# CONFIG_USB_SERIAL_OMNINET is not set

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_TIGL is not set
# CONFIG_USB_AUERSWALD is not set
# CONFIG_USB_RIO500 is not set
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_LED is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_PHIDGETSERVO is not set
# CONFIG_USB_TEST is not set

#
# USB Gadget Support
#
# CONFIG_USB_GADGET is not set

#
# File systems
#
CONFIG_EXT2_FS=y
# CONFIG_EXT2_FS_XATTR is not set
CONFIG_EXT3_FS=y
# CONFIG_EXT3_FS_XATTR is not set
CONFIG_JBD=y
# CONFIG_JBD_DEBUG is not set
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
# CONFIG_XFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_QUOTA is not set
# CONFIG_AUTOFS_FS is not set
CONFIG_AUTOFS4_FS=y

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
CONFIG_ZISOFS=y
CONFIG_ZISOFS_FS=y
CONFIG_UDF_FS=y
CONFIG_UDF_NLS=y

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_SYSFS=y
# CONFIG_DEVFS_FS is not set
# CONFIG_DEVPTS_FS_XATTR is not set
# CONFIG_TMPFS is not set
# CONFIG_HUGETLBFS is not set
# CONFIG_HUGETLB_PAGE is not set
CONFIG_RAMFS=y

#
# Miscellaneous filesystems
#
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_CRAMFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set

#
# Network File Systems
#
# CONFIG_NFS_FS is not set
# CONFIG_NFSD is not set
# CONFIG_EXPORTFS is not set
CONFIG_SMB_FS=y
# CONFIG_SMB_NLS_DEFAULT is not set
# CONFIG_CIFS is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y

#
# Native Language Support
#
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=y
CONFIG_NLS_CODEPAGE_737=m
CONFIG_NLS_CODEPAGE_775=m
CONFIG_NLS_CODEPAGE_850=m
CONFIG_NLS_CODEPAGE_852=m
CONFIG_NLS_CODEPAGE_855=m
CONFIG_NLS_CODEPAGE_857=m
CONFIG_NLS_CODEPAGE_860=m
CONFIG_NLS_CODEPAGE_861=m
CONFIG_NLS_CODEPAGE_862=m
CONFIG_NLS_CODEPAGE_863=m
CONFIG_NLS_CODEPAGE_864=m
CONFIG_NLS_CODEPAGE_865=m
CONFIG_NLS_CODEPAGE_866=m
CONFIG_NLS_CODEPAGE_869=m
CONFIG_NLS_CODEPAGE_936=m
CONFIG_NLS_CODEPAGE_950=m
CONFIG_NLS_CODEPAGE_932=m
CONFIG_NLS_CODEPAGE_949=m
CONFIG_NLS_CODEPAGE_874=m
CONFIG_NLS_ISO8859_8=m
CONFIG_NLS_CODEPAGE_1250=m
CONFIG_NLS_CODEPAGE_1251=m
# CONFIG_NLS_ASCII is not set
CONFIG_NLS_ISO8859_1=y
CONFIG_NLS_ISO8859_2=m
CONFIG_NLS_ISO8859_3=m
CONFIG_NLS_ISO8859_4=m
CONFIG_NLS_ISO8859_5=m
CONFIG_NLS_ISO8859_6=m
CONFIG_NLS_ISO8859_7=m
CONFIG_NLS_ISO8859_9=m
CONFIG_NLS_ISO8859_13=m
CONFIG_NLS_ISO8859_14=m
CONFIG_NLS_ISO8859_15=y
CONFIG_NLS_KOI8_R=m
CONFIG_NLS_KOI8_U=m
CONFIG_NLS_UTF8=y

#
# Profiling support
#
# CONFIG_PROFILING is not set

#
# Kernel hacking
#
# CONFIG_DEBUG_KERNEL is not set
CONFIG_EARLY_PRINTK=y
# CONFIG_DEBUG_SPINLOCK_SLEEP is not set
CONFIG_FRAME_POINTER=y
CONFIG_4KSTACKS=y

#
# Security options
#
CONFIG_SECURITY=y
# CONFIG_SECURITY_NETWORK is not set
CONFIG_SECURITY_CAPABILITIES=y
# CONFIG_SECURITY_ROOTPLUG is not set
# CONFIG_SECURITY_SELINUX is not set

#
# Cryptographic options
#
# CONFIG_CRYPTO is not set

#
# Library routines
#
CONFIG_CRC_CCITT=y
CONFIG_CRC32=y
CONFIG_LIBCRC32C=y
CONFIG_ZLIB_INFLATE=y
CONFIG_X86_BIOS_REBOOT=y
CONFIG_PC=y


* Re: Possible dcache BUG
  2004-08-11  4:47                     ` Gene Heskett
@ 2004-08-11  4:59                       ` Linus Torvalds
  2004-08-11  8:05                         ` Roger Luethi
  2004-08-13  4:27                         ` Gene Heskett
  0 siblings, 2 replies; 146+ messages in thread
From: Linus Torvalds @ 2004-08-11  4:59 UTC (permalink / raw)
  To: Gene Heskett; +Cc: Kernel Mailing List, Andrew Morton, Al Viro



I wrote:
> Notably, the output of "/proc/meminfo" and "/proc/slabinfo". "ps
> axm" helps too.

That should be "ps axv" of course. Just shows what a retard I am.

		Linus


* Re: Possible dcache BUG
  2004-08-11  4:18                     ` Udo A. Steinberg
@ 2004-08-11  5:13                       ` Linus Torvalds
  2004-08-11  5:15                         ` Linus Torvalds
  2004-08-11  5:55                         ` David S. Miller
  0 siblings, 2 replies; 146+ messages in thread
From: Linus Torvalds @ 2004-08-11  5:13 UTC (permalink / raw)
  To: Udo A. Steinberg; +Cc: linux-kernel, Andrew Morton, viro



On Tue, 10 Aug 2004, Udo A. Steinberg wrote:
> 
> I'm currently using 2.6.8-rc4 and I'm seeing the same problem. Each day the
> machine just gets slower and swappier, even though I'm always running the same
> workload. Rebooting helps a lot. The machine has very little memory (128MB).

This is your slab-info sorted according to use:

	bytes used	slab
	----------	-----
	  128,000	filp
	  128,832	size-64
	  142,128	vm_area_struct
	  161,376	size-96
	  184,320	biovec-(256)
	  188,160	biovec-64
	  188,416	pgd
	  188,416	size-1024
	  192,000	biovec-128
	  217,088	size-4096
	  241,664	size-2048
	  290,816	size-512
	  310,464	inode_cache
	  564,144	radix_tree_node
	  608,832	ext3_inode_cache
	  611,520	dentry_cache
	  688,128	size-8192
	  917,504	size-32768
	1,734,048	buffer_head

and that "buffer_head" thing really looks strange. I also wonder what the 
hell is allocating so many 8kB and 32kB entries.

That said, the cumulative slabinfo usage seems to be no more than 
8,924,868 bytes, so it doesn't seem to be slab that is the problem.

> MemTotal:       125124 kB
> MemFree:          1404 kB
> Buffers:         19060 kB
> Cached:          40484 kB
> SwapCached:      33336 kB
> Active:          70176 kB
> Inactive:        41892 kB

"active+inactive" adds up to ~111MB, which together with slab accounts for 
pretty much all your memory. So there isn't anything unaccounted either.

So I suspect it's a balancing issue. Possibly just the slight change in 
slab balancing to fix the highmem problems. Maybe we shrink slab _too_ 
aggressively or something. 

			Linus


* Re: Possible dcache BUG
  2004-08-11  5:13                       ` Linus Torvalds
@ 2004-08-11  5:15                         ` Linus Torvalds
  2004-08-11  5:33                           ` Udo A. Steinberg
                                             ` (2 more replies)
  2004-08-11  5:55                         ` David S. Miller
  1 sibling, 3 replies; 146+ messages in thread
From: Linus Torvalds @ 2004-08-11  5:15 UTC (permalink / raw)
  To: Udo A. Steinberg; +Cc: linux-kernel, Andrew Morton, viro



On Tue, 10 Aug 2004, Linus Torvalds wrote:
> 
> So I suspect it's a balancing issue. Possibly just the slight change in 
> slab balancing to fix the highmem problems. Maybe we shrink slab _too_ 
> aggressively or something. 

Udo, that's a simple thing to check. If it's the slab balancing changes, 
then you should be able to test it with just a

	bk cset -x1.1830.4.3

if you have the current BK and are a BK user, or by just reverting the 
patch here ("patch -R -p1" from inside your linux source tree) if you're 
not a BK user.

		Linus

-----
# This is a BitKeeper generated diff -Nru style patch.
#
# ChangeSet
#   2004/07/31 14:47:41-07:00 akpm@osdl.org 
#   [PATCH] slab memory shrinking balancing fix
#   
#   The logic in shrink_slab tries to balance the proportion of slab which it
#   scans against the proportion of pagecache which the caller scanned.  Problem
#   is that with a large number of highmem LRU pages and a small number of lowmem
#   LRU pages, the amount of pagecache scanning appears to be very small, so we
#   don't push slab hard enough.
#   
#   The patch changes things so that for, say, a GFP_KERNEL allocation attempt we
#   only consider ZONE_NORMAL and ZONE_DMA when calculating "what proportion of
#   the LRU did the caller just scan".
#   
#   This will have the effect of shrinking slab harder in response to GFP_KERNEL
#   allocations than for GFP_HIGHMEM allocations.
#   
#   Signed-off-by: Andrew Morton <akpm@osdl.org>
#   Signed-off-by: Linus Torvalds <torvalds@osdl.org>
# 
# mm/vmscan.c
#   2004/07/31 14:35:31-07:00 akpm@osdl.org +23 -9
#   slab memory shrinking balancing fix
# 
# mm/page_alloc.c
#   2004/07/31 14:02:21-07:00 akpm@osdl.org +0 -11
#   slab memory shrinking balancing fix
# 
# include/linux/mm.h
#   2004/07/31 14:02:26-07:00 akpm@osdl.org +0 -2
#   slab memory shrinking balancing fix
# 
diff -Nru a/include/linux/mm.h b/include/linux/mm.h
--- a/include/linux/mm.h	2004-08-10 22:15:23 -07:00
+++ b/include/linux/mm.h	2004-08-10 22:15:23 -07:00
@@ -706,8 +706,6 @@
 
 extern struct vm_area_struct *find_extend_vma(struct mm_struct *mm, unsigned long addr);
 
-extern unsigned int nr_used_zone_pages(void);
-
 extern struct page * vmalloc_to_page(void *addr);
 extern struct page * follow_page(struct mm_struct *mm, unsigned long address,
 		int write);
diff -Nru a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c	2004-08-10 22:15:23 -07:00
+++ b/mm/page_alloc.c	2004-08-10 22:15:23 -07:00
@@ -825,17 +825,6 @@
 
 EXPORT_SYMBOL(nr_free_pages);
 
-unsigned int nr_used_zone_pages(void)
-{
-	unsigned int pages = 0;
-	struct zone *zone;
-
-	for_each_zone(zone)
-		pages += zone->nr_active + zone->nr_inactive;
-
-	return pages;
-}
-
 #ifdef CONFIG_NUMA
 unsigned int nr_free_pages_pgdat(pg_data_t *pgdat)
 {
diff -Nru a/mm/vmscan.c b/mm/vmscan.c
--- a/mm/vmscan.c	2004-08-10 22:15:23 -07:00
+++ b/mm/vmscan.c	2004-08-10 22:15:23 -07:00
@@ -169,22 +169,25 @@
  * slab to avoid swapping.
  *
  * We do weird things to avoid (scanned*seeks*entries) overflowing 32 bits.
+ *
+ * `lru_pages' represents the number of on-LRU pages in all the zones which
+ * are eligible for the caller's allocation attempt.  It is used for balancing
+ * slab reclaim versus page reclaim.
  */
-static int shrink_slab(unsigned long scanned, unsigned int gfp_mask)
+static int shrink_slab(unsigned long scanned, unsigned int gfp_mask,
+			unsigned long lru_pages)
 {
 	struct shrinker *shrinker;
-	long pages;
 
 	if (down_trylock(&shrinker_sem))
 		return 0;
 
-	pages = nr_used_zone_pages();
 	list_for_each_entry(shrinker, &shrinker_list, list) {
 		unsigned long long delta;
 
 		delta = (4 * scanned) / shrinker->seeks;
 		delta *= (*shrinker->shrinker)(0, gfp_mask);
-		do_div(delta, pages + 1);
+		do_div(delta, lru_pages + 1);
 		shrinker->nr += delta;
 		if (shrinker->nr < 0)
 			shrinker->nr = LONG_MAX;	/* It wrapped! */
@@ -896,6 +899,7 @@
 	int total_scanned = 0, total_reclaimed = 0;
 	struct reclaim_state *reclaim_state = current->reclaim_state;
 	struct scan_control sc;
+	unsigned long lru_pages = 0;
 	int i;
 
 	sc.gfp_mask = gfp_mask;
@@ -903,8 +907,12 @@
 
 	inc_page_state(allocstall);
 
-	for (i = 0; zones[i] != 0; i++)
-		zones[i]->temp_priority = DEF_PRIORITY;
+	for (i = 0; zones[i] != NULL; i++) {
+		struct zone *zone = zones[i];
+
+		zone->temp_priority = DEF_PRIORITY;
+		lru_pages += zone->nr_active + zone->nr_inactive;
+	}
 
 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
 		sc.nr_mapped = read_page_state(nr_mapped);
@@ -912,7 +920,7 @@
 		sc.nr_reclaimed = 0;
 		sc.priority = priority;
 		shrink_caches(zones, &sc);
-		shrink_slab(sc.nr_scanned, gfp_mask);
+		shrink_slab(sc.nr_scanned, gfp_mask, lru_pages);
 		if (reclaim_state) {
 			sc.nr_reclaimed += reclaim_state->reclaimed_slab;
 			reclaim_state->reclaimed_slab = 0;
@@ -997,7 +1005,7 @@
 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
 		int all_zones_ok = 1;
 		int end_zone = 0;	/* Inclusive.  0 = ZONE_DMA */
-
+		unsigned long lru_pages = 0;
 
 		if (nr_pages == 0) {
 			/*
@@ -1021,6 +1029,12 @@
 			end_zone = pgdat->nr_zones - 1;
 		}
 scan:
+		for (i = 0; i <= end_zone; i++) {
+			struct zone *zone = pgdat->node_zones + i;
+
+			lru_pages += zone->nr_active + zone->nr_inactive;
+		}
+
 		/*
 		 * Now scan the zone in the dma->highmem direction, stopping
 		 * at the last zone which needs scanning.
@@ -1048,7 +1062,7 @@
 			sc.priority = priority;
 			shrink_zone(zone, &sc);
 			reclaim_state->reclaimed_slab = 0;
-			shrink_slab(sc.nr_scanned, GFP_KERNEL);
+			shrink_slab(sc.nr_scanned, GFP_KERNEL, lru_pages);
 			sc.nr_reclaimed += reclaim_state->reclaimed_slab;
 			total_reclaimed += sc.nr_reclaimed;
 			if (zone->all_unreclaimable)


* Re: Possible dcache BUG
  2004-08-11  5:15                         ` Linus Torvalds
@ 2004-08-11  5:33                           ` Udo A. Steinberg
  2004-08-11 14:37                           ` Gene Heskett
  2004-08-13  1:00                           ` Udo A. Steinberg
  2 siblings, 0 replies; 146+ messages in thread
From: Udo A. Steinberg @ 2004-08-11  5:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton, viro

[-- Attachment #1: Type: text/plain, Size: 502 bytes --]

On Tue, 10 Aug 2004 22:15:34 -0700 (PDT) Linus Torvalds (LT) wrote:

LT> Udo, that's a simple thing to check. If it's the slab balancing changes, 
LT> then you should be able to test it with just a
LT> 
LT> 	bk cset -x1.1830.4.3
LT> 
LT> if you have the current BK and are a BK user, or by just reverting the 
LT> patch here ("patch -R -p1" from inside your linux source tree) if you're 
LT> not a BK user..

Linus,

Thanks for the patch. I'll run it for a few days and see how things turn out.

-Udo.

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]


* Re: Possible dcache BUG
  2004-08-11  5:13                       ` Linus Torvalds
  2004-08-11  5:15                         ` Linus Torvalds
@ 2004-08-11  5:55                         ` David S. Miller
  1 sibling, 0 replies; 146+ messages in thread
From: David S. Miller @ 2004-08-11  5:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: us15, linux-kernel, akpm, viro

On Tue, 10 Aug 2004 22:13:01 -0700 (PDT)
Linus Torvalds <torvalds@osdl.org> wrote:

> I also wonder what the 
> hell is allocating so many 8kB and 32kB entries.

Loopback default MTU is 16K these days, might explain
the 32K entries but not the 8KB ones.  Perhaps the
latter are being used for page tables?  Just a guess
on that latter one.


* Re: Possible dcache BUG
  2004-08-11  4:59                       ` Linus Torvalds
@ 2004-08-11  8:05                         ` Roger Luethi
  2004-08-13  4:27                         ` Gene Heskett
  1 sibling, 0 replies; 146+ messages in thread
From: Roger Luethi @ 2004-08-11  8:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Gene Heskett, Kernel Mailing List, Andrew Morton, Al Viro

On Tue, 10 Aug 2004 21:59:48 -0700, Linus Torvalds wrote:
> That should be "ps axv" of course. Just shows what a retard I am.

Note that some of those columns won't work as advertised. I'd be
particularly suspicious about DRS and TRS.

Roger


* Re: Possible dcache BUG
  2004-08-11  5:15                         ` Linus Torvalds
  2004-08-11  5:33                           ` Udo A. Steinberg
@ 2004-08-11 14:37                           ` Gene Heskett
  2004-08-12  1:26                             ` Nick Piggin
  2004-08-13  1:00                           ` Udo A. Steinberg
  2 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-11 14:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds, Udo A. Steinberg, Andrew Morton, viro

[-- Attachment #1: Type: text/plain, Size: 1653 bytes --]

On Wednesday 11 August 2004 01:15, Linus Torvalds wrote:
>On Tue, 10 Aug 2004, Linus Torvalds wrote:
>> So I suspect it's a balancing issue. Possibly just the slight
>> change in slab balancing to fix the highmem problems. Maybe we
>> shrink slab _too_ aggressively or something.
>
>Udo, that's a simple thing to check. If it's the slab balancing
> changes, then you should be able to test it with just a
>
>	bk cset -x1.1830.4.3
>
>if you have the current BK and are a BK user, or by just reverting
> the patch here ("patch -R -p1" from inside your linux source tree)
> if you're not a BK user..
>
>		Linus
>
With the previously attached patch reverted, a fresh kernel builds in:
real    7m18.296s
user    5m49.385s
sys     0m31.760s
which is a marked improvement, but still about 1m30 or so slow.

Is there anything in the /proc/slabinfo output I should watch 
carefully in hopes I can see things going to hell before they 
actually take the machine down?

The attachment is it with only about 20 minutes uptime.  I had been 
playing in the bios, turning off the 50% cpu throttle on overtemp, 
and managed to kill the bios, so I just turned it off till daylight 
again.  This bios seriously needs a tutorial on the interactions 
between various timing-related things.

[snip patch to revert]

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

[-- Attachment #2: slabinfo.new --]
[-- Type: text/plain, Size: 11441 bytes --]

slabinfo - version: 2.0
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
unix_sock            157    160    384   10    1 : tunables   54   27    0 : slabdata     16     16      0
tcp_tw_bucket          0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
tcp_bind_bucket       19    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
tcp_open_request       0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
inet_peer_cache        0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
ip_fib_hash           10    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
ip_dst_cache           7     15    256   15    1 : tunables  120   60    0 : slabdata      1      1      0
arp_cache              4     31    128   31    1 : tunables  120   60    0 : slabdata      1      1      0
raw4_sock              0      0    480    8    1 : tunables   54   27    0 : slabdata      0      0      0
udp_sock               2      8    480    8    1 : tunables   54   27    0 : slabdata      1      1      0
tcp_sock              31     32   1024    4    1 : tunables   54   27    0 : slabdata      8      8      0
flow_cache             0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
mqueue_inode_cache      1      8    480    8    1 : tunables   54   27    0 : slabdata      1      1      0
udf_inode_cache        0      0    352   11    1 : tunables   54   27    0 : slabdata      0      0      0
smb_request            0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
smb_inode_cache        2     12    320   12    1 : tunables   54   27    0 : slabdata      1      1      0
isofs_inode_cache      0      0    320   12    1 : tunables   54   27    0 : slabdata      0      0      0
fat_inode_cache        1     11    352   11    1 : tunables   54   27    0 : slabdata      1      1      0
ext2_inode_cache       0      0    416    9    1 : tunables   54   27    0 : slabdata      0      0      0
journal_handle         8    135     28  135    1 : tunables  120   60    0 : slabdata      1      1      0
journal_head         106    972     48   81    1 : tunables  120   60    0 : slabdata     12     12      0
revoke_table          12    290     12  290    1 : tunables  120   60    0 : slabdata      1      1      0
revoke_record          0      0     16  226    1 : tunables  120   60    0 : slabdata      0      0      0
ext3_inode_cache   23067  23067    416    9    1 : tunables   54   27    0 : slabdata   2563   2563      0
eventpoll_pwq          0      0     36  107    1 : tunables  120   60    0 : slabdata      0      0      0
eventpoll_epi          0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
kioctx                 0      0    160   25    1 : tunables  120   60    0 : slabdata      0      0      0
kiocb                  0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
dnotify_cache        172    185     20  185    1 : tunables  120   60    0 : slabdata      1      1      0
file_lock_cache       43     43     92   43    1 : tunables  120   60    0 : slabdata      1      1      0
fasync_cache           2    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
shmem_inode_cache      4     10    384   10    1 : tunables   54   27    0 : slabdata      1      1      0
posix_timers_cache      0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
uid_cache              7    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
sgpool-128            32     32   2048    2    1 : tunables   24   12    0 : slabdata     16     16      0
sgpool-64             32     32   1024    4    1 : tunables   54   27    0 : slabdata      8      8      0
sgpool-32             32     32    512    8    1 : tunables   54   27    0 : slabdata      4      4      0
sgpool-16             32     45    256   15    1 : tunables  120   60    0 : slabdata      3      3      0
sgpool-8              32     62    128   31    1 : tunables  120   60    0 : slabdata      2      2      0
cfq_pool              80    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
crq_pool              68    107     36  107    1 : tunables  120   60    0 : slabdata      1      1      0
deadline_drq           0      0     48   81    1 : tunables  120   60    0 : slabdata      0      0      0
as_arq                 0      0     60   65    1 : tunables  120   60    0 : slabdata      0      0      0
blkdev_ioc            61    185     20  185    1 : tunables  120   60    0 : slabdata      1      1      0
blkdev_queue          12     18    448    9    1 : tunables   54   27    0 : slabdata      2      2      0
blkdev_requests       54     78    152   26    1 : tunables  120   60    0 : slabdata      3      3      0
biovec-(256)         256    256   3072    2    2 : tunables   24   12    0 : slabdata    128    128      0
biovec-128           256    260   1536    5    2 : tunables   24   12    0 : slabdata     52     52      0
biovec-64            256    260    768    5    1 : tunables   54   27    0 : slabdata     52     52      0
biovec-16            256    260    192   20    1 : tunables  120   60    0 : slabdata     13     13      0
biovec-4             256    305     64   61    1 : tunables  120   60    0 : slabdata      5      5      0
biovec-1             329    452     16  226    1 : tunables  120   60    0 : slabdata      2      2      0
bio                  341    366     64   61    1 : tunables  120   60    0 : slabdata      6      6      0
sock_inode_cache     194    198    352   11    1 : tunables   54   27    0 : slabdata     18     18      0
skbuff_head_cache    245    375    160   25    1 : tunables  120   60    0 : slabdata     15     15      0
sock                   3     12    320   12    1 : tunables   54   27    0 : slabdata      1      1      0
proc_inode_cache     468    468    320   12    1 : tunables   54   27    0 : slabdata     39     39      0
sigqueue              27     27    148   27    1 : tunables  120   60    0 : slabdata      1      1      0
radix_tree_node    13342  13342    276   14    1 : tunables   54   27    0 : slabdata    953    953      0
bdev_cache            11     18    416    9    1 : tunables   54   27    0 : slabdata      2      2      0
mnt_cache             26     41     96   41    1 : tunables  120   60    0 : slabdata      1      1      0
inode_cache         2184   2184    288   14    1 : tunables   54   27    0 : slabdata    156    156      0
dentry_cache       35876  35896    140   28    1 : tunables  120   60    0 : slabdata   1282   1282      0
filp                1930   2050    160   25    1 : tunables  120   60    0 : slabdata     82     82      0
names_cache           19     19   4096    1    1 : tunables   24   12    0 : slabdata     19     19      0
idr_layer_cache       81     87    136   29    1 : tunables  120   60    0 : slabdata      3      3      0
buffer_head        37260  37260     48   81    1 : tunables  120   60    0 : slabdata    460    460      0
mm_struct             84     84    512    7    1 : tunables   54   27    0 : slabdata     12     12      0
vm_area_struct      7079   7332     84   47    1 : tunables  120   60    0 : slabdata    156    156      0
fs_cache              93    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
files_cache           81     81    416    9    1 : tunables   54   27    0 : slabdata      9      9      0
signal_cache         112    123     96   41    1 : tunables  120   60    0 : slabdata      3      3      0
sighand_cache         93     93   1312    3    1 : tunables   24   12    0 : slabdata     31     31      0
task_struct          100    100   1424    5    2 : tunables   24   12    0 : slabdata     20     20      0
anon_vma            1534   2035      8  407    1 : tunables  120   60    0 : slabdata      5      5      0
pgd                   80     80   4096    1    1 : tunables   24   12    0 : slabdata     80     80      0
size-131072(DMA)       0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-131072            0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-65536(DMA)        0      0  65536    1   16 : tunables    8    4    0 : slabdata      0      0      0
size-65536             1      1  65536    1   16 : tunables    8    4    0 : slabdata      1      1      0
size-32768(DMA)        0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
size-32768             0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
size-16384(DMA)        0      0  16384    1    4 : tunables    8    4    0 : slabdata      0      0      0
size-16384             6      6  16384    1    4 : tunables    8    4    0 : slabdata      6      6      0
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
size-8192             10     10   8192    1    2 : tunables    8    4    0 : slabdata     10     10      0
size-4096(DMA)         0      0   4096    1    1 : tunables   24   12    0 : slabdata      0      0      0
size-4096            181    181   4096    1    1 : tunables   24   12    0 : slabdata    181    181      0
size-2048(DMA)         0      0   2048    2    1 : tunables   24   12    0 : slabdata      0      0      0
size-2048            178    192   2048    2    1 : tunables   24   12    0 : slabdata     96     96      0
size-1024(DMA)         0      0   1024    4    1 : tunables   54   27    0 : slabdata      0      0      0
size-1024            116    116   1024    4    1 : tunables   54   27    0 : slabdata     29     29      0
size-512(DMA)          0      0    512    8    1 : tunables   54   27    0 : slabdata      0      0      0
size-512             184    448    512    8    1 : tunables   54   27    0 : slabdata     56     56      0
size-256(DMA)          0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
size-256             180    420    256   15    1 : tunables  120   60    0 : slabdata     28     28      0
size-192(DMA)          0      0    192   20    1 : tunables  120   60    0 : slabdata      0      0      0
size-192             100    100    192   20    1 : tunables  120   60    0 : slabdata      5      5      0
size-128(DMA)          0      0    128   31    1 : tunables  120   60    0 : slabdata      0      0      0
size-128            1157   1209    128   31    1 : tunables  120   60    0 : slabdata     39     39      0
size-64(DMA)           0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
size-64              606    610     64   61    1 : tunables  120   60    0 : slabdata     10     10      0
size-32(DMA)           0      0     32  119    1 : tunables  120   60    0 : slabdata      0      0      0
size-32             1369   1428     32  119    1 : tunables  120   60    0 : slabdata     12     12      0
kmem_cache           124    124    128   31    1 : tunables  120   60    0 : slabdata      4      4      0


* Re: Possible dcache BUG
  2004-08-11 14:37                           ` Gene Heskett
@ 2004-08-12  1:26                             ` Nick Piggin
  2004-08-12  2:23                               ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: Nick Piggin @ 2004-08-12  1:26 UTC (permalink / raw)
  To: gene.heskett
  Cc: linux-kernel, Linus Torvalds, Udo A. Steinberg, Andrew Morton, viro

Gene Heskett wrote:

>On Wednesday 11 August 2004 01:15, Linus Torvalds wrote:
>
>>On Tue, 10 Aug 2004, Linus Torvalds wrote:
>>
>>>So I suspect it's a balancing issue. Possibly just the slight
>>>change in slab balancing to fix the highmem problems. Maybe we
>>>shrink slab _too_ aggressively or something.
>>>
>>Udo, that's a simple thing to check. If it's the slab balancing
>>changes, then you should be able to test it with just a
>>
>>	bk cset -x1.1830.4.3
>>
>>if you have the current BK and are a BK user, or by just reverting
>>the patch here ("patch -R -p1" from inside your linux source tree)
>>if you're not a BK user..
>>
>>		Linus
>>
>>
>With the previously attached patch reverted, a fresh kernel builds in:
>real    7m18.296s
>user    5m49.385s
>sys     0m31.760s
>which is a marked improvement, but still about 1m30 or so slow.
>
>

This could easily be from too much slab pressure. How much memory do you
have? Have you got highmem turned on?

The new slab pressure calculation is an improvement in that it won't let
slab get out of control and cause OOMs, however it can shrink the slab too
much. If you regularly need ZONE_DMA pages, for example. AFAIKS there
isn't much you can do about this except go to per-zone slab LRUs.

That said, your stability problems should be resolved first. If they are
fixed, and you would like to help track down the slowdown, run the kernel
compile about 3 times each with and without the patch, and save cat
/proc/vmstat before and after each compile. Try to keep all else constant.

Thanks
Nick



* Re: Possible dcache BUG
  2004-08-12  1:26                             ` Nick Piggin
@ 2004-08-12  2:23                               ` Gene Heskett
  2004-08-12  2:36                                 ` Nick Piggin
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-12  2:23 UTC (permalink / raw)
  To: linux-kernel
  Cc: Nick Piggin, Linus Torvalds, Udo A. Steinberg, Andrew Morton, viro

On Wednesday 11 August 2004 21:26, Nick Piggin wrote:
>Gene Heskett wrote:
>>On Wednesday 11 August 2004 01:15, Linus Torvalds wrote:
>>>On Tue, 10 Aug 2004, Linus Torvalds wrote:
>>>>So I suspect it's a balancing issue. Possibly just the slight
>>>>change in slab balancing to fix the highmem problems. Maybe we
>>>>shrink slab _too_ aggressively or something.
>>>
>>>Udo, that's a simple thing to check. If it's the slab balancing
>>>changes, then you should be able to test it with just a
>>>
>>>	bk cset -x1.1830.4.3
>>>
>>>if you have the current BK and are a BK user, or by just reverting
>>>the patch here ("patch -R -p1" from inside your linux source tree)
>>>if you're not a BK user..
>>>
>>>		Linus
>>
>>With the previously attached patch reverted, a fresh kernel builds
>> in: real    7m18.296s
>>user    5m49.385s
>>sys     0m31.760s
>>which is a marked improvement, but still about 1m30 or so slow.
>
>This could easily be from too much slab pressure. How much memory do
> you have?

1 Gb in 2 512Mb sticks of DDR400 ram which signs on in the bios as 
dual channel.  The sticks are in the first and third slots as 
recommended by the mobo docs.

>Have you got highmem turned on?
Yes

>The new slab pressure calculation is an improvement in that it won't
> let slab
>get out of control and cause OOMs, however it can shrink the slab
> too much. If you regularly need ZONE_DMA pages, for example. AFAIKS
> there isn't much you
>can do about this except go to per-zone slab LRUs.

And how would an otherwise clueless user like me determine that?

>That said, your stability problems should be resolved first. If they
> are fixed,

Which as yet is an unknown, Nick.  Uptime now at
 22:15:14 up 12:30,  5 users,  load average: 1.03, 1.11, 1.05

>and you would like to help track down the slowdown, run the kernel
>compile about
>3 times each with and without the patch, and save cat /proc/vmstat
>before and
>after each compile. Try to keep all else constant.

I'll try that if I get to a 30+ hour uptime.

>Thanks
>Nick

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.


* Re: Possible dcache BUG
  2004-08-12  2:23                               ` Gene Heskett
@ 2004-08-12  2:36                                 ` Nick Piggin
  0 siblings, 0 replies; 146+ messages in thread
From: Nick Piggin @ 2004-08-12  2:36 UTC (permalink / raw)
  To: gene.heskett
  Cc: linux-kernel, Linus Torvalds, Udo A. Steinberg, Andrew Morton, viro

Gene Heskett wrote:

>On Wednesday 11 August 2004 21:26, Nick Piggin wrote:
>
>>This could easily be from too much slab pressure. How much memory do
>>you have?
>>
>
>1 Gb in 2 512Mb sticks of DDR400 ram which signs on in the bios as 
>dual channel.  The sticks are in the first and third slots as 
>recommended by the mobo docs.
>
>
>>Have you got highmem turned on?
>>
>Yes
>
>

OK, leave it on until you sort out the stability problem. When we look
at the performance problem, we'll probably start with highmem off.

I'll try to reproduce it here, but my highmem system has 4GB and is
allergic to mem=

>>The new slab pressure calculation is an improvement in that it won't
>>let slab
>>get out of control and cause OOMs, however it can shrink the slab
>>too much. If you regularly need ZONE_DMA pages, for example. AFAIKS
>>there isn't much you
>>can do about this except go to per-zone slab LRUs.
>>
>
>And how would an otherwise clueless user like me determine that?
>
>

We can look at deltas on some /proc/vmstat fields like pgpgin, slab_scanned,
pgalloc_dma etc. before and after the kbuild. Look at those deltas before
and after Linus' patch, and see if we can work out what is going on.

>>That said, your stability problems should be resolved first. If they
>>are fixed,
>>
>
>Which as yet is an unknown, Nick.  Uptime now at
> 22:15:14 up 12:30,  5 users,  load average: 1.03, 1.11, 1.05
>
>
>>and you would like to help track down the slowdown, run the kernel
>>compile about
>>3 times each with and without the patch, and save cat /proc/vmstat
>>before and
>>after each compile. Try to keep all else constant.
>>
>
>I'll try that if I get to a 30+ hour uptime.
>

Well, make sure it is stable first, then get back to me when you're ready
to tackle the performance problem. Thanks.



* Re: Possible dcache BUG
  2004-08-11  5:15                         ` Linus Torvalds
  2004-08-11  5:33                           ` Udo A. Steinberg
  2004-08-11 14:37                           ` Gene Heskett
@ 2004-08-13  1:00                           ` Udo A. Steinberg
  2004-08-13  1:31                             ` Linus Torvalds
  2 siblings, 1 reply; 146+ messages in thread
From: Udo A. Steinberg @ 2004-08-13  1:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton, viro, Nick Piggin

[-- Attachment #1: Type: text/plain, Size: 20266 bytes --]

On Tue, 10 Aug 2004 22:15:34 -0700 (PDT) Linus Torvalds (LT) wrote:

LT> > So I suspect it's a balancing issue. Possibly just the slight change in 
LT> > slab balancing to fix the highmem problems. Maybe we shrink slab _too_ 
LT> > aggressively or something. 
LT> 
LT> Udo, that's a simple thing to check. If it's the slab balancing changes, 
LT> then you should be able to test it with just a
LT> 
LT> 	bk cset -x1.1830.4.3

Linus,

After nearly 2 days of running 2.6.8-rc4 with the above patch backed out, the 
machine has gone back into heavy swapping, being rather unresponsive for
several minutes. At that time the only bigger applications running were
X and my mailer, as can be seen in the ps output below.

-Udo.

MemTotal:       125124 kB
MemFree:          1812 kB
Buffers:           216 kB
Cached:           2796 kB
SwapCached:      10024 kB
Active:           8352 kB
Inactive:         4800 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       125124 kB
LowFree:          1812 kB
SwapTotal:      506512 kB
SwapFree:       457116 kB
Dirty:               0 kB
Writeback:         904 kB
Mapped:           8924 kB
Slab:           107936 kB
Committed_AS:    53088 kB
PageTables:        500 kB
VmallocTotal:   909268 kB
VmallocUsed:      8936 kB
VmallocChunk:   900312 kB

slabinfo - version: 2.0
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
rpc_buffers            8      8   2048    2    1 : tunables   24   12    0 : slabdata      4      4      0
rpc_tasks              8     25    160   25    1 : tunables  120   60    0 : slabdata      1      1      0
rpc_inode_cache        0      0    416    9    1 : tunables   54   27    0 : slabdata      0      0      0
xfrm6_tunnel_spi       0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
fib6_nodes             5    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
ip6_dst_cache          5     18    224   18    1 : tunables  120   60    0 : slabdata      1      1      0
ndisc_cache            1     25    160   25    1 : tunables  120   60    0 : slabdata      1      1      0
raw6_sock              0      0    640    6    1 : tunables   54   27    0 : slabdata      0      0      0
udp6_sock              0      0    608    6    1 : tunables   54   27    0 : slabdata      0      0      0
tcp6_sock              6      7   1120    7    2 : tunables   24   12    0 : slabdata      1      1      0
unix_sock             31     40    384   10    1 : tunables   54   27    0 : slabdata      4      4      0
ip_conntrack           1     25    160   25    1 : tunables  120   60    0 : slabdata      1      1      0
tcp_tw_bucket          0      0    128   31    1 : tunables  120   60    0 : slabdata      0      0      0
tcp_bind_bucket        5    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
tcp_open_request       0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
inet_peer_cache        0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
secpath_cache          0      0    128   31    1 : tunables  120   60    0 : slabdata      0      0      0
xfrm_dst_cache         0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
ip_fib_hash            9    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
ip_dst_cache          65     75    256   15    1 : tunables  120   60    0 : slabdata      5      5      0
arp_cache              1     31    128   31    1 : tunables  120   60    0 : slabdata      1      1      0
raw4_sock              0      0    480    8    1 : tunables   54   27    0 : slabdata      0      0      0
udp_sock               1      7    512    7    1 : tunables   54   27    0 : slabdata      1      1      0
tcp_sock               8      8   1024    4    1 : tunables   54   27    0 : slabdata      2      2      0
flow_cache             0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
uhci_urb_priv          0      0     44   88    1 : tunables  120   60    0 : slabdata      0      0      0
ntfs_big_inode_cache      0      0    448    9    1 : tunables   54   27    0 : slabdata      0      0      0
ntfs_inode_cache       0      0    160   25    1 : tunables  120   60    0 : slabdata      0      0      0
ntfs_name_cache        0      0    512    8    1 : tunables   54   27    0 : slabdata      0      0      0
ntfs_attr_ctx_cache      0      0     32  119    1 : tunables  120   60    0 : slabdata      0      0      0
ntfs_index_ctx_cache      0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
smb_request            0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
smb_inode_cache        0      0    320   12    1 : tunables   54   27    0 : slabdata      0      0      0
nfs_write_data        36     36    448    9    1 : tunables   54   27    0 : slabdata      4      4      0
nfs_read_data         32     36    416    9    1 : tunables   54   27    0 : slabdata      4      4      0
nfs_inode_cache        0      0    544    7    1 : tunables   54   27    0 : slabdata      0      0      0
nfs_page               0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
isofs_inode_cache      0      0    320   12    1 : tunables   54   27    0 : slabdata      0      0      0
fat_inode_cache        0      0    352   11    1 : tunables   54   27    0 : slabdata      0      0      0
ext2_inode_cache       0      0    416    9    1 : tunables   54   27    0 : slabdata      0      0      0
ext2_xattr             0      0     44   88    1 : tunables  120   60    0 : slabdata      0      0      0
journal_handle         4    135     28  135    1 : tunables  120   60    0 : slabdata      1      1      0
journal_head          36    324     48   81    1 : tunables  120   60    0 : slabdata      4      4      0
revoke_table          12    290     12  290    1 : tunables  120   60    0 : slabdata      1      1      0
revoke_record          1    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
ext3_inode_cache     284    549    448    9    1 : tunables   54   27    0 : slabdata     61     61      0
ext3_xattr             0      0     44   88    1 : tunables  120   60    0 : slabdata      0      0      0
eventpoll_pwq          0      0     36  107    1 : tunables  120   60    0 : slabdata      0      0      0
eventpoll_epi          0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
kioctx                 0      0    160   25    1 : tunables  120   60    0 : slabdata      0      0      0
kiocb                  0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
dnotify_cache          0      0     20  185    1 : tunables  120   60    0 : slabdata      0      0      0
file_lock_cache        1     43     92   43    1 : tunables  120   60    0 : slabdata      1      1      0
fasync_cache           2    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
shmem_inode_cache      5     10    384   10    1 : tunables   54   27    0 : slabdata      1      1      0
posix_timers_cache      0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
uid_cache              2    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
cfq_pool              64    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
crq_pool               0      0     36  107    1 : tunables  120   60    0 : slabdata      0      0      0
deadline_drq           0      0     48   81    1 : tunables  120   60    0 : slabdata      0      0      0
as_arq               130    195     60   65    1 : tunables  120   60    0 : slabdata      3      3      0
blkdev_ioc            34    185     20  185    1 : tunables  120   60    0 : slabdata      1      1      0
blkdev_queue           9      9    448    9    1 : tunables   54   27    0 : slabdata      1      1      0
blkdev_requests      122    182    152   26    1 : tunables  120   60    0 : slabdata      7      7      0
biovec-(256)          60     60   3072    2    2 : tunables   24   12    0 : slabdata     30     30      0
biovec-128           121    125   1536    5    2 : tunables   24   12    0 : slabdata     25     25      0
biovec-64            250    250    768    5    1 : tunables   54   27    0 : slabdata     50     50      0
biovec-16            251    260    192   20    1 : tunables  120   60    0 : slabdata     13     13      0
biovec-4             249    305     64   61    1 : tunables  120   60    0 : slabdata      5      5      0
biovec-1             512    678     16  226    1 : tunables  120   60    0 : slabdata      3      3      0
bio                  510   1281     64   61    1 : tunables  120   60    0 : slabdata     21     21      0
sock_inode_cache      51     66    352   11    1 : tunables   54   27    0 : slabdata      6      6      0
skbuff_head_cache    600    740    192   20    1 : tunables  120   60    0 : slabdata     37     37      0
sock                   4     12    320   12    1 : tunables   54   27    0 : slabdata      1      1      0
proc_inode_cache      52    288    320   12    1 : tunables   54   27    0 : slabdata     24     24      0
sigqueue              27     27    148   27    1 : tunables  120   60    0 : slabdata      1      1      0
radix_tree_node      419   1078    276   14    1 : tunables   54   27    0 : slabdata     77     77      0
bdev_cache            10     18    416    9    1 : tunables   54   27    0 : slabdata      2      2      0
mnt_cache             23     41     96   41    1 : tunables  120   60    0 : slabdata      1      1      0
inode_cache         1056   1078    288   14    1 : tunables   54   27    0 : slabdata     77     77      0
dentry_cache        1413   2436    140   28    1 : tunables  120   60    0 : slabdata     87     87      0
filp                 441    750    160   25    1 : tunables  120   60    0 : slabdata     30     30      0
names_cache           13     13   4096    1    1 : tunables   24   12    0 : slabdata     13     13      0
idr_layer_cache       82     87    136   29    1 : tunables  120   60    0 : slabdata      3      3      0
buffer_head          152    891     48   81    1 : tunables  120   60    0 : slabdata     11     11      0
mm_struct             37     56    512    7    1 : tunables   54   27    0 : slabdata      8      8      0
vm_area_struct       929   1598     84   47    1 : tunables  120   60    0 : slabdata     34     34      0
fs_cache              38    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
files_cache           37     72    416    9    1 : tunables   54   27    0 : slabdata      8      8      0
signal_cache          60    123     96   41    1 : tunables  120   60    0 : slabdata      3      3      0
sighand_cache         53     60   1312    3    1 : tunables   24   12    0 : slabdata     20     20      0
task_struct           60     80   1424    5    2 : tunables   24   12    0 : slabdata     16     16      0
anon_vma             455    814      8  407    1 : tunables  120   60    0 : slabdata      2      2      0
pgd                   37     37   4096    1    1 : tunables   24   12    0 : slabdata     37     37      0
size-131072(DMA)       0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-131072            0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-65536(DMA)        0      0  65536    1   16 : tunables    8    4    0 : slabdata      0      0      0
size-65536             0      0  65536    1   16 : tunables    8    4    0 : slabdata      0      0      0
size-32768(DMA)        0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
size-32768            28     28  32768    1    8 : tunables    8    4    0 : slabdata     28     28      0
size-16384(DMA)        0      0  16384    1    4 : tunables    8    4    0 : slabdata      0      0      0
size-16384             3      3  16384    1    4 : tunables    8    4    0 : slabdata      3      3      0
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
size-8192             61     61   8192    1    2 : tunables    8    4    0 : slabdata     61     61      0
size-4096(DMA)         0      0   4096    1    1 : tunables   24   12    0 : slabdata      0      0      0
size-4096             34     34   4096    1    1 : tunables   24   12    0 : slabdata     34     34      0
size-2048(DMA)         0      0   2048    2    1 : tunables   24   12    0 : slabdata      0      0      0
size-2048             32     32   2048    2    1 : tunables   24   12    0 : slabdata     16     16      0
size-1024(DMA)         0      0   1024    4    1 : tunables   54   27    0 : slabdata      0      0      0
size-1024            246    252   1024    4    1 : tunables   54   27    0 : slabdata     63     63      0
size-512(DMA)          0      0    512    8    1 : tunables   54   27    0 : slabdata      0      0      0
size-512             380    568    512    8    1 : tunables   54   27    0 : slabdata     71     71      0
size-256(DMA)          0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
size-256             335    480    256   15    1 : tunables  120   60    0 : slabdata     32     32      0
size-192(DMA)          0      0    192   20    1 : tunables  120   60    0 : slabdata      0      0      0
size-192             220    220    192   20    1 : tunables  120   60    0 : slabdata     11     11      0
size-128(DMA)          0      0    128   31    1 : tunables  120   60    0 : slabdata      0      0      0
size-128             323    372    128   31    1 : tunables  120   60    0 : slabdata     12     12      0
size-96(DMA)           0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
size-96             1629   1681     96   41    1 : tunables  120   60    0 : slabdata     41     41      0
size-64(DMA)           0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
size-64           1554463 1554463     64   61    1 : tunables  120   60    0 : slabdata  25483  25483      0
size-32(DMA)           0      0     32  119    1 : tunables  120   60    0 : slabdata      0      0      0
size-32             2618   2737     32  119    1 : tunables  120   60    0 : slabdata     23     23      0
kmem_cache           124    124    128   31    1 : tunables  120   60    0 : slabdata      4      4      0

nr_dirty 0
nr_writeback 0
nr_unstable 0
nr_page_table_pages 125
nr_mapped 2558
nr_slab 26996
pgpgin 6850970
pgpgout 2663539
pswpin 346485
pswpout 178555
pgalloc_high 0
pgalloc_normal 54477318
pgalloc_dma 7478366
pgfree 61956118
pgactivate 696231
pgdeactivate 1111221
pgfault 13656600
pgmajfault 138898
pgrefill_high 0
pgrefill_normal 11740872
pgrefill_dma 1307467
pgsteal_high 0
pgsteal_normal 1842048
pgsteal_dma 297724
pgscan_kswapd_high 0
pgscan_kswapd_normal 5756289
pgscan_kswapd_dma 1211228
pgscan_direct_high 0
pgscan_direct_normal 886314
pgscan_direct_dma 151569
pginodesteal 68
slabs_scanned 1428023
kswapd_steal 1655861
kswapd_inodesteal 805
pageoutrun 15255
allocstall 9684
pgrotated 177966

  PID TTY      STAT   TIME  MAJFL   TRS   DRS  RSS %MEM COMMAND
    1 ?        S      0:01    319   446    33   52  0.0 init [4]        
    2 ?        SN     0:00      0     0     0    0  0.0 [ksoftirqd/0]
    3 ?        S<     0:00      0     0     0    0  0.0 [events/0]
    4 ?        S<     0:00      0     0     0    0  0.0 [khelper]
    5 ?        S<     0:02      0     0     0    0  0.0 [kacpid]
   22 ?        S<     0:01      0     0     0    0  0.0 [kblockd/0]
   23 ?        S      0:00      0     0     0    0  0.0 [khubd]
   40 ?        S<     0:00      0     0     0    0  0.0 [aio/0]
   39 ?        S      0:12      0     0     0    0  0.0 [kswapd0]
  142 ?        S      0:00      0     0     0    0  0.0 [pccardd]
  144 ?        S      0:00      0     0     0    0  0.0 [pccardd]
  152 ?        S      0:00      0     0     0    0  0.0 [kseriod]
  171 ?        S      0:00      0     0     0    0  0.0 [kjournald]
  330 ?        S      0:00      0     0     0    0  0.0 [kjournald]
  331 ?        S<     1:31      0     0     0    0  0.0 [loop0]
  332 ?        S      0:00      0     0     0    0  0.0 [kjournald]
  333 ?        S      0:00      0     0     0    0  0.0 [kjournald]
  334 ?        S      0:00      0     0     0    0  0.0 [kjournald]
  335 ?        S      0:00      0     0     0    0  0.0 [kjournald]
  497 ?        Ss     0:00    374    23  1384   80  0.0 /usr/sbin/syslogd -m 0
  511 ?        Ss     0:00     27    18  1325    0  0.0 /usr/sbin/klogd -c 3 -x
  514 ?        Ss     0:00      7    39  1444    0  0.0 /sbin/cardmgr
  857 ?        Ss     0:00      7    20  1459    0  0.0 /sbin/rpc.portmap
  898 ?        Ss     0:00      7    19  1360    0  0.0 /usr/sbin/inetd
  904 ?        Ss     0:00      7   899  3188    0  0.0 /usr/local/sbin/sshd
  914 ?        S      0:00    273    11  1448  184  0.1 /usr/sbin/crond -l10
  917 ?        Ss     0:00      6    15  1332    0  0.0 /usr/sbin/acpid
  930 ?        Ss     0:00     93    59  1324   60  0.0 /usr/sbin/gpm -m /dev/mouse -t ps2
  933 ?        S      0:00    168   158  1529   44  0.0 /usr/sbin/smartd
  948 tty2     Ss+    0:00      7    11  1324    0  0.0 /sbin/agetty 38400 tty2 linux
  949 tty3     Ss+    0:00      7    11  1324    0  0.0 /sbin/agetty 38400 tty3 linux
  950 tty4     Ss+    0:00      7    11  1324    0  0.0 /sbin/agetty 38400 tty4 linux
  951 tty5     Ss+    0:00      7    11  1324    0  0.0 /sbin/agetty 38400 tty5 linux
  952 tty6     Ss+    0:00      7    11  1324    0  0.0 /sbin/agetty 38400 tty6 linux
  953 tty7     Ss+    0:00      6    11  1324    0  0.0 /sbin/agetty 38400 tty7 linux
  955 tty8     Ss+    0:00      6    11  1324    0  0.0 /sbin/agetty 38400 tty8 linux
  959 tty9     Ss+    0:00      6    11  1324    0  0.0 /sbin/agetty 38400 tty9 linux
  962 tty10    Ss+    0:00      5    11  1324    0  0.0 /sbin/agetty 38400 tty10 linux
  964 ?        Ss     0:00     10    98  3069    0  0.0 /usr/X11R6/bin/xdm -nodaemon
 1081 ?        S     43:25  25126  1505 39590 1712  1.3 /usr/X11R6/bin/X -auth /usr/X11R6/lib/X11/xdm/authdir/authfiles/A:0-dhgq55
 1082 ?        S      0:00      9    98  3517    0  0.0 -:0                         
 1159 ?        S      0:00      0     0     0    0  0.0 [eth1]
 1222 ?        Ss     0:01    217    33  1338  140  0.1 /sbin/dhcpcd -d eth1
 1229 ?        S      0:04   2325   313  3858  476  0.3 blackbox
 1258 ?        S     10:40    245    38  2449  276  0.2 /home/uas/bin/wmbatteries
 1260 ?        S      0:06     87    21  2390  248  0.1 /home/uas/bin/wmcpuload -a -n -lc rgb:ff/ff/33
 1262 ?        S      4:16     84    31  2392  232  0.1 /home/uas/bin/wmnetload -n eth1
 1264 ?        S      0:04     56    19  2388  156  0.1 /home/uas/bin/wmmemload -am -b -c -lc rgb:ff/80/30
 1266 ?        S      0:24    104    28  2391  224  0.1 /home/uas/bin/wmtime -lc rgb:33/33/ff
 1270 ?        Ds     0:00    406    11  2252  220  0.1 /home/uas/bin/root-tail -f -g 350x10+5-10 -fn -schumacher-clean-medium-r-*-*-10-*-*-*-*-*-*-* -color rgb:cc/cc/ff /var/log/messages rgb:88/88/ff /var/log/syslog rgb:ff/88/ff /var/log/maillog
 1294 tty1     Ss+    0:00      5    11  1324    0  0.0 /sbin/agetty 38400 tty1 linux
 4093 ?        S      0:01      0     0     0    0  0.0 [pdflush]
 4707 ?        S      0:00      0     0     0    0  0.0 [pdflush]
 5222 ?        Ss     0:00    429    98  3733  568  0.4 /home/uas/bin/aterm -geometry 80x25
 5223 pts/0    Ss     0:00    313   586  2089  632  0.5 -bash
 5449 ?        Ds     0:15  21414  1403 39548 5784  4.6 sylpheed
 5501 ?        D      0:00     14    11  1452  444  0.3 /usr/sbin/crond -l10
 5502 pts/0    R+     0:00     16    60  2087  652  0.5 ps axv



* Re: Possible dcache BUG
  2004-08-13  1:00                           ` Udo A. Steinberg
@ 2004-08-13  1:31                             ` Linus Torvalds
  2004-08-13  2:03                               ` Gene Heskett
                                                 ` (3 more replies)
  0 siblings, 4 replies; 146+ messages in thread
From: Linus Torvalds @ 2004-08-13  1:31 UTC (permalink / raw)
  To: Udo A. Steinberg; +Cc: linux-kernel, Andrew Morton, viro, Nick Piggin



On Thu, 12 Aug 2004, Udo A. Steinberg wrote:
> 
> After nearly 2 days of running 2.6.8-rc4 with above patch backed out, the 
> machine has gone back into heavy swapping, being rather unresponsive for
> several minutes. At that time the only bigger applications running were
> X and my mailer, as can be seen in the ps output below.

Your slab usage seems to be:

	cumulative	     usage	name
	=========	    ======	====
		.....
	  2,021,428	   151,552	pgd
	  2,182,804	   161,376	size-96
	  2,367,124	   184,320	biovec-(256)
	  2,559,124	   192,000	biovec-128
	  2,751,124	   192,000	biovec-64
	  2,997,076	   245,952	ext3_inode_cache
	  3,255,124	   258,048	size-1024
	  3,545,940	   290,816	size-512
	  3,843,468	   297,528	radix_tree_node
	  4,153,932	   310,464	inode_cache
	  4,494,972	   341,040	dentry_cache
	  4,994,684	   499,712	size-8192
	  5,912,188	   917,504	size-32768
	105,397,820	99,485,632	size-64

Something pretty much stands out.

What the _heck_ is doing 64-byte allocations and leaking them?

Can you figure out what triggers it for you? If nothing obvious comes to 
mind, could you do something really silly like this

	--- 1.141/mm/slab.c     2004-07-11 01:52:48 -07:00
	+++ edited/mm/slab.c    2004-08-12 18:30:00 -07:00
	@@ -2360,6 +2360,11 @@
	                 */
	                BUG_ON(csizep->cs_cachep == NULL);
	 #endif
	+               if (csizep->cs_size == 64) {
	+                       static unsigned count;
	+                       if (!(4095 & ++count))
	+                               dump_stack();
	+               }
	                return __cache_alloc(flags & GFP_DMA ?
	                         csizep->cs_dmacachep : csizep->cs_cachep, flags);
	        }

(totally whitespace-damaged) which should just print out a stack dump 
every four thousand allocations, which should give a good clue (somebody 
else might also be allocating those 64-byte things, but _likely_ we'll see 
at least one of the leakers.. Maybe.)


			Linus


* Re: Possible dcache BUG
  2004-08-13  1:31                             ` Linus Torvalds
@ 2004-08-13  2:03                               ` Gene Heskett
  2004-08-13  2:27                               ` Andreas Dilger
                                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-13  2:03 UTC (permalink / raw)
  To: linux-kernel
  Cc: Linus Torvalds, Udo A. Steinberg, Andrew Morton, viro, Nick Piggin

On Thursday 12 August 2004 21:31, Linus Torvalds wrote:
>On Thu, 12 Aug 2004, Udo A. Steinberg wrote:
>> After nearly 2 days of running 2.6.8-rc4 with above patch backed
>> out, the machine has gone back into heavy swapping, being rather
>> unresponsive for several minutes. At that time the only bigger
>> applications running were X and my mailer, as can be seen in the
>> ps output below.
>
>Your slab usage seems to be:
>
>	cumulative	     usage	name
>	=========	    ======	====
>		.....
>	  2,021,428	   151,552	pgd
>	  2,182,804	   161,376	size-96
>	  2,367,124	   184,320	biovec-(256)
>	  2,559,124	   192,000	biovec-128
>	  2,751,124	   192,000	biovec-64
>	  2,997,076	   245,952	ext3_inode_cache
>	  3,255,124	   258,048	size-1024
>	  3,545,940	   290,816	size-512
>	  3,843,468	   297,528	radix_tree_node
>	  4,153,932	   310,464	inode_cache
>	  4,494,972	   341,040	dentry_cache
>	  4,994,684	   499,712	size-8192
>	  5,912,188	   917,504	size-32768
>	105,397,820	99,485,632	size-64
>
>Something pretty much stands out.
>
>What the _heck_ is doing 64-byte allocations and leaking them?
>
>Can you figure out what triggers it for you? If nothing obvious
> comes to mind, could you do something really silly like this
>
>	--- 1.141/mm/slab.c     2004-07-11 01:52:48 -07:00
>	+++ edited/mm/slab.c    2004-08-12 18:30:00 -07:00
>	@@ -2360,6 +2360,11 @@
>	                 */
>	                BUG_ON(csizep->cs_cachep == NULL);
>	 #endif
>	+               if (csizep->cs_size == 64) {
>	+                       static unsigned count;
>	+                       if (!(4095 & ++count))
>	+                               dump_stack();
>	+               }
>	                return __cache_alloc(flags & GFP_DMA ?
>	                         csizep->cs_dmacachep : csizep->cs_cachep, flags);
>	        }
>
>(totally whitespace-damaged) which should just print out a stack
> dump every four thousand allocations, which should give a good clue
> (somebody else might also be allocating those 64-byte things, but
> _likely_ we'll see at least one of the leakers.. Maybe.)
>
>
>			Linus

I'm not seeing any of that here, Linus.  Here are just the size-N caches
from my current /proc/slabinfo, with uptime a bit over 24 hours:

size-131072(DMA)       0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-131072            0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-65536(DMA)        0      0  65536    1   16 : tunables    8    4    0 : slabdata      0      0      0
size-65536             1      1  65536    1   16 : tunables    8    4    0 : slabdata      1      1      0
size-32768(DMA)        0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
size-32768             0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
size-16384(DMA)        0      0  16384    1    4 : tunables    8    4    0 : slabdata      0      0      0
size-16384             7      7  16384    1    4 : tunables    8    4    0 : slabdata      7      7      0
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
size-8192             10     10   8192    1    2 : tunables    8    4    0 : slabdata     10     10      0
size-4096(DMA)         0      0   4096    1    1 : tunables   24   12    0 : slabdata      0      0      0
size-4096            182    182   4096    1    1 : tunables   24   12    0 : slabdata    182    182      0
size-2048(DMA)         0      0   2048    2    1 : tunables   24   12    0 : slabdata      0      0      0
size-2048            170    198   2048    2    1 : tunables   24   12    0 : slabdata     99     99      0
size-1024(DMA)         0      0   1024    4    1 : tunables   54   27    0 : slabdata      0      0      0
size-1024            124    124   1024    4    1 : tunables   54   27    0 : slabdata     31     31      0
size-512(DMA)          0      0    512    8    1 : tunables   54   27    0 : slabdata      0      0      0
size-512             184    448    512    8    1 : tunables   54   27    0 : slabdata     56     56      0
size-256(DMA)          0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
size-256             165    450    256   15    1 : tunables  120   60    0 : slabdata     30     30      0
size-192(DMA)          0      0    192   20    1 : tunables  120   60    0 : slabdata      0      0      0
size-192             100    100    192   20    1 : tunables  120   60    0 : slabdata      5      5      0
size-128(DMA)          0      0    128   31    1 : tunables  120   60    0 : slabdata      0      0      0
size-128            1205   1271    128   31    1 : tunables  120   60    0 : slabdata     41     41      0
size-64(DMA)           0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
size-64             1850   2745     64   61    1 : tunables  120   60    0 : slabdata     45     45      0
size-32(DMA)           0      0     32  119    1 : tunables  120   60    0 : slabdata      0      0      0
size-32             1369   1428     32  119    1 : tunables  120   60    0 : slabdata     12     12      0
kmem_cache           124    124    128   31    1 : tunables  120   60    0 : slabdata      4      4      0

And I'm doing my usual activities here, browsing the web with mozilla,
handling the email with kmail from kde3.3-beta2, and watching a little
tv with tvtime or playing solitaire.

So far (knock on wood) it's purring right along at about 80% of what
its normal speed would be, nothing unusual in the logs or in the top
display.  And it did 8 seti units in the last 24 hours before its 6am
update, part of which was on the unpatched kernel.  5 since 6am; it's
22:00 here now.

Apparently Udo has something running I don't, and it's a leaker.  Let's
hope your snooper patch will show it.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.


* Re: Possible dcache BUG
  2004-08-13  1:31                             ` Linus Torvalds
  2004-08-13  2:03                               ` Gene Heskett
@ 2004-08-13  2:27                               ` Andreas Dilger
  2004-08-13  3:33                                 ` Linus Torvalds
  2004-08-20  7:02                               ` Udo A. Steinberg
  2004-09-12  7:03                               ` Udo A. Steinberg
  3 siblings, 1 reply; 146+ messages in thread
From: Andreas Dilger @ 2004-08-13  2:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Udo A. Steinberg, linux-kernel, Andrew Morton, viro, Nick Piggin

[-- Attachment #1: Type: text/plain, Size: 1404 bytes --]

On Aug 12, 2004  18:31 -0700, Linus Torvalds wrote:
> Can you figure out what triggers it for you? If nothing obvious comes to 
> mind, could you do something really silly like this
> 
> 	--- 1.141/mm/slab.c     2004-07-11 01:52:48 -07:00
> 	+++ edited/mm/slab.c    2004-08-12 18:30:00 -07:00
> 	@@ -2360,6 +2360,11 @@
> 	                 */
> 	                BUG_ON(csizep->cs_cachep == NULL);
> 	 #endif
> 	+               if (csizep->cs_size == 64) {
> 	+                       static unsigned count;
> 	+                       if (!(4095 & ++count))
> 	+                               dump_stack();
> 	+               }
> 	                return __cache_alloc(flags & GFP_DMA ?
> 	                         csizep->cs_dmacachep : csizep->cs_cachep, flags);
> 	        }

I don't know who suggested it first, but someone on l-k had a similar
problem and a more robust method of finding the offender was to dump_stack()
when the slab was grown instead of for each allocation.  That way you
don't see frequent but harmless allocators that don't leak, but rather the
process that is causing the slab to be grown repeatedly.

So putting something like the above in cache_alloc_refill() is probably
the right thing.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://members.shaw.ca/adilger/             http://members.shaw.ca/golinux/




* Re: Possible dcache BUG
  2004-08-13  2:27                               ` Andreas Dilger
@ 2004-08-13  3:33                                 ` Linus Torvalds
  0 siblings, 0 replies; 146+ messages in thread
From: Linus Torvalds @ 2004-08-13  3:33 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Udo A. Steinberg, linux-kernel, Andrew Morton, viro, Nick Piggin



On Thu, 12 Aug 2004, Andreas Dilger wrote:
> 
> So putting something like the above in cache_alloc_refill() is probably
> the right thing.

Yes, that sounds about right.

		Linus


* Re: Possible dcache BUG
  2004-08-11  4:59                       ` Linus Torvalds
  2004-08-11  8:05                         ` Roger Luethi
@ 2004-08-13  4:27                         ` Gene Heskett
  2004-08-13  8:32                           ` Gene Heskett
  2004-08-14  2:18                           ` Marcelo Tosatti
  1 sibling, 2 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-13  4:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds, Andrew Morton, Al Viro

On Wednesday 11 August 2004 00:59, Linus Torvalds wrote:
>I wrote:
>> Notably, the output of "/proc/meminfo" and "/proc/slabinfo". "ps
>> axm" helps too.
>
>That should be "ps axv" of course. Just shows what a retard I am.
>
>		Linus
Acck!  I just logged an Oops:
Aug 13 00:02:00 coyote kernel: kjournald starting.  Commit interval 5 seconds
Aug 13 00:02:00 coyote kernel: EXT3 FS on hdb3, internal journal
Aug 13 00:02:00 coyote kernel: EXT3-fs: mounted filesystem with ordered data mode.
Aug 13 00:05:09 coyote kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000004
Aug 13 00:05:09 coyote kernel:  printing eip:
Aug 13 00:05:09 coyote kernel: c014e0dc
Aug 13 00:05:09 coyote kernel: *pde = 00000000
Aug 13 00:05:09 coyote kernel: Oops: 0002 [#1]
Aug 13 00:05:09 coyote kernel: PREEMPT
Aug 13 00:05:09 coyote kernel: Modules linked in: eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd forcedeth sg
Aug 13 00:05:09 coyote kernel: CPU:    0
Aug 13 00:05:09 coyote kernel: EIP:    0060:[<c014e0dc>]    Not tainted
Aug 13 00:05:09 coyote kernel: EFLAGS: 00010246   (2.6.8-rc4)
Aug 13 00:05:09 coyote kernel: EIP is at remove_inode_buffers+0x4c/0x90
Aug 13 00:05:09 coyote kernel: eax: 00000000   ebx: d7ff68b4   ecx: d7ffffb4   edx: 00000000
Aug 13 00:05:09 coyote kernel: esi: d7ff67e0   edi: 00000001   ebp: c198bed8   esp: c198bec8
Aug 13 00:05:09 coyote kernel: ds: 007b   es: 007b   ss: 0068
Aug 13 00:05:09 coyote kernel: Process kswapd0 (pid: 66, threadinfo=c198b000 task=c1978050)
Aug 13 00:05:09 coyote kernel: Stack: d7ff67e0 d7ff67e8 d7ff67e0 0000001e c198bf04 c0165242 d7ff67e0 c198b000
Aug 13 00:05:09 coyote kernel:        00000000 0000001e d7ff6988 ed3be928 00000080 00000000 c198b000 c198bf10
Aug 13 00:05:09 coyote kernel:        c016532f 00000080 c198bf44 c013a32c 00000080 000000d0 0002cc1d 013b0a00
Aug 13 00:05:09 coyote kernel: Call Trace:
Aug 13 00:05:09 coyote kernel:  [<c010476f>] show_stack+0x7f/0xa0
Aug 13 00:05:09 coyote kernel:  [<c0104908>] show_registers+0x158/0x1b0
Aug 13 00:05:09 coyote kernel:  [<c0104a89>] die+0x89/0x100
Aug 13 00:05:09 coyote kernel:  [<c0111725>] do_page_fault+0x1f5/0x553
Aug 13 00:05:09 coyote kernel:  [<c01043d9>] error_code+0x2d/0x38
Aug 13 00:05:09 coyote kernel:  [<c0165242>] prune_icache+0x142/0x1f0
Aug 13 00:05:09 coyote kernel:  [<c016532f>] shrink_icache_memory+0x3f/0x50
Aug 13 00:05:09 coyote kernel:  [<c013a32c>] shrink_slab+0x14c/0x190
Aug 13 00:05:09 coyote kernel:  [<c013b639>] balance_pgdat+0x1a9/0x1f0
Aug 13 00:05:09 coyote kernel:  [<c013b73f>] kswapd+0xbf/0xd0
Aug 13 00:05:09 coyote kernel:  [<c0102471>] kernel_thread_helper+0x5/0x14
Aug 13 00:05:09 coyote kernel: Code: 89 50 04 89 02 89 49 04 89 09 8b 03 39 d8 89 c1 75 e2 b8 00
Aug 13 00:05:09 coyote kernel:  <6>note: kswapd0[66] exited with preempt_count 1

The first 3 entries are from a nightly run of rsync, which mounts a
normally unmounted partition for the duration of its run.

Now let's see if I can get meminfo and slabinfo.
meminfo:
[root@coyote themes]# cat /proc/meminfo
MemTotal:      1035844 kB
MemFree:        152884 kB
Buffers:          4896 kB
Cached:          41276 kB
SwapCached:      36740 kB
Active:         131792 kB
Inactive:        11792 kB
HighTotal:      131008 kB
HighFree:        52640 kB
LowTotal:       904836 kB
LowFree:        100244 kB
SwapTotal:     3857104 kB
SwapFree:      3731720 kB
Dirty:              16 kB
Writeback:           0 kB
Mapped:         121068 kB
Slab:           728876 kB
Committed_AS:   348500 kB
PageTables:       3480 kB
VmallocTotal:   114680 kB
VmallocUsed:     19644 kB
VmallocChunk:    94932 kB

slabinfo:
slabinfo - version: 2.0
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
unix_sock            171    190    384   10    1 : tunables   54   27    0 : slabdata     19     19      0
tcp_tw_bucket          0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
tcp_bind_bucket       19    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
tcp_open_request       0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
inet_peer_cache        0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
ip_fib_hash           10    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
ip_dst_cache          17     30    256   15    1 : tunables  120   60    0 : slabdata      2      2      0
arp_cache              4     31    128   31    1 : tunables  120   60    0 : slabdata      1      1      0
raw4_sock              0      0    480    8    1 : tunables   54   27    0 : slabdata      0      0      0
udp_sock               2      8    480    8    1 : tunables   54   27    0 : slabdata      1      1      0
tcp_sock              31     32   1024    4    1 : tunables   54   27    0 : slabdata      8      8      0
flow_cache             0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
mqueue_inode_cache      1      8    480    8    1 : tunables   54   27    0 : slabdata      1      1      0
udf_inode_cache        0      0    352   11    1 : tunables   54   27    0 : slabdata      0      0      0
smb_request            0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
smb_inode_cache        2     12    320   12    1 : tunables   54   27    0 : slabdata      1      1      0
isofs_inode_cache      0      0    320   12    1 : tunables   54   27    0 : slabdata      0      0      0
fat_inode_cache        1     11    352   11    1 : tunables   54   27    0 : slabdata      1      1      0
ext2_inode_cache       0      0    416    9    1 : tunables   54   27    0 : slabdata      0      0      0
journal_handle         8    135     28  135    1 : tunables  120   60    0 : slabdata      1      1      0
journal_head         889   2025     48   81    1 : tunables  120   60    0 : slabdata     25     25      0
revoke_table          14    290     12  290    1 : tunables  120   60    0 : slabdata      1      1      0
revoke_record          0      0     16  226    1 : tunables  120   60    0 : slabdata      0      0      0
ext3_inode_cache  1373751 1373751    416    9    1 : tunables   54   27    0 : slabdata 152639 152639      0
eventpoll_pwq          0      0     36  107    1 : tunables  120   60    0 : slabdata      0      0      0
eventpoll_epi          0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
kioctx                 0      0    160   25    1 : tunables  120   60    0 : slabdata      0      0      0
kiocb                  0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
dnotify_cache        172    370     20  185    1 : tunables  120   60    0 : slabdata      2      2      0
file_lock_cache       43     43     92   43    1 : tunables  120   60    0 : slabdata      1      1      0
fasync_cache           2    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
shmem_inode_cache      5     10    384   10    1 : tunables   54   27    0 : slabdata      1      1      0
posix_timers_cache      0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
uid_cache              7    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
sgpool-128            32     32   2048    2    1 : tunables   24   12    0 : slabdata     16     16      0
sgpool-64             32     32   1024    4    1 : tunables   54   27    0 : slabdata      8      8      0
sgpool-32             32     32    512    8    1 : tunables   54   27    0 : slabdata      4      4      0
sgpool-16             32     45    256   15    1 : tunables  120   60    0 : slabdata      3      3      0
sgpool-8              32     62    128   31    1 : tunables  120   60    0 : slabdata      2      2      0
cfq_pool              72    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
crq_pool              35    107     36  107    1 : tunables  120   60    0 : slabdata      1      1      0
deadline_drq           0      0     48   81    1 : tunables  120   60    0 : slabdata      0      0      0
as_arq                 0      0     60   65    1 : tunables  120   60    0 : slabdata      0      0      0
blkdev_ioc            91    185     20  185    1 : tunables  120   60    0 : slabdata      1      1      0
blkdev_queue          12     18    448    9    1 : tunables   54   27    0 : slabdata      2      2      0
blkdev_requests       23     78    152   26    1 : tunables  120   60    0 : slabdata      3      3      0
biovec-(256)         256    256   3072    2    2 : tunables   24   12    0 : slabdata    128    128      0
biovec-128           256    260   1536    5    2 : tunables   24   12    0 : slabdata     52     52      0
biovec-64            256    260    768    5    1 : tunables   54   27    0 : slabdata     52     52      0
biovec-16            256    260    192   20    1 : tunables  120   60    0 : slabdata     13     13      0
biovec-4             257    305     64   61    1 : tunables  120   60    0 : slabdata      5      5      0
biovec-1             294    452     16  226    1 : tunables  120   60    0 : slabdata      2      2      0
bio                  294    366     64   61    1 : tunables  120   60    0 : slabdata      6      6      0
sock_inode_cache     208    231    352   11    1 : tunables   54   27    0 : slabdata     21     21      0
skbuff_head_cache    225    625    160   25    1 : tunables  120   60    0 : slabdata     25     25      0
sock                   3     12    320   12    1 : tunables   54   27    0 : slabdata      1      1      0
proc_inode_cache     353    504    320   12    1 : tunables   54   27    0 : slabdata     42     42      0
sigqueue              84    108    148   27    1 : tunables  120   60    0 : slabdata      4      4      0
radix_tree_node     1590   4046    276   14    1 : tunables   54   27    0 : slabdata    289    289      0
bdev_cache            12     18    416    9    1 : tunables   54   27    0 : slabdata      2      2      0
mnt_cache             26     41     96   41    1 : tunables  120   60    0 : slabdata      1      1      0
inode_cache         2179   2198    288   14    1 : tunables   54   27    0 : slabdata    157    157      0
dentry_cache      564066 764232    140   28    1 : tunables  120   60    0 : slabdata  27294  27294      0
filp                2045   2225    160   25    1 : tunables  120   60    0 : slabdata     89     89      0
names_cache           19     19   4096    1    1 : tunables   24   12    0 : slabdata     19     19      0
idr_layer_cache       80     87    136   29    1 : tunables  120   60    0 : slabdata      3      3      0
buffer_head         1467   9477     48   81    1 : tunables  120   60    0 : slabdata    117    117      0
mm_struct             98     98    512    7    1 : tunables   54   27    0 : slabdata     14     14      0
vm_area_struct      7667   8272     84   47    1 : tunables  120   60    0 : slabdata    176    176      0
fs_cache             102    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
files_cache           99     99    416    9    1 : tunables   54   27    0 : slabdata     11     11      0
signal_cache         122    123     96   41    1 : tunables  120   60    0 : slabdata      3      3      0
sighand_cache        105    105   1312    3    1 : tunables   24   12    0 : slabdata     35     35      0
task_struct          115    115   1424    5    2 : tunables   24   12    0 : slabdata     23     23      0
anon_vma            1839   2035      8  407    1 : tunables  120   60    0 : slabdata      5      5      0
pgd                   89     89   4096    1    1 : tunables   24   12    0 : slabdata     89     89      0
size-131072(DMA)       0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-131072            0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-65536(DMA)        0      0  65536    1   16 : tunables    8    4    0 : slabdata      0      0      0
size-65536             1      1  65536    1   16 : tunables    8    4    0 : slabdata      1      1      0
size-32768(DMA)        0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
size-32768             0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
size-16384(DMA)        0      0  16384    1    4 : tunables    8    4    0 : slabdata      0      0      0
size-16384             8      8  16384    1    4 : tunables    8    4    0 : slabdata      8      8      0
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
size-8192             10     10   8192    1    2 : tunables    8    4    0 : slabdata     10     10      0
size-4096(DMA)         0      0   4096    1    1 : tunables   24   12    0 : slabdata      0      0      0
size-4096            190    190   4096    1    1 : tunables   24   12    0 : slabdata    190    190      0
size-2048(DMA)         0      0   2048    2    1 : tunables   24   12    0 : slabdata      0      0      0
size-2048            170    192   2048    2    1 : tunables   24   12    0 : slabdata     96     96      0
size-1024(DMA)         0      0   1024    4    1 : tunables   54   27    0 : slabdata      0      0      0
size-1024            132    132   1024    4    1 : tunables   54   27    0 : slabdata     33     33      0
size-512(DMA)          0      0    512    8    1 : tunables   54   27    0 : slabdata      0      0      0
size-512             184    448    512    8    1 : tunables   54   27    0 : slabdata     56     56      0
size-256(DMA)          0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
size-256             180    435    256   15    1 : tunables  120   60    0 : slabdata     29     29      0
size-192(DMA)          0      0    192   20    1 : tunables  120   60    0 : slabdata      0      0      0
size-192             100    100    192   20    1 : tunables  120   60    0 : slabdata      5      5      0
size-128(DMA)          0      0    128   31    1 : tunables  120   60    0 : slabdata      0      0      0
size-128            1176   1271    128   31    1 : tunables  120   60    0 : slabdata     41     41      0
size-64(DMA)           0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
size-64             1290   2440     64   61    1 : tunables  120   60    0 : slabdata     40     40      0
size-32(DMA)           0      0     32  119    1 : tunables  120   60    0 : slabdata      0      0      0
size-32             1368   1428     32  119    1 : tunables  120   60    0 : slabdata     12     12      0
kmem_cache           124    124    128   31    1 : tunables  120   60    0 : slabdata      4      4      0
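
To see which caches account for the slab total, the per-cache footprint
can be estimated from the slabinfo columns as num_slabs * pagesperslab *
page size. The sketch below is a rough Python illustration, not a
definitive tool: it assumes 4 KiB pages (the i386 default), the helper
name slab_footprint_kb is made up for the example, and only four rows
copied from the table above are included.

```python
PAGE_KB = 4  # assumption: 4 KiB pages, the i386 default

# Rows copied from the slabinfo 2.0 dump above.
SLABINFO_ROWS = """\
ext3_inode_cache  1373751 1373751    416    9    1 : tunables   54   27    0 : slabdata 152639 152639      0
dentry_cache      564066 764232    140   28    1 : tunables  120   60    0 : slabdata  27294  27294      0
radix_tree_node     1590   4046    276   14    1 : tunables   54   27    0 : slabdata    289    289      0
buffer_head         1467   9477     48   81    1 : tunables  120   60    0 : slabdata    117    117      0
"""


def slab_footprint_kb(text):
    """Return {cache_name: kB held}, computed as num_slabs * pagesperslab * PAGE_KB."""
    usage = {}
    for line in text.splitlines():
        # Skip blank lines and the two header lines of a full slabinfo dump.
        if not line.strip() or line.startswith(("#", "slabinfo")):
            continue
        stats, _, rest = line.partition(" : ")
        name, *nums = stats.split()
        # nums: active_objs, num_objs, objsize, objperslab, pagesperslab
        pages_per_slab = int(nums[4])
        # slabdata: active_slabs, num_slabs, sharedavail
        slabdata = rest.rpartition("slabdata")[2].split()
        num_slabs = int(slabdata[1])
        usage[name] = num_slabs * pages_per_slab * PAGE_KB
    return usage


usage = slab_footprint_kb(SLABINFO_ROWS)
for name, kb in sorted(usage.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:20s} {kb:8d} kB")
```

With these rows, ext3_inode_cache and dentry_cache together come to
719732 kB, which is nearly all of the 728876 kB reported as Slab in
meminfo.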

dmesg >/foo:
Linux version 2.6.8-rc4 (root@coyote.coyote.den) (gcc version 3.3.3 20040412 (Red Hat Linux 3.3.3-7)) #3 Wed Aug 11 04:58:21 EDT 2004
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
 BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000003fff0000 (usable)
 BIOS-e820: 000000003fff0000 - 000000003fff3000 (ACPI NVS)
 BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
 BIOS-e820: 00000000fec00000 - 00000000fec01000 (reserved)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000ffff0000 - 0000000100000000 (reserved)
127MB HIGHMEM available.
896MB LOWMEM available.
On node 0 totalpages: 262128
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 225280 pages, LIFO batch:16
  HighMem zone: 32752 pages, LIFO batch:7
DMI 2.2 present.
ACPI: RSDP (v000 Nvidia                                    ) @ 0x000f7220
ACPI: RSDT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x3fff3000
ACPI: FADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x3fff3040
ACPI: MADT (v001 Nvidia AWRDACPI 0x42302e31 AWRD 0x00000000) @ 0x3fff7dc0
ACPI: DSDT (v001 NVIDIA AWRDACPI 0x00001000 MSFT 0x0100000e) @ 0x00000000
Built 1 zonelists
Kernel command line: ro root=/dev/hda7 elevator=cfq
Initializing CPU#0
CPU 0 irqstacks, hard=c0408000 soft=c0407000
PID hash table entries: 4096 (order 12: 32768 bytes)
Detected 2088.428 MHz processor.
Using tsc for high-res timesource
Console: colour VGA+ 80x25
Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
Memory: 1035184k/1048512k available (2080k kernel code, 12424k reserved, 863k data, 140k init, 131008k highmem)
Checking if this processor honours the WP bit even in supervisor mode... Ok.
Calibrating delay loop... 4128.76 BogoMIPS
Security Scaffold v1.0.0 initialized
Capability LSM initialized
Mount-cache hash table entries: 512 (order: 0, 4096 bytes)
CPU: After generic identify, caps: 0383fbff c1c3fbff 00000000 00000000
CPU: After vendor identify, caps:  0383fbff c1c3fbff 00000000 00000000
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 512K (64 bytes/line)
CPU: After all inits, caps:        0383fbff c1c3fbff 00000000 00000020
CPU: AMD Athlon(tm) XP 2800+ stepping 00
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
NET: Registered protocol family 16
PCI: PCI BIOS revision 2.10 entry at 0xfb4c0, last bus=2
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
Linux Plug and Play Support v0.97 (c) Adam Belay
pnp: the driver 'system' has been registered
PnPBIOS: Scanning system for PnP BIOS support...
PnPBIOS: Found PnP BIOS installation structure at 0xc00fbf30
PnPBIOS: PnP BIOS version 1.0, entry 0xf0000:0xbf60, dseg 0xf0000
pnp: match found with the PnP device '00:07' and the driver 'system'
pnp: match found with the PnP device '00:08' and the driver 'system'
PnPBIOS: 16 nodes reported by PnP BIOS; 16 recorded by driver
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Probing PCI hardware
PCI: Probing PCI hardware (bus 00)
PCI: nForce2 C1 Halt Disconnect fixup
PCI: Using IRQ router default [10de/01e0] at 0000:00:00.0
radeonfb: Found Intel x86 BIOS ROM Image
radeonfb: Retreived PLL infos from BIOS
radeonfb: Reference=27.00 MHz (RefDiv=12) Memory=200.00 Mhz, System=166.00 MHz
radeonfb: Monitor 1 type CRT found
radeonfb: Monitor 2 type no found
radeonfb: ATI Radeon Yd  DDR SGRAM 128 MB
apm: BIOS version 1.2 Flags 0x07 (Driver version 1.16ac)
highmem bounce pool size: 64 pages
udf: registering filesystem
isapnp: Scanning for PnP cards...
isapnp: No Plug & Play device found
lp: driver loaded but no devices found
Real Time Clock Driver v1.12
Linux agpgart interface v0.100 (c) Dave Jones
agpgart: Detected NVIDIA nForce2 chipset
agpgart: Maximum main memory to use for agp memory: 941M
agpgart: AGP aperture is 128M @ 0xc0000000
[drm] Initialized radeon 1.11.0 20020828 on minor 0: ATI Technologies Inc RV280 [Radeon 9200 SE]
ipmi message handler version v32
ipmi device interface version v32
Serial: 8250/16550 driver $Revision: 1.90 $ 6 ports, IRQ sharing enabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
pnp: the driver 'serial' has been registered
pnp: match found with the PnP device '00:0b' and the driver 'serial'
pnp: match found with the PnP device '00:0f' and the driver 'serial'
pnp: the driver 'parport_pc' has been registered
pnp: match found with the PnP device '00:0d' and the driver 'parport_pc'
parport: PnPBIOS parport detected.
parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
lp0: using parport0 (interrupt-driven).
Using cfq io scheduler
Floppy drive(s): fd0 is 1.44M, fd1 is 360K PC
FDC 0 is a post-1991 82077
loop: loaded (max 8 devices)
Linux video capture interface: v1.00
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
NFORCE2: IDE controller at PCI slot 0000:00:09.0
NFORCE2: chipset revision 162
NFORCE2: not 100% native mode: will probe irqs later
NFORCE2: BIOS didn't set cable bits correctly. Enabling workaround.
NFORCE2: 0000:00:09.0 (rev a2) UDMA133 controller
    ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:DMA, hdb:DMA
    ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:DMA, hdd:DMA
hda: Maxtor 6Y120P0, ATA DISK drive
hdb: Maxtor 54610H6, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hdc: LITE-ON DVDRW LDW-451S, ATAPI CD/DVD-ROM drive
ide1 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 240121728 sectors (122942 MB) w/7936KiB Cache, CHS=65535/16/63, UDMA(133)
 hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 hda8 >
hdb: max request size: 128KiB
hdb: 90045648 sectors (46103 MB) w/2048KiB Cache, CHS=65535/16/63, UDMA(100)
 hdb: hdb1 hdb2 hdb3 hdb4
hdc: ATAPI 40X DVD-ROM CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
ehci_hcd 0000:00:02.2: nVidia Corporation nForce2 USB Controller
PCI: Setting latency timer of device 0000:00:02.2 to 64
ehci_hcd 0000:00:02.2: irq 5, pci mem f985b000
ehci_hcd 0000:00:02.2: new USB bus registered, assigned bus number 1
PCI: cache line size of 64 is not supported by device 0000:00:02.2
ehci_hcd 0000:00:02.2: USB 2.0 enabled, EHCI 1.00, driver 2004-May-10
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 6 ports detected
ohci_hcd: 2004 Feb 02 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
ohci_hcd: block sizes: ed 64 td 64
ohci_hcd 0000:00:02.0: nVidia Corporation nForce2 USB Controller
PCI: Setting latency timer of device 0000:00:02.0 to 64
ohci_hcd 0000:00:02.0: irq 12, pci mem f985d000
ohci_hcd 0000:00:02.0: new USB bus registered, assigned bus number 2
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
ohci_hcd 0000:00:02.1: nVidia Corporation nForce2 USB Controller (#2)
PCI: Setting latency timer of device 0000:00:02.1 to 64
ohci_hcd 0000:00:02.1: irq 11, pci mem f985f000
ohci_hcd 0000:00:02.1: new USB bus registered, assigned bus number 3
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 3 ports detected
usbcore: registered new driver usblp
drivers/usb/class/usblp.c: v0.13: USB Printer Device Class driver
Initializing USB Mass Storage driver...
usbcore: registered new driver usb-storage
USB Mass Storage support registered.
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.0:USB HID core driver
usbcore: registered new driver usbserial
drivers/usb/serial/usb-serial.c: USB Serial Driver core v2.0
drivers/usb/serial/usb-serial.c: USB Serial support registered for PL-2303
usbcore: registered new driver pl2303
drivers/usb/serial/pl2303.c: Prolific PL2303 USB to serial adaptor driver v0.11
mice: PS/2 mouse device common for all mice
input: PC Speaker
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
input: AT Translated Set 2 keyboard on isa0060/serio0
i2c /dev entries driver
NET: Registered protocol family 2
IP: routing cache hash table of 8192 buckets, 64Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
NET: Registered protocol family 1
NET: Registered protocol family 17
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 140k freed
ohci_hcd 0000:00:02.0: wakeup
ohci_hcd 0000:00:02.1: wakeup
usb 2-1: new full speed USB device using address 2
usb 2-2: new low speed USB device using address 3
input: USB HID v1.10 Mouse [Logitech USB-PS/2 Optical Mouse] on usb-0000:00:02.0-2
usb 3-1: new full speed USB device using address 2
drivers/usb/class/usblp.c: usblp0: USB Bidirectional printer dev 2 if 0 alt 0 proto 2 vid 0x04B8 pid 0x0005
usb 3-2: new full speed USB device using address 3
hub 3-2:1.0: USB hub found
hub 3-2:1.0: 4 ports detected
usb 3-3: new full speed USB device using address 4
drivers/usb/class/usblp.c: usblp0: USB Bidirectional printer dev 4 if 0 alt 0 proto 2 vid 0x04B8 pid 0x0005
usb 3-2.3: new full speed USB device using address 5
EXT3 FS on hda7, internal journal
Adding 3857104k swap on /dev/hdb4.  Priority:-1 extents:1
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hda1, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hda3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hda8, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hda5, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hda6, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
8139too Fast Ethernet driver 0.9.27
eth0: RealTek RTL8139 at 0xf9a39000, 00:50:ba:5d:eb:7d, IRQ 12
eth0:  Identified 8139 chip type 'RTL-8139C'
forcedeth: Unknown parameter `mem'
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.28.
PCI: Setting latency timer of device 0000:00:04.0 to 64
eth1: forcedeth.c: subsystem: 01565:2301 bound to 0000:00:04.0
forcedeth: Unknown parameter `mem'
forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.28.
PCI: Setting latency timer of device 0000:00:04.0 to 64
eth0: forcedeth.c: subsystem: 01565:2301 bound to 0000:00:04.0
eth0: no link during initialization.
eth0: link up.
eth0: Promiscuous mode enabled.
device eth0 entered promiscuous mode
PCI: Setting latency timer of device 0000:00:06.0 to 64
intel8x0_measure_ac97_clock: measured 49436 usecs
intel8x0: clocking to 47459
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hdb3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on hdb3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Unable to handle kernel NULL pointer dereference at virtual address 00000004
 printing eip:
c014e0dc
*pde = 00000000
Oops: 0002 [#1]
PREEMPT 
Modules linked in: eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd forcedeth sg
CPU:    0
EIP:    0060:[<c014e0dc>]    Not tainted
EFLAGS: 00010246   (2.6.8-rc4) 
EIP is at remove_inode_buffers+0x4c/0x90
eax: 00000000   ebx: d7ff68b4   ecx: d7ffffb4   edx: 00000000
esi: d7ff67e0   edi: 00000001   ebp: c198bed8   esp: c198bec8
ds: 007b   es: 007b   ss: 0068
Process kswapd0 (pid: 66, threadinfo=c198b000 task=c1978050)
Stack: d7ff67e0 d7ff67e8 d7ff67e0 0000001e c198bf04 c0165242 d7ff67e0 c198b000 
       00000000 0000001e d7ff6988 ed3be928 00000080 00000000 c198b000 c198bf10 
       c016532f 00000080 c198bf44 c013a32c 00000080 000000d0 0002cc1d 013b0a00 
Call Trace:
 [<c010476f>] show_stack+0x7f/0xa0
 [<c0104908>] show_registers+0x158/0x1b0
 [<c0104a89>] die+0x89/0x100
 [<c0111725>] do_page_fault+0x1f5/0x553
 [<c01043d9>] error_code+0x2d/0x38
 [<c0165242>] prune_icache+0x142/0x1f0
 [<c016532f>] shrink_icache_memory+0x3f/0x50
 [<c013a32c>] shrink_slab+0x14c/0x190
 [<c013b639>] balance_pgdat+0x1a9/0x1f0
 [<c013b73f>] kswapd+0xbf/0xd0
 [<c0102471>] kernel_thread_helper+0x5/0x14
Code: 89 50 04 89 02 89 49 04 89 09 8b 03 39 d8 89 c1 75 e2 b8 00 
 <6>note: kswapd0[66] exited with preempt_count 1

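For reading the "Oops: 0002" line in the trace above: on i386 that
number is the page-fault error code, where bit 0 distinguishes a
protection fault from a not-present page, bit 1 a write from a read, and
bit 2 user mode from kernel mode. A minimal Python sketch of the
decoding follows (the function name decode_oops_code is only
illustrative):

```python
def decode_oops_code(code_hex):
    """Decode the low three bits of an i386 page-fault error code."""
    code = int(code_hex, 16)
    return {
        "cause": "protection fault" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }


decoded = decode_oops_code("0002")
for key, value in decoded.items():
    print(f"{key}: {value}")
```

So 0002 is a kernel-mode write to a page that is not present. That fits
the rest of the report: the first Code bytes, 89 50 04, are
mov %edx,0x4(%eax), and with eax = 00000000 the store lands on virtual
address 00000004.
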
And a 'ps axv':
  PID TTY      STAT   TIME  MAJFL   TRS   DRS  RSS %MEM COMMAND
    1 ?        S      0:01     13    31  1440  288  0.0 init [3]  
    2 ?        SWN    0:00      0     0     0    0  0.0 [ksoftirqd/0]
    3 ?        SW<    0:00      0     0     0    0  0.0 [events/0]
    4 ?        SW<    0:00      0     0     0    0  0.0 [khelper]
   21 ?        SW<    0:00      0     0     0    0  0.0 [kblockd/0]
   22 ?        SW     0:00      0     0     0    0  0.0 [khubd]
   62 ?        SW     0:00      0     0     0    0  0.0 [kapmd]
   64 ?        SW     0:00      0     0     0    0  0.0 [pdflush]
   65 ?        SW     0:00      0     0     0    0  0.0 [pdflush]
   67 ?        SW<    0:00      0     0     0    0  0.0 [aio/0]
  194 ?        SW     0:00      0     0     0    0  0.0 [kseriod]
  231 ?        SW     0:01      0     0     0    0  0.0 [kjournald]
 1419 ?        SW     0:00      0     0     0    0  0.0 [kjournald]
 1420 ?        SW     0:00      0     0     0    0  0.0 [kjournald]
 1421 ?        SW     0:00      0     0     0    0  0.0 [kjournald]
 1422 ?        SW     0:01      0     0     0    0  0.0 [kjournald]
 1423 ?        SW     0:00      0     0     0    0  0.0 [kjournald]
 1874 ?        S      0:00      5    27  1432  308  0.0 syslogd -m 0
 1878 ?        S      0:00      7    20  1383  316  0.0 klogd -x
 1889 ?        S      0:00      4    27  1508  272  0.0 portmap
 1955 ?        S      0:00      9   149  2918  376  0.0 arpwatch -u pcap -e root -s root (Arpwatch)
 1964 ?        S      0:00     34   237  8886  920  0.0 cupsd
 2092 ?        S      0:00      0   262  3385  440  0.0 /usr/sbin/sshd
 2107 ?        S      0:00      1   143  1872  356  0.0 xinetd -stayalive -pidfile /var/run/xinetd.pid
 2129 ?        S      0:00      1   690  6357 1948  0.1 sendmail: accepting connections
 2140 ?        S      0:00      0   690  5433 1544  0.1 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
 2150 ?        S      0:00      0    78  1521  324  0.0 gpm -m /dev/input/mice -t imps2
 2162 ?        S      0:00     34   250 22985 5564  0.5 /usr/sbin/httpd
 2215 ?        S      0:00      0   133  1758  264  0.0 /usr/sbin/cannaserver -syslog -u bin
 2227 ?        S      0:00      0    23  1488  332  0.0 crond
 2259 ?        S      0:00      1   250 22985 5452  0.5 /usr/sbin/httpd
 2260 ?        S      0:00      1   250 22985 5452  0.5 /usr/sbin/httpd
 2261 ?        S      0:00      1   250 22985 5452  0.5 /usr/sbin/httpd
 2262 ?        S      0:00      1   250 22985 5452  0.5 /usr/sbin/httpd
 2263 ?        S      0:00      1   250 22985 5452  0.5 /usr/sbin/httpd
 2264 ?        S      0:00      1   250 22985 5452  0.5 /usr/sbin/httpd
 2265 ?        S      0:00      1   250 22985 5452  0.5 /usr/sbin/httpd
 2266 ?        S      0:00      1   250 22985 5452  0.5 /usr/sbin/httpd
 2298 ?        S      0:02     87    72  7675 2936  0.2 xfs -droppriv -daemon
 2307 ?        S      0:00      4  2727  7584 1584  0.1 smbd -D
 2311 ?        S      0:00      3   868  7183 1044  0.1 nmbd -D
 2347 ?        S      0:00      0    15  1484  320  0.0 /usr/sbin/atd
 2364 ?        S      0:00      0   230  1629  508  0.0 dbus-daemon-1 --system
 2377 ttyS0    S      2:10      5   153  1810  756  0.0 /usr/local/bulldog/upsd
 2482 ?        S      0:00      0   554  3797  696  0.0 /bin/sh /root/bin/setibatch -run
 2569 ?        S      0:00      4    17  2594  424  0.0 login -- root     
 2570 tty2     S      0:00      0     8  1375  236  0.0 /sbin/mingetty tty2
 2571 tty3     S      0:00      0     8  1375  236  0.0 /sbin/mingetty tty3
 2572 tty4     S      0:00      0     8  1375  236  0.0 /sbin/mingetty tty4
 2573 tty5     S      0:00      0     8  1375  236  0.0 /sbin/mingetty tty5
 2574 tty6     S      0:00      0     8  1375  236  0.0 /sbin/mingetty tty6
 2725 tty1     S      0:00      1   554  3801  372  0.0 -bash
 2757 tty1     S      0:00      1   554  3741  276  0.0 /bin/sh /usr/X11R6/bin/startx
 2768 tty1     S      0:00      4     8  2195  284  0.0 xinit /etc/X11/xinit/xinitrc --
 2769 ?        S<    36:22    975  1498 186685 26996  2.6 X :0
 2783 tty1     S      0:00      2   554  3745  296  0.0 /bin/sh /root/kde3.3-beta2/bin/startkde
 2802 ?        S      0:00      0    55  3252  280  0.0 ssh-agent /etc/X11/xinit/Xclients
 2844 ?        S      0:00     14    34 24313 2724  0.2 kdeinit: Running...      
 2847 ?        S      0:00     18    34 23533 2280  0.2 kdeinit: dcopserver --nosid
 2849 ?        S      0:00     47    34 27793 3504  0.3 kdeinit: klauncher       
 2852 ?        S      0:11     69    34 28977 4812  0.4 kdeinit: kded            
 2853 ?        S      0:18     62   135  4160 2316  0.2 fam
 2861 ?        S      0:04     28    34 27033 4244  0.4 kdeinit: kxkb            
 2869 ?        S      1:16    159   154 20441 6680  0.6 artsd -F 11 -S 4096 -a alsa -s 15 -m artsmessage -c drkonqi -l 3 -f
 2871 ?        S      0:04     33    34 33453 6632  0.6 kdeinit: knotify         
 2873 tty1     S      0:00      7     1 20546 4508  0.4 ksmserver
 2874 ?        S      0:13     58    34 36565 5776  0.5 kdeinit: kwin -session 11c0a80103000104545060000000016110000_1092106762_766963
 2877 ?        S      0:03     12    34 25225 4160  0.4 kdeinit: khotkeys        
 2878 ?        S      0:23     99    34 28225 6604  0.6 kdeinit: kdesktop        
 2885 ?        S      0:47    152    34 45721 8400  0.8 kdeinit: kicker          
 2888 ?        S      0:03     53    34 42293 5768  0.5 kdeinit: klipper         
 2893 ?        S      0:15     43    61 28138 6068  0.5 korgac --miniicon korganizer
 2894 ?        S      0:05     54   830 28705 6176  0.5 kgpg -session 11c0a84703000109133389800000023980007_1092106762_470713
 2897 ?        S      0:05     44   240 28987 5196  0.5 knotes -session 11c0a84703000107107498500000013490017_1092106762_471216
 2899 ?        S      0:11     33    34 28829 5200  0.5 kdeinit: kmix -session 11c0a84703000109118722300000023870010_1092106762_471781
 2900 ?        S      0:04     95    34 29497 5996  0.5 kdeinit: konsole -session 11c0a84703000109146439400000023670008_1092106762_472128 -name Qt-subapplication
 2904 pts/1    S      0:00      0   554  3801  368  0.0 /bin/bash
 2908 ?        S      1:21    144   590 34129 4292  0.4 /usr/bin/gkrellm --sm-client-id 11c0a84703000109197323200000021320011
 2922 ?        S      7:24   1138    11 90820 39460  3.8 kmail -session 11c0a84703000109193895800000021320008_1092106762_468128
 2924 ?        S      0:46     15  1060  4947 1964  0.1 /usr/local/bulldog/monitor
 2927 ?        S      0:00      2    96 25219 2576  0.2 kalarmd --login
 2946 pts/1    S      0:00      0    33  3934  264  0.0 tail -f /var/log/messages
 2948 ?        S      3:25     34    34 29369 5664  0.5 kdeinit: konsole         
 2949 pts/2    S      0:00      1   554  3801  368  0.0 /bin/bash
 2967 pts/2    S      4:47      1    47  1776  668  0.0 top
 2976 ?        S      0:14     88    34 34297 6204  0.5 kdeinit: konsole         
 2977 pts/3    S      0:00      8   554  3805  880  0.0 /bin/bash
14824 ?        S      0:04     56    34 28885 5824  0.5 kdeinit: kio_uiserver    
32416 ?        S      0:00      0   141  1566  748  0.0 /usr/sbin/smartd
14706 ?        S      0:00      0    37  2482  700  0.0 /usr/bin/esd -terminate -nobeeps -as 2 -spawnfd 17
19677 ?        S      0:00     24    34 27993 4084  0.3 kdeinit: kio_file file /tmp/ksocket-root/klauncher4bqXZb.slave-socket /tmp/ksocket-root/kioexecyNpOWa.slave-socket
20320 ?        S      0:00      1    23  1492  380  0.0 CROND
20321 ?        Z      0:00      0     0     0    0  0.0 [night-switch] <defunct>
20333 ttyS1    S      0:00      5    44  1567  524  0.0 heyu_relay ck
20335 ?        S      0:00      1   690  5449 1812  0.1 /usr/sbin/sendmail -FCronDaemon -i -odi -oem root@coyote.coyote.den
20339 ?        S      0:04      1    44  1571  480  0.0 heyu monitor
20353 ?        S      0:00      0    19  1392  368  0.0 /usr/local/bin/xtend -f /etc/.xtendrc
20807 ?        S      0:00      0   627  5232 1380  0.1 /sbin/mount.smbfs //gene.coyote.den/public /mnt/gene -o rw username root password XXXXXXXXX
20809 ?        SW     0:00      0     0     0    0  0.0 [smbiod]
20812 ?        S      0:00      1   627  5180 1308  0.1 /sbin/mount.smbfs //gene.coyote.den/dlds /mnt/dlds -o rw username root password XXXXXXXXX
 4848 ?        RN    40:32     18   131 16596 14448  1.3 /usr/local/bin/setiathome -stop_after_process -nice 19
 5157 ?        S      0:00      0    23  1488  396  0.0 CROND
 5158 ?        S      0:00      3   554  1449  736  0.0 /bin/sh /root/bin/backup-me-nightly
 5160 ?        SW     0:00      0     0     0    0  0.0 [kjournald]
 5324 ?        D      0:01      6    46  1377  356  0.0 umount /mnt/hdb3
 5418 ?        S      0:00     28    34 51109 7512  0.7 kdeinit: kio_pop3 pop3 /tmp/ksocket-root/klauncher4bqXZb.slave-socket /tmp/ksocket-root/kmailhxI3Ga.slave-socket
 5532 pts/3    R      0:00      0    64  2143  576  0.0 ps axv

The system is still up, and I'll probably leave it till the rsync run is done,
maybe another 30 minutes if it stays up.  Hmm, it's not running now according
to top or a ps, but the partition /dev/hdb3 is still mounted according to 
/etc/mtab.  So I assume it's safe to reboot, since rsync is no longer alive.  I
presume the system will self-destruct eventually if kswapd isn't on duty.

Is this enough for an autopsy?

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.


* Re: Possible dcache BUG
  2004-08-13  4:27                         ` Gene Heskett
@ 2004-08-13  8:32                           ` Gene Heskett
  2004-08-14  2:18                           ` Marcelo Tosatti
  1 sibling, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-13  8:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: Linus Torvalds, Andrew Morton, Al Viro

On Friday 13 August 2004 00:27, Gene Heskett wrote:
>On Wednesday 11 August 2004 00:59, Linus Torvalds wrote:
>>I wrote:
>>> Notably, the output of "/proc/meminfo" and "/proc/slabinfo". "ps
>>> axm" helps too.
>>
>>That should be "ps axv" of course. Just shows what a retard I am.
>>
>>		Linus
>
Acck!  I just logged another Oops, this time with only:
04:22:50 up  3:52,  5 users,  load average: 4.31, 2.95, 2.22
uptime.

Aug 13 04:20:21 coyote kernel: Unable to handle kernel paging request at virtual address 00003614
Aug 13 04:20:21 coyote kernel:  printing eip:
Aug 13 04:20:21 coyote kernel: c01632ae
Aug 13 04:20:21 coyote kernel: *pde = 00000000
Aug 13 04:20:21 coyote kernel: Oops: 0000 [#1]
Aug 13 04:20:21 coyote kernel: PREEMPT
Aug 13 04:20:21 coyote kernel: Modules linked in: eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd forcedeth sg
Aug 13 04:20:21 coyote kernel: CPU:    0
Aug 13 04:20:21 coyote kernel: EIP:    0060:[<c01632ae>]    Not tainted
Aug 13 04:20:21 coyote kernel: EFLAGS: 00010206   (2.6.8-rc4)
Aug 13 04:20:21 coyote kernel: EIP is at prune_dcache+0x14e/0x1c0
Aug 13 04:20:21 coyote kernel: eax: 00003600   ebx: dbbf3070   ecx: da707230   edx: da703430
Aug 13 04:20:21 coyote kernel: esi: da703420   edi: c198b000   ebp: c198bf04   esp: c198beec
Aug 13 04:20:21 coyote kernel: ds: 007b   es: 007b   ss: 0068
Aug 13 04:20:21 coyote kernel: Process kswapd0 (pid: 66, threadinfo=c198b000 task=c1978050)
Aug 13 04:20:21 coyote kernel: Stack: df5580fc c198bef0 00000046 00000080 00000000 c198b000 c198bf10 c0163770
Aug 13 04:20:21 coyote kernel:        00000080 c198bf44 c013a32c 00000080 000000d0 0001f3cf 02277700 00000000
Aug 13 04:20:21 coyote kernel:        0000011a 00000000 f7ffea60 c035c624 00000002 0000000a c198bf8c c013b639
Aug 13 04:20:21 coyote kernel: Call Trace:
Aug 13 04:20:21 coyote kernel:  [<c010476f>] show_stack+0x7f/0xa0
Aug 13 04:20:21 coyote kernel:  [<c0104908>] show_registers+0x158/0x1b0
Aug 13 04:20:21 coyote kernel:  [<c0104a89>] die+0x89/0x100
Aug 13 04:20:21 coyote kernel:  [<c0111725>] do_page_fault+0x1f5/0x553
Aug 13 04:20:21 coyote kernel:  [<c01043d9>] error_code+0x2d/0x38
Aug 13 04:20:21 coyote kernel:  [<c0163770>] shrink_dcache_memory+0x20/0x50
Aug 13 04:20:21 coyote kernel:  [<c013a32c>] shrink_slab+0x14c/0x190
Aug 13 04:20:21 coyote kernel:  [<c013b639>] balance_pgdat+0x1a9/0x1f0
Aug 13 04:20:21 coyote kernel:  [<c013b73f>] kswapd+0xbf/0xd0
Aug 13 04:20:21 coyote kernel:  [<c0102471>] kernel_thread_helper+0x5/0x14
Aug 13 04:20:21 coyote kernel: Code: 8b 50 14 85 d2 75 27 89 34 24 e8 83 2b 00 00 8b 73 0c 89 1c

[root@coyote themes]# cat /proc/meminfo
MemTotal:      1035844 kB
MemFree:          4184 kB
Buffers:         23072 kB
Cached:         109932 kB
SwapCached:        864 kB
Active:         171624 kB
Inactive:       113532 kB
HighTotal:      131008 kB
HighFree:          280 kB
LowTotal:       904836 kB
LowFree:          3904 kB
SwapTotal:     3857104 kB
SwapFree:      3827944 kB
Dirty:              76 kB
Writeback:           0 kB
Mapped:         195660 kB
Slab:           736384 kB
Committed_AS:   315580 kB
PageTables:       3200 kB
VmallocTotal:   114680 kB
VmallocUsed:     19644 kB
VmallocChunk:    94932 kB

But top says I'm 102140 K into the swap?????

slabinfo - version: 2.0
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
unix_sock            164    170    384   10    1 : tunables   54   27    0 : slabdata     17     17      0
tcp_tw_bucket          0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
tcp_bind_bucket       19    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
tcp_open_request       0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
inet_peer_cache        0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
ip_fib_hash           10    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
ip_dst_cache           4     15    256   15    1 : tunables  120   60    0 : slabdata      1      1      0
arp_cache              3     31    128   31    1 : tunables  120   60    0 : slabdata      1      1      0
raw4_sock              0      0    480    8    1 : tunables   54   27    0 : slabdata      0      0      0
udp_sock               2      8    480    8    1 : tunables   54   27    0 : slabdata      1      1      0
tcp_sock              31     32   1024    4    1 : tunables   54   27    0 : slabdata      8      8      0
flow_cache             0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
mqueue_inode_cache      1      8    480    8    1 : tunables   54   27    0 : slabdata      1      1      0
udf_inode_cache        0      0    352   11    1 : tunables   54   27    0 : slabdata      0      0      0
smb_request            0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
smb_inode_cache        2     12    320   12    1 : tunables   54   27    0 : slabdata      1      1      0
isofs_inode_cache      0      0    320   12    1 : tunables   54   27    0 : slabdata      0      0      0
fat_inode_cache      132    132    352   11    1 : tunables   54   27    0 : slabdata     12     12      0
ext2_inode_cache       0      0    416    9    1 : tunables   54   27    0 : slabdata      0      0      0
journal_handle        16    135     28  135    1 : tunables  120   60    0 : slabdata      1      1      0
journal_head        1357   2754     48   81    1 : tunables  120   60    0 : slabdata     34     34      0
revoke_table          12    290     12  290    1 : tunables  120   60    0 : slabdata      1      1      0
revoke_record          0      0     16  226    1 : tunables  120   60    0 : slabdata      0      0      0
ext3_inode_cache  1328526 1328526    416    9    1 : tunables   54   27    0 : slabdata 147614 147614      0
eventpoll_pwq          0      0     36  107    1 : tunables  120   60    0 : slabdata      0      0      0
eventpoll_epi          0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
kioctx                 0      0    160   25    1 : tunables  120   60    0 : slabdata      0      0      0
kiocb                  0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
dnotify_cache        172    185     20  185    1 : tunables  120   60    0 : slabdata      1      1      0
file_lock_cache       43     43     92   43    1 : tunables  120   60    0 : slabdata      1      1      0
fasync_cache           2    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
shmem_inode_cache      5     10    384   10    1 : tunables   54   27    0 : slabdata      1      1      0
posix_timers_cache      0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
uid_cache              9    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
sgpool-128            32     32   2048    2    1 : tunables   24   12    0 : slabdata     16     16      0
sgpool-64             32     32   1024    4    1 : tunables   54   27    0 : slabdata      8      8      0
sgpool-32             32     32    512    8    1 : tunables   54   27    0 : slabdata      4      4      0
sgpool-16             32     45    256   15    1 : tunables  120   60    0 : slabdata      3      3      0
sgpool-8              32     62    128   31    1 : tunables  120   60    0 : slabdata      2      2      0
cfq_pool              64    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
crq_pool               0      0     36  107    1 : tunables  120   60    0 : slabdata      0      0      0
deadline_drq           0      0     48   81    1 : tunables  120   60    0 : slabdata      0      0      0
as_arq               130    195     60   65    1 : tunables  120   60    0 : slabdata      3      3      0
blkdev_ioc            76    185     20  185    1 : tunables  120   60    0 : slabdata      1      1      0
blkdev_queue          12     18    448    9    1 : tunables   54   27    0 : slabdata      2      2      0
blkdev_requests       96    156    152   26    1 : tunables  120   60    0 : slabdata      6      6      0
biovec-(256)         256    256   3072    2    2 : tunables   24   12    0 : slabdata    128    128      0
biovec-128           256    260   1536    5    2 : tunables   24   12    0 : slabdata     52     52      0
biovec-64            265    265    768    5    1 : tunables   54   27    0 : slabdata     53     53      0
biovec-16            280    280    192   20    1 : tunables  120   60    0 : slabdata     14     14      0
biovec-4             272    305     64   61    1 : tunables  120   60    0 : slabdata      5      5      0
biovec-1             364    678     16  226    1 : tunables  120   60    0 : slabdata      3      3      0
bio                  369    549     64   61    1 : tunables  120   60    0 : slabdata      9      9      0
sock_inode_cache     202    209    352   11    1 : tunables   54   27    0 : slabdata     19     19      0
skbuff_head_cache    245    400    160   25    1 : tunables  120   60    0 : slabdata     16     16      0
sock                   3     12    320   12    1 : tunables   54   27    0 : slabdata      1      1      0
proc_inode_cache     337    408    320   12    1 : tunables   54   27    0 : slabdata     34     34      0
sigqueue              27     27    148   27    1 : tunables  120   60    0 : slabdata      1      1      0
radix_tree_node     5371  14994    276   14    1 : tunables   54   27    0 : slabdata   1071   1071      0
bdev_cache            11     18    416    9    1 : tunables   54   27    0 : slabdata      2      2      0
mnt_cache             26     41     96   41    1 : tunables  120   60    0 : slabdata      1      1      0
inode_cache         2179   2212    288   14    1 : tunables   54   27    0 : slabdata    158    158      0
dentry_cache      1061172 1061172    140   28    1 : tunables  120   60    0 : slabdata  37899  37899      0
filp                1970   2150    160   25    1 : tunables  120   60    0 : slabdata     86     86      0
names_cache            9      9   4096    1    1 : tunables   24   12    0 : slabdata      9      9      0
idr_layer_cache       81     87    136   29    1 : tunables  120   60    0 : slabdata      3      3      0
buffer_head        11414  40014     48   81    1 : tunables  120   60    0 : slabdata    494    494      0
mm_struct             98     98    512    7    1 : tunables   54   27    0 : slabdata     14     14      0
vm_area_struct      7442   7802     84   47    1 : tunables  120   60    0 : slabdata    166    166      0
fs_cache              88    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
files_cache           88     99    416    9    1 : tunables   54   27    0 : slabdata     11     11      0
signal_cache         108    123     96   41    1 : tunables  120   60    0 : slabdata      3      3      0
sighand_cache        102    105   1312    3    1 : tunables   24   12    0 : slabdata     35     35      0
task_struct          110    115   1424    5    2 : tunables   24   12    0 : slabdata     23     23      0
anon_vma            1674   2035      8  407    1 : tunables  120   60    0 : slabdata      5      5      0
pgd                   89     89   4096    1    1 : tunables   24   12    0 : slabdata     89     89      0
size-131072(DMA)       0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-131072            0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-65536(DMA)        0      0  65536    1   16 : tunables    8    4    0 : slabdata      0      0      0
size-65536             1      1  65536    1   16 : tunables    8    4    0 : slabdata      1      1      0
size-32768(DMA)        0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
size-32768             0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
size-16384(DMA)        0      0  16384    1    4 : tunables    8    4    0 : slabdata      0      0      0
size-16384             4      4  16384    1    4 : tunables    8    4    0 : slabdata      4      4      0
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
size-8192             11     11   8192    1    2 : tunables    8    4    0 : slabdata     11     11      0
size-4096(DMA)         0      0   4096    1    1 : tunables   24   12    0 : slabdata      0      0      0
size-4096            184    184   4096    1    1 : tunables   24   12    0 : slabdata    184    184      0
size-2048(DMA)         0      0   2048    2    1 : tunables   24   12    0 : slabdata      0      0      0
size-2048            166    170   2048    2    1 : tunables   24   12    0 : slabdata     85     85      0
size-1024(DMA)         0      0   1024    4    1 : tunables   54   27    0 : slabdata      0      0      0
size-1024            120    120   1024    4    1 : tunables   54   27    0 : slabdata     30     30      0
size-512(DMA)          0      0    512    8    1 : tunables   54   27    0 : slabdata      0      0      0
size-512             169    448    512    8    1 : tunables   54   27    0 : slabdata     56     56      0
size-256(DMA)          0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
size-256             165    420    256   15    1 : tunables  120   60    0 : slabdata     28     28      0
size-192(DMA)          0      0    192   20    1 : tunables  120   60    0 : slabdata      0      0      0
size-192              98    100    192   20    1 : tunables  120   60    0 : slabdata      5      5      0
size-128(DMA)          0      0    128   31    1 : tunables  120   60    0 : slabdata      0      0      0
size-128            1231   1240    128   31    1 : tunables  120   60    0 : slabdata     40     40      0
size-64(DMA)           0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
size-64            49226  49227     64   61    1 : tunables  120   60    0 : slabdata    807    807      0
size-32(DMA)           0      0     32  119    1 : tunables  120   60    0 : slabdata      0      0      0
size-32             1309   1309     32  119    1 : tunables  120   60    0 : slabdata     11     11      0
kmem_cache           124    124    128   31    1 : tunables  120   60    0 : slabdata      4      4      0

But dentry_cache      1061172 ?

  PID TTY      STAT   TIME  MAJFL   TRS   DRS  RSS %MEM COMMAND
    1 ?        S      0:01     12    31  1440  480  0.0 init [3]  
    2 ?        SWN    0:00      0     0     0    0  0.0 [ksoftirqd/0]
    3 ?        SW<    0:00      0     0     0    0  0.0 [events/0]
    4 ?        SW<    0:00      0     0     0    0  0.0 [khelper]
   21 ?        SW<    0:01      0     0     0    0  0.0 [kblockd/0]
   22 ?        SW     0:00      0     0     0    0  0.0 [khubd]
   62 ?        SW     0:00      0     0     0    0  0.0 [kapmd]
   64 ?        SW     0:00      0     0     0    0  0.0 [pdflush]
   65 ?        SW     0:00      0     0     0    0  0.0 [pdflush]
   67 ?        SW<    0:00      0     0     0    0  0.0 [aio/0]
  194 ?        SW     0:00      0     0     0    0  0.0 [kseriod]
  231 ?        SW     0:00      0     0     0    0  0.0 [kjournald]
 1415 ?        SW     0:00      0     0     0    0  0.0 [kjournald]
 1416 ?        SW     0:00      0     0     0    0  0.0 [kjournald]
 1417 ?        SW     0:00      0     0     0    0  0.0 [kjournald]
 1418 ?        SW     0:00      0     0     0    0  0.0 [kjournald]
 1419 ?        SW     0:00      0     0     0    0  0.0 [kjournald]
 1886 ?        S      0:00      0    27  1432  580  0.0 syslogd -m 0
 1890 ?        S      0:00      0    20  1383  448  0.0 klogd -x
 1901 ?        S      0:00      1    27  1508  568  0.0 portmap
 1911 ?        S      0:00      0   141  1566  752  0.0 /usr/sbin/smartd
 1977 ?        S      0:00      1   149  2918  996  0.0 arpwatch -u pcap -e root -s root (Arpwatch)
 1986 ?        S      0:00      9   237  8886 3336  0.3 cupsd
 2114 ?        S      0:00      0   262  3385 1460  0.1 /usr/sbin/sshd
 2129 ?        S      0:00      1   143  1872  916  0.0 xinetd -stayalive -pidfile /var/run/xinetd.pid
 2151 ?        S      0:00      1   690  6357 2808  0.2 sendmail: accepting connections
 2162 ?        S      0:00      0   690  5433 2364  0.2 sendmail: Queue runner@01:00:00 for /var/spool/clientmqueue
 2172 ?        S      0:00      0    78  1521  460  0.0 gpm -m /dev/input/mice -t imps2
 2184 ?        S      0:00     34   250 22985 10396  1.0 /usr/sbin/httpd
 2237 ?        S      0:00      0   133  1758 1056  0.1 /usr/sbin/cannaserver -syslog -u bin
 2249 ?        S      0:00      1    23  1488  652  0.0 crond
 2261 ?        S      0:00      0   250 22985 10412  1.0 /usr/sbin/httpd
 2262 ?        S      0:00      0   250 22985 10408  1.0 /usr/sbin/httpd
 2263 ?        S      0:00      0   250 22985 10408  1.0 /usr/sbin/httpd
 2264 ?        S      0:00      0   250 22985 10408  1.0 /usr/sbin/httpd
 2265 ?        S      0:00      0   250 22985 10408  1.0 /usr/sbin/httpd
 2266 ?        S      0:00      0   250 22985 10408  1.0 /usr/sbin/httpd
 2267 ?        S      0:00      0   250 22985 10408  1.0 /usr/sbin/httpd
 2268 ?        S      0:00      0   250 22985 10408  1.0 /usr/sbin/httpd
 2320 ?        S      0:00     18    72  7383 6012  0.5 xfs -droppriv -daemon
 2329 ?        S      0:00      4  2727  7584 2676  0.2 smbd -D
 2333 ?        S      0:00      1   868  7183 2004  0.1 nmbd -D
 2347 ?        S      0:00      0   627  7324 2072  0.2 /sbin/mount.smbfs //gene.coyote.den/public /mnt/gene -o rw username root password XXXXXXXXX
 2349 ?        SW     0:00      0     0     0    0  0.0 [smbiod]
 2352 ?        S      0:00      0   627  7272 1980  0.1 /sbin/mount.smbfs //gene.coyote.den/dlds /mnt/dlds -o rw username root password XXXXXXXXX
 2369 ?        S      0:00      0    15  1484  604  0.0 /usr/sbin/atd
 2386 ?        S      0:00      0   230  1629  816  0.0 dbus-daemon-1 --system
 2399 ttyS0    S      0:13      4   153  1810  900  0.0 /usr/local/bulldog/upsd
 2498 ttyS1    S      0:00      0    44  1567  564  0.0 heyu_relay ck
 2499 ?        S      0:01      0    44  1571  556  0.0 heyu monitor
 2503 ?        S      0:00      0    19  1392  424  0.0 xtend -f /etc/.xtendrc
 2504 ?        S      0:00      0   554  3797 1176  0.1 /bin/sh /root/bin/setibatch -run
 2615 ?        S      0:00      3    17  2594 1072  0.1 login -- root     
 2616 tty2     S      0:00      0     8  1375  340  0.0 /sbin/mingetty tty2
 2617 tty3     S      0:00      0     8  1375  340  0.0 /sbin/mingetty tty3
 2618 tty4     S      0:00      0     8  1375  340  0.0 /sbin/mingetty tty4
 2619 tty5     S      0:00      0     8  1375  340  0.0 /sbin/mingetty tty5
 2620 tty6     S      0:00      0     8  1375  340  0.0 /sbin/mingetty tty6
 2771 tty1     S      0:00      0   554  3801 1332  0.1 -bash
 2803 tty1     S      0:00      0   554  3741 1012  0.0 /bin/sh /usr/X11R6/bin/startx
 2814 tty1     S      0:00      3     8  2195  508  0.0 xinit /etc/X11/xinit/xinitrc --
 2815 ?        S<     5:39     63  1498 153113 18040  1.7 X :0
 2829 tty1     S      0:00      1   554  3745  896  0.0 /bin/sh /root/kde3.3-beta2/bin/startkde
 2848 ?        S      0:00      0    55  3252  560  0.0 ssh-agent /etc/X11/xinit/Xclients
 2890 ?        S      0:00     15    34 24313 4156  0.4 kdeinit: Running...      
 2893 ?        S      0:00      6    34 23521 3608  0.3 kdeinit: dcopserver --nosid
 2895 ?        S      0:00     47    34 26377 4532  0.4 kdeinit: klauncher       
 2898 ?        S      0:01     50    34 28673 6096  0.5 kdeinit: kded            
 2899 ?        S      0:00     12   135  3136 1740  0.1 fam
 2907 ?        S      0:00      5    34 27025 5772  0.5 kdeinit: kxkb            
 2915 ?        S      0:06     72   154 19217 2588  0.2 artsd -F 11 -S 4096 -a alsa -s 15 -m artsmessage -c drkonqi -l 3 -f
 2917 ?        S      0:00     11    34 33445 5748  0.5 kdeinit: knotify         
 2918 tty1     S      0:00     25     1 20538 4560  0.4 ksmserver
 2919 ?        S      0:01     35    34 27273 6612  0.6 kdeinit: kwin -session 11c0a80103000104545060000000016110000_1092371276_861353
 2922 ?        S      0:00      2    34 25225 5220  0.5 kdeinit: khotkeys        
 2923 ?        S      0:03     39    34 30149 7712  0.7 kdeinit: kdesktop        
 2926 ?        S      0:03     90    34 32925 8216  0.7 kdeinit: kicker          
 2928 ?        S      0:00     33    34 42277 6432  0.6 kdeinit: klipper         
 2931 ?        S      0:01     23    61 30106 6588  0.6 korgac --miniicon korganizer
 2935 ?        S      0:00      6   830 28705 6520  0.6 kgpg -session 11c0a84703000109133389800000023980007_1092371276_68550
 2937 ?        S      0:00     11   240 30979 6068  0.5 knotes -session 11c0a84703000107107498500000013490017_1092371275_892454
 2939 ?        S      0:01      4    34 28801 7048  0.6 kdeinit: kmix -session 11c0a84703000109118722300000023870010_1092371275_970301
 2940 ?        S      0:01     17    34 31385 6768  0.6 kdeinit: konsole -session 11c0a84703000109146439400000023670008_1092371275_892612 -name Qt-subapplication
 2941 ?        S      0:06     49   590 31101 5060  0.4 /usr/bin/gkrellm --sm-client-id 11c0a84703000109197323200000021320011
 2944 pts/1    S      0:00      0   554  3801 1120  0.1 /bin/bash
 2952 ?        S      0:19      3    34 31369 6752  0.6 kdeinit: konsole -session 11c0a84703000109223264800000028730008_1092371276_29035 -name Qt-subapplication
 2953 ?        S      0:01     16    34 31377 7276  0.7 kdeinit: konsole -session 11c0a84703000109223268000000028730009_1092371275_892929 -name Qt-subapplication
 2955 ?        S      0:04     15  1060  4947 1608  0.1 /usr/local/bulldog/monitor
 2956 ?        S      0:00      1    96 25219 3616  0.3 kalarmd --login
 2965 pts/2    S      0:00      5   554  3805 1308  0.1 /bin/bash
 2976 pts/3    S      0:00      0   554  3801 1120  0.1 /bin/bash
 3005 ?        S      2:53    474    11 134332 76968  7.4 kmail
 3093 pts/1    S      0:00      0    33  3934  504  0.0 tail -f /var/log/messages
 3095 pts/3    S      0:26      1    47  1776  912  0.0 top
 7587 ?        S      0:00      0    37  2482  720  0.0 /usr/bin/esd -terminate -nobeeps -as 2 -spawnfd 17
 7923 ?        RN    16:18      7   131 17624 15104  1.4 /usr/local/bin/setiathome -stop_after_process -nice 19
 8081 ?        S      0:00      0    23  1492  656  0.0 CROND
 8082 ?        S      0:00      0   554  1449  824  0.0 /bin/bash /usr/bin/run-parts /etc/cron.daily
 8700 ?        S      0:00      0   690  5433 2432  0.2 /usr/sbin/sendmail -FCronDaemon -i -odi -oem root
10304 ?        SN     0:00      0   554  1449  788  0.0 /bin/sh /etc/cron.daily/slocate.cron
10305 ?        S      0:00      0   245  1558  496  0.0 awk -v progname=/etc/cron.daily/slocate.cron progname {?????   print progname ":\n"?????   progname="";????       }????       { print; }
10307 ?        DN     0:12      0    27  1508  712  0.0 /usr/bin/updatedb
10320 ?        S      0:00      3    34 51105 4676  0.4 kdeinit: kio_pop3 pop3 /tmp/ksocket-root/klauncherilvgna.slave-socket /tmp/ksocket-root/kmailkizaqa.slave-socket
10359 ?        S      0:00      8    34 26645 8180  0.7 kdeinit: kio_file file /tmp/ksocket-root/klauncherilvgna.slave-socket /tmp/ksocket-root/kmailWoeNac.slave-socket
10362 ?        S      0:00     48    34 28797 11940  1.1 kdeinit: kio_uiserver    
10371 pts/2    R      0:00      1    64  2143  576  0.0 ps axv

I won't repeat the dmesg as it, except for the Oops, will be the same.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.


* Re: Possible dcache BUG
  2004-08-13  4:27                         ` Gene Heskett
  2004-08-13  8:32                           ` Gene Heskett
@ 2004-08-14  2:18                           ` Marcelo Tosatti
  2004-08-14  5:19                             ` Gene Heskett
                                               ` (2 more replies)
  1 sibling, 3 replies; 146+ messages in thread
From: Marcelo Tosatti @ 2004-08-14  2:18 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, Linus Torvalds, Andrew Morton, Al Viro

On Fri, Aug 13, 2004 at 12:27:24AM -0400, Gene Heskett wrote:
> On Wednesday 11 August 2004 00:59, Linus Torvalds wrote:
> >I wrote:
> >> Notably, the output of "/proc/meminfo" and "/proc/slabinfo". "ps
> >> axm" helps too.
> >
> >That should be "ps axv" of course. Just shows what a retard I am.
> >
> >		Linus

> Acck!  I just logged an Oops:
> Aug 13 00:02:00 coyote kernel: kjournald starting.  Commit interval 5 seconds
> Aug 13 00:02:00 coyote kernel: EXT3 FS on hdb3, internal journal
> Aug 13 00:02:00 coyote kernel: EXT3-fs: mounted filesystem with ordered data mode.
> Aug 13 00:05:09 coyote kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000004
> Aug 13 00:05:09 coyote kernel:  printing eip:
> Aug 13 00:05:09 coyote kernel: c014e0dc
> Aug 13 00:05:09 coyote kernel: *pde = 00000000
> Aug 13 00:05:09 coyote kernel: Oops: 0002 [#1]
> Aug 13 00:05:09 coyote kernel: PREEMPT
> Aug 13 00:05:09 coyote kernel: Modules linked in: eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd forcedeth sg
> Aug 13 00:05:09 coyote kernel: CPU:    0
> Aug 13 00:05:09 coyote kernel: EIP:    0060:[<c014e0dc>]    Not tainted
> Aug 13 00:05:09 coyote kernel: EFLAGS: 00010246   (2.6.8-rc4)
> Aug 13 00:05:09 coyote kernel: EIP is at remove_inode_buffers+0x4c/0x90
> Aug 13 00:05:09 coyote kernel: eax: 00000000   ebx: d7ff68b4   ecx: d7ffffb4   edx: 00000000
> Aug 13 00:05:09 coyote kernel: esi: d7ff67e0   edi: 00000001   ebp: c198bed8   esp: c198bec8
> Aug 13 00:05:09 coyote kernel: ds: 007b   es: 007b   ss: 0068
> Aug 13 00:05:09 coyote kernel: Process kswapd0 (pid: 66, threadinfo=c198b000 task=c1978050)
> Aug 13 00:05:09 coyote kernel: Stack: d7ff67e0 d7ff67e8 d7ff67e0 0000001e c198bf04 c0165242 d7ff67e0 c198b000
> Aug 13 00:05:09 coyote kernel:        00000000 0000001e d7ff6988 ed3be928 00000080 00000000 c198b000 c198bf10
> Aug 13 00:05:09 coyote kernel:        c016532f 00000080 c198bf44 c013a32c 00000080 000000d0 0002cc1d 013b0a00
> Aug 13 00:05:09 coyote kernel: Call Trace:
> Aug 13 00:05:09 coyote kernel:  [<c010476f>] show_stack+0x7f/0xa0
> Aug 13 00:05:09 coyote kernel:  [<c0104908>] show_registers+0x158/0x1b0
> Aug 13 00:05:09 coyote kernel:  [<c0104a89>] die+0x89/0x100
> Aug 13 00:05:09 coyote kernel:  [<c0111725>] do_page_fault+0x1f5/0x553
> Aug 13 00:05:09 coyote kernel:  [<c01043d9>] error_code+0x2d/0x38
> Aug 13 00:05:09 coyote kernel:  [<c0165242>] prune_icache+0x142/0x1f0
> Aug 13 00:05:09 coyote kernel:  [<c016532f>] shrink_icache_memory+0x3f/0x50
> Aug 13 00:05:09 coyote kernel:  [<c013a32c>] shrink_slab+0x14c/0x190
> Aug 13 00:05:09 coyote kernel:  [<c013b639>] balance_pgdat+0x1a9/0x1f0
> Aug 13 00:05:09 coyote kernel:  [<c013b73f>] kswapd+0xbf/0xd0
> Aug 13 00:05:09 coyote kernel:  [<c0102471>] kernel_thread_helper+0x5/0x14
> Aug 13 00:05:09 coyote kernel: Code: 89 50 04 89 02 89 49 04 89 09 8b 03 39 d8 89 c1 75 e2 b8 00
> Aug 13 00:05:09 coyote kernel:  <6>note: kswapd0[66] exited with preempt_count 1
> 
> The first 3 entries are from a nightly run of rsync, which mounts a
> normally unmounted partition for the duration of its run.

Hi fellows,

I've taken some time to look at these oopses, and I truly believe we 
are facing real corruption.

The symptom is that an inode's (blockdev) i_mapping->private_list gets corrupted: 
one of its buffer_heads contains a b_assoc_buffers list_head with NULL pointers. 

And this is not an SMP race, because Gene is not running SMP.

Gene's oops happens when remove_inode_buffers calls __remove_assoc_queue(bh).

Ingo's oops happens while remove_inode_buffers does

 struct buffer_head *bh = BH_ENTRY(list->next);

which is

	mov 0xffffffd8(%ecx), %eax

%ecx is zero, so the load faults at address ffffffd8.

There is a bug somewhere.

--- a/fs/buffer.c.original	2004-08-14 00:19:55.000000000 -0300
+++ b/fs/buffer.c	2004-08-14 00:34:57.000000000 -0300
@@ -802,6 +802,8 @@
  */
 static inline void __remove_assoc_queue(struct buffer_head *bh)
 {
+	BUG_ON(bh->b_assoc_buffers.next == NULL);
+	BUG_ON(bh->b_assoc_buffers.prev == NULL);
 	list_del_init(&bh->b_assoc_buffers);
 }
 
@@ -1073,6 +1075,7 @@
 
 		spin_lock(&buffer_mapping->private_lock);
 		while (!list_empty(list)) {
+			BUG_ON(list->next == NULL);
 			struct buffer_head *bh = BH_ENTRY(list->next);
 			if (buffer_dirty(bh)) {
 				ret = 0;


Ingo's oops, for reference:
Unable to handle kernel paging request at virtual address ffffffd8
 printing eip:
c016a3d0
*pde = 00000000
Oops: 0000 [#1]
PREEMPT SMP 
Modules linked in:
CPU:    0
EIP:    0060:[<c016a3d0>]    Not tainted VLI
EFLAGS: 00010217   (2.6.8-rc2-mm2) 
EIP is at remove_inode_buffers+0x60/0xe0
eax: 00000000   ebx: c03ba9dc   ecx: 00000000   edx: c03ba8d0
esi: c03ba8d0   edi: c0379b2a   ebp: c4115ec4   esp: c4115eac
ds: 007b   es: 007b   ss: 0068
Process kswapd0 (pid: 39, threadinfo=c4114000 task=c40aa070)
Stack: c03ba8d0 c0379b76 00000001 c03ba8d8 c03ba8d0 00000000 c4115ef8 c0186c4c 
       c03ba8d0 00000077 c4114000 00000000 0000004d 00000000 c4115ee4 c4115ee4 
       c4114000 c07fd6a0 00004e09 c4115f04 c0186df5 00000080 c4115f38 c014f4b3 
Call Trace:
 [<c01059ff>] show_stack+0x8f/0xb0
 [<c0105bb3>] show_registers+0x163/0x1d0
 [<c0105dc6>] die+0xe6/0x1c0
 [<c0117773>] do_page_fault+0x213/0x6c0
 [<c0105674>] exception_start+0x6/0xe
 [<c0186c4c>] prune_icache+0x20c/0x390
 [<c0186df5>] shrink_icache_memory+0x25/0x50
 [<c014f4b3>] shrink_slab+0x123/0x1d0
 [<c01511ee>] balance_pgdat+0x24e/0x2a0
 [<c015130c>] kswapd+0xcc/0xe0
 [<c0102899>] kernel_thread_helper+0x5/0xc
Code: 00 e0 ff ff 21 e0 ff 40 14 8d 47 4c 89 45 ec 31 c0 86 47 4c 84 c0 0f 8e 79 00 00 00 8b 86 0c 01 00 00 39 d8 74 23 89 c1 8d 76 00 <8b> 41 d8 a8 02 75 5a 8b 01 8b 51 04 89 02 89 09 89 50 04 8b 03
 <6>note: kswapd0[39] exited with preempt_count 1


* Re: Possible dcache BUG
  2004-08-14  2:18                           ` Marcelo Tosatti
@ 2004-08-14  5:19                             ` Gene Heskett
  2004-08-14  5:50                             ` Gene Heskett
  2004-08-14  8:17                             ` Gene Heskett
  2 siblings, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-14  5:19 UTC (permalink / raw)
  To: linux-kernel; +Cc: Marcelo Tosatti, Linus Torvalds, Andrew Morton, Al Viro

On Friday 13 August 2004 22:18, Marcelo Tosatti wrote:
>On Fri, Aug 13, 2004 at 12:27:24AM -0400, Gene Heskett wrote:
>> On Wednesday 11 August 2004 00:59, Linus Torvalds wrote:
[...]
>
>Hi fellows,
>
>I've taken some time to look at these oopses, and I truly believe we
>are facing real corruption.
>
>The symptom is that a (blockdev) inode's i_mapping->private_list
> gets corrupted: one of its buffer_heads contains a b_assoc_buffers
> list_head with NULL pointers.
>
>And this is not an SMP race, because Gene is not running SMP.
>
>Gene's oops happens when remove_inode_buffers calls 
> __remove_assoc_queue(bh)
>
>Ingo's oops happens while remove_inode_buffers does
>
> struct buffer_head *bh = BH_ENTRY(list->next);
>
>which is
>
>	mov ffffffd8(%ecx), (%somewhere)
>
>%ecx is zero, so...
>
>There is a bug somewhere.
>
>--- a/fs/buffer.c.original	2004-08-14 00:19:55.000000000 -0300
>+++ b/fs/buffer.c	2004-08-14 00:34:57.000000000 -0300
>@@ -802,6 +802,8 @@
>  */
> static inline void __remove_assoc_queue(struct buffer_head *bh)
> {
>+	BUG_ON(bh->b_assoc_buffers.next == NULL);
>+	BUG_ON(bh->b_assoc_buffers.prev == NULL);
> 	list_del_init(&bh->b_assoc_buffers);
> }
>
>@@ -1073,6 +1075,7 @@
>
> 		spin_lock(&buffer_mapping->private_lock);
> 		while (!list_empty(list)) {
>+			BUG_ON(list->next == NULL);
> 			struct buffer_head *bh = BH_ENTRY(list->next);
> 			if (buffer_dirty(bh)) {
> 				ret = 0;
>
Marcelo;  I've put in the patch that disables the prefetch, and that's
been running OK so far, but uptime is still pretty short, in hours.
But if it eventually does an Oops on me, the reboot will bring this
one in too; it's building right now.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

* Re: Possible dcache BUG
  2004-08-14  2:18                           ` Marcelo Tosatti
  2004-08-14  5:19                             ` Gene Heskett
@ 2004-08-14  5:50                             ` Gene Heskett
  2004-08-14  8:17                             ` Gene Heskett
  2 siblings, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-14  5:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: Marcelo Tosatti, Linus Torvalds, Andrew Morton, Al Viro

On Friday 13 August 2004 22:18, Marcelo Tosatti wrote:
[...]
>
>Hi fellows,
>
>I've taken some time to look at these oopses, and I truly believe we
>are facing real corruption.
>
>The symptom is that a (blockdev) inode's i_mapping->private_list
> gets corrupted: one of its buffer_heads contains a b_assoc_buffers
> list_head with NULL pointers.
>
>And this is not an SMP race, because Gene is not running SMP.
>
>Gene's oops happens when remove_inode_buffers calls 
> __remove_assoc_queue(bh)
>
>Ingo's oops happens while remove_inode_buffers does
>
> struct buffer_head *bh = BH_ENTRY(list->next);
>
>which is
>
>	mov ffffffd8(%ecx), (%somewhere)
>
>%ecx is zero, so...
>
>There is a bug somewhere.
>
>--- a/fs/buffer.c.original	2004-08-14 00:19:55.000000000 -0300
>+++ b/fs/buffer.c	2004-08-14 00:34:57.000000000 -0300
>@@ -802,6 +802,8 @@
>  */
> static inline void __remove_assoc_queue(struct buffer_head *bh)
> {
>+	BUG_ON(bh->b_assoc_buffers.next == NULL);
>+	BUG_ON(bh->b_assoc_buffers.prev == NULL);
> 	list_del_init(&bh->b_assoc_buffers);
> }
>
>@@ -1073,6 +1075,7 @@
>
> 		spin_lock(&buffer_mapping->private_lock);
> 		while (!list_empty(list)) {
>+			BUG_ON(list->next == NULL);
> 			struct buffer_head *bh = BH_ENTRY(list->next);
During the compile, the above line output this warning:
fs/buffer.c: In function `remove_inode_buffers':
fs/buffer.c:1079: warning: ISO C90 forbids mixed declarations and code
Did the compiler do the right thing?  Or is this perchance the bug?
> 			if (buffer_dirty(bh)) {
> 				ret = 0;

In any event, it's getting sleepy out, good night all.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

* Re: Possible dcache BUG
  2004-08-14  2:18                           ` Marcelo Tosatti
  2004-08-14  5:19                             ` Gene Heskett
  2004-08-14  5:50                             ` Gene Heskett
@ 2004-08-14  8:17                             ` Gene Heskett
  2004-08-15  4:09                               ` Gene Heskett
  2 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-14  8:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: Marcelo Tosatti, Linus Torvalds, Andrew Morton, Al Viro

On Friday 13 August 2004 22:18, Marcelo Tosatti wrote:
>On Fri, Aug 13, 2004 at 12:27:24AM -0400, Gene Heskett wrote:
>> On Wednesday 11 August 2004 00:59, Linus Torvalds wrote:
>> >I wrote:
>> >> Notably, the output of "/proc/meminfo" and "/proc/slabinfo".
>> >> "ps axv" helps too.
[...]
>
>Hi fellows,
>
>I've taken some time to look at these oopses, and I truly believe we
>are facing real corruption.
>
>The symptom is that a (blockdev) inode's i_mapping->private_list
> gets corrupted: one of its buffer_heads contains a b_assoc_buffers
> list_head with NULL pointers.
>
>And this is not an SMP race, because Gene is not running SMP.
>
>Gene's oops happens when remove_inode_buffers calls 
> __remove_assoc_queue(bh)
>
>Ingo's oops happens while remove_inode_buffers does
>
> struct buffer_head *bh = BH_ENTRY(list->next);
>
>which is
>
>	mov ffffffd8(%ecx), (%somewhere)
>
>%ecx is zero, so...
>
>There is a bug somewhere.
>
>--- a/fs/buffer.c.original	2004-08-14 00:19:55.000000000 -0300
>+++ b/fs/buffer.c	2004-08-14 00:34:57.000000000 -0300
>@@ -802,6 +802,8 @@
>  */
> static inline void __remove_assoc_queue(struct buffer_head *bh)
> {
>+	BUG_ON(bh->b_assoc_buffers.next == NULL);
>+	BUG_ON(bh->b_assoc_buffers.prev == NULL);
> 	list_del_init(&bh->b_assoc_buffers);
> }
>
>@@ -1073,6 +1075,7 @@
>
> 		spin_lock(&buffer_mapping->private_lock);
> 		while (!list_empty(list)) {
>+			BUG_ON(list->next == NULL);
> 			struct buffer_head *bh = BH_ENTRY(list->next);
> 			if (buffer_dirty(bh)) {
> 				ret = 0;
>
Just for grins I occasionally do the up-arrow bit and re-run that
slabinfo sorter line Linus gave me, watching the size of the
dentry_cache line in particular.  I believe I just saw a first: the
size was reported as being slightly smaller than the last run an hour
ago.  Previously it had done nothing but grow.  This is a kernel with
two patches from -rc4, one being the list_del thing, the other being
the one-liner that presumably forces the fetch rather than depending
on the prefetch in this chip, which conjecture says might not be
working 100%.

Also, top is showing a relatively large amount of free memory even 
though a small amount is now in the swap.  /proc/meminfo:
MemTotal:      1035852 kB
MemFree:        130452 kB
Buffers:         70664 kB
Cached:         420512 kB
SwapCached:        400 kB
Active:         384008 kB
Inactive:       271184 kB
HighTotal:      131008 kB
HighFree:          308 kB
LowTotal:       904844 kB
LowFree:        130144 kB
SwapTotal:     3857104 kB
SwapFree:      3856452 kB
Dirty:             136 kB
Writeback:           0 kB
Mapped:         222000 kB
Slab:           239816 kB
Committed_AS:   302408 kB
PageTables:       3232 kB
VmallocTotal:   114680 kB
VmallocUsed:     19900 kB
VmallocChunk:    94604 kB

This with an uptime approaching 18 hours.  With only the list_del 
patch, by now I would be down to 3-5 megs free, and 20-100 megs in 
swap.
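
For anyone reading the dump above, a few cross-checks on the numbers, written
as a small sketch. The constants are copied from the /proc/meminfo output in
this message; the helper names are hypothetical and exist only for this
illustration.

```c
/* Values in kB, copied from the /proc/meminfo dump above. */
enum {
	MEM_TOTAL  = 1035852, MEM_FREE  = 130452,
	HIGH_TOTAL = 131008,  HIGH_FREE = 308,
	LOW_TOTAL  = 904844,  LOW_FREE  = 130144,
	SWAP_TOTAL = 3857104, SWAP_FREE = 3856452,
	SLAB       = 239816
};

/* Swap actually in use, in kB. */
static long swap_used_kb(void)
{
	return (long)SWAP_TOTAL - SWAP_FREE;
}

/* Share of total RAM held by slab caches, as a whole percentage. */
static long slab_percent(void)
{
	return 100L * SLAB / MEM_TOTAL;
}
```

The low and high zones add up to the totals exactly, swap in use is 652 kB,
and slab accounts for roughly 23% of RAM, which is where the dentry and inode
caches live.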

The 4am stuff just started, this was the killer yesterday morning.
No probs at the 15 minute mark, looks good.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

* Re: Possible dcache BUG
  2004-08-14  8:17                             ` Gene Heskett
@ 2004-08-15  4:09                               ` Gene Heskett
  2004-08-15  8:48                                 ` viro
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-15  4:09 UTC (permalink / raw)
  To: linux-kernel; +Cc: Marcelo Tosatti, Linus Torvalds, Andrew Morton, Al Viro

On Saturday 14 August 2004 04:17, Gene Heskett wrote:
>On Friday 13 August 2004 22:18, Marcelo Tosatti wrote:
>>On Fri, Aug 13, 2004 at 12:27:24AM -0400, Gene Heskett wrote:
>>> On Wednesday 11 August 2004 00:59, Linus Torvalds wrote:
>>> >I wrote:
>>> >> Notably, the output of "/proc/meminfo" and "/proc/slabinfo".
>>> >> "ps axv" helps too.
>
>[...]
>
>>Hi fellows,
>>
>>I've taken some time to look at these oopses, and I truly believe we
>>are facing real corruption.
>>
>>The symptom is that a (blockdev) inode's i_mapping->private_list
>> gets corrupted: one of its buffer_heads contains a
>> b_assoc_buffers list_head with NULL pointers.
>>
>>And this is not an SMP race, because Gene is not running SMP.
>>
>>Gene's oops happens when remove_inode_buffers calls
>> __remove_assoc_queue(bh)
>>
>>Ingo's oops happens while remove_inode_buffers does
>>
>> struct buffer_head *bh = BH_ENTRY(list->next);
>>
>>which is
>>
>>	mov ffffffd8(%ecx), (%somewhere)
>>
>>%ecx is zero, so...
>>
>>There is a bug somewhere.
>>
>>--- a/fs/buffer.c.original	2004-08-14 00:19:55.000000000 -0300
>>+++ b/fs/buffer.c	2004-08-14 00:34:57.000000000 -0300
>>@@ -802,6 +802,8 @@
>>  */
>> static inline void __remove_assoc_queue(struct buffer_head *bh)
>> {
>>+	BUG_ON(bh->b_assoc_buffers.next == NULL);
>>+	BUG_ON(bh->b_assoc_buffers.prev == NULL);
>> 	list_del_init(&bh->b_assoc_buffers);
>> }
>>
>>@@ -1073,6 +1075,7 @@
>>
>> 		spin_lock(&buffer_mapping->private_lock);
>> 		while (!list_empty(list)) {
>>+			BUG_ON(list->next == NULL);
>> 			struct buffer_head *bh = BH_ENTRY(list->next);
>> 			if (buffer_dirty(bh)) {
>> 				ret = 0;
>
>Just for grins I occasionally do the up-arrow bit and re-run that
>slabinfo sorter line Linus gave me, watching the size of the
>dentry_cache line in particular.  I believe I just saw a first, the
>size was reported as being slightly smaller than the last run an
> hour ago.  Previously it had done nothing but grow.  This is a
> kernel with two patches from -rc4, one being the list_del thing,
> the other being the one liner that presumably forces the fetch, not
> depending on the prefetch in this chip which conjecture says it
> might not be working 100%.

I spoke too soon, and I am now rebooted to this patch in addition to 
the 2 noted previously.  It lasted 35 hours this time.

I was looking at sendmail.mc with vim, trying to see if I could spot a 
reason that local mail to root only gets posted when the sendmail 
buffer needs flushed, often resulting in messages from 5am local 
time, finally making it into kmail at 10pm!  Not finding anything 
obvious, I did a :q to quit.  At that point everything froze 
including the clock on the lower right corner of the launch bar.  The 
only unexplained entries in the log are:
-----------------
Aug 14 22:44:04 coyote bonobo-activation-server (root-27863): iid 
OAFIID:BrokenNoType:20000808 has a NULL type
Aug 14 22:44:04 coyote bonobo-activation-server (root-27863): invalid 
character '#' in iid 'OAFIID:This#!!%$iid%^$%
_|~!OAFIID_ContainsBadChars'
Aug 14 22:44:34 coyote gconfd (root-27861): GConf server is not in 
use, shutting down.
Aug 14 22:44:34 coyote gconfd (root-27861): Exiting

And of course the hardware clock was wrong since a normal shutdown 
wasn't done, just a tap on the reset button.

Aug 15 03:37:17 coyote syslogd 1.4.1: restart.
----------------------
So I am as usual, puzzled.  Or up that famous creek with no visible 
means of locomotion, apply as required.

The only thing I've noted in the slabinfo reports is the ext3_cache 
was well into 6 digits in kilobytes.  Now it's only 15,000 of its
normal units (whatever they are) after the reboot.

But now we have the BUG_ON stuff from above installed, maybe that will 
disclose something we can use.  That would brighten my mood which 
needs lots of help after watching my Shelty die today with no vet 
help available.  We will miss him, he was part of the family for 11 
years.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

* Re: Possible dcache BUG
  2004-08-15  4:09                               ` Gene Heskett
@ 2004-08-15  8:48                                 ` viro
  2004-08-15  9:42                                   ` Gene Heskett
                                                     ` (3 more replies)
  0 siblings, 4 replies; 146+ messages in thread
From: viro @ 2004-08-15  8:48 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Sun, Aug 15, 2004 at 12:09:44AM -0400, Gene Heskett wrote:
> The only thing I've noted in the slabinfo reports is the ext3_cache 
> was well into 6 digits in kilobytes.  Now its only 15,000 of its 
> normal units (whatever they are) after the reboot.

What did dcache numbers look like at that time?

Anyway, we could try the patch below and see what shows in /proc/fs/ext3
with it [NOTE: patch is completely untested].  It should show
	major:minor:inumber:mode
for all currently allocated ext3 inodes.  It won't be 100% accurate (we
can miss some entries/get some twice if cache shrinks or grows at the
time), but if the leak is so massive, we ought to see a *lot* of duplicates
in there.  Seeing what kind of inodes really leak could narrow things
down.

See if cat /proc/fs/ext3 | sort | uniq -c | sort -nr gives anything interesting
when leak happens (and check it right after boot to see if it works at all
and doesn't oops, obviously ;-)

diff -urN RC8-current/fs/ext3/super.c RC8-leak/fs/ext3/super.c
--- RC8-current/fs/ext3/super.c	Sat Aug 14 05:35:37 2004
+++ RC8-leak/fs/ext3/super.c	Sun Aug 15 04:41:09 2004
@@ -35,6 +35,8 @@
 #include <linux/mount.h>
 #include <linux/namei.h>
 #include <linux/quotaops.h>
+#include <linux/proc_fs.h>
+#include <linux/seq_file.h>
 #include <asm/uaccess.h>
 #include "xattr.h"
 #include "acl.h"
@@ -438,6 +440,9 @@
 
 static kmem_cache_t *ext3_inode_cachep;
 
+static LIST_HEAD(ext3_list);
+static spinlock_t ext3_list_lock = SPIN_LOCK_UNLOCKED;
+
 /*
  * Called inside transaction, so use GFP_NOFS
  */
@@ -453,11 +458,17 @@
 	ei->i_default_acl = EXT3_ACL_NOT_CACHED;
 #endif
 	ei->vfs_inode.i_version = 1;
+	spin_lock(&ext3_list_lock);
+	list_add(&ei->list, &ext3_list);
+	spin_unlock(&ext3_list_lock);
 	return &ei->vfs_inode;
 }
 
 static void ext3_destroy_inode(struct inode *inode)
 {
+	spin_lock(&ext3_list_lock);
+	list_del_init(&EXT3_I(inode)->list);
+	spin_unlock(&ext3_list_lock);
 	kmem_cache_free(ext3_inode_cachep, EXT3_I(inode));
 }
 
@@ -475,20 +486,82 @@
 		inode_init_once(&ei->vfs_inode);
 	}
 }
+
+static void *ext3_cache_start(struct seq_file *m, loff_t *pos)
+{
+	struct list_head *p;
+	loff_t l = *pos;
+
+	spin_lock(&ext3_list_lock);
+	list_for_each(p, &ext3_list)
+		if (!l--)
+			return list_entry(p, struct ext3_inode_info, list);
+	return NULL;
+}
+
+static void *ext3_cache_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	struct list_head *p = ((struct ext3_inode_info *)v)->list.next;
+	(*pos)++;
+	return p==&ext3_list ? NULL : list_entry(p, struct ext3_inode_info, list);
+}
+
+static void ext3_cache_stop(struct seq_file *m, void *v)
+{
+	spin_unlock(&ext3_list_lock);
+}
+
+static int ext3_cache_show(struct seq_file *m, void *v)
+{
+	struct ext3_inode_info *ei = v;
+	struct inode *inode = &ei->vfs_inode;
+	seq_printf(m, "%d:%d:%lu:%o",
+		MAJOR(inode->i_sb->s_dev),
+		MINOR(inode->i_sb->s_dev),
+		inode->i_ino,
+		inode->i_mode);
+	return 0;
+}
+
+static struct seq_operations ext3_cache_op = {
+	.start	= ext3_cache_start,
+	.next	= ext3_cache_next,
+	.stop	= ext3_cache_stop,
+	.show	= ext3_cache_show
+};
+
+static int ext3_cache_open(struct inode *inode, struct file *file)
+{
+	return seq_open(file, &ext3_cache_op);
+}
+
+static struct file_operations ext3_cache_operations = {
+	.open		= ext3_cache_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= seq_release,
+};
  
 static int init_inodecache(void)
 {
+	struct proc_dir_entry *p;
 	ext3_inode_cachep = kmem_cache_create("ext3_inode_cache",
 					     sizeof(struct ext3_inode_info),
 					     0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT,
 					     init_once, NULL);
 	if (ext3_inode_cachep == NULL)
 		return -ENOMEM;
+	p = create_proc_entry("fs/ext3", S_IRUGO, NULL);
+	if (p) {
+		p->owner = THIS_MODULE;
+		p->proc_fops = &ext3_cache_operations;
+	}
 	return 0;
 }
 
 static void destroy_inodecache(void)
 {
+	remove_proc_entry("fs/ext3", NULL);
 	if (kmem_cache_destroy(ext3_inode_cachep))
 		printk(KERN_INFO "ext3_inode_cache: not all structures were freed\n");
 }
diff -urN RC8-current/include/linux/ext3_fs_i.h RC8-leak/include/linux/ext3_fs_i.h
--- RC8-current/include/linux/ext3_fs_i.h	Thu Oct  9 17:34:54 2003
+++ RC8-leak/include/linux/ext3_fs_i.h	Sun Aug 15 04:11:03 2004
@@ -107,6 +107,7 @@
 	 * by other means, so we have truncate_sem.
 	 */
 	struct semaphore truncate_sem;
+	struct list_head list;
 	struct inode vfs_inode;
 };
 

* Re: Possible dcache BUG
  2004-08-15  8:48                                 ` viro
@ 2004-08-15  9:42                                   ` Gene Heskett
  2004-08-15 17:31                                     ` Andrew Morton
  2004-08-15  9:50                                   ` Gene Heskett
                                                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-15  9:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Sunday 15 August 2004 04:48, viro@parcelfarce.linux.theplanet.co.uk wrote:
>On Sun, Aug 15, 2004 at 12:09:44AM -0400, Gene Heskett wrote:
>> The only thing I've noted in the slabinfo reports is the
>> ext3_cache was well into 6 digits in kilobytes.  Now its only
>> 15,000 of its normal units (whatever they are) after the reboot.
>
>What did dcache numbers look like at that time?

IIRC the last time I checked before it locked up, dcache was 
in the 57xxx kilobytes area.

Right now, after about 5-6 hours uptime, that line in raw format
is: dentry_cache      731159 772632
and: ext3_inode_cache  1024365 1055817
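
On the question of units: in the 2.6 /proc/slabinfo layout the first two
numbers after the cache name are object counts (active objects, then total
allocated objects), not kilobytes. A minimal sketch of pulling them out of
such a line; struct slab_counts and parse_slab_line() are hypothetical names
used only here.

```c
#include <stdio.h>

struct slab_counts {
	char name[32];
	unsigned long active_objs;	/* objects currently in use */
	unsigned long num_objs;		/* objects allocated in total */
};

/* Returns 1 on success, 0 if the line does not start with "name n n". */
static int parse_slab_line(const char *line, struct slab_counts *out)
{
	return sscanf(line, "%31s %lu %lu",
		      out->name, &out->active_objs, &out->num_objs) == 3;
}
```

Read that way, the two lines above say about 1.02 million ext3 inodes and
about 731 thousand dentries were live after a few hours of uptime.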

Now, this morning's logwatch told me I should go look at the
logs again, and I found this had occurred several hours earlier:
-----------
Aug 14 18:53:24 coyote kernel: Unable to handle kernel paging request at virtual address 0058af03
Aug 14 18:53:24 coyote kernel:  printing eip:
Aug 14 18:53:24 coyote kernel: c01648bc
Aug 14 18:53:24 coyote kernel: *pde = 00000000
Aug 14 18:53:24 coyote kernel: Oops: 0002 [#1]
Aug 14 18:53:24 coyote kernel: PREEMPT
Aug 14 18:53:24 coyote kernel: Modules linked in: tuner tvaudio bttv video_buf btcx_risc eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd forcedeth sg
Aug 14 18:53:24 coyote kernel: CPU:    0
Aug 14 18:53:24 coyote kernel: EIP:    0060:[<c01648bc>]    Not tainted
Aug 14 18:53:24 coyote kernel: EFLAGS: 00010202   (2.6.8-rc4)
Aug 14 18:53:24 coyote kernel: EIP is at dispose_list+0x1c/0xa0
Aug 14 18:53:24 coyote kernel: eax: 0058aeff   ebx: ddfc9140   ecx: ddfc9148   edx: c198bef0
Aug 14 18:53:24 coyote kernel: esi: c198bef0   edi: 00000075   ebp: c198bed8   esp: c198bec0
Aug 14 18:53:24 coyote kernel: ds: 007b   es: 007b   ss: 0068
Aug 14 18:53:24 coyote kernel: Process kswapd0 (pid: 66, threadinfo=c198b000 task=c1978050)
Aug 14 18:53:24 coyote kernel: Stack: ddfc92e0 c198bec4 c198bec4 c198b000 cb2be2a0 00000080 c198bf04 c0164c37
Aug 14 18:53:24 coyote kernel:        c198bef0 c198b000 00000000 00000080 ddfc9148 cdf9b668 00000080 00000000
Aug 14 18:53:24 coyote kernel:        c198b000 c198bf10 c0164daf 00000080 c198bf44 c0139fd4 00000080 000000d0
Aug 14 18:53:24 coyote kernel: Call Trace:
Aug 14 18:53:24 coyote kernel:  [<c010476f>] show_stack+0x7f/0xa0
Aug 14 18:53:25 coyote kernel:  [<c0104908>] show_registers+0x158/0x1b0
Aug 14 18:53:25 coyote kernel:  [<c0104a89>] die+0x89/0x100
Aug 14 18:53:25 coyote kernel:  [<c0111725>] do_page_fault+0x1f5/0x553
Aug 14 18:53:25 coyote kernel:  [<c01043d9>] error_code+0x2d/0x38
Aug 14 18:53:25 coyote kernel:  [<c0164c37>] prune_icache+0xb7/0x1f0
Aug 14 18:53:25 coyote kernel:  [<c0164daf>] shrink_icache_memory+0x3f/0x50
Aug 14 18:53:25 coyote kernel:  [<c0139fd4>] shrink_slab+0x134/0x170
Aug 14 18:53:25 coyote kernel:  [<c013b25d>] balance_pgdat+0x1ad/0x1f0
Aug 14 18:53:25 coyote kernel:  [<c013b35f>] kswapd+0xbf/0xd0
Aug 14 18:53:25 coyote kernel:  [<c0102471>] kernel_thread_helper+0x5/0x14
Aug 14 18:53:25 coyote kernel: Code: 89 50 04 89 02 c7 41 04 00 02 20 00 c7 01 00 01 10 00 8b 83
-----------------
which was about 5 hours before the lockup.
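
One hedged reading of the registers in this oops, assuming the Code: line
starts at the faulting instruction: the bytes `89 50 04` and `89 02` decode
as `mov %edx,0x4(%eax)` and `mov %eax,(%edx)`, which look like the two stores
of an inlined list_del() (next->prev = prev; prev->next = next). If so, %eax
holds the entry's next pointer and the faulting address is that pointer plus
the offset of prev inside a list_head. The sketch below only does that
arithmetic; the function name is hypothetical.

```c
#include <stdint.h>

/* i386 layout of struct list_head: next at offset 0, prev at offset 4. */
enum { LIST_NEXT_OFF = 0, LIST_PREV_OFF = 4 };

/* Address written by the first store of list_del(): next->prev = prev. */
static uint32_t list_del_first_store(uint32_t next)
{
	return next + LIST_PREV_OFF;
}
```

With next = 0x0058aeff (the value in %eax) this gives 0x0058af03, the address
in the "Unable to handle kernel paging request" line. That value is far below
the other kernel pointers in the dump (0xc1..., 0xdd...), so the next pointer
looks like garbage here, not NULL.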

>Anyway, we could try the patch below and see what shows in
> /proc/fs/ext3 with it [NOTE: patch is completely untested].  It
> should show major:minor:inumber:mode
>for all currently allocated ext3 inodes.  It won't be 100% accurate
> (we can miss some entries/get some twice if cache shrinks or grows
> at the time), but if the leak is so massive, we ought to see a
> *lot* of duplicates in there.  Seeing what kind of inodes really
> leaks could narrow the things down.
>
>See if cat /proc/fs/ext3 | sort | uniq -c | sort -nr gives anything
> interesting when leak happens (and check it right after boot to see
> if it works at all and doesn't oops, obviously ;-)
>
>diff -urN RC8-current/fs/ext3/super.c RC8-leak/fs/ext3/super.c
>--- RC8-current/fs/ext3/super.c	Sat Aug 14 05:35:37 2004
>+++ RC8-leak/fs/ext3/super.c	Sun Aug 15 04:41:09 2004
>@@ -35,6 +35,8 @@
> #include <linux/mount.h>
> #include <linux/namei.h>
> #include <linux/quotaops.h>
>+#include <linux/proc_fs.h>
>+#include <linux/seq_file.h>
> #include <asm/uaccess.h>
> #include "xattr.h"
> #include "acl.h"
>@@ -438,6 +440,9 @@
>
> static kmem_cache_t *ext3_inode_cachep;
>
>+static LIST_HEAD(ext3_list);
>+static spinlock_t ext3_list_lock = SPIN_LOCK_UNLOCKED;
>+
> /*
>  * Called inside transaction, so use GFP_NOFS
>  */
>@@ -453,11 +458,17 @@
> 	ei->i_default_acl = EXT3_ACL_NOT_CACHED;
> #endif
> 	ei->vfs_inode.i_version = 1;
>+	spin_lock(&ext3_list_lock);
>+	list_add(&ei->list, &ext3_list);
>+	spin_unlock(&ext3_list_lock);
> 	return &ei->vfs_inode;
> }
>
> static void ext3_destroy_inode(struct inode *inode)
> {
>+	spin_lock(&ext3_list_lock);
>+	list_del_init(&EXT3_I(inode)->list);
>+	spin_unlock(&ext3_list_lock);
> 	kmem_cache_free(ext3_inode_cachep, EXT3_I(inode));
> }
>
>@@ -475,20 +486,82 @@
> 		inode_init_once(&ei->vfs_inode);
> 	}
> }
>+
>+static void *ext3_cache_start(struct seq_file *m, loff_t *pos)
>+{
>+	struct list_head *p;
>+	loff_t l = *pos;
>+
>+	spin_lock(&ext3_list_lock);
>+	list_for_each(p, &ext3_list)
>+		if (!l--)
>+			return list_entry(p, struct ext3_inode_info, list);
>+	return NULL;
>+}
>+
>+static void *ext3_cache_next(struct seq_file *m, void *v, loff_t *pos)
>+{
>+	struct list_head *p = ((struct ext3_inode_info *)v)->list.next;
>+	(*pos)++;
>+	return p==&ext3_list ? NULL : list_entry(p, struct ext3_inode_info, list);
>+}
>+
>+static void ext3_cache_stop(struct seq_file *m, void *v)
>+{
>+	spin_unlock(&ext3_list_lock);
>+}
>+
>+static int ext3_cache_show(struct seq_file *m, void *v)
>+{
>+	struct ext3_inode_info *ei = v;
>+	struct inode *inode = &ei->vfs_inode;
>+	seq_printf(m, "%d:%d:%lu:%o",
>+		MAJOR(inode->i_sb->s_dev),
>+		MINOR(inode->i_sb->s_dev),
>+		inode->i_ino,
>+		inode->i_mode);
>+	return 0;
>+}
>+
>+static struct seq_operations ext3_cache_op = {
>+	.start	= ext3_cache_start,
>+	.next	= ext3_cache_next,
>+	.stop	= ext3_cache_stop,
>+	.show	= ext3_cache_show
>+};
>+
>+static int ext3_cache_open(struct inode *inode, struct file *file)
>+{
>+	return seq_open(file, &ext3_cache_op);
>+}
>+
>+static struct file_operations ext3_cache_operations = {
>+	.open		= ext3_cache_open,
>+	.read		= seq_read,
>+	.llseek		= seq_lseek,
>+	.release	= seq_release,
>+};
>
> static int init_inodecache(void)
> {
>+	struct proc_dir_entry *p;
> 	ext3_inode_cachep = kmem_cache_create("ext3_inode_cache",
> 					     sizeof(struct ext3_inode_info),
> 					     0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT,
> 					     init_once, NULL);
> 	if (ext3_inode_cachep == NULL)
> 		return -ENOMEM;
>+	p = create_proc_entry("fs/ext3", S_IRUGO, NULL);
>+	if (p) {
>+		p->owner = THIS_MODULE;
>+		p->proc_fops = &ext3_cache_operations;
>+	}
> 	return 0;
> }
>
> static void destroy_inodecache(void)
> {
>+	remove_proc_entry("fs/ext3", NULL);
> 	if (kmem_cache_destroy(ext3_inode_cachep))
> 		printk(KERN_INFO "ext3_inode_cache: not all structures were freed\n");
> }
>diff -urN RC8-current/include/linux/ext3_fs_i.h RC8-leak/include/linux/ext3_fs_i.h
>--- RC8-current/include/linux/ext3_fs_i.h	Thu Oct  9 17:34:54 2003
>+++ RC8-leak/include/linux/ext3_fs_i.h	Sun Aug 15 04:11:03 2004
>@@ -107,6 +107,7 @@
> 	 * by other means, so we have truncate_sem.
> 	 */
> 	struct semaphore truncate_sem;
>+	struct list_head list;
> 	struct inode vfs_inode;
> };
----------
I'll put this in right now.  Thanks.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

* Re: Possible dcache BUG
  2004-08-15  8:48                                 ` viro
  2004-08-15  9:42                                   ` Gene Heskett
@ 2004-08-15  9:50                                   ` Gene Heskett
  2004-08-15 10:36                                     ` viro
  2004-08-15 10:10                                   ` Gene Heskett
  2004-08-16 22:52                                   ` Gene Heskett
  3 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-15  9:50 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Sunday 15 August 2004 04:48, viro@parcelfarce.linux.theplanet.co.uk 
wrote:
>On Sun, Aug 15, 2004 at 12:09:44AM -0400, Gene Heskett wrote:
>> The only thing I've noted in the slabinfo reports is the
>> ext3_cache was well into 6 digits in kilobytes.  Now its only
>> 15,000 of its normal units (whatever they are) after the reboot.
>
And I just noticed this go by during the build:
----------
fs/buffer.c: In function `remove_inode_buffers':
fs/buffer.c:1079: warning: ISO C90 forbids mixed declarations and code
----------
Do we need to address this?  It's a line immediately below the BUG_ON
patch that Marcelo had me put in most recently, and has probably been
there all along.
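
A note on that warning: it is most likely a side effect of the debug patch
itself, because the added BUG_ON() is a statement and it now sits in front of
the `struct buffer_head *bh` declaration in the same block. gcc accepts that
ordering (it is valid C99) and only warns, so the check should still do what
was intended. A sketch of the ordering that stays C90-clean, using assert()
as a userspace stand-in for BUG_ON() and a hypothetical struct node in place
of the kernel types:

```c
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *next;
	int dirty;
};

/* Counts dirty nodes; declarations come first in each block, C90 style. */
static int count_dirty(const struct node *head)
{
	int n = 0;

	while (head != NULL) {
		const struct node *cur;		/* declaration first ... */

		assert(head->next != head);	/* ... then the sanity check ... */
		cur = head;			/* ... then the assignment */
		if (cur->dirty)
			n++;
		head = cur->next;
	}
	return n;
}
```

Applied to the patch, that would mean declaring bh at the top of the loop
body, doing the BUG_ON(), and only then assigning bh = BH_ENTRY(list->next).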

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

* Re: Possible dcache BUG
  2004-08-15  8:48                                 ` viro
  2004-08-15  9:42                                   ` Gene Heskett
  2004-08-15  9:50                                   ` Gene Heskett
@ 2004-08-15 10:10                                   ` Gene Heskett
  2004-08-15 10:37                                     ` viro
  2004-08-16 22:52                                   ` Gene Heskett
  3 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-15 10:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Sunday 15 August 2004 04:48, viro@parcelfarce.linux.theplanet.co.uk 
wrote:
>On Sun, Aug 15, 2004 at 12:09:44AM -0400, Gene Heskett wrote:
>> The only thing I've noted in the slabinfo reports is the
>> ext3_cache was well into 6 digits in kilobytes.  Now its only
>> 15,000 of its normal units (whatever they are) after the reboot.
>
>What did dcache numbers look like at that time?
>
>Anyway, we could try the patch below and see what shows in
> /proc/fs/ext3 with it [NOTE: patch is completely untested].  It
> should show major:minor:inumber:mode
>for all currently allocated ext3 inodes.  It won't be 100% accurate
> (we can miss some entries/get some twice if cache shrinks or grows
> at the time), but if the leak is so massive, we ought to see a
> *lot* of duplicates in there.  Seeing what kind of inodes really
> leaks could narrow the things down.
>
>See if cat /proc/fs/ext3 | sort | uniq -c | sort -nr gives anything
> interesting when leak happens (and check it right after boot to see
> if it works at all and doesn't oops, obviously ;-)

It doesn't Oops when I use that line, but with 15,000+ entries spit out
all in one line of text, it's a bit hard to locate real duplicates.
But I think I see some right now!  Can this line be modified to spit
them out, one entry per line with all dups sorted to be adjacent?
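
The single-line output is most likely because the format string in
ext3_cache_show() has no trailing newline, so every record is glued to the
next and sort/uniq only ever see one line. A minimal userspace sketch of the
record format with the newline added; format_record() is a hypothetical
helper that uses snprintf() in place of seq_printf(), and unsigned
conversions are used here where the patch has %d.

```c
#include <stdio.h>

/* Formats one record as "major:minor:inumber:mode\n" (mode in octal). */
static int format_record(char *buf, size_t len, unsigned major,
			 unsigned minor, unsigned long ino, unsigned mode)
{
	return snprintf(buf, len, "%u:%u:%lu:%o\n", major, minor, ino, mode);
}
```

With one record per line, the pipeline quoted above
(cat /proc/fs/ext3 | sort | uniq -c | sort -nr) groups identical records and
lists the most repeated ones first.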

>diff -urN RC8-current/fs/ext3/super.c RC8-leak/fs/ext3/super.c
>--- RC8-current/fs/ext3/super.c	Sat Aug 14 05:35:37 2004
>+++ RC8-leak/fs/ext3/super.c	Sun Aug 15 04:41:09 2004
>@@ -35,6 +35,8 @@
> #include <linux/mount.h>
> #include <linux/namei.h>
> #include <linux/quotaops.h>
>+#include <linux/proc_fs.h>
>+#include <linux/seq_file.h>
> #include <asm/uaccess.h>
> #include "xattr.h"
> #include "acl.h"
>@@ -438,6 +440,9 @@
>
> static kmem_cache_t *ext3_inode_cachep;
>
>+static LIST_HEAD(ext3_list);
>+static spinlock_t ext3_list_lock = SPIN_LOCK_UNLOCKED;
>+
> /*
>  * Called inside transaction, so use GFP_NOFS
>  */
>@@ -453,11 +458,17 @@
> 	ei->i_default_acl = EXT3_ACL_NOT_CACHED;
> #endif
> 	ei->vfs_inode.i_version = 1;
>+	spin_lock(&ext3_list_lock);
>+	list_add(&ei->list, &ext3_list);
>+	spin_unlock(&ext3_list_lock);
> 	return &ei->vfs_inode;
> }
>
> static void ext3_destroy_inode(struct inode *inode)
> {
>+	spin_lock(&ext3_list_lock);
>+	list_del_init(&EXT3_I(inode)->list);
>+	spin_unlock(&ext3_list_lock);
> 	kmem_cache_free(ext3_inode_cachep, EXT3_I(inode));
> }
>
>@@ -475,20 +486,82 @@
> 		inode_init_once(&ei->vfs_inode);
> 	}
> }
>+
>+static void *ext3_cache_start(struct seq_file *m, loff_t *pos)
>+{
>+	struct list_head *p;
>+	loff_t l = *pos;
>+
>+	spin_lock(&ext3_list_lock);
>+	list_for_each(p, &ext3_list)
>+		if (!l--)
>+			return list_entry(p, struct ext3_inode_info, list);
>+	return NULL;
>+}
>+
>+static void *ext3_cache_next(struct seq_file *m, void *v, loff_t *pos)
>+{
>+	struct list_head *p = ((struct ext3_inode_info *)v)->list.next;
>+	(*pos)++;
>+	return p==&ext3_list ? NULL : list_entry(p, struct ext3_inode_info, list);
>+}
>+
>+static void ext3_cache_stop(struct seq_file *m, void *v)
>+{
>+	spin_unlock(&ext3_list_lock);
>+}
>+
>+static int ext3_cache_show(struct seq_file *m, void *v)
>+{
>+	struct ext3_inode_info *ei = v;
>+	struct inode *inode = &ei->vfs_inode;
>+	seq_printf(m, "%d:%d:%lu:%o",
>+		MAJOR(inode->i_sb->s_dev),
>+		MINOR(inode->i_sb->s_dev),
>+		inode->i_ino,
>+		inode->i_mode);
>+	return 0;
>+}
>+
>+static struct seq_operations ext3_cache_op = {
>+	.start	= ext3_cache_start,
>+	.next	= ext3_cache_next,
>+	.stop	= ext3_cache_stop,
>+	.show	= ext3_cache_show
>+};
>+
>+static int ext3_cache_open(struct inode *inode, struct file *file)
>+{
>+	return seq_open(file, &ext3_cache_op);
>+}
>+
>+static struct file_operations ext3_cache_operations = {
>+	.open		= ext3_cache_open,
>+	.read		= seq_read,
>+	.llseek		= seq_lseek,
>+	.release	= seq_release,
>+};
>
> static int init_inodecache(void)
> {
>+	struct proc_dir_entry *p;
> 	ext3_inode_cachep = kmem_cache_create("ext3_inode_cache",
> 					     sizeof(struct ext3_inode_info),
> 					     0, SLAB_HWCACHE_ALIGN|SLAB_RECLAIM_ACCOUNT,
> 					     init_once, NULL);
> 	if (ext3_inode_cachep == NULL)
> 		return -ENOMEM;
>+	p = create_proc_entry("fs/ext3", S_IRUGO, NULL);
>+	if (p) {
>+		p->owner = THIS_MODULE;
>+		p->proc_fops = &ext3_cache_operations;
>+	}
> 	return 0;
> }
>
> static void destroy_inodecache(void)
> {
>+	remove_proc_entry("fs/ext3", NULL);
> 	if (kmem_cache_destroy(ext3_inode_cachep))
> 		printk(KERN_INFO "ext3_inode_cache: not all structures were freed\n");
> }
>diff -urN RC8-current/include/linux/ext3_fs_i.h RC8-leak/include/linux/ext3_fs_i.h
>--- RC8-current/include/linux/ext3_fs_i.h	Thu Oct  9 17:34:54 2003
>+++ RC8-leak/include/linux/ext3_fs_i.h	Sun Aug 15 04:11:03 2004
>@@ -107,6 +107,7 @@
> 	 * by other means, so we have truncate_sem.
> 	 */
> 	struct semaphore truncate_sem;
>+	struct list_head list;
> 	struct inode vfs_inode;
> };

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.


* Re: Possible dcache BUG
  2004-08-15  9:50                                   ` Gene Heskett
@ 2004-08-15 10:36                                     ` viro
  0 siblings, 0 replies; 146+ messages in thread
From: viro @ 2004-08-15 10:36 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Sun, Aug 15, 2004 at 05:50:26AM -0400, Gene Heskett wrote:
> fs/buffer.c: In function `remove_inode_buffers':
> fs/buffer.c:1079: warning: ISO C90 forbids mixed declarations and code
> ----------
> Do we need to address this?  It's a line immediately below the BUG-ON 
> patch that Marcelo had me put in most recently, and has probably been 
> there all along.

No, it had appeared when Marcelo had put BUG_ON() before a declaration
of local variable.  Not acceptable for merge into the tree, but OK for
a debugging patch.


* Re: Possible dcache BUG
  2004-08-15 10:10                                   ` Gene Heskett
@ 2004-08-15 10:37                                     ` viro
  2004-08-15 10:42                                       ` Gene Heskett
       [not found]                                       ` <200408150704.49312.gene.heskett@verizon.net>
  0 siblings, 2 replies; 146+ messages in thread
From: viro @ 2004-08-15 10:37 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Sun, Aug 15, 2004 at 06:10:28AM -0400, Gene Heskett wrote:
> all in one line of text, it's a bit hard to locate real duplicates.  
> But I think I see some right now!  Can this line be modified to spit 
> them out, one entry per line with all dups sorted to be adjacent?

Sure, just add \n in format here.  Sorry, hadn't noticed that...

> >+	seq_printf(m, "%d:%d:%lu:%o",


* Re: Possible dcache BUG
  2004-08-15 10:37                                     ` viro
@ 2004-08-15 10:42                                       ` Gene Heskett
  2004-08-15 11:00                                         ` viro
       [not found]                                       ` <200408150704.49312.gene.heskett@verizon.net>
  1 sibling, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-15 10:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Sunday 15 August 2004 06:37, viro@parcelfarce.linux.theplanet.co.uk 
wrote:
>On Sun, Aug 15, 2004 at 06:10:28AM -0400, Gene Heskett wrote:
>> all in one line of text, it's a bit hard to locate real duplicates.
>> But I think I see some right now!  Can this line be modified to
>> spit them out, one entry per line with all dups sorted to be
>> adjacent?
>
>Sure, just add \n in format here.  Sorry, hadn't noticed that...
>
>> >+	seq_printf(m, "%d:%d:%lu:%o",

Can do, assume it would then be seq_printf(m, "%d:%d:%lu:%o\n"?


* Re: Possible dcache BUG
  2004-08-15 10:42                                       ` Gene Heskett
@ 2004-08-15 11:00                                         ` viro
  0 siblings, 0 replies; 146+ messages in thread
From: viro @ 2004-08-15 11:00 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Sun, Aug 15, 2004 at 06:42:11AM -0400, Gene Heskett wrote:
> On Sunday 15 August 2004 06:37, viro@parcelfarce.linux.theplanet.co.uk 
> wrote:
> >On Sun, Aug 15, 2004 at 06:10:28AM -0400, Gene Heskett wrote:
> >> all in one line of text, it's a bit hard to locate real duplicates.
> >> But I think I see some right now!  Can this line be modified to
> >> spit them out, one entry per line with all dups sorted to be
> >> adjacent?
> >
> >Sure, just add \n in format here.  Sorry, hadn't noticed that...
> >
> >> >+	seq_printf(m, "%d:%d:%lu:%o",
> 
> Can do, assume it would then be seq_printf(m, "%d:%d:%lu:%o\n"?

Yes
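
To illustrate why the missing newline matters, here is a small sketch in shell, with printf standing in for seq_printf.  The emit_entries helper and the major:minor:inumber:mode values are invented for the example; nothing here comes from the patch itself.

```shell
# Hypothetical stand-in for ext3_cache_show(): print three entries using
# the given printf format.  The values are invented sample data.
emit_entries() {
    fmt=$1
    printf "$fmt" 3 1 12345 33188
    printf "$fmt" 3 1 12346 33188
    printf "$fmt" 3 2 98765 16877
}

# Without a trailing newline every entry lands on the same line, so
# line-oriented tools (sort, uniq, wc -l) see no complete lines at all.
emit_entries '%d:%d:%d:%o' | wc -l

# With "\n" appended to the format, each entry gets a line of its own.
emit_entries '%d:%d:%d:%o\n' | wc -l
```

With one entry per line, sort and uniq can compare entries individually, which is what the duplicate-counting pipeline relies on.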


* Re: Possible dcache BUG
       [not found]                                       ` <200408150704.49312.gene.heskett@verizon.net>
@ 2004-08-15 11:26                                         ` viro
  2004-08-15 17:47                                           ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: viro @ 2004-08-15 11:26 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Sun, Aug 15, 2004 at 07:04:49AM -0400, Gene Heskett wrote:
> On Sunday 15 August 2004 06:37, viro@parcelfarce.linux.theplanet.co.uk 
> wrote:
> >On Sun, Aug 15, 2004 at 06:10:28AM -0400, Gene Heskett wrote:
> >> all in one line of text, it's a bit hard to locate real duplicates.
> >> But I think I see some right now!  Can this line be modified to
> >> spit them out, one entry per line with all dups sorted to be
> >> adjacent?
> >
> >Sure, just add \n in format here.  Sorry, hadn't noticed that...
> >
> >> >+	seq_printf(m, "%d:%d:%lu:%o\n",
> 
> And here it is right after starting x on the reboot. (I take it the 
> first number is the number of dups?)

Yes - uniq -c merges duplicates and puts the number of copies in front
of line, so sort | uniq -c | sort -nr will sort by frequency and print
each line with number of times it had occurred.

You don't have any duplicates so far and the output looks OK...
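
The pipeline described above can be tried on made-up data first.  This is only a sketch: rank_dups and only_dups are hypothetical helper names, and the sample entries are invented, in the major:minor:inumber:mode form the patch prints.

```shell
# rank_dups: read one entry per line on stdin and print "count entry"
# lines, most frequent first (sort | uniq -c | sort -nr, as above).
rank_dups() {
    sort | uniq -c | sort -nr
}

# only_dups: keep just the entries that were seen more than once.
only_dups() {
    awk '$1 > 1'
}

# Invented sample data; one entry appears three times.
printf '%s\n' 3:1:100:100644 3:1:200:40755 3:1:100:100644 3:1:100:100644 |
    rank_dups | only_dups
```

On a kernel carrying the debugging patch, the same helpers could be fed from /proc/fs/ext3 instead of the sample data; filtering with only_dups keeps the output short when nearly every entry is unique.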


* Re: Possible dcache BUG
  2004-08-15  9:42                                   ` Gene Heskett
@ 2004-08-15 17:31                                     ` Andrew Morton
  2004-08-15 17:58                                       ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: Andrew Morton @ 2004-08-15 17:31 UTC (permalink / raw)
  To: gene.heskett; +Cc: linux-kernel, viro, marcelo.tosatti, torvalds

Gene Heskett <gene.heskett@verizon.net> wrote:
>
> ...
>
> Now, this morning's logwatch told me I should go look at the 
>  logs again, and I found this had occurred several hours earlier:
>  -----------
>  Aug 14 18:53:24 coyote kernel: Unable to handle kernel paging request at virtual address 0058af03

This oops is the _cause_ of the out-of-memory condition.  The oopsing
process exited while holding shrinker_sem, so slab will never again be
shrunk.

Any observed behaviour after an oops is almost always uninteresting, and
usually misleading.


* Re: Possible dcache BUG
  2004-08-15 11:26                                         ` viro
@ 2004-08-15 17:47                                           ` Gene Heskett
       [not found]                                             ` <200408152257.04773.vda@port.imtp.ilyichevsk.odessa.ua>
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-15 17:47 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Sunday 15 August 2004 07:26, viro@parcelfarce.linux.theplanet.co.uk wrote:
>On Sun, Aug 15, 2004 at 07:04:49AM -0400, Gene Heskett wrote:
>> On Sunday 15 August 2004 06:37,
>> viro@parcelfarce.linux.theplanet.co.uk
>>
>> wrote:
>> >On Sun, Aug 15, 2004 at 06:10:28AM -0400, Gene Heskett wrote:
>> >> all in one line of text, it's a bit hard to locate real
>> >> duplicates. But I think I see some right now!  Can this line be
>> >> modified to spit them out, one entry per line with all dups
>> >> sorted to be adjacent?
>> >
>> >Sure, just add \n in format here.  Sorry, hadn't noticed that...
>> >
>> >> >+	seq_printf(m, "%d:%d:%lu:%o\n",
>>
>> And here it is right after starting x on the reboot. (I take it
>> the first number is the number of dups?)
>
>Yes - uniq -c merges duplicates and puts the number of copies in
> front of line, so sort | uniq -c | sort -nr will sort by frequency
> and print each line with number of times it had occurred.
>
>You don't have any duplicates so far and the output looks OK...

And I still don't have any dups, but I AAARRRRGGGGGggg! do have this:

--------------
Aug 15 09:33:02 coyote kernel: Unable to handle kernel paging request at virtual address 5f746573
Aug 15 09:33:02 coyote kernel:  printing eip:
Aug 15 09:33:02 coyote kernel: 5f746573
Aug 15 09:33:02 coyote kernel: *pde = 00000000
Aug 15 09:33:02 coyote kernel: Oops: 0000 [#1]
Aug 15 09:33:02 coyote kernel: PREEMPT
Aug 15 09:33:02 coyote kernel: Modules linked in: eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd forcedeth sg
Aug 15 09:33:02 coyote kernel: CPU:    0
Aug 15 09:33:02 coyote kernel: EIP:    0060:[<5f746573>]    Not tainted
Aug 15 09:33:02 coyote kernel: EFLAGS: 00210006   (2.6.8-rc4)
Aug 15 09:33:02 coyote kernel: EIP is at 0x5f746573
Aug 15 09:33:02 coyote kernel: eax: f0679a18   ebx: 20262620   ecx: 00000000   edx: 00000001
Aug 15 09:33:02 coyote kernel: esi: 63617266   edi: 00000001   ebp: ee62dcf8   esp: ee62dcd8
Aug 15 09:33:02 coyote kernel: ds: 007b   es: 007b   ss: 0068
Aug 15 09:33:02 coyote kernel: Process top (pid: 2439, threadinfo=ee62d000 task=ed68c3b0)
Aug 15 09:33:02 coyote kernel: Stack: c0113378 f0679a18 00000001 00000000 00000000 ee62d000 00000000 00200286
Aug 15 09:33:02 coyote kernel:        ee62dd20 c01133db f0678924 00000001 00000001 00000000 00000000 f0678000
Aug 15 09:33:02 coyote kernel:        ee62dea8 ee52df3e ee62de0c c01ef5fe 00000000 0000001d 00020001 ffffffff
Aug 15 09:33:02 coyote kernel: Call Trace:
Aug 15 09:33:02 coyote kernel:  [<c010476f>] show_stack+0x7f/0xa0
Aug 15 09:33:02 coyote kernel:  [<c0104908>] show_registers+0x158/0x1b0
Aug 15 09:33:02 coyote kernel:  [<c0104a89>] die+0x89/0x100
Aug 15 09:33:02 coyote kernel:  [<c0111725>] do_page_fault+0x1f5/0x553
Aug 15 09:33:02 coyote kernel:  [<c01043d9>] error_code+0x2d/0x38
Aug 15 09:33:02 coyote kernel:  [<c01133db>] __wake_up+0x3b/0x70
Aug 15 09:33:02 coyote kernel:  [<c01ef5fe>] n_tty_receive_buf+0x20e/0xf20
Aug 15 09:33:02 coyote kernel:  [<c01f1e3a>] pty_write+0x12a/0x130
Aug 15 09:33:02 coyote kernel:  [<c01eec7b>] opost_block+0xeb/0x1a0
Aug 15 09:33:02 coyote kernel:  [<c01f0efc>] write_chan+0x18c/0x220
Aug 15 09:33:02 coyote kernel:  [<c01eb9e7>] tty_write+0x1b7/0x250
Aug 15 09:33:02 coyote kernel:  [<c014b7ca>] vfs_write+0xca/0x140
Aug 15 09:33:02 coyote kernel:  [<c014b90b>] sys_write+0x4b/0x80
Aug 15 09:33:02 coyote kernel:  [<c01041dd>] sysenter_past_esp+0x52/0x71
Aug 15 09:33:02 coyote kernel: Code:  Bad EIP value.
Aug 15 09:33:02 coyote kernel:  <6>note: top[2439] exited with preempt_count 2
Aug 15 09:33:02 coyote kernel: bad: scheduling while atomic!
Aug 15 09:33:02 coyote kernel:  [<c01047ae>] dump_stack+0x1e/0x20
Aug 15 09:33:02 coyote kernel:  [<c0305578>] schedule+0x478/0x480
Aug 15 09:33:02 coyote kernel:  [<c013d209>] unmap_vmas+0x199/0x1b0
Aug 15 09:33:02 coyote kernel:  [<c0141471>] exit_mmap+0x81/0x160
Aug 15 09:33:02 coyote kernel:  [<c0114895>] mmput+0x65/0x90
Aug 15 09:33:02 coyote kernel:  [<c0118ad3>] do_exit+0x153/0x430
Aug 15 09:33:02 coyote kernel:  [<c0104af9>] die+0xf9/0x100
Aug 15 09:33:02 coyote kernel:  [<c0111725>] do_page_fault+0x1f5/0x553
Aug 15 09:33:02 coyote kernel:  [<c01043d9>] error_code+0x2d/0x38
Aug 15 09:33:02 coyote kernel:  [<c01133db>] __wake_up+0x3b/0x70
Aug 15 09:33:02 coyote kernel:  [<c01ef5fe>] n_tty_receive_buf+0x20e/0xf20
Aug 15 09:33:02 coyote kernel:  [<c01f1e3a>] pty_write+0x12a/0x130
Aug 15 09:33:02 coyote kernel:  [<c01eec7b>] opost_block+0xeb/0x1a0
Aug 15 09:33:02 coyote kernel:  [<c01f0efc>] write_chan+0x18c/0x220
Aug 15 09:33:02 coyote kernel:  [<c01eb9e7>] tty_write+0x1b7/0x250
Aug 15 09:33:02 coyote kernel:  [<c014b7ca>] vfs_write+0xca/0x140
Aug 15 09:33:02 coyote kernel:  [<c014b90b>] sys_write+0x4b/0x80
Aug 15 09:33:02 coyote kernel:  [<c01041dd>] sysenter_past_esp+0x52/0x71
-------------------
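
One thing that can be checked mechanically in the oops above: the faulting EIP, and the ebx and esi values, all happen to be printable ASCII when read as little-endian bytes.  That would fit text having been written over a pointer, but it is only one reading of the log and not something established in this thread.  The sketch below does the decoding; decode_le is a made-up helper name.

```shell
# decode_le: take an 8-digit hex register value and print its four bytes
# as characters in little-endian (lowest byte first) order.
decode_le() {
    hex=$1
    out=""
    # Walk the hex digits two at a time, from the last pair to the first.
    for off in 7 5 3 1; do
        byte=$(printf '%s' "$hex" | cut -c"$off"-"$((off + 1))")
        # Convert the hex byte to an octal escape that printf understands.
        out="$out$(printf "\\$(printf '%03o' "$((0x$byte))")")"
    done
    printf '%s\n' "$out"
}

decode_le 5f746573   # EIP
decode_le 20262620   # ebx
decode_le 63617266   # esi
```

The three calls print "set_", " && " and "frac" respectively.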

And the shell I had a "top" running in on xwindow #2 had crashed with a SIGABRT.
This was about 10 minutes after I had gone out to make some more cement blocks,
which takes around 3 hours.

I was able to restart the shell, and the top.  The system "feels" normal.

I'm going to call tcwo tomorrow and see what I can get in new hardware.
This is fscking ridiculous.  I get a cpu/cooler/fan that runs 40C
cooler than the old one and it's doing nothing but crashing.  The
absolute longest uptime so far was the recent nearly 37 hours.


* Re: Possible dcache BUG
  2004-08-15 17:31                                     ` Andrew Morton
@ 2004-08-15 17:58                                       ` Gene Heskett
  0 siblings, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-15 17:58 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, viro, marcelo.tosatti, torvalds

On Sunday 15 August 2004 13:31, Andrew Morton wrote:
>Gene Heskett <gene.heskett@verizon.net> wrote:
>> ...
>>
>> Now, this morning's logwatch told me I should go look at the
>>  logs again, and I found this had occurred several hours earlier:
>>  -----------
>>  Aug 14 18:53:24 coyote kernel: Unable to handle kernel paging
>> request at virtual address 0058af03
>
>This oops is the _cause_ of the out-of-memory condition.  The
> oopsing process exited while holding shrinker_sem, so slab will
> never again be shrunk.
>
>Any observed behaviour after an oops is almost always uninteresting,
> and usually misleading.

Okaaaay, now what?  See my post of 10 minutes ago; this time "top" 
took a SIGABRT exit.  I posted the Oops, but now here is meminfo:

[root@coyote linux-2.6.8-rc4]# cat /proc/meminfo
MemTotal:      1035848 kB
MemFree:        238016 kB
Buffers:         98756 kB
Cached:         491324 kB
SwapCached:          0 kB
Active:         343340 kB
Inactive:       416908 kB
HighTotal:      131008 kB
HighFree:          252 kB
LowTotal:       904840 kB
LowFree:        237764 kB
SwapTotal:     3857104 kB
SwapFree:      3857104 kB
Dirty:              56 kB
Writeback:           0 kB
Mapped:         229924 kB
Slab:            27416 kB
Committed_AS:   333992 kB
PageTables:       3292 kB
VmallocTotal:   114680 kB
VmallocUsed:     19636 kB
VmallocChunk:    94936 kB

And slabinfo:

[root@coyote linux-2.6.8-rc4]# cat /proc/slabinfo
slabinfo - version: 2.0
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
unix_sock            173    180    384   10    1 : tunables   54   27    0 : slabdata     18     18      0
tcp_tw_bucket          2     41     96   41    1 : tunables  120   60    0 : slabdata      1      1      0
tcp_bind_bucket       21    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
tcp_open_request       0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
inet_peer_cache        0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
ip_fib_hash           10    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
ip_dst_cache          12     30    256   15    1 : tunables  120   60    0 : slabdata      2      2      0
arp_cache              3     31    128   31    1 : tunables  120   60    0 : slabdata      1      1      0
raw4_sock              0      0    480    8    1 : tunables   54   27    0 : slabdata      0      0      0
udp_sock               2      8    480    8    1 : tunables   54   27    0 : slabdata      1      1      0
tcp_sock              31     32   1024    4    1 : tunables   54   27    0 : slabdata      8      8      0
flow_cache             0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
mqueue_inode_cache      1      8    480    8    1 : tunables   54   27    0 : slabdata      1      1      0
udf_inode_cache        0      0    352   11    1 : tunables   54   27    0 : slabdata      0      0      0
smb_request            0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
smb_inode_cache        2     12    320   12    1 : tunables   54   27    0 : slabdata      1      1      0
isofs_inode_cache      0      0    320   12    1 : tunables   54   27    0 : slabdata      0      0      0
fat_inode_cache        1     11    352   11    1 : tunables   54   27    0 : slabdata      1      1      0
ext2_inode_cache       0      0    416    9    1 : tunables   54   27    0 : slabdata      0      0      0
journal_handle         4    135     28  135    1 : tunables  120   60    0 : slabdata      1      1      0
journal_head          95    243     48   81    1 : tunables  120   60    0 : slabdata      3      3      0
revoke_table          12    290     12  290    1 : tunables  120   60    0 : slabdata      1      1      0
revoke_record          0      0     16  226    1 : tunables  120   60    0 : slabdata      0      0      0
ext3_inode_cache   20079  20088    448    9    1 : tunables   54   27    0 : slabdata   2232   2232      0
eventpoll_pwq          0      0     36  107    1 : tunables  120   60    0 : slabdata      0      0      0
eventpoll_epi          0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
kioctx                 0      0    160   25    1 : tunables  120   60    0 : slabdata      0      0      0
kiocb                  0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
dnotify_cache        172    370     20  185    1 : tunables  120   60    0 : slabdata      2      2      0
file_lock_cache       43     43     92   43    1 : tunables  120   60    0 : slabdata      1      1      0
fasync_cache           2    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
shmem_inode_cache      5     10    384   10    1 : tunables   54   27    0 : slabdata      1      1      0
posix_timers_cache      0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
uid_cache              7    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
sgpool-128            32     32   2048    2    1 : tunables   24   12    0 : slabdata     16     16      0
sgpool-64             32     32   1024    4    1 : tunables   54   27    0 : slabdata      8      8      0
sgpool-32             32     32    512    8    1 : tunables   54   27    0 : slabdata      4      4      0
sgpool-16             32     45    256   15    1 : tunables  120   60    0 : slabdata      3      3      0
sgpool-8              32     62    128   31    1 : tunables  120   60    0 : slabdata      2      2      0
cfq_pool              64    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
crq_pool               0      0     36  107    1 : tunables  120   60    0 : slabdata      0      0      0
deadline_drq           0      0     48   81    1 : tunables  120   60    0 : slabdata      0      0      0
as_arq                65     65     60   65    1 : tunables  120   60    0 : slabdata      1      1      0
blkdev_ioc            63    185     20  185    1 : tunables  120   60    0 : slabdata      1      1      0
blkdev_queue          12     18    448    9    1 : tunables   54   27    0 : slabdata      2      2      0
blkdev_requests       52     52    152   26    1 : tunables  120   60    0 : slabdata      2      2      0
biovec-(256)         256    256   3072    2    2 : tunables   24   12    0 : slabdata    128    128      0
biovec-128           256    260   1536    5    2 : tunables   24   12    0 : slabdata     52     52      0
biovec-64            256    260    768    5    1 : tunables   54   27    0 : slabdata     52     52      0
biovec-16            256    260    192   20    1 : tunables  120   60    0 : slabdata     13     13      0
biovec-4             256    305     64   61    1 : tunables  120   60    0 : slabdata      5      5      0
biovec-1             318    452     16  226    1 : tunables  120   60    0 : slabdata      2      2      0
bio                  342    366     64   61    1 : tunables  120   60    0 : slabdata      6      6      0
sock_inode_cache     210    220    352   11    1 : tunables   54   27    0 : slabdata     20     20      0
skbuff_head_cache    250    325    160   25    1 : tunables  120   60    0 : slabdata     13     13      0
sock                   3     12    320   12    1 : tunables   54   27    0 : slabdata      1      1      0
proc_inode_cache     600    600    320   12    1 : tunables   54   27    0 : slabdata     50     50      0
sigqueue             117    135    148   27    1 : tunables  120   60    0 : slabdata      5      5      0
radix_tree_node     9073   9604    276   14    1 : tunables   54   27    0 : slabdata    686    686      0
bdev_cache            11     18    416    9    1 : tunables   54   27    0 : slabdata      2      2      0
mnt_cache             26     41     96   41    1 : tunables  120   60    0 : slabdata      1      1      0
inode_cache         2198   2198    288   14    1 : tunables   54   27    0 : slabdata    157    157      0
dentry_cache       33792  33796    140   28    1 : tunables  120   60    0 : slabdata   1207   1207      0
filp                2030   2125    160   25    1 : tunables  120   60    0 : slabdata     85     85      0
names_cache           16     16   4096    1    1 : tunables   24   12    0 : slabdata     16     16      0
idr_layer_cache       81     87    136   29    1 : tunables  120   60    0 : slabdata      3      3      0
buffer_head        76638  76707     48   81    1 : tunables  120   60    0 : slabdata    947    947      0
mm_struct             91     91    512    7    1 : tunables   54   27    0 : slabdata     13     13      0
vm_area_struct      7703   7896     84   47    1 : tunables  120   60    0 : slabdata    168    168      0
fs_cache             100    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
files_cache           90     90    416    9    1 : tunables   54   27    0 : slabdata     10     10      0
signal_cache         119    123     96   41    1 : tunables  120   60    0 : slabdata      3      3      0
sighand_cache        102    102   1312    3    1 : tunables   24   12    0 : slabdata     34     34      0
task_struct          110    110   1424    5    2 : tunables   24   12    0 : slabdata     22     22      0
anon_vma            1619   2035      8  407    1 : tunables  120   60    0 : slabdata      5      5      0
pgd                   87     87   4096    1    1 : tunables   24   12    0 : slabdata     87     87      0
size-131072(DMA)       0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-131072            0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-65536(DMA)        0      0  65536    1   16 : tunables    8    4    0 : slabdata      0      0      0
size-65536             1      1  65536    1   16 : tunables    8    4    0 : slabdata      1      1      0
size-32768(DMA)        0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
size-32768             0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
size-16384(DMA)        0      0  16384    1    4 : tunables    8    4    0 : slabdata      0      0      0
size-16384             4      4  16384    1    4 : tunables    8    4    0 : slabdata      4      4      0
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
size-8192             11     11   8192    1    2 : tunables    8    4    0 : slabdata     11     11      0
size-4096(DMA)         0      0   4096    1    1 : tunables   24   12    0 : slabdata      0      0      0
size-4096            180    180   4096    1    1 : tunables   24   12    0 : slabdata    180    180      0
size-2048(DMA)         0      0   2048    2    1 : tunables   24   12    0 : slabdata      0      0      0
size-2048            162    186   2048    2    1 : tunables   24   12    0 : slabdata     93     93      0
size-1024(DMA)         0      0   1024    4    1 : tunables   54   27    0 : slabdata      0      0      0
size-1024            124    124   1024    4    1 : tunables   54   27    0 : slabdata     31     31      0
size-512(DMA)          0      0    512    8    1 : tunables   54   27    0 : slabdata      0      0      0
size-512             184    448    512    8    1 : tunables   54   27    0 : slabdata     56     56      0
size-256(DMA)          0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
size-256             180    450    256   15    1 : tunables  120   60    0 : slabdata     30     30      0
size-192(DMA)          0      0    192   20    1 : tunables  120   60    0 : slabdata      0      0      0
size-192             100    100    192   20    1 : tunables  120   60    0 : slabdata      5      5      0
size-128(DMA)          0      0    128   31    1 : tunables  120   60    0 : slabdata      0      0      0
size-128            1174   1209    128   31    1 : tunables  120   60    0 : slabdata     39     39      0
size-64(DMA)           0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
size-64              915    915     64   61    1 : tunables  120   60    0 : slabdata     15     15      0
size-32(DMA)           0      0     32  119    1 : tunables  120   60    0 : slabdata      0      0      0
size-32             1369   1428     32  119    1 : tunables  120   60    0 : slabdata     12     12      0
kmem_cache           124    124    128   31    1 : tunables  120   60    0 : slabdata      4      4      0
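
A listing this long is easier to read when reduced to the largest caches.  The sketch below estimates each cache's size as num_objs * objsize (ignoring per-slab overhead) and prints the biggest ones; slab_top is a hypothetical helper name, and the sample rows are copied from the listing above.

```shell
# slab_top [N]: read slabinfo 2.0 text on stdin and print the N largest
# caches (default 5) as "kilobytes name", estimated as num_objs * objsize.
slab_top() {
    awk '!/^slabinfo/ && !/^#/ { printf "%d %s\n", $3 * $4 / 1024, $1 }' |
        sort -nr | head -n "${1:-5}"
}

# A few rows taken from the listing above.
slab_top 3 <<'EOF'
slabinfo - version: 2.0
# name            <active_objs> <num_objs> <objsize> ...
ext3_inode_cache   20079  20088    448    9    1 : tunables   54   27    0 : slabdata   2232   2232      0
dentry_cache       33792  33796    140   28    1 : tunables  120   60    0 : slabdata   1207   1207      0
buffer_head        76638  76707     48   81    1 : tunables  120   60    0 : slabdata    947    947      0
radix_tree_node     9073   9604    276   14    1 : tunables   54   27    0 : slabdata    686    686      0
EOF
```

For these rows ext3_inode_cache comes out on top at roughly 8788 kB.  On a live system the helper could be fed from /proc/slabinfo instead of the here-document.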

And a dmesg after a dmesg -c: returns an empty file.

Next please?


* Re: Possible dcache BUG
       [not found]                                             ` <200408152257.04773.vda@port.imtp.ilyichevsk.odessa.ua>
@ 2004-08-15 20:33                                               ` Gene Heskett
       [not found]                                                 ` <200408160803.15206.vda@port.imtp.ilyichevsk.odessa.ua>
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-15 20:33 UTC (permalink / raw)
  To: linux-kernel
  Cc: Denis Vlasenko, viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Sunday 15 August 2004 15:57, Denis Vlasenko wrote:
>> And I still don't have any dups, but I AAARRRRGGGGGggg! do have
>> this:
>>
>> --------------
>> Aug 15 09:33:02 coyote kernel: Unable to handle kernel paging request at virtual address 5f746573
>> Aug 15 09:33:02 coyote kernel:  printing eip:
>> Aug 15 09:33:02 coyote kernel: 5f746573
>> Aug 15 09:33:02 coyote kernel: *pde = 00000000
>> Aug 15 09:33:02 coyote kernel: Oops: 0000 [#1]
>> Aug 15 09:33:02 coyote kernel: PREEMPT
>
>                                 ^^^^^^^
>
>> Aug 15 09:33:02 coyote kernel: Modules linked in: eeprom
>> snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss
>> snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm snd_timer
>> snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd
>> forcedeth
>
>Gene, you should have stopped using preempt/smp and sound modules
>in an attempt to narrow down the bug. We already kinda determined
>that you are experiencing random memory corruption, but hardware
>was tested and seems to be ok. It's software, then. Preempt/smp bug
>or buggy driver are prime suspects.

Ok, non-preempt is building.  Will reboot to it when the build is 
done.

>> I was able to restart the shell, and the top.  The system "feels"
>> normal.
>>
>> I'm going to call tcwo tomorrow and see what I can get in new
>> hardware.
>
>Very likely this won't help.

I'm not quite as sure.  This could be a mobo with a flakey buffer 
latch or something.  I also had, many years ago, a Z-80 that would 
not reliably switch its foreground/background register set.  And 
guess what?  By the time I'd diagnosed it, Zilog wasn't interested in 
replacing an obviously flakey chip.  Out of warranty according to 
the date stamps.  Not my problem it sat on some distributor's shelf 
for a frigging year plus...

>--
>vda

-- 
Cheers, Gene

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
       [not found]                                                 ` <200408160803.15206.vda@port.imtp.ilyichevsk.odessa.ua>
@ 2004-08-16  6:32                                                   ` Gene Heskett
  2004-08-16 14:13                                                     ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-16  6:32 UTC (permalink / raw)
  To: linux-kernel; +Cc: Denis Vlasenko, viro, Marcelo Tosatti

On Monday 16 August 2004 01:03, Denis Vlasenko wrote:
>On Sunday 15 August 2004 23:33, Gene Heskett wrote:
>> On Sunday 15 August 2004 15:57, Denis Vlasenko wrote:
>> >> And I still don't have any dups, but I AAARRRRGGGGGggg! do have
>> >> this:
>> >>
>> >> --------------
>> >> Aug 15 09:33:02 coyote kernel: Unable to handle kernel paging request at virtual address 5f746573
>> >> Aug 15 09:33:02 coyote kernel: printing eip:
>> >> Aug 15 09:33:02 coyote kernel: 5f746573
>> >> Aug 15 09:33:02 coyote kernel: *pde = 00000000
>> >> Aug 15 09:33:02 coyote kernel: Oops: 0000 [#1]
>> >> Aug 15 09:33:02 coyote kernel: PREEMPT
>> >
>> >                                 ^^^^^^^
>> >
>> >> Aug 15 09:33:02 coyote kernel: Modules linked in: eeprom
>> >> snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss
>> >> snd_mixer_oss snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm
>> >> snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi
>> >> snd_seq_device snd forcedeth
>> >
>> >Gene, you should have stopped using preempt/smp and sound modules
>> >in an attempt to narrow down the bug. We already kinda determined
>> >that you are experiencing random memory corruption, but hardware
>> >was tested and seems to be ok. It's software, then. Preempt/smp
>> > bug or buggy driver are prime suspects.
>>
>> Ok, non-preempt is building.  Will reboot to it when the build is
>> done.
>
>Do not load sound modules too please, unless you absolutely need
> sound.

One thing at a time, I think.  That's major surgery on modprobe.conf to 
disable that, plus a chkconfig alsasound off.

I've noticed that with preempt off, my kde cursor motions are back to 
using the mouse if I want to move it more than a word or so to hit a 
typo and fix it.  It's an effect that comes and goes, often in the 
same message reply.  X is running at -1 I think.  Other than that 
(knock on wood) it's running ok so far, but only 9h50m uptime.

>> >> I was able to restart the shell, and the top.  The system
>> >> "feels" normal.
>> >>
>> >> I'm going to call tcwo tomorrow and see what I can get in new
>> >> hardware.
>> >
>> >Very likely this won't help.
>>
>> I'm not quite as sure.  This could be a mobo with a flakey buffer
>> latch or something.  I also had, many years ago, a z-80 that would
>
>GCC is likely to sometimes catch sig11 on such flakey hardware.
>You did not report anything like that, that's why I'm thinking
>hardware is ok.
>
>> not reliably switch its foreground/background register set.  And
>> guess what?  By the time I'd diagnosed it, Zilog wasn't interested
>> in replacing an obviously flakey chip.  Out of warranty according
>> to the date stamps.  Not my problem it sat on some distributor's
>> shelf for a frigging year plus...
>
>--
>vda

-- 
Cheers, Gene

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-16  6:32                                                   ` Gene Heskett
@ 2004-08-16 14:13                                                     ` Gene Heskett
       [not found]                                                       ` <200408161749.23663.vda@port.imtp.ilyichevsk.odessa.ua>
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-16 14:13 UTC (permalink / raw)
  To: linux-kernel; +Cc: Denis Vlasenko, viro, Marcelo Tosatti

On Monday 16 August 2004 02:32, Gene Heskett wrote:
>On Monday 16 August 2004 01:03, Denis Vlasenko wrote:
>>On Sunday 15 August 2004 23:33, Gene Heskett wrote:
>>> On Sunday 15 August 2004 15:57, Denis Vlasenko wrote:

[...]

>>> >Gene, you should have stopped using preempt/smp and sound
>>> > modules in an attempt to narrow down the bug. We already kinda
>>> > determined that you are experiencing random memory corruption,
>>> > but hardware was tested and seems to be ok. It's software,
>>> > then. Preempt/smp bug or buggy driver are prime suspects.
>>>
>>> Ok, non-preempt is building.  Will reboot to it when the build is
>>> done.
>>
>>Do not load sound modules too please, unless you absolutely need
>> sound.
>
>One thing at a time I think.  That's major surgery on modprobe.conf
> to disable that, plus a chkconfig alsasound off.
>
>I've noticed that with preempt off, my kde cursor motions are back
> to using the mouse if I want to move it more than a word or so to
> hit a typo and fix it.  It's an effect that comes and goes, often in
> the same message reply.  X is running at -1 I think.  Other than
> that (knock on wood) it's running ok so far, but only 9h50m uptime.
[...]
With PREEMPT off, and a 16 hour uptime, I am suddenly nearly out of
memory again. As an additional tool, I had started ksysguard for its
graphical memory display and set it for a 1 minute update interval.  When 
I awoke again, the memory panel had been 100% blue ever since some major
event (I assume logrotate by cron) ran; the start of it hadn't quite
scrolled off screen.

However, there is no swapping yet, and nothing unusual in the log.
Here is /proc/meminfo:
MemTotal:      1035956 kB
MemFree:         14036 kB
Buffers:        181044 kB
Cached:         114024 kB
SwapCached:          0 kB
Active:         277684 kB
Inactive:       148840 kB
HighTotal:      131008 kB
HighFree:         9408 kB
LowTotal:       904948 kB
LowFree:          4628 kB
SwapTotal:     3857104 kB
SwapFree:      3857104 kB
Dirty:              12 kB
Writeback:           0 kB
Mapped:         202108 kB
Slab:           584876 kB
Committed_AS:   276216 kB
PageTables:       3340 kB
VmallocTotal:   114680 kB
VmallocUsed:     19876 kB
VmallocChunk:    94640 kB

and /proc/slabinfo:
slabinfo - version: 2.0
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
unix_sock            200    200    384   10    1 : tunables   54   27    0 : slabdata     20     20      0
tcp_tw_bucket          0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
tcp_bind_bucket       35    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
tcp_open_request       0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
inet_peer_cache        0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
ip_fib_hash           10    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
ip_dst_cache          15     15    256   15    1 : tunables  120   60    0 : slabdata      1      1      0
arp_cache              3     31    128   31    1 : tunables  120   60    0 : slabdata      1      1      0
raw4_sock              0      0    480    8    1 : tunables   54   27    0 : slabdata      0      0      0
udp_sock               2      8    480    8    1 : tunables   54   27    0 : slabdata      1      1      0
tcp_sock              32     32   1024    4    1 : tunables   54   27    0 : slabdata      8      8      0
flow_cache             0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
mqueue_inode_cache      1      8    480    8    1 : tunables   54   27    0 : slabdata      1      1      0
udf_inode_cache        0      0    352   11    1 : tunables   54   27    0 : slabdata      0      0      0
smb_request            0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
smb_inode_cache        2     12    320   12    1 : tunables   54   27    0 : slabdata      1      1      0
isofs_inode_cache      0      0    320   12    1 : tunables   54   27    0 : slabdata      0      0      0
fat_inode_cache        4     22    352   11    1 : tunables   54   27    0 : slabdata      2      2      0
ext2_inode_cache       0      0    416    9    1 : tunables   54   27    0 : slabdata      0      0      0
journal_handle         8    135     28  135    1 : tunables  120   60    0 : slabdata      1      1      0
journal_head        1114   3888     48   81    1 : tunables  120   60    0 : slabdata     48     48      0
revoke_table          12    290     12  290    1 : tunables  120   60    0 : slabdata      1      1      0
revoke_record          0      0     16  226    1 : tunables  120   60    0 : slabdata      0      0      0
ext3_inode_cache  1000246 1020249    448    9    1 : tunables   54   27    0 : slabdata 113361 113361      0
eventpoll_pwq          0      0     36  107    1 : tunables  120   60    0 : slabdata      0      0      0
eventpoll_epi          0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
kioctx                 0      0    160   25    1 : tunables  120   60    0 : slabdata      0      0      0
kiocb                  0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
dnotify_cache        172    185     20  185    1 : tunables  120   60    0 : slabdata      1      1      0
file_lock_cache       43     43     92   43    1 : tunables  120   60    0 : slabdata      1      1      0
fasync_cache           2    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
shmem_inode_cache      6     10    384   10    1 : tunables   54   27    0 : slabdata      1      1      0
posix_timers_cache      0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
uid_cache              7    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
sgpool-128            32     32   2048    2    1 : tunables   24   12    0 : slabdata     16     16      0
sgpool-64             32     32   1024    4    1 : tunables   54   27    0 : slabdata      8      8      0
sgpool-32             32     32    512    8    1 : tunables   54   27    0 : slabdata      4      4      0
sgpool-16             32     45    256   15    1 : tunables  120   60    0 : slabdata      3      3      0
sgpool-8              32     62    128   31    1 : tunables  120   60    0 : slabdata      2      2      0
cfq_pool              64    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
crq_pool               0      0     36  107    1 : tunables  120   60    0 : slabdata      0      0      0
deadline_drq           0      0     48   81    1 : tunables  120   60    0 : slabdata      0      0      0
as_arq                65     65     60   65    1 : tunables  120   60    0 : slabdata      1      1      0
blkdev_ioc            73    185     20  185    1 : tunables  120   60    0 : slabdata      1      1      0
blkdev_queue          12     18    448    9    1 : tunables   54   27    0 : slabdata      2      2      0
blkdev_requests       52     52    152   26    1 : tunables  120   60    0 : slabdata      2      2      0
biovec-(256)         256    256   3072    2    2 : tunables   24   12    0 : slabdata    128    128      0
biovec-128           256    260   1536    5    2 : tunables   24   12    0 : slabdata     52     52      0
biovec-64            256    260    768    5    1 : tunables   54   27    0 : slabdata     52     52      0
biovec-16            256    260    192   20    1 : tunables  120   60    0 : slabdata     13     13      0
biovec-4             256    305     64   61    1 : tunables  120   60    0 : slabdata      5      5      0
biovec-1             320    452     16  226    1 : tunables  120   60    0 : slabdata      2      2      0
bio                  319    366     64   61    1 : tunables  120   60    0 : slabdata      6      6      0
sock_inode_cache     242    242    352   11    1 : tunables   54   27    0 : slabdata     22     22      0
skbuff_head_cache    235    450    160   25    1 : tunables  120   60    0 : slabdata     18     18      0
sock                   3     12    320   12    1 : tunables   54   27    0 : slabdata      1      1      0
proc_inode_cache     571    600    320   12    1 : tunables   54   27    0 : slabdata     50     50      0
sigqueue             108    108    148   27    1 : tunables  120   60    0 : slabdata      4      4      0
radix_tree_node    10212  21182    276   14    1 : tunables   54   27    0 : slabdata   1513   1513      0
bdev_cache            11     18    416    9    1 : tunables   54   27    0 : slabdata      2      2      0
mnt_cache             26     41     96   41    1 : tunables  120   60    0 : slabdata      1      1      0
inode_cache         2371   2380    288   14    1 : tunables   54   27    0 : slabdata    170    170      0
dentry_cache      718370 781704    140   28    1 : tunables  120   60    0 : slabdata  27918  27918      0
filp                2145   2300    160   25    1 : tunables  120   60    0 : slabdata     92     92      0
names_cache           17     17   4096    1    1 : tunables   24   12    0 : slabdata     17     17      0
idr_layer_cache       81     87    136   29    1 : tunables  120   60    0 : slabdata      3      3      0
buffer_head        51836  80919     48   81    1 : tunables  120   60    0 : slabdata    999    999      0
mm_struct             98     98    512    7    1 : tunables   54   27    0 : slabdata     14     14      0
vm_area_struct      7852   8272     84   47    1 : tunables  120   60    0 : slabdata    176    176      0
fs_cache             103    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
files_cache           99     99    416    9    1 : tunables   54   27    0 : slabdata     11     11      0
signal_cache         123    123     96   41    1 : tunables  120   60    0 : slabdata      3      3      0
sighand_cache        111    111   1312    3    1 : tunables   24   12    0 : slabdata     37     37      0
task_struct          115    120   1424    5    2 : tunables   24   12    0 : slabdata     24     24      0
anon_vma            1796   2035      8  407    1 : tunables  120   60    0 : slabdata      5      5      0
pgd                   90     90   4096    1    1 : tunables   24   12    0 : slabdata     90     90      0
size-131072(DMA)       0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-131072            0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-65536(DMA)        0      0  65536    1   16 : tunables    8    4    0 : slabdata      0      0      0
size-65536             1      1  65536    1   16 : tunables    8    4    0 : slabdata      1      1      0
size-32768(DMA)        0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
size-32768             0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
size-16384(DMA)        0      0  16384    1    4 : tunables    8    4    0 : slabdata      0      0      0
size-16384            10     10  16384    1    4 : tunables    8    4    0 : slabdata     10     10      0
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
size-8192              9      9   8192    1    2 : tunables    8    4    0 : slabdata      9      9      0
size-4096(DMA)         0      0   4096    1    1 : tunables   24   12    0 : slabdata      0      0      0
size-4096            191    191   4096    1    1 : tunables   24   12    0 : slabdata    191    191      0
size-2048(DMA)         0      0   2048    2    1 : tunables   24   12    0 : slabdata      0      0      0
size-2048            172    192   2048    2    1 : tunables   24   12    0 : slabdata     96     96      0
size-1024(DMA)         0      0   1024    4    1 : tunables   54   27    0 : slabdata      0      0      0
size-1024            145    164   1024    4    1 : tunables   54   27    0 : slabdata     41     41      0
size-512(DMA)          0      0    512    8    1 : tunables   54   27    0 : slabdata      0      0      0
size-512             184    448    512    8    1 : tunables   54   27    0 : slabdata     56     56      0
size-256(DMA)          0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
size-256             165    435    256   15    1 : tunables  120   60    0 : slabdata     29     29      0
size-192(DMA)          0      0    192   20    1 : tunables  120   60    0 : slabdata      0      0      0
size-192             120    120    192   20    1 : tunables  120   60    0 : slabdata      6      6      0
size-128(DMA)          0      0    128   31    1 : tunables  120   60    0 : slabdata      0      0      0
size-128            1231   1271    128   31    1 : tunables  120   60    0 : slabdata     41     41      0
size-64(DMA)           0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
size-64            32409  33123     64   61    1 : tunables  120   60    0 : slabdata    543    543      0
size-32(DMA)           0      0     32  119    1 : tunables  120   60    0 : slabdata      0      0      0
size-32             1428   1428     32  119    1 : tunables  120   60    0 : slabdata     12     12      0
kmem_cache           124    124    128   31    1 : tunables  120   60    0 : slabdata      4      4      0
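
To see how much memory each cache in the listing above actually pins, one
can multiply <num_slabs> by <pagesperslab> and the page size.  The Python
sketch below does that for three of the lines above.  It assumes 4 kB pages
(the i386 default), and the helper name slab_footprint_kb is made up for
illustration.

```python
PAGE_KB = 4  # assumption: 4 kB pages, the default on i386


def slab_footprint_kb(line, page_kb=PAGE_KB):
    """Return (cache name, kB of pages held) for one slabinfo 2.0 data line."""
    head, _tunables, slabdata = line.split(" : ")
    # head: name, active_objs, num_objs, objsize, objperslab, pagesperslab
    fields = head.split()
    pages_per_slab = int(fields[5])
    # slabdata: "slabdata" <active_slabs> <num_slabs> <sharedavail>
    num_slabs = int(slabdata.split()[2])
    return fields[0], num_slabs * pages_per_slab * page_kb


# Three lines copied from the listing above.
SAMPLE = [
    "ext3_inode_cache  1000246 1020249    448    9    1 : tunables   54   27    0 : slabdata 113361 113361      0",
    "dentry_cache      718370 781704    140   28    1 : tunables  120   60    0 : slabdata  27918  27918      0",
    "size-64            32409  33123     64   61    1 : tunables  120   60    0 : slabdata    543    543      0",
]

# Largest consumer first.
ranked = sorted((slab_footprint_kb(l) for l in SAMPLE),
                key=lambda item: item[1], reverse=True)
for name, kb in ranked:
    print(f"{name:20s} {kb:8d} kB")
```

By this estimate ext3_inode_cache holds 453444 kB and dentry_cache 111672 kB,
together roughly 97% of the 584876 kB reported as Slab in /proc/meminfo, while
size-64 accounts for only 2172 kB.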

Note the size-64, dentry_cache and ext3_inode_cache lines.
Now if I can remember that shell line to check /proc/fs/ext3
for dups:

Unfortunately, that doesn't want to work; cat is using 90% of the cpu,
and the command line: cat /proc/fs/ext3|sort|uniq -c|sort -nr
is hung.  But it will ctl-c.  Humm, cat /proc/fs/ext3 by itself
is running, it's just got so much data that I ctl-c'd it after 1 minute.
This may be an interesting report IF it ever gets done.  But at 
10 megs for shell history I may have to redo it directed to a file!
Yes, at least 10 megs scrolled off the end of the scrollback buffer.
However, as I watched it scrolling, I never saw the first digit
change to a non-1 value.  Odd effect, the cpu temp is falling, 
by about 5C.  And with only 7 megs free according to top, it's
still not swapping!

The file is just short of 24 megs.  Now to grep it for errors.

Aha!  There are some non-1 first digit values in that file!
[root@coyote linux-2.6.8-rc4]# grep ' 2 ' /ext3-allocs
      2 3:8:8227974:100644
      2 3:8:8227973:100644
      2 3:8:8227972:100644
      2 3:8:8227971:100644
      2 3:8:8193936:100644
      2 3:8:8193935:100644
      2 3:8:8193934:100644
      2 3:8:8193738:100644
      2 3:8:7834144:100644
      2 3:8:7834143:100644
      2 3:8:7684604:100644
      2 3:8:7521425:100644
      2 3:8:7521411:100644
      2 3:8:6360398:40755
      2 3:8:6013120:40755
      2 3:8:6013101:40755
      2 3:8:5982111:40755
      2 3:8:5982098:40755
      2 3:8:5982088:40775
      2 3:8:5949697:40777
      2 3:8:5949683:40777
      2 3:8:5947892:42755
      2 3:8:5947890:42755
      2 3:8:5915386:42755
      2 3:8:5915379:42755
      2 3:8:5901299:42755
      2 3:8:5901289:42755
      2 3:8:5835169:42777
      2 3:8:5835162:40755
      2 3:8:5835159:40755
      2 3:8:1250790:100644
      2 3:8:1250789:100644

However, that's the end of it:
[root@coyote linux-2.6.8-rc4]# grep ' 3 ' /ext3-allocs
[root@coyote linux-2.6.8-rc4]# grep ' 4 ' /ext3-allocs
[root@coyote linux-2.6.8-rc4]# grep ' 5 ' /ext3-allocs
[root@coyote linux-2.6.8-rc4]# grep ' 6 ' /ext3-allocs
[root@coyote linux-2.6.8-rc4]# grep ' 7 ' /ext3-allocs
[root@coyote linux-2.6.8-rc4]# grep ' 8 ' /ext3-allocs
[root@coyote linux-2.6.8-rc4]# grep ' 9 ' /ext3-allocs
[root@coyote linux-2.6.8-rc4]# grep ' 10 ' /ext3-allocs

So now we have an odor of a problem, the question is what does
it smell like?  What can I do next to shine a light on this?

-- 
Cheers, Gene

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
       [not found]                                                       ` <200408161749.23663.vda@port.imtp.ilyichevsk.odessa.ua>
@ 2004-08-16 15:25                                                         ` Gene Heskett
  0 siblings, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-16 15:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Denis Vlasenko, viro, Marcelo Tosatti

On Monday 16 August 2004 10:49, Denis Vlasenko wrote:
>> >>> Ok, non-preempt is building.  Will reboot to it when the build
>> >>> is done.
>> >>
>> >>Do not load sound modules too please, unless you absolutely need
>> >> sound.
>> >
>> >One thing at a time I think.  That's major surgery on
>> > modprobe.conf to disable that, plus a chkconfig alsasound off.
>> >
>> >I've noticed that with preempt off, my kde cursor motions are
>> > back to using the mouse if I want to move it more than a word or
>> > so to hit a typo and fix it.  It's an effect that comes and goes,
>> > often in the same message reply.  X is running at -1 I think. 
>> > Other than that (knock on wood) it's running ok so far, but only
>> > 9h50m uptime.
>>
>> [...]
>> With PREEMPT off, and a 16 hour uptime, I am suddenly nearly out
>> of memory again. As an additional tool, I had started ksysguard
>> for its
>
>That depends on what you call "out of memory". It's normal for Linux
>to have very little free memory. top shows on my 256Mb home box:
>
>224 processes: 223 sleeping, 1 running, 0 zombie, 0 stopped
>CPU states:   2.9% user  13.8% system   0.0% nice   0.1% iowait  82.9% idle
>Mem:   254936k av,  252736k used,    2200k free,       0k shrd,   38872k buff
>                                     ^^^^^^^^^^
>       197796k active,              31396k inactive
>Swap:  262136k av,       0k used,  262136k free                   95868k cached
>                                                                  ^^^^^^^^^^^^^
>Of course, when you're fresh after reboot, you do have
>tons of free memory. How quickly cache will fill your RAM
>depends on RAM amount and your usage pattern.
>With 1Gig of RAM and mild usage it can take e.g. 16 hours or so. ;)
>
>> gfx memory display and set it for a 1 minute update interval. 
>> When I awoke again, the memory panel was 100% blue since some
>> major event, I assume logrotate by cron, ran but hadn't quite
>> scrolled off screen.
>
>Quite possibly. Reboot, run "grep -rF 'something' ." in a kernel
> tree and see your RAM quickly filled with cache.
>
>> However, there is no swapping yet, and nothing unusual in the log.
>> Here are /proc/meminfo:
>> MemTotal:      1035956 kB
>> Buffers:        181044 kB
>
>I'm not sure. Maybe this is a bit high. Other values look ok.
>
>> and /proc/slabinfo:
>> slabinfo - version: 2.0
>> # name            <active_objs> <num_objs> <objsize> <objperslab>
>> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> :
>> slabdata <active_slabs> <num_slabs> <sharedavail>
>>
>> Note the size-64, dentry_cache and ext3_inode_cache lines
>
>Yes, dentry_cache and ext3_inode_cache are filesystem cache.
>So far, nothing looks wrong.
>
>> Now if I can remember that shell line to check /proc/fs/ext3
>> for dups:
>
>Sorry, I am all-reiserfs now. ;]

I believe it was Viro who sent me a patch that instrumented
the inode handling of ext3.  These are the results of that;
otherwise /proc/fs/ext3 doesn't exist.

>> [root@coyote linux-2.6.8-rc4]# grep ' 2 ' /ext3-allocs
>>       2 3:8:8227974:100644
>>       2 3:8:8227973:100644
>>       2 3:8:8227972:100644
>>       2 3:8:8227971:100644
>>       2 3:8:8193936:100644
>>       2 3:8:8193935:100644
>>       2 3:8:8193934:100644
>>       2 3:8:8193738:100644
>>       2 3:8:7834144:100644
>>       2 3:8:7834143:100644
>>       2 3:8:7684604:100644
>>       2 3:8:7521425:100644
>>       2 3:8:7521411:100644
>>       2 3:8:6360398:40755
>>       2 3:8:6013120:40755
>>       2 3:8:6013101:40755
>>       2 3:8:5982111:40755
>>       2 3:8:5982098:40755
>>       2 3:8:5982088:40775
>>       2 3:8:5949697:40777
>>       2 3:8:5949683:40777
>>       2 3:8:5947892:42755
>>       2 3:8:5947890:42755
>>       2 3:8:5915386:42755
>>       2 3:8:5915379:42755
>>       2 3:8:5901299:42755
>>       2 3:8:5901289:42755
>>       2 3:8:5835169:42777
>>       2 3:8:5835162:40755
>>       2 3:8:5835159:40755
>>       2 3:8:1250790:100644
>>       2 3:8:1250789:100644

The first ' 2 ' represents an error in that only one process
should have allocated that block of ram _1_ time only for 
that particular inode.  This could, but hasn't yet that I know of,
lead to disk corruption I'm sure.  These are the most thoroughly
e2fsck'd drives on the planet I'd think.  Somewhat more than
an average of once daily now for several weeks.  I'd also bet
a cold one that if I typed reboot, I would end up having to use
the reset button to finish the job at some point, it's a 75% sure
thing.

And, wonder of wonders, this list has self-shortened!:  And still
no Oops either.  The number of leading 2's had gone down the next
time I ran that line.  I'd taken a shower between runs but didn't 
know it would clean this up too :-)

Then reading up on grep, I used this command line the next time:

#>cat /proc/fs/ext3|sort|uniq -c|sort -nr|grep -v ' 1 ' >/ext3-allocs-bad;cat /ext3-allocs-bad

and got this, a much shorter list:
      2 3:8:8405754:40775
      2 3:8:7850178:100644
      2 3:8:7816153:100644
      2 3:8:7816152:100644
      2 3:8:7803727:100644
      2 3:8:7803726:100644
      2 3:8:7803033:100644
      2 3:8:7684502:100644
      2 3:8:7407284:100644
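
For a dump this size, the same duplicate check as the
sort | uniq -c | sort -nr | grep -v ' 1 ' pipeline can also be done in a
single counting pass.  What follows is a minimal Python sketch, assuming
the major:minor:inumber:mode line format with an octal mode field.
duplicate_entries and parse_entry are hypothetical helper names, and the
sample lines are taken from the listing above (one of them repeated by
hand) rather than from a live /proc/fs/ext3.

```python
import stat
from collections import Counter


def duplicate_entries(lines):
    """Map each entry that appears more than once to its count."""
    counts = Counter(line.strip() for line in lines if line.strip())
    return {entry: n for entry, n in counts.items() if n > 1}


def parse_entry(entry):
    """Split 'major:minor:inumber:mode' into integers; mode is read as octal."""
    major, minor, inumber, mode = entry.split(":")
    return int(major), int(minor), int(inumber), int(mode, 8)


# Illustrative input only: the first entry is deliberately repeated.
sample = [
    "3:8:8405754:40775",
    "3:8:7850178:100644",
    "3:8:8405754:40775",
    "3:8:7816153:100644",
]

dups = duplicate_entries(sample)
for entry, n in dups.items():
    mode = parse_entry(entry)[3]
    kind = "dir" if stat.S_ISDIR(mode) else "file" if stat.S_ISREG(mode) else "other"
    print(n, entry, kind)
```

Decoding the mode also shows at a glance whether the duplicated inodes are
directories (40755, 42755, ...) or regular files (100644).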

But I still don't know enough about this to point any fingers
at anything.

[... old list]

>> So now we have an odor of a problem, the question is what does
>> it smell like?  What can I do next to shine a light on this?

Questions still valid IMO.

>--
>vda

-- 
Cheers & thanks Denis, Gene

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-15  8:48                                 ` viro
                                                     ` (2 preceding siblings ...)
  2004-08-15 10:10                                   ` Gene Heskett
@ 2004-08-16 22:52                                   ` Gene Heskett
  2004-08-16 23:01                                     ` viro
  3 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-16 22:52 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Sunday 15 August 2004 04:48, viro@parcelfarce.linux.theplanet.co.uk 
wrote:
>On Sun, Aug 15, 2004 at 12:09:44AM -0400, Gene Heskett wrote:
>> The only thing I've noted in the slabinfo reports is the
>> ext3_cache was well into 6 digits in kilobytes.  Now its only
>> 15,000 of its normal units (whatever they are) after the reboot.
>
>What did dcache numbers look like at that time?
>
>Anyway, we could try the patch below and see what shows in
> /proc/fs/ext3 with it [NOTE: patch is completely untested].  It
> should show major:minor:inumber:mode
>for all currently allocated ext3 inodes.  It won't be 100% accurate
> (we can miss some entries/get some twice if cache shrinks or grows
> at the time), but if the leak is so massive, we ought to see a
> *lot* of duplicates in there.  Seeing what kind of inodes really
> leaks could narrow the things down.

Well, I am seeing some dups, but they are so volatile that no two runs 
will report the same allocations as dups, and it's never more than 2
using cat /proc/fs/ext3 | sort | uniq -c | sort -nr | grep -v ' 1 '

Consecutive runs will show anywhere from 3 to 10 or 12 dups, but never 
is an address repeated between runs.

How is this to be interpreted?

FWIW, I'm now up 25 hours, with PREEMPT off.  No Oops's yet.

-- 
Cheers, Gene

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-16 22:52                                   ` Gene Heskett
@ 2004-08-16 23:01                                     ` viro
  2004-08-17  4:44                                       ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: viro @ 2004-08-16 23:01 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Mon, Aug 16, 2004 at 06:52:50PM -0400, Gene Heskett wrote:
> Well, I am seeing some dups, but they are so volatile that no two runs 
> will report the same allocations as dups, and it's never more than 2
> using cat /proc/fs/ext3 | sort | uniq -c | sort -nr | grep -v ' 1 '
> 
> Consecutive runs will show anywhere from 3 to 10 or 12 dups, but never 
> is an address repeated between runs.
> 
> How is this to be interpreted?

That's OK.  Keep in mind that you have a *lot* of these guys and your
cat(1) makes a lot of read(2) calls.  So what you see is

<starting to read>
<see inode #n that is about to be evicted>
<read some more>
<inode #n gets evicted, quite possibly - due to memory pressure from cat(1)
or sort(1)>
<read more>
<somebody wants the same inode again>
<read more>
<see the inode #n we'd just had read from disk again>

So few duplicates are all right.
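
The sequence above can be mimicked with a toy model.  The Python sketch
below is only an illustration: it assumes the cache is a plain list walked
by position in fixed-size chunks, and the inode numbers and the
chunked_scan helper are made up for the example.  The real patch may
traverse the inode list quite differently.

```python
from collections import Counter


def chunked_scan(cache, chunk, events):
    """Read `cache` by position, `chunk` entries per read.

    events[i] is applied to the cache after the i-th read, standing in
    for evictions and re-reads that happen between read(2) calls.
    """
    seen, cursor, step = [], 0, 0
    while cursor < len(cache):
        seen.extend(cache[cursor:cursor + chunk])
        cursor += chunk
        if step < len(events):
            events[step](cache)
        step += 1
    return seen


cache = [101, 102, 103, 104, 105, 106]   # made-up inode numbers
events = [
    lambda c: c.remove(102),   # inode 102 is evicted after the first read
    lambda c: c.append(102),   # ...and is read back in after the second
]

seen = chunked_scan(cache, 2, events)
dups = [ino for ino, n in Counter(seen).items() if n > 1]
missed = sorted(set(cache) - set(seen))
print("seen:", seen, "dups:", dups, "missed:", missed)
```

In this run inode 102 is reported twice and inode 103 is skipped, although
the cache never held more than one copy of either.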

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-16 23:01                                     ` viro
@ 2004-08-17  4:44                                       ` Gene Heskett
  2004-08-17  4:58                                         ` Nick Piggin
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-17  4:44 UTC (permalink / raw)
  To: linux-kernel; +Cc: viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Monday 16 August 2004 19:01, viro@parcelfarce.linux.theplanet.co.uk wrote:
>On Mon, Aug 16, 2004 at 06:52:50PM -0400, Gene Heskett wrote:
>> Well, I am seeing some dups, but they are so volatile that no two
>> runs will report the same allocations as dups, and it's never more
>> than 2 using cat /proc/fs/ext3 | sort | uniq -c | sort -nr |
>> grep -v ' 1 '
>>
>> Consecutive runs will show anywhere from 3 to 10 or 12 dups, but
>> never is an address repeated between runs.
>>
>> How is this to be interpreted?
>
>That's OK.  Keep in mind that you have a *lot* of these guys and
> your cat(1) makes a lot of read(2) calls.  So what you see is
>
><starting to read>
><see inode #n that is about to be evicted>
><read some more>
><inode #n gets evicted, quite possibly - due to memory pressure from
> cat(1) or sort(1)>
><read more>
><somebody wants the same inode again>
><read more>
><see inode #n again, now freshly re-read from disk>
>
>So a few duplicates are all right.

I hope so.  I've got a real doozy here: I was nearly out of memory
(well, maybe 30 megs left) when my nightly run of rsync started, and
everything came to a grinding halt.  I couldn't even get to the screen
the tail -f on the log was running in, but after walking away for 10
minutes I can once again.  Things seem to be partially functional, so
I'm going to see if I can do some cut-n-paste from the log screen to
here, but I probably can't send it, as sendmail was one of the items
the OOM killer killed.  According to top, I'm about 250 megs into the
swap, very suddenly.  No swap was in use at 23:55 local.
-------
Aug 17 00:02:00 coyote kernel: kjournald starting.  Commit interval 5 seconds
Aug 17 00:02:00 coyote kernel: EXT3 FS on hdb3, internal journal
Aug 17 00:02:00 coyote kernel: EXT3-fs: mounted filesystem with ordered data mode.
Aug 17 00:11:55 coyote kernel: oom-killer: gfp_mask=0xd0
Aug 17 00:11:55 coyote kernel: DMA per-cpu:
Aug 17 00:11:55 coyote kernel: cpu 0 hot: low 2, high 6, batch 1
Aug 17 00:11:55 coyote kernel: cpu 0 cold: low 0, high 2, batch 1
Aug 17 00:11:55 coyote kernel: Normal per-cpu:
Aug 17 00:11:55 coyote kernel: cpu 0 hot: low 32, high 96, batch 16
Aug 17 00:11:55 coyote kernel: cpu 0 cold: low 0, high 32, batch 16
Aug 17 00:11:55 coyote kernel: HighMem per-cpu:
Aug 17 00:11:55 coyote kernel: cpu 0 hot: low 14, high 42, batch 7
Aug 17 00:11:55 coyote kernel: cpu 0 cold: low 0, high 14, batch 7
Aug 17 00:11:55 coyote kernel:
Aug 17 00:11:55 coyote kernel: Free pages:        4308kB (532kB HighMem)
Aug 17 00:11:55 coyote kernel: Active:31159 inactive:1039 dirty:0 writeback:28 unstable:0 free:1077 slab:222946 mapped:30766 pagetables:944
Aug 17 00:11:55 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
Aug 17 00:11:56 coyote kernel: protections[]: 8 476 540
Aug 17 00:11:56 coyote kernel: Normal free:1872kB min:936kB low:1872kB high:2808kB active:0kB inactive:2420kB present:901120kB
Aug 17 00:11:56 coyote kernel: protections[]: 0 468 532
Aug 17 00:11:56 coyote kernel: HighMem free:532kB min:128kB low:256kB high:384kB active:124636kB inactive:1736kB present:131008kB
Aug 17 00:11:56 coyote kernel: protections[]: 0 0 64
Aug 17 00:12:00 coyote kernel: DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB
Aug 17 00:12:00 coyote kernel: Normal: 12*4kB 2*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1872kB
Aug 17 00:12:00 coyote kernel: HighMem: 51*4kB 11*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 532kB
Aug 17 00:12:01 coyote kernel: Swap cache: add 94539, delete 86334, find 14429/21141, race 0+0
Aug 17 00:12:01 coyote kernel: Out of Memory: Killed process 2239 (httpd).
Aug 17 00:12:01 coyote kernel: oom-killer: gfp_mask=0xd0
Aug 17 00:12:01 coyote kernel: DMA per-cpu:
Aug 17 00:12:01 coyote kernel: cpu 0 hot: low 2, high 6, batch 1
Aug 17 00:12:01 coyote kernel: cpu 0 cold: low 0, high 2, batch 1
Aug 17 00:12:01 coyote kernel: Normal per-cpu:
Aug 17 00:12:01 coyote kernel: cpu 0 hot: low 32, high 96, batch 16
Aug 17 00:12:01 coyote kernel: cpu 0 cold: low 0, high 32, batch 16
Aug 17 00:12:01 coyote kernel: HighMem per-cpu:
Aug 17 00:12:01 coyote kernel: cpu 0 hot: low 14, high 42, batch 7
Aug 17 00:12:01 coyote kernel: cpu 0 cold: low 0, high 14, batch 7
Aug 17 00:12:01 coyote kernel:
Aug 17 00:12:01 coyote kernel: Free pages:        4280kB (504kB HighMem)
Aug 17 00:12:01 coyote kernel: Active:31668 inactive:498 dirty:0 writeback:0 unstable:0 free:1070 slab:222978 mapped:31113 pagetables:935
Aug 17 00:12:01 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
Aug 17 00:12:01 coyote kernel: protections[]: 8 476 540
Aug 17 00:12:02 coyote kernel: Normal free:1872kB min:936kB low:1872kB high:2808kB active:1192kB inactive:1104kB present:901120kB
Aug 17 00:12:02 coyote kernel: protections[]: 0 468 532
Aug 17 00:12:02 coyote kernel: HighMem free:504kB min:128kB low:256kB high:384kB active:125480kB inactive:888kB present:131008kB
Aug 17 00:12:02 coyote kernel: protections[]: 0 0 64
Aug 17 00:12:02 coyote kernel: DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB
Aug 17 00:12:02 coyote kernel: Normal: 6*4kB 5*8kB 1*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1872kB
Aug 17 00:12:02 coyote kernel: HighMem: 10*4kB 28*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 504kB
Aug 17 00:12:02 coyote kernel: Swap cache: add 95383, delete 87073, find 14612/21472, race 0+0
Aug 17 00:12:02 coyote kernel: Out of Memory: Killed process 2240 (httpd).
Aug 17 00:12:05 coyote kernel: oom-killer: gfp_mask=0xd0
Aug 17 00:12:05 coyote kernel: DMA per-cpu:
Aug 17 00:12:05 coyote kernel: cpu 0 hot: low 2, high 6, batch 1
Aug 17 00:12:05 coyote kernel: cpu 0 cold: low 0, high 2, batch 1
Aug 17 00:12:05 coyote kernel: Normal per-cpu:
Aug 17 00:12:05 coyote kernel: cpu 0 hot: low 32, high 96, batch 16
Aug 17 00:12:05 coyote kernel: cpu 0 cold: low 0, high 32, batch 16
Aug 17 00:12:05 coyote kernel: HighMem per-cpu:
Aug 17 00:12:05 coyote kernel: cpu 0 hot: low 14, high 42, batch 7
Aug 17 00:12:05 coyote kernel: cpu 0 cold: low 0, high 14, batch 7
Aug 17 00:12:05 coyote kernel:
Aug 17 00:12:05 coyote kernel: Free pages:        4224kB (448kB HighMem)
Aug 17 00:12:05 coyote kernel: Active:31803 inactive:378 dirty:0 writeback:0 unstable:0 free:1056 slab:222988 mapped:31394 pagetables:926
Aug 17 00:13:37 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
Aug 17 00:15:12 coyote kernel: protections[]: 8 476 540
Aug 17 00:16:28 coyote kernel: Normal free:1872kB min:936kB low:1872kB high:2808kB active:1144kB inactive:1160kB present:901120kB
Aug 17 00:16:28 coyote kernel: protections[]: 0 468 532
Aug 17 00:16:29 coyote kernel: HighMem free:448kB min:128kB low:256kB high:384kB active:126068kB inactive:352kB present:131008kB
Aug 17 00:16:29 coyote kernel: protections[]: 0 0 64
Aug 17 00:16:30 coyote kernel: DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB
Aug 17 00:16:30 coyote kernel: Normal: 0*4kB 4*8kB 3*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1872kB
Aug 17 00:16:30 coyote kernel: HighMem: 40*4kB 6*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 448kB
Aug 17 00:16:30 coyote kernel: Swap cache: add 96127, delete 87885, find 14691/21706, race 0+0
Aug 17 00:16:30 coyote kernel: Out of Memory: Killed process 2241 (httpd).
Aug 17 00:16:30 coyote kernel:  unstable:0 free:8799 slab:223005 mapped:19246 pagetables:850
Aug 17 00:16:31 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
Aug 17 00:16:31 coyote kernel: protections[]: 8 476 540
Aug 17 00:16:31 coyote kernel: Normal free:1848kB min:936kB low:1872kB high:2808kB active:1140kB inactive:1252kB present:901120kB
Aug 17 00:16:31 coyote kernel: protections[]: 0 468 532
Aug 17 00:16:31 coyote kernel: HighMem free:31444kB min:128kB low:256kB high:384kB active:80988kB inactive:14548kB present:131008kB
Aug 17 00:16:31 coyote kernel: protections[]: 0 0 64
Aug 17 00:16:32 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB
Aug 17 00:16:32 coyote kernel: Normal: 0*4kB 1*8kB 3*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1848kB
Aug 17 00:16:32 coyote kernel: HighMem: 2411*4kB 1633*8kB 356*16kB 55*32kB 6*64kB 5*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 31444kB
Aug 17 00:16:32 coyote kernel: Swap cache: add 102685, delete 93636, find 17082/24609, race 0+0
Aug 17 00:16:32 coyote kernel: Out of Memory: Killed process 1803 (httpd).
Aug 17 00:16:32 coyote kernel: oom-killer: gfp_mask=0xd0
Aug 17 00:16:32 coyote kernel: DMA per-cpu:
Aug 17 00:16:32 coyote kernel: cpu 0 hot: low 2, high 6, batch 1
Aug 17 00:16:32 coyote kernel: cpu 0 cold: low 0, high 2, batch 1
Aug 17 00:16:32 coyote kernel: Normal per-cpu:
Aug 17 00:16:32 coyote kernel: cpu 0 hot: low 32, high 96, batch 16
Aug 17 00:16:32 coyote kernel: cpu 0 cold: low 0, high 32, batch 16
Aug 17 00:16:32 coyote kernel: HighMem per-cpu:
Aug 17 00:16:33 coyote kernel: cpu 0 hot: low 14, high 42, batch 7
Aug 17 00:16:33 coyote kernel: cpu 0 cold: low 0, high 14, batch 7
Aug 17 00:16:33 coyote kernel:
Aug 17 00:16:33 coyote kernel: Free pages:       35392kB (31640kB HighMem)
Aug 17 00:16:33 coyote kernel: Active:20556 inactive:3885 dirty:0 writeback:0 unstable:0 free:8848 slab:222999 mapped:19205 pagetables:841
Aug 17 00:16:33 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
Aug 17 00:16:33 coyote kernel: protections[]: 8 476 540
Aug 17 00:16:33 coyote kernel: Normal free:1848kB min:936kB low:1872kB high:2808kB active:1400kB inactive:992kB present:901120kB
Aug 17 00:16:33 coyote kernel: protections[]: 0 468 532
Aug 17 00:16:33 coyote kernel: HighMem free:31640kB min:128kB low:256kB high:384kB active:80824kB inactive:14548kB present:131008kB
Aug 17 00:16:34 coyote kernel: protections[]: 0 0 64
Aug 17 00:16:34 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB
Aug 17 00:16:34 coyote kernel: Normal: 0*4kB 1*8kB 3*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1848kB
Aug 17 00:16:34 coyote kernel: HighMem: 2460*4kB 1633*8kB 356*16kB 55*32kB 6*64kB 5*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 31640kB
Aug 17 00:16:34 coyote kernel: Swap cache: add 102685, delete 93636, find 17082/24609, race 0+0
Aug 17 00:16:34 coyote kernel: Out of Memory: Killed process 1804 (httpd).
Aug 17 00:16:34 coyote kernel: oom-killer: gfp_mask=0xd0
Aug 17 00:16:34 coyote kernel: DMA per-cpu:
Aug 17 00:16:34 coyote kernel: cpu 0 hot: low 2, high 6, batch 1
Aug 17 00:16:34 coyote kernel: cpu 0 cold: low 0, high 2, batch 1
Aug 17 00:16:34 coyote kernel: Normal per-cpu:
Aug 17 00:16:34 coyote kernel: cpu 0 hot: low 32, high 96, batch 16
Aug 17 00:16:34 coyote kernel: cpu 0 cold: low 0, high 32, batch 16
Aug 17 00:16:34 coyote kernel: HighMem per-cpu:
Aug 17 00:16:34 coyote kernel: cpu 0 hot: low 14, high 42, batch 7
Aug 17 00:16:34 coyote kernel: cpu 0 cold: low 0, high 14, batch 7
Aug 17 00:16:34 coyote kernel:
Aug 17 00:16:34 coyote kernel: Free pages:       35588kB (31836kB HighMem)
Aug 17 00:16:34 coyote kernel: Active:20469 inactive:3931 dirty:0 writeback:0 unstable:0 free:8897 slab:222993 mapped:19164 pagetables:832
Aug 17 00:16:35 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
Aug 17 00:16:35 coyote kernel: protections[]: 8 476 540
Aug 17 00:16:35 coyote kernel: Normal free:1848kB min:936kB low:1872kB high:2808kB active:1216kB inactive:1176kB present:901120kB
Aug 17 00:16:35 coyote kernel: protections[]: 0 468 532
Aug 17 00:16:35 coyote kernel: HighMem free:31836kB min:128kB low:256kB high:384kB active:80660kB inactive:14548kB present:131008kB
Aug 17 00:16:35 coyote kernel: protections[]: 0 0 64
Aug 17 00:16:35 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB
Aug 17 00:16:35 coyote kernel: Normal: 0*4kB 1*8kB 3*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1848kB
Aug 17 00:16:35 coyote kernel: HighMem: 2509*4kB 1633*8kB 356*16kB 55*32kB 6*64kB 5*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 31836kB
Aug 17 00:16:35 coyote kernel: Swap cache: add 102685, delete 93636, find 17082/24609, race 0+0
Aug 17 00:16:35 coyote kernel: Out of Memory: Killed process 1805 (httpd).
Aug 17 00:16:35 coyote kernel: oom-killer: gfp_mask=0xd0
Aug 17 00:16:35 coyote kernel: DMA per-cpu:
Aug 17 00:16:35 coyote kernel: cpu 0 hot: low 2, high 6, batch 1
Aug 17 00:16:35 coyote kernel: cpu 0 cold: low 0, high 2, batch 1
Aug 17 00:16:35 coyote kernel: Normal per-cpu:
Aug 17 00:16:35 coyote kernel: cpu 0 hot: low 32, high 96, batch 16
Aug 17 00:16:35 coyote kernel: cpu 0 cold: low 0, high 32, batch 16
Aug 17 00:16:36 coyote kernel: HighMem per-cpu:
Aug 17 00:16:36 coyote kernel: cpu 0 hot: low 14, high 42, batch 7
Aug 17 00:16:36 coyote kernel: cpu 0 cold: low 0, high 14, batch 7
Aug 17 00:16:36 coyote kernel:
Aug 17 00:16:36 coyote kernel: Free pages:       35784kB (32032kB HighMem)
Aug 17 00:16:36 coyote kernel: Active:20404 inactive:3954 dirty:0 writeback:0 unstable:0 free:8946 slab:222987 mapped:19038 pagetables:823
Aug 17 00:16:36 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
Aug 17 00:16:36 coyote kernel: protections[]: 8 476 540
Aug 17 00:16:36 coyote kernel: Normal free:1848kB min:936kB low:1872kB high:2808kB active:1124kB inactive:1268kB present:901120kB
Aug 17 00:16:36 coyote kernel: protections[]: 0 468 532
Aug 17 00:16:36 coyote kernel: HighMem free:32032kB min:128kB low:256kB high:384kB active:80492kB inactive:14548kB present:131008kB
Aug 17 00:16:36 coyote kernel: protections[]: 0 0 64
Aug 17 00:16:36 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB
Aug 17 00:16:36 coyote kernel: Normal: 0*4kB 1*8kB 3*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1848kB
Aug 17 00:16:36 coyote kernel: HighMem: 2558*4kB 1633*8kB 356*16kB 55*32kB 6*64kB 5*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 32032kB
Aug 17 00:16:36 coyote kernel: Swap cache: add 102685, delete 93636, find 17082/24609, race 0+0
Aug 17 00:16:36 coyote kernel: Out of Memory: Killed process 2153 (sendmail).
Aug 17 00:16:36 coyote kernel: oom-killer: gfp_mask=0xd0
Aug 17 00:16:36 coyote kernel: DMA per-cpu:
Aug 17 00:16:36 coyote kernel: cpu 0 hot: low 2, high 6, batch 1
Aug 17 00:16:36 coyote kernel: cpu 0 cold: low 0, high 2, batch 1
Aug 17 00:16:36 coyote kernel: Normal per-cpu:
Aug 17 00:16:36 coyote kernel: cpu 0 hot: low 32, high 96, batch 16
Aug 17 00:16:36 coyote kernel: cpu 0 cold: low 0, high 32, batch 16
Aug 17 00:16:36 coyote kernel: HighMem per-cpu:
Aug 17 00:16:36 coyote kernel: cpu 0 hot: low 14, high 42, batch 7
Aug 17 00:16:36 coyote kernel: cpu 0 cold: low 0, high 14, batch 7
Aug 17 00:16:36 coyote kernel:
Aug 17 00:16:36 coyote kernel: Free pages:       35812kB (32060kB HighMem)
Aug 17 00:16:36 coyote kernel: Active:20381 inactive:3976 dirty:0 writeback:0 unstable:0 free:8953 slab:222986 mapped:19037 pagetables:818
Aug 17 00:16:36 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
Aug 17 00:16:36 coyote kernel: protections[]: 8 476 540
Aug 17 00:16:36 coyote kernel: Normal free:1848kB min:936kB low:1872kB high:2808kB active:1036kB inactive:1356kB present:901120kB
Aug 17 00:16:36 coyote kernel: protections[]: 0 468 532
Aug 17 00:16:36 coyote kernel: HighMem free:32060kB min:128kB low:256kB high:384kB active:80488kB inactive:14548kB present:131008kB
Aug 17 00:16:36 coyote kernel: protections[]: 0 0 64
Aug 17 00:16:36 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB
Aug 17 00:16:36 coyote kernel: Normal: 0*4kB 1*8kB 3*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1848kB
Aug 17 00:16:36 coyote kernel: HighMem: 2565*4kB 1633*8kB 356*16kB 55*32kB 6*64kB 5*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 32060kB
Aug 17 00:16:36 coyote kernel: Swap cache: add 102685, delete 93636, find 17082/24609, race 0+0
Aug 17 00:16:36 coyote kernel: Out of Memory: Killed process 21567 (kdeinit).
Aug 17 00:16:36 coyote kernel: oom-killer: gfp_mask=0xd0
Aug 17 00:16:36 coyote kernel: DMA per-cpu:
Aug 17 00:16:36 coyote kernel: cpu 0 hot: low 2, high 6, batch 1
Aug 17 00:16:36 coyote kernel: cpu 0 cold: low 0, high 2, batch 1
Aug 17 00:16:36 coyote kernel: Normal per-cpu:
Aug 17 00:16:36 coyote kernel: cpu 0 hot: low 32, high 96, batch 16
Aug 17 00:16:36 coyote kernel: cpu 0 cold: low 0, high 32, batch 16
Aug 17 00:16:36 coyote kernel: HighMem per-cpu:
Aug 17 00:16:36 coyote kernel: cpu 0 hot: low 14, high 42, batch 7
Aug 17 00:16:37 coyote kernel: cpu 0 cold: low 0, high 14, batch 7
Aug 17 00:16:37 coyote kernel:
Aug 17 00:16:37 coyote kernel: Free pages:       36120kB (32368kB HighMem)
Aug 17 00:16:37 coyote kernel: Active:20284 inactive:4018 dirty:0 writeback:0 unstable:0 free:9030 slab:222984 mapped:18935 pagetables:800
Aug 17 00:16:37 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
Aug 17 00:16:37 coyote kernel: protections[]: 8 476 540
Aug 17 00:16:37 coyote kernel: Normal free:1848kB min:936kB low:1872kB high:2808kB active:952kB inactive:1444kB present:901120kB
Aug 17 00:16:37 coyote kernel: protections[]: 0 468 532
Aug 17 00:16:37 coyote kernel: HighMem free:32368kB min:128kB low:256kB high:384kB active:80184kB inactive:14628kB present:131008kB
Aug 17 00:16:37 coyote kernel: protections[]: 0 0 64
Aug 17 00:16:37 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB
Aug 17 00:16:37 coyote kernel: Normal: 0*4kB 1*8kB 3*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1848kB
Aug 17 00:16:37 coyote kernel: HighMem: 2580*4kB 1642*8kB 365*16kB 56*32kB 6*64kB 5*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 32368kB
Aug 17 00:16:37 coyote kernel: Swap cache: add 102702, delete 93647, find 17218/24748, race 0+0
Aug 17 00:16:37 coyote kernel: Out of Memory: Killed process 1809 (httpd).
Aug 17 00:16:37 coyote kernel: oom-killer: gfp_mask=0xd0
Aug 17 00:16:37 coyote kernel: DMA per-cpu:
Aug 17 00:16:37 coyote kernel: cpu 0 hot: low 2, high 6, batch 1
Aug 17 00:16:37 coyote kernel: cpu 0 cold: low 0, high 2, batch 1
Aug 17 00:16:37 coyote kernel: Normal per-cpu:
Aug 17 00:16:37 coyote kernel: cpu 0 hot: low 32, high 96, batch 16
Aug 17 00:16:37 coyote kernel: cpu 0 cold: low 0, high 32, batch 16
Aug 17 00:16:37 coyote kernel: HighMem per-cpu:
Aug 17 00:16:37 coyote kernel: cpu 0 hot: low 14, high 42, batch 7
Aug 17 00:16:37 coyote kernel: cpu 0 cold: low 0, high 14, batch 7
Aug 17 00:16:37 coyote kernel:
Aug 17 00:16:37 coyote kernel: Free pages:       36120kB (32368kB HighMem)
Aug 17 00:16:37 coyote kernel: Active:20263 inactive:4039 dirty:0 writeback:0 unstable:0 free:9030 slab:222984 mapped:18935 pagetables:800
Aug 17 00:16:37 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
Aug 17 00:16:37 coyote kernel: protections[]: 8 476 540
Aug 17 00:16:37 coyote kernel: Normal free:1848kB min:936kB low:1872kB high:2808kB active:868kB inactive:1528kB present:901120kB
Aug 17 00:16:37 coyote kernel: protections[]: 0 468 532
Aug 17 00:16:37 coyote kernel: HighMem free:32368kB min:128kB low:256kB high:384kB active:80184kB inactive:14628kB present:131008kB
Aug 17 00:16:37 coyote kernel: protections[]: 0 0 64
Aug 17 00:16:37 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB
Aug 17 00:16:37 coyote kernel: Normal: 0*4kB 1*8kB 3*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1848kB
Aug 17 00:16:37 coyote kernel: HighMem: 2580*4kB 1642*8kB 365*16kB 56*32kB 6*64kB 5*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 32368kB
Aug 17 00:16:37 coyote kernel: Swap cache: add 102702, delete 93647, find 17218/24748, race 0+0
Aug 17 00:16:37 coyote kernel: Out of Memory: Killed process 1968 (arpwatch).
Aug 17 00:16:37 coyote kernel: device eth0 left promiscuous mode
Aug 17 00:16:37 coyote kernel: oom-killer: gfp_mask=0xd0
Aug 17 00:16:37 coyote kernel: DMA per-cpu:
Aug 17 00:16:37 coyote kernel: cpu 0 hot: low 2, high 6, batch 1
Aug 17 00:16:37 coyote kernel: cpu 0 cold: low 0, high 2, batch 1
Aug 17 00:16:37 coyote kernel: Normal per-cpu:
Aug 17 00:16:37 coyote kernel: cpu 0 hot: low 32, high 96, batch 16
Aug 17 00:16:37 coyote kernel: cpu 0 cold: low 0, high 32, batch 16
Aug 17 00:16:37 coyote kernel: HighMem per-cpu:
Aug 17 00:16:37 coyote kernel: cpu 0 hot: low 14, high 42, batch 7
Aug 17 00:16:37 coyote kernel: cpu 0 cold: low 0, high 14, batch 7
Aug 17 00:16:37 coyote kernel:
Aug 17 00:16:37 coyote kernel: Free pages:       36232kB (32480kB HighMem)
Aug 17 00:16:37 coyote kernel: Active:20222 inactive:4053 dirty:0 writeback:0 unstable:0 free:9058 slab:222983 mapped:18921 pagetables:795
Aug 17 00:16:37 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
Aug 17 00:16:37 coyote kernel: protections[]: 8 476 540
Aug 17 00:16:37 coyote kernel: Normal free:1848kB min:936kB low:1872kB high:2808kB active:792kB inactive:1604kB present:901120kB
Aug 17 00:16:37 coyote kernel: protections[]: 0 468 532
Aug 17 00:16:37 coyote kernel: HighMem free:32480kB min:128kB low:256kB high:384kB active:80096kB inactive:14608kB present:131008kB
Aug 17 00:16:37 coyote kernel: protections[]: 0 0 64
Aug 17 00:16:37 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB
Aug 17 00:16:37 coyote kernel: Normal: 0*4kB 1*8kB 3*16kB 0*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1848kB
Aug 17 00:16:37 coyote kernel: HighMem: 2598*4kB 1645*8kB 366*16kB 56*32kB 6*64kB 5*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 32480kB
Aug 17 00:16:37 coyote kernel: Swap cache: add 102702, delete 93673, find 17218/24748, race 0+0
Aug 17 00:16:37 coyote kernel: Out of Memory: Killed process 10755 (kdeinit).
Aug 17 00:16:37 coyote kernel: oom-killer: gfp_mask=0xd0
Aug 17 00:16:37 coyote kernel: DMA per-cpu:
Aug 17 00:16:37 coyote kernel: cpu 0 hot: low 2, high 6, batch 1
Aug 17 00:16:37 coyote kernel: cpu 0 cold: low 0, high 2, batch 1
Aug 17 00:16:37 coyote kernel: Normal per-cpu:
Aug 17 00:16:37 coyote kernel: cpu 0 hot: low 32, high 96, batch 16
Aug 17 00:16:37 coyote kernel: cpu 0 cold: low 0, high 32, batch 16
Aug 17 00:16:37 coyote kernel: HighMem per-cpu:
Aug 17 00:16:37 coyote kernel: cpu 0 hot: low 14, high 42, batch 7
Aug 17 00:16:37 coyote kernel: cpu 0 cold: low 0, high 14, batch 7
Aug 17 00:16:37 coyote kernel:
Aug 17 00:16:37 coyote kernel: Free pages:       25392kB (21616kB HighMem)
Aug 17 00:16:37 coyote kernel: Active:21664 inactive:5363 dirty:0 writeback:0 unstable:0 free:6348 slab:223017 mapped:19400 pagetables:798
Aug 17 00:16:37 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
Aug 17 00:16:37 coyote kernel: protections[]: 8 476 540
Aug 17 00:16:37 coyote kernel: Normal free:1872kB min:936kB low:1872kB high:2808kB active:1132kB inactive:1268kB present:901120kB
Aug 17 00:16:37 coyote kernel: protections[]: 0 468 532
Aug 17 00:16:37 coyote kernel: HighMem free:21616kB min:128kB low:256kB high:384kB active:85524kB inactive:20184kB present:131008kB
Aug 17 00:16:37 coyote kernel: protections[]: 0 0 64
Aug 17 00:16:37 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB
Aug 17 00:16:37 coyote kernel: Normal: 16*4kB 6*8kB 0*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1872kB
Aug 17 00:16:38 coyote kernel: HighMem: 0*4kB 1536*8kB 373*16kB 57*32kB 8*64kB 6*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 21616kB
Aug 17 00:16:39 coyote kernel: Swap cache: add 103622, delete 93673, find 17556/25253, race 0+0
Aug 17 00:16:39 coyote kernel: Out of Memory: Killed process 1812 (httpd).
Aug 17 00:16:40 coyote kernel: oom-killer: gfp_mask=0xd0
Aug 17 00:16:40 coyote kernel: DMA per-cpu:
Aug 17 00:16:40 coyote kernel: cpu 0 hot: low 2, high 6, batch 1
Aug 17 00:16:40 coyote kernel: cpu 0 cold: low 0, high 2, batch 1
Aug 17 00:16:40 coyote kernel: Normal per-cpu:
Aug 17 00:16:40 coyote kernel: cpu 0 hot: low 32, high 96, batch 16
Aug 17 00:16:40 coyote kernel: cpu 0 cold: low 0, high 32, batch 16
Aug 17 00:16:40 coyote kernel: HighMem per-cpu:
Aug 17 00:16:40 coyote kernel: cpu 0 hot: low 14, high 42, batch 7
Aug 17 00:16:40 coyote kernel: cpu 0 cold: low 0, high 14, batch 7
Aug 17 00:16:40 coyote kernel:
Aug 17 00:16:40 coyote kernel: Free pages:       26540kB (22764kB HighMem)
Aug 17 00:16:40 coyote kernel: Active:21447 inactive:5282 dirty:0 writeback:0 unstable:0 free:6635 slab:223012 mapped:19102 pagetables:789
Aug 17 00:16:40 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
Aug 17 00:16:40 coyote kernel: protections[]: 8 476 540
Aug 17 00:16:40 coyote kernel: Normal free:1872kB min:936kB low:1872kB high:2808kB active:1456kB inactive:944kB present:901120kB
Aug 17 00:16:40 coyote kernel: protections[]: 0 468 532
Aug 17 00:16:40 coyote kernel: HighMem free:22764kB min:128kB low:256kB high:384kB active:84332kB inactive:20184kB present:131008kB
Aug 17 00:16:40 coyote kernel: protections[]: 0 0 64
Aug 17 00:16:40 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB
Aug 17 00:16:40 coyote kernel: Normal: 16*4kB 6*8kB 0*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1872kB
Aug 17 00:16:40 coyote kernel: HighMem: 221*4kB 1569*8kB 373*16kB 57*32kB 8*64kB 6*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 22764kB
Aug 17 00:16:40 coyote kernel: Swap cache: add 103622, delete 93673, find 17556/25253, race 0+0
Aug 17 00:16:40 coyote kernel: Out of Memory: Killed process 1813 (httpd).
Aug 17 00:16:40 coyote kernel: oom-killer: gfp_mask=0xd0
Aug 17 00:16:40 coyote kernel: DMA per-cpu:
Aug 17 00:16:40 coyote kernel: cpu 0 hot: low 2, high 6, batch 1
Aug 17 00:16:40 coyote kernel: cpu 0 cold: low 0, high 2, batch 1
Aug 17 00:16:40 coyote kernel: Normal per-cpu:
Aug 17 00:16:40 coyote kernel: cpu 0 hot: low 32, high 96, batch 16
Aug 17 00:16:40 coyote kernel: cpu 0 cold: low 0, high 32, batch 16
Aug 17 00:16:40 coyote kernel: HighMem per-cpu:
Aug 17 00:16:40 coyote kernel: cpu 0 hot: low 14, high 42, batch 7
Aug 17 00:16:40 coyote kernel: cpu 0 cold: low 0, high 14, batch 7
Aug 17 00:16:40 coyote kernel:
Aug 17 00:16:40 coyote kernel: Free pages:       25784kB (22008kB HighMem)
Aug 17 00:16:40 coyote kernel: Active:21640 inactive:5304 dirty:0 writeback:0 unstable:0 free:6446 slab:223008 mapped:19233 pagetables:780
Aug 17 00:16:40 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
Aug 17 00:16:40 coyote kernel: protections[]: 8 476 540
Aug 17 00:16:40 coyote kernel: Normal free:1872kB min:936kB low:1872kB high:2808kB active:1368kB inactive:1032kB present:901120kB
Aug 17 00:16:40 coyote kernel: protections[]: 0 468 532
Aug 17 00:16:40 coyote kernel: HighMem free:22008kB min:128kB low:256kB high:384kB active:85192kB inactive:20184kB present:131008kB
Aug 17 00:16:40 coyote kernel: protections[]: 0 0 64
Aug 17 00:16:40 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB
Aug 17 00:16:40 coyote kernel: Normal: 16*4kB 6*8kB 0*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1872kB
Aug 17 00:16:40 coyote kernel: HighMem: 28*4kB 1571*8kB 373*16kB 57*32kB 8*64kB 6*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 22008kB
Aug 17 00:16:40 coyote kernel: Swap cache: add 103622, delete 93673, find 17556/25253, race 0+0
Aug 17 00:16:40 coyote kernel: Out of Memory: Killed process 1810 (httpd).
Aug 17 00:16:40 coyote kernel: oom-killer: gfp_mask=0xd0
Aug 17 00:16:40 coyote kernel: DMA per-cpu:
Aug 17 00:16:40 coyote kernel: cpu 0 hot: low 2, high 6, batch 1
Aug 17 00:16:40 coyote kernel: cpu 0 cold: low 0, high 2, batch 1
Aug 17 00:16:40 coyote kernel: Normal per-cpu:
Aug 17 00:16:40 coyote kernel: cpu 0 hot: low 32, high 96, batch 16
Aug 17 00:16:40 coyote kernel: cpu 0 cold: low 0, high 32, batch 16
Aug 17 00:16:40 coyote kernel: HighMem per-cpu:
Aug 17 00:16:40 coyote kernel: cpu 0 hot: low 14, high 42, batch 7
Aug 17 00:16:40 coyote kernel: cpu 0 cold: low 0, high 14, batch 7
Aug 17 00:16:40 coyote kernel:
Aug 17 00:16:40 coyote kernel: Free pages:       25952kB (22176kB HighMem)
Aug 17 00:16:40 coyote kernel: Active:21621 inactive:5296 dirty:0 writeback:0 unstable:0 free:6488 slab:223006 mapped:19162 pagetables:771
Aug 17 00:16:40 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
Aug 17 00:16:40 coyote kernel: protections[]: 8 476 540
Aug 17 00:16:40 coyote kernel: Normal free:1872kB min:936kB low:1872kB high:2808kB active:1400kB inactive:1000kB present:901120kB
Aug 17 00:16:41 coyote kernel: protections[]: 0 468 532
Aug 17 00:16:41 coyote kernel: HighMem free:22176kB min:128kB low:256kB high:384kB active:85084kB inactive:20184kB present:131008kB
Aug 17 00:16:41 coyote kernel: protections[]: 0 0 64
Aug 17 00:16:41 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB
Aug 17 00:16:41 coyote kernel: Normal: 16*4kB 6*8kB 0*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1872kB
Aug 17 00:16:41 coyote kernel: HighMem: 70*4kB 1571*8kB 373*16kB 57*32kB 8*64kB 6*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 22176kB
Aug 17 00:16:41 coyote kernel: Swap cache: add 103622, delete 93673, find 17740/25437, race 0+0
Aug 17 00:16:41 coyote kernel: Out of Memory: Killed process 3119 (kdeinit).
Aug 17 00:16:41 coyote kernel: oom-killer: gfp_mask=0xd0
Aug 17 00:16:41 coyote kernel: DMA per-cpu:
Aug 17 00:16:41 coyote kernel: cpu 0 hot: low 2, high 6, batch 1
Aug 17 00:16:41 coyote kernel: cpu 0 cold: low 0, high 2, batch 1
Aug 17 00:16:41 coyote kernel: Normal per-cpu:
Aug 17 00:16:41 coyote kernel: cpu 0 hot: low 32, high 96, batch 16
Aug 17 00:16:41 coyote kernel: cpu 0 cold: low 0, high 32, batch 16
Aug 17 00:16:41 coyote kernel: HighMem per-cpu:
Aug 17 00:16:41 coyote kernel: cpu 0 hot: low 14, high 42, batch 7
Aug 17 00:16:41 coyote kernel: cpu 0 cold: low 0, high 14, batch 7
Aug 17 00:16:41 coyote kernel:
Aug 17 00:16:41 coyote kernel: Free pages:       26820kB (23044kB HighMem)
Aug 17 00:16:41 coyote kernel: Active:21387 inactive:5311 dirty:0 writeback:0 unstable:0 free:6705 slab:223005 mapped:18948 pagetables:754
Aug 17 00:16:41 coyote kernel: DMA free:1904kB min:16kB low:32kB high:48kB active:0kB inactive:0kB present:16384kB
Aug 17 00:16:41 coyote kernel: protections[]: 8 476 540
Aug 17 00:16:41 coyote kernel: Normal free:1872kB min:936kB low:1872kB high:2808kB active:1340kB inactive:1060kB present:901120kB
Aug 17 00:16:41 coyote kernel: protections[]: 0 468 532
Aug 17 00:16:41 coyote kernel: HighMem free:23044kB min:128kB low:256kB high:384kB active:84208kB inactive:20184kB present:131008kB
Aug 17 00:16:41 coyote kernel: protections[]: 0 0 64
Aug 17 00:16:41 coyote kernel: DMA: 10*4kB 5*8kB 4*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1904kB
Aug 17 00:16:41 coyote kernel: Normal: 16*4kB 6*8kB 0*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1872kB
Aug 17 00:16:41 coyote kernel: HighMem: 273*4kB 1576*8kB 374*16kB 57*32kB 8*64kB 6*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 23044kB
Aug 17 00:16:41 coyote kernel: Swap cache: add 103622, delete 93761, find 17740/25437, race 0+0
Aug 17 00:16:41 coyote kernel: Out of Memory: Killed process 3133 (kdeinit).

[root@coyote xsane-0.90]# cat /proc/meminfo
MemTotal:      1035956 kB
MemFree:          5524 kB
Buffers:         15816 kB
Cached:          80116 kB
SwapCached:      57788 kB
Active:         134848 kB
Inactive:        51592 kB
HighTotal:      131008 kB
HighFree:          532 kB
LowTotal:       904948 kB
LowFree:          4992 kB
SwapTotal:     3857104 kB
SwapFree:      3752500 kB
Dirty:             164 kB
Writeback:           0 kB
Mapped:         115268 kB
Slab:           833184 kB
Committed_AS:   295784 kB
PageTables:       3424 kB
VmallocTotal:   114680 kB
VmallocUsed:     19876 kB
VmallocChunk:    94640 kB

[root@coyote xsane-0.90]# cat /proc/slabinfo
slabinfo - version: 2.0
# name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <batchcount> <limit> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
unix_sock            200    200    384   10    1 : tunables   54   27    0 : slabdata     20     20      0
tcp_tw_bucket          4     41     96   41    1 : tunables  120   60    0 : slabdata      1      1      0
tcp_bind_bucket       27    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
tcp_open_request       0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
inet_peer_cache        0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
ip_fib_hash           10    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
ip_dst_cache          16     30    256   15    1 : tunables  120   60    0 : slabdata      2      2      0
arp_cache              4     31    128   31    1 : tunables  120   60    0 : slabdata      1      1      0
raw4_sock              0      0    480    8    1 : tunables   54   27    0 : slabdata      0      0      0
udp_sock               2      8    480    8    1 : tunables   54   27    0 : slabdata      1      1      0
tcp_sock              32     32   1024    4    1 : tunables   54   27    0 : slabdata      8      8      0
flow_cache             0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
mqueue_inode_cache      1      8    480    8    1 : tunables   54   27    0 : slabdata      1      1      0
udf_inode_cache        0      0    352   11    1 : tunables   54   27    0 : slabdata      0      0      0
smb_request            0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
smb_inode_cache        1     12    320   12    1 : tunables   54   27    0 : slabdata      1      1      0
isofs_inode_cache      0      0    320   12    1 : tunables   54   27    0 : slabdata      0      0      0
fat_inode_cache        4     22    352   11    1 : tunables   54   27    0 : slabdata      2      2      0
ext2_inode_cache       0      0    416    9    1 : tunables   54   27    0 : slabdata      0      0      0
journal_handle        25    135     28  135    1 : tunables  120   60    0 : slabdata      1      1      0
journal_head         607   2835     48   81    1 : tunables  120   60    0 : slabdata     35     35      0
revoke_table          14    290     12  290    1 : tunables  120   60    0 : slabdata      1      1      0
revoke_record          0      0     16  226    1 : tunables  120   60    0 : slabdata      0      0      0
ext3_inode_cache  1488612 1488618    448    9    1 : tunables   54   27    0 : slabdata 165402 165402      0
eventpoll_pwq          0      0     36  107    1 : tunables  120   60    0 : slabdata      0      0      0
eventpoll_epi          0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
kioctx                 0      0    160   25    1 : tunables  120   60    0 : slabdata      0      0      0
kiocb                  0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
dnotify_cache        222    370     20  185    1 : tunables  120   60    0 : slabdata      2      2      0
file_lock_cache       19     43     92   43    1 : tunables  120   60    0 : slabdata      1      1      0
fasync_cache           2    226     16  226    1 : tunables  120   60    0 : slabdata      1      1      0
shmem_inode_cache      5     10    384   10    1 : tunables   54   27    0 : slabdata      1      1      0
posix_timers_cache      0      0     96   41    1 : tunables  120   60    0 : slabdata      0      0      0
uid_cache              5    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
sgpool-128            32     32   2048    2    1 : tunables   24   12    0 : slabdata     16     16      0
sgpool-64             32     32   1024    4    1 : tunables   54   27    0 : slabdata      8      8      0
sgpool-32             32     32    512    8    1 : tunables   54   27    0 : slabdata      4      4      0
sgpool-16             32     45    256   15    1 : tunables  120   60    0 : slabdata      3      3      0
sgpool-8              32     62    128   31    1 : tunables  120   60    0 : slabdata      2      2      0
cfq_pool              64    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
crq_pool               0      0     36  107    1 : tunables  120   60    0 : slabdata      0      0      0
deadline_drq           0      0     48   81    1 : tunables  120   60    0 : slabdata      0      0      0
as_arq               101    130     60   65    1 : tunables  120   60    0 : slabdata      2      2      0
blkdev_ioc            80    185     20  185    1 : tunables  120   60    0 : slabdata      1      1      0
blkdev_queue          12     18    448    9    1 : tunables   54   27    0 : slabdata      2      2      0
blkdev_requests       80    104    152   26    1 : tunables  120   60    0 : slabdata      4      4      0
biovec-(256)         256    256   3072    2    2 : tunables   24   12    0 : slabdata    128    128      0
biovec-128           256    260   1536    5    2 : tunables   24   12    0 : slabdata     52     52      0
biovec-64            256    260    768    5    1 : tunables   54   27    0 : slabdata     52     52      0
biovec-16            256    260    192   20    1 : tunables  120   60    0 : slabdata     13     13      0
biovec-4             256    305     64   61    1 : tunables  120   60    0 : slabdata      5      5      0
biovec-1             368    452     16  226    1 : tunables  120   60    0 : slabdata      2      2      0
bio                  366    366     64   61    1 : tunables  120   60    0 : slabdata      6      6      0
sock_inode_cache     234    242    352   11    1 : tunables   54   27    0 : slabdata     22     22      0
skbuff_head_cache    251    475    160   25    1 : tunables  120   60    0 : slabdata     19     19      0
sock                   2     12    320   12    1 : tunables   54   27    0 : slabdata      1      1      0
proc_inode_cache     610    612    320   12    1 : tunables   54   27    0 : slabdata     51     51      0
sigqueue              66     81    148   27    1 : tunables  120   60    0 : slabdata      3      3      0
radix_tree_node     2565   3276    276   14    1 : tunables   54   27    0 : slabdata    234    234      0
bdev_cache            12     18    416    9    1 : tunables   54   27    0 : slabdata      2      2      0
mnt_cache             25     41     96   41    1 : tunables  120   60    0 : slabdata      1      1      0
inode_cache         2354   2380    288   14    1 : tunables   54   27    0 : slabdata    170    170      0
dentry_cache      1115280 1116752    140   28    1 : tunables  120   60    0 : slabdata  39884  39884      0
filp                2060   2300    160   25    1 : tunables  120   60    0 : slabdata     92     92      0
names_cache           17     17   4096    1    1 : tunables   24   12    0 : slabdata     17     17      0
idr_layer_cache       81     87    136   29    1 : tunables  120   60    0 : slabdata      3      3      0
buffer_head         4151   8424     48   81    1 : tunables  120   60    0 : slabdata    104    104      0
mm_struct             98     98    512    7    1 : tunables   54   27    0 : slabdata     14     14      0
vm_area_struct      8554   8554     84   47    1 : tunables  120   60    0 : slabdata    182    182      0
fs_cache              94    119     32  119    1 : tunables  120   60    0 : slabdata      1      1      0
files_cache           93     99    416    9    1 : tunables   54   27    0 : slabdata     11     11      0
signal_cache         116    123     96   41    1 : tunables  120   60    0 : slabdata      3      3      0
sighand_cache        111    111   1312    3    1 : tunables   24   12    0 : slabdata     37     37      0
task_struct          121    130   1424    5    2 : tunables   24   12    0 : slabdata     26     26      0
anon_vma            1770   2035      8  407    1 : tunables  120   60    0 : slabdata      5      5      0
pgd                   94     94   4096    1    1 : tunables   24   12    0 : slabdata     94     94      0
size-131072(DMA)       0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-131072            0      0 131072    1   32 : tunables    8    4    0 : slabdata      0      0      0
size-65536(DMA)        0      0  65536    1   16 : tunables    8    4    0 : slabdata      0      0      0
size-65536             1      1  65536    1   16 : tunables    8    4    0 : slabdata      1      1      0
size-32768(DMA)        0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
size-32768             0      0  32768    1    8 : tunables    8    4    0 : slabdata      0      0      0
size-16384(DMA)        0      0  16384    1    4 : tunables    8    4    0 : slabdata      0      0      0
size-16384             5      9  16384    1    4 : tunables    8    4    0 : slabdata      5      9      0
size-8192(DMA)         0      0   8192    1    2 : tunables    8    4    0 : slabdata      0      0      0
size-8192             11     11   8192    1    2 : tunables    8    4    0 : slabdata     11     11      0
size-4096(DMA)         0      0   4096    1    1 : tunables   24   12    0 : slabdata      0      0      0
size-4096            184    184   4096    1    1 : tunables   24   12    0 : slabdata    184    184      0
size-2048(DMA)         0      0   2048    2    1 : tunables   24   12    0 : slabdata      0      0      0
size-2048            174    194   2048    2    1 : tunables   24   12    0 : slabdata     97     97      0
size-1024(DMA)         0      0   1024    4    1 : tunables   54   27    0 : slabdata      0      0      0
size-1024            157    180   1024    4    1 : tunables   54   27    0 : slabdata     45     45      0
size-512(DMA)          0      0    512    8    1 : tunables   54   27    0 : slabdata      0      0      0
size-512             197    448    512    8    1 : tunables   54   27    0 : slabdata     56     56      0
size-256(DMA)          0      0    256   15    1 : tunables  120   60    0 : slabdata      0      0      0
size-256             213    420    256   15    1 : tunables  120   60    0 : slabdata     28     28      0
size-192(DMA)          0      0    192   20    1 : tunables  120   60    0 : slabdata      0      0      0
size-192             120    120    192   20    1 : tunables  120   60    0 : slabdata      6      6      0
size-128(DMA)          0      0    128   31    1 : tunables  120   60    0 : slabdata      0      0      0
size-128            1243   1302    128   31    1 : tunables  120   60    0 : slabdata     42     42      0
size-64(DMA)           0      0     64   61    1 : tunables  120   60    0 : slabdata      0      0      0
size-64            47735  48251     64   61    1 : tunables  120   60    0 : slabdata    791    791      0
size-32(DMA)           0      0     32  119    1 : tunables  120   60    0 : slabdata      0      0      0
size-32             1368   1428     32  119    1 : tunables  120   60    0 : slabdata     12     12      0
kmem_cache           124    124    128   31    1 : tunables  120   60    0 : slabdata      4      4      0

I cannot start any new shells, as before.  Is there any usable dna in this sample?                                                                                       

Reboot time I guess :(((

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-17  4:44                                       ` Gene Heskett
@ 2004-08-17  4:58                                         ` Nick Piggin
  2004-08-17  5:26                                           ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: Nick Piggin @ 2004-08-17  4:58 UTC (permalink / raw)
  To: gene.heskett
  Cc: linux-kernel, viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton

Gene Heskett wrote:

>On Monday 16 August 2004 19:01, viro@parcelfarce.linux.theplanet.co.uk wrote:
>
>>On Mon, Aug 16, 2004 at 06:52:50PM -0400, Gene Heskett wrote:
>>
>>>Well, I am seeing some dups, but they are so volatile that no two
>>>runs will report the same allocations as dups, and it's never more
>>>than 2 using /proc/fs/ext3 | sort | uniq -c | sort -nr |grep -v '
>>>1 '
>>>
>>>Consecutive runs will show anywhere from 3 to 10 or 12 dups, but
>>>never is an address repeated between runs.
>>>
>>>How is this to be interpreted?
>>>
>>That's OK.  Keep in mind that you have a *lot* of these guys and
>>your cat(1) makes a lot of read(2) calls.  So what you see is
>>
>><starting to read>
>><see inode #n that is about to be evicted>
>><read some more>
>><inode #n gets evicted, quite possibly - due to memory pressure from
>>cat(1) or sort(1)>
>><read more>
>><somebody wants the same inode again>
>><read more>
>><see the inode #n we'd just had read from disk again>
>>
>>So few duplicates are all right.
>>
>
>I hope so.  I've got a real hoodoozy here, being out of memory (well,
>maybe 30 megs left) when my nightly run of rsync started, everything
>came to a grinding halt.  I couldn't even get to the screen the 
>tail -f on the log was running in, but after walking away for 10 minutes, 
>I can once again.  However, things seem to be partially functional so 
>I'm going to see if I can do some cut-n-paste from the log screen to 
>here, but I probably can't send it as sendmail was one of the items the 
>OOM killer killed.  According to top, I'm about 250 megs into the 
>swap, very suddenly.  No swap was in use at 23:55 local.
>
>
snip

>
>I cannot start any new shells, as before.  Is there any usable dna in this sample?                                                                                       
>
>Reboot time I guess :(((
>
>

All your low memory has been used by dentry and inode caches. This isn't
very interesting because this would be no doubt caused by something oopsing
while holding the shrinker semaphore as Andrew pointed out.
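
A rough cross-check of that reading, as a sketch only: in the slabinfo 2.0
layout quoted above, a cache occupies num_slabs * pagesperslab pages. The
Python below copies just the two large rows from the dump and assumes 4 kB
pages (i386); slab_kb() is a made-up helper name, not a kernel interface.

```python
# Sketch: estimate memory held by a slab cache from /proc/slabinfo (version 2.0) rows.
# Assumption: 4 kB pages, as on i386. slab_kb() is a hypothetical helper.
PAGE_KB = 4

# The two largest rows, copied from the slabinfo dump in this thread.
ROWS = """\
ext3_inode_cache  1488612 1488618    448    9    1 : tunables   54   27    0 : slabdata 165402 165402      0
dentry_cache      1115280 1116752    140   28    1 : tunables  120   60    0 : slabdata  39884  39884      0
"""

def slab_kb(row):
    """Return (cache name, kB occupied) for one slabinfo 2.0 row."""
    fields = row.split()
    name = fields[0]
    pages_per_slab = int(fields[5])                      # <pagesperslab> column
    num_slabs = int(fields[fields.index("slabdata") + 2])  # <num_slabs> column
    return name, num_slabs * pages_per_slab * PAGE_KB

usage = dict(slab_kb(r) for r in ROWS.splitlines())
total_kb = sum(usage.values())
for name, kb in usage.items():
    print(f"{name:18s} {kb:8d} kB")
print(f"{'total':18s} {total_kb:8d} kB")
```

Under those assumptions the two caches come to 661608 kB + 159536 kB =
821144 kB, against Slab: 833184 kB and LowTotal: 904948 kB in the
/proc/meminfo output.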

What is interesting is that first Oops message (I wonder if you don't have
bad hardware though, I don't think anyone else is seeing it).


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-17  4:58                                         ` Nick Piggin
@ 2004-08-17  5:26                                           ` Gene Heskett
  2004-08-17 11:57                                             ` Nick Piggin
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-17  5:26 UTC (permalink / raw)
  To: linux-kernel
  Cc: Nick Piggin, viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Tuesday 17 August 2004 00:58, Nick Piggin wrote:
>Gene Heskett wrote:
>>On Monday 16 August 2004 19:01, viro@parcelfarce.linux.theplanet.co.uk wrote:
>>>On Mon, Aug 16, 2004 at 06:52:50PM -0400, Gene Heskett wrote:
>>>>Well, I am seeing some dups, but they are so volatile that no two
>>>>runs will report the same allocations as dups, and it's never more
>>>>than 2 using /proc/fs/ext3 | sort | uniq -c | sort -nr |grep -v '
>>>>1 '
>>>>
>>>>Consecutive runs will show anywhere from 3 to 10 or 12 dups, but
>>>>never is an address repeated between runs.
>>>>
>>>>How is this to be interpreted?
>>>
>>>That's OK.  Keep in mind that you have a *lot* of these guys and
>>>your cat(1) makes a lot of read(2) calls.  So what you see is
>>>
>>><starting to read>
>>><see inode #n that is about to be evicted>
>>><read some more>
>>><inode #n gets evicted, quite possibly - due to memory pressure
>>> from cat(1) or sort(1)>
>>><read more>
>>><somebody wants the same inode again>
>>><read more>
>>><see the inode #n we'd just had read from disk again>
>>>
>>>So few duplicates are all right.
>>
>>I hope so.  I've got a real hoodoozy here, being out of memory
>> (well, maybe 30 megs left) when my nightly run of rsync started,
>> everything came to a grinding halt.  I couldn't even get to the
>> screen the tail -f on the log was running in, but after walking
>> away for 10 minutes, I can once again.  However, things seem to be
>> partially functional so I'm going to see if I can do some
>> cut-n-paste from the log screen to here, but I probably can't send
>> it as sendmail was one of the items the OOM killer killed. 
>> According to top, I'm about 250 megs into the swap, very suddenly.
>>  No swap was in use at 23:55 local.
>
>snip
>
>>I cannot start any new shells, as before.  Is there any usable dna
>> in this sample?
>>
>>Reboot time I guess :(((
>
>All your low memory has been used by dentry and inode caches. This
>isn't very interesting because this would be no doubt caused by
>something oopsing while holding the shrinker semaphore as Andrew
>pointed out.
>
>What is interesting is that first Oops message (I wonder if you
> don't have bad hardware though, I don't think anyone else is seeing
> it).

What 'first Oops message'?  One I posted before?

That comment caused me to go back in the log to well above where I had
been channel surfing with tvtime, and I did find an Oops:

Aug 16 21:15:46 coyote kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Aug 16 21:15:46 coyote kernel:  printing eip:
Aug 16 21:15:46 coyote kernel: c015c8db
Aug 16 21:15:46 coyote kernel: *pde = 00000000
Aug 16 21:15:46 coyote kernel: Oops: 0002 [#1]
Aug 16 21:15:46 coyote kernel: Modules linked in: tuner tvaudio bttv video_buf btcx_risc eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd forcedeth sg
Aug 16 21:15:46 coyote kernel: CPU:    0
Aug 16 21:15:46 coyote kernel: EIP:    0060:[<c015c8db>]    Not tainted
Aug 16 21:15:46 coyote kernel: EFLAGS: 00210206   (2.6.8-rc4)
Aug 16 21:15:46 coyote kernel: EIP is at prune_icache+0x6b/0x1b0
Aug 16 21:15:46 coyote kernel: eax: 00000000   ebx: dffe0fd0   ecx: d3eb8b80   edx: c0341660
Aug 16 21:15:46 coyote kernel: esi: dffe0fc8   edi: 0000005a   ebp: d3eb8b94   esp: d3eb8b74
Aug 16 21:15:46 coyote kernel: ds: 007b   es: 007b   ss: 0068
Aug 16 21:15:46 coyote kernel: Process yum (pid: 30892, threadinfo=d3eb8000 task=cf6bf7b0)
Aug 16 21:15:46 coyote kernel: Stack: dffe0448 00000000 00000059 dffe0450 df58d0d0 00000080 00000000 d3eb8000
Aug 16 21:15:46 coyote kernel:        d3eb8ba0 c015ca5f 00000080 d3eb8bd4 c0135b14 00000080 000000d2 0108bf00
Aug 16 21:15:46 coyote kernel:        00000000 00021087 00000080 00000000 f7ffea20 0000000a d3eb8c50 00000000
Aug 16 21:15:46 coyote kernel: Call Trace:
Aug 16 21:15:46 coyote kernel:  [<c01044ef>] show_stack+0x7f/0xa0
Aug 16 21:15:46 coyote kernel:  [<c0104688>] show_registers+0x158/0x1b0
Aug 16 21:15:46 coyote kernel:  [<c01047e6>] die+0x66/0xd0
Aug 16 21:15:46 coyote kernel:  [<c01109de>] do_page_fault+0x28e/0x548
Aug 16 21:15:46 coyote kernel:  [<c010415d>] error_code+0x2d/0x38
Aug 16 21:15:46 coyote kernel:  [<c015ca5f>] shrink_icache_memory+0x3f/0x50
Aug 16 21:15:46 coyote kernel:  [<c0135b14>] shrink_slab+0x134/0x170
Aug 16 21:15:46 coyote kernel:  [<c0136954>] try_to_free_pages+0xa4/0x160
Aug 16 21:15:46 coyote kernel:  [<c012fc23>] __alloc_pages+0x1b3/0x320
Aug 16 21:15:46 coyote kernel:  [<c0139a8f>] do_anonymous_page+0x5f/0x180
Aug 16 21:15:46 coyote kernel:  [<c0139c11>] do_no_page+0x61/0x310
Aug 16 21:15:46 coyote kernel:  [<c013a097>] handle_mm_fault+0xd7/0x160
Aug 16 21:15:46 coyote kernel:  [<c01108a0>] do_page_fault+0x150/0x548
Aug 16 21:15:46 coyote kernel:  [<c010415d>] error_code+0x2d/0x38
Aug 16 21:15:46 coyote kernel:  [<c012c279>] do_generic_mapping_read+0x129/0x430
Aug 16 21:15:46 coyote kernel:  [<c012c836>] __generic_file_aio_read+0x1b6/0x1f0
Aug 16 21:15:46 coyote kernel:  [<c012c8c2>] generic_file_aio_read+0x52/0x70
Aug 16 21:15:46 coyote kernel:  [<c0145898>] do_sync_read+0x78/0xa0
Aug 16 21:15:46 coyote kernel:  [<c014598a>] vfs_read+0xca/0x140
Aug 16 21:15:46 coyote kernel:  [<c0145c2b>] sys_read+0x4b/0x80
Aug 16 21:15:46 coyote kernel:  [<c0103f61>] sysenter_past_esp+0x52/0x71
Aug 16 21:15:46 coyote kernel: Code: 89 10 a1 60 16 34 c0 89 58 04 89 03 c7 43 04 60 16 34 c0 89

yum did a segfault about that time. yum is nice code, when
it fscking works, which is maybe half the time on 2 different
FC2 machines here now.

So we're back to the dentry_cache thing...  Duh, NO!, this is in
prune_icache, not prune_dcache, presumably slightly different.  

As far as bad hardware is concerned, warranty time is running out.
I need something  plausible to take back to tcwo as a good reason
for requesting a 'blanket rma' on the whole thing, would they 
please send me another.

Preferably an AMD Athlon 2800XP that wasn't stepping 00.

Or are the bug lists constant across these processors?

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-17  5:26                                           ` Gene Heskett
@ 2004-08-17 11:57                                             ` Nick Piggin
  2004-08-19  9:41                                               ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: Nick Piggin @ 2004-08-17 11:57 UTC (permalink / raw)
  To: gene.heskett
  Cc: linux-kernel, viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton

Gene Heskett wrote:
> On Tuesday 17 August 2004 00:58, Nick Piggin wrote:
> 
>>Gene Heskett wrote:

>>>Reboot time I guess :(((
>>
>>All your low memory has been used by dentry and inode caches. This
>>isn't very interesting because this would be no doubt caused by
>>something oopsing while holding the shrinker semaphore as Andrew
>>pointed out.
>>
>>What is interesting is that first Oops message (I wonder if you
>>don't have bad hardware though, I don't think anyone else is seeing
>>it).
> 
> 
> What 'first Oops message'?  One I posted before?
> 

Well, the first Oops that your running kernel raises. Usually you
don't bother about subsequent oopses and misbehaviour because the
first one can cause the system to go into a funny state - this is
a prime example.

> That comment caused me to go back in the log to well above where I had
> been channel surfing with tvtime, and I did find an Oops:
> 
> Aug 16 21:15:46 coyote kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
> Aug 16 21:15:46 coyote kernel:  printing eip:
> Aug 16 21:15:46 coyote kernel: c015c8db
> Aug 16 21:15:46 coyote kernel: *pde = 00000000
> Aug 16 21:15:46 coyote kernel: Oops: 0002 [#1]
> Aug 16 21:15:46 coyote kernel: Modules linked in: tuner tvaudio bttv video_buf btcx_risc eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd forcedeth sg
> Aug 16 21:15:46 coyote kernel: CPU:    0
> Aug 16 21:15:46 coyote kernel: EIP:    0060:[<c015c8db>]    Not tainted
> Aug 16 21:15:46 coyote kernel: EFLAGS: 00210206   (2.6.8-rc4)
> Aug 16 21:15:46 coyote kernel: EIP is at prune_icache+0x6b/0x1b0
> Aug 16 21:15:46 coyote kernel: eax: 00000000   ebx: dffe0fd0   ecx: d3eb8b80   edx: c0341660
> Aug 16 21:15:46 coyote kernel: esi: dffe0fc8   edi: 0000005a   ebp: d3eb8b94   esp: d3eb8b74
> Aug 16 21:15:46 coyote kernel: ds: 007b   es: 007b   ss: 0068
> Aug 16 21:15:46 coyote kernel: Process yum (pid: 30892, threadinfo=d3eb8000 task=cf6bf7b0)
> Aug 16 21:15:46 coyote kernel: Stack: dffe0448 00000000 00000059 dffe0450 df58d0d0 00000080 00000000 d3eb8000
> Aug 16 21:15:46 coyote kernel:        d3eb8ba0 c015ca5f 00000080 d3eb8bd4 c0135b14 00000080 000000d2 0108bf00
> Aug 16 21:15:46 coyote kernel:        00000000 00021087 00000080 00000000 f7ffea20 0000000a d3eb8c50 00000000
> Aug 16 21:15:46 coyote kernel: Call Trace:
> Aug 16 21:15:46 coyote kernel:  [<c01044ef>] show_stack+0x7f/0xa0
> Aug 16 21:15:46 coyote kernel:  [<c0104688>] show_registers+0x158/0x1b0
> Aug 16 21:15:46 coyote kernel:  [<c01047e6>] die+0x66/0xd0
> Aug 16 21:15:46 coyote kernel:  [<c01109de>] do_page_fault+0x28e/0x548
> Aug 16 21:15:46 coyote kernel:  [<c010415d>] error_code+0x2d/0x38
> Aug 16 21:15:46 coyote kernel:  [<c015ca5f>] shrink_icache_memory+0x3f/0x50
> Aug 16 21:15:46 coyote kernel:  [<c0135b14>] shrink_slab+0x134/0x170
> Aug 16 21:15:46 coyote kernel:  [<c0136954>] try_to_free_pages+0xa4/0x160
> Aug 16 21:15:46 coyote kernel:  [<c012fc23>] __alloc_pages+0x1b3/0x320
> Aug 16 21:15:46 coyote kernel:  [<c0139a8f>] do_anonymous_page+0x5f/0x180
> Aug 16 21:15:46 coyote kernel:  [<c0139c11>] do_no_page+0x61/0x310
> Aug 16 21:15:46 coyote kernel:  [<c013a097>] handle_mm_fault+0xd7/0x160
> Aug 16 21:15:46 coyote kernel:  [<c01108a0>] do_page_fault+0x150/0x548
> Aug 16 21:15:46 coyote kernel:  [<c010415d>] error_code+0x2d/0x38
> Aug 16 21:15:46 coyote kernel:  [<c012c279>] do_generic_mapping_read+0x129/0x430
> Aug 16 21:15:46 coyote kernel:  [<c012c836>] __generic_file_aio_read+0x1b6/0x1f0
> Aug 16 21:15:46 coyote kernel:  [<c012c8c2>] generic_file_aio_read+0x52/0x70
> Aug 16 21:15:46 coyote kernel:  [<c0145898>] do_sync_read+0x78/0xa0
> Aug 16 21:15:46 coyote kernel:  [<c014598a>] vfs_read+0xca/0x140
> Aug 16 21:15:46 coyote kernel:  [<c0145c2b>] sys_read+0x4b/0x80
> Aug 16 21:15:46 coyote kernel:  [<c0103f61>] sysenter_past_esp+0x52/0x71
> Aug 16 21:15:46 coyote kernel: Code: 89 10 a1 60 16 34 c0 89 58 04 89 03 c7 43 04 60 16 34 c0 89
> 
> yum did a segfault about that time. yum is nice code, when
> it fscking works, which is maybe half the time on 2 different
> FC2 machines here now.
> 

Although an Oops is always the kernel's (or bad hardware's) fault.
So in this case you can let yum off the hook :)
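
The "Oops: 0002" line in the quoted trace points the same way. On i386 the
low three bits of that page-fault error code mean: bit 0 clear = no page
found (set = protection fault), bit 1 set = write (clear = read), bit 2
set = user mode (clear = kernel mode). A minimal sketch of the decoding in
Python, where decode_oops_code() is a hypothetical helper name:

```python
# Sketch: decode the low three bits of an i386 page-fault error code,
# as printed after "Oops:" in the log. decode_oops_code() is a hypothetical helper.
def decode_oops_code(code):
    """Map the error-code bits to a readable description."""
    return {
        "cause": "protection fault" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
    }

info = decode_oops_code(int("0002", 16))  # the value from "Oops: 0002 [#1]"
print(info)
```

So 0002 reads as a kernel-mode write to an unmapped address, which fits
eax: 00000000 in the register dump. If the Code: bytes begin at the
faulting EIP (an assumption about this kernel's dump format), the leading
89 10 is a store of %edx through %eax.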

> So we're back to the dentry_cache thing...  Duh, NO!, this is in
> prune_icache, not prune_dcache, presumably slightly different.  
> 

Yeah, both are going to cause cache shrinking to stop working.
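
A userspace analogy of why that happens, as a sketch only. It assumes, in
line with Andrew's remark, that the shrinker path takes its semaphore with
a try-lock and simply gives up when the semaphore is busy. The names
shrinker_lock, shrink_caches() and the cache list are stand-ins invented
for the illustration, not kernel code.

```python
# Sketch: why one task dying while holding a lock stops all later cache shrinking.
# Userspace analogy only; shrinker_lock, shrink_caches() and cache are hypothetical.
import threading

shrinker_lock = threading.Lock()
cache = list(range(1000))          # stand-in for cached dentries/inodes

def shrink_caches(nr, die_midway=False):
    """Try to free nr objects; return how many were freed."""
    if not shrinker_lock.acquire(blocking=False):
        return 0                   # lock is busy: give up without freeing anything
    if die_midway:
        return 0                   # simulated oops: the task is gone, lock never released
    freed = min(nr, len(cache))
    del cache[:freed]
    shrinker_lock.release()
    return freed

first = shrink_caches(100)                       # normal pass frees objects
crashed = shrink_caches(100, die_midway=True)    # "oops" while holding the lock
later = [shrink_caches(100) for _ in range(3)]   # every later pass gives up
print(first, crashed, later, len(cache))
```

After the simulated oops every later call returns 0, so the cache can only
grow, whichever of the two prune functions the original fault was in.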

> As far as bad hardware is concerned, warranty time is running out.
> I need something  plausible to take back to tcwo as a good reason
> for requesting a 'blanket rma' on the whole thing, would they 
> please send me another.
> 

Not too sure really. At this stage keep trying patches that you get
sent :P

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-17 11:57                                             ` Nick Piggin
@ 2004-08-19  9:41                                               ` Gene Heskett
  2004-08-19 18:36                                                 ` Marcelo Tosatti
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-19  9:41 UTC (permalink / raw)
  To: linux-kernel
  Cc: Nick Piggin, viro, Marcelo Tosatti, Linus Torvalds, Andrew Morton

On Tuesday 17 August 2004 07:57, Nick Piggin wrote:
>Gene Heskett wrote:
>> On Tuesday 17 August 2004 00:58, Nick Piggin wrote:
>>>Gene Heskett wrote:
>>>>Reboot time I guess :(((
>>>
>>>All your low memory has been used by dentry and inode caches. This
>>>isn't very interesting because this would be no doubt caused by
>>>something oopsing while holding the shrinker semaphore as Andrew
>>>pointed out.
>>>
>>>What is interesting is that first Oops message (I wonder if you
>>>don't have bad hardware though, I don't think anyone else is
>>> seeing it).
>>
>> What 'first Oops message'?  One I posted before?
>
>Well, the first Oops that your running kernel raises. Usually you
>don't bother about subsequent oopses and misbehaviour because the
>first one can cause the system to go into a funny state - this is
>a prime example.
>
>> That comment caused me to go back in the log to well above where I
>> had been channel surfing with tvtime, and I did find an Oops:
>>
>> Aug 16 21:15:46 coyote kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
>> Aug 16 21:15:46 coyote kernel:  printing eip:
>> Aug 16 21:15:46 coyote kernel: c015c8db
>> Aug 16 21:15:46 coyote kernel: *pde = 00000000
>> Aug 16 21:15:46 coyote kernel: Oops: 0002 [#1]
>> Aug 16 21:15:46 coyote kernel: Modules linked in: tuner tvaudio bttv video_buf btcx_risc eeprom snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_mpu401_uart snd_rawmidi snd_seq_device snd forcedeth sg
>> Aug 16 21:15:46 coyote kernel: CPU:    0
>> Aug 16 21:15:46 coyote kernel: EIP:    0060:[<c015c8db>]    Not tainted
>> Aug 16 21:15:46 coyote kernel: EFLAGS: 00210206   (2.6.8-rc4)
>> Aug 16 21:15:46 coyote kernel: EIP is at prune_icache+0x6b/0x1b0
>> Aug 16 21:15:46 coyote kernel: eax: 00000000   ebx: dffe0fd0   ecx: d3eb8b80   edx: c0341660
>> Aug 16 21:15:46 coyote kernel: esi: dffe0fc8   edi: 0000005a   ebp: d3eb8b94   esp: d3eb8b74
>> Aug 16 21:15:46 coyote kernel: ds: 007b   es: 007b   ss: 0068
>> Aug 16 21:15:46 coyote kernel: Process yum (pid: 30892, threadinfo=d3eb8000 task=cf6bf7b0)
>> Aug 16 21:15:46 coyote kernel: Stack: dffe0448 00000000 00000059 dffe0450 df58d0d0 00000080 00000000 d3eb8000
>> Aug 16 21:15:46 coyote kernel:        d3eb8ba0 c015ca5f 00000080 d3eb8bd4 c0135b14 00000080 000000d2 0108bf00
>> Aug 16 21:15:46 coyote kernel:        00000000 00021087 00000080 00000000 f7ffea20 0000000a d3eb8c50 00000000
>> Aug 16 21:15:46 coyote kernel: Call Trace:
>> Aug 16 21:15:46 coyote kernel:  [<c01044ef>] show_stack+0x7f/0xa0
>> Aug 16 21:15:46 coyote kernel:  [<c0104688>] show_registers+0x158/0x1b0
>> Aug 16 21:15:46 coyote kernel:  [<c01047e6>] die+0x66/0xd0
>> Aug 16 21:15:46 coyote kernel:  [<c01109de>] do_page_fault+0x28e/0x548
>> Aug 16 21:15:46 coyote kernel:  [<c010415d>] error_code+0x2d/0x38
>> Aug 16 21:15:46 coyote kernel:  [<c015ca5f>] shrink_icache_memory+0x3f/0x50
>> Aug 16 21:15:46 coyote kernel:  [<c0135b14>] shrink_slab+0x134/0x170
>> Aug 16 21:15:46 coyote kernel:  [<c0136954>] try_to_free_pages+0xa4/0x160
>> Aug 16 21:15:46 coyote kernel:  [<c012fc23>] __alloc_pages+0x1b3/0x320
>> Aug 16 21:15:46 coyote kernel:  [<c0139a8f>] do_anonymous_page+0x5f/0x180
>> Aug 16 21:15:46 coyote kernel:  [<c0139c11>] do_no_page+0x61/0x310
>> Aug 16 21:15:46 coyote kernel:  [<c013a097>] handle_mm_fault+0xd7/0x160
>> Aug 16 21:15:46 coyote kernel:  [<c01108a0>] do_page_fault+0x150/0x548
>> Aug 16 21:15:46 coyote kernel:  [<c010415d>] error_code+0x2d/0x38
>> Aug 16 21:15:46 coyote kernel:  [<c012c279>] do_generic_mapping_read+0x129/0x430
>> Aug 16 21:15:46 coyote kernel:  [<c012c836>] __generic_file_aio_read+0x1b6/0x1f0
>> Aug 16 21:15:46 coyote kernel:  [<c012c8c2>] generic_file_aio_read+0x52/0x70
>> Aug 16 21:15:46 coyote kernel:  [<c0145898>] do_sync_read+0x78/0xa0
>> Aug 16 21:15:46 coyote kernel:  [<c014598a>] vfs_read+0xca/0x140
>> Aug 16 21:15:46 coyote kernel:  [<c0145c2b>] sys_read+0x4b/0x80
>> Aug 16 21:15:46 coyote kernel:  [<c0103f61>] sysenter_past_esp+0x52/0x71
>> Aug 16 21:15:46 coyote kernel: Code: 89 10 a1 60 16 34 c0 89 58 04 89 03 c7 43 04 60 16 34 c0 89
>>
>> yum did a segfault about that time. yum is nice code, when
>> it fscking works, which is maybe half the time on 2 different
>> FC2 machines here now.
>
>Although an Oops is always the kernel's (or bad hardware's) fault.
>So in this case you can let yum off the hook :)
>
>> So we're back to the dentry_cache thing...  Duh, NO!, this is in
>> prune_icache, not prune_dcache, presumably slightly different.
>
>Yeah, both are going to cause cache shrinking to stop working.
>
>> As far as bad hardware is concerned, warranty time is running out.
>> I need something  plausible to take back to tcwo as a good reason
>> for requesting a 'blanket rma' on the whole thing, would they
>> please send me another.
>
>Not too sure really. At this stage keep trying patches that you get
>sent :P

I just had another, but this one's a bit different:

Aug 19 04:22:11 coyote kernel: ------------[ cut here ]------------
Aug 19 04:22:11 coyote kernel: kernel BUG at fs/buffer.c:805!
Aug 19 04:22:11 coyote kernel: invalid operand: 0000 [#1]
Aug 19 04:22:11 coyote kernel: Modules linked in: eeprom snd_seq_oss 
snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x 
snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc 
snd_mpu401_uart snd_rawmidi snd_seq_device snd forcedeth sg
Aug 19 04:22:11 coyote kernel: CPU:    0
Aug 19 04:22:11 coyote kernel: EIP:    0060:[<c0147d77>]    Not 
tainted
Aug 19 04:22:11 coyote kernel: EFLAGS: 00010246   (2.6.8-rc4)
Aug 19 04:22:11 coyote kernel: EIP is at 
remove_inode_buffers+0x77/0x90
Aug 19 04:22:11 coyote kernel: eax: 00000000   ebx: d7de519c   ecx: 
d7deb99c   edx: d7deb974
Aug 19 04:22:11 coyote kernel: esi: d7de50c8   edi: 00000001   ebp: 
c198bedc   esp: c198becc
Aug 19 04:22:11 coyote kernel: ds: 007b   es: 007b   ss: 0068
Aug 19 04:22:11 coyote kernel: Process kswapd0 (pid: 66, 
threadinfo=c198b000 task=c1978050)
Aug 19 04:22:11 coyote kernel: Stack: d7de50c8 d7de50d0 d7de50c8 
00000057 c198bf04 c015c985 d7de50c8 00000000
Aug 19 04:22:11 coyote kernel:        00000057 d7de5290 e50ac0d0 
00000080 00000000 c198b000 c198bf10 c015ca5f
Aug 19 04:22:11 coyote kernel:        00000080 c198bf44 c0135b14 
00000080 000000d0 01779600 00000000 0002d1f3
Aug 19 04:22:11 coyote kernel: Call Trace:
Aug 19 04:22:11 coyote kernel:  [<c01044ef>] show_stack+0x7f/0xa0
Aug 19 04:22:11 coyote kernel:  [<c0104688>] 
show_registers+0x158/0x1b0
Aug 19 04:22:11 coyote kernel:  [<c01047e6>] die+0x66/0xd0
Aug 19 04:22:12 coyote kernel:  [<c0104bc3>] do_invalid_op+0xb3/0xc0
Aug 19 04:22:12 coyote kernel:  [<c010415d>] error_code+0x2d/0x38
Aug 19 04:22:12 coyote kernel:  [<c015c985>] prune_icache+0x115/0x1b0
Aug 19 04:22:12 coyote kernel:  [<c015ca5f>] 
shrink_icache_memory+0x3f/0x50
Aug 19 04:22:12 coyote kernel:  [<c0135b14>] shrink_slab+0x134/0x170
Aug 19 04:22:12 coyote kernel:  [<c0136bb9>] balance_pgdat+0x1a9/0x1f0
Aug 19 04:22:12 coyote kernel:  [<c0136cbf>] kswapd+0xbf/0xd0
Aug 19 04:22:12 coyote kernel:  [<c01023f1>] 
kernel_thread_helper+0x5/0x14
Aug 19 04:22:12 coyote kernel: Code: 0f 0b 25 03 e5 0b 30 c0 eb c4 31 
ff eb de 0f 0b 36 04 e5 0b

The system is still up but it's 100 megs into swap, so I'm going to 
reboot without changing anything.  Is this one traceable?

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-19  9:41                                               ` Gene Heskett
@ 2004-08-19 18:36                                                 ` Marcelo Tosatti
  2004-08-20  2:38                                                   ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: Marcelo Tosatti @ 2004-08-19 18:36 UTC (permalink / raw)
  To: Gene Heskett
  Cc: linux-kernel, Nick Piggin, viro, Linus Torvalds, Andrew Morton


Gene, 

That is:

/*
 * The buffer's backing address_space's private_lock must be held
 */
static inline void __remove_assoc_queue(struct buffer_head *bh)
{
        BUG_ON(bh->b_assoc_buffers.next == NULL); 			<----------
        BUG_ON(bh->b_assoc_buffers.prev == NULL);
        list_del_init(&bh->b_assoc_buffers);
}

Viro, Linus, Andrew, don't you have any idea what could cause such
mapping->b_assoc_mapping corruption?

I can't see how that could be caused by flaky hardware.

Maybe we should include those BUGs into the official kernel, or -mm's tree?


On Thu, Aug 19, 2004 at 05:41:13AM -0400, Gene Heskett wrote:
> On Tuesday 17 August 2004 07:57, Nick Piggin wrote:
> >Gene Heskett wrote:
> >> On Tuesday 17 August 2004 00:58, Nick Piggin wrote:
> >>>Gene Heskett wrote:
> >>>>Reboot time I guess :(((
> >>>
> >>>All your low memory has been used by dentry and inode caches. This
> >>>isn't very
> >>>interesting because this would be no doubt caused by something
> >>>oopsing while holding the shrinker semaphore as Andrew pointed
> >>> out.
> >>>
> >>>What is interesting is that first Oops message (I wonder if you
> >>>don't have bad hardware though, I don't think anyone else is
> >>> seeing it).
> >>
> >> What 'first Oops message'?  One I posted before?
> >
> >Well, the first Oops that your running kernel raises. Usually you
> >don't bother about subsequent oopses and misbehaviour because the
> >first one can cause the system to go into a funny state - this is
> >a prime example.
> >
> >> That comment caused me to go back in the log to well above where I
> >> had been channel surfing with tvtime, and I did find an Oops:
> >>
> >> Aug 16 21:15:46 coyote kernel: Unable to handle kernel NULL
> >> pointer dereference at virtual address 00000000 Aug 16 21:15:46
> >> coyote kernel:  printing eip:
> >> Aug 16 21:15:46 coyote kernel: c015c8db
> >> Aug 16 21:15:46 coyote kernel: *pde = 00000000
> >> Aug 16 21:15:46 coyote kernel: Oops: 0002 [#1]
> >> Aug 16 21:15:46 coyote kernel: Modules linked in: tuner tvaudio
> >> bttv video_buf btcx_risc eeprom snd_seq_oss snd_seq_midi_event
> >> snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x snd_intel8x0
> >> snd_ac97_codec snd_pcm snd_timer snd_page_alloc snd_mpu401_uart
> >> snd_rawmidi snd_seq_device snd forcedeth sg Aug 16 21:15:46 coyote
> >> kernel: CPU:    0
> >> Aug 16 21:15:46 coyote kernel: EIP:    0060:[<c015c8db>]    Not
> >> tainted Aug 16 21:15:46 coyote kernel: EFLAGS: 00210206  
> >> (2.6.8-rc4) Aug 16 21:15:46 coyote kernel: EIP is at
> >> prune_icache+0x6b/0x1b0 Aug 16 21:15:46 coyote kernel: eax:
> >> 00000000   ebx: dffe0fd0   ecx: d3eb8b80   edx: c0341660 Aug 16
> >> 21:15:46 coyote kernel: esi: dffe0fc8   edi: 0000005a   ebp:
> >> d3eb8b94   esp: d3eb8b74 Aug 16 21:15:46 coyote kernel: ds: 007b  
> >> es: 007b   ss: 0068 Aug 16 21:15:46 coyote kernel: Process yum
> >> (pid: 30892, threadinfo=d3eb8000 task=cf6bf7b0) Aug 16 21:15:46
> >> coyote kernel: Stack: dffe0448 00000000 00000059 dffe0450 df58d0d0
> >> 00000080 00000000 d3eb8000 Aug 16 21:15:46 coyote kernel:       
> >> d3eb8ba0 c015ca5f 00000080 d3eb8bd4 c0135b14 00000080 000000d2
> >> 0108bf00 Aug 16 21:15:46 coyote kernel:        00000000 00021087
> >> 00000080 00000000 f7ffea20 0000000a d3eb8c50 00000000 Aug 16
> >> 21:15:46 coyote kernel: Call Trace:
> >> Aug 16 21:15:46 coyote kernel:  [<c01044ef>] show_stack+0x7f/0xa0
> >> Aug 16 21:15:46 coyote kernel:  [<c0104688>]
> >> show_registers+0x158/0x1b0 Aug 16 21:15:46 coyote kernel: 
> >> [<c01047e6>] die+0x66/0xd0 Aug 16 21:15:46 coyote kernel: 
> >> [<c01109de>] do_page_fault+0x28e/0x548 Aug 16 21:15:46 coyote
> >> kernel:  [<c010415d>] error_code+0x2d/0x38 Aug 16 21:15:46 coyote
> >> kernel:  [<c015ca5f>] shrink_icache_memory+0x3f/0x50 Aug 16
> >> 21:15:46 coyote kernel:  [<c0135b14>] shrink_slab+0x134/0x170 Aug
> >> 16 21:15:46 coyote kernel:  [<c0136954>]
> >> try_to_free_pages+0xa4/0x160 Aug 16 21:15:46 coyote kernel: 
> >> [<c012fc23>] __alloc_pages+0x1b3/0x320 Aug 16 21:15:46 coyote
> >> kernel:  [<c0139a8f>] do_anonymous_page+0x5f/0x180 Aug 16 21:15:46
> >> coyote kernel:  [<c0139c11>] do_no_page+0x61/0x310 Aug 16 21:15:46
> >> coyote kernel:  [<c013a097>] handle_mm_fault+0xd7/0x160 Aug 16
> >> 21:15:46 coyote kernel:  [<c01108a0>] do_page_fault+0x150/0x548
> >> Aug 16 21:15:46 coyote kernel:  [<c010415d>] error_code+0x2d/0x38
> >> Aug 16 21:15:46 coyote kernel:  [<c012c279>]
> >> do_generic_mapping_read+0x129/0x430 Aug 16 21:15:46 coyote kernel:
> >>  [<c012c836>] __generic_file_aio_read+0x1b6/0x1f0 Aug 16 21:15:46
> >> coyote kernel:  [<c012c8c2>] generic_file_aio_read+0x52/0x70 Aug
> >> 16 21:15:46 coyote kernel:  [<c0145898>] do_sync_read+0x78/0xa0
> >> Aug 16 21:15:46 coyote kernel:  [<c014598a>] vfs_read+0xca/0x140
> >> Aug 16 21:15:46 coyote kernel:  [<c0145c2b>] sys_read+0x4b/0x80
> >> Aug 16 21:15:46 coyote kernel:  [<c0103f61>]
> >> sysenter_past_esp+0x52/0x71 Aug 16 21:15:46 coyote kernel: Code:
> >> 89 10 a1 60 16 34 c0 89 58 04 89 03 c7 43 04 60 16 34 c0 89
> >>
> >> yum did a segfault about that time. yum is nice code, when
> >> it fscking works, which is maybe half the time on 2 different
> >> FC2 machines here now.
> >
> >Although an Oops is always the kernel's (or bad hardware's) fault.
> >So in this case you can let yum off the hook :)
> >
> >> So we're back to the dentry_cache thing...  Duh, NO!, this is in
> >> prune_icache, not prune_dcache, presumably slightly different.
> >
> >Yeah, both are going to cause cache shrinking to stop working.
> >
> >> As far as bad hardware is concerned, warranty time is running out.
> >> I need something  plausible to take back to tcwo as a good reason
> >> for requesting a 'blanket rma' on the whole thing, would they
> >> please send me another.
> >
> >Not too sure really. At this stage keep trying patches that you get
> >sent :P
> 
> I just had another but this ones a bit different:
> 
> Aug 19 04:22:11 coyote kernel: ------------[ cut here ]------------
> Aug 19 04:22:11 coyote kernel: kernel BUG at fs/buffer.c:805!
> Aug 19 04:22:11 coyote kernel: invalid operand: 0000 [#1]
> Aug 19 04:22:11 coyote kernel: Modules linked in: eeprom snd_seq_oss 
> snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_bt87x 
> snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc 
> snd_mpu401_uart snd_rawmidi snd_seq_device snd forcedeth sg
> Aug 19 04:22:11 coyote kernel: CPU:    0
> Aug 19 04:22:11 coyote kernel: EIP:    0060:[<c0147d77>]    Not 
> tainted
> Aug 19 04:22:11 coyote kernel: EFLAGS: 00010246   (2.6.8-rc4)
> Aug 19 04:22:11 coyote kernel: EIP is at 
> remove_inode_buffers+0x77/0x90
> Aug 19 04:22:11 coyote kernel: eax: 00000000   ebx: d7de519c   ecx: 
> d7deb99c   edx: d7deb974
> Aug 19 04:22:11 coyote kernel: esi: d7de50c8   edi: 00000001   ebp: 
> c198bedc   esp: c198becc
> Aug 19 04:22:11 coyote kernel: ds: 007b   es: 007b   ss: 0068
> Aug 19 04:22:11 coyote kernel: Process kswapd0 (pid: 66, 
> threadinfo=c198b000 task=c1978050)
> Aug 19 04:22:11 coyote kernel: Stack: d7de50c8 d7de50d0 d7de50c8 
> 00000057 c198bf04 c015c985 d7de50c8 00000000
> Aug 19 04:22:11 coyote kernel:        00000057 d7de5290 e50ac0d0 
> 00000080 00000000 c198b000 c198bf10 c015ca5f
> Aug 19 04:22:11 coyote kernel:        00000080 c198bf44 c0135b14 
> 00000080 000000d0 01779600 00000000 0002d1f3
> Aug 19 04:22:11 coyote kernel: Call Trace:
> Aug 19 04:22:11 coyote kernel:  [<c01044ef>] show_stack+0x7f/0xa0
> Aug 19 04:22:11 coyote kernel:  [<c0104688>] 
> show_registers+0x158/0x1b0
> Aug 19 04:22:11 coyote kernel:  [<c01047e6>] die+0x66/0xd0
> Aug 19 04:22:12 coyote kernel:  [<c0104bc3>] do_invalid_op+0xb3/0xc0
> Aug 19 04:22:12 coyote kernel:  [<c010415d>] error_code+0x2d/0x38
> Aug 19 04:22:12 coyote kernel:  [<c015c985>] prune_icache+0x115/0x1b0
> Aug 19 04:22:12 coyote kernel:  [<c015ca5f>] 
> shrink_icache_memory+0x3f/0x50
> Aug 19 04:22:12 coyote kernel:  [<c0135b14>] shrink_slab+0x134/0x170
> Aug 19 04:22:12 coyote kernel:  [<c0136bb9>] balance_pgdat+0x1a9/0x1f0
> Aug 19 04:22:12 coyote kernel:  [<c0136cbf>] kswapd+0xbf/0xd0
> Aug 19 04:22:12 coyote kernel:  [<c01023f1>] 
> kernel_thread_helper+0x5/0x14
> Aug 19 04:22:12 coyote kernel: Code: 0f 0b 25 03 e5 0b 30 c0 eb c4 31 
> ff eb de 0f 0b 36 04 e5 0b
> 
> The system is still up but its 100 megs into swap so I'm going to 
> reboot without changing anything.  Is this one traceable?

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-19 18:36                                                 ` Marcelo Tosatti
@ 2004-08-20  2:38                                                   ` Gene Heskett
  2004-08-20  7:33                                                     ` Marcelo Tosatti
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-20  2:38 UTC (permalink / raw)
  To: linux-kernel
  Cc: Marcelo Tosatti, Nick Piggin, viro, Linus Torvalds, Andrew Morton

On Thursday 19 August 2004 14:36, Marcelo Tosatti wrote:
>Gene,
>
>That is:
>
>/*
> * The buffer's backing address_space's private_lock must be held
> */
>static inline void __remove_assoc_queue(struct buffer_head *bh)
>{
>        BUG_ON(bh->b_assoc_buffers.next == NULL); 			<----------
>        BUG_ON(bh->b_assoc_buffers.prev == NULL);
>        list_del_init(&bh->b_assoc_buffers);
>}
>
>Viro, Linus, Andrew, dont you have any idea what could cause such
> mapping->b_assoc_mapping corruption?
>
>I can't see how that could be caused by flaky hardware.

There is still that possibility Marcelo.  Someone recommended I get 
cpuburn and memburn, and before fixing the scanf statement (it was 
broken) in memburn, I had compiled it for a 512 meg test the first 
time, and a 768 meg test the next couple of runs.

All exited with errors like this:
Passed round 133, elapsed 4827.19.
FAILED at round 134/14208927: got ff00, expected 0!!!

REREAD: ff00, ff00, ff00!!!

[root@coyote memburn]# vim memburn.c
[root@coyote memburn]# gcc -o memburn memburn.c
[root@coyote memburn]# ./memburn
Starting test with size 768 megs..

Passed round 0, elapsed 44.36.
Passed round 1, elapsed 74.13.
Passed round 2, elapsed 105.12.
FAILED at round 3/25777183: got 2b00, expected 0!!!

REREAD: 2b00, 2b00, 2b00!!!

I've now rebuilt it with a better printf format string, and it's 
running over 768 megs again.  But this time the round counter is up 
to 90 and still going...

Interesting too is that memburn has now allocated a 768 meg wide block 
5 times, and still no Oops.  Over a hundred megs in swap, but it's 
still running.

I lost the BUG_ON patches in fs/buffer.c, this is now 2.6.8.1-mm2 (but 
I can go back if this fails of course)

Or can I just copy that 2.6.8-rc4/fs/buffer.c file over this one?

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-13  1:31                             ` Linus Torvalds
  2004-08-13  2:03                               ` Gene Heskett
  2004-08-13  2:27                               ` Andreas Dilger
@ 2004-08-20  7:02                               ` Udo A. Steinberg
  2004-08-20  7:11                                 ` Andrew Morton
  2004-09-12  7:03                               ` Udo A. Steinberg
  3 siblings, 1 reply; 146+ messages in thread
From: Udo A. Steinberg @ 2004-08-20  7:02 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton, viro, Nick Piggin

[-- Attachment #1: Type: text/plain, Size: 3623 bytes --]

On Thu, 12 Aug 2004 18:31:31 -0700 (PDT) Linus Torvalds (LT) wrote:

LT> Your slab usage seems to be:
LT> 
LT> 	cumulative	     usage	name
LT> 	=========	    ======	====
LT> 		.....
LT> 	  2,021,428	   151,552	pgd
LT> 	  2,182,804	   161,376	size-96
LT> 	  2,367,124	   184,320	biovec-(256)
LT> 	  2,559,124	   192,000	biovec-128
LT> 	  2,751,124	   192,000	biovec-64
LT> 	  2,997,076	   245,952	ext3_inode_cache
LT> 	  3,255,124	   258,048	size-1024
LT> 	  3,545,940	   290,816	size-512
LT> 	  3,843,468	   297,528	radix_tree_node
LT> 	  4,153,932	   310,464	inode_cache
LT> 	  4,494,972	   341,040	dentry_cache
LT> 	  4,994,684	   499,712	size-8192
LT> 	  5,912,188	   917,504	size-32768
LT> 	105,397,820	99,485,632	size-64
LT> 
LT> Something pretty much stands out.
LT> 
LT> What the _heck_ is doing 64-byte allocations and leaking them?
LT> 
LT> Can you figure out what triggers it for you? If nothing obvious comes to 
LT> mind, could you do something really silly like this

[...]

Linus,

So far I have had serious trouble reproducing the slab misbehaviour quoted
above. However, I've just come across what appears to be a serious VM or USB
problem which may or may not be related to that, and I can reproduce it.

I've tried to download 700 MB of data from a digital camera via USB using
"gphoto2 --get-all-files" and I can repeatedly run my 128 MB box out of
memory using either Linux 2.4.26 or 2.6.8.1 for that.


2.4.26 fails with

Aug 19 23:02:05 laptop kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
Aug 19 23:02:05 laptop kernel: __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
Aug 19 23:02:05 laptop kernel: __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)

2.6.8.1 fails with

Aug 19 21:27:41 laptop kernel: usb 1-1: usbfs: interface 0 claimed while 'gphoto2' sets config #1
Aug 19 21:46:43 laptop kernel: oom-killer: gfp_mask=0x1d2                                        
Aug 19 21:46:43 laptop kernel: DMA per-cpu:
Aug 19 21:46:43 laptop kernel: cpu 0 hot: low 2, high 6, batch 1 
Aug 19 21:46:43 laptop kernel: cpu 0 cold: low 0, high 2, batch 1
Aug 19 21:46:43 laptop kernel: Normal per-cpu:
Aug 19 21:46:43 laptop kernel: cpu 0 hot: low 14, high 42, batch 7
Aug 19 21:46:43 laptop kernel: cpu 0 cold: low 0, high 14, batch 7
Aug 19 21:46:43 laptop kernel: HighMem per-cpu: empty             
Aug 19 21:46:43 laptop kernel: 
Aug 19 21:46:43 laptop kernel: Free pages:        1324kB (0kB HighMem)
Aug 19 21:46:43 laptop kernel: Active:1315 inactive:27343 dirty:0 writeback:0 unstable:0 free:331 slab:1606 mapped:1555 pagetables:241
Aug 19 21:46:43 laptop kernel: DMA free:704kB min:44kB low:88kB high:132kB active:0kB inactive:10720kB present:16384kB                
Aug 19 21:46:43 laptop kernel: protections[]: 22 178 178
Aug 19 21:46:43 laptop kernel: Normal free:620kB min:312kB low:624kB high:936kB active:5260kB inactive:98652kB present:114688kB
Aug 19 21:46:43 laptop kernel: protections[]: 0 156 156
Aug 19 21:46:43 laptop kernel: HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB
Aug 19 21:46:43 laptop kernel: protections[]: 0 0 0
Aug 19 21:46:43 laptop kernel: DMA: 0*4kB 2*8kB 5*16kB 5*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 704kB   
Aug 19 21:46:43 laptop kernel: Normal: 1*4kB 3*8kB 3*16kB 1*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 620kB
Aug 19 21:46:43 laptop kernel: HighMem: empty
Aug 19 21:46:43 laptop kernel: Swap cache: add 366080, delete 339455, find 219744/259874, race 0+0
Aug 19 21:46:43 laptop kernel: Out of Memory: Killed process 10239 (gphoto2).        

-Udo.

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-20  7:02                               ` Udo A. Steinberg
@ 2004-08-20  7:11                                 ` Andrew Morton
  2004-08-20  7:19                                   ` Udo A. Steinberg
  0 siblings, 1 reply; 146+ messages in thread
From: Andrew Morton @ 2004-08-20  7:11 UTC (permalink / raw)
  To: Udo A. Steinberg; +Cc: torvalds, linux-kernel, viro, nickpiggin

"Udo A. Steinberg" <us15@os.inf.tu-dresden.de> wrote:
>
> I've tried to download 700 MB of data from a digital camera via USB using
>  "gphoto2 --get-all-files" and I can repeatedly run my 128 MB box out of
>  memory using either Linux 2.4.26 or 2.6.8.1 for that.

whee.  How much swap is online?

Not that it matters - you seem to have a bunch of reclaimable pagecache
just sitting there.  Very odd.

Could gphoto2 be using mlock?  Does it run as root?

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-20  7:11                                 ` Andrew Morton
@ 2004-08-20  7:19                                   ` Udo A. Steinberg
  2004-08-20  7:49                                     ` Nick Piggin
  0 siblings, 1 reply; 146+ messages in thread
From: Udo A. Steinberg @ 2004-08-20  7:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: torvalds, linux-kernel, viro, nickpiggin

[-- Attachment #1: Type: text/plain, Size: 691 bytes --]

On Fri, 20 Aug 2004 00:11:54 -0700 Andrew Morton (AM) wrote:

AM> "Udo A. Steinberg" <us15@os.inf.tu-dresden.de> wrote:
AM> >
AM> > I've tried to download 700 MB of data from a digital camera via USB using
AM> >  "gphoto2 --get-all-files" and I can repeatedly run my 128 MB box out of
AM> >  memory using either Linux 2.4.26 or 2.6.8.1 for that.
AM> 
AM> whee.  How much swap is online?

Something close to 512 MB.

Adding 506512k swap on /dev/hda2.  Priority:-1 extents:1

AM> Not that it matters - you seem to have a bunch of reclaimable pagecache
AM> just sitting there.  Very odd.
AM> 
AM> Could gphoto2 be using mlock?  Does it run as root?

No, gphoto2 was not running as root.

-Udo.

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-20  2:38                                                   ` Gene Heskett
@ 2004-08-20  7:33                                                     ` Marcelo Tosatti
  2004-08-20 15:06                                                       ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: Marcelo Tosatti @ 2004-08-20  7:33 UTC (permalink / raw)
  To: Gene Heskett, mingo
  Cc: linux-kernel, Nick Piggin, viro, Linus Torvalds, Andrew Morton

On Thu, Aug 19, 2004 at 10:38:19PM -0400, Gene Heskett wrote:
> On Thursday 19 August 2004 14:36, Marcelo Tosatti wrote:
> >Gene,
> >
> >That is:
> >
> >/*
> > * The buffer's backing address_space's private_lock must be held
> > */
> >static inline void __remove_assoc_queue(struct buffer_head *bh)
> >{
> >        BUG_ON(bh->b_assoc_buffers.next == NULL); 			<----------
> >        BUG_ON(bh->b_assoc_buffers.prev == NULL);
> >        list_del_init(&bh->b_assoc_buffers);
> >}
> >
> >Viro, Linus, Andrew, dont you have any idea what could cause such
> > mapping->b_assoc_mapping corruption?
> >
> >I can't see how that could be caused by flaky hardware.
> 
> There is still that possibility Marcelo.  Someone recommended I get 
> cpuburn and memburn, and before fixing the scanf statement (it was 
> broken) in memburn, I had compiled it for a 512 meg test the first 
> time, and a 768 meg test the next couple of runs.
> 
> All exited with errors like this:
> Passed round 133, elapsed 4827.19.
> FAILED at round 134/14208927: got ff00, expected 0!!!
> 
> REREAD: ff00, ff00, ff00!!!
> 
> [root@coyote memburn]# vim memburn.c
> [root@coyote memburn]# gcc -o memburn memburn.c
> [root@coyote memburn]# ./memburn
> Starting test with size 768 megs..
> 
> Passed round 0, elapsed 44.36.
> Passed round 1, elapsed 74.13.
> Passed round 2, elapsed 105.12.
> FAILED at round 3/25777183: got 2b00, expected 0!!!
> 
> REREAD: 2b00, 2b00, 2b00!!!
> 
> I've now rebuilt it with a better printf format string, and its 
> running over 768 megs again.  But this time the round counter is up 
> to 90 and still going...
> 
> Interesting too is that memburn has now allocated a 768 meg wide block 
> 5 times, and still no Oops.  Over a hundred megs in swap, but its 
> still running.
> 
> I lost the BUG_ON patches in fs/buffer.c, this is now 2.6.8.1-mm2 (but 
> I can go back if this fails of course)
> 
> Or can I just copy that 2.6.8-rc4/fs/buffer.c file over this one?

You can just copy it, _I think_. If you have problems just add the BUG_ON's by hand. 

Ingo has now hit the same problem too.  Ingo, can you reproduce that
remove_inode_buffers() BUG?

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-20  7:19                                   ` Udo A. Steinberg
@ 2004-08-20  7:49                                     ` Nick Piggin
  2004-08-24  6:08                                       ` Udo A. Steinberg
  0 siblings, 1 reply; 146+ messages in thread
From: Nick Piggin @ 2004-08-20  7:49 UTC (permalink / raw)
  To: Udo A. Steinberg; +Cc: Andrew Morton, torvalds, linux-kernel, viro

[-- Attachment #1: Type: text/plain, Size: 850 bytes --]

Udo A. Steinberg wrote:
> On Fri, 20 Aug 2004 00:11:54 -0700 Andrew Morton (AM) wrote:
> 
> AM> "Udo A. Steinberg" <us15@os.inf.tu-dresden.de> wrote:
> AM> >
> AM> > I've tried to download 700 MB of data from a digital camera via USB using
> AM> >  "gphoto2 --get-all-files" and I can repeatedly run my 128 MB box out of
> AM> >  memory using either Linux 2.4.26 or 2.6.8.1 for that.
> AM> 
> AM> whee.  How much swap is online?
> 
> Something close to 512 MB.
> 
> Adding 506512k swap on /dev/hda2.  Priority:-1 extents:1
> 
> AM> Not that it matters - you seem to have a bunch of reclaimable pagecache
> AM> just sitting there.  Very odd.
> AM> 
> AM> Could gphoto2 be using mlock?  Does it run as root?
> 
> No, gphoto2 was not running as root.
> 
> -Udo.

Can you reproduce the OOM with the following patch please? Then
send the output.

Thanks


[-- Attachment #2: vm-unreclaimable-debug.patch --]
[-- Type: text/x-patch, Size: 878 bytes --]




---

 linux-2.6-npiggin/mm/page_alloc.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletion(-)

diff -puN mm/page_alloc.c~vm-unreclaimable-debug mm/page_alloc.c
--- linux-2.6/mm/page_alloc.c~vm-unreclaimable-debug	2004-08-20 17:44:45.000000000 +1000
+++ linux-2.6-npiggin/mm/page_alloc.c	2004-08-20 17:48:26.000000000 +1000
@@ -1182,6 +1182,8 @@ void show_free_areas(void)
 			" active:%lukB"
 			" inactive:%lukB"
 			" present:%lukB"
+			" pages_scanned:%lu"
+			" all_unreclaimable? %s"
 			"\n",
 			zone->name,
 			K(zone->free_pages),
@@ -1190,7 +1192,9 @@ void show_free_areas(void)
 			K(zone->pages_high),
 			K(zone->nr_active),
 			K(zone->nr_inactive),
-			K(zone->present_pages)
+			K(zone->present_pages),
+			zone->pages_scanned,
+			(zone->all_unreclaimable ? "yes" : "no")
 			);
 		printk("protections[]:");
 		for (i = 0; i < MAX_NR_ZONES; i++)

_

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-20  7:33                                                     ` Marcelo Tosatti
@ 2004-08-20 15:06                                                       ` Gene Heskett
  2004-08-20 15:43                                                         ` V13
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-20 15:06 UTC (permalink / raw)
  To: linux-kernel
  Cc: Marcelo Tosatti, mingo, Nick Piggin, viro, Linus Torvalds, Andrew Morton

On Friday 20 August 2004 03:33, Marcelo Tosatti wrote:
[...]
>> >I can't see how that could be caused by flaky hardware.
>>
>> There is still that possibility Marcelo.  Someone recommended I
>> get cpuburn and memburn, and before fixing the scanf statement (it
>> was broken) in memburn, I had compiled it for a 512 meg test the
>> first time, and a 768 meg test the next couple of runs.
>>
>> All exited with errors like this:
>> Passed round 133, elapsed 4827.19.
>> FAILED at round 134/14208927: got ff00, expected 0!!!
>>
>> REREAD: ff00, ff00, ff00!!!
>>
>> [root@coyote memburn]# vim memburn.c
>> [root@coyote memburn]# gcc -o memburn memburn.c
>> [root@coyote memburn]# ./memburn
>> Starting test with size 768 megs..
>>
>> Passed round 0, elapsed 44.36.
>> Passed round 1, elapsed 74.13.
>> Passed round 2, elapsed 105.12.
>> FAILED at round 3/25777183: got 2b00, expected 0!!!
>>
>> REREAD: 2b00, 2b00, 2b00!!!

The latest output of memburn after a bit of format hacking:

FAILED at round 78/165714207: got 0000ff00, expected 00000000!!!
REREAD: 0000ff00, 0000ff00, 0000ff00!!!

and

FAILED at round 160/200780831: got 02025302, expected 02020202!!!
REREAD: 02025302, 02025302, 02025302!!!

So it appears that it's the third byte of 4 each time that's fubar'd.  
I'll run it a few more times to confirm.  Is memory byte wide per chip 
on these things today?

>> I've now rebuilt it with a better printf format string, and its
>> running over 768 megs again.  But this time the round counter is
>> up to 90 and still going...
>>
>> Interesting too is that memburn has now allocated a 768 meg wide
>> block 5 times, and still no Oops.  Over a hundred megs in swap,
>> but its still running.
>>
>> I lost the BUG_ON patches in fs/buffer.c, this is now 2.6.8.1-mm2
>> (but I can go back if this fails of course)
>>
>> Or can I just copy that 2.6.8-rc4/fs/buffer.c file over this one?
>
>You can just copy it, _I think_. If you have problems just add the
> BUG_ON's by hand.

Looks like I'll have to, the newer one is about 600 bytes bigger 
already, so there are lots of changes.

OTOH, I'm now up 21 hours, and the memory management so far is 
surviving on 2.6.8.1-mm2.  memburn may be hitting the errors, keeping 
them from taking down the os maybe?  Sillier things have happened.

>Now Ingo also hit the same problem, Ingo can you reproduce that
>remove_inode_buffers()?

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-20 15:06                                                       ` Gene Heskett
@ 2004-08-20 15:43                                                         ` V13
  2004-08-20 17:29                                                           ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: V13 @ 2004-08-20 15:43 UTC (permalink / raw)
  To: gene.heskett
  Cc: linux-kernel, Marcelo Tosatti, mingo, Nick Piggin, viro,
	Linus Torvalds, Andrew Morton

[-- Attachment #1: Type: text/plain, Size: 2744 bytes --]

On Friday 20 August 2004 18:06, Gene Heskett wrote:
> On Friday 20 August 2004 03:33, Marcelo Tosatti wrote:
> [...]
>
> >> >I can't see how that could be caused by flaky hardware.
> >>
> >> There is still that possibility Marcelo.  Someone recommended I
> >> get cpuburn and memburn, and before fixing the scanf statement (it
> >> was broken) in memburn, I had compiled it for a 512 meg test the
> >> first time, and a 768 meg test the next couple of runs.
> >>
> >> All exited with errors like this:
> >> Passed round 133, elapsed 4827.19.
> >> FAILED at round 134/14208927: got ff00, expected 0!!!
> >>
> >> REREAD: ff00, ff00, ff00!!!
> >>
> >> [root@coyote memburn]# vim memburn.c
> >> [root@coyote memburn]# gcc -o memburn memburn.c
> >> [root@coyote memburn]# ./memburn
> >> Starting test with size 768 megs..
> >>
> >> Passed round 0, elapsed 44.36.
> >> Passed round 1, elapsed 74.13.
> >> Passed round 2, elapsed 105.12.
> >> FAILED at round 3/25777183: got 2b00, expected 0!!!
> >>
> >> REREAD: 2b00, 2b00, 2b00!!!
>
> The latest output of memburn after a bit of format hacking:
>
> FAILED at round 78/165714207: got 0000ff00, expected 00000000!!!
> REREAD: 0000ff00, 0000ff00, 0000ff00!!!
>
> and
>
> FAILED at round 160/200780831: got 02025302, expected 02020202!!!
> REREAD: 02025302, 02025302, 02025302!!!
>
> So it appears that its the third byte of 4 each time thats fubar'd.
> I'l run it a few more times to confirm.  Is memory byte wide per chip
> on these things today?

I had a similar problem some years ago. I had core dumps and gcc errors all 
the time, but memtest could not find a thing. I'm 99% sure it was a CPU 
problem and not a memory problem. The errors seemed to occur at random times, 
even when there was no CPU load.

I believe it was a cache problem. I wrote a simple program (like memburn) that 
allocated memory blocks and then did some reads/writes on them (alloc+write 5 
blocks, check 1, free 1, alloc+write 6, check 2, free 2, alloc+write 7, ...). 
Whenever the program encountered an error, it looped on that block 
forever. 
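
A rough sketch of that rolling allocate/check/free sequence, in Python purely 
to show the logic (a real memory tester would be written in C, like memburn). 
The block size, the pattern choice and all function names are assumptions 
made for the illustration, not a reconstruction of the original program:

```python
from collections import deque

BLOCK_SIZE = 1 << 16  # bytes per block; arbitrary value chosen for this sketch


def pattern_for(block_id):
    """Byte value written throughout a given block (assumed pattern scheme)."""
    return block_id & 0xFF


def alloc_and_write(block_id, size=BLOCK_SIZE):
    """Allocate a block and fill every byte with the block's pattern."""
    return bytearray([pattern_for(block_id)]) * size


def first_bad_offset(block, block_id):
    """Return the offset of the first byte that differs from the pattern, or None."""
    expected = pattern_for(block_id)
    if block.count(expected) == len(block):  # fast path: every byte matches
        return None
    for offset, value in enumerate(block):
        if value != expected:
            return offset
    return None


def rolling_test(total_blocks, window=5, size=BLOCK_SIZE):
    """Keep up to `window` blocks live; check and free the oldest, then allocate anew.

    Returns (block_id, offset) for the first mismatch, or None if every block
    checked out.  The program described above looped on the bad block forever
    at this point; the sketch returns instead so that it terminates.
    """
    live = deque()
    next_id = 0
    while next_id < total_blocks or live:
        # Top up the window while there are still blocks left to allocate.
        while len(live) < window and next_id < total_blocks:
            live.append((next_id, alloc_and_write(next_id, size)))
            next_id += 1
        block_id, block = live.popleft()  # oldest live block
        bad = first_bad_offset(block, block_id)
        if bad is not None:
            return block_id, bad
        del block  # "free" the checked block
    return None
```

On healthy hardware rolling_test() returns None; a fault can be simulated by 
flipping a byte in a block before passing it to first_bad_offset().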

The errors occurred after a random period of time (from 1 block allocation to 
more than an hour) and were never reproduced after a stop/start. While this 
test program was running and looping on the bad block, gcc never displayed 
errors. The problem was fixed when I replaced the CPU, and I'm still using the 
same DIMMs without problems. I also did a lot of checks before replacing the 
CPU, like changing the position of the DIMMs, removing one of them, changing 
their timing, and much more, without success. I even removed all the PCI cards.

Disabling the CPU cache or replacing it can be a good test.

<<V13>>

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-20 15:43                                                         ` V13
@ 2004-08-20 17:29                                                           ` Gene Heskett
  2004-08-20 18:13                                                             ` Marc Ballarin
  2004-08-20 20:11                                                             ` R. J. Wysocki
  0 siblings, 2 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-20 17:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: V13, Marcelo Tosatti, mingo, Nick Piggin, viro, Linus Torvalds,
	Andrew Morton

On Friday 20 August 2004 11:43, V13 wrote:
>On Friday 20 August 2004 18:06, Gene Heskett wrote:
>> On Friday 20 August 2004 03:33, Marcelo Tosatti wrote:
>> [...]
>>
>> >> >I can't see how that could be caused by flaky hardware.
>> >>
>> >> There is still that possibility Marcelo.  Someone recommended I
>> >> get cpuburn and memburn, and before fixing the scanf statement
>> >> (it was broken) in memburn, I had compiled it for a 512 meg
>> >> test the first time, and a 768 meg test the next couple of
>> >> runs.
>> >>
>> >> All exited with errors like this:
>> >> Passed round 133, elapsed 4827.19.
>> >> FAILED at round 134/14208927: got ff00, expected 0!!!
>> >>
>> >> REREAD: ff00, ff00, ff00!!!
>> >>
>> >> [root@coyote memburn]# vim memburn.c
>> >> [root@coyote memburn]# gcc -o memburn memburn.c
>> >> [root@coyote memburn]# ./memburn
>> >> Starting test with size 768 megs..
>> >>
>> >> Passed round 0, elapsed 44.36.
>> >> Passed round 1, elapsed 74.13.
>> >> Passed round 2, elapsed 105.12.
>> >> FAILED at round 3/25777183: got 2b00, expected 0!!!
>> >>
>> >> REREAD: 2b00, 2b00, 2b00!!!
>>
>> The latest output of memburn after a bit of format hacking:
>>
>> FAILED at round 78/165714207: got 0000ff00, expected 00000000!!!
>> REREAD: 0000ff00, 0000ff00, 0000ff00!!!
>>
>> and
>>
>> FAILED at round 160/200780831: got 02025302, expected 02020202!!!
>> REREAD: 02025302, 02025302, 02025302!!!
>>
>> So it appears that its the third byte of 4 each time thats
>> fubar'd. I'l run it a few more times to confirm.  Is memory byte
>> wide per chip on these things today?
>
>I had a simillar problem some years ago. I had core dumps and gcc
> errors all the time but memtest could not find a thing. 99% it was
> a CPU problem and not a memory problem. It seemed that there were
> errors at random times even when there was no cpu load.
>
>I believe it was a cache problem. I made a simple prog (like
> memburn) that allocated memory blocks and then did some read/write
> on them (alloc+write 5 blocks, check 1, free 1, alloc+write 6,
> check 2, free 2 alloc+write 7....). After that whenever the program
> encountered an error it looped on this block forever.
>
>The errors occured after a random period of time (from 1 block
> allocation to more than an hour) and were never reproduced after a
> stop/start. When this test program was running and looping on the
> bad block, gcc never displayed errors. The problem was fixed when I
> replaced the CPU and I'm still using the same DIMMs without
> problems. I also did a lot of checks before replacing the CPU, like
> changing the position of the DIMMs, removing one of them, change
> their timing, and much more without success. Even removed all the
> PCI cards.
>
>Disabling the CPU cache or replacing it can be a good test.
>
><<V13>>

I tried disabling it in the BIOS and the machine became unusable for 
all practical purposes.  But it did run about half a day that way.  
I'd estimate its speed was similar to a 33 MHz 386sx with only 8 megs 
of RAM though.  I could type a full sentence ahead of the screen 
display in kmail, for instance.  Had it been usable, I might have been 
tempted to let it run a couple of days just for grins.  On the next 
reboot, I'm going to switch the sticks around and see if the errors 
move to an even address.  If they do, then I'd be convinced it's 
memory and not cache.  The question then becomes which stick in a 
dual channel setup holds the even addresses, and which the odd ones.

Probably best to just go buy another half gigger and swap it in for 
each of these, one at a time.  And hope it's better!

Yup, memburn stopped again, at an odd address, showing the same 
failure pattern in byte 3 of 4.

FAILED at round 63/20669951: got 0000ff00, expected 00000000!!!
REREAD: 0000ff00, 0000ff00, 0000ff00!!!
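
For anyone following along, the "byte 3 of 4" reading can be checked 
mechanically by XORing the value read with the value expected.  A small 
sketch (the function name is made up; byte index 0 is the least significant 
byte, so "byte 3 of 4" as printed left to right is index 1 here):

```python
def corrupted_bytes(got, expected, width=4):
    """Map byte index (0 = least significant) to (got_byte, expected_byte).

    Only bytes that differ between the two words are included.
    """
    diff = got ^ expected
    result = {}
    for i in range(width):
        shift = 8 * i
        if (diff >> shift) & 0xFF:
            result[i] = ((got >> shift) & 0xFF, (expected >> shift) & 0xFF)
    return result


# The two kinds of failure memburn has reported so far:
print(corrupted_bytes(0x0000FF00, 0x00000000))
print(corrupted_bytes(0x02025302, 0x02020202))
```

Both calls report a single differing byte at index 1.  In the second case 
the byte read back is 0x53 where 0x02 was written, so it is not simply every 
bit in that byte being forced high.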

I guess I'm going to town.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-20 17:29                                                           ` Gene Heskett
@ 2004-08-20 18:13                                                             ` Marc Ballarin
  2004-08-20 20:08                                                               ` Gene Heskett
  2004-08-20 20:11                                                             ` R. J. Wysocki
  1 sibling, 1 reply; 146+ messages in thread
From: Marc Ballarin @ 2004-08-20 18:13 UTC (permalink / raw)
  To: gene.heskett; +Cc: linux-kernel, v13

On Fri, 20 Aug 2004 13:29:05 -0400
Gene Heskett <gene.heskett@verizon.net> wrote:

> 
> I tried disabling it in the bios and the machine became unusable for 
> all practical purposes. 

Is ECC checking for L2 cache enabled in your BIOS?

BTW: I trimmed the CC list somewhat

Regards

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-20 18:13                                                             ` Marc Ballarin
@ 2004-08-20 20:08                                                               ` Gene Heskett
  2004-08-21  9:25                                                                 ` Barry K. Nathan
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-20 20:08 UTC (permalink / raw)
  To: linux-kernel; +Cc: Marc Ballarin, v13

On Friday 20 August 2004 14:13, Marc Ballarin wrote:
>On Fri, 20 Aug 2004 13:29:05 -0400
>
>Gene Heskett <gene.heskett@verizon.net> wrote:
>> I tried disabling it in the bios and the machine became unusable
>> for all practical purposes.
>
>Is ECC checking for L2 cache enabled in your BIOS?

There isn't a switch for that and, as near as I can tell, no L2 cache 
on this board, only the L1 in the CPU.  If there is an L2, then 
memtest86 can't find it, and I don't see any chips that look like 
separate memory.  Memtest86 may not know how to enable it if it's an 
nforce2 option.  Whatever cache is shown as switchable in the BIOS, 
turning it off makes a very sick bird out of the machine, like a 
33 MHz 386sx.

I've located the bios docs on the Biostar site, and was set to print 
them when it locked up the last time. So I'll restart that project 
shortly.

But it does run with it off and for the short time I left it that way, 
no errors.

>BTW: I trimmed the CC list somewhat
>
>Regards

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-20 17:29                                                           ` Gene Heskett
  2004-08-20 18:13                                                             ` Marc Ballarin
@ 2004-08-20 20:11                                                             ` R. J. Wysocki
  2004-08-20 20:17                                                               ` Gene Heskett
  1 sibling, 1 reply; 146+ messages in thread
From: R. J. Wysocki @ 2004-08-20 20:11 UTC (permalink / raw)
  To: gene.heskett; +Cc: linux-kernel

On Friday 20 of August 2004 19:29, Gene Heskett wrote:
> On Friday 20 August 2004 11:43, V13 wrote:
> >On Friday 20 August 2004 18:06, Gene Heskett wrote:
> >> On Friday 20 August 2004 03:33, Marcelo Tosatti wrote:
> >> [...]
> >>
> >> >> >I can't see how that could be caused by flaky hardware.
> >> >>
> >> >> There is still that possibility Marcelo.  Someone recommended I
> >> >> get cpuburn and memburn, and before fixing the scanf statement
> >> >> (it was broken) in memburn, I had compiled it for a 512 meg
> >> >> test the first time, and a 768 meg test the next couple of
> >> >> runs.
> >> >>
> >> >> All exited with errors like this:
> >> >> Passed round 133, elapsed 4827.19.
> >> >> FAILED at round 134/14208927: got ff00, expected 0!!!
> >> >>
> >> >> REREAD: ff00, ff00, ff00!!!
> >> >>
> >> >> [root@coyote memburn]# vim memburn.c
> >> >> [root@coyote memburn]# gcc -o memburn memburn.c
> >> >> [root@coyote memburn]# ./memburn
> >> >> Starting test with size 768 megs..
> >> >>
> >> >> Passed round 0, elapsed 44.36.
> >> >> Passed round 1, elapsed 74.13.
> >> >> Passed round 2, elapsed 105.12.
> >> >> FAILED at round 3/25777183: got 2b00, expected 0!!!
> >> >>
> >> >> REREAD: 2b00, 2b00, 2b00!!!
> >>
> >> The latest output of memburn after a bit of format hacking:
> >>
> >> FAILED at round 78/165714207: got 0000ff00, expected 00000000!!!
> >> REREAD: 0000ff00, 0000ff00, 0000ff00!!!
> >>
> >> and
> >>
> >> FAILED at round 160/200780831: got 02025302, expected 02020202!!!
> >> REREAD: 02025302, 02025302, 02025302!!!
> >>
> >> So it appears that its the third byte of 4 each time thats
> >> fubar'd. I'l run it a few more times to confirm.  Is memory byte
> >> wide per chip on these things today?
> >
> >I had a simillar problem some years ago. I had core dumps and gcc
> > errors all the time but memtest could not find a thing. 99% it was
> > a CPU problem and not a memory problem. It seemed that there were
> > errors at random times even when there was no cpu load.
> >
> >I believe it was a cache problem. I made a simple prog (like
> > memburn) that allocated memory blocks and then did some read/write
> > on them (alloc+write 5 blocks, check 1, free 1, alloc+write 6,
> > check 2, free 2 alloc+write 7....). After that whenever the program
> > encountered an error it looped on this block forever.
> >
> >The errors occured after a random period of time (from 1 block
> > allocation to more than an hour) and were never reproduced after a
> > stop/start. When this test program was running and looping on the
> > bad block, gcc never displayed errors. The problem was fixed when I
> > replaced the CPU and I'm still using the same DIMMs without
> > problems. I also did a lot of checks before replacing the CPU, like
> > changing the position of the DIMMs, removing one of them, change
> > their timing, and much more without success. Even removed all the
> > PCI cards.
> >
> >Disabling the CPU cache or replacing it can be a good test.
> >
> ><<V13>>
>
> I tried disabling it in the bios and the machine became unusable for
> all practical purposes.  But it did run about half a day that way.
> I'd estimate its speed was similar to a 33 mhz 386sx with only 8 megs
> of ram though.  I could type a full sentence ahead of the screen
> display in kmail for instance.  Had it been usable, I might have been
> tempted to let it run a couple of days just for grins.  On the next
> reboot, I'm going to switch the stick around, and see if the errors
> move to an even address.  If they do, then I'd be convinced its
> memory and not cache.  The question then becomes which stick in a
> dual channel setup is even addresses, and which is odd addresses.
>
> Probably best to just go buy another half gigger and swap it in for
> one of these one at a time.  And hope its better!
>
> Yup, memburn stopped again, at an odd address, showing the same
> failure pattern in byte 3 of 4.
>
> FAILED at round 63/20669951: got 0000ff00, expected 00000000!!!
> REREAD: 0000ff00, 0000ff00, 0000ff00!!!
>
> I guess i'm going to town.

There's a simple test you can do unless your DIMMs must go in pairs (I don't 
remember if it's required by nforce2): remove one of them and see what 
happens.  If you can reproduce the same symptoms on each of them separately, 
I'd bet on a cache problem.

Greetings,

-- 
Rafael J. Wysocki
----------------------------
For a successful technology, reality must take precedence over public 
relations, for nature cannot be fooled.
					-- Richard P. Feynman

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-20 20:11                                                             ` R. J. Wysocki
@ 2004-08-20 20:17                                                               ` Gene Heskett
  2004-08-22  5:05                                                                 ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-20 20:17 UTC (permalink / raw)
  To: linux-kernel; +Cc: R. J. Wysocki

On Friday 20 August 2004 16:11, R. J. Wysocki wrote:[...]

>There's a simple test you can do unless your DIMMs must go in pairs
> (I don't remember if it's required by nforce2): remove one of them
> and see what happens.

To get dual channel DDR, they have to be in a pair.  Since this post, 
they've been swapped one for the other, and I'll be curious to see if 
the address goes to an even address when it errors, which it hasn't 
yet.

> If you can reproduce the same symptoms on 
> each of them separately, I'd bet on a cache problem.
>
That makes sense, so I can try that too.  I hadn't thought of that, 
duh!
>Greetings,

Someone else asked if ECC was on, but this board doesn't have it, and 
the memory has a blank pattern where the parity chip would be.  So I 
think it's safe to say no :)

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
       [not found]           ` <200408210118.02011.vda@port.imtp.ilyichevsk.odessa.ua>
@ 2004-08-21  1:40             ` Gene Heskett
  0 siblings, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-21  1:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: Denis Vlasenko

On Friday 20 August 2004 18:18, Denis Vlasenko wrote:
>> mmm, I wonder who the zombie is.  Ahh, it's ~/bin/its-daylight.
>> It's a script that cron triggered, and which changes the mode of
>> the heyu/xtend stuff for daytime operations.  Its (a bash script)
>> apparently hung looking for a response it didn't get.  I have 3
>> of those at various times of the day and I've never gotten email
>> from that one.  The mode change does occur though...  FWIW heyu
>> has been fixed, the distro version has a severe scope problem
>> from a missing '}' which was not caught by the compiler, but by
>> a tool I wrote years ago for os9 that I've ported to linux!   The
>> heyu author ): didn't seem to be interested in fixing it either.
>>
>> I'll go take a look at it after I've sent this, but it does bring
>> up a sore point.  linux doesn't get this right, os9 did.  zombies
>> are killable by os9, it simply takes it out of the execution
>> queue, and reclaims all resources used back into the free pool, no
>> questions asked or expected.  We shouldn't have to reboot just to
>> kill a fscking zombie...
>
>zombie is not much more than an exit code to be collected by
>wait() syscall. All other resources are already freed.
>
>Zombies result when parent does not wait() for dead children.
>Trivial example:
>
>#!/bin/sh
>sleep 10 &
>exec env - sleep 100
>
>26752 pts/0    S      0:00           sleep 100
>26753 pts/0    Z      0:00             [sleep <defunct>]
>
>Such zombies get reparented to init *as soon as the parent itself dies*.
>A properly functioning init constantly wait()s for any unexpected
> children, so it takes care of zombies.
>--
>vda

Oh oh, looks like I need a lesson in bash then.  The whole basic idea 
of what I was doing there was for the parent shell to go away, 
leaving the child process sitting there until its done some 10 
seconds later.  If I didn't do that, then cron seemed to hang on the 
first execution, as if it was dutifully waiting for bash to exit...
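
To make the quoted explanation concrete, here is a small Linux-only sketch, 
in Python rather than bash, of a child that stays a zombie until its parent 
calls wait().  The helper names are invented for the example, and it relies 
on /proc/<pid>/stat being readable:

```python
import os
import time


def process_state(pid):
    """Return the one-letter state from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The state letter follows the parenthesised command name.
    return data[data.rindex(")") + 2]


def zombie_demo(exit_code=7, timeout=5.0):
    """Fork a child that exits at once; return (state_before_wait, exit_code)."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: exit immediately, no cleanup handlers
    # Parent: the child is dead, but its exit status has not been collected,
    # so the kernel keeps a process table entry around in state "Z".
    deadline = time.time() + timeout
    while process_state(pid) != "Z" and time.time() < deadline:
        time.sleep(0.01)
    state = process_state(pid)
    _, status = os.waitpid(pid, 0)  # collecting the status reaps the zombie
    return state, os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(zombie_demo())
```

The same idea carries over to a shell script: either the parent waits for 
the jobs it backgrounded, or it exits so that init inherits and reaps them.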

The bash manual I have is both too concise and too verbose, because 
bash is as close to emacs as I can think of when looking for a 
universal executor.

In the crontab its this:
00 05 * * *     /root/bin/its-daylight

Then /root/bin/its-daylight calls 2 other scripts using the "&" 
syntax.

So I guess it's time to RTFM on bash again.

Thanks.

Now, to get this back on-thread..

I switched the memory sticks to each others sockets this afternoon.  
And "memburn 512" megabytes, which puts me into the swap about 70 
megs, is still running with no detected errors in 1162 loops.  About 
4:30 elapsed time so far.  I've got all my fingers and toes crossed, 
and everything but tied a knot in it, hoping this may be the end of 
the problem.  If not, then the nightmare continues.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-20 20:08                                                               ` Gene Heskett
@ 2004-08-21  9:25                                                                 ` Barry K. Nathan
  2004-08-21 18:31                                                                   ` V13
  0 siblings, 1 reply; 146+ messages in thread
From: Barry K. Nathan @ 2004-08-21  9:25 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, Marc Ballarin, v13

On Fri, Aug 20, 2004 at 04:08:50PM -0400, Gene Heskett wrote:
> On Friday 20 August 2004 14:13, Marc Ballarin wrote:
[snip]
> >Is ECC checking for L2 cache enabled in your BIOS?
> 
> There isn't a switch for that and as near as I can tell, no L2 cache 
> on this board, only the L1 in the cpu.  If there is an L2, then 
> memtest86 can't find it, and I don't see any chips that look like 
> seperate memory.

The L2 cache is *on the CPU chip itself*. Any CPU recent enough to
physically fit into an nForce board has the L2 cache on the CPU itself.
I think the last Athlons to have separate L2 cache chips were the Slot A
models, and even then, the L2 cache chips were still on the CPU module
and not the motherboard.

> Memtest86 may not know howto enable it if its an 
> nforce2 option.  Whatever cache shown as switchable in the bios, 
> turning it off makes a very sick bird out of the machine, like a 
> 33mhz 386sx?

Yeah, disabling the L2 cache on a modern CPU makes it really slow. But,
it's still a useful troubleshooting option...

-Barry K. Nathan <barryn@pobox.com>


^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-21  9:25                                                                 ` Barry K. Nathan
@ 2004-08-21 18:31                                                                   ` V13
  2004-08-21 18:55                                                                     ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: V13 @ 2004-08-21 18:31 UTC (permalink / raw)
  To: Barry K. Nathan; +Cc: Gene Heskett, linux-kernel, Marc Ballarin

[-- Attachment #1: Type: text/plain, Size: 1272 bytes --]

On Saturday 21 August 2004 12:25, Barry K. Nathan wrote:
> > Memtest86 may not know howto enable it if its an
> > nforce2 option.  Whatever cache shown as switchable in the bios,
> > turning it off makes a very sick bird out of the machine, like a
> > 33mhz 386sx?
>
> Yeah, disabling the L2 cache on a modern CPU makes it really slow. But,
> it's still a useful troubleshooting option...

When I had the problem described in my previous mail, I came to the conclusion 
that it was related to the cache, *BUT* it seemed that the cache was just 
caching wrong data. Disabling the cache would only reduce the problem.

One reason for this is that when the program detected errors in a buffer (e.g. 
0x1234 instead of 0x1111), they would NOT go away as long as the program kept 
reading from this buffer. This means that the cache always 
returned the same data. The error was 'gone' every time the program was 
suspended for a while or when something else used a lot of memory (e.g. 
another instance of this program).

So, I'm not suggesting that his cache is faulty, but that there can be a CPU 
(or even a M/B) problem that corrupts data when it is transferred from 
memory to the processor.
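
The distinction I am drawing can be written down as a tiny classifier over 
one failing read and its rereads (memburn prints three rereads on its REREAD 
line). This is only a sketch; the function name and the category labels are 
my own, and memburn itself does nothing like this as far as I know:

```python
def classify_failure(expected, first_read, rereads):
    """Classify a memory-test failure from the first read and later rereads.

    "ok"         - the first read matched, nothing to report
    "persistent" - every reread returned the same wrong value
    "transient"  - every reread returned the expected value
    "unstable"   - the rereads disagree with each other
    """
    if first_read == expected:
        return "ok"
    if all(r == first_read for r in rereads):
        return "persistent"
    if all(r == expected for r in rereads):
        return "transient"
    return "unstable"


# The pattern memburn has been reporting: the same bad value three times over.
print(classify_failure(0x00000000, 0x0000FF00, [0x0000FF00] * 3))
```

Note that "persistent" here does not prove the DRAM cell holds the bad value: 
immediate rereads of the same location may be served from the cache, which is 
exactly the behaviour I saw.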

> -Barry K. Nathan <barryn@pobox.com>
<<V13>>

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-21 18:31                                                                   ` V13
@ 2004-08-21 18:55                                                                     ` Gene Heskett
  2004-08-22 11:04                                                                       ` Helge Hafting
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-21 18:55 UTC (permalink / raw)
  To: linux-kernel; +Cc: V13, Barry K. Nathan, Marc Ballarin

On Saturday 21 August 2004 14:31, V13 wrote:
>On Saturday 21 August 2004 12:25, Barry K. Nathan wrote:
>> > Memtest86 may not know howto enable it if its an
>> > nforce2 option.  Whatever cache shown as switchable in the bios,
>> > turning it off makes a very sick bird out of the machine, like a
>> > 33mhz 386sx?
>>
>> Yeah, disabling the L2 cache on a modern CPU makes it really slow.
>> But, it's still a useful troubleshooting option...
>
>When I had the problem described in my previous mail I came to the
> conclussion that it was related with cache *BUT* it seemed that the
> cache was just caching wrong data. Disabling the cache would just
> reduce the problem.
>
>One reason for this is that when the program detected errors in a
> buffer (i.e. 0x1234 instead of 0x1111) then they would NOT go away
> if the program was reading from this buffer all the time. This
> means that the cache always returned the same data. The error was
> 'gone' every time the program was suspended for a while or when
> something else used a lot of memory (i.e. another instance of this
> program).
>
>So, I'm not suggesting that his cache is faulty but that there can
> be a CPU (or even a M/B) problem that corrupts data when they are
> transfered from memory to the processor.
>
>> -Barry K. Nathan <barryn@pobox.com>
>
><<V13>>

Latest memburn results here, this after swapping the memory sticks for 
each other, running over 512 megs, half my ram:

Passed round 2308, elapsed 41225.98.
FAILED at round 2309/40220063: got ff000000, expected 00000000!!!
REREAD: ff000000, ff000000, ff000000!!!

So not only has the problem moved from the 2nd LSB to the MSB of the 
fetch, but it is a lot more severe in terms of the amount of time to 
catch one error, now nearly 17 hours.  I'm now up 25 hours and the 
machine feels good, no Oops so far and I've restarted memburn in 
addition to konstruct working on kde-3.3 final.  I'm over 100 megs 
into the swap, and 2.6.8.1-mm2 seems to be handling the situation 
admirably so far.  That knocking sound?  That's me, knocking on wood 
for good luck.  :-)

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-20 20:17                                                               ` Gene Heskett
@ 2004-08-22  5:05                                                                 ` Gene Heskett
  2004-08-22 11:42                                                                   ` R. J. Wysocki
  2004-08-24  2:34                                                                   ` Tom Vier
  0 siblings, 2 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-22  5:05 UTC (permalink / raw)
  To: linux-kernel; +Cc: R. J. Wysocki

On Friday 20 August 2004 16:17, Gene Heskett wrote:
>On Friday 20 August 2004 16:11, R. J. Wysocki wrote:[...]
>
>>There's a simple test you can do unless your DIMMs must go in pairs
>> (I don't remember if it's required by nforce2): remove one of them
>> and see what happens.
>
>To get dual channel DDR, they have to be in a pair.  Since this
> post, they've been swapped one for the other, and I'll be curious
> to see if the address goes to an even address when it errors, which
> it hasn't yet.
>
It has, one time in 35 hours now.  The problem is considerably 
reduced.

Whereas the error was always at an odd address, and in the 2nd LSbyte, 
now it's still at an odd address but the error has moved to the MSB of a 
32 bit fetch:

[root@coyote memburn]# ./memburn 512
Starting test with size 512 megs..
Passed round 2308, elapsed 41225.98.
FAILED at round 2309/40220063: got ff000000, expected 00000000!!!
REREAD: ff000000, ff000000, ff000000!!!
[root@coyote memburn]# ./memburn 512
Starting test with size 512 megs..
Passed round 2636, elapsed 60944.15.

As can be seen, I restarted it, and it has run even more loops now 
without error.  There has been no more Oops, but with memburn eating 
512 megs, half my RAM, and kde-3.3 under construction by konstruct, 
I've peaked at nearly a gig of swap, and have 754 megs in swap right now.  
Sure, it's a bit laggy, but not unusable.

So now the question is since the error address is always odd, which 
stick is it?
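
Since odd versus even only looks at the lowest bit, here is a quick sketch 
that looks at a few more low bits of the failing numbers memburn has printed 
in this thread.  Whether that number is a byte offset or a word index is not 
something I know, so no meaning is attached to the result here; the function 
name is made up for the example:

```python
# Numbers printed after the slash in memburn's FAILED lines in this thread.
FAILING_INDICES = [
    14208927, 25777183, 165714207, 200780831, 20669951, 40220063,
]


def trailing_ones(n):
    """Count how many consecutive low-order bits of n are set."""
    count = 0
    while n & 1:
        n >>= 1
        count += 1
    return count


for index in FAILING_INDICES:
    print(f"{index:>10}  low 5 bits = {index & 31:2d}  trailing ones = {trailing_ones(index)}")
```

Every one of the six has its five lowest bits set, i.e. each is 31 modulo 32, 
which is a narrower pattern than just "odd".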

Or do I need to sanitize the dimm sockets somehow?

They sure seem to slip in and out easy enough for a socket with that 
many contacts. Not over 3 pounds on each end will seat them, and if 
the clips are re-opened they virtually fall out into your hand.  I'm 
rather more used to having to press 5 to 10 pounds on each end to 
seat them.

Next time I have to reboot, I'm going to 'exercise' them in and out a 
few times just to polish the oxide from the contacts.

>> If you can reproduce the same symptoms on
>> each of them separately, I'd bet on a cache problem.
>
>That makes sense, so I can try that too.  I hadn't thought of that,
>duh!
>
>>Greetings,
>
>Someone else asked if ECC was on, but this board doesn't have it,
> and the memory has a blank pattern where the parity chip would be. 
> So I think its safe to say no :)

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

^ permalink raw reply	[flat|nested] 146+ messages in thread

* Re: Possible dcache BUG
  2004-08-21 18:55                                                                     ` Gene Heskett
@ 2004-08-22 11:04                                                                       ` Helge Hafting
  2004-08-22 11:40                                                                         ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: Helge Hafting @ 2004-08-22 11:04 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel, V13, Barry K. Nathan, Marc Ballarin

On Sat, Aug 21, 2004 at 02:55:13PM -0400, Gene Heskett wrote:
> On Saturday 21 August 2004 14:31, V13 wrote:
> 
> So not only has the problem moved from the 2nd LSB to the MSB of the 
> fetch, but it is a lot more severe in terms of the amount of time to 
> catch one error, now nearly 17 hours.  I'm now up 25 hours and the 
> machine feels good, no Oops so far and I've restarted memburn in 
> addition to konstruct working on kde-3.3 final.  I'm over 100 megs 
> into the swap, and 2.6.8.1-mm2 seems to handling the situation 
> admirably so far.  That knocking sound?  Thats me, knocking on wood 
> for good luck.  :-)

Seems it is the memory, then.
Things getting *better* when moving memory may mean:
* slight timing problem - in that case the memory might be fine
  at a slower setting.  (Reason for complaints if you must go below spec.)
* Moving memory around rubs dirt, dust and oxide off the contacts, both on
  the memory sticks and the mainboard connectors.  This gives
  better contact and may improve things.  Consider cleaning the
  connectors further.  Also look for dust and hair lying in
  the mainboard connectors.  It happens, especially when some
  slots are free for a long time until memory is added.

Helge Hafting




* Re: Possible dcache BUG
  2004-08-22 11:04                                                                       ` Helge Hafting
@ 2004-08-22 11:40                                                                         ` Gene Heskett
  0 siblings, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-22 11:40 UTC (permalink / raw)
  To: linux-kernel; +Cc: Helge Hafting, V13, Barry K. Nathan, Marc Ballarin

On Sunday 22 August 2004 07:04, Helge Hafting wrote:
>On Sat, Aug 21, 2004 at 02:55:13PM -0400, Gene Heskett wrote:
>> On Saturday 21 August 2004 14:31, V13 wrote:
>>
>> So not only has the problem moved from the 2nd LSB to the MSB of
>> the fetch, but it is a lot more severe in terms of the amount of
>> time to catch one error, now nearly 17 hours.  I'm now up 25 hours
>> and the machine feels good, no Oops so far and I've restarted
>> memburn in addition to konstruct working on kde-3.3 final.  I'm
>> over 100 megs into the swap, and 2.6.8.1-mm2 seems to handling the
>> situation admirably so far.  That knocking sound?  Thats me,
>> knocking on wood for good luck.  :-)
>
>Seems it is the memory, then.
>Things getting *better* when moving memory may mean:
>* slight timing problem - in that case the memory might be fine
>  at a slower setting.  (Reason for complaints if you must go below
> spec.)

I'd discount this, as running at half speed via a BIOS setting made no 
difference.  That setting made a 1400 out of this 2800 Athlon, and at 
the same time the BIOS signed the RAM on as DDR200 dual-channel RAM.

> * Moving memory around rubs dirt, dust and oxide off the 
> contacts, both on the memory sticks and the mainboard connectors. 
> This gives better contact and may improve things.  Consider
> cleaning the connectors further.  Also look for dust and hair lying
> in the mainboard connectors.  It happens, especially when some
> slots are free for a long time until memory is added.

I think now that this is the scenario in effect.  The next time it 
Oopses, I'll spend some time and reseat both sticks several more 
times.  As this vendor is in Tampa FL, could the storage environment 
there for new mainboards in their retail packaging be a factor?  
With the turnover rate Dan has, I wouldn't think so, but then I've 
NDI where they may sit between their assembly in .tw land and going 
on his shelves in Tampa.  The retail box from Biostar has the board 
in the usual pink bubble-wrap static bag, but it isn't sealed other 
than the end folded over and taped shut.  Ditto for the RAM, but I 
think that's hand packed per order in the usual grey anti-static, way 
too big, bag.

Right now, memburn hasn't errored again, but konstruct bailed out 
trying to make liboggvorbis, and there is over 830 megs in swap.  I 
should be able to do a swapoff and restart, leaving X/kde, memburn 
and seti running, I'd think.  I'll send this and check a swapoff for 
grins.  All this used to run in 512 megs without using any great 
amount of swap.  :-]

>
>Helge Hafting

Thanks Helge

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

* Re: Possible dcache BUG
  2004-08-22  5:05                                                                 ` Gene Heskett
@ 2004-08-22 11:42                                                                   ` R. J. Wysocki
  2004-08-24  2:34                                                                   ` Tom Vier
  1 sibling, 0 replies; 146+ messages in thread
From: R. J. Wysocki @ 2004-08-22 11:42 UTC (permalink / raw)
  To: gene.heskett; +Cc: linux-kernel

On Sunday 22 of August 2004 07:05, Gene Heskett wrote:
> On Friday 20 August 2004 16:17, Gene Heskett wrote:
> >On Friday 20 August 2004 16:11, R. J. Wysocki wrote:[...]
> >
> >>There's a simple test you can do unless your DIMMs must go in pairs
> >> (I don't remember if it's required by nforce2): remove one of them
> >> and see what happens.
> >
> >To get dual channel DDR, they have to be in a pair.  Since this
> > post, they've been swapped one for the other, and I'll be curious
> > to see if the address goes to an even address when it errors, which
> > it hasn't yet.
>
> It has, one time in 35 hours now.  The problem is considerably
> reduced.
>
> Whereas the error was always at an odd address, and in the 2nd LSbyte,
> now its still an odd address but the error has moved to the MSB of a
> 32 bit fetch:
>
> [root@coyote memburn]# ./memburn 512
> Starting test with size 512 megs..
> Passed round 2308, elapsed 41225.98.
> FAILED at round 2309/40220063: got ff000000, expected 00000000!!!
> REREAD: ff000000, ff000000, ff000000!!!
> [root@coyote memburn]# ./memburn 512
> Starting test with size 512 megs..
> Passed round 2636, elapsed 60944.15.
>
> As can be seen, I restarted it, and its ran quite even more loops now
> without error.  There has been no more Oops, but with memburn eating
> 512 megs, half my ram, and kde-3.3 under construction by konstruct,
> I've peaked at nearly a gig of swap, and 754 megs in swap right now.
> Sure, its a bit laggy, but not unusable.
>
> So now the question is since the error address is always odd, which
> stick is it?

Hard to tell.  I think the memory controller is interleaving them for 
efficiency but the question remains which one is regarded as the first.

BTW, as it indicates that DRAM is to blame, you can try to fiddle a bit with 
its timings (provided the board setup allows you to do this).  For example, 
you can set them to 3-3-3 or equivalent (generally, push them up) and check 
if this affects the memburn results and how.  Just an idea, you know. ;-)
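
As a side note on reading the FAILED line quoted above: XORing the value read back against the expected value shows which byte of the 32-bit fetch went wrong. This is only a minimal sketch; the helper name bad_byte_lanes is made up for illustration and is not part of memburn.

```c
#include <stdint.h>

/*
 * Return a 4-bit mask of the byte lanes that differ between the value
 * read back and the value that was written.  Bit 0 is the least
 * significant byte, bit 3 the most significant byte.
 * (Hypothetical helper, not taken from memburn.)
 */
static unsigned bad_byte_lanes(uint32_t got, uint32_t expected)
{
	uint32_t diff = got ^ expected;	/* set bits mark flipped bits */
	unsigned mask = 0;

	for (unsigned lane = 0; lane < 4; lane++)
		if ((diff >> (8 * lane)) & 0xffu)
			mask |= 1u << lane;
	return mask;
}
```

For the reported pair (got ff000000, expected 00000000) only bit 3 of the mask is set, which matches Gene's reading that the error now sits in the most significant byte.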

Greetings,
-- 
Rafael J. Wysocki
----------------------------
For a successful technology, reality must take precedence over public 
relations, for nature cannot be fooled.
					-- Richard P. Feynman

* Re: Possible dcache BUG
  2004-08-22  5:05                                                                 ` Gene Heskett
  2004-08-22 11:42                                                                   ` R. J. Wysocki
@ 2004-08-24  2:34                                                                   ` Tom Vier
  2004-08-24  3:08                                                                     ` Gene Heskett
  1 sibling, 1 reply; 146+ messages in thread
From: Tom Vier @ 2004-08-24  2:34 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel

On Sun, Aug 22, 2004 at 01:05:25AM -0400, Gene Heskett wrote:
> Whereas the error was always at an odd address, and in the 2nd LSbyte, 
> now its still an odd address but the error has moved to the MSB of a 
> 32 bit fetch:

are you translating virt->phys?

-- 
Tom Vier <tmv@comcast.net>
DSA Key ID 0x15741ECE

* Re: Possible dcache BUG
  2004-08-24  2:34                                                                   ` Tom Vier
@ 2004-08-24  3:08                                                                     ` Gene Heskett
  2004-08-25  1:49                                                                       ` Tom Vier
  0 siblings, 1 reply; 146+ messages in thread
From: Gene Heskett @ 2004-08-24  3:08 UTC (permalink / raw)
  To: linux-kernel, Tom Vier

On Monday 23 August 2004 22:34, Tom Vier wrote:
>On Sun, Aug 22, 2004 at 01:05:25AM -0400, Gene Heskett wrote:
>> Whereas the error was always at an odd address, and in the 2nd
>> LSbyte, now its still an odd address but the error has moved to
>> the MSB of a 32 bit fetch:
>
>are you translating virt->phys?

No, this is straight out of the memburn output (after I'd fixed the 
printf format strings to actually print full 8-character 
hexadecimal; the address of the error is not converted, that's in 
decimal).

I don't know enough about this to nail it to a physical address, 
unfortunately.

And right now I have one of the two sticks pulled, trying to figure 
out which one has the tummy ache, but himem is still compiled in and 
cc1plus is going crazy, eating all RAM and 500 megs of swap trying to 
build the libsmoke stuff in the new 3.3 kde.  So I'm about to reboot 
to a kernel without himem support, since I only have half a gig with 
just one stick installed, and see if that fixes cc1plus.


Thanks for asking.  I appreciate it.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

* Re: Possible dcache BUG
  2004-08-20  7:49                                     ` Nick Piggin
@ 2004-08-24  6:08                                       ` Udo A. Steinberg
  2004-08-24  7:41                                         ` Nick Piggin
  0 siblings, 1 reply; 146+ messages in thread
From: Udo A. Steinberg @ 2004-08-24  6:08 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, torvalds, linux-kernel

On Fri, 20 Aug 2004 17:49:55 +1000 Nick Piggin (NP) wrote:

NP> Can you reproduce the OOM with the following patch please? Then
NP> send the output.

I reproduced the problem using a slightly different setup to trigger the
problem faster:  128 MB RAM, 188992 KB swap

Here's the output of the OOM killer with your patch applied:

oom-killer: gfp_mask=0x1d2
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 14, high 42, batch 7
cpu 0 cold: low 0, high 14, batch 7
HighMem per-cpu: empty

Free pages:        1316kB (0kB HighMem)
Active:5281 inactive:23611 dirty:0 writeback:0 unstable:0 free:329 slab:1403 mapped:12232 pagetables:167
DMA free:712kB min:44kB low:88kB high:132kB active:5076kB inactive:5332kB present:16384kB pages_scanned:10112 all_unreclaimable? yes
protections[]: 22 178 178
Normal free:604kB min:312kB low:624kB high:936kB active:16048kB inactive:89112kB present:114688kB pages_scanned:62432 all_unreclaimable? yes
protections[]: 0 156 156
HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 0*4kB 3*8kB 13*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 712kB
Normal: 1*4kB 1*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 604kB
HighMem: empty
Swap cache: add 90886, delete 74524, find 4659/4974, race 0+0
Out of Memory: Killed process 1217 (gphoto2).
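
The per-zone free-list lines in a report like this can be cross-checked by hand: each "N*SkB" term is N free blocks of S kB, and the terms should add up to the figure after the "=" sign. A small sketch of that arithmetic follows, assuming 4 kB pages; the function and array names are illustrative only and do not come from the kernel.

```c
#include <stddef.h>

/*
 * Sum one "N*4kB N*8kB ..." free-list line.  counts[k] is the number of
 * free blocks of order k, i.e. blocks of (4 << k) kB with 4 kB pages.
 * (Illustrative helper, not kernel code.)
 */
static unsigned long free_list_kb(const unsigned *counts, size_t orders)
{
	unsigned long total = 0;

	for (size_t k = 0; k < orders; k++)
		total += (unsigned long)counts[k] * (4ul << k);
	return total;
}

/* Per-order counts copied from the report above, orders 0..10. */
static const unsigned dma_counts[11]    = { 0, 3, 13, 1, 1, 1, 1, 0, 0, 0, 0 };
static const unsigned normal_counts[11] = { 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0 };
```

The two sums come to 712 kB and 604 kB, and together they give the 1316 kB shown on the "Free pages:" line.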

-Udo.

* Re: Possible dcache BUG
  2004-08-24  6:08                                       ` Udo A. Steinberg
@ 2004-08-24  7:41                                         ` Nick Piggin
  2004-08-24 18:20                                           ` Marcelo Tosatti
  0 siblings, 1 reply; 146+ messages in thread
From: Nick Piggin @ 2004-08-24  7:41 UTC (permalink / raw)
  To: Udo A. Steinberg; +Cc: Andrew Morton, torvalds, linux-kernel

Udo A. Steinberg wrote:

>On Fri, 20 Aug 2004 17:49:55 +1000 Nick Piggin (NP) wrote:
>
>NP> Can you reproduce the OOM with the following patch please? Then
>NP> send the output.
>
>I reproduced the problem using a slightly different setup to trigger the
>problem faster:  128 MB RAM, 188992 KB swap
>
>Here's the output of the OOM killer with your patch applied:
>
>oom-killer: gfp_mask=0x1d2
>DMA per-cpu:
>cpu 0 hot: low 2, high 6, batch 1
>cpu 0 cold: low 0, high 2, batch 1
>Normal per-cpu:
>cpu 0 hot: low 14, high 42, batch 7
>cpu 0 cold: low 0, high 14, batch 7
>HighMem per-cpu: empty
>
>Free pages:        1316kB (0kB HighMem)
>Active:5281 inactive:23611 dirty:0 writeback:0 unstable:0 free:329 slab:1403 mapped:12232 pagetables:167
>DMA free:712kB min:44kB low:88kB high:132kB active:5076kB inactive:5332kB present:16384kB pages_scanned:10112 all_unreclaimable? yes
>protections[]: 22 178 178
>Normal free:604kB min:312kB low:624kB high:936kB active:16048kB inactive:89112kB present:114688kB pages_scanned:62432 all_unreclaimable? yes
>protections[]: 0 156 156
>HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
>protections[]: 0 0 0
>DMA: 0*4kB 3*8kB 13*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 712kB
>Normal: 1*4kB 1*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 604kB
>HighMem: empty
>Swap cache: add 90886, delete 74524, find 4659/4974, race 0+0
>Out of Memory: Killed process 1217 (gphoto2).
>
>

OK, all_unreclaimable caused the scanner to virtually stop. If
all_unreclaimable gets set, it throttles the scanning of that zone right
back, which in turn greatly lowers the chance that all_unreclaimable
will get cleared.

When we get to priority = 0 in try_to_free_pages (ie. close to OOM), it
might be worth clearing each zone's all_unreclaimable for this last time
'round the loop.


* Re: Possible dcache BUG
  2004-08-24  7:41                                         ` Nick Piggin
@ 2004-08-24 18:20                                           ` Marcelo Tosatti
  2004-08-24 20:00                                             ` Andrew Morton
  0 siblings, 1 reply; 146+ messages in thread
From: Marcelo Tosatti @ 2004-08-24 18:20 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Udo A. Steinberg, Andrew Morton, torvalds, linux-kernel

On Tue, Aug 24, 2004 at 05:41:07PM +1000, Nick Piggin wrote:
> Udo A. Steinberg wrote:
> 
> >On Fri, 20 Aug 2004 17:49:55 +1000 Nick Piggin (NP) wrote:
> >
> >NP> Can you reproduce the OOM with the following patch please? Then
> >NP> send the output.
> >
> >I reproduced the problem using a slightly different setup to trigger the
> >problem faster:  128 MB RAM, 188992 KB swap
> >
> >Here's the output of the OOM killer with your patch applied:
> >
> >oom-killer: gfp_mask=0x1d2
> >DMA per-cpu:
> >cpu 0 hot: low 2, high 6, batch 1
> >cpu 0 cold: low 0, high 2, batch 1
> >Normal per-cpu:
> >cpu 0 hot: low 14, high 42, batch 7
> >cpu 0 cold: low 0, high 14, batch 7
> >HighMem per-cpu: empty
> >
> >Free pages:        1316kB (0kB HighMem)
> >Active:5281 inactive:23611 dirty:0 writeback:0 unstable:0 free:329 
> >slab:1403 mapped:12232 pagetables:167
> >DMA free:712kB min:44kB low:88kB high:132kB active:5076kB inactive:5332kB 
> >present:16384kB pages_scanned:10112 all_unreclaimable? yes
> >protections[]: 22 178 178
> >Normal free:604kB min:312kB low:624kB high:936kB active:16048kB 
> >inactive:89112kB present:114688kB pages_scanned:62432 all_unreclaimable? 
> >yes
> >protections[]: 0 156 156
> >HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB 
> >present:0kB pages_scanned:0 all_unreclaimable? no
> >protections[]: 0 0 0
> >DMA: 0*4kB 3*8kB 13*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 
> >0*2048kB 0*4096kB = 712kB
> >Normal: 1*4kB 1*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 
> >0*2048kB 0*4096kB = 604kB
> >HighMem: empty
> >Swap cache: add 90886, delete 74524, find 4659/4974, race 0+0
> >Out of Memory: Killed process 1217 (gphoto2).
> >
> >

Hi Nick,

> OK, all_unreclaimable caused the scanner to virtually stop. If
> all_unreclaimable gets set, it throttles the scanning of that zone right
> back, which in turn greatly lowers the chance that all_unreclaimable
> will get cleared.

That is the logic that stops tasks from shrink_zone()ing zones
which are known to be heavily scanned by kswapd
(ie zone->pages_scanned > zone->present_pages * 2).

With that logic we want tasks doing direct free to
blk_congestion_wait(WRITE, HZ/10) instead of shrink_zone()ing
(and blk_congestion_wait(WRITE, HZ/50) in __alloc_pages()).

I don't fully understand the all_unreclaimable logic yet. AFAICS it was
added to prevent tasks from wasting excessive CPU time on shrinking
the lists.

But at the same time it stops tasks from potentially throttling on IO
(on shrink_list -> pageout). Is that a feature?

> When we get to priority = 0 in try_to_free_pages (ie. close to OOM), it
> might be worth clearing each zone's all_unreclaimable for this last time
> 'round the loop.

Or ignore all_unreclaimable when priority == 0 like this?

It feels hackish to me but will effectively work as clearing all_unreclaimable
on zero priority.

Against 2.6.9-rc1-bktoday.

Udo, one question: do you have swap space available when the OOM killer triggers?
I don't remember seeing any info wrt that.

--- mm/vmscan.c.orig	2004-08-24 16:48:09.467086840 -0300
+++ mm/vmscan.c	2004-08-24 16:51:55.304754296 -0300
@@ -878,7 +878,8 @@
 		if (zone->prev_priority > sc->priority)
 			zone->prev_priority = sc->priority;
 
-		if (zone->all_unreclaimable && sc->priority != DEF_PRIORITY)
+		if (zone->all_unreclaimable && 
+				(sc->priority < DEF_PRIORITY && sc->priority > 0))
 			continue;	/* Let kswapd poll it */
 
 		shrink_zone(zone, sc);
@@ -1054,7 +1055,8 @@
 		for (i = 0; i <= end_zone; i++) {
 			struct zone *zone = pgdat->node_zones + i;
 
-			if (zone->all_unreclaimable && priority != DEF_PRIORITY)
+			if (zone->all_unreclaimable && 
+					(priority < DEF_PRIORITY && priority > 0))
 				continue;
 
 			if (nr_pages == 0) {	/* Not software suspend */
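
To see what the two hunks above change, the old and new skip conditions can be modelled side by side. This is a sketch only: it assumes DEF_PRIORITY is 12, as in 2.6-era include/linux/mmzone.h, and that the priority counts down from DEF_PRIORITY to 0. The function names are made up for illustration.

```c
#include <stdbool.h>

/* Assumed value; 2.6-era kernels define DEF_PRIORITY as 12. */
#define DEF_PRIORITY 12

/*
 * Skip test before the patch: a zone flagged all_unreclaimable is
 * scanned only on the first (DEF_PRIORITY) pass.
 */
static bool skip_zone_old(bool all_unreclaimable, int priority)
{
	return all_unreclaimable && priority != DEF_PRIORITY;
}

/*
 * Skip test after the patch: the flagged zone is also scanned on the
 * final pass, when priority has dropped to 0.
 */
static bool skip_zone_new(bool all_unreclaimable, int priority)
{
	return all_unreclaimable &&
	       (priority < DEF_PRIORITY && priority > 0);
}
```

Under those assumptions the two tests agree for every priority from 1 to DEF_PRIORITY and differ only at priority 0, where the patched version lets shrink_zone() run one last time before the allocation gives up.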

* Re: Possible dcache BUG
  2004-08-24 20:00                                             ` Andrew Morton
@ 2004-08-24 18:40                                               ` Marcelo Tosatti
  2004-08-25  0:27                                                 ` Marcelo Tosatti
  0 siblings, 1 reply; 146+ messages in thread
From: Marcelo Tosatti @ 2004-08-24 18:40 UTC (permalink / raw)
  To: Andrew Morton; +Cc: nickpiggin, us15, torvalds, linux-kernel

On Tue, Aug 24, 2004 at 01:00:27PM -0700, Andrew Morton wrote:
> Marcelo Tosatti <marcelo.tosatti@cyclades.com> wrote:
> >
> > I dont fully understand the all_unreclaimable logic yet.
> 
> 1) bk revtool include/linux/mmzone.h
> 2) double-click on declaration of all_unreclaimable
> 3) read changelog ;)

Will do, but my question is still unanswered: why stop IO throttling when
all_unreclaimable is set?

Doesn't make sense to me right now.

OK, will RTFS.

> 
> > --- mm/vmscan.c.orig	2004-08-24 16:48:09.467086840 -0300
> > +++ mm/vmscan.c	2004-08-24 16:51:55.304754296 -0300
> > @@ -878,7 +878,8 @@
> >  		if (zone->prev_priority > sc->priority)
> >  			zone->prev_priority = sc->priority;
> >  
> > -		if (zone->all_unreclaimable && sc->priority != DEF_PRIORITY)
> > +		if (zone->all_unreclaimable && 
> > +				(sc->priority < DEF_PRIORITY && sc->priority > 0))
> >  			continue;	/* Let kswapd poll it */
> >  
> >  		shrink_zone(zone, sc);
> > @@ -1054,7 +1055,8 @@
> >  		for (i = 0; i <= end_zone; i++) {
> >  			struct zone *zone = pgdat->node_zones + i;
> >  
> > -			if (zone->all_unreclaimable && priority != DEF_PRIORITY)
> > +			if (zone->all_unreclaimable && 
> > +					(priority < DEF_PRIORITY && priority > 0))
> >  				continue;
> >  
> >  			if (nr_pages == 0) {	/* Not software suspend */
> 
> Does anyone understand _why_ all_unreclaimable is getting set?
> 
> If not, it's too early to be writing patches...

As I wrote in the first email, kswapd does

                        if (zone->pages_scanned > zone->present_pages * 2)
                                zone->all_unreclaimable = 1;

Sure, it makes perfect sense for that to happen when we can't reclaim pages
from the zone.

It's not something hard to understand. What is your point?

I suppose your question is not "_why_ all_unreclaimable is getting set?" but 
"maybe it should not be getting set?". 

Anyway, will RTFS.
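
The kswapd test quoted above can be checked against the numbers in Udo's OOM report earlier in the thread. The sketch below assumes 4 kB pages when converting the "present:" figure from kB to pages; the function name is hypothetical.

```c
#include <stdbool.h>

/*
 * Model of the quoted kswapd condition.  present_kb is the "present:"
 * figure from an OOM report; 4 kB pages are assumed for the conversion.
 * (Hypothetical helper, not kernel code.)
 */
static bool would_set_all_unreclaimable(unsigned long pages_scanned,
					unsigned long present_kb)
{
	unsigned long present_pages = present_kb / 4;

	return pages_scanned > present_pages * 2;
}
```

For Udo's DMA zone (present 16384 kB, pages_scanned 10112) the threshold is 8192 pages, and for his Normal zone (present 114688 kB, pages_scanned 62432) it is 57344 pages. Both zones are past the threshold, which agrees with the "all_unreclaimable? yes" in his output.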

* Re: Possible dcache BUG
  2004-08-24 18:20                                           ` Marcelo Tosatti
@ 2004-08-24 20:00                                             ` Andrew Morton
  2004-08-24 18:40                                               ` Marcelo Tosatti
  0 siblings, 1 reply; 146+ messages in thread
From: Andrew Morton @ 2004-08-24 20:00 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: nickpiggin, us15, torvalds, linux-kernel

Marcelo Tosatti <marcelo.tosatti@cyclades.com> wrote:
>
> I dont fully understand the all_unreclaimable logic yet.

1) bk revtool include/linux/mmzone.h
2) double-click on declaration of all_unreclaimable
3) read changelog ;)

> --- mm/vmscan.c.orig	2004-08-24 16:48:09.467086840 -0300
> +++ mm/vmscan.c	2004-08-24 16:51:55.304754296 -0300
> @@ -878,7 +878,8 @@
>  		if (zone->prev_priority > sc->priority)
>  			zone->prev_priority = sc->priority;
>  
> -		if (zone->all_unreclaimable && sc->priority != DEF_PRIORITY)
> +		if (zone->all_unreclaimable && 
> +				(sc->priority < DEF_PRIORITY && sc->priority > 0))
>  			continue;	/* Let kswapd poll it */
>  
>  		shrink_zone(zone, sc);
> @@ -1054,7 +1055,8 @@
>  		for (i = 0; i <= end_zone; i++) {
>  			struct zone *zone = pgdat->node_zones + i;
>  
> -			if (zone->all_unreclaimable && priority != DEF_PRIORITY)
> +			if (zone->all_unreclaimable && 
> +					(priority < DEF_PRIORITY && priority > 0))
>  				continue;
>  
>  			if (nr_pages == 0) {	/* Not software suspend */

Does anyone understand _why_ all_unreclaimable is getting set?

If not, it's too early to be writing patches...

* Re: Possible dcache BUG
  2004-08-24 18:40                                               ` Marcelo Tosatti
@ 2004-08-25  0:27                                                 ` Marcelo Tosatti
  0 siblings, 0 replies; 146+ messages in thread
From: Marcelo Tosatti @ 2004-08-25  0:27 UTC (permalink / raw)
  To: Andrew Morton; +Cc: nickpiggin, us15, torvalds, linux-kernel

> I suppose your question is not "_why_ all_unreclaimable is getting set?" but 
> "maybe it should not be getting set?". 

Now I realize both are the same. Doh.

> Anyway, will RTFS.

Done some tests and I can only get zone->all_unreclaimable to be set near
OOM condition, as expected. 

Udo, can you please confirm you are not hitting lack of swap space by applying
the attached patch (which contains Nick's patch) on top of 2.6.9-rc1. 

I've found a different bug, however: on a 512MB box with 512MB swap running 2.6.9-rc1,
the OOM killer triggers and kills a task while swap space is still available (the task
in question is quintela's fillmem). I can only make it happen after having the OOM
killer trigger for real, ie:

- run fillmem 1024

setting all_unreclaimable!!
setting all_unreclaimable!!
setting all_unreclaimable!!

oom-killer: gfp_mask=0x1d2
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
HighMem per-cpu: empty
 
Free pages:        2808kB (0kB HighMem)
Active:63316 inactive:62992 dirty:0 writeback:0 unstable:0 free:702 slab:1051 mapped:126279 pagetables:287
DMA free:1440kB min:20kB low:40kB high:60kB active:5428kB inactive:5076kB present:16384kB pages_scanned:8416 all_unreclaimable? yes
protections[]: 10 360 360
Normal free:1368kB min:700kB low:1400kB high:2100kB active:247836kB inactive:246892kB present:507888kB pages_scanned:950688 all_unreclaimable? yes
protections[]: 0 350 350
protections[]: 0 350 350
HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 0*4kB 0*8kB 0*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1440kB
Normal: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1368kB
HighMem: empty
nr_free_swap_pages: 0
Swap cache: add 131105, delete 131105, find 16/28, race 0+0
Out of Memory: Killed process 933 (fillmem).

Perfect. Everything as expected.

- run fillmem 800:

setting all_unreclaimable!!
oom-killer: gfp_mask=0x1d2
DMA per-cpu:
cpu 0 hot: low 2, high 6, batch 1
cpu 0 cold: low 0, high 2, batch 1
Normal per-cpu:
cpu 0 hot: low 32, high 96, batch 16
cpu 0 cold: low 0, high 32, batch 16
HighMem per-cpu: empty
                                                                                
Free pages:        2808kB (0kB HighMem)
Active:126301 inactive:17 dirty:0 writeback:0 unstable:0 free:702 slab:1024 mapped:126333 pagetables:280
DMA free:1440kB min:20kB low:40kB high:60kB active:10508kB inactive:0kB present:16384kB pages_scanned:1000 all_unreclaimable? no
protections[]: 10 360 360
Normal free:1368kB min:700kB low:1400kB high:2100kB active:494696kB inactive:68kB present:507888kB pages_scanned:123 all_unreclaimable? no
protections[]: 0 350 350
HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 0*4kB 2*8kB 3*16kB 3*32kB 0*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1440kB
Normal: 0*4kB 1*8kB 1*16kB 0*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1368kB
HighMem: empty
nr_free_swap_pages: 12167
Swap cache: add 1320161, delete 1319682, find 291280/333316, race 0+0
Out of Memory: Killed process 1010 (fillmem).
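
For scale, the page counts in this report can be turned into kilobytes the same way show_free_areas() does with its K() helper. A sketch, assuming i386 with 4 kB pages (PAGE_SHIFT of 12); the wrapper function is only there for illustration.

```c
/* Assumed for i386: 4 kB pages, so PAGE_SHIFT is 12. */
#define PAGE_SHIFT 12

/* Pages to kilobytes, in the style of the K() helper in page_alloc.c. */
#define K(x) ((x) << (PAGE_SHIFT - 10))

/* Illustrative wrapper so the conversion can be called as a function. */
static unsigned long pages_to_kb(unsigned long pages)
{
	return K(pages);
}
```

With that conversion, free:702 is the 2808 kB on the "Free pages:" line, and nr_free_swap_pages: 12167 works out to 48668 kB, so roughly 47 MB of swap was still free when fillmem was killed.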

Oops. That's really bad.

Will see if I discover something while trying to understand the
fine source tomorrow morning. Maybe someone can figure out what's
wrong before I try to...

Bed time.

[-- Attachment #2: vm-reclaim2.patch --]
[-- Type: text/plain, Size: 1331 bytes --]

--- mm/page_alloc.c.orig	2004-08-24 20:37:53.000000000 -0300
+++ mm/page_alloc.c	2004-08-24 22:51:49.498375608 -0300
@@ -1021,11 +1021,12 @@
 void show_free_areas(void)
 {
 	struct page_state ps;
-	int cpu, temperature;
+	int cpu, temperature, i;
 	unsigned long active;
 	unsigned long inactive;
 	unsigned long free;
 	struct zone *zone;
+	unsigned int swap_pages = 0;
 
 	for_each_zone(zone) {
 		show_node(zone);
@@ -1086,6 +1087,8 @@
 			" active:%lukB"
 			" inactive:%lukB"
 			" present:%lukB"
+			" pages_scanned:%lu"
+			" all_unreclaimable? %s"
 			"\n",
 			zone->name,
 			K(zone->free_pages),
@@ -1094,7 +1097,9 @@
 			K(zone->pages_high),
 			K(zone->nr_active),
 			K(zone->nr_inactive),
-			K(zone->present_pages)
+			K(zone->present_pages),
+			zone->pages_scanned,
+			(zone->all_unreclaimable ? "yes" : "no")
 			);
 		printk("protections[]:");
 		for (i = 0; i < MAX_NR_ZONES; i++)
@@ -1125,6 +1130,18 @@
 		printk("= %lukB\n", K(total));
 	}
 
+	swap_list_lock();
+	for (i = 0; i < nr_swapfiles; i++) {
+		if (!(swap_info[i].flags & SWP_USED) ||
+		     (swap_info[i].flags & SWP_WRITEOK))
+                       continue;
+		swap_pages += swap_info[i].inuse_pages;
+	}
+	swap_pages += nr_swap_pages;
+	swap_list_unlock();
+
+	printk("nr_free_swap_pages: %u\n", swap_pages);
+
 	show_swap_cache_info();
 }
 

* Re: Possible dcache BUG
  2004-08-24  3:08                                                                     ` Gene Heskett
@ 2004-08-25  1:49                                                                       ` Tom Vier
  2004-08-25  2:33                                                                         ` Gene Heskett
  2004-08-25  6:13                                                                         ` Denis Vlasenko
  0 siblings, 2 replies; 146+ messages in thread
From: Tom Vier @ 2004-08-25  1:49 UTC (permalink / raw)
  To: Gene Heskett; +Cc: linux-kernel

On Mon, Aug 23, 2004 at 11:08:41PM -0400, Gene Heskett wrote:
> >are you translating virt->phys?
> 
> No, this is straight out of the memburn output (after I'd fixed the 

that's weird that you're finding that pattern in virtual addresses. i
wouldn't expect that. even if you're booting to single user, certain
variables might change during boot and cause different physical pages to be
mapped. maybe single user is more deterministic than i think, though.

-- 
Tom Vier <tmv@comcast.net>
DSA Key ID 0x15741ECE

* Re: Possible dcache BUG
  2004-08-25  1:49                                                                       ` Tom Vier
@ 2004-08-25  2:33                                                                         ` Gene Heskett
  2004-08-25 14:55                                                                           ` Martin J. Bligh
  2004-08-27 14:01                                                                           ` Gene Heskett
  2004-08-25  6:13                                                                         ` Denis Vlasenko
  1 sibling, 2 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-25  2:33 UTC (permalink / raw)
  To: linux-kernel, Tom Vier

On Tuesday 24 August 2004 21:49, Tom Vier wrote:
>On Mon, Aug 23, 2004 at 11:08:41PM -0400, Gene Heskett wrote:
>> >are you translating virt->phys?
>>
>> No, this is straight out of the memburn output (after I'd fixed
>> the
>
>that's weird that you're finding that pattern in virtual addresses.
> i wouldn't expect that. even if you're booting to single user,
> certain variables might change during boot and cause different
> physical pages to be mapped. maybe single user is more
> deterministic than i think, though.

Well, FWIW, and not knowing a hell of a lot about it, I would assume 
(there's *that* word again) that even the virtual addresses would be 
long word aligned with reality even if otherwise totally bogus.  I 
mean you'd really have to go out of your way to make it otherwise on 
x86 hardware wouldn't you?

ATM I'm running on one stick, with memburn hacking away at 128 megs 
worth of it, Passed round 5683, elapsed 23530.67 at the moment.  And 
about 100 megs into swap, darnit.  And it isn't running anything else 
unusual, x/kde/kmail/mozilla & an occasional game of sol.

If it runs till tomorrow morning, I'll assume this stick is good, and 
put the other one in the same socket for a similar test.  If it 
passes, then I try the other socket one stick at a time, but first I 
have to get my finger healed up, I somehow drew a bit of blood on the 
end of my little finger using it to lever open the socket clips the 
last time.  A nasty little paper cut type slice I never felt happen 
till I saw the blood.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

* Re: Possible dcache BUG
  2004-08-25  1:49                                                                       ` Tom Vier
  2004-08-25  2:33                                                                         ` Gene Heskett
@ 2004-08-25  6:13                                                                         ` Denis Vlasenko
  2004-08-29 13:48                                                                           ` Gene Heskett
  1 sibling, 1 reply; 146+ messages in thread
From: Denis Vlasenko @ 2004-08-25  6:13 UTC (permalink / raw)
  To: Tom Vier, Gene Heskett; +Cc: linux-kernel

On Wednesday 25 August 2004 04:49, Tom Vier wrote:
> On Mon, Aug 23, 2004 at 11:08:41PM -0400, Gene Heskett wrote:
> > >are you translating virt->phys?
> >
> > No, this is straight out of the memburn output (after I'd fixed the
>
> that's weird that you're finding that pattern in virtual addresses. i
> wouldn't expect that. even if you're booting to single user, certain
> variables might change during boot and cause different physical pages to be
> mapped. maybe single user is more deterministic than i think, though.

On x86, pages are aligned at 4k. Lower 12 bits of virtual address
match lower 12 bits of corresponding real address.

So, yes, if you hit bad RAM cell, you see random virtual address, but
three last digits of it (in hex) must be the same.
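
The rule above can be sketched in a few lines of C. This is only an illustration under the stated assumption of 4 KiB pages; the names OFFSET_MASK, page_offset and translate are made up for this example and are not kernel interfaces.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, so the low 12 bits are the in-page offset. */
#define PAGE_SHIFT   12u
#define PAGE_SIZE    (1u << PAGE_SHIFT)
#define OFFSET_MASK  (PAGE_SIZE - 1u)      /* 0xfff */

/* Offset within the page. Translation never changes these bits, so the
 * value is the same for a virtual address and its physical address. */
uint32_t page_offset(uint32_t addr)
{
    return addr & OFFSET_MASK;
}

/* Hypothetical translation step: take the physical frame number that the
 * page tables would supply and append the offset from the virtual address. */
uint32_t translate(uint32_t vaddr, uint32_t phys_frame)
{
    return (phys_frame << PAGE_SHIFT) | page_offset(vaddr);
}
```

Whatever frame number is plugged in, the last three hex digits of the result equal those of the virtual address, which is why a bad cell shows a fixed low-order pattern even in virtual addresses.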
--
vda

* Re: Possible dcache BUG
  2004-08-25  2:33                                                                         ` Gene Heskett
@ 2004-08-25 14:55                                                                           ` Martin J. Bligh
  2004-08-25 17:23                                                                             ` Ryan Cumming
  2004-08-27 14:01                                                                           ` Gene Heskett
  1 sibling, 1 reply; 146+ messages in thread
From: Martin J. Bligh @ 2004-08-25 14:55 UTC (permalink / raw)
  To: linux-kernel

This whole thread makes me think ... if we oops, shouldn't we check if
we're holding any spinlocks or semaphores, and just panic the whole
machine if so? Not sure how expensive it would be to hold that state,
but ...

M.


* Re: Possible dcache BUG
  2004-08-25 14:55                                                                           ` Martin J. Bligh
@ 2004-08-25 17:23                                                                             ` Ryan Cumming
  2004-08-25 17:36                                                                               ` Martin J. Bligh
  0 siblings, 1 reply; 146+ messages in thread
From: Ryan Cumming @ 2004-08-25 17:23 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel

On Wednesday 25 August 2004 07:55, Martin J. Bligh wrote:
> This whole thread makes me think ... if we oops, shouldn't we check if
> we're holding any spinlocks or semaphores, and just panic the whole
> machine if so? Not sure how expensive it would be to hold that state,
> but ...

On preempt, wouldn't it just be a matter of checking preempt_count?

-Ryan

* Re: Possible dcache BUG
  2004-08-25 17:23                                                                             ` Ryan Cumming
@ 2004-08-25 17:36                                                                               ` Martin J. Bligh
  0 siblings, 0 replies; 146+ messages in thread
From: Martin J. Bligh @ 2004-08-25 17:36 UTC (permalink / raw)
  To: Ryan Cumming; +Cc: linux-kernel

--Ryan Cumming <ryan@spitfire.gotdns.org> wrote (on Wednesday, August 25, 2004 10:23:29 -0700):

> On Wednesday 25 August 2004 07:55, Martin J. Bligh wrote:
>> This whole thread makes me think ... if we oops, shouldn't we check if
>> we're holding any spinlocks or semaphores, and just panic the whole
>> machine if so? Not sure how expensive it would be to hold that state,
>> but ...
> 
> On preempt, wouldn't it just be a matter of checking preempt_count?

Spinlocks, with or without preempt, can probably do something like this.
But I don't think that works for sems.

M.


* Re: Possible dcache BUG
  2004-08-25  2:33                                                                         ` Gene Heskett
  2004-08-25 14:55                                                                           ` Martin J. Bligh
@ 2004-08-27 14:01                                                                           ` Gene Heskett
  1 sibling, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-27 14:01 UTC (permalink / raw)
  To: linux-kernel; +Cc: Tom Vier

On Tuesday 24 August 2004 22:33, Gene Heskett wrote:
(going for the longest running thread on lkml)
>On Tuesday 24 August 2004 21:49, Tom Vier wrote:
>>On Mon, Aug 23, 2004 at 11:08:41PM -0400, Gene Heskett wrote:
>>> >are you translating virt->phys?
>>>
>>> No, this is straight out of the memburn output (after I'd fixed
>>> the
>>
>>that's weird that you're finding that pattern in virtual addresses.
>> i wouldn't expect that. even if you're booting to single user,
>> certain variables might change during boot and cause different
>> physical pages to be mapped. maybe single user is more
>> deterministic than i think, though.
>
>Well, FWIW, and not knowing a hell of a lot about it, I would assume
>(there's *that* word again) that even the virtual addresses would be
>long word aligned with reality even if otherwise totally bogus.  I
>mean you'd really have to go out of your way to make it otherwise on
>x86 hardware wouldn't you?
>
>ATM I'm running on one stick, with memburn hacking away at 128 megs
>worth of it, Passed round 5683, elapsed 23530.67 at the moment.  And
>about 100 megs into swap, darnit.  And it isn't running anything
> else unusual, x/kde/kmail/mozilla & an occasional game of sol.
>
>If it runs till tomorrow morning, I'll assume this stick is good,
> and put the other one in the same socket for a similar test.  If it
> passes, then I try the other socket one stick at a time.

Ok, I've now shuffled both sticks thru both "B" sockets on this mobo, 
and neither one could run memburn more than 20 minutes, and again, 
the errors are all in the xx of nnnnxxnn in hex display formats.
So, I've put both sticks back in, in the A and B2 sockets ATM.  That 
ran about 25 minutes before memburn got a tummy ache.  In the 
meantime I'd rebuilt 2.6.9-rc1-mm1 with hi-mem support again, and the 
last reboot I took a detour thru the bios and turned the memory 
voltage up 100mv to 2.6 volts.  Running memburn against 400 megs of 
it has now been running for around 40 minutes.

So, my question for the hardware folks is: What's the proper voltage to 
run a bank of DDR400 dimms in Dual Channel mode?

Humm, I spoke too soon, memburn has exited, with this:
Ahh, fudge, one cannot copy/paste from a virtual term.  Suffice to say 
that the address is odd, that the error is in the usual 3rd byte of 4 
position within the 32 bit read.
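
For anyone wanting to check the "same byte every time" pattern mechanically, a rough sketch follows. It assumes you have the expected and the read-back 32-bit words in hand; memburn's own output format is not shown in this thread, and the function name bad_byte_lanes is made up for the example.

```c
#include <stdint.h>

/* Returns a 4-bit mask. Bit i is set when byte lane i differs between the
 * expected and the observed word (lane 0 = least significant byte).
 * In an nnnnxxnn hex display the xx digits are lane 1, i.e. mask 0x2. */
unsigned bad_byte_lanes(uint32_t expected, uint32_t observed)
{
    uint32_t diff = expected ^ observed;   /* set bits mark mismatches */
    unsigned lanes = 0;

    for (unsigned i = 0; i < 4; i++) {
        if ((diff >> (8u * i)) & 0xffu)
            lanes |= 1u << i;
    }
    return lanes;
}
```

If every failure reports the same single-lane mask, that points at one byte-wide slice of the data path rather than at random corruption.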

Since it appears that TCWO is gone, bankrupt or whatever, and I like 
the features of this board otherwise, can someone suggest how to go 
about getting a warranty replacement direct from Biostar?  I'll go 
visit the web page again, but I don't recall seeing any suitable 
links the last time I was there checking on updated bios files.


-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

* Re: Possible dcache BUG
  2004-08-25  6:13                                                                         ` Denis Vlasenko
@ 2004-08-29 13:48                                                                           ` Gene Heskett
  2004-08-29 14:34                                                                             ` Possible dcache BUG [u] Martin Schlemmer [c]
  2004-08-29 15:21                                                                             ` Possible dcache BUG Rafael J. Wysocki
  0 siblings, 2 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-29 13:48 UTC (permalink / raw)
  To: linux-kernel; +Cc: Denis Vlasenko, Tom Vier

On Wednesday 25 August 2004 02:13, Denis Vlasenko wrote:
>On Wednesday 25 August 2004 04:49, Tom Vier wrote:
>> On Mon, Aug 23, 2004 at 11:08:41PM -0400, Gene Heskett wrote:
>> > >are you translating virt->phys?
>> >
>> > No, this is straight out of the memburn output (after I'd fixed
>> > the
>>
>> that's weird that you're finding that pattern in virtual
>> addresses. i wouldn't expect that. even if you're booting to
>> single user, certain variables might change during boot and cause
>> different physical pages to be mapped. maybe single user is more
>> deterministic than i think, though.
>
>On x86, pages are aligned at 4k. Lower 12 bits of virtual address
>match lower 12 bits of corresponding real address.
>
>So, yes, if you hit bad RAM cell, you see random virtual address,
> but three last digits of it (in hex) must be the same.

I think, based on the last 25 hours of running both memburn and 
setiathome at a -nice 19, and there have been no errors, that I might 
have stumbled onto a fix.

It seems the dram is marked DDR400, so I was trying to run it that 
way.  Unforch, on checking the invoice for the umpteenth time, it 
finally dawned on me that this particular AMD 2800XP is supposedly a 
333mhz FSB chip, and not rated for use with DDR400 memory.  Switching 
the bios setting for the memory to 'auto' from 'spd' seems to effect 
this particular item, and the memory now signs in as DDR333 Dual 
Channel.

And after 25 hours, no errors, nothing unusual in the logs.

I guess I should go paint my face with egg or something...  My 
apologies to those who spent a considerable amount of time and brain 
power auditing code because of my stupidity.

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

* Re: Possible dcache BUG [u]
  2004-08-29 13:48                                                                           ` Gene Heskett
@ 2004-08-29 14:34                                                                             ` Martin Schlemmer [c]
  2004-08-29 15:21                                                                             ` Possible dcache BUG Rafael J. Wysocki
  1 sibling, 0 replies; 146+ messages in thread
From: Martin Schlemmer [c] @ 2004-08-29 14:34 UTC (permalink / raw)
  To: gene.heskett; +Cc: Linux Kernel Mailing Lists, Denis Vlasenko, Tom Vier

On Sun, 2004-08-29 at 15:48, Gene Heskett wrote:

> I think, based on the last 25 hours of running both memburn and 
> setiathome at a -nice 19, and there have been no errors, that I might 
> have stumbled onto a fix.
> 
> It seems the dram is marked DDR400, so I was trying to run it that 
> way.  Unforch, on checking the invoice for the umpteenth time, it 
> finally dawned on me that this particular AMD 2800XP is supposedly a 
> 333mhz FSB chip, and not rated for use with DDR400 memory.  Switching 
> the bios setting for the memory to 'auto' from 'spd' seems to effect 
> this particular item, and the memory now signs in as DDR333 Dual 
> Channel.
> 
> And after 25 hours, no errors, nothing unusual in the logs.
> 

I work for a supplier here in ZA, and out of experience memory
compatibility can be a vast gray area.

For instance:
1) You have exactly the same Chipset (say nforce2 400's or whatever),
   but different vendors that assemble the board (say Asus/MSI and
   Gigabyte).  You take PC3200 CL3 sticks, and they work fine on the
   Asus and MSI, but don't work on the Gigabyte (only one of the long
   list of memory issues Gigabyte boards have - in my experience). It
   has a lot to do with how the vendor does the timings, etc.
2) You have 4 sticks of Hynix memory, all four have the exact same chips
   on.  Two have the older pcboard layout, and the other two have the
   newer.  The older ones give intermittent issues on D865GBR (Bayfield
   boards - can't remember the exact code) if you try to run them in
   dual channel mode, but work fine with only one stick.  The board
   works fine in dual channel mode with the new revision pcb sticks.
3) P4 SiS chipsets have a bad habit of only running two sticks together
   (non-dual channel chipsets ... 645, 650, 651, with identical sticks)
   if you clock the memory down to the bus of the cpu (400mhz cpu
   only runs fine with memory at 200 fsb, and 533 with memory at 266 -
   remember, it's the true speed of the cpu/memory, not the '4x pumped'
   one Intel likes to advertise with, or the 'double data rate' speed
   memory is advertised with).  With a single chip, it usually runs fine
   at 333mhz on 533mhz fsb cpu - can't remember with 400mhz cpu.


Those were just some examples to show that vendor/revision/config can make
a huge difference, and lots of headaches.  In your case, here are a few
points you could look at.

In general the boards I worked with, worked fine with a 333fsb cpu,
running memory at 400mhz.  Last I checked, these might be issues:
1) All nforce2 chipsets had a certain errata that caused timing issues
   with ddr400 memory with a CL latency of 2.  You typically had to
   downclock the memory to 333mhz, or set the CL latency up to 2.5 or
   3. Good example is the popular Kingston Hyper-X sticks.  I am not
   sure if they might have sorted it out on later chipsets.
2) Hynix memory typically did not work too great, especially in dual
   channel mode.  The best memory to use was usually the Samsung PC3200
   CL3 ones if you did not want to fork too much (except if you had some
   of the Gigabyte boards customers brought to us when they only got the
   memory from us - do they ever learn not to shop around if it comes to
   board and memory?)
3) *sometimes* a bios update did help.


Anyhow, just a few things I ran into that you might have a look at -
sorry it's a bit late in this thread.


-- 
Martin Schlemmer

* Re: Possible dcache BUG
  2004-08-29 13:48                                                                           ` Gene Heskett
  2004-08-29 14:34                                                                             ` Possible dcache BUG [u] Martin Schlemmer [c]
@ 2004-08-29 15:21                                                                             ` Rafael J. Wysocki
  2004-08-29 17:23                                                                               ` Denis Vlasenko
  1 sibling, 1 reply; 146+ messages in thread
From: Rafael J. Wysocki @ 2004-08-29 15:21 UTC (permalink / raw)
  To: gene.heskett, linux-kernel

On Sunday 29 of August 2004 15:48, Gene Heskett wrote:
> On Wednesday 25 August 2004 02:13, Denis Vlasenko wrote:
> >On Wednesday 25 August 2004 04:49, Tom Vier wrote:
> >> On Mon, Aug 23, 2004 at 11:08:41PM -0400, Gene Heskett wrote:
> >> > >are you translating virt->phys?
> >> >
> >> > No, this is straight out of the memburn output (after I'd fixed
> >> > the
> >>
> >> that's weird that you're finding that pattern in virtual
> >> addresses. i wouldn't expect that. even if you're booting to
> >> single user, certain variables might change during boot and cause
> >> different physical pages to be mapped. maybe single user is more
> >> deterministic than i think, though.
> >
> >On x86, pages are aligned at 4k. Lower 12 bits of virtual address
> >match lower 12 bits of corresponding real address.
> >
> >So, yes, if you hit bad RAM cell, you see random virtual address,
> > but three last digits of it (in hex) must be the same.
>
> I think, based on the last 25 hours of running both memburn and
> setiathome at a -nice 19, and there have been no errors, that I might
> have stumbled onto a fix.
>
> It seems the dram is marked DDR400, so I was trying to run it that
> way.  Unforch, on checking the invoice for the umpteenth time, it
> finally dawned on me that this particular AMD 2800XP is supposedly a
> 333mhz FSB chip, and not rated for use with DDR400 memory.  Switching
> the bios setting for the memory to 'auto' from 'spd' seems to effect
> this particular item, and the memory now signs in as DDR333 Dual
> Channel.
>
> And after 25 hours, no errors, nothing unusual in the logs.
>
> I guess I should go paint my face with egg or something...

Not necessarily.  :-)  Some mobos based on the nforce2 chipsets should be able 
to clock FSB and memory asynchronously.   The very fact that you can set the 
memory clock separately in the BIOS indicates that your mobo is one of these.  
So, if it runs well at synchronous FSB and memory clock rates, but causes 
problems otherwise, the northbridge is probably fishy.  Or the memory is not 
up to the spec.  Anyway, the symptoms are quite "interesting" and it's good 
to know what they are.

Regards,
RJW

-- 
For a successful technology, reality must take precedence over public 
relations, for nature cannot be fooled.
					-- Richard P. Feynman

* Re: Possible dcache BUG
  2004-08-29 15:21                                                                             ` Possible dcache BUG Rafael J. Wysocki
@ 2004-08-29 17:23                                                                               ` Denis Vlasenko
  2004-08-29 22:25                                                                                 ` Gene Heskett
  0 siblings, 1 reply; 146+ messages in thread
From: Denis Vlasenko @ 2004-08-29 17:23 UTC (permalink / raw)
  To: Rafael J. Wysocki, gene.heskett, linux-kernel

> > I think, based on the last 25 hours of running both memburn and
> > setiathome at a -nice 19, and there have been no errors, that I might
> > have stumbled onto a fix.
> >
> > It seems the dram is marked DDR400, so I was trying to run it that
> > way.  Unforch, on checking the invoice for the umpteenth time, it
> > finally dawned on me that this particular AMD 2800XP is supposedly a
> > 333mhz FSB chip, and not rated for use with DDR400 memory.  Switching
> > the bios setting for the memory to 'auto' from 'spd' seems to effect
> > this particular item, and the memory now signs in as DDR333 Dual
> > Channel.
> >
> > And after 25 hours, no errors, nothing unusual in the logs.
> >
> > I guess I should go paint my face with egg or something...
>
> Not necessarily.  :-)  Some mobos based on the nforce2 chipsets should be
> able to clock FSB and memory asynchronously.   The very fact that you can
> set the memory clock separately in the BIOS indicates that your mobo is one
> of these. So, if it runs well at synchronous FSB and memory clock rates,
> but causes problems otherwise, the northbridge is probably fishy.  Or the
> memory is not up to the spec.  Anyway, the symptoms are quite "interesting"
> and it's good to know what they are.

The best thing is, we got another RAM test program which seems to be better
than memtest86 in some cases!
--
vda


* Re: Possible dcache BUG
  2004-08-29 17:23                                                                               ` Denis Vlasenko
@ 2004-08-29 22:25                                                                                 ` Gene Heskett
  0 siblings, 0 replies; 146+ messages in thread
From: Gene Heskett @ 2004-08-29 22:25 UTC (permalink / raw)
  To: linux-kernel; +Cc: Denis Vlasenko, Rafael J. Wysocki

On Sunday 29 August 2004 13:23, Denis Vlasenko wrote:
[...]
>> Not necessarily.  :-)  Some mobos based on the nforce2 chipsets
>> should be able to clock FSB and memory asynchronously.   The very
>> fact that you can set the memory clock separately in the BIOS
>> indicates that your mobo is one of these. So, if it runs well at
>> synchronous FSB and memory clock rates, but causes problems
>> otherwise, the northbridge is probably fishy.  Or the memory is
>> not up to the spec.  Anyway, the symptoms are quite "interesting"
>> and it's good to know what they are.

Take your pick, unless you'd rather use a shovel. :-)

The bios has provisions but the nforce2 chipset doesn't seem to want 
to tolerate what must be an occasional timing error on the write 
phase.  An inadequate amount of buffering available would be my best 
guess.  I don't believe the reads are defective in this case, just 
the writes go tits up on a very very narrow case that's only hit maybe 
once an hour.  That's damned hard for a logic analyzer to catch.

>The best thing is, we got another RAM test program which seems to be
> better than memtest86 in some cases!

I've been thinking of that myself, and I've come to the conclusion 
that because memtest86 probably doesn't know anything about an 
nforce2 chipset, it says right up front it's not using the cache.  And 
that may well be the key right there.  Turn off the cache and there's 
no problem.  I tried that here just for grins, but it turned the 
machine into a very sick dog, going from 8 or 9 seti units a day down 
to about 1.5, and everything else was swimming in cold molasses.  I 
could easily type a whole line ahead of kmail's display updates and 
I'm not a touch typist, topping out at maybe 10-15 wpm, not counting 
the time spent backing up and fixing typu's.  Ancient fingers don't 
always hit the key cleanly.

So you are correct in that memtest86 ground away on this machine for 
something like 36 hours total run time, and never found an error.  I 
fired up memburn and had an error within a half hour.  Therefore to 
me, it's proven to be a valuable tool, thank you Ville Herva.

>--
>vda

-- 
Cheers, Gene
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.24% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.

* Re: Possible dcache BUG
  2004-08-13  1:31                             ` Linus Torvalds
                                                 ` (2 preceding siblings ...)
  2004-08-20  7:02                               ` Udo A. Steinberg
@ 2004-09-12  7:03                               ` Udo A. Steinberg
  2004-09-12  7:16                                 ` Andrew Morton
  3 siblings, 1 reply; 146+ messages in thread
From: Udo A. Steinberg @ 2004-09-12  7:03 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel, Andrew Morton, Len Brown

On Thu, 12 Aug 2004 18:31:31 -0700 (PDT) Linus Torvalds (LT) wrote:

LT> Your slab usage seems to be:
LT> 
LT> 	cumulative	     usage	name
LT> 	=========	    ======	====
LT> 		.....
LT> 	  4,994,684	   499,712	size-8192
LT> 	  5,912,188	   917,504	size-32768
LT> 	105,397,820	99,485,632	size-64
LT> 
LT> Something pretty much stands out.
LT> 
LT> What the _heck_ is doing 64-byte allocations and leaking them?

I think the offender is ACPI. I've been logging 64-byte slab allocations
for a while now and this is what I came up with:

The most frequent user of 64-byte allocations is:

 [<c013e98f>] __kmalloc+0x6f/0x80
 [<c016649e>] sys_poll+0xbe/0x230
 [<c0165201>] sys_ioctl+0x101/0x2a0
 [<c0165940>] __pollwait+0x0/0xc0
 [<c011f00c>] sys_gettimeofday+0x2c/0x70
 [<c01040db>] syscall_call+0x7/0xb

But that doesn't seem to leak, because I've had these happen for days before
things started getting bad.

However, then as slab usage skyrocketed after 3 days, I started logging
these:

 [<c013e98f>] __kmalloc+0x6f/0x80
 [<c0217af9>] acpi_os_allocate+0xa/0xb
 [<c022b9b6>] acpi_ut_callocate+0x30/0x7a
 [<c022b840>] acpi_ut_acquire_from_cache+0x9d/0xaa
 [<c022c7d8>] acpi_ut_create_generic_state+0xa/0x12
 [<c021b0b2>] acpi_ds_result_stack_push+0x8/0x25
 [<c021b268>] acpi_ds_create_walk_state+0x53/0x70
 [<c0227913>] acpi_ps_delete_parse_tree+0x20/0x89
 [<c0227238>] acpi_ps_parse_loop+0x550/0x7bb
 [<c02274f0>] acpi_ps_parse_aml+0x4d/0x1a1
 [<c0219dd4>] acpi_ds_call_control_method+0xd3/0x1b3
 [<c0227505>] acpi_ps_parse_aml+0x62/0x1a1
 [<c0227d1f>] acpi_psx_execute+0x13b/0x194
 [<c0225212>] acpi_ns_execute_control_method+0x3b/0x47
 [<c02251c0>] acpi_ns_evaluate_by_handle+0x6f/0x86
 [<c02250cd>] acpi_ns_evaluate_relative+0xa9/0xc3
 [<c02249c3>] acpi_evaluate_object+0xf3/0x1a0
 [<c0160f56>] link_path_walk+0xbe6/0xe70
 [<c022f496>] acpi_battery_get_status+0x68/0x102
 [<c022f9b6>] acpi_battery_read_state+0x88/0x275
 [<c018124b>] proc_file_read+0xbb/0x250
 [<c0152ea1>] vfs_read+0xd1/0x130
 [<c0153171>] sys_read+0x41/0x70
 [<c01040db>] syscall_call+0x7/0xb

 [<c013e98f>] __kmalloc+0x6f/0x80       
 [<c0217af9>] acpi_os_allocate+0xa/0xb  
 [<c022b9b6>] acpi_ut_callocate+0x30/0x7a
 [<c022b840>] acpi_ut_acquire_from_cache+0x9d/0xaa
 [<c022c7d8>] acpi_ut_create_generic_state+0xa/0x12
 [<c0227a31>] acpi_ps_push_scope+0xf/0x57          
 [<c0227180>] acpi_ps_parse_loop+0x498/0x7bb
 [<c02274f0>] acpi_ps_parse_aml+0x4d/0x1a1  
 [<c0227d1f>] acpi_psx_execute+0x13b/0x194
 [<c0225212>] acpi_ns_execute_control_method+0x3b/0x47
 [<c02251c0>] acpi_ns_evaluate_by_handle+0x6f/0x86    
 [<c02250cd>] acpi_ns_evaluate_relative+0xa9/0xc3 
 [<c02249c3>] acpi_evaluate_object+0xf3/0x1a0    
 [<c022370f>] acpi_hw_low_level_read+0x56/0x94
 [<c0230949>] acpi_ec_gpe_query+0xd5/0xec     
 [<c0218098>] acpi_os_execute_deferred+0xc/0x16
 [<c012a43e>] worker_thread+0x1ae/0x270        
 [<c021808c>] acpi_os_execute_deferred+0x0/0x16
 [<c0117d70>] default_wake_function+0x0/0x10   
 [<c0117db7>] __wake_up_common+0x37/0x70    
 [<c0117d70>] default_wake_function+0x0/0x10
 [<c012a290>] worker_thread+0x0/0x270       
 [<c012e266>] kthread+0x96/0xe0         
 [<c012e1d0>] kthread+0x0/0xe0
 [<c010229d>] kernel_thread_helper+0x5/0x18

 [<c013e98f>] __kmalloc+0x6f/0x80
 [<c0217af9>] acpi_os_allocate+0xa/0xb
 [<c022b9b6>] acpi_ut_callocate+0x30/0x7a
 [<c022b840>] acpi_ut_acquire_from_cache+0x9d/0xaa
 [<c022c7d8>] acpi_ut_create_generic_state+0xa/0x12
 [<c0227a31>] acpi_ps_push_scope+0xf/0x57
 [<c0227180>] acpi_ps_parse_loop+0x498/0x7bb
 [<c02274f0>] acpi_ps_parse_aml+0x4d/0x1a1
 [<c0219dd4>] acpi_ds_call_control_method+0xd3/0x1b3
 [<c0227505>] acpi_ps_parse_aml+0x62/0x1a1
 [<c0227d1f>] acpi_psx_execute+0x13b/0x194
 [<c0225212>] acpi_ns_execute_control_method+0x3b/0x47
 [<c02251c0>] acpi_ns_evaluate_by_handle+0x6f/0x86
 [<c02250cd>] acpi_ns_evaluate_relative+0xa9/0xc3
 [<c02249c3>] acpi_evaluate_object+0xf3/0x1a0
 [<c02185ff>] acpi_evaluate_integer+0x2d/0x4b
 [<c0146a17>] do_mmap_pgoff+0x537/0x710
 [<c0234048>] acpi_thermal_get_temperature+0x24/0x31
 [<c0234808>] acpi_thermal_temp_seq_show+0x12/0x4d
 [<c017125e>] seq_read+0xbe/0x280
 [<c0152ea1>] vfs_read+0xd1/0x130
 [<c0153171>] sys_read+0x41/0x70
 [<c01040db>] syscall_call+0x7/0xb

The machine is now allocating 64-byte slabs at about 20 objects per second.
I'm currently running 2.6.9-rc1-bk12 here.
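
To put that rate in perspective, here is a back-of-the-envelope helper. It assumes the rate stays constant and counts only the 64-byte payload, ignoring any slab bookkeeping overhead; the function name leaked_bytes is made up for the example.

```c
#include <stdint.h>

/* Bytes consumed after `seconds` when `objs_per_sec` objects of
 * `obj_size` bytes each are allocated and never freed. */
uint64_t leaked_bytes(uint64_t objs_per_sec, uint64_t obj_size,
                      uint64_t seconds)
{
    return objs_per_sec * obj_size * seconds;
}
```

With the figures above, leaked_bytes(20, 64, 86400) comes to 110,592,000 bytes, so roughly 110 MB per day of uptime if nothing is ever returned to the cache.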

-Udo.

* Re: Possible dcache BUG
  2004-09-12  7:03                               ` Udo A. Steinberg
@ 2004-09-12  7:16                                 ` Andrew Morton
  2004-09-12  7:29                                   ` Udo A. Steinberg
  0 siblings, 1 reply; 146+ messages in thread
From: Andrew Morton @ 2004-09-12  7:16 UTC (permalink / raw)
  To: Udo A. Steinberg; +Cc: torvalds, linux-kernel, len.brown

"Udo A. Steinberg" <us15@os.inf.tu-dresden.de> wrote:
>
>  However, then as slab usage went skyrocket after 3 days, I started logging
>  these:
> 
>   [<c013e98f>] __kmalloc+0x6f/0x80
>   [<c0217af9>] acpi_os_allocate+0xa/0xb
>   [<c022b9b6>] acpi_ut_callocate+0x30/0x7a
>   [<c022b840>] acpi_ut_acquire_from_cache+0x9d/0xaa
>   [<c022c7d8>] acpi_ut_create_generic_state+0xa/0x12
>   [<c021b0b2>] acpi_ds_result_stack_push+0x8/0x25
>   [<c021b268>] acpi_ds_create_walk_state+0x53/0x70
>   [<c0227913>] acpi_ps_delete_parse_tree+0x20/0x89
>   [<c0227238>] acpi_ps_parse_loop+0x550/0x7bb
>   [<c02274f0>] acpi_ps_parse_aml+0x4d/0x1a1
>   [<c0219dd4>] acpi_ds_call_control_method+0xd3/0x1b3
>   [<c0227505>] acpi_ps_parse_aml+0x62/0x1a1
>   [<c0227d1f>] acpi_psx_execute+0x13b/0x194
>   [<c0225212>] acpi_ns_execute_control_method+0x3b/0x47
>   [<c02251c0>] acpi_ns_evaluate_by_handle+0x6f/0x86
>   [<c02250cd>] acpi_ns_evaluate_relative+0xa9/0xc3
>   [<c02249c3>] acpi_evaluate_object+0xf3/0x1a0
>   [<c0160f56>] link_path_walk+0xbe6/0xe70
>   [<c022f496>] acpi_battery_get_status+0x68/0x102
>   [<c022f9b6>] acpi_battery_read_state+0x88/0x275
>   [<c018124b>] proc_file_read+0xbb/0x250
>   [<c0152ea1>] vfs_read+0xd1/0x130
>   [<c0153171>] sys_read+0x41/0x70
>   [<c01040db>] syscall_call+0x7/0xb

great, thanks for working that out.

Random guess: acpi_evaluate_object() is returning an error but is
allocating memory anyway.

In acpi_battery_get_status():

	status = acpi_evaluate_object(battery->handle, "_BST", NULL, &buffer);
	if (ACPI_FAILURE(status)) {
		ACPI_DEBUG_PRINT((ACPI_DB_ERROR, "Error evaluating _BST\n"));
		return_VALUE(-ENODEV);
	}

Is that failure path being taken?
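
For illustration only, the kind of leak being guessed at looks like this in plain userspace C. Nothing here is ACPI code: struct out_buffer, evaluate() and get_status() are hypothetical stand-ins, and whether the real acpi_evaluate_object() behaves this way is exactly the open question.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for a callee-allocated output buffer (hypothetical). */
struct out_buffer {
    size_t length;
    void  *pointer;   /* allocated by the callee, owned by the caller */
};

/* Hypothetical callee: allocates the result first, then may still fail. */
int evaluate(int should_fail, struct out_buffer *buf)
{
    buf->pointer = malloc(64);
    if (!buf->pointer) {
        buf->length = 0;
        return -1;
    }
    buf->length = 64;
    memset(buf->pointer, 0, buf->length);
    return should_fail ? -1 : 0;
}

/* Caller that releases the buffer on every path. Returning early on
 * failure without the free() would leak 64 bytes per failed call. */
int get_status(int should_fail)
{
    struct out_buffer buf = { 0, NULL };
    int status = evaluate(should_fail, &buf);

    if (status != 0) {
        free(buf.pointer);   /* free(NULL) is a harmless no-op */
        return -1;
    }
    /* ... consume buf.pointer here ... */
    free(buf.pointer);
    return 0;
}
```

If the failure path really is taken on every battery poll, a missing release there would be enough to account for a steady trickle of small allocations.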

* Re: Possible dcache BUG
  2004-09-12  7:16                                 ` Andrew Morton
@ 2004-09-12  7:29                                   ` Udo A. Steinberg
  2004-09-12  7:48                                     ` Andrew Morton
  0 siblings, 1 reply; 146+ messages in thread
From: Udo A. Steinberg @ 2004-09-12  7:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: torvalds, linux-kernel, len.brown

On Sun, 12 Sep 2004 00:16:26 -0700 Andrew Morton (AM) wrote:

AM> Random guess: acpi_evaluate_object() is returning an error but is
AM> allocating memory anyway.
AM> 
AM> In acpi_battery_get_status():
AM> 
AM> 	status = acpi_evaluate_object(battery->handle, "_BST", NULL, &buffer);
AM> 	if (ACPI_FAILURE(status)) {
AM> 		ACPI_DEBUG_PRINT((ACPI_DB_ERROR, "Error evaluating _BST\n"));
AM> 		return_VALUE(-ENODEV);
AM> 	}
AM> 
AM> Is that failure path being taken?

Is there a way for me to find that out without recompiling and rebooting?

-Udo.

* Re: Possible dcache BUG
  2004-09-12  7:29                                   ` Udo A. Steinberg
@ 2004-09-12  7:48                                     ` Andrew Morton
  2004-09-13  4:53                                       ` Len Brown
  0 siblings, 1 reply; 146+ messages in thread
From: Andrew Morton @ 2004-09-12  7:48 UTC (permalink / raw)
  To: Udo A. Steinberg; +Cc: torvalds, linux-kernel, len.brown

"Udo A. Steinberg" <us15@os.inf.tu-dresden.de> wrote:
>
> On Sun, 12 Sep 2004 00:16:26 -0700 Andrew Morton (AM) wrote:
> 
>  AM> Random guess: acpi_evaluate_object() is returning an error but is
>  AM> allocating memory anyway.
>  AM> 
>  AM> In acpi_battery_get_status():
>  AM> 
>  AM> 	status = acpi_evaluate_object(battery->handle, "_BST", NULL, &buffer);
>  AM> 	if (ACPI_FAILURE(status)) {
>  AM> 		ACPI_DEBUG_PRINT((ACPI_DB_ERROR, "Error evaluating _BST\n"));
>  AM> 		return_VALUE(-ENODEV);
>  AM> 	}
>  AM> 
>  AM> Is that failure path being taken?
> 
>  Is there a way for me to find that out without recompiling and rebooting?

Not sure.  Looks like you need to set CONFIG_ACPI_DEBUG and then put the
right number into /proc/acpi/debug_layer.


* Re: Possible dcache BUG
  2004-09-12  7:48                                     ` Andrew Morton
@ 2004-09-13  4:53                                       ` Len Brown
  0 siblings, 0 replies; 146+ messages in thread
From: Len Brown @ 2004-09-13  4:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Udo A. Steinberg, Linus Torvalds, linux-kernel, ACPI Developers

On Sun, 2004-09-12 at 03:48, Andrew Morton wrote:
> "Udo A. Steinberg" <us15@os.inf.tu-dresden.de> wrote:
> >
> > On Sun, 12 Sep 2004 00:16:26 -0700 Andrew Morton (AM) wrote:
> >
> >  AM> Random guess: acpi_evaluate_object() is returning an error but is
> >  AM> allocating memory anyway.
> >  AM>
> >  AM> In acpi_battery_get_status():
> >  AM>
> >  AM>  status = acpi_evaluate_object(battery->handle, "_BST", NULL, &buffer);
> >  AM>  if (ACPI_FAILURE(status)) {
> >  AM>          ACPI_DEBUG_PRINT((ACPI_DB_ERROR, "Error evaluating _BST\n"));
> >  AM>          return_VALUE(-ENODEV);
> >  AM>  }
> >  AM>
> >  AM> Is that failure path being taken?
> >
> >  Is there a way for me to find that out without recompiling and rebooting?
> 
> Looks like you need to set CONFIG_ACPI_DEBUG and then put the
> right number into /proc/acpi/debug_layer.

For the battery module:
# echo 0x00040000 > /proc/acpi/debug_layer


and then to turn on everything about it:
# echo 0xffffffff > /proc/acpi/debug_level

These hooks exist only if the kernel is built with CONFIG_ACPI_DEBUG.

It would be interesting to know if you can examine the contents of
/proc/acpi/battery/*/*

thanks,
-Len




* Re: Possible dcache BUG
@ 2004-08-20  8:08 Daniel Blueman
  0 siblings, 0 replies; 146+ messages in thread
From: Daniel Blueman @ 2004-08-20  8:08 UTC (permalink / raw)
  To: gene.heskett, linux-kernel


I find that memtest86 [1] does a great job of checking memory, especially
since you can boot the available ISO image.

Perhaps worth a try here?

--- [1]

http://www.memtest86.com/

---

There is still that possibility Marcelo.  Someone recommended I get 
cpuburn and memburn, and before fixing the scanf statement (it was 
broken) in memburn, I had compiled it for a 512 meg test the first 
time, and a 768 meg test the next couple of runs.

All exited with errors like this:
Passed round 133, elapsed 4827.19.
FAILED at round 134/14208927: got ff00, expected 0!!!

REREAD: ff00, ff00, ff00!!!

[root@coyote memburn]# vim memburn.c
[root@coyote memburn]# gcc -o memburn memburn.c
[root@coyote memburn]# ./memburn
Starting test with size 768 megs..

Passed round 0, elapsed 44.36.
Passed round 1, elapsed 74.13.
Passed round 2, elapsed 105.12.
FAILED at round 3/25777183: got 2b00, expected 0!!!

REREAD: 2b00, 2b00, 2b00!!!

I've now rebuilt it with a better printf format string, and it's 
running over 768 megs again.  But this time the round counter is up 
to 90 and still going...

Interesting too is that memburn has now allocated a 768 meg wide block 
5 times, and still no Oops.  Over a hundred megs in swap, but it's 
still running.

I lost the BUG_ON patches in fs/buffer.c, this is now 2.6.8.1-mm2 (but 
I can go back if this fails of course)

Or can I just copy that 2.6.8-rc4/fs/buffer.c file over this one?

-- 
Daniel J Blueman




* Re: Possible dcache BUG
       [not found]             ` <2rUjF-Od-11@gated-at.bofh.it>
@ 2004-08-11 12:32               ` Andi Kleen
  0 siblings, 0 replies; 146+ messages in thread
From: Andi Kleen @ 2004-08-11 12:32 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel, us15

"David S. Miller" <davem@redhat.com> writes:

> On Tue, 10 Aug 2004 22:13:01 -0700 (PDT)
> Linus Torvalds <torvalds@osdl.org> wrote:
>
>> I also wonder what the 
>> hell is allocating so many 8kB and 32kB entries.
>
> Loopback default MTU is 16K these days, might explain
> the 32K entries but not the 8KB ones.  Perhaps the
> latter are being used for page tables?  Just a guess
> on that latter one.

Kernel stacks more likely. 200 processes = 200 8K entries.
Unless he used suic^w4K stack mode. 

-Andi



* Possible dcache BUG
@ 2004-08-05 14:54 Brett Charbeneau
  0 siblings, 0 replies; 146+ messages in thread
From: Brett Charbeneau @ 2004-08-05 14:54 UTC (permalink / raw)
  To: linux-kernel

Greetings,

	I am getting the oops below - twice since 7/26, but I haven't a 
clue what's causing it.
	I am not a subscriber, so any replies directed to me would be 
gratefully received.
	Thank you for your hard work on this!

-- 

Brett Charbeneau, Network Administrator         Tel: 757-259-7750
Williamsburg Regional Library                   FAX: 757-259-7798
7770 Croaker Road                               brett@wrl.org
Williamsburg, VA 23188-7064                     http://www.wrl.org


ksymoops 2.4.9 on i686 2.4.26.  Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /proc/modules (default)
     -o /lib/modules/2.4.26/ (default)
     -m /boot/System.map (specified)

1151MB HIGHMEM available.
3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
kernel BUG at dcache.c:345!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c014322d>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010206
eax: 00040000   ebx: eb8d7c70   ecx: c281b394   edx: e5636700
esi: eb8d7c58   edi: c281b394   ebp: d2b15f34   esp: d2b15f08
ds: 0018   es: 0018   ss: 0018
Process umount (pid: 14814, stackpage=d2b15000)
Stack: c0128f81 c281b49c c281f000 00000246 d2b15f34 f721e1a0 00000466 f721e178 
       f721e178 f721e178 c02991c0 d2b15f44 c01435a6 00000150 f7b6f400 d2b15f5c 
       c013714f f721e178 d2b15f88 08052179 0804d82b d2b15f7c c013afea f7b6f400 
Call Trace:    [<c0128f81>] [<c01435a6>] [<c013714f>] [<c013afea>] [<c01472d0>]
  [<c01472ee>] [<c0106d93>]
Code: 0f 0b 59 01 1e d6 25 c0 8d 56 10 8b 4a 04 8b 46 10 89 48 04 


>>EIP; c014322d <prune_dcache+5d/140>   <=====

>>ebx; eb8d7c70 <_end+2b5bb734/384f6ac4>
>>ecx; c281b394 <_end+24fee58/384f6ac4>
>>edx; e5636700 <_end+2531a1c4/384f6ac4>
>>esi; eb8d7c58 <_end+2b5bb71c/384f6ac4>
>>edi; c281b394 <_end+24fee58/384f6ac4>
>>ebp; d2b15f34 <_end+127f99f8/384f6ac4>
>>esp; d2b15f08 <_end+127f99cc/384f6ac4>

Trace; c0128f81 <kmem_cache_free+1c1/270>
Trace; c01435a6 <shrink_dcache_parent+16/30>
Trace; c013714f <kill_super+5f/f0>
Trace; c013afea <path_release+2a/40>
Trace; c01472d0 <sys_umount+80/90>
Trace; c01472ee <sys_oldumount+e/20>
Trace; c0106d93 <system_call+33/38>

Code;  c014322d <prune_dcache+5d/140>
00000000 <_EIP>:
Code;  c014322d <prune_dcache+5d/140>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c014322f <prune_dcache+5f/140>
   2:   59                        pop    %ecx
Code;  c0143230 <prune_dcache+60/140>
   3:   01 1e                     add    %ebx,(%esi)
Code;  c0143232 <prune_dcache+62/140>
   5:   d6                        (bad)  
Code;  c0143233 <prune_dcache+63/140>
   6:   25 c0 8d 56 10            and    $0x10568dc0,%eax
Code;  c0143238 <prune_dcache+68/140>
   b:   8b 4a 04                  mov    0x4(%edx),%ecx
Code;  c014323b <prune_dcache+6b/140>
   e:   8b 46 10                  mov    0x10(%esi),%eax
Code;  c014323e <prune_dcache+6e/140>
  11:   89 48 04                  mov    %ecx,0x4(%eax)

kernel BUG at dcache.c:345!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c014322d>]    Not tainted
EFLAGS: 00010206
eax: 00040000   ebx: ea612c70   ecx: c281b394   edx: dd1f64bc
esi: ea612c58   edi: c281b394   ebp: c2825f00   esp: c2825ed4
ds: 0018   es: 0018   ss: 0018
Process kswapd (pid: 4, stackpage=c2825000)
Stack: 00000187 00000003 c2825ef4 c0128525 c281b418 d8728000 c281b418 00000006 
       00000000 c233bfb0 00000003 c2825f0c c01435e2 00000d1d c2825f4c c012a284 
       00000006 000001d0 c2824000 ffffffff 00012199 000001d0 c02970d0 c2825f50 
Call Trace:    [<c0128525>] [<c01435e2>] [<c012a284>] [<c012a462>] [<c012a501>]
  [<c012a580>] [<c012a739>] [<c012a7b6>] [<c012a8ff>] [<c012a860>] [<c0105000>]
  [<c01055b6>] [<c012a860>]
Code: 0f 0b 59 01 1e d6 25 c0 8d 56 10 8b 4a 04 8b 46 10 89 48 04 


>>EIP; c014322d <prune_dcache+5d/140>   <=====

>>ebx; ea612c70 <_end+2a2f6734/384f6ac4>
>>ecx; c281b394 <_end+24fee58/384f6ac4>
>>edx; dd1f64bc <_end+1ced9f80/384f6ac4>
>>esi; ea612c58 <_end+2a2f671c/384f6ac4>
>>edi; c281b394 <_end+24fee58/384f6ac4>
>>ebp; c2825f00 <_end+25099c4/384f6ac4>
>>esp; c2825ed4 <_end+2509998/384f6ac4>

Trace; c0128525 <__kmem_cache_shrink_locked+45/70>
Trace; c01435e2 <shrink_dcache_memory+22/40>
Trace; c012a284 <shrink_cache+294/370>
Trace; c012a462 <refill_inactive+102/170>
Trace; c012a501 <shrink_caches+31/40>
Trace; c012a580 <try_to_free_pages_zone+70/f0>
Trace; c012a739 <kswapd_balance_pgdat+59/b0>
Trace; c012a7b6 <kswapd_balance+26/40>
Trace; c012a8ff <kswapd+9f/c0>
Trace; c012a860 <kswapd+0/c0>
Trace; c0105000 <_stext+0/0>
Trace; c01055b6 <arch_kernel_thread+26/40>
Trace; c012a860 <kswapd+0/c0>

Code;  c014322d <prune_dcache+5d/140>
00000000 <_EIP>:
Code;  c014322d <prune_dcache+5d/140>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c014322f <prune_dcache+5f/140>
   2:   59                        pop    %ecx
Code;  c0143230 <prune_dcache+60/140>
   3:   01 1e                     add    %ebx,(%esi)
Code;  c0143232 <prune_dcache+62/140>
   5:   d6                        (bad)  
Code;  c0143233 <prune_dcache+63/140>
   6:   25 c0 8d 56 10            and    $0x10568dc0,%eax
Code;  c0143238 <prune_dcache+68/140>
   b:   8b 4a 04                  mov    0x4(%edx),%ecx
Code;  c014323b <prune_dcache+6b/140>
   e:   8b 46 10                  mov    0x10(%esi),%eax
Code;  c014323e <prune_dcache+6e/140>
  11:   89 48 04                  mov    %ecx,0x4(%eax)






end of thread, other threads:[~2004-09-13  4:53 UTC | newest]

Thread overview: 146+ messages
2004-08-02 13:14 Possible dcache BUG Brett Charbeneau
2004-08-05  2:16 ` Gene Heskett
2004-08-05  3:46   ` Andrew Morton
2004-08-05  4:31     ` Gene Heskett
2004-08-05  0:44       ` Chris Shoemaker
2004-08-05  8:35         ` Denis Vlasenko
2004-08-05 14:14           ` Gene Heskett
2004-08-05 13:48         ` Gene Heskett
     [not found]           ` <200408210118.02011.vda@port.imtp.ilyichevsk.odessa.ua>
2004-08-21  1:40             ` Gene Heskett
2004-08-05  8:33       ` Denis Vlasenko
2004-08-05 14:19         ` Gene Heskett
     [not found]           ` <200408070203.35268.vda@port.imtp.ilyichevsk.odessa.ua>
2004-08-07  1:28             ` Gene Heskett
2004-08-05 21:26         ` Chris Shoemaker
2004-08-05  7:25   ` Linus Torvalds
2004-08-05  7:31     ` Andrew Morton
2004-08-05  8:33     ` Denis Vlasenko
2004-08-05 14:55       ` Gene Heskett
2004-08-05 16:26       ` Linus Torvalds
2004-08-05 18:06         ` Ingo Molnar
2004-08-05 18:50           ` Linus Torvalds
2004-08-05 20:29             ` Andi Kleen
     [not found]             ` <20040806073739.GA6617@elte.hu>
     [not found]               ` <20040806004231.143c8bd2.akpm@osdl.org>
2004-08-06  8:27                 ` Ingo Molnar
2004-08-06 11:51                 ` Gene Heskett
2004-08-06 16:58                   ` Linus Torvalds
2004-08-06 17:16                     ` Gene Heskett
2004-08-06 17:26                       ` William Lee Irwin III
2004-08-06 23:19                         ` Chris Shoemaker
2004-08-07  4:15                           ` William Lee Irwin III
2004-08-07  0:05                             ` Chris Shoemaker
2004-08-07  5:50                               ` William Lee Irwin III
2004-08-06 23:09                     ` Chris Shoemaker
2004-08-07  6:20                       ` Linus Torvalds
2004-08-07 12:38                         ` Gene Heskett
2004-08-07 13:44                         ` Chris Shoemaker
2004-08-07 18:49                           ` Linus Torvalds
2004-08-07 19:01                           ` Gene Heskett
2004-08-06 11:31               ` Andi Kleen
2004-08-06 17:16               ` Linus Torvalds
2004-08-05 21:10         ` Chris Shoemaker
2004-08-06  2:03         ` Gene Heskett
2004-08-06  2:12         ` Gene Heskett
2004-08-06  2:50     ` Linus Torvalds
2004-08-06  3:18       ` viro
2004-08-06  3:24         ` Linus Torvalds
2004-08-08  4:42           ` Gene Heskett
2004-08-08 14:30           ` Gene Heskett
2004-08-08 18:39             ` Andrew Morton
2004-08-10  4:12               ` Gene Heskett
2004-08-11  3:42                 ` Gene Heskett
2004-08-11  3:46                   ` Linus Torvalds
2004-08-11  4:18                     ` Udo A. Steinberg
2004-08-11  5:13                       ` Linus Torvalds
2004-08-11  5:15                         ` Linus Torvalds
2004-08-11  5:33                           ` Udo A. Steinberg
2004-08-11 14:37                           ` Gene Heskett
2004-08-12  1:26                             ` Nick Piggin
2004-08-12  2:23                               ` Gene Heskett
2004-08-12  2:36                                 ` Nick Piggin
2004-08-13  1:00                           ` Udo A. Steinberg
2004-08-13  1:31                             ` Linus Torvalds
2004-08-13  2:03                               ` Gene Heskett
2004-08-13  2:27                               ` Andreas Dilger
2004-08-13  3:33                                 ` Linus Torvalds
2004-08-20  7:02                               ` Udo A. Steinberg
2004-08-20  7:11                                 ` Andrew Morton
2004-08-20  7:19                                   ` Udo A. Steinberg
2004-08-20  7:49                                     ` Nick Piggin
2004-08-24  6:08                                       ` Udo A. Steinberg
2004-08-24  7:41                                         ` Nick Piggin
2004-08-24 18:20                                           ` Marcelo Tosatti
2004-08-24 20:00                                             ` Andrew Morton
2004-08-24 18:40                                               ` Marcelo Tosatti
2004-08-25  0:27                                                 ` Marcelo Tosatti
2004-09-12  7:03                               ` Udo A. Steinberg
2004-09-12  7:16                                 ` Andrew Morton
2004-09-12  7:29                                   ` Udo A. Steinberg
2004-09-12  7:48                                     ` Andrew Morton
2004-09-13  4:53                                       ` Len Brown
2004-08-11  5:55                         ` David S. Miller
2004-08-11  4:47                     ` Gene Heskett
2004-08-11  4:59                       ` Linus Torvalds
2004-08-11  8:05                         ` Roger Luethi
2004-08-13  4:27                         ` Gene Heskett
2004-08-13  8:32                           ` Gene Heskett
2004-08-14  2:18                           ` Marcelo Tosatti
2004-08-14  5:19                             ` Gene Heskett
2004-08-14  5:50                             ` Gene Heskett
2004-08-14  8:17                             ` Gene Heskett
2004-08-15  4:09                               ` Gene Heskett
2004-08-15  8:48                                 ` viro
2004-08-15  9:42                                   ` Gene Heskett
2004-08-15 17:31                                     ` Andrew Morton
2004-08-15 17:58                                       ` Gene Heskett
2004-08-15  9:50                                   ` Gene Heskett
2004-08-15 10:36                                     ` viro
2004-08-15 10:10                                   ` Gene Heskett
2004-08-15 10:37                                     ` viro
2004-08-15 10:42                                       ` Gene Heskett
2004-08-15 11:00                                         ` viro
     [not found]                                       ` <200408150704.49312.gene.heskett@verizon.net>
2004-08-15 11:26                                         ` viro
2004-08-15 17:47                                           ` Gene Heskett
     [not found]                                             ` <200408152257.04773.vda@port.imtp.ilyichevsk.odessa.ua>
2004-08-15 20:33                                               ` Gene Heskett
     [not found]                                                 ` <200408160803.15206.vda@port.imtp.ilyichevsk.odessa.ua>
2004-08-16  6:32                                                   ` Gene Heskett
2004-08-16 14:13                                                     ` Gene Heskett
     [not found]                                                       ` <200408161749.23663.vda@port.imtp.ilyichevsk.odessa.ua>
2004-08-16 15:25                                                         ` Gene Heskett
2004-08-16 22:52                                   ` Gene Heskett
2004-08-16 23:01                                     ` viro
2004-08-17  4:44                                       ` Gene Heskett
2004-08-17  4:58                                         ` Nick Piggin
2004-08-17  5:26                                           ` Gene Heskett
2004-08-17 11:57                                             ` Nick Piggin
2004-08-19  9:41                                               ` Gene Heskett
2004-08-19 18:36                                                 ` Marcelo Tosatti
2004-08-20  2:38                                                   ` Gene Heskett
2004-08-20  7:33                                                     ` Marcelo Tosatti
2004-08-20 15:06                                                       ` Gene Heskett
2004-08-20 15:43                                                         ` V13
2004-08-20 17:29                                                           ` Gene Heskett
2004-08-20 18:13                                                             ` Marc Ballarin
2004-08-20 20:08                                                               ` Gene Heskett
2004-08-21  9:25                                                                 ` Barry K. Nathan
2004-08-21 18:31                                                                   ` V13
2004-08-21 18:55                                                                     ` Gene Heskett
2004-08-22 11:04                                                                       ` Helge Hafting
2004-08-22 11:40                                                                         ` Gene Heskett
2004-08-20 20:11                                                             ` R. J. Wysocki
2004-08-20 20:17                                                               ` Gene Heskett
2004-08-22  5:05                                                                 ` Gene Heskett
2004-08-22 11:42                                                                   ` R. J. Wysocki
2004-08-24  2:34                                                                   ` Tom Vier
2004-08-24  3:08                                                                     ` Gene Heskett
2004-08-25  1:49                                                                       ` Tom Vier
2004-08-25  2:33                                                                         ` Gene Heskett
2004-08-25 14:55                                                                           ` Martin J. Bligh
2004-08-25 17:23                                                                             ` Ryan Cumming
2004-08-25 17:36                                                                               ` Martin J. Bligh
2004-08-27 14:01                                                                           ` Gene Heskett
2004-08-25  6:13                                                                         ` Denis Vlasenko
2004-08-29 13:48                                                                           ` Gene Heskett
2004-08-29 14:34                                                                             ` Possible dcache BUG [u] Martin Schlemmer [c]
2004-08-29 15:21                                                                             ` Possible dcache BUG Rafael J. Wysocki
2004-08-29 17:23                                                                               ` Denis Vlasenko
2004-08-29 22:25                                                                                 ` Gene Heskett
2004-08-05 14:54 Brett Charbeneau
     [not found] <2oKTA-5CQ-65@gated-at.bofh.it>
     [not found] ` <2r0U7-3yx-9@gated-at.bofh.it>
     [not found]   ` <2rwhh-BX-15@gated-at.bofh.it>
     [not found]     ` <2rShM-7QP-5@gated-at.bofh.it>
     [not found]       ` <2rSrs-7Vn-1@gated-at.bofh.it>
     [not found]         ` <2rSUw-8lw-3@gated-at.bofh.it>
     [not found]           ` <2rTGR-se-3@gated-at.bofh.it>
     [not found]             ` <2rUjF-Od-11@gated-at.bofh.it>
2004-08-11 12:32               ` Andi Kleen
2004-08-20  8:08 Daniel Blueman
