* lost memory on a 4GB amd64
@ 2004-09-16  4:48 Sergei Haller
  2004-09-16 13:30 ` Andrew Walrond
  2004-09-16 13:48 ` Andrew Walrond
  0 siblings, 2 replies; 40+ messages in thread
From: Sergei Haller @ 2004-09-16  4:48 UTC (permalink / raw)
  To: linux-kernel


Hello,

A friend of mine has a new Opteron based machine (Tyan Tiger K8W with two 
Opteron 24?) and 4GB main memory.

The problem is that about 512 MB of that memory is lost (AGP aperture and 
so on), although everything else works perfectly.
As far as I understand, all the PCI/AGP hardware uses the top end of the 
4GB address range to access its memory, so those addresses overlap with 
main memory. Thus only the remaining 3.5 GB are available.


Now there is an option in the BIOS called "Adjust Memory" which puts a 
certain amount of memory (several choices between 64MB and 2GB) above the 
4GB address range. I tried the 2GB setting which results in 2GB main 
memory at addresses 0-2GB and 2GB memory at addresses 4GB-6GB.

The problem is that, with this option enabled, the kernel (2.6.3-9mdksmp
and vanilla 2.6.8.1) crashes as soon as some memory-hungry program is run
(e.g. X).

I've seen some postings on the net talking about some "kernel patch" for
some "memory split", but nothing more specific. Do I just need a certain
patch to get it working or is there more to it?



BTW, the memory map displayed at boot is

 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000d3ff0000 (usable)
 BIOS-e820: 00000000d3ff0000 - 00000000d3fff000 (ACPI data)
 BIOS-e820: 00000000d3fff000 - 00000000d4000000 (ACPI NVS)
 BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)

if I leave the 4GB memory in one chunk and 

 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 000000007fff0000 (usable)
 BIOS-e820: 000000007fff0000 - 000000007ffff000 (ACPI data)
 BIOS-e820: 000000007ffff000 - 0000000080000000 (ACPI NVS)
 BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000180000000 (usable)

if I enable the "adjust memory" option and split the memory in two 2GB 
blocks. 

Thanks in advance,

        Sergei
-- 
--------------------------------------------------------------------  -?)
         eMail:       Sergei.Haller@math.uni-giessen.de               /\\
-------------------------------------------------------------------- _\_V
Be careful of reading health books, you might die of a misprint.
                -- Mark Twain


* Re: lost memory on a 4GB amd64
  2004-09-16  4:48 lost memory on a 4GB amd64 Sergei Haller
@ 2004-09-16 13:30 ` Andrew Walrond
  2004-09-16 13:48 ` Andrew Walrond
  1 sibling, 0 replies; 40+ messages in thread
From: Andrew Walrond @ 2004-09-16 13:30 UTC (permalink / raw)
  To: linux-kernel

Hi Sergei,

I have the same board with 4Gb ram running a 64bit 2.6.8.1 and solved it by 
changing something in the bios. Let me reboot and I'll take a note of what I 
did...

On Thursday 16 Sep 2004 05:48, Sergei Haller wrote:
> Hello,
>
> A friend of mine has a new Opteron based machine (Tyan Tiger K8W with two
> Opteron 24?) and 4GB main memory.
>
> the problem is that about 512 MB of that memory is lost (AGP aperture and
> stuff). Although everything is perfect otherwise.
> As far as I understand, all the PCI/AGP hardware uses the top end of the
> 4GB address range to access their memory and there is just an
> "overlapping" of the addresses. thus only the remaining 3.5 GB are
> available.
>
>
> Now there is an option in the BIOS called "Adjust Memory" which puts a
> certain amount of memory (several choices between 64MB and 2GB) above the
> 4GB address range. I tried the 2GB setting which results in 2GB main
> memory at addresses 0-2GB and 2GB memory at addresses 4GB-6GB.
>
> the problem is that the kernel (2.6.3-9mdksmp and vanilla 2.6.8.1) crashes
> if this option is enabled as soon as some memory expensive program is run
> (e.g. X)
>
> I've seen some postings on the net talking about some "kernel patch" for
> some "memory split", but nothing more specific. Do I just need a certain
> patch to get it working or is there more to it?
>
>
>
> BTW, the memory map displayed at boot is
>
>  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
>  BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
>  BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
>  BIOS-e820: 0000000000100000 - 00000000d3ff0000 (usable)
>  BIOS-e820: 00000000d3ff0000 - 00000000d3fff000 (ACPI data)
>  BIOS-e820: 00000000d3fff000 - 00000000d4000000 (ACPI NVS)
>  BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
>
> if I leave the 4GB memory in one chunk and
>
>  BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
>  BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
>  BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
>  BIOS-e820: 0000000000100000 - 000000007fff0000 (usable)
>  BIOS-e820: 000000007fff0000 - 000000007ffff000 (ACPI data)
>  BIOS-e820: 000000007ffff000 - 0000000080000000 (ACPI NVS)
>  BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
>  BIOS-e820: 0000000100000000 - 0000000180000000 (usable)
>
> if I enable the "adjust memory" option and split the memory in two 2GB
> blocks.
>
> Thanks in advance,
>
>         Sergei


* Re: lost memory on a 4GB amd64
  2004-09-16  4:48 lost memory on a 4GB amd64 Sergei Haller
  2004-09-16 13:30 ` Andrew Walrond
@ 2004-09-16 13:48 ` Andrew Walrond
  2004-09-16 14:09   ` Sergei Haller
  1 sibling, 1 reply; 40+ messages in thread
From: Andrew Walrond @ 2004-09-16 13:48 UTC (permalink / raw)
  To: linux-kernel; +Cc: Sergei Haller

On Thursday 16 Sep 2004 05:48, Sergei Haller wrote:
> Hello,
>
> A friend of mine has a new Opteron based machine (Tyan Tiger K8W with two
> Opteron 24?) and 4GB main memory.

Typo? Tyan Thunder?

>
> the problem is that about 512 MB of that memory is lost (AGP aperture and
> stuff). Although everything is perfect otherwise.
> As far as I understand, all the PCI/AGP hardware uses the top end of the
> 4GB address range to access their memory and there is just an
> "overlapping" of the addresses. thus only the remaining 3.5 GB are
> available.
>
>
> Now there is an option in the BIOS called "Adjust Memory" which puts a
> certain amount of memory (several choices between 64MB and 2GB) above the
> 4GB address range. I tried the 2GB setting which results in 2GB main
> memory at addresses 0-2GB and 2GB memory at addresses 4GB-6GB.
>

Ok;

Assuming bios version 2.02. (upgrade if you haven't already);

The option you mention should be set to 'Auto'

Chipset->Northbridge->Memory Configuration->Adjust Memory = Auto

but set

Advanced->Cpu Configuration->MTRR Mapping = Continuous

That fixed it for me if I remember correctly :)

Andrew Walrond


* Re: lost memory on a 4GB amd64
  2004-09-16 13:48 ` Andrew Walrond
@ 2004-09-16 14:09   ` Sergei Haller
  2004-09-16 14:28     ` Andrew Walrond
  0 siblings, 1 reply; 40+ messages in thread
From: Sergei Haller @ 2004-09-16 14:09 UTC (permalink / raw)
  To: Andrew Walrond; +Cc: linux-kernel

On Thu, 16 Sep 2004, Andrew Walrond (AW) wrote:

AW> On Thursday 16 Sep 2004 05:48, Sergei Haller wrote:
AW> > Hello,
AW> >
AW> > A friend of mine has a new Opteron based machine (Tyan Tiger K8W with two
AW> > Opteron 24?) and 4GB main memory.
AW> 
AW> Typo? Tyan Thunder?

no, it's a tiger: http://www.tyan.com/products/html/tigerk8w.html

AW> > the problem is that about 512 MB of that memory is lost (AGP aperture and
AW> > stuff). Although everything is perfect otherwise.
AW> > As far as I understand, all the PCI/AGP hardware uses the top end of the
AW> > 4GB address range to access their memory and there is just an
AW> > "overlapping" of the addresses. thus only the remaining 3.5 GB are
AW> > available.
AW> >
AW> >
AW> > Now there is an option in the BIOS called "Adjust Memory" which puts a
AW> > certain amount of memory (several choices between 64MB and 2GB) above the
AW> > 4GB address range. I tried the 2GB setting which results in 2GB main
AW> > memory at addresses 0-2GB and 2GB memory at addresses 4GB-6GB.
AW> >
AW> 
AW> Ok;
AW> 
AW> Assuming bios version 2.02.

yes

AW> The option you mention should be set to 'Auto'
AW> 
AW> Chipset->Northbridge->Memory Configuration->Adjust Memory = Auto
AW> 
AW> but set
AW> 
AW> Advanced->Cpu Configuration->MTRR Mapping = Continuous

I had "MTRR Mapping = Continuous" set all the time and tried "Adjust 
Memory" in all three modes (Auto/manual/disabled) and manual with 1 and 
2gb size.

today I discovered the MTRR option and changed it to "discrete", then
tried "Adjust Memory" manually at 2gb. 

The only working combination (but with loss of memory) seems to be "Adjust
Memory = disabled", independent of "MTRR Mapping".

The only combination I didn't try is "MTRR Mapping=Discrete"+"Adjust
Memory= Auto". Will try tomorrow morning.


AW> That fixed it for me if I remember correctly :)

do you have a thunder or a tiger? And could you check in your BIOS setup 
which options you used?



thanks,
        Sergei
-- 
--------------------------------------------------------------------  -?)
         eMail:       Sergei.Haller@math.uni-giessen.de               /\\
-------------------------------------------------------------------- _\_V
Be careful of reading health books, you might die of a misprint.
                -- Mark Twain


* Re: lost memory on a 4GB amd64
  2004-09-16 14:09   ` Sergei Haller
@ 2004-09-16 14:28     ` Andrew Walrond
  2004-09-16 14:56       ` Sergei Haller
  0 siblings, 1 reply; 40+ messages in thread
From: Andrew Walrond @ 2004-09-16 14:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: Sergei Haller

On Thursday 16 Sep 2004 15:09, you wrote:
> AW> Typo? Tyan Thunder?
>
> no, it's a tiger: http://www.tyan.com/products/html/tigerk8w.html

Ah - ok; I thought the Tiger was their dual athlon board. Didn't realise they 
had a dual opteron version.

I have a Thunder K8W.

>
> AW> The option you mention should be set to 'Auto'
> AW>
> AW> Chipset->Northbridge->Memory Configuration->Adjust Memory = Auto
> AW>
> AW> but set
> AW>
> AW> Advanced->Cpu Configuration->MTRR Mapping = Continuous
>
> I had "MTRR Mapping = Continuous" set all the time and tried "Adjust
> Memory" in all three modes (Auto/manual/disabled) and manual with 1 and
> 2gb size.
>
> today I had discovered the MTRR option and changed it to "discrete".
> tried "Adjust Memory" manually at 2gb.
>
> the only working (but with loss of memory) combination seems to be "Adjust
> Memory = disabled" and independent of "MTRR Mapping".
>
> The only combination I didn't try is "MTRR Mapping=Discrete"+"Adjust
> Memory= Auto". Will try tomorrow morning.
>

On further investigation, The settings I mentioned, 'Auto' and 'Continuous' 
only work when running a 64bit kernel. Are you running a 32bit kernel?

Andrew


* Re: lost memory on a 4GB amd64
  2004-09-16 14:28     ` Andrew Walrond
@ 2004-09-16 14:56       ` Sergei Haller
  2004-09-16 15:19         ` Andrew Walrond
  0 siblings, 1 reply; 40+ messages in thread
From: Sergei Haller @ 2004-09-16 14:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Walrond

On Thu, 16 Sep 2004, Andrew Walrond (AW) wrote:

AW> 
AW> On further investigation, The settings I mentioned, 'Auto' and 'Continuous' 
AW> only work when running a 64bit kernel. Are you running a 32bit kernel?

it's a 64bit one. the precise setting for the processor is 
"AMD-Opteron/Athlon64". Should I try "Generic-x86-64"?

BTW, 32bit processors are not offered at all.


        Sergei
-- 
--------------------------------------------------------------------  -?)
         eMail:       Sergei.Haller@math.uni-giessen.de               /\\
-------------------------------------------------------------------- _\_V
Be careful of reading health books, you might die of a misprint.
                -- Mark Twain


* Re: lost memory on a 4GB amd64
  2004-09-16 14:56       ` Sergei Haller
@ 2004-09-16 15:19         ` Andrew Walrond
  2004-09-16 15:52           ` Sergei Haller
  0 siblings, 1 reply; 40+ messages in thread
From: Andrew Walrond @ 2004-09-16 15:19 UTC (permalink / raw)
  To: Sergei Haller; +Cc: linux-kernel

On Thursday 16 Sep 2004 15:56, Sergei Haller wrote:
> On Thu, 16 Sep 2004, Andrew Walrond (AW) wrote:
>
> AW>
> AW> On further investigation, The settings I mentioned, 'Auto' and
> AW> 'Continuous' only work when running a 64bit kernel. Are you running a
> AW> 32bit kernel?
>
> it's a 64bit one. the precise setting for the processor is
> "AMD-Opteron/Athlon64". Should I try "Generic-x86-64"?

No - thats what I use. Do you have MTRR support enabled?

I'll send you my .config file; Perhaps you could try that.

Andrew


* Re: lost memory on a 4GB amd64
  2004-09-16 15:19         ` Andrew Walrond
@ 2004-09-16 15:52           ` Sergei Haller
  2004-09-18 14:18             ` Sergei Haller
  0 siblings, 1 reply; 40+ messages in thread
From: Sergei Haller @ 2004-09-16 15:52 UTC (permalink / raw)
  To: Andrew Walrond

On Thu, 16 Sep 2004, Andrew Walrond (AW) wrote:

AW> On Thursday 16 Sep 2004 15:56, Sergei Haller wrote:
AW> > On Thu, 16 Sep 2004, Andrew Walrond (AW) wrote:
AW> >
AW> > AW>
AW> > AW> On further investigation, The settings I mentioned, 'Auto' and
AW> > AW> 'Continuous' only work when running a 64bit kernel. Are you running a
AW> > AW> 32bit kernel?
AW> >
AW> > it's a 64bit one. the precise setting for the processor is
AW> > "AMD-Opteron/Athlon64". Should I try "Generic-x86-64"?
AW> 
AW> No - thats what I use. Do you have MTRR support enabled?

yes.

AW> I'll send you my .config file; Perhaps you could try that.

I just had a look at it. tomorrow morning I'll try out some of the 
options. If you like, I can send you my .config, so you can tell me which 
options are more likely to affect memory handling.

c ya
        Sergei
-- 
--------------------------------------------------------------------  -?)
         eMail:       Sergei.Haller@math.uni-giessen.de               /\\
-------------------------------------------------------------------- _\_V
Be careful of reading health books, you might die of a misprint.
                -- Mark Twain


* Re: lost memory on a 4GB amd64
  2004-09-16 15:52           ` Sergei Haller
@ 2004-09-18 14:18             ` Sergei Haller
  2004-09-19 20:01               ` Jon Masters
  0 siblings, 1 reply; 40+ messages in thread
From: Sergei Haller @ 2004-09-18 14:18 UTC (permalink / raw)
  To: Andrew Walrond

On Fri, 17 Sep 2004, Sergei Haller (SH) wrote:

SH> AW> No - thats what I use. Do you have MTRR support enabled?
SH> 
SH> yes.
SH> 
SH> AW> I'll send you my .config file; Perhaps you could try that.
SH> 
SH> I just had a look at it. tomorrow morning I'll try out some of the
SH> options. 

I tried out many configurations of the kernel config, nothing helped.

now I switched off SMP and it runs stable! So what am I to do about it?

that's the summary:

* if the memory is configured in one chunk (0-4gb) then the SMP kernel 
  works, but I only have about 3.4 gb main memory. (I know why)

* if the memory configuration is as follows: the first 3gb are at the 
  normal address range, the fourth gb is at the address range 4-5gb.
  then all 4gb are available (not quite -- a few mb are missing, but 
  that's ok) and 
   - the SMP kernel panics as soon as I start X or allocate about 1.6gb of
     memory (maybe less would trigger that as well, that was the only test
     I ran). Kernel compilation, on the other hand, runs fine.
   - the non-SMP kernel runs stable.
   - memtest86 runs fine

all kernels I mention are 2.6.8.1 vanilla.

What do you think? Is there anything I can do?



        Sergei
-- 
--------------------------------------------------------------------  -?)
         eMail:       Sergei.Haller@math.uni-giessen.de               /\\
-------------------------------------------------------------------- _\_V
Be careful of reading health books, you might die of a misprint.
                -- Mark Twain


* Re: lost memory on a 4GB amd64
  2004-09-18 14:18             ` Sergei Haller
@ 2004-09-19 20:01               ` Jon Masters
  2004-09-19 21:47                 ` Sergei Haller
  0 siblings, 1 reply; 40+ messages in thread
From: Jon Masters @ 2004-09-19 20:01 UTC (permalink / raw)
  To: Sergei Haller; +Cc: linux-kernel

On Sun, 19 Sep 2004 00:18:38 +1000 (EST), Sergei Haller 

> * if the memory configuration is as follows: the first 3gb are at the
>   normal address range, the fourth gb is at the address range 4-5gb.
>   then all 4gb are available (not quite -- a few mb are missing, but
>   that's ok) and
>    - the SMP kernel panics as soon as I start X

Just out of interest - can you say what tests you ran here - for
example whether you tried allocating large amounts of memory from a
userspace process without running X and/or touching bits of memory
mapped hardware? You say a kernel compile works fine so can you rule
out this being X taking down the system (your previous mail seemed
somewhat unclear).

Jon.


* Re: lost memory on a 4GB amd64
  2004-09-19 20:01               ` Jon Masters
@ 2004-09-19 21:47                 ` Sergei Haller
  2004-09-19 22:00                   ` Jon Masters
  0 siblings, 1 reply; 40+ messages in thread
From: Sergei Haller @ 2004-09-19 21:47 UTC (permalink / raw)
  To: jonathan; +Cc: linux-kernel

On Sun, 19 Sep 2004, Jon Masters (JM) wrote:

JM> On Sun, 19 Sep 2004 00:18:38 +1000 (EST), Sergei Haller 
JM> 
JM> > * if the memory configuration is as follows: the first 3gb are at the
JM> >   normal address range, the fourth gb is at the address range 4-5gb.
JM> >   then all 4gb are available (not quite -- a few mb are missing, but
JM> >   that's ok) and
JM> >    - the SMP kernel panics as soon as I start X
JM> 
JM> Just out of interest - can you say what tests you ran here - for
JM> example whether you tried allocating large amounts of memory from a
JM> userspace process without running X and/or touching bits of memory
JM> mapped hardware? You say a kernel compile works fine so can you rule
JM> out this being X taking down the system (your previous mail seemed
JM> somewhat unclear).

- as soon as I start X, the machine is gone.
- if I compile the kernel (without X of course) it's ok.

the machine is supposed to be for scientific calculations and we run magma 
on it. The test I ran is just a one-liner which creates a matrix of size
20000x20000 with zeros in it. so basically it just allocates 1.6gb of 
memory and writes zeros in it. the SMP kernel with the above memory 
configuration crashes immediately.

I guess I should write a simple C-program using malloc or something to 
reproduce the crash in the simplest possible way, shouldn't I?

        Sergei
-- 
--------------------------------------------------------------------  -?)
         eMail:       Sergei.Haller@math.uni-giessen.de               /\\
-------------------------------------------------------------------- _\_V
Be careful of reading health books, you might die of a misprint.
                -- Mark Twain


* Re: lost memory on a 4GB amd64
  2004-09-19 21:47                 ` Sergei Haller
@ 2004-09-19 22:00                   ` Jon Masters
  2004-09-19 22:19                     ` Sergei Haller
  0 siblings, 1 reply; 40+ messages in thread
From: Jon Masters @ 2004-09-19 22:00 UTC (permalink / raw)
  To: Sergei Haller; +Cc: linux-kernel

On Mon, 20 Sep 2004 07:47:20 +1000 (EST), Sergei Haller 

> I guess I should write a simple C-program using malloc or something to
> reproduce the crash in the simplest possible way, shouldn't I?

You've answered your own question Sergei. Thing is - you mentioned the
AGP aperture settings in your original post and then got tied up
thinking there's a bug in the kernel but we have to rule out stuff
like X getting very unhappy trying to play in the wrong place or
something. Try a simple test case and then see if you can give any
other handy details on your situation.

Jon.


* Re: lost memory on a 4GB amd64
  2004-09-19 22:00                   ` Jon Masters
@ 2004-09-19 22:19                     ` Sergei Haller
  2004-09-20 10:26                       ` Sergei Haller
  0 siblings, 1 reply; 40+ messages in thread
From: Sergei Haller @ 2004-09-19 22:19 UTC (permalink / raw)
  To: jonathan; +Cc: linux-kernel

On Sun, 19 Sep 2004, Jon Masters (JM) wrote:

JM> On Mon, 20 Sep 2004 07:47:20 +1000 (EST), Sergei Haller 
JM> 
JM> > I guess I should write a simple C-program using malloc or something to
JM> > reproduce the crash in the simplest possible way, shouldn't I?
JM> 
JM> You've answered your own question Sergei. Thing is - you mentioned the
JM> AGP aperture settings in your original post [...]

well, I mentioned AGP and PCI only as "the bad guys stealing the memory"
and as the reason why I wanted to split the main memory into two 2gb 
blocks or into 3gb+1gb, one block being at the normal address range and the 
other at addresses >4gb

JM> but we have to rule out stuff like X 

I guess you're right about this.

JM> Try a simple test case and then see if you can give any other handy
JM> details on your situation.


thanks so far,

        Sergei
-- 
--------------------------------------------------------------------  -?)
         eMail:       Sergei.Haller@math.uni-giessen.de               /\\
-------------------------------------------------------------------- _\_V
Be careful of reading health books, you might die of a misprint.
                -- Mark Twain


* Re: lost memory on a 4GB amd64
  2004-09-19 22:19                     ` Sergei Haller
@ 2004-09-20 10:26                       ` Sergei Haller
  2004-09-24  4:38                         ` Sergei Haller
                                           ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Sergei Haller @ 2004-09-20 10:26 UTC (permalink / raw)
  To: jonathan; +Cc: linux-kernel

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1952 bytes --]

On Mon, 20 Sep 2004, Sergei Haller (SH) wrote:

SH> On Sun, 19 Sep 2004, Jon Masters (JM) wrote:
SH> 
SH> JM> On Mon, 20 Sep 2004 07:47:20 +1000 (EST), Sergei Haller 
SH> JM> 
SH> JM> > I guess I should write a simple C-program using malloc or something to
SH> JM> > reproduce the crash in the simplest possible way, shouldn't I?


here we go again. that's the program I wrote:
--------------------------------------------------------------
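#include <stdio.h>   /* printf */
#include <stdlib.h>  /* malloc, free, strtoull */
#include <string.h>  /* memset */

int main(int argc, char **argv) {
     unsigned long long int         bytes;
     char *mem;

     if (argc < 2)
          bytes = 0x40000000; // 1gb
     else
          bytes = strtoull(argv[1], NULL, 10); // size in bytes, decimal

     printf("allocate %llu: ", bytes);

     // the pages are only really touched by the memset below
     if ((mem = malloc(bytes)) != NULL)
     {
          printf("ok\n");
          printf("set them to 0... ");

          memset(mem,0,bytes);

          printf("done\n");

          free(mem);
     }
     else
          printf("not ok\n");

     return 0;
}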
--------------------------------------------------------------

and that's the log:

fang ~sergei> ./memtest 10000
allocate 10000: ok
set them to 0... done
fang ~sergei> ./memtest 100000
allocate 100000: ok
set them to 0... done
fang ~sergei> ./memtest 1000000
allocate 1000000: ok
set them to 0... done
fang ~sergei> ./memtest 10000000
allocate 10000000: ok
set them to 0... done
fang ~sergei> ./memtest 100000000
allocate 100000000: ok
set them to 0... done
fang ~sergei> ./memtest 1000000000
allocate 1000000000: ok

Message from syslogd@fang at Mon Sep 20 18:03:16 2004 ...
fang kernel: Oops: 0000 [1] PREEMPT SMP


Attached is the full OOPS excerpt from /var/log/messages


        Sergei
-- 
--------------------------------------------------------------------  -?)
         eMail:       Sergei.Haller@math.uni-giessen.de               /\\
-------------------------------------------------------------------- _\_V
Be careful of reading health books, you might die of a misprint.
                -- Mark Twain

[-- Attachment #2: Type: TEXT/PLAIN, Size: 6067 bytes --]

Sep 20 18:03:16 fang kernel: Unable to handle kernel NULL pointer dereference at 0000000000000e70 RIP: 
Sep 20 18:03:16 fang kernel: <ffffffff8016a8e1>{pte_alloc_map+193}
Sep 20 18:03:16 fang kernel: PML4 13f7a0067 PGD 13b439067 PMD 0 
Sep 20 18:03:16 fang kernel: Oops: 0000 [1] PREEMPT SMP 
Sep 20 18:03:16 fang kernel: CPU 1 
Sep 20 18:03:16 fang kernel: Modules linked in: sg af_packet raw ide_floppy ide_tape ide_cd floppy ipt_TOS ipt_REJECT ipt_LOG ipt_state ip_nat_irc ip_nat_tftp ip_nat_ftp ip_conntrack_irc ip_conntrack_tftp ip_conntrack_ftp ipt_multiport ipt_conntrack iptable_filter iptable_mangle iptable_nat ip_conntrack ip_tables ohci1394 ieee1394 rtc
Sep 20 18:03:16 fang kernel: Pid: 3264, comm: memtest Not tainted 2.6.8.1
Sep 20 18:03:16 fang kernel: RIP: 0010:[<ffffffff8016a8e1>] <ffffffff8016a8e1>{pte_alloc_map+193}
Sep 20 18:03:16 fang kernel: RSP: 0000:000001013bc7ddc8  EFLAGS: 00010213
Sep 20 18:03:16 fang kernel: RAX: ffffffff7fffffff RBX: 0000002ac7000000 RCX: 0000000000000018
Sep 20 18:03:16 fang kernel: RDX: 0000010109c97000 RSI: 0000000000000000 RDI: 0000010109c98000
Sep 20 18:03:16 fang kernel: RBP: 000001013f00d000 R08: 000001000000e000 R09: 0000000000000000
Sep 20 18:03:16 fang kernel: R10: 0000002a9566ca88 R11: 0000002a956f0820 R12: 0000000000000000
Sep 20 18:03:16 fang kernel: R13: 0000010110c5e1c0 R14: 0000010110c5e1c0 R15: 000001013ec1edb0
Sep 20 18:03:16 fang kernel: FS:  0000002a958bf4c0(0000) GS:ffffffff806bbc40(0000) knlGS:0000000000000000
Sep 20 18:03:16 fang kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 20 18:03:16 fang kernel: CR2: 0000000000000e70 CR3: 000000013ff9a000 CR4: 00000000000006e0
Sep 20 18:03:16 fang kernel: Process memtest (pid: 3264, threadinfo 000001013bc7c000, task 000001013ec1edb0)
Sep 20 18:03:16 fang kernel: Stack: 000001013f00d000 000001013f7a0558 000001013f00d000 0000002ac7000000 
Sep 20 18:03:16 fang kernel:        000001013bc7df58 ffffffff8016ab12 0000002a956f0820 0000002a9566ca88 
Sep 20 18:03:16 fang kernel:        0000000000000000 0000000000000000 
Sep 20 18:03:16 fang kernel: Call Trace:<ffffffff8016ab12>{handle_mm_fault+258} <ffffffff80123174>{do_page_fault+452} 
Sep 20 18:03:16 fang kernel:        <ffffffff804c0b3b>{schedule+219} <ffffffff8033db49>{tty_write+777} 
Sep 20 18:03:16 fang kernel:        <ffffffff80111361>{error_exit+0} 
Sep 20 18:03:16 fang kernel: 
Sep 20 18:03:16 fang kernel: Code: 48 8b 8e 70 0e 00 00 76 07 b8 00 00 00 80 eb 0a 48 b8 00 00 
Sep 20 18:03:16 fang kernel: RIP <ffffffff8016a8e1>{pte_alloc_map+193} RSP <000001013bc7ddc8>
Sep 20 18:03:16 fang kernel: CR2: 0000000000000e70
Sep 20 18:03:16 fang kernel:  <1>Unable to handle kernel NULL pointer dereference at 0000000000000e70 RIP: 
Sep 20 18:03:16 fang kernel: <ffffffff8016a05c>{unmap_vmas+860}
Sep 20 18:03:16 fang kernel: PML4 13f7a0067 PGD 13b439067 PMD 0 
Sep 20 18:03:16 fang kernel: Oops: 0000 [2] PREEMPT SMP 
Sep 20 18:03:16 fang kernel: CPU 1 
Sep 20 18:03:16 fang kernel: Modules linked in: sg af_packet raw ide_floppy ide_tape ide_cd floppy ipt_TOS ipt_REJECT ipt_LOG ipt_state ip_nat_irc ip_nat_tftp ip_nat_ftp ip_conntrack_irc ip_conntrack_tftp ip_conntrack_ftp ipt_multiport ipt_conntrack iptable_filter iptable_mangle iptable_nat ip_conntrack ip_tables ohci1394 ieee1394 rtc
Sep 20 18:03:16 fang kernel: Pid: 3264, comm: memtest Not tainted 2.6.8.1
Sep 20 18:03:16 fang kernel: RIP: 0010:[<ffffffff8016a05c>] <ffffffff8016a05c>{unmap_vmas+860}
Sep 20 18:03:16 fang kernel: RSP: 0000:000001013bc7dab8  EFLAGS: 00010217
Sep 20 18:03:16 fang kernel: RAX: 00000000000000e0 RBX: 000000000007d000 RCX: 0000000000000000
Sep 20 18:03:16 fang kernel: RDX: 0000000000109c00 RSI: 0000000000000000 RDI: 0000010004a3dfc8
Sep 20 18:03:16 fang kernel: RBP: 000001010a296b48 R08: 000001000000e400 R09: 000001013bc7db90
Sep 20 18:03:16 fang kernel: R10: 0000000000000001 R11: 0000000000aaaaaa R12: 0000000000114000
Sep 20 18:03:16 fang kernel: R13: 0000000000000020 R14: 0000000000000000 R15: 0000000000000000
Sep 20 18:03:16 fang kernel: FS:  0000002a958bf4c0(0000) GS:ffffffff806bbc40(0000) knlGS:0000000000000000
Sep 20 18:03:16 fang kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 20 18:03:16 fang kernel: CR2: 0000000000000e70 CR3: 000000013ff9a000 CR4: 00000000000006e0
Sep 20 18:03:16 fang kernel: Process memtest (pid: 3264, threadinfo 000001013bc7c000, task 000001013ec1edb0)
Sep 20 18:03:16 fang kernel: Stack: 0000002b06eec000 0000002ac70e6000 0000010110c5e1b8 0000002ac6eec000 
Sep 20 18:03:16 fang kernel:        000001013f7a0558 0000002ac70e6000 000001000564b100 00000000001fa000 
Sep 20 18:03:16 fang kernel:        0000002ad126d000 0000000900000000 
Sep 20 18:03:16 fang kernel: Call Trace:<ffffffff8016ce8a>{exit_mmap+186} <ffffffff801378a0>{mmput+128} 
Sep 20 18:03:16 fang kernel:        <ffffffff8013cce5>{do_exit+597} <ffffffff80111fea>{oops_end+74} 
Sep 20 18:03:16 fang kernel:        <ffffffff8012347b>{do_page_fault+1227} <ffffffff80132288>{recalc_task_prio+440} 
Sep 20 18:03:16 fang kernel:        <ffffffff8015cdf4>{__rmqueue+228} <ffffffff80111361>{error_exit+0} 
Sep 20 18:03:16 fang kernel:        <ffffffff8016a8e1>{pte_alloc_map+193} <ffffffff8016a885>{pte_alloc_map+101} 
Sep 20 18:03:16 fang kernel:        <ffffffff8016ab12>{handle_mm_fault+258} <ffffffff80123174>{do_page_fault+452} 
Sep 20 18:03:16 fang kernel:        <ffffffff804c0b3b>{schedule+219} <ffffffff8033db49>{tty_write+777} 
Sep 20 18:03:16 fang kernel:        <ffffffff80111361>{error_exit+0} 
Sep 20 18:03:16 fang kernel: 
Sep 20 18:03:16 fang kernel: Code: 48 2b 91 70 0e 00 00 48 8d 04 d5 00 00 00 00 48 29 d0 48 8b 
Sep 20 18:03:16 fang kernel: RIP <ffffffff8016a05c>{unmap_vmas+860} RSP <000001013bc7dab8>
Sep 20 18:03:16 fang kernel: CR2: 0000000000000e70
Sep 20 18:03:16 fang kernel:  <6>note: memtest[3264] exited with preempt_count 1
Sep 20 18:03:16 fang kernel: Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP: 
Sep 20 18:06:01 fang syslogd 1.4.1: restart.


* Re: lost memory on a 4GB amd64
  2004-09-20 10:26                       ` Sergei Haller
  2004-09-24  4:38                         ` Sergei Haller
@ 2004-09-24  4:38                         ` Sergei Haller
  2004-09-24  8:15                         ` Andrew Walrond
  2 siblings, 0 replies; 40+ messages in thread
From: Sergei Haller @ 2004-09-24  4:38 UTC (permalink / raw)
  To: jonathan; +Cc: linux-kernel, linux-smp


I just discovered the linux-smp list and decided to summarize the topic 
and take the opportunity to cross-post to there.

an archive of the discussion can be found e.g. here:
http://marc.theaimsgroup.com/?t=109525952600004&r=1&w=4


The machine at hand is the Tyan Tiger K8W with two Opterons 246
(http://www.tyan.com/products/html/tigerk8w.html) and 4GB of memory

we are using vanilla 2.6.8.1 kernel.

 * if the memory is set up in the ordinary way in the BIOS, then
   approximately 512MB are lost (PCI/AGP addressing and stuff), but
   everything is stable
 * if the memory is set up in the BIOS to be in two chunks (e.g. 3GB at 
   the address range 0-3GB and 1GB at 4-5GB address range), then
    - memtest86 reports that everything is fine.
    - if we run a non-SMP kernel, everything is stable
    - if we run an SMP kernel, it crashes as soon as approx. 1GB of memory
      is allocated and set to 0 (see the test case C-program in my
      previous mail)

Is there anything we can do? Any logs I can provide? Something to try out?

Thanks,
        Sergei
-- 
--------------------------------------------------------------------  -?)
         eMail:       Sergei.Haller@math.uni-giessen.de               /\\
-------------------------------------------------------------------- _\_V
Be careful of reading health books, you might die of a misprint.
                -- Mark Twain

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: lost memory on a 4GB amd64
  2004-09-20 10:26                       ` Sergei Haller
  2004-09-24  4:38                         ` Sergei Haller
  2004-09-24  4:38                         ` Sergei Haller
@ 2004-09-24  8:15                         ` Andrew Walrond
  2004-09-24  8:23                           ` Sergei Haller
  2 siblings, 1 reply; 40+ messages in thread
From: Andrew Walrond @ 2004-09-24  8:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: Sergei Haller

On Monday 20 Sep 2004 11:26, Sergei Haller wrote:
>
> fang ~sergei> ./memtest 1000000000
> allocate 1000000000: ok
>
> Message from syslogd@fang at Mon Sep 20 18:03:16 2004 ...
> fang kernel: Oops: 0000 [1] PREEMPT SMP
>

Works fine on my 4Gb Tyan thunder K8W machine, even running from an xterm:
andrew@orac ~ $ uname -a
Linux orac.walrond.org 2.6.8.1 #3 SMP Sun Aug 29 17:36:49 BST 2004 x86_64 
unknown unknown GNU/Linux

andrew@orac ~ $ free -m
             total       used       free     shared    buffers     cached
Mem:          4008        263       3744          0          0         66
-/+ buffers/cache:        195       3812
Swap:         3827         25       3802

andrew@orac ~ $ ./memtest 1000000000
allocate 1000000000: ok
set them to 0... done
andrew@orac ~ $ ./memtest 4000000000
allocate 4000000000: ok
set them to 0... done
andrew@orac ~ $ ./memtest 5000000000
allocate 5000000000: ok
set them to 0... done
andrew@orac ~ $

The last one took a while (using 1Gb swap) but it still worked fine.

Without swap:

andrew@orac ~ $ sudo swapoff -a
andrew@orac ~ $ free -m
             total       used       free     shared    buffers     cached
Mem:          4008        237       3770          0          1         54
-/+ buffers/cache:        181       3826
Swap:            0          0          0
andrew@orac ~ $ ./memtest 1000000000
allocate 1000000000: ok
set them to 0... done
andrew@orac ~ $ ./memtest 2000000000
allocate 2000000000: ok
set them to 0... done
andrew@orac ~ $                          

Still fine.

Andrew Walrond

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: lost memory on a 4GB amd64
  2004-09-24  8:15                         ` Andrew Walrond
@ 2004-09-24  8:23                           ` Sergei Haller
  2004-09-24  8:31                             ` Andrew Walrond
  0 siblings, 1 reply; 40+ messages in thread
From: Sergei Haller @ 2004-09-24  8:23 UTC (permalink / raw)
  To: Andrew Walrond; +Cc: linux-kernel

On Fri, 24 Sep 2004, Andrew Walrond (AW) wrote:

AW> On Monday 20 Sep 2004 11:26, Sergei Haller wrote:
AW> >
AW> > fang ~sergei> ./memtest 1000000000
AW> > allocate 1000000000: ok
AW> >
AW> > Message from syslogd@fang at Mon Sep 20 18:03:16 2004 ...
AW> > fang kernel: Oops: 0000 [1] PREEMPT SMP
AW> >
AW> 
AW> Works fine on my 4Gb Tyan thunder K8W machine, even running from an xterm:
AW> andrew@orac ~ $ uname -a
AW> Linux orac.walrond.org 2.6.8.1 #3 SMP Sun Aug 29 17:36:49 BST 2004 x86_64 
AW> unknown unknown GNU/Linux
AW> 
AW> andrew@orac ~ $ free -m
AW>              total       used       free     shared    buffers     cached
AW> Mem:          4008        263       3744          0          0         66
AW> -/+ buffers/cache:        195       3812
AW> Swap:         3827         25       3802
AW> 
AW> andrew@orac ~ $ ./memtest 1000000000
AW> allocate 1000000000: ok
AW> set them to 0... done
AW> andrew@orac ~ $ ./memtest 4000000000
AW> allocate 4000000000: ok
AW> set them to 0... done
AW> andrew@orac ~ $ ./memtest 5000000000
AW> allocate 5000000000: ok
AW> set them to 0... done
AW> andrew@orac ~ $
AW> 
AW> The last one took a while (using 1Gb swap) but it still worked fine.

Hi Andrew,

thanks for your report. 

It's the same for me if I use the non-SMP version of the kernel.
but the SMP one seems to be panicking for some reason.


        Sergei
-- 
--------------------------------------------------------------------  -?)
         eMail:       Sergei.Haller@math.uni-giessen.de               /\\
-------------------------------------------------------------------- _\_V
Be careful of reading health books, you might die of a misprint.
                -- Mark Twain

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: lost memory on a 4GB amd64
  2004-09-24  8:23                           ` Sergei Haller
@ 2004-09-24  8:31                             ` Andrew Walrond
  2004-09-24  8:57                               ` Sergei Haller
  0 siblings, 1 reply; 40+ messages in thread
From: Andrew Walrond @ 2004-09-24  8:31 UTC (permalink / raw)
  To: linux-kernel; +Cc: Sergei Haller

On Friday 24 Sep 2004 09:23, Sergei Haller wrote:
> It's the same for me if I use the non-SMP version of the kernel.
> but the SMP one seems to be panicking for some reason.
>

Just a thought; How are the memory modules arranged on the board?
I have 2 x 1Gb modules in each cpu-specific bank, rather than all four in 
cpu1's bank. How are yours arranged?



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: lost memory on a 4GB amd64
  2004-09-24  8:31                             ` Andrew Walrond
@ 2004-09-24  8:57                               ` Sergei Haller
  2004-09-24  9:27                                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 40+ messages in thread
From: Sergei Haller @ 2004-09-24  8:57 UTC (permalink / raw)
  To: Andrew Walrond; +Cc: linux-kernel

On Fri, 24 Sep 2004, Andrew Walrond (AW) wrote:

AW> On Friday 24 Sep 2004 09:23, Sergei Haller wrote:
AW> > It's the same for me if I use the non-SMP version of the kernel.
AW> > but the SMP one seems to be panicking for some reason.
AW> >
AW> 
AW> Just a thought; How are the memory modules arranged on the board?
AW> I have 2 x 1Gb modules in each cpu-specific bank, rather than all four in 
AW> cpu1's bank. How are yours arranged?

my board has only four banks, each of them has a 1GB module sitting.
(page 26 of ftp://ftp.tyan.com/manuals/m_s2875_102.pdf)


        Sergei
-- 
--------------------------------------------------------------------  -?)
         eMail:       Sergei.Haller@math.uni-giessen.de               /\\
-------------------------------------------------------------------- _\_V
Be careful of reading health books, you might die of a misprint.
                -- Mark Twain

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: lost memory on a 4GB amd64
  2004-09-24  8:57                               ` Sergei Haller
@ 2004-09-24  9:27                                 ` Rafael J. Wysocki
  2004-09-24  9:41                                   ` Andrew Walrond
  2004-09-24 11:50                                   ` Sergei Haller
  0 siblings, 2 replies; 40+ messages in thread
From: Rafael J. Wysocki @ 2004-09-24  9:27 UTC (permalink / raw)
  To: linux-kernel; +Cc: Sergei Haller, Andrew Walrond

On Friday 24 of September 2004 10:57, Sergei Haller wrote:
> On Fri, 24 Sep 2004, Andrew Walrond (AW) wrote:
> 
> AW> On Friday 24 Sep 2004 09:23, Sergei Haller wrote:
> AW> > It's the same for me if I use the non-SMP version of the kernel.
> AW> > but the SMP one seems to be panicking for some reason.
> AW> >
> AW> 
> AW> Just a thought; How are the memory modules arranged on the board?
> AW> I have 2 x 1Gb modules in each cpu-specific bank, rather than all four in
> AW> cpu1's bank. How are yours arranged?
> 
> my board has only four banks, each of them has a 1GB module sitting.
> (page 26 of ftp://ftp.tyan.com/manuals/m_s2875_102.pdf)

Which is what makes the difference, I think.  IMO, the problem is that _both_ 
CPUs use the same memory bank that is physically attached to only one of them 
which leads to conflicts, apparently (the CPU with memory has also 
PCI/AGP/whatever attached to it via HyperTransport so I can imagine there may 
be issues with overlapping address spaces etc.).  I'd bet that there's 
something wrong either with the BIOS or with the board design itself and I 
don't think there's anything that the kernel can do about it (usual 
disclaimer applies).

Out of curiosity: have you tried to run the kernel with K8 NUMA enabled?

Greets,
RJW

-- 
- Would you tell me, please, which way I ought to go from here?
- That depends a good deal on where you want to get to.
		-- Lewis Carroll "Alice's Adventures in Wonderland"

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: lost memory on a 4GB amd64
  2004-09-24  9:27                                 ` Rafael J. Wysocki
@ 2004-09-24  9:41                                   ` Andrew Walrond
  2004-09-24 11:42                                     ` Sergei Haller
  2004-09-24 11:50                                   ` Sergei Haller
  1 sibling, 1 reply; 40+ messages in thread
From: Andrew Walrond @ 2004-09-24  9:41 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-kernel, Sergei Haller

On Friday 24 Sep 2004 10:27, Rafael J. Wysocki wrote:
>
> > AW> cpu1's bank. How are yours arranged?
> >
> > my board has only four banks, each of them has a 1GB module sitting.
> > (page 26 of ftp://ftp.tyan.com/manuals/m_s2875_102.pdf)
>
> Which is what makes the difference, I think.  IMO, the problem is that
> _both_ CPUs use the same memory bank that is physically attached to only
> one of them which leads to conflicts, apparently (the CPU with memory has
> also PCI/AGP/whatever attached to it via HyperTransport so I can imagine
> there may be issues with overlapping address spaces etc.).  I'd bet that
> there's something wrong either with the BIOS or with the board design
> itself and I don't think there's anything that the kernel can do about it
> (usual disclaimer applies).
>
> Out of curiosity: have you tried to run the kernel with K8 NUMA enabled?
>

Actually, the block diagram on page 9 of the manual suggests that this is 
_not_ a NUMA board, since all DIMMS are connected to cpu1. The block diagram 
for my thunder k8w specifically shows DIMMS associated with individual 
processors.

Which suggests that NUMA should be _disabled_ in the kernel config.

Have you tried it with NUMA disabled? I think I remember it being on in 
the .config you sent me.

Andrew

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: lost memory on a 4GB amd64
  2004-09-24  9:41                                   ` Andrew Walrond
@ 2004-09-24 11:42                                     ` Sergei Haller
  2004-09-24 12:15                                       ` Andrew Walrond
  0 siblings, 1 reply; 40+ messages in thread
From: Sergei Haller @ 2004-09-24 11:42 UTC (permalink / raw)
  To: Andrew Walrond; +Cc: linux-kernel, Rafael J. Wysocki

On Fri, 24 Sep 2004, Andrew Walrond (AW) wrote:

AW> On Friday 24 Sep 2004 10:27, Rafael J. Wysocki wrote:
AW> >
AW> > > AW> cpu1's bank. How are yours arranged?
AW> > >
AW> > > my board has only four banks, each of them has a 1GB module sitting.
AW> > > (page 26 of ftp://ftp.tyan.com/manuals/m_s2875_102.pdf)
AW> >
AW> > Which is what makes the difference, I think.  IMO, the problem is that
AW> > _both_ CPUs use the same memory bank that is physically attached to only
AW> > one of them which leads to conflicts, apparently (the CPU with memory has
AW> > also PCI/AGP/whatever attached to it via HyperTransport so I can imagine
AW> > there may be issues with overlapping address spaces etc.).  I'd bet that
AW> > there's something wrong either with the BIOS or with the board design
AW> > itself and I don't think there's anything that the kernel can do about it
AW> > (usual disclaimer applies).
AW> >
AW> > Out of curiosity: have you tried to run the kernel with K8 NUMA enabled?
AW> >

yes.

AW> Actually, the block diagram on page 9 of the manual suggests that this is 
AW> _not_ a NUMA board, since all DIMMS are connected to cpu1. The block diagram 
AW> for my thunder k8w specifically shows DIMMS associated with individual 
AW> processors.
AW> 
AW> Which suggests that NUMA should be _disabled_ in the kernel config.

hmm. 

AW> Have you tried it with NUMA disabled? I think I remember it being on in 
AW> the .config you sent me.

NUMA was enabled all the time (at least most of the time). I don't know if 
I ever ran it without NUMA. I'll certainly try that.

Unfortunately, I won't be able to do any reboots during the next one or 
two weeks since the machine has gone into stable operation tonight. (with 
some loss of memory for now)

if it is of some interest, that's what dmesg tells about NUMA:

     BIOS-provided physical RAM map:
      BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
      BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
      BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
      BIOS-e820: 0000000000100000 - 00000000bfff0000 (usable)
      BIOS-e820: 00000000bfff0000 - 00000000bffff000 (ACPI data)
      BIOS-e820: 00000000bffff000 - 00000000c0000000 (ACPI NVS)
      BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
      BIOS-e820: 0000000100000000 - 0000000140000000 (usable)
     Scanning NUMA topology in Northbridge 24
     Number of nodes 2 (10010)
     Node 0 MemBase 0000000000000000 Limit 000000013fffffff
     Skipping disabled node 1
     Using node hash shift of 24
     Bootmem setup node 0 0000000000000000-000000013fffffff
     No mptable found.
     On node 0 totalpages: 1310719
       DMA zone: 4096 pages, LIFO batch:1
       Normal zone: 1306623 pages, LIFO batch:16
       HighMem zone: 0 pages, LIFO batch:1

So actually it looks like the kernel does notice that only one processor
has access to the memory here.


        Sergei
-- 
--------------------------------------------------------------------  -?)
         eMail:       Sergei.Haller@math.uni-giessen.de               /\\
-------------------------------------------------------------------- _\_V
Be careful of reading health books, you might die of a misprint.
                -- Mark Twain

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: lost memory on a 4GB amd64
  2004-09-24  9:27                                 ` Rafael J. Wysocki
  2004-09-24  9:41                                   ` Andrew Walrond
@ 2004-09-24 11:50                                   ` Sergei Haller
  1 sibling, 0 replies; 40+ messages in thread
From: Sergei Haller @ 2004-09-24 11:50 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: linux-kernel, Andrew Walrond

On Fri, 24 Sep 2004, Rafael J. Wysocki (RW) wrote:

RW> > my board has only four banks, each of them has a 1GB module sitting.
RW> > (page 26 of ftp://ftp.tyan.com/manuals/m_s2875_102.pdf)
RW> 
RW> Which is what makes the difference, I think.  IMO, the problem is that _both_ 
RW> CPUs use the same memory bank that is physically attached to only one of them 
RW> which leads to conflicts, apparently (the CPU with memory has also 
RW> PCI/AGP/whatever attached to it via HyperTransport so I can imagine there may 
RW> be issues with overlapping address spaces etc.).  I'd bet that there's 
RW> something wrong either with the BIOS or with the board design itself and I 
RW> don't think there's anything that the kernel can do about it (usual 
RW> disclaimer applies).

I got the impression that the whole point of the problem is that the
kernel is getting some wrong information about the memory configuration.

Is there any way to check which information exactly is wrong that leads to
the error and to see after that, where this information comes from: if the
BIOS is lying or if the kernel is misinterpreting something...



        Sergei
-- 
--------------------------------------------------------------------  -?)
         eMail:       Sergei.Haller@math.uni-giessen.de               /\\
-------------------------------------------------------------------- _\_V
Be careful of reading health books, you might die of a misprint.
                -- Mark Twain

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: lost memory on a 4GB amd64
  2004-09-24 11:42                                     ` Sergei Haller
@ 2004-09-24 12:15                                       ` Andrew Walrond
  2004-10-22  8:59                                         ` Sergei Haller
  0 siblings, 1 reply; 40+ messages in thread
From: Andrew Walrond @ 2004-09-24 12:15 UTC (permalink / raw)
  To: Sergei Haller; +Cc: linux-kernel, Rafael J. Wysocki

On Friday 24 Sep 2004 12:42, you wrote:
>
> NUMA was enabled all the time (at least most of the time). I don't know if
> I ever ran it without NUMA. I'll certainly try that.
>
> Unfortunately, I won't be able to do any reboots during the next one or
> two weeks since the machine has gone into stable operation tonight. (with
> some loss of memory for now)
>
> if it is of some interest, that's what dmesg tells about NUMA:
>
>      BIOS-provided physical RAM map:
>       BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
>       BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
>       BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
>       BIOS-e820: 0000000000100000 - 00000000bfff0000 (usable)
>       BIOS-e820: 00000000bfff0000 - 00000000bffff000 (ACPI data)
>       BIOS-e820: 00000000bffff000 - 00000000c0000000 (ACPI NVS)
>       BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
>       BIOS-e820: 0000000100000000 - 0000000140000000 (usable)
>      Scanning NUMA topology in Northbridge 24
>      Number of nodes 2 (10010)
>      Node 0 MemBase 0000000000000000 Limit 000000013fffffff
>      Skipping disabled node 1
>      Using node hash shift of 24
>      Bootmem setup node 0 0000000000000000-000000013fffffff
>      No mptable found.
>      On node 0 totalpages: 1310719
>        DMA zone: 4096 pages, LIFO batch:1
>        Normal zone: 1306623 pages, LIFO batch:16
>        HighMem zone: 0 pages, LIFO batch:1
>
> So actually it looks like the kernel does notice that only one processor
> has access to the memory here.
>

Intriguing. If it works with NUMA disabled, it would strongly indicate a bug 
in the NUMA kernel code.

Definitely worth a try as soon as you can afford to take the machine down for 
a few minutes.

Andrew

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: lost memory on a 4GB amd64
  2004-09-24 12:15                                       ` Andrew Walrond
@ 2004-10-22  8:59                                         ` Sergei Haller
  2004-10-22  9:26                                           ` Andrew Walrond
  0 siblings, 1 reply; 40+ messages in thread
From: Sergei Haller @ 2004-10-22  8:59 UTC (permalink / raw)
  To: Andrew Walrond; +Cc: linux-kernel, Rafael J. Wysocki

On Fri, 24 Sep 2004, Andrew Walrond (AW) wrote:

AW> On Friday 24 Sep 2004 12:42, you wrote:
AW> >
AW> > NUMA was enabled all the time (at least most of the time). I don't know if
AW> > I ever ran it without NUMA. I'll certainly try that.
AW> >
AW> > Unfortunately, I won't be able to do any reboots during the next one or
AW> > two weeks since the machine has gone into stable operation tonight. (with
AW> > some loss of memory for now)
AW> >
AW> > if it is of some interest, that's what dmesg tells about NUMA:
AW> >
AW> >      BIOS-provided physical RAM map:
AW> >       BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
AW> >       BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
AW> >       BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
AW> >       BIOS-e820: 0000000000100000 - 00000000bfff0000 (usable)
AW> >       BIOS-e820: 00000000bfff0000 - 00000000bffff000 (ACPI data)
AW> >       BIOS-e820: 00000000bffff000 - 00000000c0000000 (ACPI NVS)
AW> >       BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
AW> >       BIOS-e820: 0000000100000000 - 0000000140000000 (usable)
AW> >      Scanning NUMA topology in Northbridge 24
AW> >      Number of nodes 2 (10010)
AW> >      Node 0 MemBase 0000000000000000 Limit 000000013fffffff
AW> >      Skipping disabled node 1
AW> >      Using node hash shift of 24
AW> >      Bootmem setup node 0 0000000000000000-000000013fffffff
AW> >      No mptable found.
AW> >      On node 0 totalpages: 1310719
AW> >        DMA zone: 4096 pages, LIFO batch:1
AW> >        Normal zone: 1306623 pages, LIFO batch:16
AW> >        HighMem zone: 0 pages, LIFO batch:1
AW> >
AW> > So actually it looks like the kernel does notice that only one processor
AW> > has access to the memory here.
AW> >
AW> 
AW> Intriguing. If it works with NUMA disabled, it would strongly indicate a bug 
AW> in the NUMA kernel code.

Now I have some good news (that is, I hope that this is good news)

If I disable NUMA in 2.6.8.1, it is stable!

The same goes for 2.6.9, which has been out for a few days: if NUMA is
disabled, everything's fine; if NUMA is enabled, the kernel crashes (as
described in previous mails).

What is this NUMA by the way? Is it OK to live without? 

If you need some additional output, let me know. I can't promise to be
fast, though (as I said, this machine is in production use now)


Thanks for all help,

        Sergei
-- 
--------------------------------------------------------------------  -?)
         eMail:       Sergei.Haller@math.uni-giessen.de               /\\
-------------------------------------------------------------------- _\_V
Be careful of reading health books, you might die of a misprint.
                -- Mark Twain

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: lost memory on a 4GB amd64
  2004-10-22  8:59                                         ` Sergei Haller
@ 2004-10-22  9:26                                           ` Andrew Walrond
  2004-10-22 18:24                                             ` Andi Kleen
  0 siblings, 1 reply; 40+ messages in thread
From: Andrew Walrond @ 2004-10-22  9:26 UTC (permalink / raw)
  To: linux-kernel; +Cc: Sergei Haller, Rafael J. Wysocki, Andi Kleen

On Friday 22 Oct 2004 09:59, Sergei Haller wrote:
>
> If I disable NUMA in 2.6.8.1, it is stable!
>
> The same goes for 2.6.9, which has been out for a few days: if NUMA is
> disabled, everything's fine; if NUMA is enabled, the kernel crashes (as
> described in previous mails).
>
> What is this NUMA by the way? Is it OK to live without?
>

Non-uniform memory architecture. Basically, some of the RAM is attached to 
cpu1 and some to cpu2. Each can still access the other's RAM over 
HyperTransport, but can access its own RAM faster. I'm no expert, but I 
guess that the NUMA code tries to achieve persistent process cpu affinity and 
keep all process memory in the relevant cpu's local RAM bank. Or something :)

Your board isn't numa, in the sense that all the ram is attached to one cpu, 
but I don't think that it should break when NUMA is enabled.

I've cc'ed Andi Kleen (x86_64 supremo) who might have some insights, but I'm 
guessing he'll say "Bios problem - tough luck". I might be wrong ;)

Andrew Walrond

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: lost memory on a 4GB amd64
  2004-10-22  9:26                                           ` Andrew Walrond
@ 2004-10-22 18:24                                             ` Andi Kleen
  2004-10-23 10:02                                               ` Andreas Klein
  2004-10-23 10:26                                               ` Sergei Haller
  0 siblings, 2 replies; 40+ messages in thread
From: Andi Kleen @ 2004-10-22 18:24 UTC (permalink / raw)
  To: Andrew Walrond; +Cc: linux-kernel, Sergei Haller, Rafael J. Wysocki

> I've cc'ed Andi Kleen (x86_64 supremo) who might have some insights, but I'm 
> guessing he'll say "Bios problem - tough luck". I might be wrong ;)

Is there a full boot log of the system? 
-Andi

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: lost memory on a 4GB amd64
  2004-10-22 18:24                                             ` Andi Kleen
@ 2004-10-23 10:02                                               ` Andreas Klein
  2004-10-23 16:43                                                 ` Andi Kleen
  2004-10-23 10:26                                               ` Sergei Haller
  1 sibling, 1 reply; 40+ messages in thread
From: Andreas Klein @ 2004-10-23 10:02 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andrew Walrond, linux-kernel, Sergei Haller, Rafael J. Wysocki


Hello,

I have the same problem with 45 Tyan S2885 boards, but I have one working 
sample. The one working machine has the following configuration:

- Tyan S2885 pre-production model with a 1.01 pre-release BIOS and
  6GB RAM (4x512MB, 4x1GB).
  The machine is running SuSE Linux Enterprise Server 8 (32bit).
  We have used this machine as our primary mail server without problems for
  over a year.

- Now we ordered 45 Tyan S2885 and 4 S2875S boards.
  Neither board runs stably with more than 2GB of RAM usable.
4GB will only be recognized if the MTRR setting is set to Continuous and 
the Adjust Memory setting is set to Auto.
If the bios is configured this way and two 1gb ram modules are installed 
for each CPU on the 2885, the machine will not even load and unpack a 
SLES 9 kernel. Memtest sees 0-2GB mem usable and 4-6GB unusable (complains 
about each memory address).
If all four modules are installed for CPU0, then memtest seems to work 
without problems (0-2GB, 4-6GB), but SLES9 will crash during boot-up.
If all four modules are installed for CPU1, then memtest seems to work 
without problems too. SLES 9 will run a few minutes before a crash.
I will try to install SLES 8 (32bit) on the new boxes to see if it is 
stable. If yes, there is something broken in the 2.6 kernels for amd64; if 
not, the pre-production BIOS is better than the final ones.

Bye,

On Fri, 22 Oct 2004, Andi Kleen wrote:

> > I've cc'ed Andi Kleen (x86_64 supremo) who might have some insights, but I'm 
> > guessing he'll say "Bios problem - tough luck". I might be wrong ;)
> 
> Is there a full boot log of the system? 
> -Andi


-- Andreas Klein
   asklein@cip.physik.uni-wuerzburg.de
   root / webmaster @cip.physik.uni-wuerzburg.de
   root / webmaster @www.physik.uni-wuerzburg.de
_____________________________________
|                                   | 
|   Long live our gracious AMIGA!   |
|___________________________________|


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: lost memory on a 4GB amd64
  2004-10-22 18:24                                             ` Andi Kleen
  2004-10-23 10:02                                               ` Andreas Klein
@ 2004-10-23 10:26                                               ` Sergei Haller
  2004-10-23 16:49                                                 ` Andi Kleen
  1 sibling, 1 reply; 40+ messages in thread
From: Sergei Haller @ 2004-10-23 10:26 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andrew Walrond, linux-kernel, Rafael J. Wysocki

[-- Attachment #1: Type: TEXT/PLAIN, Size: 697 bytes --]

On Fri, 22 Oct 2004, Andi Kleen (AK) wrote:

AK> > I've cc'ed Andi Kleen (x86_64 supremo) who might have some insights, but I'm 
AK> > guessing he'll say "Bios problem - tough luck". I might be wrong ;)
AK> 
AK> Is there a full boot log of the system? 

yes. attached two files: 

  dmesg-2.6.9-smp-NUMA   (crashing one)
  dmesg-2.6.9-smp-noNUMA (working one)


        Sergei
-- 
--------------------------------------------------------------------  -?)
         eMail:       Sergei.Haller@math.uni-giessen.de               /\\
-------------------------------------------------------------------- _\_V
Be careful of reading health books, you might die of a misprint.
                -- Mark Twain

[-- Attachment #2: Type: TEXT/PLAIN, Size: 14524 bytes --]

Bootdata ok (command line is BOOT_IMAGE=linux-2.6.9-smp-NUMA ro root=801 devfs=mount splash=silent)
Linux version 2.6.9-smp-NUMA (sergei@fang.maths.usyd.edu.au) (gcc version 3.3.2 (Mandrake Linux 10.0 3.3.2-6.1mdk)) #3 SMP Thu Oct 21 19:12:55 EST 2004
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000bfff0000 (usable)
 BIOS-e820: 00000000bfff0000 - 00000000bffff000 (ACPI data)
 BIOS-e820: 00000000bffff000 - 00000000c0000000 (ACPI NVS)
 BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000140000000 (usable)
Scanning NUMA topology in Northbridge 24
Number of nodes 2 (10010)
Node 0 MemBase 0000000000000000 Limit 000000013fffffff
Skipping disabled node 1
Using node hash shift of 24
Bootmem setup node 0 0000000000000000-000000013fffffff
No mptable found.
On node 0 totalpages: 1310719
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 1306623 pages, LIFO batch:16
  HighMem zone: 0 pages, LIFO batch:1
ACPI: RSDP (v002 ACPIAM                                ) @ 0x00000000000f68a0
ACPI: XSDT (v001 A M I  OEMXSDT  0x06000428 MSFT 0x00000097) @ 0x00000000bfff0100
ACPI: FADT (v001 A M I  OEMFACP  0x06000428 MSFT 0x00000097) @ 0x00000000bfff0281
ACPI: MADT (v001 A M I  OEMAPIC  0x06000428 MSFT 0x00000097) @ 0x00000000bfff0380
ACPI: OEMB (v001 A M I  OEMBIOS  0x06000428 MSFT 0x00000097) @ 0x00000000bffff040
ACPI: HPET (v001 A M I  OEMHPET  0x06000428 MSFT 0x00000097) @ 0x00000000bfff3330
ACPI: ASF! (v001 AMIASF AMDSTRET 0x00000001 INTL 0x02002026) @ 0x00000000bfff3370
ACPI: DSDT (v001  0AAAA 0AAAA000 0x00000000 INTL 0x02002026) @ 0x0000000000000000
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:5 APIC version 16
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:5 APIC version 16
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
ACPI: HPET id: 0x102282a0 base: 0xfec01000
Using ACPI (MADT) for SMP configuration information
Checking aperture...
CPU 0: aperture @ f0000000 size 128 MB
CPU 1: aperture @ f0000000 size 128 MB
Built 2 zonelists
Kernel command line: BOOT_IMAGE=linux-2.6.9-smp-NUMA ro root=801 devfs=mount splash=silent console=tty0
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 131072 bytes)
time.c: Using 14.318180 MHz HPET timer.
time.c: Detected 1992.117 MHz processor.
Console: colour dummy device 80x25
Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
Memory: 4102924k/5242880k available (3784k kernel code, 0k reserved, 1550k data, 212k init)
Calibrating delay loop... 3940.35 BogoMIPS (lpj=1970176)
Mount-cache hash table entries: 256 (order: 0, 4096 bytes)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
Using local APIC NMI watchdog using perfctr0
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU0: AMD Opteron(tm) Processor 246 stepping 0a
per-CPU timeslice cutoff: 1023.83 usecs.
task migration cache decay timeout: 2 msecs.
Booting processor 1/1 rip 6000 rsp 1013ffa5f58
Initializing CPU#1
Calibrating delay loop... 3981.31 BogoMIPS (lpj=1990656)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
AMD Opteron(tm) Processor 246 stepping 0a
Total of 2 processors activated (7921.66 BogoMIPS).
Using local APIC timer interrupts.
Detected 12.450 MHz APIC timer.
checking TSC synchronization across 2 CPUs: passed.
time.c: Using HPET based timekeeping.
Brought up 2 CPUs
NET: Registered protocol family 16
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20040816
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P2._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 *9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 *5 6 7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
ACPI: PCI interrupt 0000:00:07.2[D] -> GSI 19 (level, low) -> IRQ 19
ACPI: PCI interrupt 0000:00:07.5[B] -> GSI 17 (level, low) -> IRQ 17
ACPI: PCI interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 16
ACPI: PCI interrupt 0000:01:00.0[D] -> GSI 19 (level, low) -> IRQ 19
ACPI: PCI interrupt 0000:01:00.1[D] -> GSI 19 (level, low) -> IRQ 19
ACPI: PCI interrupt 0000:01:03.0[A] -> GSI 18 (level, low) -> IRQ 18
ACPI: PCI interrupt 0000:01:05.0[A] -> GSI 19 (level, low) -> IRQ 19
ACPI: PCI interrupt 0000:01:0a.0[A] -> GSI 17 (level, low) -> IRQ 17
ACPI: PCI interrupt 0000:01:0b.0[A] -> GSI 17 (level, low) -> IRQ 17
ACPI: PCI interrupt 0000:01:0b.1[B] -> GSI 18 (level, low) -> IRQ 18
ACPI: PCI interrupt 0000:01:0b.2[C] -> GSI 19 (level, low) -> IRQ 19
agpgart: Detected AMD 8151 AGP Bridge rev B2
agpgart: Maximum main memory to use for agp memory: 4938M
agpgart: AGP aperture is 128M @ 0xf0000000
PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
IA32 emulation $Id: sys_ia32.c,v 1.32 2002/03/24 13:02:28 ak Exp $
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
devfs: 2004-01-31 Richard Gooch (rgooch@atnf.csiro.au)
devfs: boot_options: 0x1
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
NTFS driver 2.1.20 [Flags: R/O].
SGI XFS with ACLs, large block/inode numbers, no debug enabled
SGI XFS Quota Management subsystem
Initializing Cryptographic API
PCI: Via IRQ fixup for 0000:01:0b.0, from 9 to 1
PCI: Via IRQ fixup for 0000:01:0b.1, from 5 to 2
vesafb: framebuffer at 0xe8000000, mapped to 0xffffff0000100000, size 6144k
vesafb: mode is 1024x768x32, linelength=4096, pages=1
vesafb: scrolling: redraw
vesafb: Truecolor: size=8:8:8:8, shift=24:16:8:0
Console: switching to colour frame buffer device 128x48
fb0: VESA VGA frame buffer device
vga16fb: initializing
vga16fb: mapped to 0x00000100000a0000
fb1: VGA16 VGA frame buffer device
ACPI: Power Button (FF) [PWRF]
ACPI: Processor [CPU1] (supports C1, 8 throttling states)
ACPI: Processor [CPU2] (supports C1)
Non-volatile memory driver v1.2
hw_random: AMD768 system management I/O registers at 0x5000.
hw_random hardware driver 1.0.0 loaded
Linux agpgart interface v0.100 (c) Dave Jones
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing enabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 32000K size 1024 blocksize
Intel(R) PRO/1000 Network Driver - version 5.3.19-k2
Copyright (c) 1999-2004 Intel Corporation.
ACPI: PCI interrupt 0000:01:03.0[A] -> GSI 18 (level, low) -> IRQ 18
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
AMD8111: IDE controller at PCI slot 0000:00:07.1
AMD8111: chipset revision 3
AMD8111: not 100% native mode: will probe irqs later
AMD8111: 0000:00:07.1 (rev 03) UDMA133 controller
    ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:pio, hdb:pio
    ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
Probing IDE interface ide0...
Probing IDE interface ide1...
hdc: SONY DVD RW DRU-700A, ATAPI CD/DVD-ROM drive
Using anticipatory io scheduler
ide1 at 0x170-0x177,0x376 on irq 15
libata version 1.02 loaded.
sata_sil version 0.54
ACPI: PCI interrupt 0000:01:05.0[A] -> GSI 19 (level, low) -> IRQ 19
ata1: SATA max UDMA/100 cmd 0xFFFFFF0000004C80 ctl 0xFFFFFF0000004C8A bmdma 0xFFFFFF0000004C00 irq 19
ata2: SATA max UDMA/100 cmd 0xFFFFFF0000004CC0 ctl 0xFFFFFF0000004CCA bmdma 0xFFFFFF0000004C08 irq 19
ata3: SATA max UDMA/100 cmd 0xFFFFFF0000004E80 ctl 0xFFFFFF0000004E8A bmdma 0xFFFFFF0000004E00 irq 19
ata4: SATA max UDMA/100 cmd 0xFFFFFF0000004EC0 ctl 0xFFFFFF0000004ECA bmdma 0xFFFFFF0000004E08 irq 19
ata1: dev 0 cfg 49:2f00 82:346b 83:7f21 84:4003 85:3469 86:3c01 87:4003 88:203f
ata1: dev 0 ATA, max UDMA/100, 234441648 sectors: lba48
ata1: dev 0 configured for UDMA/100
scsi0 : sata_sil
ata2: dev 0 cfg 49:2f00 82:346b 83:7f21 84:4003 85:3469 86:3c01 87:4003 88:203f
ata2: dev 0 ATA, max UDMA/100, 234441648 sectors: lba48
ata2: dev 0 configured for UDMA/100
scsi1 : sata_sil
ata3: no device found (phy stat 00000000)
scsi2 : sata_sil
ata4: no device found (phy stat 00000000)
scsi3 : sata_sil
  Vendor: ATA       Model: WDC WD1200JD-00G  Rev: 02.0
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: WDC WD1200JD-00G  Rev: 02.0
  Type:   Direct-Access                      ANSI SCSI revision: 05
st: Version 20040403, fixed bufsize 32768, s/g segs 256
osst :I: Tape driver with OnStream support version 0.99.1
osst :I: $Id: osst.c,v 1.70 2003/12/23 14:22:12 wriede Exp $
SCSI device sda: 234441648 512-byte hdwr sectors (120034 MB)
SCSI device sda: drive cache: write back
 /dev/scsi/host0/bus0/target0/lun0: p1 p2 p3
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
SCSI device sdb: 234441648 512-byte hdwr sectors (120034 MB)
SCSI device sdb: drive cache: write back
 /dev/scsi/host1/bus0/target0/lun0: p1 p2
Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
ACPI: PCI interrupt 0000:01:0b.2[C] -> GSI 19 (level, low) -> IRQ 19
ehci_hcd 0000:01:0b.2: VIA Technologies, Inc. USB 2.0
ehci_hcd 0000:01:0b.2: irq 19, pci mem ffffff0000020800
ehci_hcd 0000:01:0b.2: new USB bus registered, assigned bus number 1
ehci_hcd 0000:01:0b.2: USB 2.0 enabled, EHCI 1.00, driver 2004-May-10
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 4 ports detected
ohci_hcd: 2004 Feb 02 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
ACPI: PCI interrupt 0000:01:00.0[D] -> GSI 19 (level, low) -> IRQ 19
ohci_hcd 0000:01:00.0: Advanced Micro Devices [AMD] AMD-8111 USB
ohci_hcd 0000:01:00.0: irq 19, pci mem ffffff0000022000
ohci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
ACPI: PCI interrupt 0000:01:00.1[D] -> GSI 19 (level, low) -> IRQ 19
ohci_hcd 0000:01:00.1: Advanced Micro Devices [AMD] AMD-8111 USB (#2)
ohci_hcd 0000:01:00.1: irq 19, pci mem ffffff0000024000
ohci_hcd 0000:01:00.1: new USB bus registered, assigned bus number 3
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 3 ports detected
USB Universal Host Controller Interface driver v2.2
ACPI: PCI interrupt 0000:01:0b.0[A] -> GSI 17 (level, low) -> IRQ 17
uhci_hcd 0000:01:0b.0: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller
uhci_hcd 0000:01:0b.0: irq 17, io base 000000000000a400
uhci_hcd 0000:01:0b.0: new USB bus registered, assigned bus number 4
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
ACPI: PCI interrupt 0000:01:0b.1[B] -> GSI 18 (level, low) -> IRQ 18
uhci_hcd 0000:01:0b.1: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (#2)
uhci_hcd 0000:01:0b.1: irq 18, io base 000000000000a800
uhci_hcd 0000:01:0b.1: new USB bus registered, assigned bus number 5
hub 5-0:1.0: USB hub found
hub 5-0:1.0: 2 ports detected
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.0:USB HID core driver
mice: PS/2 mouse device common for all mice
input: AT Translated Set 2 keyboard on isa0060/serio0
input: ImExPS/2 Generic Explorer Mouse on isa0060/serio1
Advanced Linux Sound Architecture Driver Version 1.0.6 (Sun Aug 15 07:17:53 2004 UTC).
ACPI: PCI interrupt 0000:00:07.5[B] -> GSI 17 (level, low) -> IRQ 17
intel8x0_measure_ac97_clock: measured 49951 usecs
intel8x0: clocking to 48000
ALSA device list:
  #0: AMD AMD8111 at 0xc800, irq 17
oprofile: using NMI interrupt.
NET: Registered protocol family 2
IP: routing cache hash table of 32768 buckets, 512Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
NET: Registered protocol family 1
ACPI: (supports S0 S1 S4 S5)
ACPI wakeup devices: 
PCI1 USB0 USB1 PS2K PS2M UAR1 UAR2 SMBC AC97 MODM PWRB 
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
VFS: Mounted root (ext3 filesystem) readonly.
Mounted devfs on /dev
Freeing unused kernel memory: 212k freed
Real Time Clock Driver v1.12
EXT3 FS on sda1, internal journal
Adding 10008484k swap on /dev/sda2.  Priority:-1 extents:1
Adding 10008452k swap on /dev/sdb1.  Priority:-2 extents:1
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sdb2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
ieee1394: Initialized config rom entry `ip1394'
ohci1394: $Rev: 1223 $ Ben Collins <bcollins@debian.org>
ACPI: PCI interrupt 0000:01:0a.0[A] -> GSI 17 (level, low) -> IRQ 17
ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[17]  MMIO=[fc9ff000-fc9ff7ff]  Max Packet=[2048]
ieee1394: Host added: ID:BUS[0-00:1023]  GUID[00e0810000301c03]
ip_tables: (C) 2000-2002 Netfilter core team
ip_conntrack version 2.1 (8192 buckets, 65536 max) - 424 bytes per conntrack
inserting floppy driver for 2.6.9-smp-NUMA
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
hdc: ATAPI 40X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
ide-floppy driver 0.99.newide
NET: Registered protocol family 17
e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex

[-- Attachment #3: Type: TEXT/PLAIN, Size: 14562 bytes --]

Bootdata ok (command line is BOOT_IMAGE=linux-2.6.9-smp-noNUMA ro root=801 devfs=mount splash=silent)
Linux version 2.6.9-smp-noNUMA (sergei@fang.maths.usyd.edu.au) (gcc version 3.3.2 (Mandrake Linux 10.0 3.3.2-6.1mdk)) #2 SMP Wed Oct 20 23:19:56 EST 2004
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
 BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 00000000bfff0000 (usable)
 BIOS-e820: 00000000bfff0000 - 00000000bffff000 (ACPI data)
 BIOS-e820: 00000000bffff000 - 00000000c0000000 (ACPI NVS)
 BIOS-e820: 00000000ff780000 - 0000000100000000 (reserved)
 BIOS-e820: 0000000100000000 - 0000000140000000 (usable)
No mptable found.
On node 0 totalpages: 1310720
  DMA zone: 4096 pages, LIFO batch:1
  Normal zone: 1306624 pages, LIFO batch:16
  HighMem zone: 0 pages, LIFO batch:1
ACPI: RSDP (v002 ACPIAM                                ) @ 0x00000000000f68a0
ACPI: XSDT (v001 A M I  OEMXSDT  0x06000428 MSFT 0x00000097) @ 0x00000000bfff0100
ACPI: FADT (v001 A M I  OEMFACP  0x06000428 MSFT 0x00000097) @ 0x00000000bfff0281
ACPI: MADT (v001 A M I  OEMAPIC  0x06000428 MSFT 0x00000097) @ 0x00000000bfff0380
ACPI: OEMB (v001 A M I  OEMBIOS  0x06000428 MSFT 0x00000097) @ 0x00000000bffff040
ACPI: HPET (v001 A M I  OEMHPET  0x06000428 MSFT 0x00000097) @ 0x00000000bfff3330
ACPI: ASF! (v001 AMIASF AMDSTRET 0x00000001 INTL 0x02002026) @ 0x00000000bfff3370
ACPI: DSDT (v001  0AAAA 0AAAA000 0x00000000 INTL 0x02002026) @ 0x0000000000000000
ACPI: Local APIC address 0xfee00000
ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Processor #0 15:5 APIC version 16
ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Processor #1 15:5 APIC version 16
ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
ACPI: IRQ0 used by override.
ACPI: IRQ2 used by override.
ACPI: IRQ9 used by override.
ACPI: HPET id: 0x102282a0 base: 0xfec01000
Using ACPI (MADT) for SMP configuration information
Checking aperture...
CPU 0: aperture @ f0000000 size 128 MB
CPU 1: aperture @ f0000000 size 128 MB
Built 1 zonelists
Kernel command line: BOOT_IMAGE=linux-2.6.9-smp-noNUMA ro root=801 devfs=mount splash=silent console=tty0
Initializing CPU#0
PID hash table entries: 4096 (order: 12, 131072 bytes)
time.c: Using 14.318180 MHz HPET timer.
time.c: Detected 1992.304 MHz processor.
Console: colour dummy device 80x25
Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
Memory: 4103084k/5242880k available (3765k kernel code, 90508k reserved, 1549k data, 208k init)
Calibrating delay loop... 3940.35 BogoMIPS (lpj=1970176)
Mount-cache hash table entries: 256 (order: 0, 4096 bytes)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
Using local APIC NMI watchdog using perfctr0
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
CPU0: AMD Opteron(tm) Processor 246 stepping 0a
per-CPU timeslice cutoff: 1023.83 usecs.
task migration cache decay timeout: 2 msecs.
Booting processor 1/1 rip 6000 rsp 100bff25f58
Initializing CPU#1
Calibrating delay loop... 3981.31 BogoMIPS (lpj=1990656)
CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
CPU: L2 Cache: 1024K (64 bytes/line)
AMD Opteron(tm) Processor 246 stepping 0a
Total of 2 processors activated (7921.66 BogoMIPS).
Using local APIC timer interrupts.
Detected 12.451 MHz APIC timer.
checking TSC synchronization across 2 CPUs: passed.
time.c: Using HPET based timekeeping.
Brought up 2 CPUs
NET: Registered protocol family 16
PCI: Using configuration type 1
mtrr: v2.0 (20020519)
ACPI: Subsystem revision 20040816
ACPI: Interpreter enabled
ACPI: Using IOAPIC for interrupt routing
ACPI: PCI Root Bridge [PCI0] (00:00)
PCI: Probing PCI hardware (bus 00)
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PCI1._PRT]
ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P2._PRT]
ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 6 7 9 10 *11 12 14 15)
ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 *9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 *5 6 7 9 10 11 12 14 15)
ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 6 7 9 *10 11 12 14 15)
SCSI subsystem initialized
usbcore: registered new driver usbfs
usbcore: registered new driver hub
PCI: Using ACPI for IRQ routing
ACPI: PCI interrupt 0000:00:07.2[D] -> GSI 19 (level, low) -> IRQ 19
ACPI: PCI interrupt 0000:00:07.5[B] -> GSI 17 (level, low) -> IRQ 17
ACPI: PCI interrupt 0000:02:00.0[A] -> GSI 16 (level, low) -> IRQ 16
ACPI: PCI interrupt 0000:01:00.0[D] -> GSI 19 (level, low) -> IRQ 19
ACPI: PCI interrupt 0000:01:00.1[D] -> GSI 19 (level, low) -> IRQ 19
ACPI: PCI interrupt 0000:01:03.0[A] -> GSI 18 (level, low) -> IRQ 18
ACPI: PCI interrupt 0000:01:05.0[A] -> GSI 19 (level, low) -> IRQ 19
ACPI: PCI interrupt 0000:01:0a.0[A] -> GSI 17 (level, low) -> IRQ 17
ACPI: PCI interrupt 0000:01:0b.0[A] -> GSI 17 (level, low) -> IRQ 17
ACPI: PCI interrupt 0000:01:0b.1[B] -> GSI 18 (level, low) -> IRQ 18
ACPI: PCI interrupt 0000:01:0b.2[C] -> GSI 19 (level, low) -> IRQ 19
agpgart: Detected AMD 8151 AGP Bridge rev B2
agpgart: Maximum main memory to use for agp memory: 4938M
agpgart: AGP aperture is 128M @ 0xf0000000
PCI-DMA: Reserving 64MB of IOMMU area in the AGP aperture
IA32 emulation $Id: sys_ia32.c,v 1.32 2002/03/24 13:02:28 ak Exp $
VFS: Disk quotas dquot_6.5.1
Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
devfs: 2004-01-31 Richard Gooch (rgooch@atnf.csiro.au)
devfs: boot_options: 0x1
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
NTFS driver 2.1.20 [Flags: R/O].
SGI XFS with ACLs, large block/inode numbers, no debug enabled
SGI XFS Quota Management subsystem
Initializing Cryptographic API
PCI: Via IRQ fixup for 0000:01:0b.0, from 9 to 1
PCI: Via IRQ fixup for 0000:01:0b.1, from 5 to 2
vesafb: framebuffer at 0xe8000000, mapped to 0xffffff0000100000, size 6144k
vesafb: mode is 1024x768x32, linelength=4096, pages=1
vesafb: scrolling: redraw
vesafb: Truecolor: size=8:8:8:8, shift=24:16:8:0
Console: switching to colour frame buffer device 128x48
fb0: VESA VGA frame buffer device
vga16fb: initializing
vga16fb: mapped to 0x00000100000a0000
fb1: VGA16 VGA frame buffer device
ACPI: Power Button (FF) [PWRF]
ACPI: Processor [CPU1] (supports C1, 8 throttling states)
ACPI: Processor [CPU2] (supports C1)
Non-volatile memory driver v1.2
hw_random: AMD768 system management I/O registers at 0x5000.
hw_random hardware driver 1.0.0 loaded
Linux agpgart interface v0.100 (c) Dave Jones
serio: i8042 AUX port at 0x60,0x64 irq 12
serio: i8042 KBD port at 0x60,0x64 irq 1
Serial: 8250/16550 driver $Revision: 1.90 $ 8 ports, IRQ sharing enabled
ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 32000K size 1024 blocksize
Intel(R) PRO/1000 Network Driver - version 5.3.19-k2
Copyright (c) 1999-2004 Intel Corporation.
ACPI: PCI interrupt 0000:01:03.0[A] -> GSI 18 (level, low) -> IRQ 18
e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
AMD8111: IDE controller at PCI slot 0000:00:07.1
AMD8111: chipset revision 3
AMD8111: not 100% native mode: will probe irqs later
AMD8111: 0000:00:07.1 (rev 03) UDMA133 controller
    ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:pio, hdb:pio
    ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:DMA, hdd:pio
Probing IDE interface ide0...
Probing IDE interface ide1...
hdc: SONY DVD RW DRU-700A, ATAPI CD/DVD-ROM drive
Using anticipatory io scheduler
ide1 at 0x170-0x177,0x376 on irq 15
libata version 1.02 loaded.
sata_sil version 0.54
ACPI: PCI interrupt 0000:01:05.0[A] -> GSI 19 (level, low) -> IRQ 19
ata1: SATA max UDMA/100 cmd 0xFFFFFF0000004C80 ctl 0xFFFFFF0000004C8A bmdma 0xFFFFFF0000004C00 irq 19
ata2: SATA max UDMA/100 cmd 0xFFFFFF0000004CC0 ctl 0xFFFFFF0000004CCA bmdma 0xFFFFFF0000004C08 irq 19
ata3: SATA max UDMA/100 cmd 0xFFFFFF0000004E80 ctl 0xFFFFFF0000004E8A bmdma 0xFFFFFF0000004E00 irq 19
ata4: SATA max UDMA/100 cmd 0xFFFFFF0000004EC0 ctl 0xFFFFFF0000004ECA bmdma 0xFFFFFF0000004E08 irq 19
ata1: dev 0 cfg 49:2f00 82:346b 83:7f21 84:4003 85:3469 86:3c01 87:4003 88:203f
ata1: dev 0 ATA, max UDMA/100, 234441648 sectors: lba48
ata1: dev 0 configured for UDMA/100
scsi0 : sata_sil
ata2: dev 0 cfg 49:2f00 82:346b 83:7f21 84:4003 85:3469 86:3c01 87:4003 88:203f
ata2: dev 0 ATA, max UDMA/100, 234441648 sectors: lba48
ata2: dev 0 configured for UDMA/100
scsi1 : sata_sil
ata3: no device found (phy stat 00000000)
scsi2 : sata_sil
ata4: no device found (phy stat 00000000)
scsi3 : sata_sil
  Vendor: ATA       Model: WDC WD1200JD-00G  Rev: 02.0
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: WDC WD1200JD-00G  Rev: 02.0
  Type:   Direct-Access                      ANSI SCSI revision: 05
st: Version 20040403, fixed bufsize 32768, s/g segs 256
osst :I: Tape driver with OnStream support version 0.99.1
osst :I: $Id: osst.c,v 1.70 2003/12/23 14:22:12 wriede Exp $
SCSI device sda: 234441648 512-byte hdwr sectors (120034 MB)
SCSI device sda: drive cache: write back
 /dev/scsi/host0/bus0/target0/lun0: p1 p2 p3
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
SCSI device sdb: 234441648 512-byte hdwr sectors (120034 MB)
SCSI device sdb: drive cache: write back
 /dev/scsi/host1/bus0/target0/lun0: p1 p2
Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
ACPI: PCI interrupt 0000:01:0b.2[C] -> GSI 19 (level, low) -> IRQ 19
ehci_hcd 0000:01:0b.2: VIA Technologies, Inc. USB 2.0
ehci_hcd 0000:01:0b.2: irq 19, pci mem ffffff0000020800
ehci_hcd 0000:01:0b.2: new USB bus registered, assigned bus number 1
ehci_hcd 0000:01:0b.2: USB 2.0 enabled, EHCI 1.00, driver 2004-May-10
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 4 ports detected
ohci_hcd: 2004 Feb 02 USB 1.1 'Open' Host Controller (OHCI) Driver (PCI)
ACPI: PCI interrupt 0000:01:00.0[D] -> GSI 19 (level, low) -> IRQ 19
ohci_hcd 0000:01:00.0: Advanced Micro Devices [AMD] AMD-8111 USB
ohci_hcd 0000:01:00.0: irq 19, pci mem ffffff0000022000
ohci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2
hub 2-0:1.0: USB hub found
hub 2-0:1.0: 3 ports detected
ACPI: PCI interrupt 0000:01:00.1[D] -> GSI 19 (level, low) -> IRQ 19
ohci_hcd 0000:01:00.1: Advanced Micro Devices [AMD] AMD-8111 USB (#2)
ohci_hcd 0000:01:00.1: irq 19, pci mem ffffff0000024000
ohci_hcd 0000:01:00.1: new USB bus registered, assigned bus number 3
hub 3-0:1.0: USB hub found
hub 3-0:1.0: 3 ports detected
USB Universal Host Controller Interface driver v2.2
ACPI: PCI interrupt 0000:01:0b.0[A] -> GSI 17 (level, low) -> IRQ 17
uhci_hcd 0000:01:0b.0: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller
uhci_hcd 0000:01:0b.0: irq 17, io base 000000000000a400
uhci_hcd 0000:01:0b.0: new USB bus registered, assigned bus number 4
hub 4-0:1.0: USB hub found
hub 4-0:1.0: 2 ports detected
ACPI: PCI interrupt 0000:01:0b.1[B] -> GSI 18 (level, low) -> IRQ 18
uhci_hcd 0000:01:0b.1: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller (#2)
uhci_hcd 0000:01:0b.1: irq 18, io base 000000000000a800
uhci_hcd 0000:01:0b.1: new USB bus registered, assigned bus number 5
hub 5-0:1.0: USB hub found
hub 5-0:1.0: 2 ports detected
usbcore: registered new driver hiddev
usbcore: registered new driver usbhid
drivers/usb/input/hid-core.c: v2.0:USB HID core driver
mice: PS/2 mouse device common for all mice
input: AT Translated Set 2 keyboard on isa0060/serio0
input: ImExPS/2 Generic Explorer Mouse on isa0060/serio1
Advanced Linux Sound Architecture Driver Version 1.0.6 (Sun Aug 15 07:17:53 2004 UTC).
ACPI: PCI interrupt 0000:00:07.5[B] -> GSI 17 (level, low) -> IRQ 17
intel8x0_measure_ac97_clock: measured 49948 usecs
intel8x0: clocking to 48000
ALSA device list:
  #0: AMD AMD8111 at 0xc800, irq 17
oprofile: using NMI interrupt.
NET: Registered protocol family 2
IP: routing cache hash table of 32768 buckets, 512Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
NET: Registered protocol family 1
ACPI: (supports S0 S1 S4 S5)
ACPI wakeup devices: 
PCI1 USB0 USB1 PS2K PS2M UAR1 UAR2 SMBC AC97 MODM PWRB 
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
VFS: Mounted root (ext3 filesystem) readonly.
Mounted devfs on /dev
Freeing unused kernel memory: 208k freed
Real Time Clock Driver v1.12
EXT3 FS on sda1, internal journal
Adding 10008484k swap on /dev/sda2.  Priority:-1 extents:1
Adding 10008452k swap on /dev/sdb1.  Priority:-2 extents:1
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sda3, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS on sdb2, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
ieee1394: Initialized config rom entry `ip1394'
ohci1394: $Rev: 1223 $ Ben Collins <bcollins@debian.org>
ACPI: PCI interrupt 0000:01:0a.0[A] -> GSI 17 (level, low) -> IRQ 17
ohci1394: fw-host0: OHCI-1394 1.0 (PCI): IRQ=[17]  MMIO=[fc9ff000-fc9ff7ff]  Max Packet=[2048]
ieee1394: Host added: ID:BUS[0-00:1023]  GUID[00e0810000301c03]
ip_tables: (C) 2000-2002 Netfilter core team
ip_conntrack version 2.1 (8192 buckets, 65536 max) - 424 bytes per conntrack
ip_tables: (C) 2000-2002 Netfilter core team
ip_conntrack version 2.1 (8192 buckets, 65536 max) - 424 bytes per conntrack
inserting floppy driver for 2.6.9-smp-noNUMA
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
hdc: ATAPI 40X DVD-ROM DVD-R CD-R/RW drive, 2048kB Cache, UDMA(33)
Uniform CD-ROM driver Revision: 3.20
ide-floppy driver 0.99.newide
NET: Registered protocol family 17
e1000: eth0: e1000_watchdog: NIC Link is Up 100 Mbps Full Duplex
Attached scsi generic sg0 at scsi0, channel 0, id 0, lun 0,  type 0
Attached scsi generic sg1 at scsi1, channel 0, id 0, lun 0,  type 0


* Re: lost memory on a 4GB amd64
  2004-10-23 10:02                                               ` Andreas Klein
@ 2004-10-23 16:43                                                 ` Andi Kleen
  2004-10-26 11:25                                                   ` Andreas Klein
  0 siblings, 1 reply; 40+ messages in thread
From: Andi Kleen @ 2004-10-23 16:43 UTC (permalink / raw)
  To: Andreas Klein
  Cc: Andrew Walrond, linux-kernel, Sergei Haller, Rafael J. Wysocki, discuss

[cc'ed to discuss@x86-64.org for future reference. If you find
this message in Google and you have the same problem, talk
to your BIOS vendor, not to your Linux vendor]

On Sat, Oct 23, 2004 at 12:02:10PM +0200, Andreas Klein wrote:
> - Tyan S2885 pre-production model with a 1.01 pre-release BIOS
> 6 GB RAM (4x512 MB, 4x1 GB)
> The machine is running SuSE Linux Enterprise Server 8 (32bit).
> We have used this machine as our primary mail server without problems for 
> over a year.
> 
> - Now we have ordered 45 Tyan S2885 and 4 S2875S boards.
> Neither board runs stably with more than 2GB of RAM usable.
> 4GB will only be recognized if the MTRR setting is set to Continuous and 
> the Adjust Memory setting is set to Auto.
> If the BIOS is configured this way and two 1GB RAM modules are installed 
> for each CPU on the 2885, the machine will not even load and unpack a 
> SLES 9 kernel. Memtest sees 0-2GB as usable and 4-6GB as unusable 
> (it complains about each memory address).
> If all four modules are installed for CPU0, then memtest seems to work 
> without problems (0-2GB, 4-6GB), but SLES 9 will crash during boot-up.
> If all four modules are installed for CPU1, then memtest seems to work 
> without problems too. SLES 9 will run a few minutes before a crash.
> I will try to install SLES 8 (32bit) on the new boxes to see if it runs 
> stably. If yes, there is something broken in the 2.6 kernels for amd64; if 
> not, the pre-production BIOS is better than the final ones.

It all sounds very much like a BIOS problem. I doubt 2.4 will
run stably on this setup - if memtest86 doesn't like the memory, Linux 
won't like it either. All I can recommend is to talk to Tyan or
live with the lost memory. 

-Andi



* Re: lost memory on a 4GB amd64
  2004-10-23 10:26                                               ` Sergei Haller
@ 2004-10-23 16:49                                                 ` Andi Kleen
  2004-10-24  9:53                                                   ` Sergei Haller
  0 siblings, 1 reply; 40+ messages in thread
From: Andi Kleen @ 2004-10-23 16:49 UTC (permalink / raw)
  To: Sergei Haller; +Cc: Andrew Walrond, linux-kernel, Rafael J. Wysocki

On Sat, Oct 23, 2004 at 12:26:38PM +0200, Sergei Haller wrote:
> On Fri, 22 Oct 2004, Andi Kleen (AK) wrote:
> 
> AK> > I've cc'ed Andi Kleen (x86_64 supremo) who might have some insights, but I'm 
> AK> > guessing he'll say "Bios problem - tough luck". I might be wrong ;)
> AK> 
> AK> Is there a full boot log of the system? 
> 
> yes. attached two files: 
> 
>   dmesg-2.6.9-smp-NUMA   (crashing one)
>   dmesg-2.6.9-smp-noNUMA (working one)

I bet that if you fill all memory on the non-NUMA setup
it will crash too.

E.g. run something like this:

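#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
	/* grab all currently free physical pages, minus a small margin */
	unsigned long len = (sysconf(_SC_AVPHYS_PAGES) - 10) * getpagesize();
	char *mem = malloc(len);

	if (mem == NULL) {
		perror("malloc");
		return 1;
	}
	for (;;) {
		memset(mem, 0xff, len);	/* touch every page on each pass */
		printf(".");
	}
}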


-Andi



* Re: lost memory on a 4GB amd64
  2004-10-23 16:49                                                 ` Andi Kleen
@ 2004-10-24  9:53                                                   ` Sergei Haller
       [not found]                                                     ` <Pine.LNX.4.58.0410271704050.3903@pluto.physik.uni-wuerzburg.de>
  0 siblings, 1 reply; 40+ messages in thread
From: Sergei Haller @ 2004-10-24  9:53 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andrew Walrond, linux-kernel, Rafael J. Wysocki

On Sat, 23 Oct 2004, Andi Kleen (AK) wrote:

AK> >   dmesg-2.6.9-smp-noNUMA (working one)
AK> 
AK> I bet that if you fill all memory on the non-NUMA setup
AK> it will crash too.
AK> 
AK> E.g. run something like this:
AK> 
AK> #include <stdlib.h>
AK> #include <string.h>
AK> #include <unistd.h>
AK> #include <stdio.h>
AK> 
AK> int main(void)
AK> {
AK> 	/* grab all currently free physical pages, minus a small margin */
AK> 	unsigned long len = (sysconf(_SC_AVPHYS_PAGES) - 10) * getpagesize();
AK> 	char *mem = malloc(len);
AK> 
AK> 	if (mem == NULL) {
AK> 		perror("malloc");
AK> 		return 1;
AK> 	}
AK> 	for (;;) {
AK> 		memset(mem, 0xff, len);	/* touch every page on each pass */
AK> 		printf(".");
AK> 	}
AK> }

what's the difference to the program I posted (here a link to an archive:
  http://marc.theaimsgroup.com/?l=linux-kernel&m=109567610824746&w=4

other than yours is filling the memory with 0xFF and mine with 0x00 and 
mine does it only once and yours continuously? 

BTW: I added an fflush(stdout) after the printf, and after two lines of
dots on a 150-column terminal I just stopped the program. This is with
2.6.9-smp-noNUMA.

on a 2.6.9-smp-NUMA, running my program crashes the kernel immediately.
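
As far as the thread describes them, the two test programs differ only in the fill byte and in how many times the buffer is written. A minimal sketch of a helper that covers both cases follows. `fill_passes` and its parameters are hypothetical names, not taken from either posted program, and the read-back check is an addition for illustration only:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: write `pattern` over the whole buffer `passes` times
 * and return the number of bytes that did not read back as `pattern`.
 * passes == 1 with pattern 0x00 resembles a fill-once test; a large
 * `passes` with pattern 0xff approximates the endless memset loop.
 */
size_t fill_passes(unsigned char *mem, size_t len, unsigned char pattern,
                   unsigned int passes)
{
	size_t bad = 0;
	size_t i;
	unsigned int p;

	for (p = 0; p < passes; p++) {
		memset(mem, pattern, len);
		/* read every byte back; a mismatch would point at bad memory */
		for (i = 0; i < len; i++)
			if (mem[i] != pattern)
				bad++;
	}
	return bad;
}
```

Calling it on a malloc'ed buffer sized as in the program above, once with (0x00, 1) and once with (0xff, many), would exercise both access patterns from a single binary.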


c ya
        Sergei
-- 
--------------------------------------------------------------------  -?)
         eMail:       Sergei.Haller@math.uni-giessen.de               /\\
-------------------------------------------------------------------- _\_V
Be careful of reading health books, you might die of a misprint.
                -- Mark Twain

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: lost memory on a 4GB amd64
  2004-10-23 16:43                                                 ` Andi Kleen
@ 2004-10-26 11:25                                                   ` Andreas Klein
  0 siblings, 0 replies; 40+ messages in thread
From: Andreas Klein @ 2004-10-26 11:25 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Walrond, linux-kernel, Sergei Haller, Rafael J. Wysocki, discuss


Hello,

You are right. I have now tried to install SLES 8 (32bit) and the system 
hangs when initialising CPU0. Our running pre-production sample was 
installed from the same CD and with exactly the same setup.
So something is really broken in the final BIOS or in the final board 
layout.
The only thing I haven't tried is to flash the pre-production BIOS on the 
final boards.

Bye,

On Sat, 23 Oct 2004, Andi Kleen wrote:

> [cc'ed to discuss@x86-64.org for future reference. If you find
> this message in google and you have the same problem, talk
> to your BIOS vendor, not to your Linux vendor]
> 
> On Sat, Oct 23, 2004 at 12:02:10PM +0200, Andreas Klein wrote:
> > - Tyan S2885 pre-production model with a 1.01 pre-release bios
> > 6 GB RAM (4x512MB, 4x1GB)
> > The machine is running SuSE Linux Enterprise Server 8 (32bit).
> > We use this machine as our primary mail-server without problems for over a 
> > year.
> > 
> > - Now we ordered 45 Tyan S2885 and 4 S2875S boards.
> > Both boards do not run stable with more than 2GB RAM usable.
> > 4GB will only be recognized if the MTRR setting is set to Continuous and 
> > the Adjust Memory setting is set to Auto.
> > If the bios is configured this way and two 1gb ram modules are installed 
> > for each CPU on the 2885, the machine will not even load and unpack a 
> > SLES 9 kernel. Memtest sees 0-2GB mem usable and 4-6GB unusable (complains 
> > about each memory address).
> > If all four modules are installed for CPU0, then memtest seems to work 
> > without problems (0-2GB, 4-6GB), but SLES9 will crash during boot-up.
> > If all four modules are installed for CPU1, then memtest seems to work 
> > without problems too. SLES 9 will run a few minutes before a crash.
> > I will try to install SLES 8 (32bit) on the new boxes to see if it runs 
> > stable. If yes, there is something broken in the 2.6 kernels for amd64; if 
> > not, the pre-production BIOS is better than the final ones.
> 
> It all sounds very much like a BIOS problem. I doubt 2.4 will
> run stable on this setup - if memtest86 doesn't like the memory, Linux 
> won't like it either. All I can recommend is to talk to Tyan or
> live with the lost memory. 
> 
> -Andi
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


-- Andreas Klein
   asklein@cip.physik.uni-wuerzburg.de
   root / webmaster @cip.physik.uni-wuerzburg.de
   root / webmaster @www.physik.uni-wuerzburg.de
_____________________________________
|                                   | 
|   Long live our gracious AMIGA!   |
|___________________________________|


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: solution Re: lost memory on a 4GB amd64
       [not found]                                                     ` <Pine.LNX.4.58.0410271704050.3903@pluto.physik.uni-wuerzburg.de>
@ 2004-10-27 15:39                                                       ` Sergei Haller
  2004-10-27 16:05                                                         ` Andreas Klein
  0 siblings, 1 reply; 40+ messages in thread
From: Sergei Haller @ 2004-10-27 15:39 UTC (permalink / raw)
  To: Andreas Klein; +Cc: Andi Kleen, Andrew Walrond, Rafael J. Wysocki, linux-kernel

On Wed, 27 Oct 2004, Andreas Klein (AK) wrote:

AK> Hello,
AK> 
AK> the problem has been verified by Tyan. It is definitely a hardware issue.
AK> The Tyan and AMD engineers are developing a solution (BIOS) for the 
AK> problem.
AK> They will fix the problem for the S2885, as well as for the S2875 boards.

Are you sure this is the same problem that I have? You discovered
problems with memtest86:

> Memtest sees 0-2GB mem usable and 4-6GB unusable (complains 
> about each memory address).

I didn't:

> memtest86 is happy with the memory.

The next difference: 
You have the S2885 (thunder K8W) and S2875S (tiger K8W single processor) 
boards and I have a S2875 (tiger K8W double processor)


I summarize (again) my problems:

Independently of the memory settings in the BIOS:
 - non-SMP Kernel is stable
 - memtest86 does not report any errors

If the memory (4GB) is set up in one block (0-4GB) in the BIOS, then
 - SMP Kernel is stable 

If the memory (4GB) is set up in two blocks (eg. 0-3GB, 4-5GB) in the
BIOS, then
 - SMP Kernel is stable _if_and_only_if_ NUMA is _disabled_.


BTW.: I won't be able to flash a new BIOS to our machine, since it is in 
production use and runs _rock_stable_ with _full_memory_ after we
_disabled_ NUMA support in the kernel. (see the two small programs posted 
by me and by Andi Kleen)

What I COULD do is run some tests if needed (e.g. to check if the 
Board/BIOS is lying about its capabilities).
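
One such check could be to add up the ranges the BIOS marks as usable in the e820 map printed at boot and compare the total with the installed memory. The following is only a sketch under that assumption: `e820_usable_bytes` is a hypothetical helper, and the line format is taken from the `BIOS-e820:` lines quoted at the start of this thread.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one boot-log line of the form
 *   " BIOS-e820: 0000000000100000 - 00000000d3ff0000 (usable)"
 * and return the size of the range in bytes if it is marked "usable".
 * Returns 0 for any other range type or for a line that does not parse.
 */
unsigned long long e820_usable_bytes(const char *line)
{
	unsigned long long start, end;
	char type[32];

	if (sscanf(line, " BIOS-e820: %llx - %llx (%31[^)])",
	           &start, &end, type) != 3)
		return 0;
	if (strcmp(type, "usable") != 0 || end < start)
		return 0;
	return end - start;
}
```

Feeding every `BIOS-e820:` line of a dmesg dump through this helper and summing the results gives the amount of RAM the BIOS claims to hand over; a total far below the installed 4GB would show how much is hidden behind the PCI/AGP hole in a given BIOS setting.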


        Sergei
-- 
--------------------------------------------------------------------  -?)
         eMail:       Sergei.Haller@math.uni-giessen.de               /\\
-------------------------------------------------------------------- _\_V
Be careful of reading health books, you might die of a misprint.
                -- Mark Twain

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: solution Re: lost memory on a 4GB amd64
  2004-10-27 15:39                                                       ` solution " Sergei Haller
@ 2004-10-27 16:05                                                         ` Andreas Klein
  2004-10-27 16:17                                                           ` Sergei Haller
  2004-10-27 16:44                                                           ` linux-os
  0 siblings, 2 replies; 40+ messages in thread
From: Andreas Klein @ 2004-10-27 16:05 UTC (permalink / raw)
  To: Sergei Haller; +Cc: Andi Kleen, Andrew Walrond, Rafael J. Wysocki, linux-kernel


Hello,

On Wed, 27 Oct 2004, Sergei Haller wrote:

> On Wed, 27 Oct 2004, Andreas Klein (AK) wrote:
> 
> Are you sure this is the same problem, that I have? You discovered
> Problems with memtest86:
> 
> > Memtest sees 0-2GB mem usable and 4-6GB unusable (complains 
> > about each memory address).
> 
> I didn't:
> 
> > memtest86 is happy with the memory.

Memtest is happy with my memory too, if all 4 modules are installed in the 
slots belonging to CPU1. If I install 2 modules for each CPU, memtest86 is 
not happy anymore.

> 
> The next difference: 
> You have the S2885 (thunder K8W) and S2875S (tiger K8W single processor) 
> boards and I have a S2875 (tiger K8W double processor)

The boards are nearly identical (on-board lan is different, and your 
memory-slots are connected to one CPU).
If all memory modules are installed for one CPU, I have your problems. 
Additionally there are some other problems that only occur when the 
modules are installed one pair for each CPU.

Since I have a pre-production board and BIOS which is running solid as a 
rock, regardless of whether an SMP or non-SMP kernel is installed, I hope
that they will fix all problems.

> I summarize (again) my problems:
> 
> Independently of the memory settings in the BIOS:
>  - non-SMP Kernel is stable
>  - memtest86 does not report any errors
> 
> If the memory (4GB) is set up in one block (0-4GB) in the BIOS, then
>  - SMP Kernel is stable 
> 
> If the memory (4GB) is set up in two blocks (eg. 0-3GB, 4-5GB) in the
> BIOS, then
>  - SMP Kernel is stable _if_and_only_if_ NUMA is _disabled_.
> 
> 
> BTW.: I won't be able to flash a new BIOS to our machine, since it is in 
> production use and runs _rock_stable_ with _full_memory_ after we
> _disabled_ NUMA support in the kernel. (see the two small programs posted 
> by me and by Andi Kleen)
> 
> What I COULD do is running some tests if needed. (e.g. to check if the 
> Board/BIOS is lying about its capabilities)
> 
> 
>         Sergei
> -- 
> --------------------------------------------------------------------  -?)
>          eMail:       Sergei.Haller@math.uni-giessen.de               /\\
> -------------------------------------------------------------------- _\_V
> Be careful of reading health books, you might die of a misprint.
>                 -- Mark Twain


-- Andreas Klein
   asklein@cip.physik.uni-wuerzburg.de
   root / webmaster @cip.physik.uni-wuerzburg.de
   root / webmaster @www.physik.uni-wuerzburg.de
_____________________________________
|                                   | 
|   Long live our gracious AMIGA!   |
|___________________________________|


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: solution Re: lost memory on a 4GB amd64
  2004-10-27 16:05                                                         ` Andreas Klein
@ 2004-10-27 16:17                                                           ` Sergei Haller
  2004-10-28 13:32                                                             ` Andreas Klein
  2004-10-27 16:44                                                           ` linux-os
  1 sibling, 1 reply; 40+ messages in thread
From: Sergei Haller @ 2004-10-27 16:17 UTC (permalink / raw)
  To: Andreas Klein; +Cc: Andi Kleen, Andrew Walrond, Rafael J. Wysocki, linux-kernel

On Wed, 27 Oct 2004, Andreas Klein (AK) wrote:

AK> > The next difference: 
AK> > You have the S2885 (thunder K8W) and S2875S (tiger K8W single processor) 
AK> > boards and I have a S2875 (tiger K8W double processor)
AK> 
AK> The boards are nearly identical (on-board lan is different, and your 
AK> memory-slots are connected to one CPU).

I don't think that LAN is of any importance for our problems. 

But I was told (see previous messages) that the fact that all memory slots
are connected to one CPU makes a non-NUMA board of it (S2875).

AK> If all memory modules are installed for one CPU, I have your problems. 

I see.

AK> Additionally there are some other problems that only occur when the 
AK> modules are installed one pair for each CPU.

IIRC [I might be wrong], Andrew is running this board (S2885) with exactly
this memory configuration without problems.



c ya
        Sergei
-- 
--------------------------------------------------------------------  -?)
         eMail:       Sergei.Haller@math.uni-giessen.de               /\\
-------------------------------------------------------------------- _\_V
Be careful of reading health books, you might die of a misprint.
                -- Mark Twain

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: solution Re: lost memory on a 4GB amd64
  2004-10-27 16:05                                                         ` Andreas Klein
  2004-10-27 16:17                                                           ` Sergei Haller
@ 2004-10-27 16:44                                                           ` linux-os
  2004-10-27 17:47                                                             ` Andrew Walrond
  1 sibling, 1 reply; 40+ messages in thread
From: linux-os @ 2004-10-27 16:44 UTC (permalink / raw)
  To: Andreas Klein
  Cc: Sergei Haller, Andi Kleen, Andrew Walrond, Rafael J. Wysocki,
	linux-kernel

On Wed, 27 Oct 2004, Andreas Klein wrote:

>
> Hello,
>
> On Wed, 27 Oct 2004, Sergei Haller wrote:
>
>> On Wed, 27 Oct 2004, Andreas Klein (AK) wrote:
>>
>> Are you sure this is the same problem, that I have? You discovered
>> Problems with memtest86:
>>
>>> Memtest sees 0-2GB mem usable and 4-6GB unusable (complains
>>> about each memory address).
>>
>> I didn't:
>>
>>> memtest86 is happy with the memory.
>
> Memtest is happy with my memory too, if all 4 modules are installed in the
> slots belonging to CPU1. If I install 2 modules for each CPU, memtest86 is
> not happy anymore.
>

Could you please explain how memory is connected to only one
CPU? I don't think this is possible.

Is this board for some new multiple-CPU specification? It can't
work for SMP (symmetrical multiprocessor specification) unless
both CPUs can access the same RAM.

[SNIPPED...]


Cheers,
Dick Johnson
Penguin : Linux version 2.6.8 on an i686 machine (5537.79 BogoMips).
  Notice : All mail here is now cached and reviewed by John Ashcroft.
                  98.36% of all statistics are fiction.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: solution Re: lost memory on a 4GB amd64
  2004-10-27 16:44                                                           ` linux-os
@ 2004-10-27 17:47                                                             ` Andrew Walrond
  0 siblings, 0 replies; 40+ messages in thread
From: Andrew Walrond @ 2004-10-27 17:47 UTC (permalink / raw)
  To: linux-os
  Cc: Andreas Klein, Sergei Haller, Andi Kleen, Rafael J. Wysocki,
	linux-kernel

On Wednesday 27 Oct 2004 17:44, linux-os wrote:
>
> Could you please explain how memory is connected to only one
> CPU? I don't think this is possible.

The DIMMs are connected to cpu1. cpu2 accesses the RAM over the 
HyperTransport bus.

See page 9 of ftp://ftp.tyan.com/manuals/m_s2875_102.pdf


>
> Is this board for some new multiple-CPU specification? It can't
> work for SMP (symmetrical multiprocessor specification) unless
> both CPUs can access the same RAM.

They can. It's a sort of castrated NUMA board :)
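
To see how a NUMA-enabled kernel views such a layout, the per-node memory totals can be compared. A minimal sketch, assuming the kernel exposes lines of the form `Node 0 MemTotal: 2097152 kB` (the format used by /sys/devices/system/node/nodeN/meminfo on kernels with NUMA sysfs support; whether the kernels discussed in this thread provide that file is an assumption). `node_memtotal_kb` is a hypothetical helper:

```c
#include <stdio.h>

/*
 * Parse a line like "Node 0 MemTotal:       2097152 kB".
 * On success returns the MemTotal value in kB and stores the node id
 * in *node; returns -1 and leaves *node untouched for any other line.
 */
long node_memtotal_kb(const char *line, int *node)
{
	int id;
	long kb;

	if (sscanf(line, "Node %d MemTotal: %ld kB", &id, &kb) != 2)
		return -1;
	*node = id;
	return kb;
}
```

On a board where all DIMMs hang off one CPU, one would expect a single node to report all of the memory and the other node to report none, but that expectation is not verified here.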

Andrew Walrond

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: solution Re: lost memory on a 4GB amd64
  2004-10-27 16:17                                                           ` Sergei Haller
@ 2004-10-28 13:32                                                             ` Andreas Klein
  0 siblings, 0 replies; 40+ messages in thread
From: Andreas Klein @ 2004-10-28 13:32 UTC (permalink / raw)
  To: Sergei Haller; +Cc: Andi Kleen, Andrew Walrond, Rafael J. Wysocki, linux-kernel


Hello,


On Wed, 27 Oct 2004, Sergei Haller wrote:

> On Wed, 27 Oct 2004, Andreas Klein (AK) wrote:
> 
> AK> Additionally there are some other problems that only occur when the 
> AK> modules are installed one pair for each CPU.
> 
> IIRC [I might be wrong], Andrew is running this board (S2885) with exactly
> this memory configuration without problems.

Yes, I have read that. I also have one board running without problems and 
45 boards which have the problem. The running one was bought over a 
year ago. The 45 boards which are not running were bought a few weeks 
ago.
So I think either they have made a bad change to the board design since the 
prototype board, which runs perfectly, or there is a bug in the CPUs 
(maybe memory controller or HyperTransport). The running board has CPUs 
with stepping 1 installed; the non-running ones have CPU stepping 10.

Bye,

-- Andreas Klein
   asklein@cip.physik.uni-wuerzburg.de
   root / webmaster @cip.physik.uni-wuerzburg.de
   root / webmaster @www.physik.uni-wuerzburg.de
_____________________________________
|                                   | 
|   Long live our gracious AMIGA!   |
|___________________________________|


^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2004-10-28 13:33 UTC | newest]

Thread overview: 40+ messages
2004-09-16  4:48 lost memory on a 4GB amd64 Sergei Haller
2004-09-16 13:30 ` Andrew Walrond
2004-09-16 13:48 ` Andrew Walrond
2004-09-16 14:09   ` Sergei Haller
2004-09-16 14:28     ` Andrew Walrond
2004-09-16 14:56       ` Sergei Haller
2004-09-16 15:19         ` Andrew Walrond
2004-09-16 15:52           ` Sergei Haller
2004-09-18 14:18             ` Sergei Haller
2004-09-19 20:01               ` Jon Masters
2004-09-19 21:47                 ` Sergei Haller
2004-09-19 22:00                   ` Jon Masters
2004-09-19 22:19                     ` Sergei Haller
2004-09-20 10:26                       ` Sergei Haller
2004-09-24  4:38                         ` Sergei Haller
2004-09-24  4:38                         ` Sergei Haller
2004-09-24  8:15                         ` Andrew Walrond
2004-09-24  8:23                           ` Sergei Haller
2004-09-24  8:31                             ` Andrew Walrond
2004-09-24  8:57                               ` Sergei Haller
2004-09-24  9:27                                 ` Rafael J. Wysocki
2004-09-24  9:41                                   ` Andrew Walrond
2004-09-24 11:42                                     ` Sergei Haller
2004-09-24 12:15                                       ` Andrew Walrond
2004-10-22  8:59                                         ` Sergei Haller
2004-10-22  9:26                                           ` Andrew Walrond
2004-10-22 18:24                                             ` Andi Kleen
2004-10-23 10:02                                               ` Andreas Klein
2004-10-23 16:43                                                 ` Andi Kleen
2004-10-26 11:25                                                   ` Andreas Klein
2004-10-23 10:26                                               ` Sergei Haller
2004-10-23 16:49                                                 ` Andi Kleen
2004-10-24  9:53                                                   ` Sergei Haller
     [not found]                                                     ` <Pine.LNX.4.58.0410271704050.3903@pluto.physik.uni-wuerzburg.de>
2004-10-27 15:39                                                       ` solution " Sergei Haller
2004-10-27 16:05                                                         ` Andreas Klein
2004-10-27 16:17                                                           ` Sergei Haller
2004-10-28 13:32                                                             ` Andreas Klein
2004-10-27 16:44                                                           ` linux-os
2004-10-27 17:47                                                             ` Andrew Walrond
2004-09-24 11:50                                   ` Sergei Haller
