linux-kernel.vger.kernel.org archive mirror
* Memory issues with Opteron 6220
@ 2012-02-08 14:37 Anders Ossowicki
  2012-02-09  8:33 ` Ingo Molnar
       [not found] ` <20120208205628.GA18909@alberich.amd.com>
  0 siblings, 2 replies; 15+ messages in thread
From: Anders Ossowicki @ 2012-02-08 14:37 UTC (permalink / raw)
  To: linux-kernel; +Cc: jk

Hey,

We're seeing unexpected slowdowns and other memory issues with a new system.
Enough to render it unusable. For example:

Error: open3: fork failed: Cannot allocate memory

at times where there's no real memory pressure:
                   total       used       free     shared    buffers     cached
      Mem:     132270720  131942388     328332          0     299768  103334420
      -/+ buffers/cache:   28308200  103962520
      Swap:      7811068      13760    7797308

The simplest test we've been able to trigger the slowdowns with is executing
'dpkg -l perl'. On our other systems, this takes a fraction of a second, at
least with a hot cache. Here it takes somewhere between two and four seconds
even when there's no load on the machine. Several other things, including our own
software, are similarly slowed down by an order of magnitude or more.


The system is a Dell Poweredge R715, with two eight-core Opteron 6220
processors and 128G of memory. We have several similar systems, such as the one
this should replace: R715, 2x8 core Opteron 6140, 128G memory, and they do not
exhibit any similar symptoms.

We have tried with 2.6.37, 2.6.38, 3.2.5 and 3.3-rc1 with no luck. The
microcode updates from AMD have not helped either.


stracing dpkg -l perl yields
$ time strace -cf dpkg -l perl >/dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 95.91    0.017821        1782        10           munmap
  3.40    0.000632           1      1181           read
  0.35    0.000065           1        77        37 open
[..]
  0.00    0.000000           0         2           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.018580                  2197        49 total

real    0m4.005s
user    0m3.250s
sys     0m0.720s

It might just be a red herring though, since it doesn't account for the real
time anyway. On a functioning system the output looks like:
$ time strace -cf dpkg -l perl >/dev/null
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.000123           1       117           read
  0.00    0.000000           0       160           write
[..]
  0.00    0.000000           0         2           arch_prctl
------ ----------- ----------- --------- --------- ----------------
100.00    0.000123                   588        47 total

real    0m0.276s
user    0m0.160s
sys     0m0.090s


The two most obvious differences between a system that works and one that does
not are the newer CPU and newer memory. The older machines have Samsung
M393B1K70CHD-YH9 chips (8G DDR3 1333MHz ECC REG) and the new one has Samsung
M393B2G70BH0-CK0 chips (16G DDR3 1600MHz ECC REG).

/proc/cpuinfo:
processor   : 15
vendor_id   : AuthenticAMD
cpu family  : 21
model       : 1
model name  : AMD Opteron(TM) Processor 6220
stepping    : 2
microcode   : 0x6000613
cpu MHz     : 3000.048
cache size  : 2048 KB
physical id : 1
siblings    : 8
core id     : 3
cpu cores   : 4
apicid      : 39
initial apicid  : 39
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm
constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf pni
pclmulqdq monitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm
cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs
xop skinit wdt lwp fma4 nodeid_msr topoext perfctr_core arat cpb npt lbrv
svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter
pfthreshold
bogomips    : 6000.40
TLB size    : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate [9]

DMI info:
Memory Device
    Array Handle: 0x1000
    Error Information Handle: Not Provided
    Total Width: 72 bits
    Data Width: 64 bits
    Size: 16384 MB
    Form Factor: DIMM
    Set: 6
    Locator: DIMM_B4 
    Bank Locator: Not Specified
    Type: <OUT OF SPEC>
    Type Detail: Synchronous
    Speed: 1600 MHz (0.6 ns)
    Manufacturer: 80CE80B380CE
    Part Number: M393B2G70BH0-CK0

If it all seems a bit vague, it's because we're at our wits' end with how to
debug this issue. Consistent slowdowns and occasional failures to allocate
memory for no apparent reason are what we're seeing. Any help or suggestions
are very welcome.

dmesg is available at http://dev.exherbo.org/~arkanoid/atlas-dmesg-3.2.5.txt
-- 
Anders Ossowicki



* Re: Memory issues with Opteron 6220
  2012-02-08 14:37 Memory issues with Opteron 6220 Anders Ossowicki
@ 2012-02-09  8:33 ` Ingo Molnar
  2012-02-09  9:08   ` Eric Dumazet
  2012-02-09 21:07   ` Jesper Krogh
       [not found] ` <20120208205628.GA18909@alberich.amd.com>
  1 sibling, 2 replies; 15+ messages in thread
From: Ingo Molnar @ 2012-02-09  8:33 UTC (permalink / raw)
  To: linux-kernel, jk
  Cc: Andrew Morton, Yinghai Lu, Thomas Gleixner, H. Peter Anvin, Tejun Heo


* Anders Ossowicki <aowi@novozymes.com> wrote:

> Hey,
> 
> We're seeing unexpected slowdowns and other memory issues with a new system.
> Enough to render it unusable. For example:
> 
> Error: open3: fork failed: Cannot allocate memory
> 
> at times where there's no real memory pressure:
>                    total       used       free     shared    buffers     cached
>       Mem:     132270720  131942388     328332          0     299768  103334420
>       -/+ buffers/cache:   28308200  103962520
>       Swap:      7811068      13760    7797308
>
> [...]

> The system is a Dell Poweredge R715, with two eight-core 
> Opteron 6220 processors and 128G of memory. We have several 
> similar systems, such as the one this should replace: R715, 
> 2x8 core Opteron 6140, 128G memory, and they do not exhibit 
> any similar symptoms.

130 MB of RAM visible to Linux isn't the expected bootup default 
indeed. Around 130 *GB* would be expected ...

> We have tried with 2.6.37, 2.6.38, 3.2.5 and 3.3-rc1 with no luck. The
> microcode updates from AMD have not helped either.

Nasty.

No smoking gun in the dmesg:

> dmesg is available at http://dev.exherbo.org/~arkanoid/atlas-dmesg-3.2.5.txt

[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  BIOS-e820: 0000000000000000 - 00000000000a0000 (usable)
[    0.000000]  BIOS-e820: 0000000000100000 - 00000000df679000 (usable)
[    0.000000]  BIOS-e820: 00000000df679000 - 00000000df68f000 (reserved)
[    0.000000]  BIOS-e820: 00000000df68f000 - 00000000df6ce000 (ACPI data)
[    0.000000]  BIOS-e820: 00000000df6ce000 - 00000000e0000000 (reserved)
[    0.000000]  BIOS-e820: 00000000f0000000 - 00000000f4000000 (reserved)
[    0.000000]  BIOS-e820: 00000000fe000000 - 00000000fec90000 (reserved)
[    0.000000]  BIOS-e820: 00000000fec94000 - 00000000fecd0000 (reserved)
[    0.000000]  BIOS-e820: 00000000fecd4000 - 0000000100000000 (reserved)
[    0.000000]  BIOS-e820: 0000000100000000 - 000000201f000000 (usable)

that 0x201f000000 is slightly above 128 GB.

The lowlevel x86 RAM init code seems to be fine:

[    0.000000] last_pfn = 0x201f000 max_arch_pfn = 0x400000000

that 0x201f000 correctly points to slightly above 128 GB 
physical.

[    0.000000] init_memory_mapping: 0000000100000000-000000201f000000

that too shows that the lowlevel x86 platform memory init code 
still sees 128 GB.

it's spread out amongst 4 nodes, 32 GB each:

[    0.000000] Initmem setup node 0 0000000000000000-0000000820000000
[    0.000000]   NODE_DATA [000000081fffb000 - 000000081fffffff]
[    0.000000] Initmem setup node 1 0000000820000000-0000001020000000
[    0.000000]   NODE_DATA [000000101fffb000 - 000000101fffffff]
[    0.000000] Initmem setup node 2 0000001020000000-0000001820000000
[    0.000000]   NODE_DATA [000000181fffb000 - 000000181fffffff]
[    0.000000] Initmem setup node 3 0000001820000000-000000201f000000
[    0.000000]   NODE_DATA [000000201effa000 - 000000201effefff]

the NORMAL zone gets set up properly:

[    0.000000]   Normal   0x00100000 -> 0x0201f000

and each node zone got 32 GB of RAM:

[    0.000000]   Normal zone: 7354368 pages, LIFO batch:31
[    0.000000]   Normal zone: 8257536 pages, LIFO batch:31
[    0.000000]   Normal zone: 8257536 pages, LIFO batch:31
[    0.000000]   Normal zone: 8253504 pages, LIFO batch:31


and it's all visible in the end to the MM:

[    0.000000] Built 4 zonelists in Zone order, mobility grouping on.  Total pages: 33021506

that's still 125 GB. (cgroup_page appears to pick up 1GB of RAM 
btw.)

So where has the rest of the RAM gone? What does /proc/meminfo 
look like?

Thanks,

	Ingo


* Re: Memory issues with Opteron 6220
  2012-02-09  8:33 ` Ingo Molnar
@ 2012-02-09  9:08   ` Eric Dumazet
  2012-02-09 13:23     ` Ingo Molnar
  2012-02-09 21:07   ` Jesper Krogh
  1 sibling, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2012-02-09  9:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, jk, Andrew Morton, Yinghai Lu, Thomas Gleixner,
	H. Peter Anvin, Tejun Heo

On Thursday 09 February 2012 at 09:33 +0100, Ingo Molnar wrote:
> * Anders Ossowicki <aowi@novozymes.com> wrote:
> 
> > Hey,
> > 
> > We're seeing unexpected slowdowns and other memory issues with a new system.
> > Enough to render it unusable. For example:
> > 
> > Error: open3: fork failed: Cannot allocate memory
> > 
> > at times where there's no real memory pressure:
> >                    total       used       free     shared    buffers     cached
> >       Mem:     132270720  131942388     328332          0     299768  103334420
> >       -/+ buffers/cache:   28308200  103962520
> >       Swap:      7811068      13760    7797308
> >
> > [...]
> 
> > The system is a Dell Poweredge R715, with two eight-core 
> > Opteron 6220 processors and 128G of memory. We have several 
> > similar systems, such as the one this should replace: R715, 
> > 2x8 core Opteron 6140, 128G memory, and they do not exhibit 
> > any similar symptoms.
> 
> 130 MB of RAM visible to Linux isn't the expected bootup default 
> indeed. Around 130 *GB* would be expected ...

Not sure what you mean, I see 128GB in the "free" output, as expected.

I don't understand why there are 4 nodes, given "The system is a Dell
Poweredge R715, with two eight-core Opteron 6220".

Or is each 6220 split across two nodes?




* Re: Memory issues with Opteron 6220
       [not found] ` <20120208205628.GA18909@alberich.amd.com>
@ 2012-02-09 12:43   ` Anders Ossowicki
  2012-02-09 13:28     ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Anders Ossowicki @ 2012-02-09 12:43 UTC (permalink / raw)
  To: Andreas Herrmann
  Cc: jk, linux-kernel, Andrew Morton, Yinghai Lu, Thomas Gleixner,
	H. Peter Anvin, Tejun Heo, Eric Dumazet, Ingo Molnar

On Wed, Feb 08, 2012 at 09:56:28PM +0100, Andreas Herrmann wrote:
> I assume you have the latest BIOS on your system?
Yep, 2.3.0 is the newest available on Dell's website for this machine.
 
> After glancing through attached dmesg I wonder whether you have "Cool
> and quiet" disabled in BIOS, see
> 
> [    8.936505] [Firmware Bug]: powernow-k8: No compatible ACPI _PSS objects found.
> [    8.936514] [Firmware Bug]: powernow-k8: Try again with latest BIOS.
> 
> Is this on purpose?
I went digging through the power management options of the bios and found that
CPU performance was set to System DBPM[1] by default. After switching it to OS
DBPM, powernow-k8 seemed a lot happier:

[    5.272938] powernow-k8: Found 4 AMD Opteron(TM) Processor 6220
(16 cpu cores) (version 2.20.00)
[    5.273111] powernow-k8: Core Performance Boosting: on.
[    5.273256] powernow-k8:    0 : pstate 0 (3000 MHz)
[..]
[    5.274601] powernow-k8:    4 : pstate 4 (1400 MHz)

full dmesg at http://dev.exherbo.org/~arkanoid/atlas-dmesg-3.2.5-20120209.txt

From cursory investigation, it appears we've gotten the expected performance
back, when all CPUs are running at max frequency. So far so good.

I am curious though... a few observations:
1) With System DBPM, /proc/cpuinfo said 3GHz, the performance of the machine
   was crappy.
2) With OS DBPM, /proc/cpuinfo said 1.4GHz, the performance of the machine was
   equally crappy, as expected.
3) With OS DBPM, and the performance cpufreq governor, /proc/cpuinfo said 3GHz,
   the performance of the machine was good. Again as expected.

The conclusion I draw from this is that something (the BIOS?) is lying to the
OS. Bad Dell!

The manual is sparse on explanations of this System DBPM. It basically says that
it is a Dell proprietary implementation in BIOS, that provides improved
performance/watt over the OS implementation of AMD PowerNow!.

I apologise if that made you spit out a mouthful of coffee but that really is
what it says. It doesn't seem to be doing its job very well.

This leaves the issue of randomly failing memory allocations. I can't see why
that would be related to the power management woes, but I am by no means an
expert.  I'll see if we can still trigger the problem, but if someone can see a
causal link, please enlighten me.

 
> To rule out memory from being the culprit ...
> Have you tested the newer CPU system with the old memory?
Nope.

> Have you observed any MCEs (e.g. DRAM ECC errors) on the failing system)?
> EDAC should report them in dmesg if this is the case.
Nothing in dmesg or the iDRAC's service event log (where ECC errors usually get
logged as well).


[1] Demand-based power management, apparently.

-- 
Anders Ossowicki



* Re: Memory issues with Opteron 6220
  2012-02-09  9:08   ` Eric Dumazet
@ 2012-02-09 13:23     ` Ingo Molnar
  0 siblings, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2012-02-09 13:23 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: linux-kernel, jk, Andrew Morton, Yinghai Lu, Thomas Gleixner,
	H. Peter Anvin, Tejun Heo


* Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Thursday 09 February 2012 at 09:33 +0100, Ingo Molnar wrote:
> > * Anders Ossowicki <aowi@novozymes.com> wrote:
> > 
> > > Hey,
> > > 
> > > We're seeing unexpected slowdowns and other memory issues with a new system.
> > > Enough to render it unusable. For example:
> > > 
> > > Error: open3: fork failed: Cannot allocate memory
> > > 
> > > at times where there's no real memory pressure:
> > >                    total       used       free     shared    buffers     cached
> > >       Mem:     132270720  131942388     328332          0     299768  103334420
> > >       -/+ buffers/cache:   28308200  103962520
> > >       Swap:      7811068      13760    7797308
> > >
> > > [...]
> > 
> > > The system is a Dell Poweredge R715, with two eight-core 
> > > Opteron 6220 processors and 128G of memory. We have several 
> > > similar systems, such as the one this should replace: R715, 
> > > 2x8 core Opteron 6140, 128G memory, and they do not exhibit 
> > > any similar symptoms.
> > 
> > 130 MB of RAM visible to Linux isn't the expected bootup default 
> > indeed. Around 130 *GB* would be expected ...
> 
> Not sure what you mean, I see 128GB in the "free" output, as 
> expected.

Erm, yes. I plead temporary blindness!

So all RAM is visible properly. This error:

> > > Error: open3: fork failed: Cannot allocate memory

suggests allocation failure. How is that possible with so much 
RAM?

Thanks,

	Ingo


* Re: Memory issues with Opteron 6220
  2012-02-09 12:43   ` Anders Ossowicki
@ 2012-02-09 13:28     ` Ingo Molnar
  2012-02-09 13:49       ` Anders Ossowicki
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2012-02-09 13:28 UTC (permalink / raw)
  To: Andreas Herrmann, jk, linux-kernel, Andrew Morton, Yinghai Lu,
	Thomas Gleixner, H. Peter Anvin, Tejun Heo, Eric Dumazet
  Cc: Peter Zijlstra


* Anders Ossowicki <aowi@novozymes.com> wrote:

> I went digging through the power management options of the 
> bios and found that CPU performance was set to System DBPM[1] 
> by default. After switching it to OS DBPM, powernow-k8 seemed 
> a lot happier:

Your bootlog says:

[    0.330000] Performance Events: Broken BIOS detected, complain to your hardware vendor.
[    0.330000] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 1430076)

Do you get that message if DBPM is enabled?

If the message disappeared then I'd suggest to do what that 
kernel message suggests and ask the vendor to disable that BIOS 
option by default, it breaks stuff.

Thanks,

	Ingo


* Re: Memory issues with Opteron 6220
  2012-02-09 13:28     ` Ingo Molnar
@ 2012-02-09 13:49       ` Anders Ossowicki
  2012-02-09 16:11         ` Yinghai Lu
  0 siblings, 1 reply; 15+ messages in thread
From: Anders Ossowicki @ 2012-02-09 13:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andreas Herrmann, JK (Jesper Agerbo Krogh),
	linux-kernel, Andrew Morton, Yinghai Lu, Thomas Gleixner,
	H. Peter Anvin, Tejun Heo, Eric Dumazet, Peter Zijlstra

On Thu, Feb 09, 2012 at 02:28:25PM +0100, Ingo Molnar wrote:
> Your bootlog says:
> 
> [    0.330000] Performance Events: Broken BIOS detected, complain to your hardware vendor.
> [    0.330000] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 1430076)
> 
> Do you get that message if DBPM is enabled?

It's there with System DBPM, OS DBPM and with power management disabled (i.e.
set to maximum performance).
-- 
Anders Ossowicki



* Re: Memory issues with Opteron 6220
  2012-02-09 13:49       ` Anders Ossowicki
@ 2012-02-09 16:11         ` Yinghai Lu
  2012-02-09 17:51           ` Anders Ossowicki
  0 siblings, 1 reply; 15+ messages in thread
From: Yinghai Lu @ 2012-02-09 16:11 UTC (permalink / raw)
  To: aowi, Ingo Molnar, Andreas Herrmann, JK (Jesper Agerbo Krogh),
	linux-kernel, Andrew Morton, Yinghai Lu, Thomas Gleixner,
	H. Peter Anvin, Tejun Heo, Eric Dumazet, Peter Zijlstra

On Thu, Feb 9, 2012 at 5:49 AM, Anders Ossowicki <aowi@novozymes.com> wrote:
> On Thu, Feb 09, 2012 at 02:28:25PM +0100, Ingo Molnar wrote:
>> Your bootlog says:
>>
>> [    0.330000] Performance Events: Broken BIOS detected, complain to your hardware vendor.
>> [    0.330000] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR c0010200 is 1430076)
>>
>> Do you get that message if DBPM is enabled?
>
> It's there with System DBPM, OS DBPM and with power management disabled (i.e.
> set to maximum performance).

mtrr setting has some problem too.

[    3.098277] mtrr: your CPUs had inconsistent fixed MTRR settings
[    3.100001] mtrr: probably your BIOS does not setup all CPUs.
[    3.110000] mtrr: corrected configuration.

can you boot with "debug ignore_loglevel show_msr=16" ?

Yinghai


* Re: Memory issues with Opteron 6220
  2012-02-09 16:11         ` Yinghai Lu
@ 2012-02-09 17:51           ` Anders Ossowicki
  2012-02-09 18:56             ` Yinghai Lu
  0 siblings, 1 reply; 15+ messages in thread
From: Anders Ossowicki @ 2012-02-09 17:51 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Andreas Herrmann, JK (Jesper Agerbo Krogh),
	linux-kernel, Andrew Morton, Thomas Gleixner, H. Peter Anvin,
	Tejun Heo, Eric Dumazet, Peter Zijlstra

On Thu, Feb 09, 2012 at 05:11:04PM +0100, Yinghai Lu wrote:
> mtrr setting has some problem too.
> 
> [    3.098277] mtrr: your CPUs had inconsistent fixed MTRR settings
> [    3.100001] mtrr: probably your BIOS does not setup all CPUs.
> [    3.110000] mtrr: corrected configuration.
> 
> can you boot with "debug ignore_loglevel show_msr=16" ?

Yep, right here:
http://dev.exherbo.org/~arkanoid/atlas-dmesg-3.2.5-20120209-mtrr.txt

-- 
Anders



* Re: Memory issues with Opteron 6220
  2012-02-09 17:51           ` Anders Ossowicki
@ 2012-02-09 18:56             ` Yinghai Lu
  2012-02-11 13:50               ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Yinghai Lu @ 2012-02-09 18:56 UTC (permalink / raw)
  To: aowi, Yinghai Lu, Ingo Molnar, Andreas Herrmann,
	JK (Jesper Agerbo Krogh),
	linux-kernel, Andrew Morton, Thomas Gleixner, H. Peter Anvin,
	Tejun Heo, Eric Dumazet, Peter Zijlstra

On Thu, Feb 9, 2012 at 9:51 AM, Anders Ossowicki <aowi@novozymes.com> wrote:
> On Thu, Feb 09, 2012 at 05:11:04PM +0100, Yinghai Lu wrote:
>> mtrr setting has some problem too.
>>
>> [    3.098277] mtrr: your CPUs had inconsistent fixed MTRR settings
>> [    3.100001] mtrr: probably your BIOS does not setup all CPUs.
>> [    3.110000] mtrr: corrected configuration.
>>
>> can you boot with "debug ignore_loglevel show_msr=16" ?
>
> Yep, right here:
> http://dev.exherbo.org/~arkanoid/atlas-dmesg-3.2.5-20120209-mtrr.txt
>

Too bad, the print_cpu_info() call for APs got removed by some commit.

Now we cannot print the initial AP registers anymore.

Yinghai


* Re: Memory issues with Opteron 6220
  2012-02-09  8:33 ` Ingo Molnar
  2012-02-09  9:08   ` Eric Dumazet
@ 2012-02-09 21:07   ` Jesper Krogh
  2012-02-10 15:21     ` Jesper Krogh
  1 sibling, 1 reply; 15+ messages in thread
From: Jesper Krogh @ 2012-02-09 21:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, jk, Andrew Morton, Yinghai Lu, Thomas Gleixner,
	H. Peter Anvin, Tejun Heo, yinghai, herrmann.der.user

On 2012-02-09 09:33, Ingo Molnar wrote:
> * Anders Ossowicki<aowi@novozymes.com>  wrote:
>> Hey,
>>
>> We're seeing unexpected slowdowns and other memory issues with a new system.
>> Enough to render it unusable. For example:
>>
>> Error: open3: fork failed: Cannot allocate memory
>>
>> at times where there's no real memory pressure:
>>                     total       used       free     shared    buffers     cached
>>        Mem:     132270720  131942388     328332          0     299768  103334420
>>        -/+ buffers/cache:   28308200  103962520
>>        Swap:      7811068      13760    7797308
>>
>> [...]
Anders' co-worker here. The C code below (summary: fork -t processes that
repeatedly allocate and deallocate 2GB of memory) can exercise the bug
pretty frequently using -t 32 on this machine. On the other 128GB
machine it runs without issues.

It actually ended up toasting the machine:
jk@nysvin:~$ ./foo -t 32
-bash: fork: Cannot allocate memory
jk@nysvin:~$ w
-bash: fork: Cannot allocate memory
jk@nysvin:~$ top
-bash: fork: Cannot allocate memory
jk@nysvin:~$ ls
-bash: fork: Cannot allocate memory

I don't know what to conclude.

jk@nysvin:~$ ./foo -t 32
Upper bound: 1953 MB
malloc(1953) MB failed. iterations: 6
malloc(1953) MB failed. iterations: 2
malloc(1953) MB failed. iterations: 8

foo.c
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/wait.h>

void worker(void)
{
     long long i;
     char *p;
     int action;
     int mult = 500000;
     int size = mult * 4096;
     fprintf(stderr,"Upper bound: %ld MB\n",(long int) (size/1024/1024));
     for (i=0; ; i++) {
         action = i%2;
         switch(action) {
         case 0:
             p = malloc(size);
             if (!p){
                 fprintf(stderr,"malloc(%ld) MB failed. iterations: %lli\n", (long int)size/1024/1024,i);
                 exit(1);
             }
             break;
         case 1:
             free(p);
             break;
         }
     }
}

void usage(const char *cmd)
{
     fprintf(stderr,"Usage: %s [-t numthreads]\n", cmd);
     exit(1);
}

int main(int argc, char **argv)
{
     int c, i;
     int nproc = sysconf(_SC_NPROCESSORS_ONLN);


     while ((c = getopt(argc, argv, "t:")) != EOF) {
         switch (c) {
         case 't':
             nproc = strtol(optarg, 0, 0);
             break;
         default:
             usage(argv[0]);
         }
     }

     //printf("forking %d children\n", nproc);
     for (i=0; i < nproc; i++) {
         switch(fork()) {
         case -1:
             fprintf(stderr,"fork: %s\n", strerror(errno));
             exit(1);
         case 0: /* child */
             worker();
             exit(0);
         default: /* parent */
             /* nothing */
             break;
         }
     }

     for (i=0; i < nproc; i++) {
         int x, p;
         p = wait(&x);
     }

     return 0;
}

Can also be found here: http://shrek.krogh.cc/~jesper/foo.c

-- 
Jesper Krogh


* Re: Memory issues with Opteron 6220
  2012-02-09 21:07   ` Jesper Krogh
@ 2012-02-10 15:21     ` Jesper Krogh
  2012-02-11 13:48       ` Ingo Molnar
  0 siblings, 1 reply; 15+ messages in thread
From: Jesper Krogh @ 2012-02-10 15:21 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, jk, Andrew Morton, Yinghai Lu, Thomas Gleixner,
	H. Peter Anvin, Tejun Heo, herrmann.der.user

Long story short, this is a red herring.

The system we migrated the configuration from had
vm.overcommit_memory = 2, so the new one got that too
(commit limit: 50% of physical memory + swap).

That worked fine. We set it back in 2008 because the
heuristic mode was not doing the right thing. What
has happened over the years is that memory has grown
and the swap/memory ratio has shrunk, both due
to memory growth and swap becoming more and more irrelevant.

So the new system was set up with reduced swap, 8GB vs. 100GB,
which means that the algorithm used by overcommit_memory
ended up not allowing more than 64GB+8GB of memory to be
used (less than physical memory). The system we migrated from would
by this rule allow 64+100GB, which fit quite well.

I guess it took so long to realize because "overcommit"
isn't what springs to mind when you don't think you're even
close to the limit, combined with the misleading power-saving issue
that just confused the problem.

I will admit that we could have saved a significant amount of time/frustration
if dmesg had shown a message saying that the overcommit limits were being
hit and thus knocking off the processes.

Another change to suggest would be to not kill off processes due
to overcommit, at least not before actual memory size has been reached.

But, long story short, system misconfiguration..

-- 
Jesper


* Re: Memory issues with Opteron 6220
  2012-02-10 15:21     ` Jesper Krogh
@ 2012-02-11 13:48       ` Ingo Molnar
  2012-02-14  9:32         ` Anders Ossowicki
  0 siblings, 1 reply; 15+ messages in thread
From: Ingo Molnar @ 2012-02-11 13:48 UTC (permalink / raw)
  To: Jesper Krogh
  Cc: linux-kernel, jk, Andrew Morton, Yinghai Lu, Thomas Gleixner,
	H. Peter Anvin, Tejun Heo, herrmann.der.user


* Jesper Krogh <jesper@krogh.cc> wrote:

> [...]
> 
> I will admit that we could have saved a significant amount of 
> time/frustration if dmesg had shown a message saying that the 
> overcommit limits were being hit and thus knocking off the 
> processes.

It would be helpful if you could enhance the printk in such a 
way, and if you tested it with your workload that triggers it - 
and send us the resulting patch. Having more information in the 
dmesg is never bad.

Thanks,

	Ingo


* Re: Memory issues with Opteron 6220
  2012-02-09 18:56             ` Yinghai Lu
@ 2012-02-11 13:50               ` Ingo Molnar
  0 siblings, 0 replies; 15+ messages in thread
From: Ingo Molnar @ 2012-02-11 13:50 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: aowi, Andreas Herrmann, JK (Jesper Agerbo Krogh),
	linux-kernel, Andrew Morton, Thomas Gleixner, H. Peter Anvin,
	Tejun Heo, Eric Dumazet, Peter Zijlstra


* Yinghai Lu <yinghai@kernel.org> wrote:

> On Thu, Feb 9, 2012 at 9:51 AM, Anders Ossowicki <aowi@novozymes.com> wrote:
> > On Thu, Feb 09, 2012 at 05:11:04PM +0100, Yinghai Lu wrote:
> >> mtrr setting has some problem too.
> >>
> >> [    3.098277] mtrr: your CPUs had inconsistent fixed MTRR settings
> >> [    3.100001] mtrr: probably your BIOS does not setup all CPUs.
> >> [    3.110000] mtrr: corrected configuration.
> >>
> >> can you boot with "debug ignore_loglevel show_msr=16" ?
> >
> > Yep, right here:
> > http://dev.exherbo.org/~arkanoid/atlas-dmesg-3.2.5-20120209-mtrr.txt
> >
> 
> Too bad, the print_cpu_info() call for APs got removed by some 
> commit.
> 
> Now we cannot print the initial AP registers anymore.

Mind sending a patch that puts it back, so that it's printed via 
KERN_DEBUG or such, i.e. does not get emitted in the default 
log. Maybe even tie it to apic=debug or so.

Thanks,

	Ingo


* Re: Memory issues with Opteron 6220
  2012-02-11 13:48       ` Ingo Molnar
@ 2012-02-14  9:32         ` Anders Ossowicki
  0 siblings, 0 replies; 15+ messages in thread
From: Anders Ossowicki @ 2012-02-14  9:32 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel

On Sat, Feb 11, 2012 at 02:48:47PM +0100, Ingo Molnar wrote:
> It would be helpful if you could enhance the printk in such a 
> way, and if you tested it with your workload that triggers it - 
> and send us the resulting patch. Having more information in the 
> dmesg is never bad.
I'd be happy to do so, but I'm not sure where the appropriate place to add it
is.  I'm guessing a printk wrapped in a printk_ratelimit somewhere in mm but I
know next to nothing about the internals of the kernel.
-- 
Anders Ossowicki


