From: Andreas Pflug <pgadmin@pse-consulting.de>
To: Jan Beulich <JBeulich@suse.com>
Cc: 810964@bugs.debian.org, xen-devel@lists.xen.org
Subject: Re: [BUG] EDAC infomation partially missing
Date: Fri, 22 Jan 2016 10:09:04 +0100
Message-ID: <56A1F1B0.9030508@pse-consulting.de>
In-Reply-To: <56A1184902000078000C9B3D@prv-mh.provo.novell.com>

On 21.01.16 at 17:41, Jan Beulich wrote:
>>>> On 20.01.16 at 16:01, <andreas.pflug@web.de> wrote:
>> Initially reported to debian
>> (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=810964), redirected here:
>>
>> With AMD Opteron 6xxx processors, half of the memory controllers are
>> missing from /sys/devices/system/edac/mc
>> Checked with single 6120 (dual memory controller) and twin 6344 (2x dual
>> MC), other dual-module CPUs might be affected too.
>>
>> Booting plain Linux (3.2, 3.16, 4.1, 4.3), all memory controllers are
>> listed under /sys/devices/system/edac/mc as expected. Same happens, when
>> Xen 4.1 is used: all MCs present.
>>
>> Starting with Xen 4.4 (Debian Jessie), only mc1 (on the single CPU
>> machine) or mc2/mc3 (dual CPU machine) are present, although the full
>> system memory is accessible. Checked versions were 4.1.4 (Debian
>> Wheezy), 4.4.1 (Jessie) and 4.6.0 (Sid)
> As already indicated by Ian in that bug, you should supply us with
> full kernel and hypervisor logs for both the good and bad cases
> (ideally with the same kernel version used in both runs, so that we
> can exclude kernel behavior differences).
Here are some dmesg excerpts, all taken with Linux 4.1.3.

When booting with Xen 4.1.4:

AMD64 EDAC driver v3.4.0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 0).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC0: Giving out device to module amd64_edac controller F10h: DEV 0000:00:18.2 (INTERRUPT)
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 1).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC1: Giving out device to module amd64_edac controller F10h: DEV 0000:00:19.2 (INTERRUPT)

When booting with Xen 4.4.1:

AMD64 EDAC driver v3.4.0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
 Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
 (Note that use of the override may cause unknown side effects.)
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 1).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC1: Giving out device to module amd64_edac controller F10h: DEV 0000:00:19.2 (INTERRUPT)
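
The "NB MCE bank disabled" line in the excerpt above names bit 4 of MSR 0x0000017b. As a rough aid for checking what dom0 actually sees in that register, here is a minimal Python sketch. It assumes the Linux msr driver is loaded (so that /dev/cpu/N/msr exists) and that it runs as root. The function names are hypothetical, and which CPU number belongs to node 0 is not stated in the log, so the CPU index is left to the caller. Under Xen the value read this way is whatever the hypervisor exposes to dom0, which may differ from the hardware register.

```python
import os
import struct

MCG_CTL_MSR = 0x17B   # MSR number quoted in the log line
NB_BANK_BIT = 4       # bit index quoted in the log line


def nb_bank_enabled(msr_value: int) -> bool:
    """Return True if bit 4 of the given MSR 0x17b value is set."""
    return bool((msr_value >> NB_BANK_BIT) & 1)


def read_msr(cpu: int, msr: int) -> int:
    """Read one 64-bit MSR via /dev/cpu/<cpu>/msr (needs root and the msr driver)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr driver uses the file offset as the MSR number.
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)
```

Calling nb_bank_enabled(read_msr(cpu, MCG_CTL_MSR)) for a CPU on each node, once under plain Linux and once under Xen, would show whether the bit differs between the two boots.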

Apparently Xen 4.4 doesn't report the BIOS flag correctly. I added
ecc_enable_override=1 to amd64_edac_mod, and then I get:

EDAC MC: Ver: 3.0.0
AMD64 EDAC driver v3.4.0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
EDAC amd64: Forcing ECC on!
EDAC amd64: F10h detected (node 0).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC0: Giving out device to module amd64_edac controller F10h: DEV 0000:00:18.2 (INTERRUPT)
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 1).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC1: Giving out device to module amd64_edac controller F10h: DEV 0000:00:19.2 (INTERRUPT)
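
For comparing excerpts like the three above, a small Python sketch can pull out the two facts that differ between them: which MC indices were registered ("Giving out device") and which nodes reported the NB MCE bank as disabled. The function name and the returned dictionary keys are hypothetical, and the patterns are derived only from the lines quoted in this mail, so other kernel versions may word them differently.

```python
import re

# Patterns taken from the log lines quoted in this message.
MC_REGISTERED = re.compile(r"EDAC MC(\d+): Giving out device")
NB_DISABLED = re.compile(r"NB MCE bank disabled, set MSR \S+ on node (\d+)")


def summarize_edac(dmesg_text: str) -> dict:
    """Collect registered MC indices and nodes with a disabled NB MCE bank."""
    registered = set()
    disabled_nodes = set()
    for line in dmesg_text.splitlines():
        m = MC_REGISTERED.search(line)
        if m:
            registered.add(int(m.group(1)))
        m = NB_DISABLED.search(line)
        if m:
            disabled_nodes.add(int(m.group(1)))
    return {
        "registered_mcs": sorted(registered),
        "nb_bank_disabled_nodes": sorted(disabled_nodes),
    }
```

Feeding it the output of dmesg from each boot gives a short summary that is easier to diff than the full logs.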

This restored both MCs, so the BIOS flag seems to be the culprit.
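
To confirm the result from sysfs rather than from dmesg, the controllers can be listed under /sys/devices/system/edac/mc, the path given in the original report. The following is a minimal Python sketch; the function name is hypothetical, and it assumes each controller appears as a directory named mc followed by a number.

```python
import re
from pathlib import Path
from typing import List

# Path taken from the original bug report.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def list_memory_controllers(root: Path = EDAC_MC_ROOT) -> List[str]:
    """Return the mcN directory names under root, sorted by controller index."""
    if not root.is_dir():
        return []
    names = [
        p.name
        for p in root.iterdir()
        if p.is_dir() and re.fullmatch(r"mc\d+", p.name)
    ]
    return sorted(names, key=lambda name: int(name[2:]))
```

On the single-CPU machine described in the report, the expectation would be both mc0 and mc1 when all controllers are present, and only mc1 in the failing case.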

Regards,
Andreas
