* Re: [BUG] EDAC infomation partially missing
       [not found] <20170513223656.GA40303@scollay.m5p.com>
@ 2017-05-15  8:02 ` Jan Beulich
  2017-05-16  3:47   ` Elliott Mitchell
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2017-05-15  8:02 UTC (permalink / raw)
  To: Elliott Mitchell; +Cc: 810964, xen-devel

>>> On 14.05.17 at 00:36, <ehem+debian@m5p.com> wrote:
> I haven't yet done as much experimentation as Andreas Pflug has, but I
> can confirm I'm also running into this bug with Xen 4.4.1.
> 
> I've only tried Linux kernel 3.16.43, but as Dom0:
> 
> EDAC MC: Ver: 3.0.0
> AMD64 EDAC driver v3.4.0
> EDAC amd64: DRAM ECC enabled.
> EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
> EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not 
> load.
> AMD64 EDAC driver v3.4.0
> EDAC amd64: DRAM ECC enabled.
> EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
> EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not 
> load.

Afaict the driver as is simply can't work in a Xen Dom0; it needs
enabling (read: para-virtualizing). I'm actually glad to see it doesn't
load (the worse alternative would be for it to load and then do the
wrong thing or give you a false sense of safety of your data).

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


* Re: [BUG] EDAC infomation partially missing
  2017-05-15  8:02 ` [BUG] EDAC infomation partially missing Jan Beulich
@ 2017-05-16  3:47   ` Elliott Mitchell
  2017-05-16  9:54     ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: Elliott Mitchell @ 2017-05-16  3:47 UTC (permalink / raw)
  To: Jan Beulich; +Cc: 810964, xen-devel

On Mon, May 15, 2017 at 02:02:53AM -0600, Jan Beulich wrote:
> >>> On 14.05.17 at 00:36, <ehem+debian@m5p.com> wrote:
> > I haven't yet done as much experimentation as Andreas Pflug has, but I
> > can confirm I'm also running into this bug with Xen 4.4.1.
> > 
> > I've only tried Linux kernel 3.16.43, but as Dom0:
> > 
> > EDAC MC: Ver: 3.0.0
> > AMD64 EDAC driver v3.4.0
> > EDAC amd64: DRAM ECC enabled.
> > EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
> > EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not 
> > load.
> > AMD64 EDAC driver v3.4.0
> > EDAC amd64: DRAM ECC enabled.
> > EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
> > EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not 
> > load.
> 
> Afaict the driver as is simply can't work in a Xen Dom0; it needs
> enabling (read: para-virtualizing). I'm actually glad to see it doesn't
> load (the worse alternative would be for it to load and then do the
> wrong thing or give you a false sense of safety of your data).

I'm unsure of how to evaluate the situation.  Since ECC is enabled in the
BIOS, data should be safe whether or not the EDAC driver loads.  I
/suspect/ the EDAC driver failing to load merely means reporting of ECC
errors won't happen.  I suspect the only paravirtualization needed is to
map the physical address of the soft|hard errors to the VM whose memory
range was affected.  What this affects is which VM should panic in case
of hard errors.

Depending upon the environment there may or may not be cause to report
soft errors anywhere besides Dom0.  In most cases a soft error will at
worst trigger a desire to replace the memory module, but not trigger a
panic for the affected VM.  It is only once a hard error occurs that it
is urgent to warn the affected VM and cause a panic; in this case it
may also be desirable to first alert Dom0 anyway.

As such I'm inclined to think force-enabling ECC EDAC monitoring in Dom0
is the best approach for now.  As long as a hard error doesn't occur in
Dom0's address range, Dom0 is in the best position to deal with the
situation.  The worst case is a hard error occurring in Xen's address
range, since that will mean all VMs on the machine are likely to be
toast.
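
The handling proposed above can be sketched roughly as follows.  This is
only an illustration of the idea, not anything Xen or the EDAC driver
implements: the names ErrorKind and route_ecc_error and the owner labels
are hypothetical, and "soft" is assumed to mean a corrected error while
"hard" means an uncorrectable one.

```python
from enum import Enum


class ErrorKind(Enum):
    SOFT = "soft"  # assumed: error was corrected by ECC
    HARD = "hard"  # assumed: error could not be corrected


# Hypothetical owner label for memory used by the hypervisor itself.
XEN = "xen"


def route_ecc_error(kind, owner):
    """Return the actions to take for an ECC error in `owner`'s memory."""
    actions = ["log in dom0"]  # Dom0 is alerted first in every case
    if kind is ErrorKind.SOFT:
        # At worst this motivates replacing the memory module later.
        return actions
    if owner == XEN:
        # A hard error in Xen's range likely affects every VM.
        actions.append("panic host")
    else:
        actions.append(f"panic {owner}")
    return actions
```

For example, route_ecc_error(ErrorKind.HARD, "domU-3") yields a Dom0 log
entry followed by a panic of that one guest, while a soft error anywhere
yields only the log entry.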

I think this should be a fairly high priority for Xen since ECC memory is
a feature very common on systems running with a hypervisor.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         EHeM+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445




* Re: [BUG] EDAC infomation partially missing
  2017-05-16  3:47   ` Elliott Mitchell
@ 2017-05-16  9:54     ` Jan Beulich
  2017-05-16 10:08       ` Andrew Cooper
  2017-05-16 18:02       ` Elliott Mitchell
  0 siblings, 2 replies; 11+ messages in thread
From: Jan Beulich @ 2017-05-16  9:54 UTC (permalink / raw)
  To: Elliott Mitchell; +Cc: 810964, xen-devel

>>> On 16.05.17 at 05:47, <ehem+debian@m5p.com> wrote:
> On Mon, May 15, 2017 at 02:02:53AM -0600, Jan Beulich wrote:
>> >>> On 14.05.17 at 00:36, <ehem+debian@m5p.com> wrote:
>> > I haven't yet done as much experimentation as Andreas Pflug has, but I
>> > can confirm I'm also running into this bug with Xen 4.4.1.
>> > 
>> > I've only tried Linux kernel 3.16.43, but as Dom0:
>> > 
>> > EDAC MC: Ver: 3.0.0
>> > AMD64 EDAC driver v3.4.0
>> > EDAC amd64: DRAM ECC enabled.
>> > EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
>> > EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
>> > AMD64 EDAC driver v3.4.0
>> > EDAC amd64: DRAM ECC enabled.
>> > EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
>> > EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not 
>> > load.
>> 
>> Afaict the driver as is simply can't work in a Xen Dom0; it needs
>> enabling (read: para-virtualizing). I'm actually glad to see it doesn't
>> load (the worse alternative would be for it to load and then do the
>> wrong thing or give you a false sense of safety of your data).
> 
> I'm unsure of how to evaluate the situation.  Since ECC is enabled in the
> BIOS, data should be safe whether or not the EDAC driver loads.  I
> /suspect/ the EDAC driver failing to load merely means reporting of ECC
> errors won't happen.

"Merely" being relative here: The missing reports mean a false feeling
of safety, as they may be early indications of later double-bit errors.

>  I suspect the only paravirtualization needed is to
> map the physical address of the soft|hard errors to the VM whose memory
> range was affected.  What this affects is which VM should panic in case
> of hard errors.

Which in turn obviously requires hypervisor interaction. It's not really
clear to me whether perhaps the driver would better live in the
hypervisor in the first place for that reason.

And there's a second piece of paravirtualization needed: The driver
doesn't distinguish physical and machine address spaces, yet the
addresses reported by hardware are machine ones and hence would
generally need translation to physical ones in order to assign Dom0-
local meaning to them (or to determine that the address belongs to
another VM or the hypervisor).
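
The translation step described above can be illustrated with a toy
lookup.  This is a sketch under stated assumptions, not Xen's actual
interface: the M2P table, its contents, the owner labels and the
translate() helper are all invented for illustration, and 4 KiB pages
are assumed.

```python
from typing import Optional, Tuple

PAGE_SHIFT = 12  # assumed 4 KiB pages

# Hypothetical machine-to-physical table:
# machine frame number -> (owner, pseudo-physical frame number).
# The values are made up purely for illustration.
M2P = {
    0x1A2B3: ("dom0", 0x00010),
    0x1A2B4: ("domU-1", 0x00400),
}


def translate(machine_addr: int) -> Optional[Tuple[str, int]]:
    """Map a machine address to (owner, physical address).

    Returns None when the frame is not in the table, which in this toy
    model stands for memory that belongs to the hypervisor.
    """
    mfn = machine_addr >> PAGE_SHIFT
    offset = machine_addr & ((1 << PAGE_SHIFT) - 1)
    entry = M2P.get(mfn)
    if entry is None:
        return None
    owner, pfn = entry
    return owner, (pfn << PAGE_SHIFT) | offset
```

The point of the sketch is that the address in a hardware error report
only acquires a Dom0-local meaning after such a lookup, and that the
lookup data lives with the hypervisor.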

Jan



* Re: [BUG] EDAC infomation partially missing
  2017-05-16  9:54     ` Jan Beulich
@ 2017-05-16 10:08       ` Andrew Cooper
  2017-05-16 18:02       ` Elliott Mitchell
  1 sibling, 0 replies; 11+ messages in thread
From: Andrew Cooper @ 2017-05-16 10:08 UTC (permalink / raw)
  To: Jan Beulich, Elliott Mitchell; +Cc: 810964, xen-devel

On 16/05/17 10:54, Jan Beulich wrote:
>>>> On 16.05.17 at 05:47, <ehem+debian@m5p.com> wrote:
>> On Mon, May 15, 2017 at 02:02:53AM -0600, Jan Beulich wrote:
>>>>>> On 14.05.17 at 00:36, <ehem+debian@m5p.com> wrote:
>>>> I haven't yet done as much experimentation as Andreas Pflug has, but I
>>>> can confirm I'm also running into this bug with Xen 4.4.1.
>>>>
>>>> I've only tried Linux kernel 3.16.43, but as Dom0:
>>>>
>>>> EDAC MC: Ver: 3.0.0
>>>> AMD64 EDAC driver v3.4.0
>>>> EDAC amd64: DRAM ECC enabled.
>>>> EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
>>>> EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
>>>> AMD64 EDAC driver v3.4.0
>>>> EDAC amd64: DRAM ECC enabled.
>>>> EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
>>>> EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not 
>>>> load.
>>> Afaict the driver as is simply can't work in a Xen Dom0; it needs
>>> enabling (read: para-virtualizing). I'm actually glad to see it doesn't
>>> load (the worse alternative would be for it to load and then do the
>>> wrong thing or give you a false sense of safety of your data).
>> I'm unsure of how to evaluate the situation.  Since ECC is enabled in the
>> BIOS, data should be safe whether or not the EDAC driver loads.  I
>> /suspect/ the EDAC driver failing to load merely means reporting of ECC
>> errors won't happen.
> "Merely" being relative here: The missing reports mean a false feeling
> of safety, as they may be early indications of later double-bit errors.
>
>>  I suspect the only paravirtualization needed is to
>> map the physical address of the soft|hard errors to the VM whose memory
>> range was affected.  What this affects is which VM should panic in case
>> of hard errors.
> Which in turn obviously requires hypervisor interaction. It's not really
> clear to me whether perhaps the driver would better live in the
> hypervisor in the first place for that reason.

The driver should probably live directly in Xen; it needs to program a
number of northbridge and CPU registers including interrupt information.

For the reporting side of things, it looks like it would require vMCE to
pass on fault information to guests.

~Andrew


* Re: [BUG] EDAC infomation partially missing
  2017-05-16  9:54     ` Jan Beulich
  2017-05-16 10:08       ` Andrew Cooper
@ 2017-05-16 18:02       ` Elliott Mitchell
  1 sibling, 0 replies; 11+ messages in thread
From: Elliott Mitchell @ 2017-05-16 18:02 UTC (permalink / raw)
  To: Jan Beulich; +Cc: 810964, xen-devel

On Tue, May 16, 2017 at 03:54:37AM -0600, Jan Beulich wrote:
> >>> On 16.05.17 at 05:47, <ehem+debian@m5p.com> wrote:
> >  I suspect the only paravirtualization needed is to
> > map the physical address of the soft|hard errors to the VM whose memory
> > range was affected.  What this affects is which VM should panic in case
> > of hard errors.
> 
> Which in turn obviously requires hypervisor interaction. It's not really
> clear to me whether perhaps the driver would better live in the
> hypervisor in the first place for that reason.
> 
> And there's a second piece of paravirtualization needed: The driver
> doesn't distinguish physical and machine address spaces, yet the
> addresses reported by hardware are machine ones and hence would
> generally need translation to physical ones in order to assign Dom0-
> local meaning to them (or to determine that the address belongs to
> another VM or the hypervisor).

Merely reporting the machine address to Dom0 is already high value since
it lets you attribute the failure to a memory module.  Without that you
may have a VM or whole machine randomly crash for a completely unknown
reason.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         EHeM+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445




* Re: [BUG] EDAC infomation partially missing
@ 2017-05-13 22:36 Elliott Mitchell
  0 siblings, 0 replies; 11+ messages in thread
From: Elliott Mitchell @ 2017-05-13 22:36 UTC (permalink / raw)
  To: 810964, xen-devel

I haven't yet done as much experimentation as Andreas Pflug has, but I
can confirm I'm also running into this bug with Xen 4.4.1.

I've only tried Linux kernel 3.16.43, but as Dom0:

EDAC MC: Ver: 3.0.0
AMD64 EDAC driver v3.4.0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.
AMD64 EDAC driver v3.4.0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load.

Whereas directly booting:

EDAC MC: Ver: 3.0.0
AMD64 EDAC driver v3.4.0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 0).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:     0MB 3:     0MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0:  4096MB 1:  4096MB
EDAC amd64: MC: 2:     0MB 3:     0MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC amd64: using x4 syndromes.
EDAC amd64: MCT channel count: 2
EDAC amd64: CS0: Unbuffered DDR3 RAM
EDAC amd64: CS1: Unbuffered DDR3 RAM
EDAC MC0: Giving out device to module amd64_edac controller F10h: DEV 0000:00:18.2 (INTERRUPT)
EDAC PCI0: Giving out device to module amd64_edac controller EDAC PCI controller: DEV 0000:00:18.2 (POLLED)

I have not tried force-enabling ECC checking.  Since I place high value
on my data, I rate this as a rather important bug.


-- 
(\___(\___(\______          --=> 8-) EHM <=--          ______/)___/)___/)
 \BS (    |         EHeM+sigmsg@m5p.com  PGP 87145445         |    )   /
  \_CS\   |  _____  -O #include <stddisclaimer.h> O-   _____  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445




* Re: [BUG] EDAC infomation partially missing
  2016-01-22 10:40     ` Jan Beulich
@ 2016-01-22 11:33       ` Andreas Pflug
  0 siblings, 0 replies; 11+ messages in thread
From: Andreas Pflug @ 2016-01-22 11:33 UTC (permalink / raw)
  To: Jan Beulich; +Cc: 810964, xen-devel

Am 22.01.16 um 11:40 schrieb Jan Beulich:
>>>> On 22.01.16 at 10:09, <pgadmin@pse-consulting.de> wrote:
>> When booting with Xen 4.4.1:
>>
>> AMD64 EDAC driver v3.4.0
>> EDAC amd64: DRAM ECC enabled.
>> EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
> I wonder how valid this message is. We actually write this MSR with
> all ones during boot.
>
> However, considering involved functions like
> nb_mce_bank_enabled_on_node() or node_to_amd_nb() taking
> node IDs as inputs, and considering that PV guests (including
> Dom0) don't have a topology matching that of the host, I doubt
> very much that this driver is even remotely prepared to run
> under Xen. It working on Xen 4.1.x would then be by pure
> accident.
The dmesg is identical with or without Xen 4.1, so I'd guess it does work
if flags are detected correctly.

Regards
Andreas


* Re: [BUG] EDAC infomation partially missing
  2016-01-22  9:09   ` Andreas Pflug
@ 2016-01-22 10:40     ` Jan Beulich
  2016-01-22 11:33       ` Andreas Pflug
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2016-01-22 10:40 UTC (permalink / raw)
  To: Andreas Pflug; +Cc: 810964, xen-devel

>>> On 22.01.16 at 10:09, <pgadmin@pse-consulting.de> wrote:
> When booting with Xen 4.4.1:
> 
> AMD64 EDAC driver v3.4.0
> EDAC amd64: DRAM ECC enabled.
> EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.

I wonder how valid this message is. We actually write this MSR with
all ones during boot.
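
For anyone wanting to compare what the kernel sees in each environment,
the bit named in the dmesg line can be inspected roughly as below.  This
is a minimal sketch: it assumes the Linux msr driver is loaded and the
script runs as root, and the helper names are made up here.  Note that
under Xen the value returned to Dom0 may not reflect the hardware
register.

```python
import os
import struct

MSR_MCG_CTL = 0x17B  # MSR number taken from the dmesg line
NB_BANK_BIT = 4      # bit index taken from the dmesg line


def nb_bank_enabled(msr_value: int) -> bool:
    """Return True if bit 4 (the NB MCE bank) is set in the MSR value."""
    return bool((msr_value >> NB_BANK_BIT) & 1)


def read_msr(cpu: int, msr: int = MSR_MCG_CTL) -> int:
    """Read a 64-bit MSR via /dev/cpu/<cpu>/msr (the file offset is the
    MSR number).  Requires root and the msr kernel module."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)
```

Calling nb_bank_enabled(read_msr(0)) once on bare metal and once in Dom0
would show whether the two environments disagree about this bit.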

However, considering involved functions like
nb_mce_bank_enabled_on_node() or node_to_amd_nb() taking
node IDs as inputs, and considering that PV guests (including
Dom0) don't have a topology matching that of the host, I doubt
very much that this driver is even remotely prepared to run
under Xen. It working on Xen 4.1.x would then be by pure
accident.

Jan


* Re: [BUG] EDAC infomation partially missing
  2016-01-21 16:41 ` Jan Beulich
@ 2016-01-22  9:09   ` Andreas Pflug
  2016-01-22 10:40     ` Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: Andreas Pflug @ 2016-01-22  9:09 UTC (permalink / raw)
  To: Jan Beulich; +Cc: 810964, xen-devel

Am 21.01.16 um 17:41 schrieb Jan Beulich:
>>>> On 20.01.16 at 16:01, <andreas.pflug@web.de> wrote:
>> Initially reported to debian
>> (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=810964), redirected here:
>>
>> With AMD Opteron 6xxx processors, half of the memory controllers are
>> missing from /sys/devices/system/edac/mc
>> Checked with single 6120 (dual memory controller) and twin 6344 (2x dual
>> MC), other dual-module CPUs might be affected too.
>>
>> Booting plain Linux (3.2, 3.16, 4.1, 4.3), all memory controllers are
>> listed under /sys/devices/system/edac/mc as expected. Same happens, when
>> Xen 4.1 is used: all MCs present.
>>
>> Starting with Xen 4.4 (Debian Jessie), only mc1 (on the single CPU
>> machine) or mc2/mc3 (dual CPU machine) are present, although the full
>> system memory is accessible. Checked versions were 4.1.4 (Debian
>> Wheezy), 4.4.1 (Jessie) and 4.6.0 (Sid)
> As already indicated by Ian in that bug, you should supply us with
> full kernel and hypervisor logs for both the good and bad cases
> (ideally with the same kernel version used in both runs, so that we
> can exclude kernel behavior differences).
Here are some dmesg excerpts, all performed with Linux 4.1.3.

When booting with Xen 4.1.4:

AMD64 EDAC driver v3.4.0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 0).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC0: Giving out device to module amd64_edac controller F10h: DEV
0000:00:18.2 (INTERRUPT)
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 1).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC1: Giving out device to module amd64_edac controller F10h: DEV
0000:00:19.2 (INTERRUPT)

When booting with Xen 4.4.1:

AMD64 EDAC driver v3.4.0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will
not load.
 Either enable ECC checking or force module loading by setting
'ecc_enable_override'.
 (Note that use of the override may cause unknown side effects.)
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 1).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC1: Giving out device to module amd64_edac controller F10h: DEV
0000:00:19.2 (INTERRUPT)

Apparently Xen 4.4 doesn't report the BIOS flag correctly. I added
ecc_enable_override=1 to amd64_edac_mod, and then I get:

EDAC MC: Ver: 3.0.0
AMD64 EDAC driver v3.4.0
EDAC amd64: DRAM ECC enabled.
EDAC amd64: NB MCE bank disabled, set MSR 0x0000017b[4] on node 0 to enable.
EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will
not load.
EDAC amd64: Forcing ECC on!
EDAC amd64: F10h detected (node 0).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC0: Giving out device to module amd64_edac controller F10h: DEV
0000:00:18.2 (INTERRUPT)
EDAC amd64: DRAM ECC enabled.
EDAC amd64: F10h detected (node 1).
EDAC MC: DCT0 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC MC: DCT1 chip selects:
EDAC amd64: MC: 0:     0MB 1:     0MB
EDAC amd64: MC: 2:  2048MB 3:  2048MB
EDAC amd64: MC: 4:     0MB 5:     0MB
EDAC amd64: MC: 6:     0MB 7:     0MB
EDAC amd64: using x8 syndromes.
EDAC amd64: MCT channel count: 2
EDAC MC1: Giving out device to module amd64_edac controller F10h: DEV
0000:00:19.2 (INTERRUPT)

This restored both MCs, so the BIOS flag seems to be the culprit.
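
To compare logs like the ones above without reading them line by line,
the registered controllers can be pulled out of the dmesg text.  This is
a small sketch; the helper name is hypothetical and the pattern is based
only on the "Giving out device" lines shown in this message.

```python
import re

# Matches lines such as:
#   EDAC MC1: Giving out device to module amd64_edac controller F10h: DEV
MC_LINE = re.compile(r"EDAC MC(\d+): Giving out device")


def registered_controllers(dmesg_text: str) -> set:
    """Return the set of MC indices that the EDAC core registered."""
    return {int(m.group(1)) for m in MC_LINE.finditer(dmesg_text)}
```

Applied to the three excerpts above, this gives {0, 1} for Xen 4.1.4,
{1} for Xen 4.4.1, and {0, 1} again once ecc_enable_override=1 is set.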

Regards,
Andreas


* Re: [BUG] EDAC infomation partially missing
       [not found] <569FA160.6070308@web.de>
@ 2016-01-21 16:41 ` Jan Beulich
  2016-01-22  9:09   ` Andreas Pflug
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2016-01-21 16:41 UTC (permalink / raw)
  To: Andreas Pflug; +Cc: 810964, xen-devel

>>> On 20.01.16 at 16:01, <andreas.pflug@web.de> wrote:
> Initially reported to debian
> (http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=810964), redirected here:
> 
> With AMD Opteron 6xxx processors, half of the memory controllers are
> missing from /sys/devices/system/edac/mc
> Checked with single 6120 (dual memory controller) and twin 6344 (2x dual
> MC), other dual-module CPUs might be affected too.
> 
> Booting plain Linux (3.2, 3.16, 4.1, 4.3), all memory controllers are
> listed under /sys/devices/system/edac/mc as expected. Same happens, when
> Xen 4.1 is used: all MCs present.
> 
> Starting with Xen 4.4 (Debian Jessie), only mc1 (on the single CPU
> machine) or mc2/mc3 (dual CPU machine) are present, although the full
> system memory is accessible. Checked versions were 4.1.4 (Debian
> Wheezy), 4.4.1 (Jessie) and 4.6.0 (Sid)

As already indicated by Ian in that bug, you should supply us with
full kernel and hypervisor logs for both the good and bad cases
(ideally with the same kernel version used in both runs, so that we
can exclude kernel behavior differences).

Jan


* [BUG] EDAC infomation partially missing
@ 2016-01-20 15:01 Andreas Pflug
  0 siblings, 0 replies; 11+ messages in thread
From: Andreas Pflug @ 2016-01-20 15:01 UTC (permalink / raw)
  To: xen-devel, 810964

Initially reported to Debian
(http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=810964), redirected here:

With AMD Opteron 6xxx processors, half of the memory controllers are
missing from /sys/devices/system/edac/mc
Checked with single 6120 (dual memory controller) and twin 6344 (2x dual
MC), other dual-module CPUs might be affected too.

Booting plain Linux (3.2, 3.16, 4.1, 4.3), all memory controllers are
listed under /sys/devices/system/edac/mc as expected. The same happens when
Xen 4.1 is used: all MCs present.

Starting with Xen 4.4 (Debian Jessie), only mc1 (on the single CPU
machine) or mc2/mc3 (dual CPU machine) are present, although the full
system memory is accessible. Checked versions were 4.1.4 (Debian
Wheezy), 4.4.1 (Jessie) and 4.6.0 (Sid)
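
A quick way to reproduce the check on each configuration is to list the
mcN entries under the sysfs directory named above.  This is a minimal
sketch; the function name is made up, and the root path is a parameter
only so the helper can be tried against a scratch directory.

```python
import re
from pathlib import Path

EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def list_memory_controllers(root=EDAC_MC_ROOT):
    """Return the mcN directory names under `root`, sorted by index.

    Returns an empty list if the directory does not exist, for example
    when no EDAC driver is loaded.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    names = [
        p.name
        for p in root.iterdir()
        if p.is_dir() and re.fullmatch(r"mc\d+", p.name)
    ]
    return sorted(names, key=lambda name: int(name[2:]))
```

On the single-CPU machine described here one would expect ['mc0', 'mc1']
on plain Linux and only ['mc1'] under the affected Xen versions.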

