From: "Luck, Tony" <tony.luck@intel.com>
To: Russ Anderson <rja@hpe.com>
Cc: Borislav Petkov <bp@alien8.de>,
	Mauro Carvalho Chehab <mchehab+samsung@kernel.org>,
	Greg KH <gregkh@linuxfoundation.org>,
	Justin Ernst <justin.ernst@hpe.com>,
	russ.anderson@hpe.com, Mauro Carvalho Chehab <mchehab@kernel.org>,
	linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
	Aristeu Rozanski Filho <arozansk@redhat.com>
Subject: Re: [PATCH] Raise maximum number of memory controllers
Date: Wed, 26 Sep 2018 16:02:57 -0700
Message-ID: <20180926230257.GA5666@agluck-desk>
In-Reply-To: <20180926182317.patqjso7nzw2oxiz@hpe.com>

This issue has made me look a bit more at what EDAC puts in sysfs.
It seems like the current code inherits some useless baggage
from the device calls it makes.

E.g. all the "power" subdirectories:

$ find /sys/devices/system/edac -name power
/sys/devices/system/edac/power
/sys/devices/system/edac/mc/mc6/dimm3/power
/sys/devices/system/edac/mc/mc6/power
/sys/devices/system/edac/mc/mc6/csrow0/power
/sys/devices/system/edac/mc/mc6/dimm6/power
/sys/devices/system/edac/mc/mc6/dimm0/power
/sys/devices/system/edac/mc/mc6/dimm9/power
/sys/devices/system/edac/mc/mc4/dimm3/power
/sys/devices/system/edac/mc/mc4/power
... total of 50 of these ...

$ grep -r . /sys/devices/system/edac/mc/mc6/dimm0/power
/sys/devices/system/edac/mc/mc6/dimm0/power/runtime_active_time:0
/sys/devices/system/edac/mc/mc6/dimm0/power/runtime_status:unsupported
grep: /sys/devices/system/edac/mc/mc6/dimm0/power/autosuspend_delay_ms: Input/output error
/sys/devices/system/edac/mc/mc6/dimm0/power/runtime_suspended_time:0
/sys/devices/system/edac/mc/mc6/dimm0/power/control:auto

We have neither stats for, nor control of, power on a per-memory-controller
or per-DIMM basis. So all these files are just noise.
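
To gauge how much of this noise a given system carries, something like the
following could be used. This is only a minimal sketch, assuming a POSIX
shell with find(1); the function name count_power_dirs and its optional
root argument are made up for illustration and are not part of any EDAC
tooling.

```sh
#!/bin/sh
# Hypothetical helper (name and argument are illustrative assumptions):
# count the "power" subdirectories below an EDAC sysfs root.
# The root defaults to the sysfs path used in the commands above.
count_power_dirs() {
    root="${1:-/sys/devices/system/edac}"
    # -type d keeps only directories; errors such as a missing root are
    # discarded, in which case the count is 0.
    find "$root" -type d -name power 2>/dev/null | wc -l | tr -d ' '
}
```

Invoked with no argument it scans /sys/devices/system/edac; passing a
directory lets the same count be taken on a copy of the tree.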


But ... we are at -rc5. Not sure that we'll figure out, write, test & debug
the proper solution in the next 3-4 weeks. So perhaps we should apply

-#define EDAC_MAX_MCS   16
+#define EDAC_MAX_MCS   64

as a temporary band-aid to get HPE's 32-socket machine running while
we work on the proper fix?

-Tony
