All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Luck, Tony" <tony.luck@intel.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Justin Ernst <justin.ernst@hpe.com>,
	russ.anderson@hpe.com, Mauro Carvalho Chehab <mchehab@kernel.org>,
	linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Raise maximum number of memory controllers
Date: Tue, 25 Sep 2018 10:50:23 -0700	[thread overview]
Message-ID: <20180925175023.GA16725@agluck-desk> (raw)
In-Reply-To: <20180925152659.GE23986@zn.tnic>

On Tue, Sep 25, 2018 at 05:26:59PM +0200, Borislav Petkov wrote:
> On Tue, Sep 25, 2018 at 09:34:49AM -0500, Justin Ernst wrote:
> > We observe an oops in the skx_edac module during boot.
> > Examining /var/log/messages:
> > [ 3401.985757] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0
> > [ 3401.985887] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1
> > [ 3401.986014] EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0
> > ...
> > [ 3401.987318] EDAC MC13: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1
> > [ 3401.987435] EDAC MC14: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0
> > [ 3401.987556] EDAC MC15: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1
> > [ 3401.987579] Too many memory controllers: 16
> > [ 3402.042614] EDAC MC: Removed device 0 for skx_edac Skylake Socket#0 IMC#0
> > 
> > We observe there are two memory controllers per socket, with a limit of 16.
> > Raise the maximum number of memory controllers from 16 to 2 * MAX_NUMNODES (1024).
> 
> Tony,
> 
> can we read that out from the hardware instead of having this silly
> static number?
> 
> Leaving in the rest.

There are way too many places where we use the identifier "bus"
in the edac core and drivers. But I'm not sure that we need a
static array mc_bus[EDAC_MAX_MCS].

Why can't we:


-	mci->bus = &mc_bus[mci->mc_idx];
+	mci->bus = kmalloc(sizeof *(mci->bus), GFP_KERNEL);

and then figure out where to kfree(mci->bus) on driver removal?

Do we every do arithmetic on different mci->bus pointers that
assume they are all part of a single array?

-Tony

WARNING: multiple messages have this Message-ID (diff)
From: "Luck, Tony" <tony.luck@intel.com>
To: Borislav Petkov <bp@alien8.de>
Cc: Justin Ernst <justin.ernst@hpe.com>,
	russ.anderson@hpe.com, Mauro Carvalho Chehab <mchehab@kernel.org>,
	linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Raise maximum number of memory controllers
Date: Tue, 25 Sep 2018 10:50:23 -0700	[thread overview]
Message-ID: <20180925175023.GA16725@agluck-desk> (raw)

On Tue, Sep 25, 2018 at 05:26:59PM +0200, Borislav Petkov wrote:
> On Tue, Sep 25, 2018 at 09:34:49AM -0500, Justin Ernst wrote:
> > We observe an oops in the skx_edac module during boot.
> > Examining /var/log/messages:
> > [ 3401.985757] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0
> > [ 3401.985887] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1
> > [ 3401.986014] EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0
> > ...
> > [ 3401.987318] EDAC MC13: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1
> > [ 3401.987435] EDAC MC14: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0
> > [ 3401.987556] EDAC MC15: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1
> > [ 3401.987579] Too many memory controllers: 16
> > [ 3402.042614] EDAC MC: Removed device 0 for skx_edac Skylake Socket#0 IMC#0
> > 
> > We observe there are two memory controllers per socket, with a limit of 16.
> > Raise the maximum number of memory controllers from 16 to 2 * MAX_NUMNODES (1024).
> 
> Tony,
> 
> can we read that out from the hardware instead of having this silly
> static number?
> 
> Leaving in the rest.

There are way too many places where we use the identifier "bus"
in the edac core and drivers. But I'm not sure that we need a
static array mc_bus[EDAC_MAX_MCS].

Why can't we:


-	mci->bus = &mc_bus[mci->mc_idx];
+	mci->bus = kmalloc(sizeof *(mci->bus), GFP_KERNEL);

and then figure out where to kfree(mci->bus) on driver removal?

Do we every do arithmetic on different mci->bus pointers that
assume they are all part of a single array?

-Tony

  reply	other threads:[~2018-09-25 17:50 UTC|newest]

Thread overview: 56+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-09-25 14:34 [PATCH] Raise maximum number of memory controllers Justin Ernst
2018-09-25 14:34 ` Justin Ernst
2018-09-25 15:26 ` [PATCH] " Borislav Petkov
2018-09-25 15:26   ` Borislav Petkov
2018-09-25 17:50   ` Luck, Tony [this message]
2018-09-25 17:50     ` Luck, Tony
2018-09-25 18:07     ` [PATCH] " Borislav Petkov
2018-09-25 18:07       ` Borislav Petkov
2018-09-26  9:35       ` [PATCH] " Borislav Petkov
2018-09-26  9:35         ` Borislav Petkov
2018-09-26 15:27         ` [PATCH] " Borislav Petkov
2018-09-26 15:27           ` Borislav Petkov
2018-09-26 16:03           ` [PATCH] " Mauro Carvalho Chehab
2018-09-26 16:03             ` Mauro Carvalho Chehab
2018-09-26 16:17             ` [PATCH] " Borislav Petkov
2018-09-26 16:17               ` Borislav Petkov
2018-09-26 17:39               ` [PATCH] " Mauro Carvalho Chehab
2018-09-26 17:39                 ` Mauro Carvalho Chehab
2018-09-26 18:10               ` [PATCH] " Luck, Tony
2018-09-26 18:10                 ` Luck, Tony
2018-09-26 18:23                 ` [PATCH] " Russ Anderson
2018-09-26 18:23                   ` Russ Anderson
2018-09-26 23:02                   ` [PATCH] " Luck, Tony
2018-09-26 23:02                     ` Luck, Tony
2018-09-27  4:52                     ` [PATCH] " Borislav Petkov
2018-09-27  4:52                       ` Borislav Petkov
2018-09-27 21:44                       ` [PATCH] " Luck, Tony
2018-09-27 21:44                         ` Luck, Tony
2018-09-27 22:03                         ` [PATCH] " Borislav Petkov
2018-09-27 22:03                           ` Borislav Petkov
2018-09-28  1:10                           ` [PATCH] " Mauro Carvalho Chehab
2018-09-28  1:10                             ` Mauro Carvalho Chehab
2018-10-01 12:47                             ` [PATCH] " Borislav Petkov
2018-10-01 12:47                               ` Borislav Petkov
2018-10-01 22:43                               ` [PATCH] EDAC: Don't add devices under /sys/bus/edac Luck, Tony
2018-10-01 22:43                                 ` Luck, Tony
2018-10-02  1:22                                 ` [PATCH] " Mauro Carvalho Chehab
2018-10-02  1:22                                   ` Mauro Carvalho Chehab
2018-10-02 15:51                                   ` [PATCH] " Ernst, Justin
2018-10-02 15:51                                     ` Justin Ernst
2018-10-02 16:26                                     ` [PATCH] " Borislav Petkov
2018-10-02 16:26                                       ` Borislav Petkov
2018-11-06 14:45                                       ` [PATCH] " Borislav Petkov
2018-11-06 14:45                                         ` Borislav Petkov
2018-11-13 19:09                                         ` [PATCH] " Ernst, Justin
2018-11-13 19:09                                           ` Justin Ernst
2018-11-13 19:15                                           ` [PATCH] " Borislav Petkov
2018-11-13 19:15                                             ` Borislav Petkov
2018-09-26  7:55 ` [PATCH] Raise maximum number of memory controllers Zhuo, Qiuxu
2018-09-26  7:55   ` Qiuxu Zhuo
2018-09-26 13:53   ` [PATCH] " Russ Anderson
2018-09-26 13:53     ` Russ Anderson
2018-09-26 16:13 ` [PATCH] " Aristeu Rozanski
2018-09-26 16:13   ` Aristeu Rozanski
2018-09-27  5:56 ` [PATCH] " Borislav Petkov
2018-09-27  5:56   ` Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180925175023.GA16725@agluck-desk \
    --to=tony.luck@intel.com \
    --cc=bp@alien8.de \
    --cc=justin.ernst@hpe.com \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mchehab@kernel.org \
    --cc=russ.anderson@hpe.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.