From: Borislav Petkov <bp@alien8.de>
To: Tony Luck <tony.luck@intel.com>
Cc: Justin Ernst <justin.ernst@hpe.com>,
	russ.anderson@hpe.com, Mauro Carvalho Chehab <mchehab@kernel.org>,
	linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] Raise maximum number of memory controllers
Date: Tue, 25 Sep 2018 17:26:59 +0200
Message-ID: <20180925152659.GE23986@zn.tnic>
In-Reply-To: <20180925143449.284634-1-justin.ernst@hpe.com>

On Tue, Sep 25, 2018 at 09:34:49AM -0500, Justin Ernst wrote:
> We observe an oops in the skx_edac module during boot.
> Examining /var/log/messages:
> [ 3401.985757] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0
> [ 3401.985887] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1
> [ 3401.986014] EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0
> ...
> [ 3401.987318] EDAC MC13: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1
> [ 3401.987435] EDAC MC14: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0
> [ 3401.987556] EDAC MC15: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1
> [ 3401.987579] Too many memory controllers: 16
> [ 3402.042614] EDAC MC: Removed device 0 for skx_edac Skylake Socket#0 IMC#0
> 
> We observe there are two memory controllers per socket, with a limit of 16.
> Raise the maximum number of memory controllers from 16 to 2 * MAX_NUMNODES (1024).

Tony,

can we read that out from the hardware instead of having this silly
static number?

Leaving in the rest.

> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
> Cc: linux-edac@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Acked-by: Russ Anderson <russ.anderson@hpe.com>
> Signed-off-by: Justin Ernst <justin.ernst@hpe.com>
> ---
>  include/linux/edac.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/edac.h b/include/linux/edac.h
> index bffb97828ed6..958d69332c1d 100644
> --- a/include/linux/edac.h
> +++ b/include/linux/edac.h
> @@ -17,6 +17,7 @@
>  #include <linux/completion.h>
>  #include <linux/workqueue.h>
>  #include <linux/debugfs.h>
> +#include <linux/numa.h>
>  
>  #define EDAC_DEVICE_NAME_LEN	31
>  
> @@ -670,6 +671,6 @@ struct mem_ctl_info {
>  /*
>   * Maximum number of memory controllers in the coherent fabric.
>   */
> -#define EDAC_MAX_MCS	16
> +#define EDAC_MAX_MCS	2 * MAX_NUMNODES
>  
>  #endif
> -- 
> 2.12.3
> 
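
A side note on the new definition in the quoted hunk: `2 * MAX_NUMNODES` is written without surrounding parentheses, so it is substituted textually wherever EDAC_MAX_MCS appears. The sketch below is a userspace illustration, not kernel code. It assumes MAX_NUMNODES is 512 (consistent with the 1024 quoted in the changelog), and the `_BARE`/`_PAREN` macro names and the two helper functions are hypothetical, invented only to show the difference.

```c
#include <assert.h>

/* Assumed stand-in: the kernel derives MAX_NUMNODES from its config. */
#define MAX_NUMNODES 512

/* Shape used in the quoted patch: no grouping around the product. */
#define EDAC_MAX_MCS_BARE	2 * MAX_NUMNODES
/* Hypothetical parenthesized variant, for comparison only. */
#define EDAC_MAX_MCS_PAREN	(2 * MAX_NUMNODES)

/* Expands to total / 2 * 512, which evaluates as (total / 2) * 512. */
int per_mc_share_bare(int total)
{
	return total / EDAC_MAX_MCS_BARE;
}

/* Expands to total / (2 * 512), i.e. total / 1024. */
int per_mc_share_paren(int total)
{
	return total / EDAC_MAX_MCS_PAREN;
}

/* Used on its own, as an array size or loop bound, both forms agree. */
void check_plain_use(void)
{
	assert(EDAC_MAX_MCS_BARE == EDAC_MAX_MCS_PAREN);
}
```

The two forms only diverge when the macro lands next to an operator of equal or higher precedence, so whether it matters depends on how the macro's users are written.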

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
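
To make the "static number" question concrete, here is a minimal userspace sketch contrasting a fixed cap with a table sized from a discovered count. It is not the EDAC core's code and does not show how the hardware would be queried. `struct mc_table`, `fixed_register()`, `mc_table_init()` and `mc_table_free()` are hypothetical names, and the socket and IMC counts are simply passed in as parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* The old static limit from the quoted diff. */
#define FIXED_MAX_MCS 16

/* Hypothetical table of memory controller indices. */
struct mc_table {
	size_t count;
	int *ids;
};

/* Static scheme: indices 0..15 fit, index 16 is refused. */
int fixed_register(size_t idx)
{
	return idx < FIXED_MAX_MCS ? 0 : -1;
}

/* Dynamic scheme: size the table from the counts found at probe time. */
int mc_table_init(struct mc_table *t, size_t sockets, size_t imcs_per_socket)
{
	assert(t != NULL);

	t->count = sockets * imcs_per_socket;
	t->ids = t->count ? calloc(t->count, sizeof(*t->ids)) : NULL;
	if (!t->ids) {
		t->count = 0;
		return -1;
	}

	for (size_t i = 0; i < t->count; i++)
		t->ids[i] = (int)i;

	return 0;
}

void mc_table_free(struct mc_table *t)
{
	free(t->ids);
	t->ids = NULL;
	t->count = 0;
}
```

With two controllers per socket, the fixed cap of 16 covers at most 8 sockets, which matches the log above where MC0 through MC15 register and the next one is rejected. Calling mc_table_init(&t, 16, 2) instead yields 32 slots with no compile-time ceiling.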
