All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Ghannam, Yazen" <Yazen.Ghannam@amd.com>
To: Borislav Petkov <bp@suse.de>,
	Johannes Hirte <johannes.hirte@datenkhaos.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"tony.luck@intel.com" <tony.luck@intel.com>,
	"x86@kernel.org" <x86@kernel.org>
Subject: RE: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block
Date: Thu, 17 May 2018 13:04:19 +0000	[thread overview]
Message-ID: <CY4PR12MB1557AB31F5F9D1ECBDAC2DA6F8910@CY4PR12MB1557.namprd12.prod.outlook.com> (raw)
In-Reply-To: <20180517104124.GA25595@pd.tnic>

> -----Original Message-----
> From: Borislav Petkov <bp@suse.de>
> Sent: Thursday, May 17, 2018 6:41 AM
> To: Johannes Hirte <johannes.hirte@datenkhaos.de>
> Cc: Ghannam, Yazen <Yazen.Ghannam@amd.com>; linux-
> edac@vger.kernel.org; linux-kernel@vger.kernel.org; tony.luck@intel.com;
> x86@kernel.org
> Subject: Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized
> block
> 
> On Thu, May 17, 2018 at 08:49:31AM +0200, Johannes Hirte wrote:
> > Maybe I'm missing something, but those RDMSR IPSs don't happen on
> > pre-SMCA systems, right? So the caching should be avoided here, cause
> > the whole lookup looks more expensive to me than the simple switch-block
> > in get_block_address.
> 
> Yeah, and we should simply cache all those MSR values as I suggested then.
> 

Yes, you're right. I thought using the existing data structures would work, but it
seems I messed up the implementation.

> The patch at the end should fix your issue.
> 
> Yazen, so I'm working on the assumption that all addresses are the same
> on every core - I dumped them and it looks like this:
> 
>     128 bank: 00, block: 0 : 0xc0002003
>     128 bank: 00, block: 1 : 0x0
>     128 bank: 01, block: 0 : 0xc0002013
>     128 bank: 01, block: 1 : 0x0
>     128 bank: 02, block: 0 : 0xc0002023
>     128 bank: 02, block: 1 : 0x0
>     128 bank: 03, block: 0 : 0xc0002033
>     128 bank: 03, block: 1 : 0x0
>     128 bank: 04, block: 0 : 0x0
>     128 bank: 05, block: 0 : 0xc0002053
>     128 bank: 05, block: 1 : 0x0
>     128 bank: 06, block: 0 : 0xc0002063
>     128 bank: 06, block: 1 : 0x0
>     128 bank: 07, block: 0 : 0xc0002073
>     128 bank: 07, block: 1 : 0x0
>     128 bank: 08, block: 0 : 0xc0002083
>     128 bank: 08, block: 1 : 0x0
>     128 bank: 09, block: 0 : 0xc0002093
>     128 bank: 09, block: 1 : 0x0
>     128 bank: 10, block: 0 : 0xc00020a3
>     128 bank: 10, block: 1 : 0x0
>     128 bank: 11, block: 0 : 0xc00020b3
>     128 bank: 11, block: 1 : 0x0
>     128 bank: 12, block: 0 : 0xc00020c3
>     128 bank: 12, block: 1 : 0x0
>     128 bank: 13, block: 0 : 0xc00020d3
>     128 bank: 13, block: 1 : 0x0
>     128 bank: 14, block: 0 : 0xc00020e3
>     128 bank: 14, block: 1 : 0x0
>     128 bank: 15, block: 0 : 0xc00020f3
>     128 bank: 15, block: 1 : 0x0
>     128 bank: 16, block: 0 : 0xc0002103
>     128 bank: 16, block: 1 : 0x0

Banks 15 and 16 should have an address for block 1 also. Do you have PFEH
enabled on your system? That would cause MISC0 to be RAZ so we won't
get the MISC1 address. I'll try it myself also and let you know. 

>     128 bank: 17, block: 0 : 0xc0002113
>     128 bank: 17, block: 1 : 0x0
>     128 bank: 18, block: 0 : 0xc0002123
>     128 bank: 18, block: 1 : 0x0
>     128 bank: 19, block: 0 : 0xc0002133
>     128 bank: 19, block: 1 : 0x0
>     128 bank: 20, block: 0 : 0xc0002143
>     128 bank: 20, block: 1 : 0x0
>     128 bank: 21, block: 0 : 0xc0002153
>     128 bank: 21, block: 1 : 0x0
>     128 bank: 22, block: 0 : 0xc0002163
>     128 bank: 22, block: 1 : 0x0
> 
> so you have 128 times the same address on a 128 core Zen box.
> 
> If that is correct, then we can use this nice simplification and cache
> them globally and not per-CPU. Seems to work here. Scream if something's
> amiss.
> 

I think this good for now. We'll probably need to change it in the future, but
maybe we can clean up all the thresholding blocks code and make it simpler
when we do change it.

> Thx.
> 
> ---
> From d8614e904d8f63b89d2d204d6fa247c4c416a1b6 Mon Sep 17 00:00:00
> 2001
> From: Borislav Petkov <bp@susede>
> Date: Thu, 17 May 2018 10:46:26 +0200
> Subject: [PATCH] array caching
> 
> Not-Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
>  arch/x86/kernel/cpu/mcheck/mce_amd.c | 29 ++++++++++++++--------------
>  1 file changed, 14 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> index f7666eef4a87..c8e038800591 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> @@ -94,6 +94,11 @@ static struct smca_bank_name smca_names[] = {
>  	[SMCA_SMU]	= { "smu",		"System Management Unit" },
>  };
> 
> +static u32 smca_bank_addrs[MAX_NR_BANKS][NR_BLOCKS] __ro_after_init =
> +{
> +	[0 ... MAX_NR_BANKS - 1] = { [0 ... NR_BLOCKS - 1] = -1 }
> +};
> +
>  const char *smca_get_name(enum smca_bank_types t)
>  {
>  	if (t >= N_SMCA_BANK_TYPES)
> @@ -443,20 +448,26 @@ static u32 smca_get_block_address(unsigned int
> cpu, unsigned int bank,
>  	if (!block)
>  		return MSR_AMD64_SMCA_MCx_MISC(bank);
> 
> +	/* Check our cache first: */
> +	if (smca_bank_addrs[bank][block] != -1)
> +		return smca_bank_addrs[bank][block];
> +

This hunk could go above the !block. Though maybe the macro is lighter than the
array lookup. It'll work either way, but I'm just thinking out loud.

>  	/*
>  	 * For SMCA enabled processors, BLKPTR field of the first MISC register
>  	 * (MCx_MISC0) indicates presence of additional MISC regs set (MISC1-
> 4).
>  	 */
>  	if (rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_CONFIG(bank),
> &low, &high))
> -		return addr;
> +		goto out;
> 
>  	if (!(low & MCI_CONFIG_MCAX))
> -		return addr;
> +		goto out;
> 
>  	if (!rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_MISC(bank),
> &low, &high) &&
>  	    (low & MASK_BLKPTR_LO))
> -		return MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
> +		addr = MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
> 
> +out:
> +	smca_bank_addrs[bank][block] = addr;
>  	return addr;
>  }
> 

Since we're caching the values during init, we can drop all the *_on_cpu() calls.
What do you think?

Thanks,
Yazen

WARNING: multiple messages have this Message-ID (diff)
From: Yazen Ghannam <yazen.ghannam@amd.com>
To: Borislav Petkov <bp@suse.de>,
	Johannes Hirte <johannes.hirte@datenkhaos.de>
Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"tony.luck@intel.com" <tony.luck@intel.com>,
	"x86@kernel.org" <x86@kernel.org>
Subject: [3/3] x86/MCE/AMD: Get address from already initialized block
Date: Thu, 17 May 2018 13:04:19 +0000	[thread overview]
Message-ID: <CY4PR12MB1557AB31F5F9D1ECBDAC2DA6F8910@CY4PR12MB1557.namprd12.prod.outlook.com> (raw)

> -----Original Message-----
> From: Borislav Petkov <bp@suse.de>
> Sent: Thursday, May 17, 2018 6:41 AM
> To: Johannes Hirte <johannes.hirte@datenkhaos.de>
> Cc: Ghannam, Yazen <Yazen.Ghannam@amd.com>; linux-
> edac@vger.kernel.org; linux-kernel@vger.kernel.org; tony.luck@intel.com;
> x86@kernel.org
> Subject: Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized
> block
> 
> On Thu, May 17, 2018 at 08:49:31AM +0200, Johannes Hirte wrote:
> > Maybe I'm missing something, but those RDMSR IPSs don't happen on
> > pre-SMCA systems, right? So the caching should be avoided here, cause
> > the whole lookup looks more expensive to me than the simple switch-block
> > in get_block_address.
> 
> Yeah, and we should simply cache all those MSR values as I suggested then.
> 

Yes, you're right. I thought using the existing data structures would work, but it
seems I messed up the implementation.

> The patch at the end should fix your issue.
> 
> Yazen, so I'm working on the assumption that all addresses are the same
> on every core - I dumped them and it looks like this:
> 
>     128 bank: 00, block: 0 : 0xc0002003
>     128 bank: 00, block: 1 : 0x0
>     128 bank: 01, block: 0 : 0xc0002013
>     128 bank: 01, block: 1 : 0x0
>     128 bank: 02, block: 0 : 0xc0002023
>     128 bank: 02, block: 1 : 0x0
>     128 bank: 03, block: 0 : 0xc0002033
>     128 bank: 03, block: 1 : 0x0
>     128 bank: 04, block: 0 : 0x0
>     128 bank: 05, block: 0 : 0xc0002053
>     128 bank: 05, block: 1 : 0x0
>     128 bank: 06, block: 0 : 0xc0002063
>     128 bank: 06, block: 1 : 0x0
>     128 bank: 07, block: 0 : 0xc0002073
>     128 bank: 07, block: 1 : 0x0
>     128 bank: 08, block: 0 : 0xc0002083
>     128 bank: 08, block: 1 : 0x0
>     128 bank: 09, block: 0 : 0xc0002093
>     128 bank: 09, block: 1 : 0x0
>     128 bank: 10, block: 0 : 0xc00020a3
>     128 bank: 10, block: 1 : 0x0
>     128 bank: 11, block: 0 : 0xc00020b3
>     128 bank: 11, block: 1 : 0x0
>     128 bank: 12, block: 0 : 0xc00020c3
>     128 bank: 12, block: 1 : 0x0
>     128 bank: 13, block: 0 : 0xc00020d3
>     128 bank: 13, block: 1 : 0x0
>     128 bank: 14, block: 0 : 0xc00020e3
>     128 bank: 14, block: 1 : 0x0
>     128 bank: 15, block: 0 : 0xc00020f3
>     128 bank: 15, block: 1 : 0x0
>     128 bank: 16, block: 0 : 0xc0002103
>     128 bank: 16, block: 1 : 0x0

Banks 15 and 16 should have an address for block 1 also. Do you have PFEH
enabled on your system? That would cause MISC0 to be RAZ so we won't
get the MISC1 address. I'll try it myself also and let you know. 

>     128 bank: 17, block: 0 : 0xc0002113
>     128 bank: 17, block: 1 : 0x0
>     128 bank: 18, block: 0 : 0xc0002123
>     128 bank: 18, block: 1 : 0x0
>     128 bank: 19, block: 0 : 0xc0002133
>     128 bank: 19, block: 1 : 0x0
>     128 bank: 20, block: 0 : 0xc0002143
>     128 bank: 20, block: 1 : 0x0
>     128 bank: 21, block: 0 : 0xc0002153
>     128 bank: 21, block: 1 : 0x0
>     128 bank: 22, block: 0 : 0xc0002163
>     128 bank: 22, block: 1 : 0x0
> 
> so you have 128 times the same address on a 128 core Zen box.
> 
> If that is correct, then we can use this nice simplification and cache
> them globally and not per-CPU. Seems to work here. Scream if something's
> amiss.
> 

I think this good for now. We'll probably need to change it in the future, but
maybe we can clean up all the thresholding blocks code and make it simpler
when we do change it.

> Thx.
> 
> ---
> From d8614e904d8f63b89d2d204d6fa247c4c416a1b6 Mon Sep 17 00:00:00
> 2001
> From: Borislav Petkov <bp@susede>
> Date: Thu, 17 May 2018 10:46:26 +0200
> Subject: [PATCH] array caching
> 
> Not-Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
>  arch/x86/kernel/cpu/mcheck/mce_amd.c | 29 ++++++++++++++--------------
>  1 file changed, 14 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> index f7666eef4a87..c8e038800591 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> @@ -94,6 +94,11 @@ static struct smca_bank_name smca_names[] = {
>  	[SMCA_SMU]	= { "smu",		"System Management Unit" },
>  };
> 
> +static u32 smca_bank_addrs[MAX_NR_BANKS][NR_BLOCKS] __ro_after_init =
> +{
> +	[0 ... MAX_NR_BANKS - 1] = { [0 ... NR_BLOCKS - 1] = -1 }
> +};
> +
>  const char *smca_get_name(enum smca_bank_types t)
>  {
>  	if (t >= N_SMCA_BANK_TYPES)
> @@ -443,20 +448,26 @@ static u32 smca_get_block_address(unsigned int
> cpu, unsigned int bank,
>  	if (!block)
>  		return MSR_AMD64_SMCA_MCx_MISC(bank);
> 
> +	/* Check our cache first: */
> +	if (smca_bank_addrs[bank][block] != -1)
> +		return smca_bank_addrs[bank][block];
> +

This hunk could go above the !block. Though maybe the macro is lighter than the
array lookup. It'll work either way, but I'm just thinking out loud.

>  	/*
>  	 * For SMCA enabled processors, BLKPTR field of the first MISC register
>  	 * (MCx_MISC0) indicates presence of additional MISC regs set (MISC1-
> 4).
>  	 */
>  	if (rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_CONFIG(bank),
> &low, &high))
> -		return addr;
> +		goto out;
> 
>  	if (!(low & MCI_CONFIG_MCAX))
> -		return addr;
> +		goto out;
> 
>  	if (!rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_MISC(bank),
> &low, &high) &&
>  	    (low & MASK_BLKPTR_LO))
> -		return MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
> +		addr = MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1);
> 
> +out:
> +	smca_bank_addrs[bank][block] = addr;
>  	return addr;
>  }
> 

Since we're caching the values during init, we can drop all the *_on_cpu() calls.
What do you think?

Thanks,
Yazen

  reply	other threads:[~2018-05-17 13:04 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-01 18:48 [PATCH 1/3] x86/MCE/AMD: Redo function to get SMCA bank type Yazen Ghannam
2018-02-01 18:48 ` [1/3] " Yazen Ghannam
2018-02-01 18:48 ` [PATCH 2/3] x86/MCE/AMD, EDAC/mce_amd: Enumerate Reserved " Yazen Ghannam
2018-02-01 18:48   ` [2/3] " Yazen Ghannam
2018-02-08 15:15   ` [PATCH 2/3] " Borislav Petkov
2018-02-08 15:15     ` [2/3] " Borislav Petkov
2018-02-14 16:28     ` [PATCH 2/3] " Ghannam, Yazen
2018-02-14 16:28       ` [2/3] " Yazen Ghannam
2018-02-01 18:48 ` [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block Yazen Ghannam
2018-02-01 18:48   ` [3/3] " Yazen Ghannam
2018-02-08 15:26   ` [PATCH 3/3] " Borislav Petkov
2018-02-08 15:26     ` [3/3] " Borislav Petkov
2018-04-14  0:42   ` [PATCH 3/3] " Johannes Hirte
2018-04-14  0:42     ` [3/3] " Johannes Hirte
2018-04-16 11:56     ` [PATCH 3/3] " Johannes Hirte
2018-04-16 11:56       ` [3/3] " Johannes Hirte
2018-04-17 13:31       ` [PATCH 3/3] " Ghannam, Yazen
2018-04-17 13:31         ` [3/3] " Yazen Ghannam
2018-05-15  9:39         ` [PATCH 3/3] " Johannes Hirte
2018-05-15  9:39           ` [3/3] " Johannes Hirte
2018-05-16 22:46           ` [PATCH 3/3] " Borislav Petkov
2018-05-16 22:46             ` [3/3] " Boris Petkov
2018-05-17  6:49             ` [PATCH 3/3] " Johannes Hirte
2018-05-17  6:49               ` [3/3] " Johannes Hirte
2018-05-17 10:41               ` [PATCH 3/3] " Borislav Petkov
2018-05-17 10:41                 ` [3/3] " Boris Petkov
2018-05-17 13:04                 ` Ghannam, Yazen [this message]
2018-05-17 13:04                   ` Yazen Ghannam
2018-05-17 13:44                   ` [PATCH 3/3] " Borislav Petkov
2018-05-17 13:44                     ` [3/3] " Boris Petkov
2018-05-17 14:05                     ` [PATCH 3/3] " Ghannam, Yazen
2018-05-17 14:05                       ` [3/3] " Yazen Ghannam
2018-05-17 18:30                       ` [PATCH 1/2] x86/MCE/AMD: Cache SMCA MISC block addresses Borislav Petkov
2018-05-17 18:30                         ` [1/2] " Boris Petkov
2018-05-17 18:31                       ` [PATCH 2/2] x86/MCE/AMD: Read MCx_MISC block addresses on any CPU Borislav Petkov
2018-05-17 18:31                         ` [2/2] " Boris Petkov
2018-05-17 19:29                 ` [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block Johannes Hirte
2018-05-17 19:29                   ` [3/3] " Johannes Hirte
2018-05-17 19:33                   ` [PATCH 3/3] " Borislav Petkov
2018-05-17 19:33                     ` [3/3] " Boris Petkov
2018-05-19 13:21     ` [tip:ras/urgent] x86/MCE/AMD: Cache SMCA MISC block addresses tip-bot for Borislav Petkov
2018-02-08 15:04 ` [PATCH 1/3] x86/MCE/AMD: Redo function to get SMCA bank type Borislav Petkov
2018-02-08 15:04   ` [1/3] " Borislav Petkov
2018-02-14 16:38   ` [PATCH 1/3] " Ghannam, Yazen
2018-02-14 16:38     ` [1/3] " Yazen Ghannam
2018-02-14 19:35     ` [PATCH 1/3] " Borislav Petkov
2018-02-14 19:35       ` [1/3] " Boris Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CY4PR12MB1557AB31F5F9D1ECBDAC2DA6F8910@CY4PR12MB1557.namprd12.prod.outlook.com \
    --to=yazen.ghannam@amd.com \
    --cc=bp@suse.de \
    --cc=johannes.hirte@datenkhaos.de \
    --cc=linux-edac@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.