From: "Ghannam, Yazen" <Yazen.Ghannam@amd.com> To: Borislav Petkov <bp@suse.de>, Johannes Hirte <johannes.hirte@datenkhaos.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "tony.luck@intel.com" <tony.luck@intel.com>, "x86@kernel.org" <x86@kernel.org> Subject: RE: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block Date: Thu, 17 May 2018 13:04:19 +0000 [thread overview] Message-ID: <CY4PR12MB1557AB31F5F9D1ECBDAC2DA6F8910@CY4PR12MB1557.namprd12.prod.outlook.com> (raw) In-Reply-To: <20180517104124.GA25595@pd.tnic> > -----Original Message----- > From: Borislav Petkov <bp@suse.de> > Sent: Thursday, May 17, 2018 6:41 AM > To: Johannes Hirte <johannes.hirte@datenkhaos.de> > Cc: Ghannam, Yazen <Yazen.Ghannam@amd.com>; linux- > edac@vger.kernel.org; linux-kernel@vger.kernel.org; tony.luck@intel.com; > x86@kernel.org > Subject: Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized > block > > On Thu, May 17, 2018 at 08:49:31AM +0200, Johannes Hirte wrote: > > Maybe I'm missing something, but those RDMSR IPSs don't happen on > > pre-SMCA systems, right? So the caching should be avoided here, cause > > the whole lookup looks more expensive to me than the simple switch-block > > in get_block_address. > > Yeah, and we should simply cache all those MSR values as I suggested then. > Yes, you're right. I thought using the existing data structures would work, but it seems I messed up the implementation. > The patch at the end should fix your issue. > > Yazen, so I'm working on the assumption that all addresses are the same > on every core - I dumped them and it looks like this: > > 128 bank: 00, block: 0 : 0xc0002003 > 128 bank: 00, block: 1 : 0x0 > 128 bank: 01, block: 0 : 0xc0002013 > 128 bank: 01, block: 1 : 0x0 > 128 bank: 02, block: 0 : 0xc0002023 > 128 bank: 02, block: 1 : 0x0 > 128 bank: 03, block: 0 : 0xc0002033 > 128 bank: 03, block: 1 : 0x0 > 128 bank: 04, block: 0 : 0x0 > 128 bank: 05, block: 0 : 0xc0002053 > 128 bank: 05, block: 1 : 0x0 > 128 bank: 06, block: 0 : 0xc0002063 > 128 bank: 06, block: 1 : 0x0 > 128 bank: 07, block: 0 : 0xc0002073 > 128 bank: 07, block: 1 : 0x0 > 128 bank: 08, block: 0 : 0xc0002083 > 128 bank: 08, block: 1 : 0x0 > 128 bank: 09, block: 0 : 0xc0002093 > 128 bank: 09, block: 1 : 0x0 > 128 bank: 10, block: 0 : 0xc00020a3 > 128 bank: 10, block: 1 : 0x0 > 128 bank: 11, block: 0 : 0xc00020b3 > 128 bank: 11, block: 1 : 0x0 > 128 bank: 12, block: 0 : 0xc00020c3 > 128 bank: 12, block: 1 : 0x0 > 128 bank: 13, block: 0 : 0xc00020d3 > 128 bank: 13, block: 1 : 0x0 > 128 bank: 14, block: 0 : 0xc00020e3 > 128 bank: 14, block: 1 : 0x0 > 128 bank: 15, block: 0 : 0xc00020f3 > 128 bank: 15, block: 1 : 0x0 > 128 bank: 16, block: 0 : 0xc0002103 > 128 bank: 16, block: 1 : 0x0 Banks 15 and 16 should have an address for block 1 also. Do you have PFEH enabled on your system? That would cause MISC0 to be RAZ so we won't get the MISC1 address. I'll try it myself also and let you know. > 128 bank: 17, block: 0 : 0xc0002113 > 128 bank: 17, block: 1 : 0x0 > 128 bank: 18, block: 0 : 0xc0002123 > 128 bank: 18, block: 1 : 0x0 > 128 bank: 19, block: 0 : 0xc0002133 > 128 bank: 19, block: 1 : 0x0 > 128 bank: 20, block: 0 : 0xc0002143 > 128 bank: 20, block: 1 : 0x0 > 128 bank: 21, block: 0 : 0xc0002153 > 128 bank: 21, block: 1 : 0x0 > 128 bank: 22, block: 0 : 0xc0002163 > 128 bank: 22, block: 1 : 0x0 > > so you have 128 times the same address on a 128 core Zen box. > > If that is correct, then we can use this nice simplification and cache > them globally and not per-CPU. Seems to work here. Scream if something's > amiss. > I think this good for now. We'll probably need to change it in the future, but maybe we can clean up all the thresholding blocks code and make it simpler when we do change it. > Thx. > > --- > From d8614e904d8f63b89d2d204d6fa247c4c416a1b6 Mon Sep 17 00:00:00 > 2001 > From: Borislav Petkov <bp@susede> > Date: Thu, 17 May 2018 10:46:26 +0200 > Subject: [PATCH] array caching > > Not-Signed-off-by: Borislav Petkov <bp@suse.de> > --- > arch/x86/kernel/cpu/mcheck/mce_amd.c | 29 ++++++++++++++-------------- > 1 file changed, 14 insertions(+), 15 deletions(-) > > diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c > b/arch/x86/kernel/cpu/mcheck/mce_amd.c > index f7666eef4a87..c8e038800591 100644 > --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c > +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c > @@ -94,6 +94,11 @@ static struct smca_bank_name smca_names[] = { > [SMCA_SMU] = { "smu", "System Management Unit" }, > }; > > +static u32 smca_bank_addrs[MAX_NR_BANKS][NR_BLOCKS] __ro_after_init = > +{ > + [0 ... MAX_NR_BANKS - 1] = { [0 ... NR_BLOCKS - 1] = -1 } > +}; > + > const char *smca_get_name(enum smca_bank_types t) > { > if (t >= N_SMCA_BANK_TYPES) > @@ -443,20 +448,26 @@ static u32 smca_get_block_address(unsigned int > cpu, unsigned int bank, > if (!block) > return MSR_AMD64_SMCA_MCx_MISC(bank); > > + /* Check our cache first: */ > + if (smca_bank_addrs[bank][block] != -1) > + return smca_bank_addrs[bank][block]; > + This hunk could go above the !block. Though maybe the macro is lighter than the array lookup. It'll work either way, but I'm just thinking out loud. > /* > * For SMCA enabled processors, BLKPTR field of the first MISC register > * (MCx_MISC0) indicates presence of additional MISC regs set (MISC1- > 4). > */ > if (rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_CONFIG(bank), > &low, &high)) > - return addr; > + goto out; > > if (!(low & MCI_CONFIG_MCAX)) > - return addr; > + goto out; > > if (!rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_MISC(bank), > &low, &high) && > (low & MASK_BLKPTR_LO)) > - return MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1); > + addr = MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1); > > +out: > + smca_bank_addrs[bank][block] = addr; > return addr; > } > Since we're caching the values during init, we can drop all the *_on_cpu() calls. What do you think? Thanks, Yazen
WARNING: multiple messages have this Message-ID (diff)
From: Yazen Ghannam <yazen.ghannam@amd.com> To: Borislav Petkov <bp@suse.de>, Johannes Hirte <johannes.hirte@datenkhaos.de> Cc: "linux-edac@vger.kernel.org" <linux-edac@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "tony.luck@intel.com" <tony.luck@intel.com>, "x86@kernel.org" <x86@kernel.org> Subject: [3/3] x86/MCE/AMD: Get address from already initialized block Date: Thu, 17 May 2018 13:04:19 +0000 [thread overview] Message-ID: <CY4PR12MB1557AB31F5F9D1ECBDAC2DA6F8910@CY4PR12MB1557.namprd12.prod.outlook.com> (raw) > -----Original Message----- > From: Borislav Petkov <bp@suse.de> > Sent: Thursday, May 17, 2018 6:41 AM > To: Johannes Hirte <johannes.hirte@datenkhaos.de> > Cc: Ghannam, Yazen <Yazen.Ghannam@amd.com>; linux- > edac@vger.kernel.org; linux-kernel@vger.kernel.org; tony.luck@intel.com; > x86@kernel.org > Subject: Re: [PATCH 3/3] x86/MCE/AMD: Get address from already initialized > block > > On Thu, May 17, 2018 at 08:49:31AM +0200, Johannes Hirte wrote: > > Maybe I'm missing something, but those RDMSR IPSs don't happen on > > pre-SMCA systems, right? So the caching should be avoided here, cause > > the whole lookup looks more expensive to me than the simple switch-block > > in get_block_address. > > Yeah, and we should simply cache all those MSR values as I suggested then. > Yes, you're right. I thought using the existing data structures would work, but it seems I messed up the implementation. > The patch at the end should fix your issue. > > Yazen, so I'm working on the assumption that all addresses are the same > on every core - I dumped them and it looks like this: > > 128 bank: 00, block: 0 : 0xc0002003 > 128 bank: 00, block: 1 : 0x0 > 128 bank: 01, block: 0 : 0xc0002013 > 128 bank: 01, block: 1 : 0x0 > 128 bank: 02, block: 0 : 0xc0002023 > 128 bank: 02, block: 1 : 0x0 > 128 bank: 03, block: 0 : 0xc0002033 > 128 bank: 03, block: 1 : 0x0 > 128 bank: 04, block: 0 : 0x0 > 128 bank: 05, block: 0 : 0xc0002053 > 128 bank: 05, block: 1 : 0x0 > 128 bank: 06, block: 0 : 0xc0002063 > 128 bank: 06, block: 1 : 0x0 > 128 bank: 07, block: 0 : 0xc0002073 > 128 bank: 07, block: 1 : 0x0 > 128 bank: 08, block: 0 : 0xc0002083 > 128 bank: 08, block: 1 : 0x0 > 128 bank: 09, block: 0 : 0xc0002093 > 128 bank: 09, block: 1 : 0x0 > 128 bank: 10, block: 0 : 0xc00020a3 > 128 bank: 10, block: 1 : 0x0 > 128 bank: 11, block: 0 : 0xc00020b3 > 128 bank: 11, block: 1 : 0x0 > 128 bank: 12, block: 0 : 0xc00020c3 > 128 bank: 12, block: 1 : 0x0 > 128 bank: 13, block: 0 : 0xc00020d3 > 128 bank: 13, block: 1 : 0x0 > 128 bank: 14, block: 0 : 0xc00020e3 > 128 bank: 14, block: 1 : 0x0 > 128 bank: 15, block: 0 : 0xc00020f3 > 128 bank: 15, block: 1 : 0x0 > 128 bank: 16, block: 0 : 0xc0002103 > 128 bank: 16, block: 1 : 0x0 Banks 15 and 16 should have an address for block 1 also. Do you have PFEH enabled on your system? That would cause MISC0 to be RAZ so we won't get the MISC1 address. I'll try it myself also and let you know. > 128 bank: 17, block: 0 : 0xc0002113 > 128 bank: 17, block: 1 : 0x0 > 128 bank: 18, block: 0 : 0xc0002123 > 128 bank: 18, block: 1 : 0x0 > 128 bank: 19, block: 0 : 0xc0002133 > 128 bank: 19, block: 1 : 0x0 > 128 bank: 20, block: 0 : 0xc0002143 > 128 bank: 20, block: 1 : 0x0 > 128 bank: 21, block: 0 : 0xc0002153 > 128 bank: 21, block: 1 : 0x0 > 128 bank: 22, block: 0 : 0xc0002163 > 128 bank: 22, block: 1 : 0x0 > > so you have 128 times the same address on a 128 core Zen box. > > If that is correct, then we can use this nice simplification and cache > them globally and not per-CPU. Seems to work here. Scream if something's > amiss. > I think this good for now. We'll probably need to change it in the future, but maybe we can clean up all the thresholding blocks code and make it simpler when we do change it. > Thx. > > --- > From d8614e904d8f63b89d2d204d6fa247c4c416a1b6 Mon Sep 17 00:00:00 > 2001 > From: Borislav Petkov <bp@susede> > Date: Thu, 17 May 2018 10:46:26 +0200 > Subject: [PATCH] array caching > > Not-Signed-off-by: Borislav Petkov <bp@suse.de> > --- > arch/x86/kernel/cpu/mcheck/mce_amd.c | 29 ++++++++++++++-------------- > 1 file changed, 14 insertions(+), 15 deletions(-) > > diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c > b/arch/x86/kernel/cpu/mcheck/mce_amd.c > index f7666eef4a87..c8e038800591 100644 > --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c > +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c > @@ -94,6 +94,11 @@ static struct smca_bank_name smca_names[] = { > [SMCA_SMU] = { "smu", "System Management Unit" }, > }; > > +static u32 smca_bank_addrs[MAX_NR_BANKS][NR_BLOCKS] __ro_after_init = > +{ > + [0 ... MAX_NR_BANKS - 1] = { [0 ... NR_BLOCKS - 1] = -1 } > +}; > + > const char *smca_get_name(enum smca_bank_types t) > { > if (t >= N_SMCA_BANK_TYPES) > @@ -443,20 +448,26 @@ static u32 smca_get_block_address(unsigned int > cpu, unsigned int bank, > if (!block) > return MSR_AMD64_SMCA_MCx_MISC(bank); > > + /* Check our cache first: */ > + if (smca_bank_addrs[bank][block] != -1) > + return smca_bank_addrs[bank][block]; > + This hunk could go above the !block. Though maybe the macro is lighter than the array lookup. It'll work either way, but I'm just thinking out loud. > /* > * For SMCA enabled processors, BLKPTR field of the first MISC register > * (MCx_MISC0) indicates presence of additional MISC regs set (MISC1- > 4). > */ > if (rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_CONFIG(bank), > &low, &high)) > - return addr; > + goto out; > > if (!(low & MCI_CONFIG_MCAX)) > - return addr; > + goto out; > > if (!rdmsr_safe_on_cpu(cpu, MSR_AMD64_SMCA_MCx_MISC(bank), > &low, &high) && > (low & MASK_BLKPTR_LO)) > - return MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1); > + addr = MSR_AMD64_SMCA_MCx_MISCy(bank, block - 1); > > +out: > + smca_bank_addrs[bank][block] = addr; > return addr; > } > Since we're caching the values during init, we can drop all the *_on_cpu() calls. What do you think? Thanks, Yazen
next prev parent reply other threads:[~2018-05-17 13:04 UTC|newest] Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-02-01 18:48 [PATCH 1/3] x86/MCE/AMD: Redo function to get SMCA bank type Yazen Ghannam 2018-02-01 18:48 ` [1/3] " Yazen Ghannam 2018-02-01 18:48 ` [PATCH 2/3] x86/MCE/AMD, EDAC/mce_amd: Enumerate Reserved " Yazen Ghannam 2018-02-01 18:48 ` [2/3] " Yazen Ghannam 2018-02-08 15:15 ` [PATCH 2/3] " Borislav Petkov 2018-02-08 15:15 ` [2/3] " Borislav Petkov 2018-02-14 16:28 ` [PATCH 2/3] " Ghannam, Yazen 2018-02-14 16:28 ` [2/3] " Yazen Ghannam 2018-02-01 18:48 ` [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block Yazen Ghannam 2018-02-01 18:48 ` [3/3] " Yazen Ghannam 2018-02-08 15:26 ` [PATCH 3/3] " Borislav Petkov 2018-02-08 15:26 ` [3/3] " Borislav Petkov 2018-04-14 0:42 ` [PATCH 3/3] " Johannes Hirte 2018-04-14 0:42 ` [3/3] " Johannes Hirte 2018-04-16 11:56 ` [PATCH 3/3] " Johannes Hirte 2018-04-16 11:56 ` [3/3] " Johannes Hirte 2018-04-17 13:31 ` [PATCH 3/3] " Ghannam, Yazen 2018-04-17 13:31 ` [3/3] " Yazen Ghannam 2018-05-15 9:39 ` [PATCH 3/3] " Johannes Hirte 2018-05-15 9:39 ` [3/3] " Johannes Hirte 2018-05-16 22:46 ` [PATCH 3/3] " Borislav Petkov 2018-05-16 22:46 ` [3/3] " Boris Petkov 2018-05-17 6:49 ` [PATCH 3/3] " Johannes Hirte 2018-05-17 6:49 ` [3/3] " Johannes Hirte 2018-05-17 10:41 ` [PATCH 3/3] " Borislav Petkov 2018-05-17 10:41 ` [3/3] " Boris Petkov 2018-05-17 13:04 ` Ghannam, Yazen [this message] 2018-05-17 13:04 ` Yazen Ghannam 2018-05-17 13:44 ` [PATCH 3/3] " Borislav Petkov 2018-05-17 13:44 ` [3/3] " Boris Petkov 2018-05-17 14:05 ` [PATCH 3/3] " Ghannam, Yazen 2018-05-17 14:05 ` [3/3] " Yazen Ghannam 2018-05-17 18:30 ` [PATCH 1/2] x86/MCE/AMD: Cache SMCA MISC block addresses Borislav Petkov 2018-05-17 18:30 ` [1/2] " Boris Petkov 2018-05-17 18:31 ` [PATCH 2/2] x86/MCE/AMD: Read MCx_MISC block addresses on any CPU Borislav Petkov 2018-05-17 18:31 ` [2/2] " Boris Petkov 2018-05-17 19:29 ` [PATCH 3/3] x86/MCE/AMD: Get address from already initialized block Johannes Hirte 2018-05-17 19:29 ` [3/3] " Johannes Hirte 2018-05-17 19:33 ` [PATCH 3/3] " Borislav Petkov 2018-05-17 19:33 ` [3/3] " Boris Petkov 2018-05-19 13:21 ` [tip:ras/urgent] x86/MCE/AMD: Cache SMCA MISC block addresses tip-bot for Borislav Petkov 2018-02-08 15:04 ` [PATCH 1/3] x86/MCE/AMD: Redo function to get SMCA bank type Borislav Petkov 2018-02-08 15:04 ` [1/3] " Borislav Petkov 2018-02-14 16:38 ` [PATCH 1/3] " Ghannam, Yazen 2018-02-14 16:38 ` [1/3] " Yazen Ghannam 2018-02-14 19:35 ` [PATCH 1/3] " Borislav Petkov 2018-02-14 19:35 ` [1/3] " Boris Petkov
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=CY4PR12MB1557AB31F5F9D1ECBDAC2DA6F8910@CY4PR12MB1557.namprd12.prod.outlook.com \ --to=yazen.ghannam@amd.com \ --cc=bp@suse.de \ --cc=johannes.hirte@datenkhaos.de \ --cc=linux-edac@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=tony.luck@intel.com \ --cc=x86@kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.