linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Ghannam, Yazen" <Yazen.Ghannam@amd.com>
To: Borislav Petkov <bp@suse.de>, Jack Miller <jack@codezen.org>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"x86@kernel.org" <x86@kernel.org>
Subject: RE: [PATCH] x86/mce/AMD: Fix partial SMCA bank init when CPU 0 != thread 0
Date: Wed, 28 Jun 2017 18:51:08 +0000	[thread overview]
Message-ID: <BN6PR1201MB01310932CF91128D01FDB56EF8DD0@BN6PR1201MB0131.namprd12.prod.outlook.com> (raw)
In-Reply-To: <20170628181634.mmot6fdd3lmsrkog@pd.tnic>

> -----Original Message-----
> From: Borislav Petkov [mailto:bp@suse.de]
> Sent: Wednesday, June 28, 2017 2:17 PM
> To: Jack Miller <jack@codezen.org>; Ghannam, Yazen
> <Yazen.Ghannam@amd.com>
> Cc: linux-kernel@vger.kernel.org; tglx@linutronix.de; x86@kernel.org
> Subject: Re: [PATCH] x86/mce/AMD: Fix partial SMCA bank init when CPU 0 !=
> thread 0
> 
> On Wed, Jun 28, 2017 at 12:44:17PM -0500, Jack Miller wrote:
> > SwitchBSP() is part of the UEFI MPServices Protocol which I believe is
> > an extension but it is supported by all of the firmwares I've tested
> > on.
> 
> Damn, that ubiquitous firmware. One day the kernel will be just a userspace
> process to the fw.
> 
> > In this case, I'm using a bootloader to SwitchBSP() so that hardware
> > thread 0 (and thus core 0) can be offlined on AMD hardware
> > (cpu0_hotplug unsupported).
> 
> Why unsupported?
> 
> I remember doing some quick experiments with booting with "cpu0_hotplug"
> and being able to offline the BSP. It was a long time ago though.
> 
> > This is currently working by passing 'nomce' to the kernel, but
> > obviously I'd prefer not to disable it.
> 
> Right, nomce is not an optimal setting.
> 
> > Actually, with 'nomce' or this patch applied the system seems to chug
> > along merrily, no further errors in dmesg, no further BUGs. Linux
> > still gets all of the topology correct (i.e. CPU 0's
> > core/thread/siblings are correctly identified) so really, aside from
> > userspace programs doing naive stuff with CPU affinity (like expecting
> > even,odd CPUs to be SMT pairs), I think the overall result here is
> > that most threads are interchangeable... except when probing certain
> > features like these MCA types.
> 
> May I ask what your goal is? Or is it sekrit stuff? physical hotplug maybe?
> 
> > Unfortunately, it doesn't. That value is explicitly set to 0.
> 
> Yeah, I see it in smp_store_boot_cpu_info().
> 
> So if we had to be really correct, that code there should set the
> *actual* CPU index of the BSP and not simply write a 0. It's that
> 
> 	BSP index == 0
> 
> assumption I've been talking about.
> 
> > Most mechanisms operate around CPU #, which isn't very helpful if the
> > BSP was changed under the covers.
> >
> > Alternatively, we could possibly sidestep the APIC ID uncertainty by
> > patching get_smca_bank_info() to fallback on reading the bank
> > hwid_mcatype from other online CPUs (it's already using
> > rdmsr_safe_on_cpu) if its own hwid_mcatype isn't valid/recognized, but
> > that's a more invasive patch.
> 
> Yeah, I think there is some distinction whether you read the MSRs on the BSP
> and on the other threads. Yazen did that in
> 
>   5896820e0aa3 ("x86/mce/AMD, EDAC/mce_amd: Define and use tables for
> known SMCA IP types")
> 
> Yazen, why CPU 0? Can we get rid of that check there?
> 

The non-core MCA banks are only visible to a "master thread" on each Die. The
master thread is the first one on the Die. Since we have the same banks on each
Die we only need to read them once, and I assumed that CPU0 would always be
there.

One thing that I'm concerned about here is if the BSP is changed from CPU0, will
the non-core banks on Die0 be configured? I don't think they will be if they're
not visible to the new BSP. Unless FW does something to make the new BSP the
master thread.

We can get rid of the check, but the SMCA banks array is shared by the entire
system. So we'll need a per_cpu sysfs_id for each bank or we'll end up with
"unfriendly" names. I'll try to find a fix for this.

Please see here:
0b737a9c2af8 x86/ras/amd: Make sysfs names of banks more user-friendly

Thanks,
Yazen

  reply	other threads:[~2017-06-28 18:51 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-28  0:06 [PATCH] x86/mce/AMD: Fix partial SMCA bank init when CPU 0 != thread 0 Jack Miller
2017-06-28  9:22 ` Borislav Petkov
2017-06-28 17:44   ` Jack Miller
2017-06-28 18:00     ` Ghannam, Yazen
2017-06-28 18:53       ` Jack Miller
2017-06-28 18:58         ` Ghannam, Yazen
2017-06-29 16:22           ` Jack Miller
2017-06-29 17:58             ` Ghannam, Yazen
2017-06-28 18:16     ` Borislav Petkov
2017-06-28 18:51       ` Ghannam, Yazen [this message]
2017-06-28 18:55         ` Borislav Petkov
2017-06-29 18:08 ` [PATCH] x86/mce/AMD: Allow any CPU to initialize smca_banks array Yazen Ghannam
2017-06-30 15:57   ` Jack Miller
2017-07-17  5:19   ` Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BN6PR1201MB01310932CF91128D01FDB56EF8DD0@BN6PR1201MB0131.namprd12.prod.outlook.com \
    --to=yazen.ghannam@amd.com \
    --cc=bp@suse.de \
    --cc=jack@codezen.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).