From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751786AbdF1SRO (ORCPT ); Wed, 28 Jun 2017 14:17:14 -0400
Received: from mx2.suse.de ([195.135.220.15]:57449 "EHLO mx1.suse.de"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S1751545AbdF1SRG (ORCPT ); Wed, 28 Jun 2017 14:17:06 -0400
Date: Wed, 28 Jun 2017 20:16:34 +0200
From: Borislav Petkov
To: Jack Miller , Yazen.Ghannam@amd.com
Cc: linux-kernel@vger.kernel.org, tglx@linutronix.de, x86@kernel.org
Subject: Re: [PATCH] x86/mce/AMD: Fix partial SMCA bank init when CPU 0 != thread 0
Message-ID: <20170628181634.mmot6fdd3lmsrkog@pd.tnic>
References: <20170628000630.1973-1-jack@codezen.org> <20170628092219.4df52dhwe7q3iao5@pd.tnic>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To:
User-Agent: NeoMutt/20170113 (1.7.2)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jun 28, 2017 at 12:44:17PM -0500, Jack Miller wrote:
> SwitchBSP() is part of the UEFI MPServices Protocol which I believe is
> an extension but it is supported by all of the firmwares I've tested
> on.

Damn, that ubiquitous firmware. One day the kernel will be just a
userspace process to the fw.

> In this case, I'm using a bootloader to SwitchBSP() so that hardware
> thread 0 (and thus core 0) can be offlined on AMD hardware
> (cpu0_hotplug unsupported).

Why unsupported?

I remember doing some quick experiments with booting with
"cpu0_hotplug" and being able to offline the BSP. It was a long time
ago though.

> This is currently working by passing 'nomce' to the kernel, but
> obviously I'd prefer not to disable it.

Right, nomce is not an optimal setting.

> Actually, with 'nomce' or this patch applied the system seems to chug
> along merrily, no further errors in dmesg, no further BUGs. Linux
> still gets all of the topology correct (i.e. CPU 0's
> core/thread/siblings are correctly identified) so really, aside from
> userspace programs doing naive stuff with CPU affinity (like expecting
> even,odd CPUs to be SMT pairs), I think the overall result here is
> that most threads are interchangeable... except when probing certain
> features like these MCA types.

May I ask what your goal is? Or is it sekrit stuff? physical hotplug
maybe?

> Unfortunately, it doesn't. That value is explicitly set to 0.

Yeah, I see it in smp_store_boot_cpu_info(). So if we had to be really
correct, that code there should set the *actual* CPU index of the BSP
and not simply write a 0. It's that BSP index == 0 assumption I've been
talking about.

> Most mechanisms operate around CPU #, which isn't very helpful if the
> BSP was changed under the covers.
>
> Alternatively, we could possibly sidestep the APIC ID uncertainty by
> patching get_smca_bank_info() to fallback on reading the bank
> hwid_mcatype from other online CPUs (it's already using
> rdmsr_safe_on_cpu) if its own hwid_mcatype isn't valid/recognized, but
> that's a more invasive patch.

Yeah, I think there is some distinction whether you read the MSRs on
the BSP and on the other threads. Yazen did that in

  5896820e0aa3 ("x86/mce/AMD, EDAC/mce_amd: Define and use tables for known SMCA IP types")

Yazen, why CPU 0? Can we get rid of that check there?

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 