From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751775AbdF1SxI (ORCPT ); Wed, 28 Jun 2017 14:53:08 -0400
Received: from mail-oi0-f50.google.com ([209.85.218.50]:35673 "EHLO
	mail-oi0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751672AbdF1SxE (ORCPT ); Wed, 28 Jun 2017 14:53:04 -0400
MIME-Version: 1.0
In-Reply-To:
References: <20170628000630.1973-1-jack@codezen.org>
	<20170628092219.4df52dhwe7q3iao5@pd.tnic>
From: Jack Miller
Date: Wed, 28 Jun 2017 13:53:03 -0500
X-Google-Sender-Auth: AsqSzevMUGg136rkavqAjlCvrxA
Message-ID:
Subject: Re: [PATCH] x86/mce/AMD: Fix partial SMCA bank init when CPU 0 != thread 0
To: "Ghannam, Yazen"
Cc: Jack Miller, Borislav Petkov, "linux-kernel@vger.kernel.org",
	"tglx@linutronix.de", "x86@kernel.org"
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Jun 28, 2017 at 1:00 PM, Ghannam, Yazen wrote:
>> -----Original Message-----
>> From: themoken@gmail.com [mailto:themoken@gmail.com] On Behalf Of
>> Jack Miller
>> Sent: Wednesday, June 28, 2017 1:44 PM
>> To: Borislav Petkov
>> Cc: Jack Miller; linux-kernel@vger.kernel.org;
>> tglx@linutronix.de; Ghannam, Yazen;
>> x86@kernel.org
>> Subject: Re: [PATCH] x86/mce/AMD: Fix partial SMCA bank init when CPU 0 !=
>> thread 0
>>
>> On Wed, Jun 28, 2017 at 4:22 AM, Borislav Petkov wrote:
>> > On Tue, Jun 27, 2017 at 07:06:30PM -0500, Jack Miller wrote:
>> >> After a call to firmware SwitchBSP(),
>> >
>> > What is that and who does that?
>>
>> SwitchBSP() is part of the UEFI MPServices Protocol which I believe is an
>> extension but it is supported by all of the firmwares I've tested on.
>>
>> In this case, I'm using a bootloader to SwitchBSP() so that hardware thread 0
>> (and thus core 0) can be offlined on AMD hardware (cpu0_hotplug
>> unsupported). This is currently working by passing 'nomce' to the kernel, but
>> obviously I'd prefer not to disable it.
>>
>
> Which core are you using as the BSP with SwitchBSP()?

Core 4, hardware thread 8 overall. I am testing on a Ryzen 7 machine.

>
>> >
>> >> Linux can be booted with a thread
>> >> that isn't the first in the system. That thread automatically becomes
>> >> CPU 0.
>> >
>> > Btw, you should be seeing other explosions too as a lot of code
>> > assumes CPU 0 is the BSP.
>>
>> Actually, with 'nomce' or this patch applied the system seems to chug along
>> merrily, no further errors in dmesg, no further BUGs. Linux still gets all of the
>> topology correct (i.e. CPU 0's core/thread/siblings are correctly identified) so
>> really, aside from userspace programs doing naive stuff with CPU affinity (like
>> expecting even,odd CPUs to be SMT pairs), I think the overall result here is
>> that most threads are interchangeable... except when probing certain
>> features like these MCA types.
>>
>
> Do you see 23 banks named in the new BSP's /sys/devices/system/machinecheck/
> folder? You should see non-core banks like l3_cache, umc, etc.

With my patch applied, I see entries like l3_cache under hardware
thread 0's directory (it's shifted to CPU 1, so machinecheck1).
Without my patch, only machinecheck0 has anything interesting in it
(insn_fetch, l2_cache etc.) because the init failed on CPU 1.

Jack
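
For anyone wanting to repeat the comparison above, here is a minimal
sketch (not part of the patch) that lists the named bank directories
under each machinecheckN directory. The function name list_mce_banks
is hypothetical, and skipping the "power" directory and any symlinks
(such as "subsystem") is an assumption about which sysfs entries are
not banks; adjust it if your kernel lays things out differently.

```python
import os
import re

# Entries assumed NOT to be MCA banks (generic sysfs device entries).
NON_BANK_DIRS = {"power"}


def list_mce_banks(root="/sys/devices/system/machinecheck"):
    """Return {"machinecheckN": [bank names]} for every CPU found under root."""
    found = []
    if os.path.isdir(root):
        for entry in os.listdir(root):
            match = re.fullmatch(r"machinecheck(\d+)", entry)
            if match:
                found.append((int(match.group(1)), entry))

    result = {}
    # Sort numerically so machinecheck2 comes before machinecheck10.
    for _, entry in sorted(found):
        cpu_dir = os.path.join(root, entry)
        banks = []
        for name in sorted(os.listdir(cpu_dir)):
            path = os.path.join(cpu_dir, name)
            # Keep real subdirectories only; skip files, symlinks and "power".
            if name in NON_BANK_DIRS or os.path.islink(path):
                continue
            if os.path.isdir(path):
                banks.append(name)
        result[entry] = banks
    return result


if __name__ == "__main__":
    for cpu, banks in list_mce_banks().items():
        print(cpu + ":", ", ".join(banks) if banks else "(no bank directories)")
```

Running it once without the patch and once with it should make the
difference described above visible at a glance: which machinecheckN
directories carry entries such as l3_cache and which carry none.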