From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Ghannam, Yazen"
To: Jack Miller
CC: Borislav Petkov, "linux-kernel@vger.kernel.org", "tglx@linutronix.de", "x86@kernel.org"
Subject: RE: [PATCH] x86/mce/AMD: Fix partial SMCA bank init when CPU 0 != thread 0
Date: Wed, 28 Jun 2017 18:58:36 +0000
References: <20170628000630.1973-1-jack@codezen.org> <20170628092219.4df52dhwe7q3iao5@pd.tnic>
Content-Type: text/plain; charset="utf-8"
X-Mailing-List: linux-kernel@vger.kernel.org

> -----Original Message-----
> From: themoken@gmail.com [mailto:themoken@gmail.com] On Behalf Of
> Jack Miller
> Sent: Wednesday, June 28, 2017 2:53 PM
> To: Ghannam, Yazen
> Cc: Jack Miller; Borislav Petkov; linux-kernel@vger.kernel.org;
> tglx@linutronix.de; x86@kernel.org
> Subject: Re: [PATCH] x86/mce/AMD: Fix partial SMCA bank init when CPU 0 !=
> thread 0
>
> On Wed, Jun 28, 2017 at 1:00 PM, Ghannam, Yazen wrote:
> >> -----Original Message-----
> >> From: themoken@gmail.com [mailto:themoken@gmail.com] On Behalf Of
> >> Jack Miller
> >> Sent: Wednesday, June 28, 2017 1:44 PM
> >> To: Borislav Petkov
> >> Cc: Jack Miller; linux-kernel@vger.kernel.org;
> >> tglx@linutronix.de; Ghannam, Yazen;
> >> x86@kernel.org
> >> Subject: Re: [PATCH] x86/mce/AMD: Fix partial SMCA bank init when CPU
> >> 0 != thread 0
> >>
> >> On Wed, Jun 28, 2017 at 4:22 AM, Borislav Petkov wrote:
> >> > On Tue, Jun 27, 2017 at 07:06:30PM -0500, Jack Miller wrote:
> >> >> After a call to firmware SwitchBSP(),
> >> >
> >> > What is that and who does that?
> >>
> >> SwitchBSP() is part of the UEFI MPServices Protocol which I believe
> >> is an extension but it is supported by all of the firmwares I've tested on.
> >>
> >> In this case, I'm using a bootloader to SwitchBSP() so that hardware
> >> thread 0 (and thus core 0) can be offlined on AMD hardware
> >> (cpu0_hotplug unsupported).
> >> This is currently working by passing
> >> 'nomce' to the kernel, but obviously I'd prefer not to disable it.
> >>
> >
> > Which core are you using as the BSP with SwitchBSP()?
>
> Core 4, hardware thread 8 overall. I am testing on a Ryzen 7 machine.
>
> >
> >> >
> >> >> Linux can be booted with a thread
> >> >> that isn't the first in the system. That thread automatically
> >> >> becomes CPU 0.
> >> >
> >> > Btw, you should be seeing other explosions too as a lot of code
> >> > assumes CPU 0 is the BSP.
> >>
> >> Actually, with 'nomce' or this patch applied the system seems to chug
> >> along merrily, no further errors in dmesg, no further BUGs. Linux
> >> still gets all of the topology correct (i.e. CPU 0's
> >> core/thread/siblings are correctly identified) so really, aside from
> >> userspace programs doing naive stuff with CPU affinity (like
> >> expecting even,odd CPUs to be SMT pairs), I think the overall result
> >> here is that most threads are interchangeable... except when probing
> >> certain features like these MCA types.
> >>
> >
> > Do you see 23 banks named in the new BSP's
> > /sys/devices/system/machinecheck/ folder? You should see non-core banks
> > like l3_cache, umc, etc.
>
> With my patch applied, I see entries like l3_cache under hardware thread 0's
> directory (it's shifted to CPU 1, so machinecheck1).
> Without my patch, only machinecheck0 has anything interesting in it
> (insn_fetch, l2_cache etc.) because the init failed on CPU 1.
>

What happens with SMT off?

Thanks,
Yazen