From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1947649AbdDYTQe (ORCPT ); Tue, 25 Apr 2017 15:16:34 -0400
Received: from mail-bn3nam01on0075.outbound.protection.outlook.com ([104.47.33.75]:50880 "EHLO NAM01-BN3-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1947498AbdDYTQY (ORCPT ); Tue, 25 Apr 2017 15:16:24 -0400
From: Yazen Ghannam
CC: Yazen Ghannam, Tony Luck, Borislav Petkov
Subject: [PATCH v3 1/2] x86/mce/AMD: Redo logging of errors from APIC LVT interrupts
Date: Tue, 25 Apr 2017 14:16:11 -0500
Message-ID: <1493147772-2721-1-git-send-email-Yazen.Ghannam@amd.com>
X-Mailer: git-send-email 2.7.4
MIME-Version: 1.0
Content-Type: text/plain
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

From: Yazen Ghannam

We have support for the new SMCA MCA_DE{STAT,ADDR} registers in Linux.
So we've used these registers in place of MCA_{STATUS,ADDR} on SMCA
systems.

However, the guidance for current implementations of SMCA is to continue
using MCA_{STATUS,ADDR} and to use MCA_DE{STAT,ADDR} only if a Deferred
error was not found in the former registers. If we logged a Deferred
error in MCA_STATUS then we should also clear MCA_DESTAT. This also means
we shouldn't clear MCA_CONFIG[LogDeferredInMcaStat].

Don't clear MCA_CONFIG[LogDeferredInMcaStat] during AMD mcheck init.

Don't break after finding the first error in either the Deferred or
Thresholding interrupt handlers.

Rework __log_error() to only log error. Most valid checks are moved into
__log_error_* helper functions.

Write the Deferred error __log_error_* helper function to follow the
guidance for current SMCA systems.

Signed-off-by: Yazen Ghannam
---
Link: https://lkml.kernel.org/r/1491326672-48298-1-git-send-email-Yazen.Ghannam@amd.com

v2->v3:
- Write __log_error_*() helper functions for Deferred and Thresholding
  errors.
- Redo __log_error() to only log the error and do valid checks in the
  helpers.
- Log every valid error in the DE handler rather than just Deferred
  errors.
- Don't break after the first error found in either interrupt handler.

v1->v2:
- Change name of check_deferred_status() to is_deferred_error().
- Rework __log_error() to move out SMCA/Deferred error-specific code.

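For reviewers, the three Deferred error scenarios described above can be
modelled in user space roughly as follows. This is only a sketch and not
part of the patch: struct fake_bank and model_log_deferred() are made-up
names, the registers are plain variables rather than MSRs, and the two
bit positions are assumed to match MCI_STATUS_VAL (bit 63) and
MCI_STATUS_DEFERRED (bit 44) in the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the kernel's MCI_STATUS_VAL / MCI_STATUS_DEFERRED bits. */
#define MODEL_STATUS_VAL	(1ULL << 63)
#define MODEL_STATUS_DEFERRED	(1ULL << 44)

/* Hypothetical stand-in for one bank's registers; not a kernel structure. */
struct fake_bank {
	uint64_t status;	/* models MCA_STATUS */
	uint64_t destat;	/* models MCA_DESTAT */
};

/*
 * Models the per-bank flow of the Deferred error helper.
 * Returns how many records would have been logged.
 */
int model_log_deferred(struct fake_bank *b, bool smca)
{
	bool logged_deferred = false;
	int logged = 0;

	assert(b != NULL);

	/* Scenarios 1 and 2: log any valid error found in MCA_STATUS. */
	if (b->status & MODEL_STATUS_VAL) {
		logged++;
		logged_deferred = !!(b->status & MODEL_STATUS_DEFERRED);
		b->status = 0;
	}

	/* Scenario 1: non-SMCA systems have no MCA_DESTAT to look at. */
	if (!smca)
		return logged;

	/* Scenario 2: the Deferred error was already logged, so only clear. */
	if (logged_deferred) {
		b->destat = 0;
		return logged;
	}

	/* Scenario 3: nothing Deferred in MCA_STATUS, fall back to MCA_DESTAT. */
	if (b->destat & MODEL_STATUS_VAL) {
		logged++;
		b->destat = 0;
	}

	return logged;
}
```

In this model, an SMCA bank with the same valid Deferred error in both
registers yields one record and leaves both registers cleared, while a
valid non-Deferred error in MCA_STATUS plus a valid MCA_DESTAT yields two
records.
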
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 137 +++++++++++++++++++----------------
 1 file changed, 75 insertions(+), 62 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 6e4a047..e6e507d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -472,20 +472,6 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
 		smca_high |= BIT(0);
 
 		/*
-		 * SMCA logs Deferred Error information in MCA_DE{STAT,ADDR}
-		 * registers with the option of additionally logging to
-		 * MCA_{STATUS,ADDR} if MCA_CONFIG[LogDeferredInMcaStat] is set.
-		 *
-		 * This bit is usually set by BIOS to retain the old behavior
-		 * for OSes that don't use the new registers. Linux supports the
-		 * new registers so let's disable that additional logging here.
-		 *
-		 * MCA_CONFIG[LogDeferredInMcaStat] is bit 34 (bit 2 in the high
-		 * portion of the MSR).
-		 */
-		smca_high &= ~BIT(2);
-
-		/*
 		 * SMCA sets the Deferred Error Interrupt type per bank.
 		 *
 		 * MCA_CONFIG[DeferredIntTypeSupported] is bit 5, and tells us
@@ -756,36 +742,19 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr)
 EXPORT_SYMBOL_GPL(umc_normaddr_to_sysaddr);
 
 static void
-__log_error(unsigned int bank, bool deferred_err, bool threshold_err, u64 misc)
+__log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
 {
-	u32 msr_status = msr_ops.status(bank);
-	u32 msr_addr = msr_ops.addr(bank);
 	struct mce m;
-	u64 status;
-
-	WARN_ON_ONCE(deferred_err && threshold_err);
-
-	if (deferred_err && mce_flags.smca) {
-		msr_status = MSR_AMD64_SMCA_MCx_DESTAT(bank);
-		msr_addr = MSR_AMD64_SMCA_MCx_DEADDR(bank);
-	}
-
-	rdmsrl(msr_status, status);
-
-	if (!(status & MCI_STATUS_VAL))
-		return;
 
 	mce_setup(&m);
 
 	m.status = status;
+	m.misc = misc;
 	m.bank = bank;
 	m.tsc = rdtsc();
 
-	if (threshold_err)
-		m.misc = misc;
-
 	if (m.status & MCI_STATUS_ADDRV) {
-		rdmsrl(msr_addr, m.addr);
+		m.addr = addr;
 
 		/*
 		 * Extract [55:<lsb>] where lsb is the least significant
@@ -806,8 +775,6 @@ __log_error(unsigned int bank, bool deferred_err, bool threshold_err, u64 misc)
 	}
 
 	mce_log(&m);
-
-	wrmsrl(msr_status, 0);
 }
 
 static inline void __smp_deferred_error_interrupt(void)
@@ -832,33 +799,83 @@ asmlinkage __visible void __irq_entry smp_trace_deferred_error_interrupt(void)
 	exiting_ack_irq();
 }
 
+/*
+ * We have three scenarios for checking for Deferred errors.
+ * 1) Non-SMCA systems check MCA_STATUS and log error if found.
+ * 2) SMCA systems check MCA_STATUS. If error is found then log it and also
+ *    clear MCA_DESTAT.
+ * 3) SMCA systems check MCA_DESTAT, if error was not found in MCA_STATUS, and
+ *    log it.
+ */
+static void
+__log_error_deferred(unsigned int bank)
+{
+	u64 status, addr;
+	bool logged_deferred = false;
+
+	rdmsrl(msr_ops.status(bank), status);
+
+	/* Log any valid error we find. */
+	if (status & MCI_STATUS_VAL) {
+		rdmsrl(msr_ops.addr(bank), addr);
+
+		__log_error(bank, status, addr, 0);
+
+		wrmsrl(msr_ops.status(bank), 0);
+
+		logged_deferred = !!(status & MCI_STATUS_DEFERRED);
+	}
+
+	if (!mce_flags.smca)
+		return;
+
+	/* Clear MCA_DESTAT if we logged the deferred error from MCA_STATUS. */
+	if (logged_deferred) {
+		wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0);
+		return;
+	}
+
+	rdmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), status);
+
+	/*
+	 * Only deferred errors are logged in MCA_DE{STAT,ADDR} so just check
+	 * for a valid error.
+	 */
+	if (status & MCI_STATUS_VAL) {
+		rdmsrl(MSR_AMD64_SMCA_MCx_DEADDR(bank), addr);
+
+		__log_error(bank, status, addr, 0);
+
+		wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0);
+	}
+}
+
 /* APIC interrupt handler for deferred errors */
 static void amd_deferred_error_interrupt(void)
 {
 	unsigned int bank;
-	u32 msr_status;
-	u64 status;
 
-	for (bank = 0; bank < mca_cfg.banks; ++bank) {
-		msr_status = (mce_flags.smca) ? MSR_AMD64_SMCA_MCx_DESTAT(bank)
-					      : msr_ops.status(bank);
+	for (bank = 0; bank < mca_cfg.banks; ++bank)
+		__log_error_deferred(bank);
+}
+
+static void __log_error_thresholding(unsigned int bank, u64 misc)
+{
+	u64 status, addr;
 
-		rdmsrl(msr_status, status);
+	rdmsrl(msr_ops.status(bank), status);
 
-		if (!(status & MCI_STATUS_VAL) ||
-		    !(status & MCI_STATUS_DEFERRED))
-			continue;
+	if (status & MCI_STATUS_VAL) {
+		rdmsrl(msr_ops.addr(bank), addr);
 
-		__log_error(bank, true, false, 0);
-		break;
+		__log_error(bank, status, addr, misc);
+
+		wrmsrl(msr_ops.status(bank), 0);
 	}
 }
 
 /*
  * APIC Interrupt Handler
- */
-
-/*
  * threshold interrupt handler will service THRESHOLD_APIC_VECTOR.
  * the interrupt goes off when error_count reaches threshold_limit.
  * the handler will simply log mcelog w/ software defined bank number.
@@ -870,7 +887,6 @@ static void amd_threshold_interrupt(void)
 	unsigned int bank, block, cpu = smp_processor_id();
 	struct thresh_restart tr;
 
-	/* assume first bank caused it */
 	for (bank = 0; bank < mca_cfg.banks; ++bank) {
 		if (!(per_cpu(bank_map, cpu) & (1 << bank)))
 			continue;
@@ -897,19 +913,16 @@ static void amd_threshold_interrupt(void)
 			 * Log the machine check that caused the threshold
 			 * event.
 			 */
-			if (high & MASK_OVERFLOW_HI)
-				goto log;
+			if (high & MASK_OVERFLOW_HI) {
+				__log_error_thresholding(bank, ((u64)high << 32) | low);
+
+				/* Reset threshold block after logging error. */
+				memset(&tr, 0, sizeof(tr));
+				tr.b = &per_cpu(threshold_banks, cpu)[bank]->blocks[block];
+				threshold_restart_bank(&tr);
+			}
 		}
 	}
-	return;
-
-log:
-	__log_error(bank, false, true, ((u64)high << 32) | low);
-
-	/* Reset threshold block after logging error. */
-	memset(&tr, 0, sizeof(tr));
-	tr.b = &per_cpu(threshold_banks, cpu)[bank]->blocks[block];
-	threshold_restart_bank(&tr);
 }
 
 /*
-- 
2.7.4