Subject: Re: [PATCH 4/6] x86-mce: Add spinlocks to prevent duplicated MCP and CMCI reports.
From: Tony Luck
To: Borislav Petkov
Cc: Havard Skinnemoen, Linux Kernel, Ewout van Bekkum
Date: Fri, 18 Jul 2014 14:23:04 -0700

On Thu, Jul 17, 2014 at 3:50 AM, Borislav Petkov wrote:
> Well, maybe it is about time we tracked shared banks.

For CPUs that support CMCI and the MCi_CTL2 registers, we do track sharing.
Only one CPU gets to be the "owner" of a bank that supports CMCI (the first
one to find it and set bit 30 in the CTL2 register). The test_bit() at the
top of the loop in machine_check_poll() makes sure only the owner of a bank
actually looks at it:

	for (i = 0; i < mca_cfg.banks; i++) {
		if (!mce_banks[i].ctl || !test_bit(i, *b))
			continue;

If we don't have CMCI, then we don't have the CTL2 registers, and so we have
no way to find out which banks are shared.

> We can evaluate later if the IRQs disabling is too heavy after all.

I'd be surprised if it was a problem in practice. If we have CMCI, then we
limit the banks that we look at (and if we see a high rate of interrupts,
then we turn off interrupts and poll).
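A rough user-space model of the bank-ownership rule described above may help. It is a simplified simulation, not kernel code: ctl2[], bank_has_cmci[], struct cpu_banks, write_ctl2(), discover() and poll_banks() are hypothetical names, and the "registers" are plain variables shared by simulated CPUs rather than real MSRs. The first CPU whose write of bit 30 sticks owns the bank; a bank already owned elsewhere drops out of this CPU's poll set; a bank without CMCI stays in every CPU's poll set, because there is no CTL2 to reveal sharing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NBANKS        4
#define CTL2_CMCI_EN  (1ULL << 30)  /* bit 30 of MCi_CTL2 */

/* Hypothetical model state: one simulated MCi_CTL2 per bank, shared by
 * all simulated CPUs, plus which banks can latch the enable bit. */
static uint64_t ctl2[NBANKS];
static const bool bank_has_cmci[NBANKS] = { true, true, false, true };

/* Simulated register write: bit 30 only "sticks" on CMCI-capable banks. */
static void write_ctl2(int bank, uint64_t val)
{
    assert(bank >= 0 && bank < NBANKS);
    ctl2[bank] = bank_has_cmci[bank] ? val : (val & ~CTL2_CMCI_EN);
}

struct cpu_banks {
    uint32_t owned;  /* banks this CPU services from its CMCI handler */
    uint32_t poll;   /* banks this CPU still checks from its poll timer */
};

/* Claim banks for one CPU: the first CPU to set bit 30 owns the bank. */
static void discover(struct cpu_banks *c)
{
    c->owned = 0;
    c->poll = (1u << NBANKS) - 1;            /* start by polling every bank */
    for (int i = 0; i < NBANKS; i++) {
        if (ctl2[i] & CTL2_CMCI_EN) {        /* already owned by another CPU */
            c->poll &= ~(1u << i);
            continue;
        }
        write_ctl2(i, ctl2[i] | CTL2_CMCI_EN);
        if (ctl2[i] & CTL2_CMCI_EN) {        /* bit stuck: this CPU owns it */
            c->owned |= 1u << i;
            c->poll &= ~(1u << i);
        }
        /* else: no CMCI, so no sharing info; the bank stays in c->poll */
    }
}

/* Like the quoted loop: skip any bank whose bit is clear in bitmap b. */
static int poll_banks(uint32_t b)
{
    int looked = 0;
    for (int i = 0; i < NBANKS; i++) {
        if (!(b & (1u << i)))
            continue;
        looked++;  /* a real poller would read MCi_STATUS here */
    }
    return looked;
}
```

Calling discover() for two simulated CPUs in turn leaves the first one owning banks 0, 1 and 3 (owned == 0xb), while both keep bank 2 in their poll sets. If that non-CMCI bank happens to be shared, both CPUs will look at it.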
If we don't have CMCI, then we are polling at a pretty low rate (the current
code raises the rate if we are finding errors to log, but we don't let that
rate rise forever; the cap is ~1HZ).

-Tony
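The adaptive polling rate mentioned in the last paragraph can be sketched as an interval that halves while errors are being logged but never drops below a floor, which caps the rate. This is an illustration under assumptions: next_interval(), MIN_INTERVAL_MS and MAX_INTERVAL_MS are hypothetical; the one-second floor is one reading of "~1HZ", and both the 5-minute quiet-time interval and the doubling back toward it when no errors are seen are assumptions of this sketch, not details given in the email.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical parameters, in milliseconds. MIN_INTERVAL_MS caps the
 * polling rate at about once per second; MAX_INTERVAL_MS is an assumed
 * quiet-time interval of 5 minutes. */
#define MIN_INTERVAL_MS  1000UL
#define MAX_INTERVAL_MS  (5UL * 60 * 1000)

/* Compute the next polling interval from the current one. */
static unsigned long next_interval(unsigned long iv, bool logged_errors)
{
    assert(iv >= MIN_INTERVAL_MS && iv <= MAX_INTERVAL_MS);
    if (logged_errors) {
        iv /= 2;                      /* poll faster while errors show up */
        if (iv < MIN_INTERVAL_MS)
            iv = MIN_INTERVAL_MS;     /* ...but never faster than the cap */
    } else {
        iv *= 2;                      /* assumed: back off when quiet */
        if (iv > MAX_INTERVAL_MS)
            iv = MAX_INTERVAL_MS;
    }
    return iv;
}
```

Starting from the quiet interval, a steady stream of logged errors drives the interval down to MIN_INTERVAL_MS and holds it there, so the polling cost stays bounded no matter how many errors arrive.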