From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1422876AbcFGSLR (ORCPT ); Tue, 7 Jun 2016 14:11:17 -0400 Received: from mga02.intel.com ([134.134.136.20]:33798 "EHLO mga02.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1161552AbcFGSLO (ORCPT ); Tue, 7 Jun 2016 14:11:14 -0400 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.26,434,1459839600"; d="scan'208";a="823504958" Date: Tue, 7 Jun 2016 11:11:09 -0700 From: "Luck, Tony" To: Borislav Petkov Cc: linux-edac , Yazen Ghannam , X86 ML , LKML Subject: Re: [RFC PATCH 1/4] RAS: Add a Corrected Errors Collector Message-ID: <20160607181109.GA23770@intel.com> References: <1465318345-22043-1-git-send-email-bp@alien8.de> <1465318345-22043-2-git-send-email-bp@alien8.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1465318345-22043-2-git-send-email-bp@alien8.de> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Jun 07, 2016 at 06:52:22PM +0200, Borislav Petkov wrote: > +void mce_log(struct mce *m) > { > unsigned next, entry; > > + if (!in_atomic() && memory_error(m) && mce_usable_address(m)) > + if (!ce_add_elem(m->addr >> PAGE_SHIFT)) > + return; > + > /* Emit the trace record: */ > - trace_mce_record(mce); > + trace_mce_record(m); > > - if (!mce_gen_pool_add(mce)) > + if (!mce_gen_pool_add(m)) > irq_work_queue(&mce_irq_work); Is there a reason that we need to call the ce_add_elem() inline here instead of having it just register on the mce_notifier chain? This series just cleaned out all the /dev/mcelog special code from here, and you are adding something back before the ink is dry on that change. I'm also strongly divided about whether this corrected error handler should be allowed to preempt anything else even seeing the error. Argument for: Lonely corrected errors are "No Big Deal"(TM). Just counting them and moving on is a good thing. Arguments against: 1) We may miss out on a one-time opportunity to get extra information (from acpi_extlog.c). 2) I think this subverts our CMCI storm detection and mitigation code? We could make the chain more caller friendly by adding a filter argument so users could say "just tell me about memory errors" (currently each of the EDAC drivers has inline code to do the same as "memory_error(m) && mce_usable_address(m)") -Tony