From: Jeff Moyer <jmoyer@redhat.com>
To: Vishal Verma <vishal.l.verma@intel.com>
Cc: linux-edac@vger.kernel.org, Tony Luck <tony.luck@intel.com>,
Borislav Petkov <bp@alien8.de>,
stable@vger.kernel.org, linux-nvdimm@lists.01.org
Subject: Re: [PATCH v3 1/2] nfit, mce: only handle uncorrectable machine checks
Date: Wed, 20 Feb 2019 13:59:15 -0500
Message-ID: <x49lg2a2sn0.fsf@segfault.boston.devel.redhat.com>
In-Reply-To: <20181026003729.8420-1-vishal.l.verma@intel.com> (Vishal Verma's message of "Thu, 25 Oct 2018 18:37:28 -0600")

Hi, Vishal,

Vishal Verma <vishal.l.verma@intel.com> writes:

> The mce handler for 'nfit' devices is called for memory errors on a
> Non-Volatile DIMM, and adds the error location to a 'badblocks' list.
> This list is used by the various NVDIMM drivers to avoid consuming known
> poison locations during IO.
>
> The mce handler gets called for both corrected and uncorrectable errors.

Sorry for necroposting. I thought the point of the CEC was to make sure
that the other registered decoders only ever saw uncorrected errors.
How do we end up getting called with a correctable error?

Thanks,
Jeff

> Until now, both kinds of errors have been added to the badblocks list.
> However, corrected memory errors indicate that the problem has already
> been fixed by hardware, and the resulting interrupt is merely a
> notification to Linux. As far as future accesses to that location are
> concerned, it is perfectly fine to use, and thus doesn't need to be
> included in the above badblocks list.
>
> Add a check in the nfit mce handler to filter out corrected mce events,
> and only process uncorrectable errors.
>
> Reported-by: Omar Avelar <omar.avelar@intel.com>
> Fixes: 6839a6d96f4e ("nfit: do an ARS scrub on hitting a latent media error")
> Cc: stable@vger.kernel.org
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Tony Luck <tony.luck@intel.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> ---
>  arch/x86/include/asm/mce.h       | 1 +
>  arch/x86/kernel/cpu/mcheck/mce.c | 3 ++-
>  drivers/acpi/nfit/mce.c          | 4 ++--
>  3 files changed, 5 insertions(+), 3 deletions(-)
>
> v3: Unchanged from v2
>
> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
> index 3a17107594c8..3111b3cee2ee 100644
> --- a/arch/x86/include/asm/mce.h
> +++ b/arch/x86/include/asm/mce.h
> @@ -216,6 +216,7 @@ static inline int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *s
>
> int mce_available(struct cpuinfo_x86 *c);
> bool mce_is_memory_error(struct mce *m);
> +bool mce_is_correctable(struct mce *m);
>
> DECLARE_PER_CPU(unsigned, mce_exception_count);
> DECLARE_PER_CPU(unsigned, mce_poll_count);
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 953b3ce92dcc..27015948bc41 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -534,7 +534,7 @@ bool mce_is_memory_error(struct mce *m)
>  }
>  EXPORT_SYMBOL_GPL(mce_is_memory_error);
>
> -static bool mce_is_correctable(struct mce *m)
> +bool mce_is_correctable(struct mce *m)
>  {
>  	if (m->cpuvendor == X86_VENDOR_AMD && m->status & MCI_STATUS_DEFERRED)
>  		return false;
> @@ -544,6 +544,7 @@ static bool mce_is_correctable(struct mce *m)
>
>  	return true;
>  }
> +EXPORT_SYMBOL_GPL(mce_is_correctable);
>
>  static bool cec_add_mce(struct mce *m)
>  {
> diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
> index e9626bf6ca29..7a51707f87e9 100644
> --- a/drivers/acpi/nfit/mce.c
> +++ b/drivers/acpi/nfit/mce.c
> @@ -25,8 +25,8 @@ static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
>  	struct acpi_nfit_desc *acpi_desc;
>  	struct nfit_spa *nfit_spa;
>
> -	/* We only care about memory errors */
> -	if (!mce_is_memory_error(mce))
> +	/* We only care about uncorrectable memory errors */
> +	if (!mce_is_memory_error(mce) || mce_is_correctable(mce))
>  		return NOTIFY_DONE;
>
>  	/*
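
For illustration only, here is a rough user-space sketch of the filter
the quoted hunk adds. It is not the kernel code: struct mce_model and
the model_*() names are hypothetical, the is_memory_error flag stands in
for mce_is_memory_error(), and the MCI_STATUS_UC test is an assumption
about the lines the hunk context elides. The bit and vendor values are
what I believe arch/x86/include/asm/ defines, so treat them as
assumptions too.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the kernel's definitions; verify against your tree. */
#define MCI_STATUS_UC       (1ULL << 61) /* error was not corrected */
#define MCI_STATUS_DEFERRED (1ULL << 44) /* AMD: uncorrected, handling deferred */
#define X86_VENDOR_AMD      2

/* Hypothetical stand-in for struct mce, reduced to what the check needs. */
struct mce_model {
	uint64_t status;
	uint8_t cpuvendor;
	bool is_memory_error; /* stands in for mce_is_memory_error() */
};

/* Mirrors the shape of mce_is_correctable() as seen in the quoted diff. */
static bool model_is_correctable(const struct mce_model *m)
{
	if (m->cpuvendor == X86_VENDOR_AMD && (m->status & MCI_STATUS_DEFERRED))
		return false;

	/* Assumption: this is the check hidden between the two hunks. */
	if (m->status & MCI_STATUS_UC)
		return false;

	return true;
}

/*
 * True when the handler would continue past the early
 * "return NOTIFY_DONE" and consider the address for badblocks.
 */
static bool model_nfit_should_handle(const struct mce_model *m)
{
	return m->is_memory_error && !model_is_correctable(m);
}

/* Small self-check; call it from main() to exercise the three cases. */
static void model_selfcheck(void)
{
	struct mce_model corrected = { 0, 0, true };
	struct mce_model uncorrected = { MCI_STATUS_UC, 0, true };
	struct mce_model not_memory = { MCI_STATUS_UC, 0, false };

	assert(!model_nfit_should_handle(&corrected));
	assert(model_nfit_should_handle(&uncorrected));
	assert(!model_nfit_should_handle(&not_memory));
}
```

With the old check, the "corrected" case above would also have been
passed on to the badblocks handling, which is what the patch
description says it wants to stop.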
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm