* [PATCH] acpi, nfit: fix the memory error check in nfit_handle_mce
@ 2017-04-20 22:18 ` Vishal Verma
  0 siblings, 0 replies; 66+ messages in thread
From: Vishal Verma @ 2017-04-20 22:18 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: linux-acpi, Tony Luck, stable

The check for an MCE being a memory error in the NFIT mce handler was
bogus. Fix it to check for the correct MCA status compound error code.

Reported-by: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 drivers/acpi/nfit/mce.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
index 3ba1c34..23e12a0 100644
--- a/drivers/acpi/nfit/mce.c
+++ b/drivers/acpi/nfit/mce.c
@@ -26,7 +26,7 @@ static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
 	struct nfit_spa *nfit_spa;
 
 	/* We only care about memory errors */
-	if (!(mce->status & MCACOD))
+	if (!(mce->status & 0xef80) == BIT(7))
 		return NOTIFY_DONE;
 
 	/*
-- 
2.9.3

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH] acpi, nfit: fix the memory error check in nfit_handle_mce
  2017-04-20 22:18 ` Vishal Verma
@ 2017-04-20 22:21   ` Verma, Vishal L
  -1 siblings, 0 replies; 66+ messages in thread
From: Verma, Vishal L @ 2017-04-20 22:21 UTC (permalink / raw)
  To: linux-nvdimm; +Cc: linux-acpi, Luck, Tony, stable

On Thu, 2017-04-20 at 16:18 -0600, Vishal Verma wrote:
> The check for an MCE being a memory error in the NFIT mce handler was
> bogus. Fix it to check for the correct MCA status compound error code.
> 
> Reported-by: Tony Luck <tony.luck@intel.com>
> Cc: <stable@vger.kernel.org>

Forgot to include,
Fixes: 6839a6d96f4e ("nfit: do an ARS scrub on hitting a latent media error")

> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> ---
>  drivers/acpi/nfit/mce.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
> index 3ba1c34..23e12a0 100644
> --- a/drivers/acpi/nfit/mce.c
> +++ b/drivers/acpi/nfit/mce.c
> @@ -26,7 +26,7 @@ static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
>  	struct nfit_spa *nfit_spa;
>  
>  	/* We only care about memory errors */
> -	if (!(mce->status & MCACOD))
> +	if (!(mce->status & 0xef80) == BIT(7))
>  		return NOTIFY_DONE;
>  
>  	/*

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] acpi, nfit: fix the memory error check in nfit_handle_mce
@ 2017-04-21  2:21   ` kbuild test robot
  0 siblings, 0 replies; 66+ messages in thread
From: kbuild test robot @ 2017-04-21  2:21 UTC (permalink / raw)
  To: Vishal Verma; +Cc: Tony Luck, linux-nvdimm, stable, linux-acpi, kbuild-all

Hi Vishal,

[auto build test WARNING on pm/linux-next]
[also build test WARNING on v4.11-rc7 next-20170420]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Vishal-Verma/acpi-nfit-fix-the-memory-error-check-in-nfit_handle_mce/20170421-084359
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: x86_64-randconfig-x005-201716 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   drivers/acpi/nfit/mce.c: In function 'nfit_handle_mce':
>> drivers/acpi/nfit/mce.c:29:30: warning: comparison of constant '128ul' with boolean expression is always false [-Wbool-compare]
     if (!(mce->status & 0xef80) == BIT(7))
                                 ^~
>> drivers/acpi/nfit/mce.c:29:30: warning: logical not is only applied to the left hand side of comparison [-Wlogical-not-parentheses]

vim +/128ul +29 drivers/acpi/nfit/mce.c

    13	 * General Public License for more details.
    14	 */
    15	#include <linux/notifier.h>
    16	#include <linux/acpi.h>
    17	#include <linux/nd.h>
    18	#include <asm/mce.h>
    19	#include "nfit.h"
    20	
    21	static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
    22				void *data)
    23	{
    24		struct mce *mce = (struct mce *)data;
    25		struct acpi_nfit_desc *acpi_desc;
    26		struct nfit_spa *nfit_spa;
    27	
    28		/* We only care about memory errors */
  > 29		if (!(mce->status & 0xef80) == BIT(7))
    30			return NOTIFY_DONE;
    31	
    32		/*
    33		 * mce->addr contains the physical addr accessed that caused the
    34		 * machine check. We need to walk through the list of NFITs, and see
    35		 * if any of them matches that address, and only then start a scrub.
    36		 */
    37		mutex_lock(&acpi_desc_lock);
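
Both warnings point at the same precedence problem: in C, unary `!` binds tighter than `==`, so the patched line compares a 0-or-1 value against `BIT(7)` (128). That comparison can never be true, so the handler never bails out early. A minimal standalone sketch, assuming a local stand-in for the kernel's `BIT()` macro and using hypothetical function names, that contrasts how the compiler reads the patched line with the parenthesized form suggested later in the thread:

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT() macro. */
#define BIT(nr) (1UL << (nr))

/*
 * How the compiler parses the patched line
 *     if (!(mce->status & 0xef80) == BIT(7))
 * '!' is applied first, yielding 0 or 1, and only then compared with 128.
 * The intermediate variable makes that parse explicit (and avoids the
 * warnings kbuild reported).
 */
static bool patched_check(uint64_t status)
{
	unsigned long not_masked = !(status & 0xef80);	/* always 0 or 1 */

	return not_masked == BIT(7);			/* so always false */
}

/*
 * Parenthesized form: mask, compare with BIT(7), then negate.
 * Returns true ("not a memory error, bail out") unless bit 7 is the
 * most significant set bit of the error code, ignoring bit 12.
 */
static bool fixed_check(uint64_t status)
{
	return !((status & 0xef80) == BIT(7));
}
```

With a memory-controller code such as 0x009f both functions return false, but with 0x0e0b (bit 11 set) only fixed_check() returns true, so the patched line lets non-memory errors through to the NFIT scrub logic.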

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] acpi, nfit: fix the memory error check in nfit_handle_mce
  2017-04-20 22:18 ` Vishal Verma
@ 2017-04-21 19:21   ` Dan Williams
  -1 siblings, 0 replies; 66+ messages in thread
From: Dan Williams @ 2017-04-21 19:21 UTC (permalink / raw)
  To: Vishal Verma; +Cc: Linux ACPI, Tony Luck, stable, linux-nvdimm

On Thu, Apr 20, 2017 at 3:18 PM, Vishal Verma <vishal.l.verma@intel.com> wrote:
> The check for an MCE being a memory error in the NFIT mce handler was
> bogus. Fix it to check for the correct MCA status compound error code.
>
> Reported-by: Tony Luck <tony.luck@intel.com>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> ---
>  drivers/acpi/nfit/mce.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
> index 3ba1c34..23e12a0 100644
> --- a/drivers/acpi/nfit/mce.c
> +++ b/drivers/acpi/nfit/mce.c
> @@ -26,7 +26,7 @@ static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
>         struct nfit_spa *nfit_spa;
>
>         /* We only care about memory errors */
> -       if (!(mce->status & MCACOD))
> +       if (!(mce->status & 0xef80) == BIT(7))

Can we get a define for this, or a comment explaining all the magic
that's happening on that one line?
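
One way to answer the request for a define is a small set of named constants plus an inline helper. This is a sketch only: the macro and function names below are hypothetical (not existing kernel symbols), `BIT()` is a local stand-in, and the decoding applies to Intel compound error codes as explained elsewhere in this thread. The mask is derived rather than hard-coded, so the 0xef80 value documents itself:

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT() macro. */
#define BIT(nr) (1UL << (nr))

/* Hypothetical names; bits 15:0 of the bank status are the MCA error code. */
#define MCACOD_FILTER_BIT	BIT(12)	/* correction report filtering: ignore */
#define MCACOD_MEM_ERR		BIT(7)	/* top set bit for a memory error */
/* Bits 15:7 of the error code, minus the filter bit. */
#define MCACOD_MEM_ERR_MASK	(0xff80UL & ~MCACOD_FILTER_BIT)

_Static_assert(MCACOD_MEM_ERR_MASK == 0xef80UL, "mask should equal 0xef80");

/* True when bit 7 is the most significant set bit of the error code. */
static inline bool mcacod_is_mem_err(uint64_t status)
{
	return (status & MCACOD_MEM_ERR_MASK) == MCACOD_MEM_ERR;
}
```

The handler test would then read `if (!mcacod_is_mem_err(mce->status)) return NOTIFY_DONE;`, which also sidesteps the precedence trap in the original line.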

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] acpi, nfit: fix the memory error check in nfit_handle_mce
@ 2017-04-21 19:56     ` Verma, Vishal L
  0 siblings, 0 replies; 66+ messages in thread
From: Verma, Vishal L @ 2017-04-21 19:56 UTC (permalink / raw)
  To: Williams, Dan J; +Cc: linux-acpi, Luck, Tony, stable, linux-nvdimm

On Fri, 2017-04-21 at 12:21 -0700, Dan Williams wrote:
> On Thu, Apr 20, 2017 at 3:18 PM, Vishal Verma <vishal.l.verma@intel.com> wrote:
> > The check for an MCE being a memory error in the NFIT mce handler was
> > bogus. Fix it to check for the correct MCA status compound error code.
> > 
> > Reported-by: Tony Luck <tony.luck@intel.com>
> > Cc: <stable@vger.kernel.org>
> > Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> > ---
> >  drivers/acpi/nfit/mce.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
> > index 3ba1c34..23e12a0 100644
> > --- a/drivers/acpi/nfit/mce.c
> > +++ b/drivers/acpi/nfit/mce.c
> > @@ -26,7 +26,7 @@ static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
> >         struct nfit_spa *nfit_spa;
> > 
> >         /* We only care about memory errors */
> > -       if (!(mce->status & MCACOD))
> > +       if (!(mce->status & 0xef80) == BIT(7))
> 
> Can we get a define for this, or a comment explaining all the magic
> that's happening on that one line?

Yes - also like lkp pointed out, the check isn't correct at all. Let me
figure out what really needs to be done, and I will resend with a better
comment. 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* RE: [PATCH] acpi, nfit: fix the memory error check in nfit_handle_mce
@ 2017-04-21 20:16       ` Luck, Tony
  0 siblings, 0 replies; 66+ messages in thread
From: Luck, Tony @ 2017-04-21 20:16 UTC (permalink / raw)
  To: Verma, Vishal L, Williams, Dan J; +Cc: linux-acpi, stable, linux-nvdimm

>> > +       if (!(mce->status & 0xef80) == BIT(7))
>> 
>> Can we get a define for this, or a comment explaining all the magic
>> that's happening on that one line?
>
> Yes - also like lkp pointed out, the check isn't correct at all. Let me
> figure out what really needs to be done, and I will resend with a better
> comment. 

Needs extra parentheses to make it right. Vishal, sorry I led you astray.

	if (!((mce->status & 0xef80) == BIT(7)))

The magic is shown in table 15-9 of the Intel Software Developer's Manual
(but perhaps not well explained there).

mce->status in the above code is a value plucked from a machine check
bank status register. See figure 15-6 in the SDM.  The important bits for this
are {15:0}, which are the "MCA Error code".  Table 15-9 shows how these
are grouped into types, where the type is defined by the most significant '1'
bit in the field (excluding bit 12, which is the Correction Report Filtering bit;
see section 15.9.2.1).

So if BIT(3) is the most significant bit, then this is a "Generic Cache Hierarchy"
error, BIT(4) denotes a TLB error, BIT(7) a Memory error, and so on.
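
Read literally, the rule is "find the most significant set bit of bits 15:0, skipping bit 12". A minimal sketch of that rule in C, with hypothetical names and covering only the three classes named above (everything else falls into one bucket). It also shows why the single mask test works: bit 7 is the top bit exactly when `(status & 0xef80) == 0x80`.

```c
#include <stdint.h>

/* Only the classes named in the message; all others share one bucket. */
enum mcacod_class {
	MCACOD_OTHER,
	MCACOD_GENERIC_CACHE,	/* top bit is bit 3 */
	MCACOD_TLB,		/* top bit is bit 4 */
	MCACOD_MEMORY,		/* top bit is bit 7 */
};

/* Index of the most significant set bit in bits 15:0, ignoring bit 12; -1 if none. */
static int mcacod_top_bit(uint64_t status)
{
	unsigned int code = (unsigned int)(status & 0xefff);	/* drop bit 12 and bits above 15 */

	for (int bit = 15; bit >= 0; bit--)
		if (code & (1u << bit))
			return bit;
	return -1;
}

static enum mcacod_class mcacod_classify(uint64_t status)
{
	switch (mcacod_top_bit(status)) {
	case 3:
		return MCACOD_GENERIC_CACHE;
	case 4:
		return MCACOD_TLB;
	case 7:
		return MCACOD_MEMORY;
	default:
		return MCACOD_OTHER;
	}
}
```

For example, 0x000c classifies as generic cache hierarchy, 0x0011 as TLB, and 0x009f as memory; per the message below, this decoding is Intel-specific.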

Maybe we should have defines in mce.h for them?  It gets a bit more complicated
as all the above only applies to Intel branded X86 CPUs ... on AMD different
decoding rules apply.

-Tony

[-- Attachment #2: compounderrorcodes.png --]
[-- Type: image/png, Size: 23390 bytes --]

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] acpi, nfit: fix the memory error check in nfit_handle_mce
  2017-04-21 20:16       ` Luck, Tony
@ 2017-04-21 20:19         ` Dan Williams
  -1 siblings, 0 replies; 66+ messages in thread
From: Dan Williams @ 2017-04-21 20:19 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-acpi, stable, linux-nvdimm

On Fri, Apr 21, 2017 at 1:16 PM, Luck, Tony <tony.luck@intel.com> wrote:
>>> > +       if (!(mce->status & 0xef80) == BIT(7))
>>>
>>> Can we get a define for this, or a comment explaining all the magic
>>> that's happening on that one line?
>>
>> Yes - also like lkp pointed out, the check isn't correct at all. Let me
>> figure out what really needs to be done, and I will resend with a better
>> comment.
>
> Needs extra parentheses to make it right. Vishal, sorry I led you astray.
>
>         if (!((mce->status & 0xef80) == BIT(7)))
>
> The magic is shown in table 15-9 of the Intel Software Developers Manual
> (but perhaps not well explained there).
>
> mce->status in the above code is a value plucked from a machine check
> bank status register. See figure 15-6 in the SDM.  The important bits for this
> are {15:0} which are the "MCA Error code".  Table 15-9 shows how these
> are grouped into types, where the type is defined by the most significant '1'
> bit in the field (excluding bit 12 which is the Correction Report Filtering bit,
> see section 15.9.2.1).
>
> So if BIT(3) is the most significant bit, then this is a "Generic Cache Hierarchy"
> error, BIT(4) denotes a TLB error, BIT(7) a Memory error, and so on.

Ah, ok.

> Maybe we should have defines in mce.h for them?  It gets a bit more complicated
> as all the above only applies to Intel branded X86 CPUs ... on AMD different
> decoding rules apply.

Yeah, this code is x86_64 generic so should call into helpers that do
the right thing per cpu type.
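
A per-vendor helper along those lines might be shaped like the sketch below. Everything here is hypothetical: the vendor enum stands in for whatever CPU identification the kernel provides, and because the AMD decoding rules are not given in this thread, the sketch reports "not a memory error" for non-Intel CPUs rather than guessing at them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical vendor tag; stands in for the kernel's CPU identification. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

/* Intel compound error code check discussed in this thread. */
static bool intel_mce_is_memory_error(uint64_t status)
{
	return (status & 0xef80) == (1UL << 7);
}

/*
 * Per-vendor dispatch. AMD uses different decoding rules that are not
 * spelled out here, so this sketch answers "not a memory error" for
 * every non-Intel vendor.
 */
static bool mce_status_is_memory_error(enum cpu_vendor vendor, uint64_t status)
{
	switch (vendor) {
	case VENDOR_INTEL:
		return intel_mce_is_memory_error(status);
	case VENDOR_AMD:	/* placeholder: needs AMD-specific decoding */
	default:
		return false;
	}
}
```

Returning false for unknown vendors makes the NFIT handler skip the event (NOTIFY_DONE), which is the conservative choice for a sketch; a real helper would need proper AMD decoding.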

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] acpi, nfit: fix the memory error check in nfit_handle_mce
  2017-04-21 20:19         ` Dan Williams
@ 2017-04-21 20:27           ` Luck, Tony
  -1 siblings, 0 replies; 66+ messages in thread
From: Luck, Tony @ 2017-04-21 20:27 UTC (permalink / raw)
  To: Dan Williams, Borislav Petkov; +Cc: linux-acpi, stable, linux-nvdimm

On Fri, Apr 21, 2017 at 01:19:16PM -0700, Dan Williams wrote:
> On Fri, Apr 21, 2017 at 1:16 PM, Luck, Tony <tony.luck@intel.com> wrote:
> >>> > +       if (!(mce->status & 0xef80) == BIT(7))
> >>>
> >>> Can we get a define for this, or a comment explaining all the magic
> >>> that's happening on that one line?
> >>
> >> Yes - also like lkp pointed out, the check isn't correct at all. Let me
> >> figure out what really needs to be done, and I will resend with a better
> >> comment.
> >
> > Needs extra parentheses to make it right. Vishal, sorry I led you astray.
> >
> >         if (!((mce->status & 0xef80) == BIT(7)))
> >
> > The magic is shown in table 15-9 of the Intel Software Developers Manual
> > (but perhaps not well explained there).
> >
> > mce->status in the above code is a value plucked from a machine check
> > bank status register. See figure 15-6 in the SDM.  The important bits for this
> > are {15:0} which are the "MCA Error code".  Table 15-9 shows how these
> > are grouped into types, where the type is defined by the most significant '1'
> > bit in the field (excluding bit 12 which is the Correction Report Filtering bit,
> > see section 15.9.2.1).
> >
> > So if BIT(3) is the most significant bit, then this is a "Generic Cache Hierarchy"
> > error, BIT(4) denotes a TLB error, BIT(7) a Memory error, and so on.
> 
> Ah, ok.
> 
> > Maybe we should have defines in mce.h for them?  It gets a bit more complicated
> > as all the above only applies to Intel branded X86 CPUs ... on AMD different
> > decoding rules apply.
> 
> Yeah, this code is x86_64 generic so should call into helpers that do
> the right thing per cpu type.

Boris: you coded up a "static bool memory_error(struct mce *m)"
function inside the patches for the corrected error thingy.

Perhaps when it goes upstream it should be available for other
users too?

-Tony

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] acpi, nfit: fix the memory error check in nfit_handle_mce
  2017-04-21 20:16       ` Luck, Tony
  (?)
@ 2017-04-21 20:35         ` Vishal Verma
  -1 siblings, 0 replies; 66+ messages in thread
From: Vishal Verma @ 2017-04-21 20:35 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-acpi, stable, linux-nvdimm

On 04/21, Luck, Tony wrote:
> >> > +       if (!(mce->status & 0xef80) == BIT(7))
> >> 
> >> Can we get a define for this, or a comment explaining all the magic
> >> that's happening on that one line?
> >
> > Yes - also like lkp pointed out, the check isn't correct at all. Let me
> > figure out what really needs to be done, and I will resend with a better
> > comment. 
> 
> Needs extra parentheses to make it right. Vishal, sorry I led you astray.
> 
> 	if (!((mce->status & 0xef80) == BIT(7)))

Is this still right though? Anything AND'ed with 0xef80 will never equal
BIT(7) which is simply 01000000 binary (the lowest byte of the left hand
side is '0')

> 
> The magic is shown in table 15-9 of the Intel Software Developers Manual
> (but perhaps not well explained there).
> 
> mce->status in the above code is a value plucked from a machine check
> bank status register. See figure 15-6 in the SDM.  The important bits for this
> are {15:0} which are the "MCA Error code".  Table 15-9 shows how these
> are grouped into types, where the type is defined by the most significant '1'
> bit in the field (excluding bit 12 which is the Correction Report Filtering bit,
> see section 15.9.2.1).
> 
> So if BIT(3) is the most significant bit, then this is a "Generic Cache Hierarchy"
> error, BIT(4) denotes a TLB error, BIT(7) a Memory error, and so on.
> 
> Maybe we should have defines in mce.h for them?  It gets a bit more complicated
> as all the above only applies to Intel branded X86 CPUs ... on AMD different
> decoding rules apply.
> 
> -Tony
> 
> 


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] acpi, nfit: fix the memory error check in nfit_handle_mce
@ 2017-04-21 20:50           ` Luck, Tony
  0 siblings, 0 replies; 66+ messages in thread
From: Luck, Tony @ 2017-04-21 20:50 UTC (permalink / raw)
  To: Vishal Verma; +Cc: linux-acpi, stable, linux-nvdimm

On Fri, Apr 21, 2017 at 02:35:51PM -0600, Vishal Verma wrote:
> On 04/21, Luck, Tony wrote:
> > Needs extra parentheses to make it right. Vishal, sorry I led you astray.
> > 
> > 	if (!((mce->status & 0xef80) == BIT(7)))
> 
> Is this still right though? Anything AND'ed with 0xef80 will never equal
> BIT(7) which is simply 01000000 binary (the lowest byte of the left hand
> side is '0')

I think so ... here it is in binary

ef80 = 1110 1111 1000 0000
BIT7 = 0000 0000 1000 0000

so the "&" will zap bits {6:0} and bit {12}  [and everything not part
of the MCACOD field].

If mce->status had some bit above BIT(7) set, it won't be zapped, so we
won't match the exact value BIT(7).

-Tony

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] acpi, nfit: fix the memory error check in nfit_handle_mce
  2017-04-21 20:50           ` Luck, Tony
@ 2017-04-21 20:54             ` Vishal Verma
  -1 siblings, 0 replies; 66+ messages in thread
From: Vishal Verma @ 2017-04-21 20:54 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-acpi, stable, linux-nvdimm

On 04/21, Luck, Tony wrote:
> On Fri, Apr 21, 2017 at 02:35:51PM -0600, Vishal Verma wrote:
> > On 04/21, Luck, Tony wrote:
> > > Needs extra parentheses to make it right. Vishal, sorry I led you astray.
> > > 
> > > 	if (!((mce->status & 0xef80) == BIT(7)))
> > 
> > Is this still right though? Anything AND'ed with 0xef80 will never equal
> > BIT(7) which is simply 01000000 binary (the lowest byte of the left hand
> > side is '0')
> 
> I think so ... here it is in binary
> 
> ef80 = 1110 1111 1000 0000
> BIT7 = 0000 0000 1000 0000
> 
> so the "&" will zap bits {6:0} and bit {12}  [and everything not part
> of the MCACOD field].
> 
> If mce->status had some bit above BIT(7) set, it won't be zapped, so we
> won't match the exact value BIT(7).

Ah, you're right, I was off by one, taking BIT(7) to mean 0100 0000

> 
> -Tony

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH] acpi, nfit: fix the memory error check in nfit_handle_mce
@ 2017-04-21 21:07             ` Borislav Petkov
  0 siblings, 0 replies; 66+ messages in thread
From: Borislav Petkov @ 2017-04-21 21:07 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-acpi, stable, linux-nvdimm

On Fri, Apr 21, 2017 at 01:27:41PM -0700, Luck, Tony wrote:
> Boris: you coded up a "static bool memory_error(struct mce *m)"
> function inside the patches for the corrected error thingy.
> 
> Perhaps when it goes upstream it should be available for other
> users too?

I don't see why not. struct mce.cpuvendor even has the vendor in there
so memory_error() wouldn't even have to look at boot_cpu_data when doing
per-vendor decision.

I guess we should rename it to something more global namespace-y like
"mce_is_memory_error()" or so, though, before we expose it to a wider
audience...

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* [PATCH 1/2] x86/MCE: Export memory_error()
  2017-04-21 21:07             ` Borislav Petkov
@ 2017-04-24 11:36               ` Borislav Petkov
  -1 siblings, 0 replies; 66+ messages in thread
From: Borislav Petkov @ 2017-04-24 11:36 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-acpi, stable, linux-nvdimm

From: Borislav Petkov <bp@suse.de>
Date: Mon, 24 Apr 2017 13:16:50 +0200
Subject: [PATCH 1/2] x86/MCE: Export memory_error()

Export the function which checks whether an MCE is a memory error to
other users so that we can reuse the logic. Drop the boot_cpu_data use,
while at it, as mce.cpuvendor already has the CPU vendor in there.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/include/asm/mce.h       |  1 +
 arch/x86/kernel/cpu/mcheck/mce.c | 12 +++++-------
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 4fd5195deed0..3f9a3d2a5209 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -266,6 +266,7 @@ static inline int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *s
 #endif
 
 int mce_available(struct cpuinfo_x86 *c);
+bool mce_is_memory_error(struct mce *m);
 
 DECLARE_PER_CPU(unsigned, mce_exception_count);
 DECLARE_PER_CPU(unsigned, mce_poll_count);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 4a29f7481761..5e79dd211d35 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -510,16 +510,14 @@ static int mce_usable_address(struct mce *m)
 	return 1;
 }
 
-static bool memory_error(struct mce *m)
+bool mce_is_memory_error(struct mce *m)
 {
-	struct cpuinfo_x86 *c = &boot_cpu_data;
-
-	if (c->x86_vendor == X86_VENDOR_AMD) {
+	if (m->cpuvendor == X86_VENDOR_AMD) {
 		/* ErrCodeExt[20:16] */
 		u8 xec = (m->status >> 16) & 0x1f;
 
 		return (xec == 0x0 || xec == 0x8);
-	} else if (c->x86_vendor == X86_VENDOR_INTEL) {
+	} else if (m->cpuvendor == X86_VENDOR_INTEL) {
 		/*
 		 * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
 		 *
@@ -547,7 +545,7 @@ static bool cec_add_mce(struct mce *m)
 		return false;
 
 	/* We eat only correctable DRAM errors with usable addresses. */
-	if (memory_error(m) &&
+	if (mce_is_memory_error(m) &&
 	    !(m->status & MCI_STATUS_UC) &&
 	    mce_usable_address(m))
 		if (!cec_add_elem(m->addr >> PAGE_SHIFT))
@@ -724,7 +722,7 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 
 		severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
 
-		if (severity == MCE_DEFERRED_SEVERITY && memory_error(&m))
+		if (severity == MCE_DEFERRED_SEVERITY && mce_is_memory_error(&m))
 			if (m.status & MCI_STATUS_ADDRV)
 				m.severity = severity;
 
-- 
2.11.0

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH 2/2] x86/ras/mce_amd_inj: Preset MCE injection struct
  2017-04-21 21:07             ` Borislav Petkov
@ 2017-04-24 11:37               ` Borislav Petkov
  -1 siblings, 0 replies; 66+ messages in thread
From: Borislav Petkov @ 2017-04-24 11:37 UTC (permalink / raw)
  To: Luck, Tony; +Cc: linux-acpi, stable, linux-nvdimm

From: Borislav Petkov <bp@suse.de>
Date: Mon, 24 Apr 2017 13:19:34 +0200
Subject: [PATCH 2/2] x86/ras/mce_amd_inj: Preset MCE injection struct

Populate the MCE injection struct before doing initial injection so that
values which don't change already have proper values.

Signed-off-by: Borislav Petkov <bp@suse.de>
---
 arch/x86/ras/mce_amd_inj.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/ras/mce_amd_inj.c b/arch/x86/ras/mce_amd_inj.c
index 8730c2882fff..e3b5d7e7e9ee 100644
--- a/arch/x86/ras/mce_amd_inj.c
+++ b/arch/x86/ras/mce_amd_inj.c
@@ -463,6 +463,8 @@ static int __init init_mce_inject(void)
 			goto err_dfs_add;
 	}
 
+	mce_setup(&i_mce);
+
 	return 0;
 
 err_dfs_add:
-- 
2.11.0

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH 1/2] x86/MCE: Export memory_error()
  2017-04-24 11:36               ` Borislav Petkov
@ 2017-04-25 21:07                 ` Vishal Verma
  -1 siblings, 0 replies; 66+ messages in thread
From: Vishal Verma @ 2017-04-25 21:07 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Luck, Tony, Dan Williams, linux-nvdimm, stable, linux-acpi

On 04/24, Borislav Petkov wrote:
> From: Borislav Petkov <bp@suse.de>
> Date: Mon, 24 Apr 2017 13:16:50 +0200
> Subject: [PATCH 1/2] x86/MCE: Export memory_error()
> 
> Export the function which checks whether an MCE is a memory error to
> other users so that we can reuse the logic. Drop the boot_cpu_data use,
> while at it, as mce.cpuvendor already has the CPU vendor in there.
> 
> Signed-off-by: Borislav Petkov <bp@suse.de>
> ---
>  arch/x86/include/asm/mce.h       |  1 +
>  arch/x86/kernel/cpu/mcheck/mce.c | 12 +++++-------
>  2 files changed, 6 insertions(+), 7 deletions(-)
> 
Here is the updated patch to use the above helper:

8<-----


>From 9661a85799c9067d762ecf29630f2b7f69897628 Mon Sep 17 00:00:00 2001
From: Vishal Verma <vishal.l.verma@intel.com>
Date: Tue, 25 Apr 2017 15:00:58 -0600
Subject: [PATCH v2] acpi, nfit: fix the memory error check in nfit_handle_mce

The check for an MCE being a memory error in the NFIT mce handler was
bogus. Export the new mce_is_memory_error helper, and use that to
perform the correct check in the handler.

Reported-by: Tony Luck <tony.luck@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 1 +
 drivers/acpi/nfit/mce.c          | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

This applies on tip/master + Borislav's patches in this thread above.
I'm not sure what the right process for queueing this for both
upstream and -stable is, so just replying here. Should I post it
independently?

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 361865ca..5cfbaeb 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -527,6 +527,7 @@ bool mce_is_memory_error(struct mce *m)
 
 	return false;
 }
+EXPORT_SYMBOL_GPL(mce_is_memory_error);
 
 static bool cec_add_mce(struct mce *m)
 {
diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
index 3ba1c34..fd86bec 100644
--- a/drivers/acpi/nfit/mce.c
+++ b/drivers/acpi/nfit/mce.c
@@ -26,7 +26,7 @@ static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
 	struct nfit_spa *nfit_spa;
 
 	/* We only care about memory errors */
-	if (!(mce->status & MCACOD))
+	if (!mce_is_memory_error(mce))
 		return NOTIFY_DONE;
 
 	/*
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH 2/2] x86/ras/mce_amd_inj: Preset MCE injection struct
  2017-04-24 11:37               ` Borislav Petkov
@ 2017-04-26 19:59                 ` kbuild test robot
  -1 siblings, 0 replies; 66+ messages in thread
From: kbuild test robot @ 2017-04-26 19:59 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: Luck, Tony, linux-nvdimm, stable, linux-acpi, kbuild-all

Hi Borislav,

[auto build test ERROR on next-20170421]
[cannot apply to tip/x86/core v4.9-rc8 v4.9-rc7 v4.9-rc6 v4.11-rc8]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Borislav-Petkov/x86-MCE-Export-memory_error/20170424-215449
config: i386-allmodconfig (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

>> ERROR: "mce_setup" [arch/x86/ras/mce_amd_inj.ko] undefined!

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 1/2] x86/MCE: Export memory_error()
@ 2017-05-10 19:31                   ` Verma, Vishal L
  0 siblings, 0 replies; 66+ messages in thread
From: Verma, Vishal L @ 2017-05-10 19:31 UTC (permalink / raw)
  To: bp; +Cc: linux-acpi, Luck, Tony, stable, linux-nvdimm

On Tue, 2017-04-25 at 15:07 -0600, Vishal Verma wrote:
> On 04/24, Borislav Petkov wrote:
> > From: Borislav Petkov <bp@suse.de>
> > Date: Mon, 24 Apr 2017 13:16:50 +0200
> > Subject: [PATCH 1/2] x86/MCE: Export memory_error()
> > 
> > Export the function which checks whether an MCE is a memory error to
> > other users so that we can reuse the logic. Drop the boot_cpu_data use,
> > while at it, as mce.cpuvendor already has the CPU vendor in there.
> > 
> > Signed-off-by: Borislav Petkov <bp@suse.de>
> > ---
> >  arch/x86/include/asm/mce.h       |  1 +
> >  arch/x86/kernel/cpu/mcheck/mce.c | 12 +++++-------
> >  2 files changed, 6 insertions(+), 7 deletions(-)
> > 

Hi Boris/Tony,

I didn't see the above patches in the RAS branches of the tip tree -
were you thinking we would carry these through the nvdimm tree
(including my updated patch below)?

> 
> Here is the updated patch to use the above helper:
> 
> 8<-----
> 
> 
> From 9661a85799c9067d762ecf29630f2b7f69897628 Mon Sep 17 00:00:00 2001
> From: Vishal Verma <vishal.l.verma@intel.com>
> Date: Tue, 25 Apr 2017 15:00:58 -0600
> Subject: [PATCH v2] acpi, nfit: fix the memory error check in nfit_handle_mce
> 
> The check for an MCE being a memory error in the NFIT mce handler was
> bogus. Export the new mce_is_memory_error helper, and use that to
> perform the correct check in the handler.
> 
> Reported-by: Tony Luck <tony.luck@intel.com>
> Cc: Borislav Petkov <bp@suse.de>
> Cc: <stable@vger.kernel.org>
> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
> ---
>  arch/x86/kernel/cpu/mcheck/mce.c | 1 +
>  drivers/acpi/nfit/mce.c          | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> This applies on tip/master + Borislav's patches in this thread above.
> I'm not sure what the right process for queueing this for both
> upstream and -stable is, so just replying here. Should I post it
> independently?
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 361865ca..5cfbaeb 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -527,6 +527,7 @@ bool mce_is_memory_error(struct mce *m)
>  
>  	return false;
>  }
> +EXPORT_SYMBOL_GPL(mce_is_memory_error);
>  
>  static bool cec_add_mce(struct mce *m)
>  {
> diff --git a/drivers/acpi/nfit/mce.c b/drivers/acpi/nfit/mce.c
> index 3ba1c34..fd86bec 100644
> --- a/drivers/acpi/nfit/mce.c
> +++ b/drivers/acpi/nfit/mce.c
> @@ -26,7 +26,7 @@ static int nfit_handle_mce(struct notifier_block *nb, unsigned long val,
>  	struct nfit_spa *nfit_spa;
>  
>  	/* We only care about memory errors */
> -	if (!(mce->status & MCACOD))
> +	if (!mce_is_memory_error(mce))
>  		return NOTIFY_DONE;
>  
>  	/*

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 1/2] x86/MCE: Export memory_error()
  2017-05-10 19:31                   ` Verma, Vishal L
@ 2017-05-10 20:04                     ` Borislav Petkov
  -1 siblings, 0 replies; 66+ messages in thread
From: Borislav Petkov @ 2017-05-10 20:04 UTC (permalink / raw)
  To: Verma, Vishal L; +Cc: linux-acpi, Luck, Tony, stable, linux-nvdimm

On Wed, May 10, 2017 at 07:31:30PM +0000, Verma, Vishal L wrote:
> I didn't see the above patches in the RAS branches of the tip tree -

You need to be patient - we have merge window right now. Next week,
after -rc1 releases, it is business as usual again.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 1/2] x86/MCE: Export memory_error()
  2017-05-10 20:04                     ` Borislav Petkov
@ 2017-05-10 20:06                       ` Verma, Vishal L
  -1 siblings, 0 replies; 66+ messages in thread
From: Verma, Vishal L @ 2017-05-10 20:06 UTC (permalink / raw)
  To: bp; +Cc: linux-acpi, Luck, Tony, stable, linux-nvdimm

On Wed, 2017-05-10 at 22:04 +0200, Borislav Petkov wrote:
> On Wed, May 10, 2017 at 07:31:30PM +0000, Verma, Vishal L wrote:
> > I didn't see the above patches in the RAS branches of the tip tree -
> 
> You need to be patient - we have merge window right now. Next week,
> after -rc1 releases, it is business as usual again.
> 
Ah I was under the impression that this can go in for 4.12..

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 1/2] x86/MCE: Export memory_error()
  2017-05-10 20:06                       ` Verma, Vishal L
@ 2017-05-10 20:08                         ` Borislav Petkov
  -1 siblings, 0 replies; 66+ messages in thread
From: Borislav Petkov @ 2017-05-10 20:08 UTC (permalink / raw)
  To: Verma, Vishal L; +Cc: linux-acpi, Luck, Tony, stable, linux-nvdimm

On Wed, May 10, 2017 at 08:06:53PM +0000, Verma, Vishal L wrote:
> Ah I was under the impression that this can go in for 4.12..

... and the reason for hurrying it into 4.12 is?

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 1/2] x86/MCE: Export memory_error()
  2017-05-10 20:08                         ` Borislav Petkov
@ 2017-05-10 21:12                           ` Verma, Vishal L
  -1 siblings, 0 replies; 66+ messages in thread
From: Verma, Vishal L @ 2017-05-10 21:12 UTC (permalink / raw)
  To: bp; +Cc: linux-acpi, Luck, Tony, stable, linux-nvdimm

On Wed, 2017-05-10 at 22:08 +0200, Borislav Petkov wrote:
> On Wed, May 10, 2017 at 08:06:53PM +0000, Verma, Vishal L wrote:
> > Ah I was under the impression that this can go in for 4.12..
> 
> ... and the reason for hurrying it into 4.12 is?
> 
The memory error check in the nfit handler is a valid, and simple fix.
That said, I understand if you want additional soak time for the full
set.

Thanks,
	-Vishal

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 1/2] x86/MCE: Export memory_error()
  2017-05-10 21:12                           ` Verma, Vishal L
@ 2017-05-10 21:57                             ` Borislav Petkov
  -1 siblings, 0 replies; 66+ messages in thread
From: Borislav Petkov @ 2017-05-10 21:57 UTC (permalink / raw)
  To: Verma, Vishal L; +Cc: linux-acpi, Luck, Tony, stable, linux-nvdimm

On Wed, May 10, 2017 at 09:12:12PM +0000, Verma, Vishal L wrote:
> The memory error check in the nfit handler is a valid, and simple fix.

I need the big picture here: "Without this fix, the nfit handler ...".

Then, if the stable rules apply, we can always expedite it through
urgent/stable.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 1/2] x86/MCE: Export memory_error()
  2017-05-10 21:57                             ` Borislav Petkov
@ 2017-05-10 22:03                               ` Verma, Vishal L
  -1 siblings, 0 replies; 66+ messages in thread
From: Verma, Vishal L @ 2017-05-10 22:03 UTC (permalink / raw)
  To: bp; +Cc: linux-acpi, Luck, Tony, stable, linux-nvdimm

On Wed, 2017-05-10 at 23:57 +0200, Borislav Petkov wrote:
> On Wed, May 10, 2017 at 09:12:12PM +0000, Verma, Vishal L wrote:
> > The memory error check in the nfit handler is a valid, and simple fix.
> 
> I need the big picture here: "Without this fix, the nfit handler ...".

...will potentially add a bogus address to an 'error list', even when there
may not have been a memory error. (Can mce->addr hold an address when
the MCE is not due to a memory error?)
The result of adding an address to this list is that future accesses to
this location will prematurely error out. Depending on how frequently
machine checks happen that are not memory errors but have the addr field
set (hopefully rare anyway), we could be incorrectly marking a lot of
locations as media errors.

> 
> Then, if the stable rules apply, we can always expedite it through
> urgent/stable.
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 1/2] x86/MCE: Export memory_error()
@ 2017-05-10 22:16                                 ` Borislav Petkov
  0 siblings, 0 replies; 66+ messages in thread
From: Borislav Petkov @ 2017-05-10 22:16 UTC (permalink / raw)
  To: Verma, Vishal L; +Cc: linux-acpi, Luck, Tony, stable, linux-nvdimm

On Wed, May 10, 2017 at 10:03:42PM +0000, Verma, Vishal L wrote:
> ... Depending on how frequently machine checks happen that are not
> memory errors but have the addr field set (hopefully rare anyway), we
> could be incorrectly marking a lot of locations as media errors.

Sounds serious enough to me, thanks.

I'll prep the queue next week and run tests.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 1/2] x86/MCE: Export memory_error()
  2017-05-10 22:16                                 ` Borislav Petkov
@ 2017-05-10 22:22                                   ` Verma, Vishal L
  -1 siblings, 0 replies; 66+ messages in thread
From: Verma, Vishal L @ 2017-05-10 22:22 UTC (permalink / raw)
  To: bp; +Cc: linux-acpi, Luck, Tony, stable, linux-nvdimm

On Thu, 2017-05-11 at 00:16 +0200, Borislav Petkov wrote:
> On Wed, May 10, 2017 at 10:03:42PM +0000, Verma, Vishal L wrote:
> > ... Depending on how frequently machine checks happen that are not
> > memory errors but have the addr field set (hopefully rare anyway), we
> > could be incorrectly marking a lot of locations as media errors.
> 
> Sounds serious enough to me, thanks.
> 
> I'll prep the queue next week and run tests.
> 
Ok, Thanks Boris!

	-Vishal

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 1/2] x86/MCE: Export memory_error()
@ 2017-05-17 12:38                                     ` Borislav Petkov
  0 siblings, 0 replies; 66+ messages in thread
From: Borislav Petkov @ 2017-05-17 12:38 UTC (permalink / raw)
  To: Verma, Vishal L; +Cc: linux-acpi, Luck, Tony, stable, linux-nvdimm

On Wed, May 10, 2017 at 10:22:32PM +0000, Verma, Vishal L wrote:
> > I'll prep the queue next week and run tests.
> > 
> Ok, Thanks Boris!

Ok, I've pushed a branch:

https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=tip-ras-pending

Please have a look, test, poke, etc...

Thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH 1/2] x86/MCE: Export memory_error()
@ 2017-05-17 18:58                                       ` Verma, Vishal L
  0 siblings, 0 replies; 66+ messages in thread
From: Verma, Vishal L @ 2017-05-17 18:58 UTC
  To: bp; +Cc: Luck, Tony, linux-nvdimm, stable, linux-acpi

On Wed, 2017-05-17 at 14:38 +0200, Borislav Petkov wrote:
> On Wed, May 10, 2017 at 10:22:32PM +0000, Verma, Vishal L wrote:
> > > I'll prep the queue next week and run tests.
> > > 
> > 
> > Ok, Thanks Boris!
> 
> Ok, I've pushed a branch:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/bp/bp.git/log/?h=tip-ras-pending
> 
> Please have a look, test, poke, etc...

Quick/minor observation - you moved the EXPORT_SYMBOL_GPL into your
original patch, but the commit message of mine still talks about
exporting. Perhaps it should read:

"Use the new mce_is_memory_error() helper.."

Thanks,
	Vishal

> 
> Thanks.
> 

* Re: [PATCH 1/2] x86/MCE: Export memory_error()
@ 2017-05-17 19:20                                         ` Borislav Petkov
  0 siblings, 0 replies; 66+ messages in thread
From: Borislav Petkov @ 2017-05-17 19:20 UTC
  To: Verma, Vishal L; +Cc: Luck, Tony, linux-nvdimm, stable, linux-acpi

On Wed, May 17, 2017 at 06:58:00PM +0000, Verma, Vishal L wrote:
> Quick/minor observation - you moved the EXPORT_SYMBOL_GPL into your

Yes, it belongs there conceptually.

> original patch, but the commit message of mine still talks about
> exporting. Perhaps it should read:
> 
> "Use the new mce_is_memory_error() helper.."

Fixed, thanks.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

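The mce_is_memory_error() helper named in the suggested commit message is where the memory-error classification ends up, together with its EXPORT_SYMBOL_GPL, so that code outside the MCE core (such as the NFIT MCE handler) can call it. As an illustration only, here is a minimal userspace sketch of how the Intel side of such a check can be written, assuming the MCACOD compound error code encodings from the Intel SDM, Vol. 3B. The function name intel_status_is_memory_error() and the local BIT() macro are hypothetical stand-ins. This is not the kernel's implementation, and it ignores other vendors such as AMD.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's BIT() macro. */
#define BIT(n) (1ULL << (n))

/*
 * Hypothetical helper (not the kernel's code): report whether an Intel
 * IA32_MCi_STATUS value carries a memory-related compound error code.
 * Only MCACOD (bits 15:0) is inspected. Bit 12 is the "filter" bit and is
 * left out of every mask. Bit 11 stays in the masks because, when set, the
 * code describes a bus/interconnect error rather than a memory error.
 */
bool intel_status_is_memory_error(uint64_t status)
{
	/* Memory controller error: 000F 0000 1MMM CCCC */
	if ((status & 0xef80) == BIT(7))
		return true;

	/* Cache hierarchy error: 000F 0001 RRRR TTLL */
	if ((status & 0xef00) == BIT(8))
		return true;

	/* Generic cache hierarchy error: 000F 0000 0000 11LL */
	return (status & 0xeffc) == 0xc;
}
```

With these masks, an MCACOD of 0x0090 (memory controller read error, channel 0) is classified as a memory error, and so is 0x1090, which differs only in the filter bit. A value of 0x0800, which has bit 11 set and therefore encodes a bus/interconnect error, is not.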

Thread overview:
2017-04-20 22:18 [PATCH] acpi, nfit: fix the memory error check in nfit_handle_mce Vishal Verma
2017-04-20 22:21 ` Verma, Vishal L
2017-04-21  2:21 ` kbuild test robot
2017-04-21 19:21 ` Dan Williams
2017-04-21 19:56   ` Verma, Vishal L
2017-04-21 20:16     ` Luck, Tony
2017-04-21 20:19       ` Dan Williams
2017-04-21 20:27         ` Luck, Tony
2017-04-21 21:07           ` Borislav Petkov
2017-04-24 11:36             ` [PATCH 1/2] x86/MCE: Export memory_error() Borislav Petkov
2017-04-25 21:07               ` Vishal Verma
2017-05-10 19:31                 ` Verma, Vishal L
2017-05-10 20:04                   ` Borislav Petkov
2017-05-10 20:06                     ` Verma, Vishal L
2017-05-10 20:08                       ` Borislav Petkov
2017-05-10 21:12                         ` Verma, Vishal L
2017-05-10 21:57                           ` Borislav Petkov
2017-05-10 22:03                             ` Verma, Vishal L
2017-05-10 22:16                               ` Borislav Petkov
2017-05-10 22:22                                 ` Verma, Vishal L
2017-05-17 12:38                                   ` Borislav Petkov
2017-05-17 18:58                                     ` Verma, Vishal L
2017-05-17 19:20                                       ` Borislav Petkov
2017-04-24 11:37             ` [PATCH 2/2] x86/ras/mce_amd_inj: Preset MCE injection struct Borislav Petkov
2017-04-26 19:59               ` kbuild test robot
2017-04-21 20:35       ` [PATCH] acpi, nfit: fix the memory error check in nfit_handle_mce Vishal Verma
2017-04-21 20:50         ` Luck, Tony
2017-04-21 20:54           ` Vishal Verma

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.