linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] Updates to EDAC mce_amd_inj
@ 2015-05-27 19:03 Aravind Gopalakrishnan
  2015-05-27 19:03 ` [PATCH 1/6] edac, mce_amd_inj: Use MCE_INJECT_GET for bank Aravind Gopalakrishnan
                   ` (5 more replies)
  0 siblings, 6 replies; 21+ messages in thread
From: Aravind Gopalakrishnan @ 2015-05-27 19:03 UTC (permalink / raw)
  To: bp, dougthompson, mchehab; +Cc: linux-edac, linux-kernel

Patch 1, 2: Cleanups to the code. No functional change
Patch 3: Modifying flags attribute to accept string arguments and
	 also make do_inject() code recognize these changes before
	 injecting a 'sw' or 'hw' error
Patch 4: Extend flags attribute to take in values for 'dfr', 'thr'.
	 These values will be used to trigger deferred error/threshold
	 error apic interrupts respectively
Patch 5: Add README file that describes the attributes
Patch 6: Modify injection mechanism for bank 4 errors. Since they are
	 typically logged or reported only on NBC, we make sure that
	 we inject on the correct core here.

Aravind Gopalakrishnan (6):
  edac, mce_amd_inj: Use MCE_INJECT_GET for bank
  edac, mce_amd_inj: Rework sanity check for inj_bank_set
  edac, mce_amd_inj: Modify flags attrigute to use string arguments
  edac, mce_amd_inj: Add capability to trigger apic interrupts
  edac, mce_amd_inj: Add README file
  edac, mce_amd_inj: Inject errors on NBC for bank 4 errors

 drivers/edac/mce_amd_inj.c | 284 ++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 256 insertions(+), 28 deletions(-)

-- 
2.4.0


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/6] edac, mce_amd_inj: Use MCE_INJECT_GET for bank
  2015-05-27 19:03 [PATCH 0/6] Updates to EDAC mce_amd_inj Aravind Gopalakrishnan
@ 2015-05-27 19:03 ` Aravind Gopalakrishnan
  2015-05-27 19:03 ` [PATCH 2/6] edac, mce_amd_inj: Rework sanity check for inj_bank_set Aravind Gopalakrishnan
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Aravind Gopalakrishnan @ 2015-05-27 19:03 UTC (permalink / raw)
  To: bp, dougthompson, mchehab; +Cc: linux-edac, linux-kernel

inj_bank_get is generic enough that we can use the
macro in it's place. Doing that here.

No functional change.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
---
 drivers/edac/mce_amd_inj.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/drivers/edac/mce_amd_inj.c b/drivers/edac/mce_amd_inj.c
index f7681b5..7f3a97a 100644
--- a/drivers/edac/mce_amd_inj.c
+++ b/drivers/edac/mce_amd_inj.c
@@ -187,13 +187,7 @@ static int inj_bank_set(void *data, u64 val)
 	return 0;
 }
 
-static int inj_bank_get(void *data, u64 *val)
-{
-	struct mce *m = (struct mce *)data;
-
-	*val = m->bank;
-	return 0;
-}
+MCE_INJECT_GET(bank);
 
 DEFINE_SIMPLE_ATTRIBUTE(bank_fops, inj_bank_get, inj_bank_set, "%llu\n");
 
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 2/6] edac, mce_amd_inj: Rework sanity check for inj_bank_set
  2015-05-27 19:03 [PATCH 0/6] Updates to EDAC mce_amd_inj Aravind Gopalakrishnan
  2015-05-27 19:03 ` [PATCH 1/6] edac, mce_amd_inj: Use MCE_INJECT_GET for bank Aravind Gopalakrishnan
@ 2015-05-27 19:03 ` Aravind Gopalakrishnan
  2015-05-27 19:03 ` [PATCH 3/6] edac, mce_amd_inj: Modify flags attrigute to use string arguments Aravind Gopalakrishnan
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 21+ messages in thread
From: Aravind Gopalakrishnan @ 2015-05-27 19:03 UTC (permalink / raw)
  To: bp, dougthompson, mchehab; +Cc: linux-edac, linux-kernel

The number of banks for a given processor is encoded in
MSR_IA32_MCG_CAP. So, use this to obtain the value and
for sanity checking in inj_bank_set() instead of requiring a
family/model check.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
---
 drivers/edac/mce_amd_inj.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/edac/mce_amd_inj.c b/drivers/edac/mce_amd_inj.c
index 7f3a97a..15f6aa1 100644
--- a/drivers/edac/mce_amd_inj.c
+++ b/drivers/edac/mce_amd_inj.c
@@ -25,6 +25,8 @@
 static struct mce i_mce;
 static struct dentry *dfs_inj;
 
+static u8 n_banks;
+
 #define MCE_INJECT_SET(reg)						\
 static int inj_##reg##_set(void *data, u64 val)				\
 {									\
@@ -174,11 +176,9 @@ static int inj_bank_set(void *data, u64 val)
 {
 	struct mce *m = (struct mce *)data;
 
-	if (val > 5) {
-		if (boot_cpu_data.x86 != 0x15 || val > 6) {
-			pr_err("Non-existent MCE bank: %llu\n", val);
-			return -EINVAL;
-		}
+	if (val >= n_banks) {
+		pr_err("Non-existent MCE bank: %llu\n", val);
+		return -EINVAL;
 	}
 
 	m->bank = val;
@@ -207,6 +207,10 @@ static struct dfs_node {
 static int __init init_mce_inject(void)
 {
 	int i;
+	u64 cap;
+
+	rdmsrl(MSR_IA32_MCG_CAP, cap);
+	n_banks = cap & MCG_BANKCNT_MASK;
 
 	dfs_inj = debugfs_create_dir("mce-inject", NULL);
 	if (!dfs_inj)
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 3/6] edac, mce_amd_inj: Modify flags attrigute to use string arguments
  2015-05-27 19:03 [PATCH 0/6] Updates to EDAC mce_amd_inj Aravind Gopalakrishnan
  2015-05-27 19:03 ` [PATCH 1/6] edac, mce_amd_inj: Use MCE_INJECT_GET for bank Aravind Gopalakrishnan
  2015-05-27 19:03 ` [PATCH 2/6] edac, mce_amd_inj: Rework sanity check for inj_bank_set Aravind Gopalakrishnan
@ 2015-05-27 19:03 ` Aravind Gopalakrishnan
  2015-05-29 13:49   ` Borislav Petkov
  2015-05-27 19:03 ` [PATCH 4/6] edac, mce_amd_inj: Add capability to trigger apic interrupts Aravind Gopalakrishnan
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 21+ messages in thread
From: Aravind Gopalakrishnan @ 2015-05-27 19:03 UTC (permalink / raw)
  To: bp, dougthompson, mchehab; +Cc: linux-edac, linux-kernel

Use char values such as "hw" or "sw" to indicate the type of error
injection to be performed.

Current flags attribute derives the meanings of values that can be
programmed into it from asm/mce.h. Moving to defined strings for the
atribute allows this module to be self sufficient and removes the
dependency. Also, we can introduce new flags as and when needed without
having to worry about conflicting with the flags already defined
in asm/mce.h

Also, modify do_inject() to use the newly defined injection_type enum
to figure out the injection mechanism we need to use

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
---
 drivers/edac/mce_amd_inj.c | 85 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 76 insertions(+), 9 deletions(-)

diff --git a/drivers/edac/mce_amd_inj.c b/drivers/edac/mce_amd_inj.c
index 15f6aa1..daec4af 100644
--- a/drivers/edac/mce_amd_inj.c
+++ b/drivers/edac/mce_amd_inj.c
@@ -15,6 +15,8 @@
 #include <linux/device.h>
 #include <linux/module.h>
 #include <linux/cpu.h>
+#include <linux/string.h>
+#include <linux/uaccess.h>
 #include <asm/mce.h>
 
 #include "mce_amd.h"
@@ -27,6 +29,23 @@ static struct dentry *dfs_inj;
 
 static u8 n_banks;
 
+#define MAX_FLAG_OPT_SIZE	10
+
+enum injection_type {
+	SW_INJ = 0,	/* SW injection, simply decode the error */
+	HW_INJ,		/* Trigger a #MC */
+	N_INJ_TYPES,
+};
+
+static const char * const flags_options[] = {
+	[SW_INJ] = "sw",
+	[HW_INJ] = "hw",
+	NULL
+};
+
+/* Set default injection to SW_INJ */
+enum injection_type inj_type = SW_INJ;
+
 #define MCE_INJECT_SET(reg)						\
 static int inj_##reg##_set(void *data, u64 val)				\
 {									\
@@ -81,24 +100,72 @@ static int toggle_hw_mce_inject(unsigned int cpu, bool enable)
 	return err;
 }
 
-static int flags_get(void *data, u64 *val)
+static int __set_inj(const char *buf, size_t cnt)
 {
-	struct mce *m = (struct mce *)data;
+	int i;
+	const char *inj_op = NULL;
 
-	*val = m->inject_flags;
+	for (i = 0; i <= N_INJ_TYPES; i++) {
+		inj_op = flags_options[i];
+		if (inj_op && strncmp(inj_op, buf, cnt) == 0)
+			break;
+	}
+
+	if (!inj_op)
+		return -EINVAL;
+
+	inj_type = i;
 
 	return 0;
 }
 
-static int flags_set(void *data, u64 val)
+static ssize_t flags_read(struct file *filp, char __user *ubuf,
+			  size_t cnt, loff_t *ppos)
 {
-	struct mce *m = (struct mce *)data;
+	char buf[MAX_FLAG_OPT_SIZE + 1];
+	int n;
 
-	m->inject_flags = (u8)val;
-	return 0;
+	n = sprintf(buf, "%s\n", flags_options[inj_type]);
+
+	return simple_read_from_buffer(ubuf, cnt, ppos, buf, n);
 }
 
-DEFINE_SIMPLE_ATTRIBUTE(flags_fops, flags_get, flags_set, "%llu\n");
+static ssize_t flags_write(struct file *filp, const char __user *ubuf,
+			   size_t cnt, loff_t *ppos)
+{
+	char buf[MAX_FLAG_OPT_SIZE + 1];
+	int err;
+	size_t ret;
+
+	ret = cnt;
+
+	if (cnt > MAX_FLAG_OPT_SIZE)
+		cnt = MAX_FLAG_OPT_SIZE;
+
+	if (copy_from_user(&buf, ubuf, cnt))
+		return -EFAULT;
+
+	buf[cnt] = 0;
+
+	/* strip whitespaces.. */
+	strstrip(buf);
+
+	err = __set_inj(buf, cnt - 1);
+	if (err) {
+		pr_err("%s: Invalid flags value: %s\n", __func__, buf);
+		return err;
+	}
+
+	*ppos += ret;
+
+	return ret;
+}
+
+static const struct file_operations flags_fops = {
+	.read           = flags_read,
+	.write          = flags_write,
+	.llseek         = generic_file_llseek,
+};
 
 /*
  * On which CPU to inject?
@@ -130,7 +197,7 @@ static void do_inject(void)
 	unsigned int cpu = i_mce.extcpu;
 	u8 b = i_mce.bank;
 
-	if (!(i_mce.inject_flags & MCJ_EXCEPTION)) {
+	if (inj_type == SW_INJ) {
 		amd_decode_mce(NULL, 0, &i_mce);
 		return;
 	}
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4/6] edac, mce_amd_inj: Add capability to trigger apic interrupts
  2015-05-27 19:03 [PATCH 0/6] Updates to EDAC mce_amd_inj Aravind Gopalakrishnan
                   ` (2 preceding siblings ...)
  2015-05-27 19:03 ` [PATCH 3/6] edac, mce_amd_inj: Modify flags attrigute to use string arguments Aravind Gopalakrishnan
@ 2015-05-27 19:03 ` Aravind Gopalakrishnan
  2015-05-29 15:36   ` Borislav Petkov
  2015-05-27 19:03 ` [PATCH 5/6] edac, mce_amd_inj: Add README file Aravind Gopalakrishnan
  2015-05-27 19:03 ` [PATCH 6/6] edac, mce_amd_inj: Inject errors on NBC for bank 4 errors Aravind Gopalakrishnan
  5 siblings, 1 reply; 21+ messages in thread
From: Aravind Gopalakrishnan @ 2015-05-27 19:03 UTC (permalink / raw)
  To: bp, dougthompson, mchehab; +Cc: linux-edac, linux-kernel

With this extension to the flags attribute, deferred error interrupts
and threshold interrupts can be triggered to test the apic interrupt
handler functionality for these type of errors

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
---
 drivers/edac/mce_amd_inj.c | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/drivers/edac/mce_amd_inj.c b/drivers/edac/mce_amd_inj.c
index daec4af..f1c1433 100644
--- a/drivers/edac/mce_amd_inj.c
+++ b/drivers/edac/mce_amd_inj.c
@@ -34,12 +34,16 @@ static u8 n_banks;
 enum injection_type {
 	SW_INJ = 0,	/* SW injection, simply decode the error */
 	HW_INJ,		/* Trigger a #MC */
+	DFR_INT_INJ,	/* Trigger Deferred error interrupt */
+	THR_INT_INJ,	/* Trigger threshold interrupt */
 	N_INJ_TYPES,
 };
 
 static const char * const flags_options[] = {
 	[SW_INJ] = "sw",
 	[HW_INJ] = "hw",
+	[DFR_INT_INJ] = "dfr",
+	[THR_INT_INJ] = "thr",
 	NULL
 };
 
@@ -191,6 +195,16 @@ static void trigger_mce(void *info)
 	asm volatile("int $18");
 }
 
+static void trigger_dfr_int(void *info)
+{
+	asm volatile("int $244");
+}
+
+static void trigger_thr_int(void *info)
+{
+	asm volatile("int $249");
+}
+
 static void do_inject(void)
 {
 	u64 mcg_status = 0;
@@ -202,6 +216,20 @@ static void do_inject(void)
 		return;
 	}
 
+	if (inj_type == DFR_INT_INJ) {
+		/*
+		 * Ensure necessary status bits for deferred errors:
+		 * a. MCx_STATUS[Deferred] is set -
+		 *	This is to ensure the error will be handled by the
+		 *	interrupt handler
+		 * b. unset MCx_STATUS[UC]
+		 *	As deferred errors are _not_ UC
+		 */
+
+		i_mce.status |= MCI_STATUS_DEFERRED;
+		i_mce.status |= (i_mce.status & ~MCI_STATUS_UC);
+	}
+
 	get_online_cpus();
 	if (!cpu_online(cpu))
 		goto err;
@@ -228,7 +256,12 @@ static void do_inject(void)
 
 	toggle_hw_mce_inject(cpu, false);
 
-	smp_call_function_single(cpu, trigger_mce, NULL, 0);
+	if (inj_type == DFR_INT_INJ)
+		smp_call_function_single(cpu, trigger_dfr_int, NULL, 0);
+	else if (inj_type == THR_INT_INJ)
+		smp_call_function_single(cpu, trigger_thr_int, NULL, 0);
+	else
+		smp_call_function_single(cpu, trigger_mce, NULL, 0);
 
 err:
 	put_online_cpus();
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 5/6] edac, mce_amd_inj: Add README file
  2015-05-27 19:03 [PATCH 0/6] Updates to EDAC mce_amd_inj Aravind Gopalakrishnan
                   ` (3 preceding siblings ...)
  2015-05-27 19:03 ` [PATCH 4/6] edac, mce_amd_inj: Add capability to trigger apic interrupts Aravind Gopalakrishnan
@ 2015-05-27 19:03 ` Aravind Gopalakrishnan
  2015-05-29 15:38   ` Borislav Petkov
  2015-05-27 19:03 ` [PATCH 6/6] edac, mce_amd_inj: Inject errors on NBC for bank 4 errors Aravind Gopalakrishnan
  5 siblings, 1 reply; 21+ messages in thread
From: Aravind Gopalakrishnan @ 2015-05-27 19:03 UTC (permalink / raw)
  To: bp, dougthompson, mchehab; +Cc: linux-edac, linux-kernel

Provides information about each file and the usages.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
---
 drivers/edac/mce_amd_inj.c | 68 +++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 61 insertions(+), 7 deletions(-)

diff --git a/drivers/edac/mce_amd_inj.c b/drivers/edac/mce_amd_inj.c
index f1c1433..ca5b29f 100644
--- a/drivers/edac/mce_amd_inj.c
+++ b/drivers/edac/mce_amd_inj.c
@@ -291,17 +291,71 @@ MCE_INJECT_GET(bank);
 
 DEFINE_SIMPLE_ATTRIBUTE(bank_fops, inj_bank_get, inj_bank_set, "%llu\n");
 
+static const char readme_msg[] =
+	"\nDescription of the files and their usages:\n\n"
+	"status:  Set a value to be programmed into MCx_STATUS(bank)\n"
+	"\t The status bits provide insight into the type of\n"
+	"\t error that caused the MCE.\n\n"
+	"misc:	 Set value of MCx_MISC(bank)\n"
+	"\t misc register provides auxiliary info about the error. This\n"
+	"\t register is typically used for error thresholding purpose and\n"
+	"\t validity of the register is indicated by MCx_STATUS[MiscV]\n\n"
+	"addr:	 Error address value to be written to MCx_ADDR(bank)\n"
+	"\t This register is used to log address information associated\n"
+	"\t with the error.\n\n"
+	"Note:	 See respective BKDGs for the exact bit definitions of the\n"
+	"\t above registers as they mirror the MCi_[STATUS | MISC | ADDR]\n"
+	"\t hardware registers.\n\n"
+	"bank:	 Specify the bank you want to inject the error into.\n"
+	"\t The number of banks in a processor varies and is family/model\n"
+	"\t dependent. So, a sanity check performed while writing.\n"
+	"\t Writing to this file will trigger a #MC or APIC interrupts or\n"
+	"\t invoke the error decoder routines for AMD processors. The value\n"
+	"\t in 'flags' file decides which of above actions is triggered.\n\n"
+	"flags:	 Write to this file to speficy the error injection policy.\n"
+	"\t Allowed values:\n"
+	"\t\t\"sw\"  -	SW error injection, Only calls error decoder\n"
+	"\t\t\troutines to print error info in human readable format\n"
+	"\t\t\"hw\"  -	HW error injection, Forces a #MC,\n"
+	"\t\t\tcauses exception handler to handle the error\n"
+	"\t\t\tif UC or poll handler catches it if CE\n"
+	"\t\t\tWarning: Might cause system panic if MCx_STATUS[PCC]\n"
+	"\t\t\tis set. For debug purposes, consider setting\n"
+	"\t\t\t/<debugfs_mountpoint>/mce/fake_panic\n"
+	"\t\t\"dfr\" -	Trigger APIC interrupt for Deferred error\n"
+	"\t\t\tError is handled by deferred error apic handler if\n"
+	"\t\t\tfeature is present in HW.\n"
+	"\t\t\"thr\" -	Trigger APIC interrupt for threshold error\n"
+	"\t\t\tError is handled by threshold apic handler\n\n"
+	"cpu:	 The cpu to inject the error on.\n\n"
+;
+
+static ssize_t
+inj_readme_read(struct file *filp, char __user *ubuf,
+		       size_t cnt, loff_t *ppos)
+{
+	return simple_read_from_buffer(ubuf, cnt, ppos,
+					readme_msg, strlen(readme_msg));
+}
+
+static const struct file_operations readme_fops = {
+	.read		= inj_readme_read,
+};
+
 static struct dfs_node {
 	char *name;
 	struct dentry *d;
 	const struct file_operations *fops;
+	umode_t perm;
 } dfs_fls[] = {
-	{ .name = "status",	.fops = &status_fops },
-	{ .name = "misc",	.fops = &misc_fops },
-	{ .name = "addr",	.fops = &addr_fops },
-	{ .name = "bank",	.fops = &bank_fops },
-	{ .name = "flags",	.fops = &flags_fops },
-	{ .name = "cpu",	.fops = &extcpu_fops },
+	{ .name = "status",	.fops = &status_fops,	S_IRUSR | S_IWUSR },
+	{ .name = "misc",	.fops = &misc_fops,	S_IRUSR | S_IWUSR },
+	{ .name = "addr",	.fops = &addr_fops,	S_IRUSR | S_IWUSR },
+	{ .name = "bank",	.fops = &bank_fops,	S_IRUSR | S_IWUSR },
+	{ .name = "flags",	.fops = &flags_fops,	S_IRUSR | S_IWUSR },
+	{ .name = "cpu",	.fops = &extcpu_fops,	S_IRUSR | S_IWUSR },
+	{ .name = "README",	.fops = &readme_fops,	S_IRUSR | S_IRGRP |
+							S_IROTH },
 };
 
 static int __init init_mce_inject(void)
@@ -318,7 +372,7 @@ static int __init init_mce_inject(void)
 
 	for (i = 0; i < ARRAY_SIZE(dfs_fls); i++) {
 		dfs_fls[i].d = debugfs_create_file(dfs_fls[i].name,
-						    S_IRUSR | S_IWUSR,
+						    dfs_fls[i].perm,
 						    dfs_inj,
 						    &i_mce,
 						    dfs_fls[i].fops);
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 6/6] edac, mce_amd_inj: Inject errors on NBC for bank 4 errors
  2015-05-27 19:03 [PATCH 0/6] Updates to EDAC mce_amd_inj Aravind Gopalakrishnan
                   ` (4 preceding siblings ...)
  2015-05-27 19:03 ` [PATCH 5/6] edac, mce_amd_inj: Add README file Aravind Gopalakrishnan
@ 2015-05-27 19:03 ` Aravind Gopalakrishnan
  2015-05-29 16:00   ` Borislav Petkov
  5 siblings, 1 reply; 21+ messages in thread
From: Aravind Gopalakrishnan @ 2015-05-27 19:03 UTC (permalink / raw)
  To: bp, dougthompson, mchehab; +Cc: linux-edac, linux-kernel

For bank 4 errors, MCE logging and reporting is done only on
node base cores. Refer D18F3x44[NbMcaToMstCpuEn] field in
Fam10h and later BKDGs.

This patch ensures that we inject the error on the node base core
for bank 4 errors. Otherwise, triggering #MC or apic interrupts on
a non node base core would not have any effect on the system.
(i.e), we would not see any relevant output on kernel logs for
the error we just injected.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
---
 drivers/edac/mce_amd_inj.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 76 insertions(+)

diff --git a/drivers/edac/mce_amd_inj.c b/drivers/edac/mce_amd_inj.c
index ca5b29f..bd44ba4 100644
--- a/drivers/edac/mce_amd_inj.c
+++ b/drivers/edac/mce_amd_inj.c
@@ -17,9 +17,12 @@
 #include <linux/cpu.h>
 #include <linux/string.h>
 #include <linux/uaccess.h>
+#include <linux/pci.h>
 #include <asm/mce.h>
+#include <asm/amd_nb.h>
 
 #include "mce_amd.h"
+#include "amd64_edac.h"
 
 /*
  * Collect all the MCi_XXX settings
@@ -205,6 +208,65 @@ static void trigger_thr_int(void *info)
 	asm volatile("int $249");
 }
 
+static u32 amd_get_num_nodes(void)
+{
+	u32 nodes = 1;
+
+	if (cpu_has_topoext) {
+		u32 ecx;
+
+		ecx = cpuid_ecx(0x8000001e);
+		nodes = ((ecx >> 8) & 7) + 1;
+	} else if (static_cpu_has(X86_FEATURE_NODEID_MSR)) {
+		u64 value;
+
+		rdmsrl(MSR_FAM10H_NODE_ID, value);
+		nodes = ((value >> 3) & 7) + 1;
+	}
+
+	return nodes;
+}
+
+static u32 amd_get_nbc_for_node(int node_id)
+{
+	struct cpuinfo_x86 *c = &boot_cpu_data;
+	u32 nodes, cores_per_node;
+
+	nodes = amd_get_num_nodes();
+	cores_per_node = c->x86_max_cores / nodes;
+
+	return cores_per_node * node_id;
+}
+
+static void toggle_nb_mca_mst_cpu(u16 nid)
+{
+	struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
+	u32 val;
+	int err;
+
+	if (!F3)
+		return;
+
+	err = pci_read_config_dword(F3, NBCFG, &val);
+	if (err) {
+		pr_err("%s: Error reading F%dx%03x.\n", __func__,
+			PCI_FUNC(F3->devfn),
+			NBCFG);
+		return;
+	}
+
+	if (!(val & BIT(27))) {
+		pr_err("%s: BIOS not setting D18F3x44[NbMcaToMstCpuEn]."
+		       "Doing that here\n", __func__);
+		val |= BIT(27);
+		err = pci_write_config_dword(F3, NBCFG, val);
+		if (err)
+			pr_err("%s: Error writing F%dx%03x.\n", __func__,
+				PCI_FUNC(F3->devfn),
+				NBCFG);
+	}
+}
+
 static void do_inject(void)
 {
 	u64 mcg_status = 0;
@@ -240,6 +302,20 @@ static void do_inject(void)
 	if (!(i_mce.status & MCI_STATUS_PCC))
 		mcg_status |= MCG_STATUS_RIPV;
 
+	/*
+	 * For multi node cpus, logging and reporting of bank == 4 errors
+	 * happen only on the node base core. Refer D18F3x44[NbMcaToMstCpuEn]
+	 * for Fam10h and later BKDGs
+	 */
+	if (static_cpu_has(X86_FEATURE_AMD_DCM) && b == 4) {
+		/*
+		 * BIOS sets D18F3x44[NbMcaToMstCpuEn] by default.
+		 * But make sure of it here just in case..
+		 */
+		toggle_nb_mca_mst_cpu(amd_get_nb_id(cpu));
+		cpu = amd_get_nbc_for_node(amd_get_nb_id(cpu));
+	}
+
 	toggle_hw_mce_inject(cpu, true);
 
 	wrmsr_on_cpu(cpu, MSR_IA32_MCG_STATUS,
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/6] edac, mce_amd_inj: Modify flags attrigute to use string arguments
  2015-05-27 19:03 ` [PATCH 3/6] edac, mce_amd_inj: Modify flags attrigute to use string arguments Aravind Gopalakrishnan
@ 2015-05-29 13:49   ` Borislav Petkov
  2015-05-29 18:20     ` Aravind Gopalakrishnan
  2015-06-01 19:16     ` Aravind Gopalakrishnan
  0 siblings, 2 replies; 21+ messages in thread
From: Borislav Petkov @ 2015-05-29 13:49 UTC (permalink / raw)
  To: Aravind Gopalakrishnan; +Cc: dougthompson, mchehab, linux-edac, linux-kernel

On Wed, May 27, 2015 at 02:03:35PM -0500, Aravind Gopalakrishnan wrote:
> Use char values such as "hw" or "sw" to indicate the type of error
> injection to be performed.
> 
> Current flags attribute derives the meanings of values that can be
> programmed into it from asm/mce.h. Moving to defined strings for the
> atribute allows this module to be self sufficient and removes the
> dependency. Also, we can introduce new flags as and when needed without
> having to worry about conflicting with the flags already defined
> in asm/mce.h
> 
> Also, modify do_inject() to use the newly defined injection_type enum
> to figure out the injection mechanism we need to use
> 
> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
> ---
>  drivers/edac/mce_amd_inj.c | 85 +++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 76 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/edac/mce_amd_inj.c b/drivers/edac/mce_amd_inj.c
> index 15f6aa1..daec4af 100644
> --- a/drivers/edac/mce_amd_inj.c
> +++ b/drivers/edac/mce_amd_inj.c
> @@ -15,6 +15,8 @@
>  #include <linux/device.h>
>  #include <linux/module.h>
>  #include <linux/cpu.h>
> +#include <linux/string.h>
> +#include <linux/uaccess.h>
>  #include <asm/mce.h>
>  
>  #include "mce_amd.h"
> @@ -27,6 +29,23 @@ static struct dentry *dfs_inj;
>  
>  static u8 n_banks;
>  
> +#define MAX_FLAG_OPT_SIZE	10

Why 10?

This should be 2 and increased when another, longer injection type
string gets introduced.

> +
> +enum injection_type {
> +	SW_INJ = 0,	/* SW injection, simply decode the error */
> +	HW_INJ,		/* Trigger a #MC */
> +	N_INJ_TYPES,
> +};
> +
> +static const char * const flags_options[] = {
> +	[SW_INJ] = "sw",
> +	[HW_INJ] = "hw",
> +	NULL
> +};


> +
> +/* Set default injection to SW_INJ */
> +enum injection_type inj_type = SW_INJ;
> +
>  #define MCE_INJECT_SET(reg)						\
>  static int inj_##reg##_set(void *data, u64 val)				\
>  {									\
> @@ -81,24 +100,72 @@ static int toggle_hw_mce_inject(unsigned int cpu, bool enable)
>  	return err;
>  }
>  
> -static int flags_get(void *data, u64 *val)
> +static int __set_inj(const char *buf, size_t cnt)
>  {
> -	struct mce *m = (struct mce *)data;
> +	int i;
> +	const char *inj_op = NULL;
>  
> -	*val = m->inject_flags;
> +	for (i = 0; i <= N_INJ_TYPES; i++) {
> +		inj_op = flags_options[i];
> +		if (inj_op && strncmp(inj_op, buf, cnt) == 0)
> +			break;
> +	}

What's wrong with:

	for (i = 0; i < N_INJ_TYPES; i++) {
		if (!strncmp(flags_options[i], buf, strlen(flags_options[i]))) {
			inj_type = i;
			return 0;
		}
	}

	return -EINVAL;

> +
> +	if (!inj_op)
> +		return -EINVAL;
> +
> +	inj_type = i;
>  
>  	return 0;
>  }
>  
> -static int flags_set(void *data, u64 val)
> +static ssize_t flags_read(struct file *filp, char __user *ubuf,
> +			  size_t cnt, loff_t *ppos)
>  {
> -	struct mce *m = (struct mce *)data;
> +	char buf[MAX_FLAG_OPT_SIZE + 1];
> +	int n;
>  
> -	m->inject_flags = (u8)val;
> -	return 0;
> +	n = sprintf(buf, "%s\n", flags_options[inj_type]);
> +
> +	return simple_read_from_buffer(ubuf, cnt, ppos, buf, n);
>  }
>  
> -DEFINE_SIMPLE_ATTRIBUTE(flags_fops, flags_get, flags_set, "%llu\n");
> +static ssize_t flags_write(struct file *filp, const char __user *ubuf,
> +			   size_t cnt, loff_t *ppos)
> +{
> +	char buf[MAX_FLAG_OPT_SIZE + 1];
> +	int err;
> +	size_t ret;
> +
> +	ret = cnt;

You're assigning cnt to ret here...

> +
> +	if (cnt > MAX_FLAG_OPT_SIZE)
> +		cnt = MAX_FLAG_OPT_SIZE;

... but correcting cnt afterwards. The assignment should be *after* that
correction.

> +
> +	if (copy_from_user(&buf, ubuf, cnt))
> +		return -EFAULT;
> +
> +	buf[cnt] = 0;
> +
> +	/* strip whitespaces.. */
> +	strstrip(buf);
> +
> +	err = __set_inj(buf, cnt - 1);
> +	if (err) {
> +		pr_err("%s: Invalid flags value: %s\n", __func__, buf);
> +		return err;
> +	}
> +
> +	*ppos += ret;
> +
> +	return ret;
> +}
> +
> +static const struct file_operations flags_fops = {
> +	.read           = flags_read,
> +	.write          = flags_write,
> +	.llseek         = generic_file_llseek,
> +};
>  
>  /*
>   * On which CPU to inject?
> @@ -130,7 +197,7 @@ static void do_inject(void)
>  	unsigned int cpu = i_mce.extcpu;
>  	u8 b = i_mce.bank;
>  
> -	if (!(i_mce.inject_flags & MCJ_EXCEPTION)) {
> +	if (inj_type == SW_INJ) {
>  		amd_decode_mce(NULL, 0, &i_mce);
>  		return;
>  	}
> -- 
> 2.4.0
> 
> 

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/6] edac, mce_amd_inj: Add capability to trigger apic interrupts
  2015-05-27 19:03 ` [PATCH 4/6] edac, mce_amd_inj: Add capability to trigger apic interrupts Aravind Gopalakrishnan
@ 2015-05-29 15:36   ` Borislav Petkov
  2015-05-29 18:27     ` Aravind Gopalakrishnan
  0 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2015-05-29 15:36 UTC (permalink / raw)
  To: Aravind Gopalakrishnan; +Cc: dougthompson, mchehab, linux-edac, linux-kernel

On Wed, May 27, 2015 at 02:03:36PM -0500, Aravind Gopalakrishnan wrote:
> With this extension to the flags attribute, deferred error interrupts
> and threshold interrupts can be triggered to test the apic interrupt
> handler functionality for these type of errors
> 
> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
> ---
>  drivers/edac/mce_amd_inj.c | 35 ++++++++++++++++++++++++++++++++++-
>  1 file changed, 34 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/edac/mce_amd_inj.c b/drivers/edac/mce_amd_inj.c
> index daec4af..f1c1433 100644
> --- a/drivers/edac/mce_amd_inj.c
> +++ b/drivers/edac/mce_amd_inj.c
> @@ -34,12 +34,16 @@ static u8 n_banks;
>  enum injection_type {
>  	SW_INJ = 0,	/* SW injection, simply decode the error */
>  	HW_INJ,		/* Trigger a #MC */
> +	DFR_INT_INJ,	/* Trigger Deferred error interrupt */
> +	THR_INT_INJ,	/* Trigger threshold interrupt */
>  	N_INJ_TYPES,
>  };
>  
>  static const char * const flags_options[] = {
>  	[SW_INJ] = "sw",
>  	[HW_INJ] = "hw",
> +	[DFR_INT_INJ] = "dfr",
> +	[THR_INT_INJ] = "thr",
>  	NULL
>  };
>  
> @@ -191,6 +195,16 @@ static void trigger_mce(void *info)
>  	asm volatile("int $18");
>  }
>  
> +static void trigger_dfr_int(void *info)
> +{
> +	asm volatile("int $244");
> +}
> +
> +static void trigger_thr_int(void *info)
> +{
> +	asm volatile("int $249");
> +}

Hardcoded naked numbers huh?

Guess what happens when someone changes DEFERRED_ERROR_VECTOR and
THRESHOLD_APIC_VECTOR.

> +
>  static void do_inject(void)
>  {
>  	u64 mcg_status = 0;
> @@ -202,6 +216,20 @@ static void do_inject(void)
>  		return;
>  	}
>  
> +	if (inj_type == DFR_INT_INJ) {
> +		/*
> +		 * Ensure necessary status bits for deferred errors:
> +		 * a. MCx_STATUS[Deferred] is set -
> +		 *	This is to ensure the error will be handled by the
> +		 *	interrupt handler
> +		 * b. unset MCx_STATUS[UC]
> +		 *	As deferred errors are _not_ UC
> +		 */
> +
> +		i_mce.status |= MCI_STATUS_DEFERRED;
> +		i_mce.status |= (i_mce.status & ~MCI_STATUS_UC);

		i_mce.status &= ~MCI_STATUS_UC;

> +	}
> +
>  	get_online_cpus();
>  	if (!cpu_online(cpu))
>  		goto err;
> @@ -228,7 +256,12 @@ static void do_inject(void)
>  
>  	toggle_hw_mce_inject(cpu, false);
>  
> -	smp_call_function_single(cpu, trigger_mce, NULL, 0);
> +	if (inj_type == DFR_INT_INJ)
> +		smp_call_function_single(cpu, trigger_dfr_int, NULL, 0);
> +	else if (inj_type == THR_INT_INJ)
> +		smp_call_function_single(cpu, trigger_thr_int, NULL, 0);
> +	else
> +		smp_call_function_single(cpu, trigger_mce, NULL, 0);

I guess a switch-case is kinda offering itself here...

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/6] edac, mce_amd_inj: Add README file
  2015-05-27 19:03 ` [PATCH 5/6] edac, mce_amd_inj: Add README file Aravind Gopalakrishnan
@ 2015-05-29 15:38   ` Borislav Petkov
  2015-05-29 18:29     ` Aravind Gopalakrishnan
  0 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2015-05-29 15:38 UTC (permalink / raw)
  To: Aravind Gopalakrishnan; +Cc: dougthompson, mchehab, linux-edac, linux-kernel

On Wed, May 27, 2015 at 02:03:37PM -0500, Aravind Gopalakrishnan wrote:
> Provides information about each file and the usages.
> 
> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
> ---
>  drivers/edac/mce_amd_inj.c | 68 +++++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 61 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/edac/mce_amd_inj.c b/drivers/edac/mce_amd_inj.c
> index f1c1433..ca5b29f 100644
> --- a/drivers/edac/mce_amd_inj.c
> +++ b/drivers/edac/mce_amd_inj.c
> @@ -291,17 +291,71 @@ MCE_INJECT_GET(bank);
>  
>  DEFINE_SIMPLE_ATTRIBUTE(bank_fops, inj_bank_get, inj_bank_set, "%llu\n");
>  
> +static const char readme_msg[] =
> +	"\nDescription of the files and their usages:\n\n"
> +	"status:  Set a value to be programmed into MCx_STATUS(bank)\n"
> +	"\t The status bits provide insight into the type of\n"
> +	"\t error that caused the MCE.\n\n"
> +	"misc:	 Set value of MCx_MISC(bank)\n"
> +	"\t misc register provides auxiliary info about the error. This\n"
> +	"\t register is typically used for error thresholding purpose and\n"
> +	"\t validity of the register is indicated by MCx_STATUS[MiscV]\n\n"
> +	"addr:	 Error address value to be written to MCx_ADDR(bank)\n"
> +	"\t This register is used to log address information associated\n"
> +	"\t with the error.\n\n"
> +	"Note:	 See respective BKDGs for the exact bit definitions of the\n"
> +	"\t above registers as they mirror the MCi_[STATUS | MISC | ADDR]\n"
> +	"\t hardware registers.\n\n"
> +	"bank:	 Specify the bank you want to inject the error into.\n"
> +	"\t The number of banks in a processor varies and is family/model\n"
> +	"\t dependent. So, a sanity check performed while writing.\n"
> +	"\t Writing to this file will trigger a #MC or APIC interrupts or\n"
> +	"\t invoke the error decoder routines for AMD processors. The value\n"
> +	"\t in 'flags' file decides which of above actions is triggered.\n\n"
> +	"flags:	 Write to this file to speficy the error injection policy.\n"
> +	"\t Allowed values:\n"
> +	"\t\t\"sw\"  -	SW error injection, Only calls error decoder\n"
> +	"\t\t\troutines to print error info in human readable format\n"
> +	"\t\t\"hw\"  -	HW error injection, Forces a #MC,\n"
> +	"\t\t\tcauses exception handler to handle the error\n"
> +	"\t\t\tif UC or poll handler catches it if CE\n"
> +	"\t\t\tWarning: Might cause system panic if MCx_STATUS[PCC]\n"
> +	"\t\t\tis set. For debug purposes, consider setting\n"
> +	"\t\t\t/<debugfs_mountpoint>/mce/fake_panic\n"
> +	"\t\t\"dfr\" -	Trigger APIC interrupt for Deferred error\n"
> +	"\t\t\tError is handled by deferred error apic handler if\n"
> +	"\t\t\tfeature is present in HW.\n"
> +	"\t\t\"thr\" -	Trigger APIC interrupt for threshold error\n"
> +	"\t\t\tError is handled by threshold apic handler\n\n"
> +	"cpu:	 The cpu to inject the error on.\n\n"
> +;

This one needs to be split in two - the second one adding the readme file...

> +static ssize_t
> +inj_readme_read(struct file *filp, char __user *ubuf,
> +		       size_t cnt, loff_t *ppos)
> +{
> +	return simple_read_from_buffer(ubuf, cnt, ppos,
> +					readme_msg, strlen(readme_msg));
> +}
> +
> +static const struct file_operations readme_fops = {
> +	.read		= inj_readme_read,
> +};
> +
>  static struct dfs_node {
>  	char *name;
>  	struct dentry *d;
>  	const struct file_operations *fops;
> +	umode_t perm;
>  } dfs_fls[] = {
> -	{ .name = "status",	.fops = &status_fops },
> -	{ .name = "misc",	.fops = &misc_fops },
> -	{ .name = "addr",	.fops = &addr_fops },
> -	{ .name = "bank",	.fops = &bank_fops },
> -	{ .name = "flags",	.fops = &flags_fops },
> -	{ .name = "cpu",	.fops = &extcpu_fops },
> +	{ .name = "status",	.fops = &status_fops,	S_IRUSR | S_IWUSR },
> +	{ .name = "misc",	.fops = &misc_fops,	S_IRUSR | S_IWUSR },
> +	{ .name = "addr",	.fops = &addr_fops,	S_IRUSR | S_IWUSR },
> +	{ .name = "bank",	.fops = &bank_fops,	S_IRUSR | S_IWUSR },
> +	{ .name = "flags",	.fops = &flags_fops,	S_IRUSR | S_IWUSR },
> +	{ .name = "cpu",	.fops = &extcpu_fops,	S_IRUSR | S_IWUSR },
> +	{ .name = "README",	.fops = &readme_fops,	S_IRUSR | S_IRGRP |
> +							S_IROTH },

... and the first one adding perm to struct dfs_node.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/6] edac, mce_amd_inj: Inject errors on NBC for bank 4 errors
  2015-05-27 19:03 ` [PATCH 6/6] edac, mce_amd_inj: Inject errors on NBC for bank 4 errors Aravind Gopalakrishnan
@ 2015-05-29 16:00   ` Borislav Petkov
  2015-05-29 18:52     ` Aravind Gopalakrishnan
  0 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2015-05-29 16:00 UTC (permalink / raw)
  To: Aravind Gopalakrishnan; +Cc: dougthompson, mchehab, linux-edac, linux-kernel

On Wed, May 27, 2015 at 02:03:38PM -0500, Aravind Gopalakrishnan wrote:
> For bank 4 errors, MCE logging and reporting is done only on
> node base cores. Refer D18F3x44[NbMcaToMstCpuEn] field in
> Fam10h and later BKDGs.
> 
> This patch ensures that we inject the error on the node base core
> for bank 4 errors. Otherwise, triggering #MC or apic interrupts on
> a non node base core would not have any effect on the system.
> (i.e), we would not see any relevant output on kernel logs for
> the error we just injected.
> 
> Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
> ---
>  drivers/edac/mce_amd_inj.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 76 insertions(+)
> 
> diff --git a/drivers/edac/mce_amd_inj.c b/drivers/edac/mce_amd_inj.c
> index ca5b29f..bd44ba4 100644
> --- a/drivers/edac/mce_amd_inj.c
> +++ b/drivers/edac/mce_amd_inj.c
> @@ -17,9 +17,12 @@
>  #include <linux/cpu.h>
>  #include <linux/string.h>
>  #include <linux/uaccess.h>
> +#include <linux/pci.h>
>  #include <asm/mce.h>
> +#include <asm/amd_nb.h>
>  
>  #include "mce_amd.h"
> +#include "amd64_edac.h"
>  
>  /*
>   * Collect all the MCi_XXX settings
> @@ -205,6 +208,65 @@ static void trigger_thr_int(void *info)
>  	asm volatile("int $249");
>  }
>  
> +static u32 amd_get_num_nodes(void)
> +{
> +	u32 nodes = 1;
> +
> +	if (cpu_has_topoext) {
> +		u32 ecx;
> +
> +		ecx = cpuid_ecx(0x8000001e);
> +		nodes = ((ecx >> 8) & 7) + 1;
> +	} else if (static_cpu_has(X86_FEATURE_NODEID_MSR)) {
> +		u64 value;
> +
> +		rdmsrl(MSR_FAM10H_NODE_ID, value);
> +		nodes = ((value >> 3) & 7) + 1;
> +	}

So we already do that (and more) in amd_get_topology(). I'm thinking
you should take this function out of the CONFIG_X86_HT ifdeffery (also
in its caller amd_detect_cmp()) and you should save "nodes" in a local
static variable called nodes_per_processor and a small accessor called
amd_get_nodes_cnt() should return it. Similar to amd_get_nb_id().

Don't forget to add a comment explaning what that nodes_per_processor
means.

And then amd_mce_inj.c will simply use it instead of duplicating that
information here.

Please do that in 2 pre-patches.

> +
> +	return nodes;
> +}
> +
> +static u32 amd_get_nbc_for_node(int node_id)
> +{
> +	struct cpuinfo_x86 *c = &boot_cpu_data;
> +	u32 nodes, cores_per_node;
> +
> +	nodes = amd_get_num_nodes();
> +	cores_per_node = c->x86_max_cores / nodes;
> +
> +	return cores_per_node * node_id;
> +}
> +
> +static void toggle_nb_mca_mst_cpu(u16 nid)
> +{
> +	struct pci_dev *F3 = node_to_amd_nb(nid)->misc;
> +	u32 val;
> +	int err;
> +
> +	if (!F3)
> +		return;
> +
> +	err = pci_read_config_dword(F3, NBCFG, &val);
> +	if (err) {
> +		pr_err("%s: Error reading F%dx%03x.\n", __func__,
> +			PCI_FUNC(F3->devfn),
> +			NBCFG);
> +		return;
> +	}
> +
> +	if (!(val & BIT(27))) {
> +		pr_err("%s: BIOS not setting D18F3x44[NbMcaToMstCpuEn]."
> +		       "Doing that here\n", __func__);

WARNING: quoted string split across lines
#99: FILE: drivers/edac/mce_amd_inj.c:260:
+               pr_err("%s: BIOS not setting D18F3x44[NbMcaToMstCpuEn]."
+                      "Doing that here\n", __func__);

Do integrate checkpatch.pl into your workflow. It is sometimes right.

> +		val |= BIT(27);
> +		err = pci_write_config_dword(F3, NBCFG, val);
> +		if (err)
> +			pr_err("%s: Error writing F%dx%03x.\n", __func__,
> +				PCI_FUNC(F3->devfn),
> +				NBCFG);
> +	}
> +}

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/6] edac, mce_amd_inj: Modify flags attrigute to use string arguments
  2015-05-29 13:49   ` Borislav Petkov
@ 2015-05-29 18:20     ` Aravind Gopalakrishnan
  2015-05-29 19:05       ` Borislav Petkov
  2015-06-01 19:16     ` Aravind Gopalakrishnan
  1 sibling, 1 reply; 21+ messages in thread
From: Aravind Gopalakrishnan @ 2015-05-29 18:20 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: dougthompson, mchehab, linux-edac, linux-kernel

On 5/29/2015 8:49 AM, Borislav Petkov wrote:
> On Wed, May 27, 2015 at 02:03:35PM -0500, Aravind Gopalakrishnan wrote:
>>   
>> +#define MAX_FLAG_OPT_SIZE	10
> Why 10?

No specific reason. Just an arbitrary max value that we won't hit right 
now or in the future.

> This should be 2 and increased when another, longer injection type
> string gets introduced.

Okay, I'll make it 2 here and will increase it to 3 in the next patch 
when I introduce injection types for the interrupts.

>
>>   
>> -	*val = m->inject_flags;
>> +	for (i = 0; i <= N_INJ_TYPES; i++) {
>> +		inj_op = flags_options[i];
>> +		if (inj_op && strncmp(inj_op, buf, cnt) == 0)
>> +			break;
>> +	}
> What's wrong with:
>
> 	for (i = 0; i < N_INJ_TYPES; i++) {
> 		if (!strncmp(flags_options[i], buf, strlen(flags_options[i]))) {
> 			inj_type = i;
> 			return 0;
> 		}
> 	}
>
> 	return -EINVAL;

Hmm. That should work. Will simplify it in the next version.
I think I had a NULL in flags_options[] to denote the invalid option.
With your simplification, that can be removed. I'll make that change too.

And the 'cnt' value that I used won't be necessary. So, will remove that 
from __set_inj() definition as well.

>> +static ssize_t flags_write(struct file *filp, const char __user *ubuf,
>> +			   size_t cnt, loff_t *ppos)
>> +{
>> +	char buf[MAX_FLAG_OPT_SIZE + 1];
>> +	int err;
>> +	size_t ret;
>> +
>> +	ret = cnt;
> You're assigning cnt to ret here...
>
>> +
>> +	if (cnt > MAX_FLAG_OPT_SIZE)
>> +		cnt = MAX_FLAG_OPT_SIZE;
> ... but correcting cnt afterwards. The assignment should be *after* that
> correction.

With the change to MAX_FLAG_OPT_SIZE to 2 (or 3 depending on the max 
size), this check is
really an error check.

Shall I make err = -EINVAL by default and return that here if the 
condition is not satisfied?

Thanks,
-Aravind.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/6] edac, mce_amd_inj: Add capability to trigger apic interrupts
  2015-05-29 15:36   ` Borislav Petkov
@ 2015-05-29 18:27     ` Aravind Gopalakrishnan
  2015-05-29 19:18       ` Borislav Petkov
  0 siblings, 1 reply; 21+ messages in thread
From: Aravind Gopalakrishnan @ 2015-05-29 18:27 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: dougthompson, mchehab, linux-edac, linux-kernel

On 5/29/2015 10:36 AM, Borislav Petkov wrote:
> On Wed, May 27, 2015 at 02:03:36PM -0500, Aravind Gopalakrishnan wrote:
>>   
>> +static void trigger_dfr_int(void *info)
>> +{
>> +	asm volatile("int $244");
>> +}
>> +
>> +static void trigger_thr_int(void *info)
>> +{
>> +	asm volatile("int $249");
>> +}
> Hardcoded naked numbers huh?
>
> Guess what happens when someone changes DEFERRED_ERROR_VECTOR and
> THRESHOLD_APIC_VECTOR.

Right. Sorry about that.
Fixed it thusly:
         u8 dfr_vec = DEFERRED_ERROR_VECTOR;
         asm volatile("int %0"
                      :: "n" (dfr_vec));

and similar for threshold interrupt as well.

Tested the above and it seems to work fine.

>   
> -	smp_call_function_single(cpu, trigger_mce, NULL, 0);
> +	if (inj_type == DFR_INT_INJ)
> +		smp_call_function_single(cpu, trigger_dfr_int, NULL, 0);
> +	else if (inj_type == THR_INT_INJ)
> +		smp_call_function_single(cpu, trigger_thr_int, NULL, 0);
> +	else
> +		smp_call_function_single(cpu, trigger_mce, NULL, 0);
> I guess a switch-case is kinda offering itself here...
>

Ok, will switch it:)

Thanks,
-Aravind.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 5/6] edac, mce_amd_inj: Add README file
  2015-05-29 15:38   ` Borislav Petkov
@ 2015-05-29 18:29     ` Aravind Gopalakrishnan
  0 siblings, 0 replies; 21+ messages in thread
From: Aravind Gopalakrishnan @ 2015-05-29 18:29 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: dougthompson, mchehab, linux-edac, linux-kernel

On 5/29/2015 10:38 AM, Borislav Petkov wrote:
>
> This one needs to be split in two - the second one adding the readme file...

Ack.

>> +static ssize_t
>> +inj_readme_read(struct file *filp, char __user *ubuf,
>> +		       size_t cnt, loff_t *ppos)
>> +{
>> +	return simple_read_from_buffer(ubuf, cnt, ppos,
>> +					readme_msg, strlen(readme_msg));
>> +}
>> +
>> +static const struct file_operations readme_fops = {
>> +	.read		= inj_readme_read,
>> +};
>> +
>>   static struct dfs_node {
>>   	char *name;
>>   	struct dentry *d;
>>   	const struct file_operations *fops;
>> +	umode_t perm;
>>   } dfs_fls[] = {
>> -	{ .name = "status",	.fops = &status_fops },
>> -	{ .name = "misc",	.fops = &misc_fops },
>> -	{ .name = "addr",	.fops = &addr_fops },
>> -	{ .name = "bank",	.fops = &bank_fops },
>> -	{ .name = "flags",	.fops = &flags_fops },
>> -	{ .name = "cpu",	.fops = &extcpu_fops },
>> +	{ .name = "status",	.fops = &status_fops,	S_IRUSR | S_IWUSR },
>> +	{ .name = "misc",	.fops = &misc_fops,	S_IRUSR | S_IWUSR },
>> +	{ .name = "addr",	.fops = &addr_fops,	S_IRUSR | S_IWUSR },
>> +	{ .name = "bank",	.fops = &bank_fops,	S_IRUSR | S_IWUSR },
>> +	{ .name = "flags",	.fops = &flags_fops,	S_IRUSR | S_IWUSR },
>> +	{ .name = "cpu",	.fops = &extcpu_fops,	S_IRUSR | S_IWUSR },
>> +	{ .name = "README",	.fops = &readme_fops,	S_IRUSR | S_IRGRP |
>> +							S_IROTH },
> ... and the first one adding perm to struct dfs_node.
>

Will do.

Thanks,
-Aravind.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/6] edac, mce_amd_inj: Inject errors on NBC for bank 4 errors
  2015-05-29 16:00   ` Borislav Petkov
@ 2015-05-29 18:52     ` Aravind Gopalakrishnan
  2015-05-29 19:23       ` Borislav Petkov
  0 siblings, 1 reply; 21+ messages in thread
From: Aravind Gopalakrishnan @ 2015-05-29 18:52 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: dougthompson, mchehab, linux-edac, linux-kernel

On 5/29/2015 11:00 AM, Borislav Petkov wrote:
> On Wed, May 27, 2015 at 02:03:38PM -0500, Aravind Gopalakrishnan wrote:
>>   
>> +static u32 amd_get_num_nodes(void)
>> +{
>> +	u32 nodes = 1;
>> +
>> +	if (cpu_has_topoext) {
>> +		u32 ecx;
>> +
>> +		ecx = cpuid_ecx(0x8000001e);
>> +		nodes = ((ecx >> 8) & 7) + 1;
>> +	} else if (static_cpu_has(X86_FEATURE_NODEID_MSR)) {
>> +		u64 value;
>> +
>> +		rdmsrl(MSR_FAM10H_NODE_ID, value);
>> +		nodes = ((value >> 3) & 7) + 1;
>> +	}
> So we already do that (and more) in amd_get_topology(). I'm thinking
> you should take this function out of the CONFIG_X86_HT ifdeffery (also
> in its caller amd_detect_cmp()) and you should save "nodes" in a local
> static variable called nodes_per_processor and a small accessor called
> amd_get_nodes_cnt() should return it. Similar to amd_get_nb_id().

  I can remove the #ifdefs, but I'm wondering why it was there to begin 
with..
CONFIG_X86_HT defaults to 'Y' anyway right?

And OK, will add the function in a separate pre-patch.

> Don't forget to add a comment explaning what that nodes_per_processor
> means.

Will do.

> And then amd_mce_inj.c will simply use it instead of duplicating that
> information here.
>
> Please do that in 2 pre-patches.

Will do.

>> +
>> +	if (!(val & BIT(27))) {
>> +		pr_err("%s: BIOS not setting D18F3x44[NbMcaToMstCpuEn]."
>> +		       "Doing that here\n", __func__);
> WARNING: quoted string split across lines
> #99: FILE: drivers/edac/mce_amd_inj.c:260:
> +               pr_err("%s: BIOS not setting D18F3x44[NbMcaToMstCpuEn]."
> +                      "Doing that here\n", __func__);
>
> Do integrate checkpatch.pl into your workflow. It is sometimes right.

Yeah, I do run checkpatch. With this, I think the line was going above 
79 chars. So split it.
I thought we split such error messages right?
(amd64_edac and mce_amd.c for example have such instances)


Thanks,
-Aravind.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/6] edac, mce_amd_inj: Modify flags attrigute to use string arguments
  2015-05-29 18:20     ` Aravind Gopalakrishnan
@ 2015-05-29 19:05       ` Borislav Petkov
  2015-05-29 19:12         ` Aravind Gopalakrishnan
  0 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2015-05-29 19:05 UTC (permalink / raw)
  To: Aravind Gopalakrishnan; +Cc: dougthompson, mchehab, linux-edac, linux-kernel

On Fri, May 29, 2015 at 01:20:31PM -0500, Aravind Gopalakrishnan wrote:
> Hmm. That should work. Will simplify it in the next version.
> I think I had a NULL in flags_options[] to denote the invalid option.

Still a good thing to have it as a terminator. But the loop I proposed
doesn't even touch it: i < N_INJ_TYPES. I'd leave the NULL array
terminator.

> Shall I make err = -EINVAL by default and return that here if the
> condition is not satisfied?

We have to make sure to not be overeager to error out - this'll make
using this facility a bit more cumbersome.

So I think your approach was correct to trim it to MAX_FLAG_OPT_SIZE.
__set_inj() will catch any string mismatches anyway.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/6] edac, mce_amd_inj: Modify flags attrigute to use string arguments
  2015-05-29 19:05       ` Borislav Petkov
@ 2015-05-29 19:12         ` Aravind Gopalakrishnan
  0 siblings, 0 replies; 21+ messages in thread
From: Aravind Gopalakrishnan @ 2015-05-29 19:12 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: dougthompson, mchehab, linux-edac, linux-kernel

On 5/29/2015 2:05 PM, Borislav Petkov wrote:
> On Fri, May 29, 2015 at 01:20:31PM -0500, Aravind Gopalakrishnan wrote:
>> Hmm. That should work. Will simplify it in the next version.
>> I think I had a NULL in flags_options[] to denote the invalid option.
> Still a good thing to have it as a terminator. But the loop I proposed
> doesn't even touch it: i < N_INJ_TYPES.

Which was exactly why I was thinking of removing it. But..

> I'd leave the NULL array
> terminator.

.. OK. Shall retain.

>> Shall I make err = -EINVAL by default and return that here if the
>> condition is not satisfied?
> We have to make sure to not be overeager to error out - this'll make
> using this facility a bit more cumbersome.
>
> So I think your approach was correct to trim it to MAX_FLAG_OPT_SIZE.
> __set_inj() will catch any string mismatches anyway.
>

OK then, Will retain the original method,
and move the 'ret = cnt' assignment after this check. (Forgot to ack 
that in earlier reply)

Thanks,
-Aravind.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4/6] edac, mce_amd_inj: Add capability to trigger apic interrupts
  2015-05-29 18:27     ` Aravind Gopalakrishnan
@ 2015-05-29 19:18       ` Borislav Petkov
  0 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2015-05-29 19:18 UTC (permalink / raw)
  To: Aravind Gopalakrishnan; +Cc: dougthompson, mchehab, linux-edac, linux-kernel

On Fri, May 29, 2015 at 01:27:48PM -0500, Aravind Gopalakrishnan wrote:
> Fixed it thusly:
>         u8 dfr_vec = DEFERRED_ERROR_VECTOR;
>         asm volatile("int %0"
>                      :: "n" (dfr_vec));

asm volatile("int %0" :: "i" (DEFERRED_ERROR_VECTOR));

should be just fine.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/6] edac, mce_amd_inj: Inject errors on NBC for bank 4 errors
  2015-05-29 18:52     ` Aravind Gopalakrishnan
@ 2015-05-29 19:23       ` Borislav Petkov
  2015-05-29 19:27         ` Aravind Gopalakrishnan
  0 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2015-05-29 19:23 UTC (permalink / raw)
  To: Aravind Gopalakrishnan; +Cc: dougthompson, mchehab, linux-edac, linux-kernel

On Fri, May 29, 2015 at 01:52:32PM -0500, Aravind Gopalakrishnan wrote:
> Yeah, I do run checkpatch. With this, I think the line was going above 79
> chars. So split it.

Yeah, don't take checkpatch too seriously - only as a hint. Common sense
should be employed, instead. :)

> I thought we split such error messages right?

Right, so string error messages we do not split for easier grepping.

> (amd64_edac and mce_amd.c for example have such instances)

Yeah, those will have to be fixed gradually when we're touching the code
anyway, at some point.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 6/6] edac, mce_amd_inj: Inject errors on NBC for bank 4 errors
  2015-05-29 19:23       ` Borislav Petkov
@ 2015-05-29 19:27         ` Aravind Gopalakrishnan
  0 siblings, 0 replies; 21+ messages in thread
From: Aravind Gopalakrishnan @ 2015-05-29 19:27 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: dougthompson, mchehab, linux-edac, linux-kernel

On 5/29/2015 2:23 PM, Borislav Petkov wrote:
>
> Right, so string error messages we do not split for easier grepping.

OK, makes sense. Will write it out across one line in V2.

>> (amd64_edac and mce_amd.c for example have such instances)
> Yeah, those will have to be fixed gradually when we're touching the code
> anyway, at some point.
>

Thanks,
-Aravind.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/6] edac, mce_amd_inj: Modify flags attrigute to use string arguments
  2015-05-29 13:49   ` Borislav Petkov
  2015-05-29 18:20     ` Aravind Gopalakrishnan
@ 2015-06-01 19:16     ` Aravind Gopalakrishnan
  1 sibling, 0 replies; 21+ messages in thread
From: Aravind Gopalakrishnan @ 2015-06-01 19:16 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: dougthompson, mchehab, linux-edac, linux-kernel

On 5/29/2015 8:49 AM, Borislav Petkov wrote:
> On Wed, May 27, 2015 at 02:03:35PM -0500, Aravind Gopalakrishnan wrote:
>>   
>> +#define MAX_FLAG_OPT_SIZE	10
> Why 10?
>
> This should be 2 and increased when another, longer injection type
> string gets introduced.
>

So I hit an issue when I have this as 2 and moving 'ret = cnt' statement 
(in flags_write() below) to after the check for MAX_FLAG_OPT_SIZE.

The problem was, if I did echo hw > flags;
I would get an error on dmesg like so-
[   78.692949] flags_write: Invalid flags value:

But the write would have actually gone through. i.e, if we do cat flags, 
the output was 'hw'.

The issue seems to be that 'cnt' of flags_write() is 3 (after accounting 
for a terminating NULL) when we enter the function (for this case),

Since we move 'ret = cnt;' statement to after the check, we end up 
returning 2.
And since it is less than 'cnt', we re-enter flags_write() for the final 
NULL character.
At this point, our comparisons with flags_options[] fail and we end up 
returning -EINVAL from __set_inj().
Hence the error on dmesg..

The fix to this was simply having MAX_FLAG_OPT_SIZE as 3. In this case, 
we already account for the NULL and flags_write()
returns the correct value upon success.
So I shall go ahead and make that change to MAX_FLAG_OPT_SIZE as 3 and 
increase it in subsequent patch for the apic interrupts.

Thanks,
-Aravind.


>>   
>> -DEFINE_SIMPLE_ATTRIBUTE(flags_fops, flags_get, flags_set, "%llu\n");
>> +static ssize_t flags_write(struct file *filp, const char __user *ubuf,
>> +			   size_t cnt, loff_t *ppos)
>> +{
>> +	char buf[MAX_FLAG_OPT_SIZE + 1];
>> +	int err;
>> +	size_t ret;
>> +
>> +	ret = cnt;
> You're assigning cnt to ret here...
>
>> +
>> +	if (cnt > MAX_FLAG_OPT_SIZE)
>> +		cnt = MAX_FLAG_OPT_SIZE;
> ... but correcting cnt afterwards. The assignment should be *after* that
> correction.
>
>> +
>> +	if (copy_from_user(&buf, ubuf, cnt))
>> +		return -EFAULT;
>> +
>> +	buf[cnt] = 0;
>> +
>> +	/* strip whitespaces.. */
>> +	strstrip(buf);
>> +
>> +	err = __set_inj(buf, cnt - 1);
>> +	if (err) {
>> +		pr_err("%s: Invalid flags value: %s\n", __func__, buf);
>> +		return err;
>> +	}
>> +
>> +	*ppos += ret;
>> +
>> +	return ret;
>> +}
>> +
>>


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2015-06-01 19:17 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-27 19:03 [PATCH 0/6] Updates to EDAC mce_amd_inj Aravind Gopalakrishnan
2015-05-27 19:03 ` [PATCH 1/6] edac, mce_amd_inj: Use MCE_INJECT_GET for bank Aravind Gopalakrishnan
2015-05-27 19:03 ` [PATCH 2/6] edac, mce_amd_inj: Rework sanity check for inj_bank_set Aravind Gopalakrishnan
2015-05-27 19:03 ` [PATCH 3/6] edac, mce_amd_inj: Modify flags attrigute to use string arguments Aravind Gopalakrishnan
2015-05-29 13:49   ` Borislav Petkov
2015-05-29 18:20     ` Aravind Gopalakrishnan
2015-05-29 19:05       ` Borislav Petkov
2015-05-29 19:12         ` Aravind Gopalakrishnan
2015-06-01 19:16     ` Aravind Gopalakrishnan
2015-05-27 19:03 ` [PATCH 4/6] edac, mce_amd_inj: Add capability to trigger apic interrupts Aravind Gopalakrishnan
2015-05-29 15:36   ` Borislav Petkov
2015-05-29 18:27     ` Aravind Gopalakrishnan
2015-05-29 19:18       ` Borislav Petkov
2015-05-27 19:03 ` [PATCH 5/6] edac, mce_amd_inj: Add README file Aravind Gopalakrishnan
2015-05-29 15:38   ` Borislav Petkov
2015-05-29 18:29     ` Aravind Gopalakrishnan
2015-05-27 19:03 ` [PATCH 6/6] edac, mce_amd_inj: Inject errors on NBC for bank 4 errors Aravind Gopalakrishnan
2015-05-29 16:00   ` Borislav Petkov
2015-05-29 18:52     ` Aravind Gopalakrishnan
2015-05-29 19:23       ` Borislav Petkov
2015-05-29 19:27         ` Aravind Gopalakrishnan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).