linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/4] Call memory_failure() on Deferred errors
@ 2017-03-20 20:26 Yazen Ghannam
  2017-03-20 20:26 ` [PATCH v2 1/4] EDAC,mce_amd: Find node ID on SMCA systems using generic methods Yazen Ghannam
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Yazen Ghannam @ 2017-03-20 20:26 UTC (permalink / raw)
  To: linux-edac; +Cc: Yazen Ghannam, Tony Luck, Borislav Petkov, x86, linux-kernel

From: Yazen Ghannam <yazen.ghannam@amd.com>

This set is based on an earlier 3 patch set.

Patch 1:
- Address comments by using cpu_to_node() when finding a node ID rather
than amd_get_nb_id().
Link: http://lkml.kernel.org/r/1486760120-60944-1-git-send-email-Yazen.Ghannam@amd.com

Patch 2:
- Fix up commit message.
Link: http://lkml.kernel.org/r/1486760120-60944-2-git-send-email-Yazen.Ghannam@amd.com

Patches 3 and 4:
- Redo Patch 3 from old set.
- After looking at the CEC patch I got the idea to use the SRAO notifier
block instead of calling memory_failure() from the AMD Deferred error
interrupt handler.
Link: http://lkml.kernel.org/r/1486760120-60944-3-git-send-email-Yazen.Ghannam@amd.com

Yazen Ghannam (4):
  EDAC,mce_amd: Find node ID on SMCA systems using generic methods
  x86/mce/AMD; EDAC,amd64: Move find_umc_channel() to AMD mcheck
  x86/mce/AMD: Mark Deferred errors as Action Optional on SMCA systems
  x86/mce: Add AMD SMCA support to SRAO notifier

 arch/x86/include/asm/mce.h                |  2 ++
 arch/x86/kernel/cpu/mcheck/mce-severity.c |  6 +++-
 arch/x86/kernel/cpu/mcheck/mce.c          | 52 ++++++++++++++++++++++++-------
 arch/x86/kernel/cpu/mcheck/mce_amd.c      | 25 +++++++++++++++
 drivers/edac/amd64_edac.c                 | 20 +-----------
 drivers/edac/mce_amd.c                    |  2 +-
 6 files changed, 74 insertions(+), 33 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2 1/4] EDAC,mce_amd: Find node ID on SMCA systems using generic methods
  2017-03-20 20:26 [PATCH v2 0/4] Call memory_failure() on Deferred errors Yazen Ghannam
@ 2017-03-20 20:26 ` Yazen Ghannam
  2017-03-20 20:26 ` [PATCH v2 2/4] x86/mce/AMD; EDAC,amd64: Move find_umc_channel() to AMD mcheck Yazen Ghannam
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Yazen Ghannam @ 2017-03-20 20:26 UTC (permalink / raw)
  To: linux-edac; +Cc: Yazen Ghannam, Tony Luck, Borislav Petkov, x86, linux-kernel

From: Yazen Ghannam <yazen.ghannam@amd.com>

We should move away from using AMD-specific amd_get_nb_id() to find a node
ID and move toward using generic Linux methods. We can use cpu_to_node()
since NUMA should be working as expected on newly released Fam17h systems.

Replace call to amd_get_nb_id() and related shifting by a call to
cpu_to_node() which works as expected.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link: http://lkml.kernel.org/r/1486760120-60944-1-git-send-email-Yazen.Ghannam@amd.com

v1->v2:
- Just use cpu_to_node() instead of trying to make amd_get_nb_id() work.

 drivers/edac/mce_amd.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index ba35b7e..dbccf40 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -878,12 +878,8 @@ static void decode_smca_errors(struct mce *m)
 		pr_cont("%s.\n", smca_mce_descs[bank_type].descs[xec]);
 	}
 
-	/*
-	 * amd_get_nb_id() returns the last level cache id.
-	 * The last level cache on Fam17h is 1 level below the node.
-	 */
 	if (bank_type == SMCA_UMC && xec == 0 && decode_dram_ecc)
-		decode_dram_ecc(amd_get_nb_id(m->extcpu) >> 1, m);
+		decode_dram_ecc(cpu_to_node(m->extcpu), m);
 }
 
 static inline void amd_decode_err_code(u16 ec)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 2/4] x86/mce/AMD; EDAC,amd64: Move find_umc_channel() to AMD mcheck
  2017-03-20 20:26 [PATCH v2 0/4] Call memory_failure() on Deferred errors Yazen Ghannam
  2017-03-20 20:26 ` [PATCH v2 1/4] EDAC,mce_amd: Find node ID on SMCA systems using generic methods Yazen Ghannam
@ 2017-03-20 20:26 ` Yazen Ghannam
  2017-03-22 21:16   ` Borislav Petkov
  2017-03-20 20:26 ` [PATCH v2 3/4] x86/mce/AMD: Mark Deferred errors as Action Optional on SMCA systems Yazen Ghannam
  2017-03-20 20:26 ` [PATCH v2 4/4] x86/mce: Add AMD SMCA support to SRAO notifier Yazen Ghannam
  3 siblings, 1 reply; 10+ messages in thread
From: Yazen Ghannam @ 2017-03-20 20:26 UTC (permalink / raw)
  To: linux-edac; +Cc: Yazen Ghannam, Tony Luck, Borislav Petkov, x86, linux-kernel

We need to find a UMC's channel number from mcheck code when translating
UMC normalized addresses to system physical addresses. So move the function
there from EDAC.

Also, drop the struct pvt from the function parameters since we don't use
it. And add a sanity check to make sure we're only looking at UMCs in case
the UMC instance IDs ever match up with other bank types.

Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
---
Link: http://lkml.kernel.org/r/1486760120-60944-2-git-send-email-Yazen.Ghannam@amd.com

v1->v2:
- Redo commit message based on comments.
- Add UMC bank type sanity check.

 arch/x86/include/asm/mce.h           |  2 ++
 arch/x86/kernel/cpu/mcheck/mce_amd.c | 21 +++++++++++++++++++++
 drivers/edac/amd64_edac.c            | 20 +-------------------
 3 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index e638736..1ac261c 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -258,9 +258,11 @@ static inline void cmci_recheck(void) {}
 #ifdef CONFIG_X86_MCE_AMD
 void mce_amd_feature_init(struct cpuinfo_x86 *c);
 int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr);
+int find_umc_channel(struct mce *m);
 #else
 static inline void mce_amd_feature_init(struct cpuinfo_x86 *c) { }
 static inline int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr) { return -EINVAL; };
+static inline int find_umc_channel(struct mce *m) { return -EINVAL; };
 #endif
 
 int mce_available(struct cpuinfo_x86 *c);
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 524cc57..10fddcc 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -755,6 +755,27 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr)
 }
 EXPORT_SYMBOL_GPL(umc_normaddr_to_sysaddr);
 
+/*
+ * To find the UMC channel represented by this bank we need to match on its
+ * instance_id. The instance_id of a bank is held in the lower 32 bits of its
+ * IPID.
+ */
+int find_umc_channel(struct mce *m)
+{
+	u32 umc_instance_id[] = {0x50f00, 0x150f00};
+	u32 instance_id = m->ipid & GENMASK(31, 0);
+	int i, channel = -EINVAL;
+
+	if (smca_banks[m->bank].hwid &&
+	    smca_banks[m->bank].hwid->bank_type == SMCA_UMC)
+		for (i = 0; i < ARRAY_SIZE(umc_instance_id); i++)
+			if (umc_instance_id[i] == instance_id)
+				channel = i;
+
+	return channel;
+}
+EXPORT_SYMBOL_GPL(find_umc_channel);
+
 static void
 __log_error(unsigned int bank, bool deferred_err, bool threshold_err, u64 misc)
 {
diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 82dab16..11f973d 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2447,24 +2447,6 @@ static inline void decode_bus_error(int node_id, struct mce *m)
 	__log_ecc_error(mci, &err, ecc_type);
 }
 
-/*
- * To find the UMC channel represented by this bank we need to match on its
- * instance_id. The instance_id of a bank is held in the lower 32 bits of its
- * IPID.
- */
-static int find_umc_channel(struct amd64_pvt *pvt, struct mce *m)
-{
-	u32 umc_instance_id[] = {0x50f00, 0x150f00};
-	u32 instance_id = m->ipid & GENMASK(31, 0);
-	int i, channel = -1;
-
-	for (i = 0; i < ARRAY_SIZE(umc_instance_id); i++)
-		if (umc_instance_id[i] == instance_id)
-			channel = i;
-
-	return channel;
-}
-
 static void decode_umc_error(int node_id, struct mce *m)
 {
 	u8 ecc_type = (m->status >> 45) & 0x3;
@@ -2484,7 +2466,7 @@ static void decode_umc_error(int node_id, struct mce *m)
 	if (m->status & MCI_STATUS_DEFERRED)
 		ecc_type = 3;
 
-	err.channel = find_umc_channel(pvt, m);
+	err.channel = find_umc_channel(m);
 	if (err.channel < 0) {
 		err.err_code = ERR_CHANNEL;
 		goto log_error;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 3/4] x86/mce/AMD: Mark Deferred errors as Action Optional on SMCA systems
  2017-03-20 20:26 [PATCH v2 0/4] Call memory_failure() on Deferred errors Yazen Ghannam
  2017-03-20 20:26 ` [PATCH v2 1/4] EDAC,mce_amd: Find node ID on SMCA systems using generic methods Yazen Ghannam
  2017-03-20 20:26 ` [PATCH v2 2/4] x86/mce/AMD; EDAC,amd64: Move find_umc_channel() to AMD mcheck Yazen Ghannam
@ 2017-03-20 20:26 ` Yazen Ghannam
  2017-03-20 20:26 ` [PATCH v2 4/4] x86/mce: Add AMD SMCA support to SRAO notifier Yazen Ghannam
  3 siblings, 0 replies; 10+ messages in thread
From: Yazen Ghannam @ 2017-03-20 20:26 UTC (permalink / raw)
  To: linux-edac; +Cc: Yazen Ghannam, Tony Luck, Borislav Petkov, x86, linux-kernel

From: Yazen Ghannam <yazen.ghannam@amd.com>

Give Deferred errors an Action Optional severity on SMCA systems so that
the SRAO notifier block can potentially handle them.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link: http://lkml.kernel.org/r/1486760120-60944-3-git-send-email-Yazen.Ghannam@amd.com

v1->v2:
- New in v2. Based on v1 patch 3.
- Use mce_severity() in AMD interrupt handlers.
- Set proper severity so we can use notifier chain instead of calling
memory_failure() directly.

 arch/x86/kernel/cpu/mcheck/mce-severity.c | 6 +++++-
 arch/x86/kernel/cpu/mcheck/mce_amd.c      | 4 ++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce-severity.c b/arch/x86/kernel/cpu/mcheck/mce-severity.c
index 87cc9ab..2773c85 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-severity.c
+++ b/arch/x86/kernel/cpu/mcheck/mce-severity.c
@@ -278,8 +278,12 @@ static int mce_severity_amd(struct mce *m, int tolerant, char **msg, bool is_exc
 	 * deferred error: poll handler catches these and adds to mce_ring so
 	 * memory-failure can take recovery actions.
 	 */
-	if (m->status & MCI_STATUS_DEFERRED)
+	if (m->status & MCI_STATUS_DEFERRED) {
+		if (mce_flags.smca)
+			return MCE_AO_SEVERITY;
+
 		return MCE_DEFERRED_SEVERITY;
+	}
 
 	/*
 	 * corrected error: poll handler catches these and passes responsibility
diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
index 10fddcc..743ae31 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
@@ -28,6 +28,8 @@
 #include <asm/msr.h>
 #include <asm/trace/irq_vectors.h>
 
+#include "mce-internal.h"
+
 #define NR_BLOCKS         5
 #define THRESHOLD_MAX     0xFFF
 #define INT_TYPE_APIC     0x00020000
@@ -802,6 +804,8 @@ __log_error(unsigned int bank, bool deferred_err, bool threshold_err, u64 misc)
 	m.bank   = bank;
 	m.tsc	 = rdtsc();
 
+	m.severity = mce_severity(&m, mca_cfg.tolerant, NULL, false);
+
 	if (threshold_err)
 		m.misc = misc;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v2 4/4] x86/mce: Add AMD SMCA support to SRAO notifier
  2017-03-20 20:26 [PATCH v2 0/4] Call memory_failure() on Deferred errors Yazen Ghannam
                   ` (2 preceding siblings ...)
  2017-03-20 20:26 ` [PATCH v2 3/4] x86/mce/AMD: Mark Deferred errors as Action Optional on SMCA systems Yazen Ghannam
@ 2017-03-20 20:26 ` Yazen Ghannam
  2017-03-21 21:47   ` Ghannam, Yazen
  2017-03-22 21:13   ` Borislav Petkov
  3 siblings, 2 replies; 10+ messages in thread
From: Yazen Ghannam @ 2017-03-20 20:26 UTC (permalink / raw)
  To: linux-edac; +Cc: Yazen Ghannam, Tony Luck, Borislav Petkov, x86, linux-kernel

From: Yazen Ghannam <yazen.ghannam@amd.com>

Deferred errors on AMD systems may get an Action Optional severity with the
goal of being handled by the SRAO notifier block. However, the process of
determining if an address is usable is different between Intel and AMD. So
define vendor-specific functions for this.

Also, check for the AO severity before determining if an address is usable
to possibly save some cycles.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link: http://lkml.kernel.org/r/1486760120-60944-3-git-send-email-Yazen.Ghannam@amd.com

v1->v2:
- New in v2. Based on v1 patch 3.
- Update SRAO notifier block to handle errors from SMCA systems.

 arch/x86/kernel/cpu/mcheck/mce.c | 52 ++++++++++++++++++++++++++++++----------
 1 file changed, 40 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 5e365a2..1a2669d 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -547,19 +547,49 @@ static void mce_report_event(struct pt_regs *regs)
  * be somewhat complicated (e.g. segment offset would require an instruction
  * parser). So only support physical addresses up to page granuality for now.
  */
-static int mce_usable_address(struct mce *m)
+static int mce_usable_address_intel(struct mce *m, unsigned long *pfn)
 {
-	if (!(m->status & MCI_STATUS_MISCV) || !(m->status & MCI_STATUS_ADDRV))
+	if (!(m->status & MCI_STATUS_MISCV))
 		return 0;
-
-	/* Checks after this one are Intel-specific: */
-	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
-		return 1;
-
 	if (MCI_MISC_ADDR_LSB(m->misc) > PAGE_SHIFT)
 		return 0;
 	if (MCI_MISC_ADDR_MODE(m->misc) != MCI_MISC_ADDR_PHYS)
 		return 0;
+
+	*pfn = m->addr >> PAGE_SHIFT;
+	return 1;
+}
+
+/* Only support this on SMCA systems and errors logged from a UMC. */
+static int mce_usable_address_amd(struct mce *m, unsigned long *pfn)
+{
+	u8 umc;
+	u16 nid = cpu_to_node(m->extcpu);
+	u64 addr;
+
+	if (!mce_flags.smca)
+		return 0;
+
+	umc = find_umc_channel(m);
+
+	if (umc < 0 || umc_normaddr_to_sysaddr(m->addr, nid, umc, &addr))
+		return 0;
+
+	*pfn = addr >> PAGE_SHIFT;
+	return 1;
+}
+
+static int mce_usable_address(struct mce *m, unsigned long *pfn)
+{
+	if (!(m->status & MCI_STATUS_ADDRV))
+		return 0;
+
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
+		return mce_usable_address_intel(m, pfn);
+
+	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
+		return mce_usable_address_amd(m, pfn);
+
 	return 1;
 }
 
@@ -567,15 +597,13 @@ static int srao_decode_notifier(struct notifier_block *nb, unsigned long val,
 				void *data)
 {
 	struct mce *mce = (struct mce *)data;
-	unsigned long pfn;
+	unsigned long pfn = 0;
 
 	if (!mce)
 		return NOTIFY_DONE;
 
-	if (mce_usable_address(mce) && (mce->severity == MCE_AO_SEVERITY)) {
-		pfn = mce->addr >> PAGE_SHIFT;
+	if ((mce->severity == MCE_AO_SEVERITY) && mce_usable_address(mce, &pfn))
 		memory_failure(pfn, MCE_VECTOR, 0);
-	}
 
 	return NOTIFY_OK;
 }
@@ -750,7 +778,7 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		 */
 		if (!(flags & MCP_DONTLOG) && !mca_cfg.dont_log_ce)
 			mce_log(&m);
-		else if (mce_usable_address(&m)) {
+		else if (mce_usable_address(&m, NULL)) {
 			/*
 			 * Although we skipped logging this, we still want
 			 * to take action. Add to the pool so the registered
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* RE: [PATCH v2 4/4] x86/mce: Add AMD SMCA support to SRAO notifier
  2017-03-20 20:26 ` [PATCH v2 4/4] x86/mce: Add AMD SMCA support to SRAO notifier Yazen Ghannam
@ 2017-03-21 21:47   ` Ghannam, Yazen
  2017-03-22 21:13   ` Borislav Petkov
  1 sibling, 0 replies; 10+ messages in thread
From: Ghannam, Yazen @ 2017-03-21 21:47 UTC (permalink / raw)
  To: linux-edac; +Cc: Tony Luck, Borislav Petkov, x86, linux-kernel

> -----Original Message-----
> From: Ghannam, Yazen
> Sent: Monday, March 20, 2017 4:27 PM

[...]

> +/* Only support this on SMCA systems and errors logged from a UMC. */
> +static int mce_usable_address_amd(struct mce *m, unsigned long *pfn) {
> +	u8 umc;

This should be
	int umc;
to match the return value from find_umc_channel() below.

> +	u16 nid = cpu_to_node(m->extcpu);
> +	u64 addr;
> +
> +	if (!mce_flags.smca)
> +		return 0;
> +
> +	umc = find_umc_channel(m);
> +
> +	if (umc < 0 || umc_normaddr_to_sysaddr(m->addr, nid, umc, &addr))
> +		return 0;
> +
> +	*pfn = addr >> PAGE_SHIFT;
> +	return 1;
> +}
> +

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 4/4] x86/mce: Add AMD SMCA support to SRAO notifier
  2017-03-20 20:26 ` [PATCH v2 4/4] x86/mce: Add AMD SMCA support to SRAO notifier Yazen Ghannam
  2017-03-21 21:47   ` Ghannam, Yazen
@ 2017-03-22 21:13   ` Borislav Petkov
  2017-03-22 21:40     ` Ghannam, Yazen
  1 sibling, 1 reply; 10+ messages in thread
From: Borislav Petkov @ 2017-03-22 21:13 UTC (permalink / raw)
  To: Yazen Ghannam; +Cc: linux-edac, Tony Luck, x86, linux-kernel

On Mon, Mar 20, 2017 at 03:26:54PM -0500, Yazen Ghannam wrote:
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> Deferred errors on AMD systems may get an Action Optional severity with the
> goal of being handled by the SRAO notifier block. However, the process of
> determining if an address is usable is different between Intel and AMD. So
> define vendor-specific functions for this.
> 
> Also, check for the AO severity before determining if an address is usable
> to possibly save some cycles.
> 
> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> ---
> Link: http://lkml.kernel.org/r/1486760120-60944-3-git-send-email-Yazen.Ghannam@amd.com
> 
> v1->v2:
> - New in v2. Based on v1 patch 3.
> - Update SRAO notifier block to handle errors from SMCA systems.
> 
>  arch/x86/kernel/cpu/mcheck/mce.c | 52 ++++++++++++++++++++++++++++++----------
>  1 file changed, 40 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
> index 5e365a2..1a2669d 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> @@ -547,19 +547,49 @@ static void mce_report_event(struct pt_regs *regs)
>   * be somewhat complicated (e.g. segment offset would require an instruction
>   * parser). So only support physical addresses up to page granuality for now.
>   */
> -static int mce_usable_address(struct mce *m)
> +static int mce_usable_address_intel(struct mce *m, unsigned long *pfn)

So this function is basically saying whether the address is usable but
then it is also returning it if so.

And then it is using an I/O argument. Yuck.

So it sounds to me like this functionality needs redesign: something
like have a get_usable_address() function (the "mce_" prefix is not
really needed as it is static) which returns an invalid value when it
determines that it doesn't have a usable address and the address itself
if it succeeds.

>  {
> -	if (!(m->status & MCI_STATUS_MISCV) || !(m->status & MCI_STATUS_ADDRV))
> +	if (!(m->status & MCI_STATUS_MISCV))
>  		return 0;
> -
> -	/* Checks after this one are Intel-specific: */
> -	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> -		return 1;
> -
>  	if (MCI_MISC_ADDR_LSB(m->misc) > PAGE_SHIFT)
>  		return 0;
>  	if (MCI_MISC_ADDR_MODE(m->misc) != MCI_MISC_ADDR_PHYS)
>  		return 0;
> +
> +	*pfn = m->addr >> PAGE_SHIFT;
> +	return 1;
> +}
> +
> +/* Only support this on SMCA systems and errors logged from a UMC. */
> +static int mce_usable_address_amd(struct mce *m, unsigned long *pfn)
> +{
> +	u8 umc;
> +	u16 nid = cpu_to_node(m->extcpu);
> +	u64 addr;
> +
> +	if (!mce_flags.smca)
> +		return 0;

So on !SMCA systems there'll be no usable address ever! Even with
MCI_STATUS_ADDRV set.

Please *test* your stuff on all affected hardware before submitting.

> +
> +	umc = find_umc_channel(m);
> +
> +	if (umc < 0 || umc_normaddr_to_sysaddr(m->addr, nid, umc, &addr))
> +		return 0;
> +
> +	*pfn = addr >> PAGE_SHIFT;
> +	return 1;
> +}
> +
> +static int mce_usable_address(struct mce *m, unsigned long *pfn)
> +{
> +	if (!(m->status & MCI_STATUS_ADDRV))
> +		return 0;

What happened to the MCI_STATUS_MISCV bit check?

> +	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
> +		return mce_usable_address_intel(m, pfn);
> +
> +	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
> +		return mce_usable_address_amd(m, pfn);
> +
>  	return 1;

We definitely don't want to say that the address is usable on a third
vendor. It would be most likely a lie even if we never reach this code
on a third vendor.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v2 2/4] x86/mce/AMD; EDAC,amd64: Move find_umc_channel() to AMD mcheck
  2017-03-20 20:26 ` [PATCH v2 2/4] x86/mce/AMD; EDAC,amd64: Move find_umc_channel() to AMD mcheck Yazen Ghannam
@ 2017-03-22 21:16   ` Borislav Petkov
  2017-03-22 21:41     ` Ghannam, Yazen
  0 siblings, 1 reply; 10+ messages in thread
From: Borislav Petkov @ 2017-03-22 21:16 UTC (permalink / raw)
  To: Yazen Ghannam; +Cc: linux-edac, Tony Luck, x86, linux-kernel

On Mon, Mar 20, 2017 at 03:26:52PM -0500, Yazen Ghannam wrote:
> We need to find a UMC's channel number from mcheck code when translating
> UMC normalized addresses to system physical addresses. So move the function
> there from EDAC.
> 
> Also, drop the struct pvt from the function parameters since we don't use
> it. And add a sanity check to make sure we're only looking at UMCs in case
> the UMC instance IDs ever match up with other bank types.
> 
> Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
> ---

...

> diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> index 524cc57..10fddcc 100644
> --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> @@ -755,6 +755,27 @@ int umc_normaddr_to_sysaddr(u64 norm_addr, u16 nid, u8 umc, u64 *sys_addr)
>  }
>  EXPORT_SYMBOL_GPL(umc_normaddr_to_sysaddr);
>  
> +/*
> + * To find the UMC channel represented by this bank we need to match on its
> + * instance_id. The instance_id of a bank is held in the lower 32 bits of its
> + * IPID.
> + */
> +int find_umc_channel(struct mce *m)
> +{
> +	u32 umc_instance_id[] = {0x50f00, 0x150f00};
> +	u32 instance_id = m->ipid & GENMASK(31, 0);
> +	int i, channel = -EINVAL;
> +
> +	if (smca_banks[m->bank].hwid &&
> +	    smca_banks[m->bank].hwid->bank_type == SMCA_UMC)
> +		for (i = 0; i < ARRAY_SIZE(umc_instance_id); i++)
> +			if (umc_instance_id[i] == instance_id)
> +				channel = i;
> +
> +	return channel;
> +}
> +EXPORT_SYMBOL_GPL(find_umc_channel);

This is an exported function now so it should have a prefix. Maybe
"amd_find_umc_channel" or so.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH v2 4/4] x86/mce: Add AMD SMCA support to SRAO notifier
  2017-03-22 21:13   ` Borislav Petkov
@ 2017-03-22 21:40     ` Ghannam, Yazen
  0 siblings, 0 replies; 10+ messages in thread
From: Ghannam, Yazen @ 2017-03-22 21:40 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, Tony Luck, x86, linux-kernel

> -----Original Message-----
> From: Borislav Petkov [mailto:bp@alien8.de]
> Sent: Wednesday, March 22, 2017 5:13 PM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; Tony Luck <tony.luck@intel.com>;
> x86@kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2 4/4] x86/mce: Add AMD SMCA support to SRAO
> notifier
> 
> On Mon, Mar 20, 2017 at 03:26:54PM -0500, Yazen Ghannam wrote:
> > From: Yazen Ghannam <yazen.ghannam@amd.com>
> >
> > Deferred errors on AMD systems may get an Action Optional severity
> > with the goal of being handled by the SRAO notifier block. However,
> > the process of determining if an address is usable is different
> > between Intel and AMD. So define vendor-specific functions for this.
> >
> > Also, check for the AO severity before determining if an address is
> > usable to possibly save some cycles.
> >
> > Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> > ---
> > Link:
> > http://lkml.kernel.org/r/1486760120-60944-3-git-send-email-Yazen.Ghann
> > am@amd.com
> >
> > v1->v2:
> > - New in v2. Based on v1 patch 3.
> > - Update SRAO notifier block to handle errors from SMCA systems.
> >
> >  arch/x86/kernel/cpu/mcheck/mce.c | 52
> > ++++++++++++++++++++++++++++++----------
> >  1 file changed, 40 insertions(+), 12 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce.c
> > b/arch/x86/kernel/cpu/mcheck/mce.c
> > index 5e365a2..1a2669d 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce.c
> > @@ -547,19 +547,49 @@ static void mce_report_event(struct pt_regs
> *regs)
> >   * be somewhat complicated (e.g. segment offset would require an
> instruction
> >   * parser). So only support physical addresses up to page granuality for
> now.
> >   */
> > -static int mce_usable_address(struct mce *m)
> > +static int mce_usable_address_intel(struct mce *m, unsigned long
> > +*pfn)
> 
> So this function is basically saying whether the address is usable but then it is
> also returning it if so.
> 
> And then it is using an I/O argument. Yuck.
> 
> So it sounds to me like this functionality needs redesign: something like have a
> get_usable_address() function (the "mce_" prefix is not really needed as it is
> static) which returns an invalid value when it determines that it doesn't have a
> usable address and the address itself if it succeeds.
> 

Okay, I'll redo it.

> >  {
> > -	if (!(m->status & MCI_STATUS_MISCV) || !(m->status &
> MCI_STATUS_ADDRV))
> > +	if (!(m->status & MCI_STATUS_MISCV))
> >  		return 0;
> > -
> > -	/* Checks after this one are Intel-specific: */
> > -	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
> > -		return 1;
> > -
> >  	if (MCI_MISC_ADDR_LSB(m->misc) > PAGE_SHIFT)
> >  		return 0;
> >  	if (MCI_MISC_ADDR_MODE(m->misc) != MCI_MISC_ADDR_PHYS)
> >  		return 0;
> > +
> > +	*pfn = m->addr >> PAGE_SHIFT;
> > +	return 1;
> > +}
> > +
> > +/* Only support this on SMCA systems and errors logged from a UMC. */
> > +static int mce_usable_address_amd(struct mce *m, unsigned long *pfn)
> > +{
> > +	u8 umc;
> > +	u16 nid = cpu_to_node(m->extcpu);
> > +	u64 addr;
> > +
> > +	if (!mce_flags.smca)
> > +		return 0;
> 
> So on !SMCA systems there'll be no usable address ever! Even with
> MCI_STATUS_ADDRV set.
> 
> Please *test* your stuff on all affected hardware before submitting.
> 

I was thinking of this for use with the SRAO notifier. But since this can be
used in other places then the SMCA check should be grouped with the
code below.

> > +
> > +	umc = find_umc_channel(m);
> > +
> > +	if (umc < 0 || umc_normaddr_to_sysaddr(m->addr, nid, umc, &addr))
> > +		return 0;
> > +
> > +	*pfn = addr >> PAGE_SHIFT;
> > +	return 1;
> > +}
> > +
> > +static int mce_usable_address(struct mce *m, unsigned long *pfn) {
> > +	if (!(m->status & MCI_STATUS_ADDRV))
> > +		return 0;
> 
> What happened to the MCI_STATUS_MISCV bit check?
> 

It was moved into the Intel function. It's not necessary on AMD.

> > +	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)
> > +		return mce_usable_address_intel(m, pfn);
> > +
> > +	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD)
> > +		return mce_usable_address_amd(m, pfn);
> > +
> >  	return 1;
> 
> We definitely don't want to say that the address is usable on a third vendor. It
> would be most likely a lie even if we never reach this code on a third vendor.
> 

Okay.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH v2 2/4] x86/mce/AMD; EDAC,amd64: Move find_umc_channel() to AMD mcheck
  2017-03-22 21:16   ` Borislav Petkov
@ 2017-03-22 21:41     ` Ghannam, Yazen
  0 siblings, 0 replies; 10+ messages in thread
From: Ghannam, Yazen @ 2017-03-22 21:41 UTC (permalink / raw)
  To: Borislav Petkov; +Cc: linux-edac, Tony Luck, x86, linux-kernel

> -----Original Message-----
> From: Borislav Petkov [mailto:bp@alien8.de]
> Sent: Wednesday, March 22, 2017 5:17 PM
> To: Ghannam, Yazen <Yazen.Ghannam@amd.com>
> Cc: linux-edac@vger.kernel.org; Tony Luck <tony.luck@intel.com>;
> x86@kernel.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH v2 2/4] x86/mce/AMD; EDAC,amd64: Move
> find_umc_channel() to AMD mcheck
> 
> On Mon, Mar 20, 2017 at 03:26:52PM -0500, Yazen Ghannam wrote:
> > We need to find a UMC's channel number from mcheck code when
> > translating UMC normalized addresses to system physical addresses. So
> > move the function there from EDAC.
> >
> > Also, drop the struct pvt from the function parameters since we don't
> > use it. And add a sanity check to make sure we're only looking at UMCs
> > in case the UMC instance IDs ever match up with other bank types.
> >
> > Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
> > ---
> 
> ...
> 
> > diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > index 524cc57..10fddcc 100644
> > --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c
> > @@ -755,6 +755,27 @@ int umc_normaddr_to_sysaddr(u64 norm_addr,
> u16
> > nid, u8 umc, u64 *sys_addr)  }
> > EXPORT_SYMBOL_GPL(umc_normaddr_to_sysaddr);
> >
> > +/*
> > + * To find the UMC channel represented by this bank we need to match
> > +on its
> > + * instance_id. The instance_id of a bank is held in the lower 32
> > +bits of its
> > + * IPID.
> > + */
> > +int find_umc_channel(struct mce *m)
> > +{
> > +	u32 umc_instance_id[] = {0x50f00, 0x150f00};
> > +	u32 instance_id = m->ipid & GENMASK(31, 0);
> > +	int i, channel = -EINVAL;
> > +
> > +	if (smca_banks[m->bank].hwid &&
> > +	    smca_banks[m->bank].hwid->bank_type == SMCA_UMC)
> > +		for (i = 0; i < ARRAY_SIZE(umc_instance_id); i++)
> > +			if (umc_instance_id[i] == instance_id)
> > +				channel = i;
> > +
> > +	return channel;
> > +}
> > +EXPORT_SYMBOL_GPL(find_umc_channel);
> 
> This is an exported function now so it should have a prefix. Maybe
> "amd_find_umc_channel" or so.
> 

Okay, will do.

Thanks,
Yazen

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2017-03-22 21:42 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-20 20:26 [PATCH v2 0/4] Call memory_failure() on Deferred errors Yazen Ghannam
2017-03-20 20:26 ` [PATCH v2 1/4] EDAC,mce_amd: Find node ID on SMCA systems using generic methods Yazen Ghannam
2017-03-20 20:26 ` [PATCH v2 2/4] x86/mce/AMD; EDAC,amd64: Move find_umc_channel() to AMD mcheck Yazen Ghannam
2017-03-22 21:16   ` Borislav Petkov
2017-03-22 21:41     ` Ghannam, Yazen
2017-03-20 20:26 ` [PATCH v2 3/4] x86/mce/AMD: Mark Deferred errors as Action Optional on SMCA systems Yazen Ghannam
2017-03-20 20:26 ` [PATCH v2 4/4] x86/mce: Add AMD SMCA support to SRAO notifier Yazen Ghannam
2017-03-21 21:47   ` Ghannam, Yazen
2017-03-22 21:13   ` Borislav Petkov
2017-03-22 21:40     ` Ghannam, Yazen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).