linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 0/3] Firmware first mode for corrected errors
@ 2013-07-01 15:38 Naveen N. Rao
  2013-07-01 15:38 ` [PATCH v3 1/3] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC Naveen N. Rao
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Naveen N. Rao @ 2013-07-01 15:38 UTC (permalink / raw)
  To: tony.luck, bp; +Cc: ananth, masbock, lcm, linux-kernel, linux-acpi, ying.huang

Hi Tony,
Here is the updated patchset which addresses all previous review comments. For
the third patch, I have also moved the soft-offline section in ghes.c into the
#ifdef block for APEI_MEMORY_FAILURE.

If there are no further comments, can you please queue this up for 3.11?


Thanks,
Naveen

---

Naveen N. Rao (3):
      mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC
      mce: acpi/apei: Add a boot option to disable ff mode for corrected errors
      mce, acpi/apei: Soft-offline a page on firmware GHES notification


 Documentation/x86/x86_64/boot-options.txt |    5 +++
 arch/x86/include/asm/acpi.h               |    2 +
 arch/x86/include/asm/mce.h                |    3 ++
 arch/x86/kernel/acpi/boot.c               |    5 +++
 arch/x86/kernel/cpu/mcheck/mce-internal.h |    3 ++
 arch/x86/kernel/cpu/mcheck/mce.c          |   28 +++++++++++++++
 arch/x86/kernel/cpu/mcheck/mce_intel.c    |   42 ++++++++++++++++++-----
 drivers/acpi/apei/ghes.c                  |   12 +++++++
 drivers/acpi/apei/hest.c                  |   38 +++++++++++++++++++++
 include/linux/mm.h                        |    1 +
 mm/memory-failure.c                       |   53 +++++++++++++++++++----------
 11 files changed, 164 insertions(+), 28 deletions(-)

-- 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v3 1/3] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC
  2013-07-01 15:38 [PATCH v3 0/3] Firmware first mode for corrected errors Naveen N. Rao
@ 2013-07-01 15:38 ` Naveen N. Rao
  2013-07-01 22:45   ` Borislav Petkov
  2013-07-01 15:38 ` [PATCH v3 2/3] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors Naveen N. Rao
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Naveen N. Rao @ 2013-07-01 15:38 UTC (permalink / raw)
  To: tony.luck, bp; +Cc: ananth, masbock, lcm, linux-kernel, linux-acpi, ying.huang

The Corrected Machine Check structure (CMC) in HEST has a flag which can be
set by the firmware to indicate to the OS that it prefers to process the
corrected error events first. In this scenario, the OS is expected to not
monitor for corrected errors (through CMCI/polling). Instead, the firmware
notifies the OS on corrected error events through GHES.

Linux already has support for GHES. This patch adds support for parsing CMC
structure and to disable CMCI/polling if the firmware first flag is set.

Further, the list of machine check bank structures at the end of CMC is used
to determine which MCA banks function in FF mode, so that we continue to
monitor error events on the other banks.


Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 arch/x86/include/asm/mce.h                |    3 ++
 arch/x86/kernel/cpu/mcheck/mce-internal.h |    3 ++
 arch/x86/kernel/cpu/mcheck/mce.c          |   28 +++++++++++++++++++
 arch/x86/kernel/cpu/mcheck/mce_intel.c    |   42 ++++++++++++++++++++++-------
 drivers/acpi/apei/hest.c                  |   37 ++++++++++++++++++++++++++
 5 files changed, 103 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 6b52980..680480a 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -188,6 +188,9 @@ extern void register_mce_write_callback(ssize_t (*)(struct file *filp,
 				    const char __user *ubuf,
 				    size_t usize, loff_t *off));
 
+/* Disable CMCI/polling for MCA bank claimed by firmware */
+extern void mce_disable_bank(int bank);
+
 /*
  * Exception handler
  */
diff --git a/arch/x86/kernel/cpu/mcheck/mce-internal.h b/arch/x86/kernel/cpu/mcheck/mce-internal.h
index 5b7d4fa..09edd0b 100644
--- a/arch/x86/kernel/cpu/mcheck/mce-internal.h
+++ b/arch/x86/kernel/cpu/mcheck/mce-internal.h
@@ -25,15 +25,18 @@ int mce_severity(struct mce *a, int tolerant, char **msg);
 struct dentry *mce_get_debugfs_dir(void);
 
 extern struct mce_bank *mce_banks;
+extern mce_banks_t mce_banks_ce_disabled;
 
 #ifdef CONFIG_X86_MCE_INTEL
 unsigned long mce_intel_adjust_timer(unsigned long interval);
 void mce_intel_cmci_poll(void);
 void mce_intel_hcpu_update(unsigned long cpu);
+void cmci_disable_bank(int bank);
 #else
 # define mce_intel_adjust_timer mce_adjust_timer_default
 static inline void mce_intel_cmci_poll(void) { }
 static inline void mce_intel_hcpu_update(unsigned long cpu) { }
+static inline void cmci_disable_bank(int bank) { }
 #endif
 
 void mce_timer_kick(unsigned long interval);
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 9239504..e432213 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -94,6 +94,15 @@ DEFINE_PER_CPU(mce_banks_t, mce_poll_banks) = {
 	[0 ... BITS_TO_LONGS(MAX_NR_BANKS)-1] = ~0UL
 };
 
+/*
+ * MCA banks controlled through firmware first for corrected errors.
+ * This is a global list of banks for which we won't enable CMCI and we
+ * won't poll. Firmware controls these banks and is responsible for
+ * reporting corrected errors through GHES. Uncorrected/recoverable
+ * errors are still notified through a machine check.
+ */
+mce_banks_t mce_banks_ce_disabled;
+
 static DEFINE_PER_CPU(struct work_struct, mce_work);
 
 static void (*quirk_no_way_out)(int bank, struct mce *m, struct pt_regs *regs);
@@ -1932,6 +1941,25 @@ static struct miscdevice mce_chrdev_device = {
 	&mce_chrdev_ops,
 };
 
+static void __mce_disable_bank(void *arg)
+{
+	int bank = *((int *)arg);
+	__clear_bit(bank, __get_cpu_var(mce_poll_banks));
+	cmci_disable_bank(bank);
+}
+
+void mce_disable_bank(int bank)
+{
+	if (bank >= mca_cfg.banks) {
+		pr_warning(FW_BUG 
+			"Ignoring request to disable invalid MCA bank %d.\n",
+			bank);
+		return;
+	}
+	set_bit(bank, mce_banks_ce_disabled);
+	on_each_cpu(__mce_disable_bank, &bank, 1);
+}
+
 /*
  * mce=off Disables machine check
  * mce=no_cmci Disables CMCI
diff --git a/arch/x86/kernel/cpu/mcheck/mce_intel.c b/arch/x86/kernel/cpu/mcheck/mce_intel.c
index ae1697c..488eae3 100644
--- a/arch/x86/kernel/cpu/mcheck/mce_intel.c
+++ b/arch/x86/kernel/cpu/mcheck/mce_intel.c
@@ -191,6 +191,10 @@ static void cmci_discover(int banks)
 		if (test_bit(i, owned))
 			continue;
 
+		/* Skip banks in firmware first mode */
+		if (test_bit(i, mce_banks_ce_disabled))
+			continue;
+
 		rdmsrl(MSR_IA32_MCx_CTL2(i), val);
 
 		/* Already owned by someone else? */
@@ -259,6 +263,19 @@ void cmci_recheck(void)
 	local_irq_restore(flags);
 }
 
+/* Caller must hold the lock on cmci_discover_lock */
+static void __cmci_disable_bank(int bank)
+{
+	u64 val;
+
+	if (!test_bit(bank, __get_cpu_var(mce_banks_owned)))
+		return;
+	rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
+	val &= ~MCI_CTL2_CMCI_EN;
+	wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
+	__clear_bit(bank, __get_cpu_var(mce_banks_owned));
+}
+
 /*
  * Disable CMCI on this CPU for all banks it owns when it goes down.
  * This allows other CPUs to claim the banks on rediscovery.
@@ -268,20 +285,12 @@ void cmci_clear(void)
 	unsigned long flags;
 	int i;
 	int banks;
-	u64 val;
 
 	if (!cmci_supported(&banks))
 		return;
 	raw_spin_lock_irqsave(&cmci_discover_lock, flags);
-	for (i = 0; i < banks; i++) {
-		if (!test_bit(i, __get_cpu_var(mce_banks_owned)))
-			continue;
-		/* Disable CMCI */
-		rdmsrl(MSR_IA32_MCx_CTL2(i), val);
-		val &= ~MCI_CTL2_CMCI_EN;
-		wrmsrl(MSR_IA32_MCx_CTL2(i), val);
-		__clear_bit(i, __get_cpu_var(mce_banks_owned));
-	}
+	for (i = 0; i < banks; i++)
+		__cmci_disable_bank(i);
 	raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
 }
 
@@ -315,6 +324,19 @@ void cmci_reenable(void)
 		cmci_discover(banks);
 }
 
+void cmci_disable_bank(int bank)
+{
+	int banks;
+	unsigned long flags;
+
+	if (!cmci_supported(&banks))
+		return;
+
+	raw_spin_lock_irqsave(&cmci_discover_lock, flags);
+	__cmci_disable_bank(bank);
+	raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
+}
+
 static void intel_init_cmci(void)
 {
 	int banks;
diff --git a/drivers/acpi/apei/hest.c b/drivers/acpi/apei/hest.c
index f5ef5d5..b108b11 100644
--- a/drivers/acpi/apei/hest.c
+++ b/drivers/acpi/apei/hest.c
@@ -36,6 +36,7 @@
 #include <linux/io.h>
 #include <linux/platform_device.h>
 #include <acpi/apei.h>
+#include <asm/mce.h>
 
 #include "apei-internal.h"
 
@@ -121,6 +122,40 @@ int apei_hest_parse(apei_hest_func_t func, void *data)
 }
 EXPORT_SYMBOL_GPL(apei_hest_parse);
 
+/*
+ * Check if firmware advertises firmware first mode. We need FF bit to be set
+ * along with a set of MC banks which work in FF mode.
+ */
+static int __init hest_parse_cmc(struct acpi_hest_header *hest_hdr, void *data)
+{
+	int i;
+	struct acpi_hest_ia_corrected *cmc;
+	struct acpi_hest_ia_error_bank *mc_bank;
+
+	if (hest_hdr->type != ACPI_HEST_TYPE_IA32_CORRECTED_CHECK)
+		return 0;
+
+	cmc = (struct acpi_hest_ia_corrected *)hest_hdr;
+	if (!cmc->enabled)
+		return 0;
+
+	/*
+	 * We expect HEST to provide a list of MC banks that report errors
+	 * in firmware first mode. Otherwise, return non-zero value to
+	 * indicate that we are done parsing HEST.
+	 */
+	if (!(cmc->flags & ACPI_HEST_FIRMWARE_FIRST) || !cmc->num_hardware_banks)
+		return 1;
+
+	pr_info(HEST_PFX "Enabling Firmware First mode for corrected errors.\n");
+
+	mc_bank = (struct acpi_hest_ia_error_bank *)(cmc + 1);
+	for (i = 0; i < cmc->num_hardware_banks; i++, mc_bank++)
+		mce_disable_bank(mc_bank->bank_number);
+
+	return 1;
+}
+
 struct ghes_arr {
 	struct platform_device **ghes_devs;
 	unsigned int count;
@@ -227,6 +262,8 @@ void __init acpi_hest_init(void)
 		goto err;
 	}
 
+	apei_hest_parse(hest_parse_cmc, NULL);
+
 	if (!ghes_disable) {
 		rc = apei_hest_parse(hest_parse_ghes_count, &ghes_count);
 		if (rc)


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 2/3] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors
  2013-07-01 15:38 [PATCH v3 0/3] Firmware first mode for corrected errors Naveen N. Rao
  2013-07-01 15:38 ` [PATCH v3 1/3] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC Naveen N. Rao
@ 2013-07-01 15:38 ` Naveen N. Rao
  2013-07-01 22:49   ` Borislav Petkov
  2013-07-01 15:38 ` [PATCH v3 3/3] mce, acpi/apei: Soft-offline a page on firmware GHES notification Naveen N. Rao
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Naveen N. Rao @ 2013-07-01 15:38 UTC (permalink / raw)
  To: tony.luck, bp; +Cc: ananth, masbock, lcm, linux-kernel, linux-acpi, ying.huang

Add a boot option to disable firmware first mode for corrected errors.


Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 Documentation/x86/x86_64/boot-options.txt |    5 +++++
 arch/x86/include/asm/acpi.h               |    2 ++
 arch/x86/kernel/acpi/boot.c               |    5 +++++
 drivers/acpi/apei/hest.c                  |    3 ++-
 4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
index e9e8ddb..1228b22 100644
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -176,6 +176,11 @@ ACPI
 
   acpi=noirq	Don't route interrupts
 
+  acpi=nocmcff	Disable firmware first mode for corrected errors. This
+		disables parsing the HEST CMC error source to check if
+		firmware has set the FF flag. This may result in
+		duplicate corrected error reports.
+
 PCI
 
   pci=off		Don't use PCI
diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index b31bf97..42db2b8 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -86,6 +86,7 @@ extern int acpi_pci_disabled;
 extern int acpi_skip_timer_override;
 extern int acpi_use_timer_override;
 extern int acpi_fix_pin2_polarity;
+extern int acpi_disable_cmcff;
 
 extern u8 acpi_sci_flags;
 extern int acpi_sci_override_gsi;
@@ -168,6 +169,7 @@ static inline void arch_acpi_set_pdc_bits(u32 *buf)
 
 #define acpi_lapic 0
 #define acpi_ioapic 0
+#define acpi_disable_cmcff 0
 static inline void acpi_noirq_set(void) { }
 static inline void acpi_disable_pci(void) { }
 static inline void disable_acpi(void) { }
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 230c8ea..d1998d5 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -66,6 +66,7 @@ EXPORT_SYMBOL(acpi_pci_disabled);
 int acpi_lapic;
 int acpi_ioapic;
 int acpi_strict;
+int acpi_disable_cmcff;
 
 u8 acpi_sci_flags __initdata;
 int acpi_sci_override_gsi __initdata;
@@ -1619,6 +1620,10 @@ static int __init parse_acpi(char *arg)
 	/* "acpi=copy_dsdt" copys DSDT */
 	else if (strcmp(arg, "copy_dsdt") == 0) {
 		acpi_gbl_copy_dsdt_locally = 1;
+	}
+	/* "acpi=nocmcff" disables FF mode for corrected errors */
+	else if (strcmp(arg, "nocmcff") == 0) {
+		acpi_disable_cmcff = 1;
 	} else {
 		/* Core will printk when we return error. */
 		return -EINVAL;
diff --git a/drivers/acpi/apei/hest.c b/drivers/acpi/apei/hest.c
index b108b11..5020245 100644
--- a/drivers/acpi/apei/hest.c
+++ b/drivers/acpi/apei/hest.c
@@ -262,7 +262,8 @@ void __init acpi_hest_init(void)
 		goto err;
 	}
 
-	apei_hest_parse(hest_parse_cmc, NULL);
+	if (!acpi_disable_cmcff)
+		apei_hest_parse(hest_parse_cmc, NULL);
 
 	if (!ghes_disable) {
 		rc = apei_hest_parse(hest_parse_ghes_count, &ghes_count);


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH v3 3/3] mce, acpi/apei: Soft-offline a page on firmware GHES notification
  2013-07-01 15:38 [PATCH v3 0/3] Firmware first mode for corrected errors Naveen N. Rao
  2013-07-01 15:38 ` [PATCH v3 1/3] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC Naveen N. Rao
  2013-07-01 15:38 ` [PATCH v3 2/3] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors Naveen N. Rao
@ 2013-07-01 15:38 ` Naveen N. Rao
  2013-07-01 23:08   ` Borislav Petkov
  2013-07-01 20:08 ` [PATCH v3 0/3] Firmware first mode for corrected errors Borislav Petkov
  2013-07-02 12:54 ` [PATCH 4] mce: acpi/apei: Add a sysctl to control page offlining on firmware report Naveen N. Rao
  4 siblings, 1 reply; 21+ messages in thread
From: Naveen N. Rao @ 2013-07-01 15:38 UTC (permalink / raw)
  To: tony.luck, bp; +Cc: ananth, masbock, lcm, linux-kernel, linux-acpi, ying.huang

If the firmware indicates in GHES error data entry that the error threshold
has exceeded for a corrected error event, then we try to soft-offline the
page. This could be called in interrupt context, so we queue this up similar
to how we handle memory failure scenarios.


Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 drivers/acpi/apei/ghes.c |   12 ++++++++++
 include/linux/mm.h       |    1 +
 mm/memory-failure.c      |   53 ++++++++++++++++++++++++++++++----------------
 3 files changed, 48 insertions(+), 18 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index fcd7d91..5a630ed 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -429,6 +429,18 @@ static void ghes_do_proc(struct ghes *ghes,
 						  mem_err);
 #endif
 #ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
+			if (sec_sev == GHES_SEV_CORRECTED &&
+			    (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED) &&
+			    (mem_err->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS)) {
+				unsigned long pfn;
+				pfn = mem_err->physical_addr >> PAGE_SHIFT;
+				if (pfn_valid(pfn))
+					soft_memory_failure_queue(pfn, 0, 0);
+				else
+					pr_warning(FW_WARN GHES_PFX
+					"Invalid address in generic error data: %#lx\n",
+					mem_err->physical_addr);
+			}
 			if (sev == GHES_SEV_RECOVERABLE &&
 			    sec_sev == GHES_SEV_RECOVERABLE &&
 			    mem_err->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS) {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e0c8528..f9907d2 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1787,6 +1787,7 @@ enum mf_flags {
 };
 extern int memory_failure(unsigned long pfn, int trapno, int flags);
 extern void memory_failure_queue(unsigned long pfn, int trapno, int flags);
+extern void soft_memory_failure_queue(unsigned long pfn, int trapno, int flags);
 extern int unpoison_memory(unsigned long pfn);
 extern int sysctl_memory_failure_early_kill;
 extern int sysctl_memory_failure_recovery;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ceb0c7f..50caefd 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1222,6 +1222,7 @@ struct memory_failure_entry {
 	unsigned long pfn;
 	int trapno;
 	int flags;
+	bool soft_offline;
 };
 
 struct memory_failure_cpu {
@@ -1233,6 +1234,28 @@ struct memory_failure_cpu {
 
 static DEFINE_PER_CPU(struct memory_failure_cpu, memory_failure_cpu);
 
+void __memory_failure_queue(unsigned long pfn, int trapno, int flags, int soft_offline)
+{
+	struct memory_failure_cpu *mf_cpu;
+	unsigned long proc_flags;
+	struct memory_failure_entry entry = {
+		.pfn =		pfn,
+		.trapno =	trapno,
+		.flags =	flags,
+		.soft_offline = soft_offline
+	};
+
+	mf_cpu = &get_cpu_var(memory_failure_cpu);
+	spin_lock_irqsave(&mf_cpu->lock, proc_flags);
+	if (kfifo_put(&mf_cpu->fifo, &entry))
+		schedule_work_on(smp_processor_id(), &mf_cpu->work);
+	else
+		pr_err("Memory failure: buffer overflow when queuing memory failure at 0x%#lx\n",
+		       pfn);
+	spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
+	put_cpu_var(memory_failure_cpu);
+}
+
 /**
  * memory_failure_queue - Schedule handling memory failure of a page.
  * @pfn: Page Number of the corrupted page
@@ -1250,28 +1273,19 @@ static DEFINE_PER_CPU(struct memory_failure_cpu, memory_failure_cpu);
  *
  * Can run in IRQ context.
  */
+
 void memory_failure_queue(unsigned long pfn, int trapno, int flags)
 {
-	struct memory_failure_cpu *mf_cpu;
-	unsigned long proc_flags;
-	struct memory_failure_entry entry = {
-		.pfn =		pfn,
-		.trapno =	trapno,
-		.flags =	flags,
-	};
-
-	mf_cpu = &get_cpu_var(memory_failure_cpu);
-	spin_lock_irqsave(&mf_cpu->lock, proc_flags);
-	if (kfifo_put(&mf_cpu->fifo, &entry))
-		schedule_work_on(smp_processor_id(), &mf_cpu->work);
-	else
-		pr_err("Memory failure: buffer overflow when queuing memory failure at 0x%#lx\n",
-		       pfn);
-	spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
-	put_cpu_var(memory_failure_cpu);
+	__memory_failure_queue(pfn, trapno, flags, false);
 }
 EXPORT_SYMBOL_GPL(memory_failure_queue);
 
+void soft_memory_failure_queue(unsigned long pfn, int trapno, int flags)
+{
+	__memory_failure_queue(pfn, trapno, flags, true);
+}
+EXPORT_SYMBOL_GPL(soft_memory_failure_queue);
+
 static void memory_failure_work_func(struct work_struct *work)
 {
 	struct memory_failure_cpu *mf_cpu;
@@ -1286,7 +1300,10 @@ static void memory_failure_work_func(struct work_struct *work)
 		spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
 		if (!gotten)
 			break;
-		memory_failure(entry.pfn, entry.trapno, entry.flags);
+		if (entry.soft_offline)
+			soft_offline_page(pfn_to_page(entry.pfn), entry.flags);
+		else
+			memory_failure(entry.pfn, entry.trapno, entry.flags);
 	}
 }
 


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 0/3] Firmware first mode for corrected errors
  2013-07-01 15:38 [PATCH v3 0/3] Firmware first mode for corrected errors Naveen N. Rao
                   ` (2 preceding siblings ...)
  2013-07-01 15:38 ` [PATCH v3 3/3] mce, acpi/apei: Soft-offline a page on firmware GHES notification Naveen N. Rao
@ 2013-07-01 20:08 ` Borislav Petkov
  2013-07-02 12:54 ` [PATCH 4] mce: acpi/apei: Add a sysctl to control page offlining on firmware report Naveen N. Rao
  4 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2013-07-01 20:08 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: tony.luck, ananth, masbock, lcm, linux-kernel, linux-acpi, ying.huang

On Mon, Jul 01, 2013 at 09:08:42PM +0530, Naveen N. Rao wrote:
> Hi Tony,
> Here is the updated patchset which addresses all previous review comments. For
> the third patch, I have also moved the soft-offline section in ghes.c into the
> #ifdef block for APEI_MEMORY_FAILURE.
> 
> If there are no further comments, can you please queue this up for 3.11?

I think you mean 3.12 - 3.11 merge window is open right now and patches
for it going in need to have spent some quality time in -next first.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 1/3] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC
  2013-07-01 15:38 ` [PATCH v3 1/3] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC Naveen N. Rao
@ 2013-07-01 22:45   ` Borislav Petkov
  0 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2013-07-01 22:45 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: tony.luck, ananth, masbock, lcm, linux-kernel, linux-acpi, ying.huang

On Mon, Jul 01, 2013 at 09:08:47PM +0530, Naveen N. Rao wrote:
> The Corrected Machine Check structure (CMC) in HEST has a flag which can be
> set by the firmware to indicate to the OS that it prefers to process the
> corrected error events first. In this scenario, the OS is expected to not
> monitor for corrected errors (through CMCI/polling). Instead, the firmware
> notifies the OS on corrected error events through GHES.
> 
> Linux already has support for GHES. This patch adds support for parsing CMC
> structure and to disable CMCI/polling if the firmware first flag is set.
> 
> Further, the list of machine check bank structures at the end of CMC is used
> to determine which MCA banks function in FF mode, so that we continue to
> monitor error events on the other banks.
> 
> 
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

Acked-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 2/3] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors
  2013-07-01 15:38 ` [PATCH v3 2/3] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors Naveen N. Rao
@ 2013-07-01 22:49   ` Borislav Petkov
  0 siblings, 0 replies; 21+ messages in thread
From: Borislav Petkov @ 2013-07-01 22:49 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: tony.luck, ananth, masbock, lcm, linux-kernel, linux-acpi, ying.huang

On Mon, Jul 01, 2013 at 09:08:54PM +0530, Naveen N. Rao wrote:
> Add a boot option to disable firmware first mode for corrected errors.
> 
> 
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

Acked-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 3/3] mce, acpi/apei: Soft-offline a page on firmware GHES notification
  2013-07-01 15:38 ` [PATCH v3 3/3] mce, acpi/apei: Soft-offline a page on firmware GHES notification Naveen N. Rao
@ 2013-07-01 23:08   ` Borislav Petkov
  2013-07-02 11:05     ` Naveen N. Rao
  2013-07-02 11:32     ` Naveen N. Rao
  0 siblings, 2 replies; 21+ messages in thread
From: Borislav Petkov @ 2013-07-01 23:08 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: tony.luck, ananth, masbock, lcm, linux-kernel, linux-acpi, ying.huang

On Mon, Jul 01, 2013 at 09:08:59PM +0530, Naveen N. Rao wrote:
> If the firmware indicates in GHES error data entry that the error threshold
> has exceeded for a corrected error event, then we try to soft-offline the
> page. This could be called in interrupt context, so we queue this up similar
> to how we handle memory failure scenarios.
> 
> 
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> ---
>  drivers/acpi/apei/ghes.c |   12 ++++++++++
>  include/linux/mm.h       |    1 +
>  mm/memory-failure.c      |   53 ++++++++++++++++++++++++++++++----------------
>  3 files changed, 48 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index fcd7d91..5a630ed 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -429,6 +429,18 @@ static void ghes_do_proc(struct ghes *ghes,
>  						  mem_err);
>  #endif
>  #ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
> +			if (sec_sev == GHES_SEV_CORRECTED &&
> +			    (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED) &&
> +			    (mem_err->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS)) {
> +				unsigned long pfn;
> +				pfn = mem_err->physical_addr >> PAGE_SHIFT;
> +				if (pfn_valid(pfn))
> +					soft_memory_failure_queue(pfn, 0, 0);
> +				else
> +					pr_warning(FW_WARN GHES_PFX
> +					"Invalid address in generic error data: %#lx\n",
> +					mem_err->physical_addr);
> +			}

Yuck, this looks like BIOS code.

Can we carve out this into a function and do

void function(.. )
{
#ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE

	<code at 1st indentation, much more readable>

#endif
}

so that we can nicely call it from ghes_do_proc()?

>  			if (sev == GHES_SEV_RECOVERABLE &&
>  			    sec_sev == GHES_SEV_RECOVERABLE &&
>  			    mem_err->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS) {
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index e0c8528..f9907d2 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1787,6 +1787,7 @@ enum mf_flags {
>  };
>  extern int memory_failure(unsigned long pfn, int trapno, int flags);
>  extern void memory_failure_queue(unsigned long pfn, int trapno, int flags);
> +extern void soft_memory_failure_queue(unsigned long pfn, int trapno, int flags);
>  extern int unpoison_memory(unsigned long pfn);
>  extern int sysctl_memory_failure_early_kill;
>  extern int sysctl_memory_failure_recovery;
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index ceb0c7f..50caefd 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1222,6 +1222,7 @@ struct memory_failure_entry {
>  	unsigned long pfn;
>  	int trapno;
>  	int flags;
> +	bool soft_offline;

Why a new bool? This flags int looks nice above. :)

>  };
>  
>  struct memory_failure_cpu {
> @@ -1233,6 +1234,28 @@ struct memory_failure_cpu {
>  
>  static DEFINE_PER_CPU(struct memory_failure_cpu, memory_failure_cpu);
>  
> +void __memory_failure_queue(unsigned long pfn, int trapno, int flags, int soft_offline)
> +{
> +	struct memory_failure_cpu *mf_cpu;
> +	unsigned long proc_flags;
> +	struct memory_failure_entry entry = {
> +		.pfn =		pfn,
> +		.trapno =	trapno,
> +		.flags =	flags,
> +		.soft_offline = soft_offline
> +	};
> +
> +	mf_cpu = &get_cpu_var(memory_failure_cpu);
> +	spin_lock_irqsave(&mf_cpu->lock, proc_flags);
> +	if (kfifo_put(&mf_cpu->fifo, &entry))
> +		schedule_work_on(smp_processor_id(), &mf_cpu->work);
> +	else
> +		pr_err("Memory failure: buffer overflow when queuing memory failure at 0x%#lx\n",
> +		       pfn);
> +	spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
> +	put_cpu_var(memory_failure_cpu);
> +}
> +
>  /**
>   * memory_failure_queue - Schedule handling memory failure of a page.
>   * @pfn: Page Number of the corrupted page
> @@ -1250,28 +1273,19 @@ static DEFINE_PER_CPU(struct memory_failure_cpu, memory_failure_cpu);
>   *
>   * Can run in IRQ context.
>   */
> +
>  void memory_failure_queue(unsigned long pfn, int trapno, int flags)
>  {
> -	struct memory_failure_cpu *mf_cpu;
> -	unsigned long proc_flags;
> -	struct memory_failure_entry entry = {
> -		.pfn =		pfn,
> -		.trapno =	trapno,
> -		.flags =	flags,
> -	};
> -
> -	mf_cpu = &get_cpu_var(memory_failure_cpu);
> -	spin_lock_irqsave(&mf_cpu->lock, proc_flags);
> -	if (kfifo_put(&mf_cpu->fifo, &entry))
> -		schedule_work_on(smp_processor_id(), &mf_cpu->work);
> -	else
> -		pr_err("Memory failure: buffer overflow when queuing memory failure at 0x%#lx\n",
> -		       pfn);
> -	spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
> -	put_cpu_var(memory_failure_cpu);
> +	__memory_failure_queue(pfn, trapno, flags, false);
>  }
>  EXPORT_SYMBOL_GPL(memory_failure_queue);
>  
> +void soft_memory_failure_queue(unsigned long pfn, int trapno, int flags)
> +{
> +	__memory_failure_queue(pfn, trapno, flags, true);

... which can save us this "true" thing there and maybe a bunch of code
churn around here too.

> +}
> +EXPORT_SYMBOL_GPL(soft_memory_failure_queue);
> +
>  static void memory_failure_work_func(struct work_struct *work)
>  {
>  	struct memory_failure_cpu *mf_cpu;
> @@ -1286,7 +1300,10 @@ static void memory_failure_work_func(struct work_struct *work)
>  		spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
>  		if (!gotten)
>  			break;
> -		memory_failure(entry.pfn, entry.trapno, entry.flags);
> +		if (entry.soft_offline)
> +			soft_offline_page(pfn_to_page(entry.pfn), entry.flags);
> +		else
> +			memory_failure(entry.pfn, entry.trapno, entry.flags);
>  	}
>  }
>  
> 
> 

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 3/3] mce, acpi/apei: Soft-offline a page on firmware GHES notification
  2013-07-01 23:08   ` Borislav Petkov
@ 2013-07-02 11:05     ` Naveen N. Rao
  2013-07-02 11:32     ` Naveen N. Rao
  1 sibling, 0 replies; 21+ messages in thread
From: Naveen N. Rao @ 2013-07-02 11:05 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: tony.luck, ananth, masbock, lcm, linux-kernel, linux-acpi, ying.huang

On 07/02/2013 04:38 AM, Borislav Petkov wrote:
> On Mon, Jul 01, 2013 at 09:08:59PM +0530, Naveen N. Rao wrote:
>> If the firmware indicates in GHES error data entry that the error threshold
>> has exceeded for a corrected error event, then we try to soft-offline the
>> page. This could be called in interrupt context, so we queue this up similar
>> to how we handle memory failure scenarios.
>>
>>
>> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
>> ---
>>   drivers/acpi/apei/ghes.c |   12 ++++++++++
>>   include/linux/mm.h       |    1 +
>>   mm/memory-failure.c      |   53 ++++++++++++++++++++++++++++++----------------
>>   3 files changed, 48 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index fcd7d91..5a630ed 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -429,6 +429,18 @@ static void ghes_do_proc(struct ghes *ghes,
>>   						  mem_err);
>>   #endif
>>   #ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
>> +			if (sec_sev == GHES_SEV_CORRECTED &&
>> +			    (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED) &&
>> +			    (mem_err->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS)) {
>> +				unsigned long pfn;
>> +				pfn = mem_err->physical_addr >> PAGE_SHIFT;
>> +				if (pfn_valid(pfn))
>> +					soft_memory_failure_queue(pfn, 0, 0);
>> +				else
>> +					pr_warning(FW_WARN GHES_PFX
>> +					"Invalid address in generic error data: %#lx\n",
>> +					mem_err->physical_addr);
>> +			}
>
> Yuck, this looks like BIOS code.
>
> Can we carve out this into a function and do
>
> void function(.. )
> {
> #ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
>
> 	<code at 1st indentation, much more readable>
>
> #endif
> }
>
> so that we can nicely call it from ghes_do_proc()?

Sure.

>
>>   			if (sev == GHES_SEV_RECOVERABLE &&
>>   			    sec_sev == GHES_SEV_RECOVERABLE &&
>>   			    mem_err->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS) {
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index e0c8528..f9907d2 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -1787,6 +1787,7 @@ enum mf_flags {
>>   };
>>   extern int memory_failure(unsigned long pfn, int trapno, int flags);
>>   extern void memory_failure_queue(unsigned long pfn, int trapno, int flags);
>> +extern void soft_memory_failure_queue(unsigned long pfn, int trapno, int flags);
>>   extern int unpoison_memory(unsigned long pfn);
>>   extern int sysctl_memory_failure_early_kill;
>>   extern int sysctl_memory_failure_recovery;
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index ceb0c7f..50caefd 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1222,6 +1222,7 @@ struct memory_failure_entry {
>>   	unsigned long pfn;
>>   	int trapno;
>>   	int flags;
>> +	bool soft_offline;
>
> Why a new bool? This flags int looks nice above. :)

D'uh! I considered that, but I can't recall why I chose not to use that! 
Let me redo this patch.

Thanks,
Naveen


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 3/3] mce, acpi/apei: Soft-offline a page on firmware GHES notification
  2013-07-01 23:08   ` Borislav Petkov
  2013-07-02 11:05     ` Naveen N. Rao
@ 2013-07-02 11:32     ` Naveen N. Rao
  2013-07-03 14:44       ` Borislav Petkov
  1 sibling, 1 reply; 21+ messages in thread
From: Naveen N. Rao @ 2013-07-02 11:32 UTC (permalink / raw)
  To: tony.luck, bp; +Cc: ananth, masbock, lcm, linux-kernel, linux-acpi, ying.huang

Here is the updated patch. I also added printk_ratelimit() in line with the
rest of the GHES code.

Thanks,
Naveen

--
If the firmware indicates in GHES error data entry that the error threshold
has exceeded for a corrected error event, then we try to soft-offline the
page. This could be called in interrupt context, so we queue this up similar
to how we handle memory failure scenarios.


Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 drivers/acpi/apei/ghes.c |   38 +++++++++++++++++++++++++++++---------
 include/linux/mm.h       |    1 +
 mm/memory-failure.c      |    5 ++++-
 3 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index fcd7d91..74ef688 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -409,6 +409,34 @@ static void ghes_clear_estatus(struct ghes *ghes)
 	ghes->flags &= ~GHES_TO_CLEAR;
 }
 
+static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int sev)
+{
+#ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
+	int sec_sev = ghes_severity(gdata->error_severity);
+	struct cper_sec_mem_err *mem_err;
+	mem_err = (struct cper_sec_mem_err *)(gdata+1);
+	if (sec_sev == GHES_SEV_CORRECTED &&
+	    (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED) &&
+	    (mem_err->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS)) {
+		unsigned long pfn;
+		pfn = mem_err->physical_addr >> PAGE_SHIFT;
+		if (pfn_valid(pfn))
+			memory_failure_queue(pfn, 0, MF_SOFT_OFFLINE);
+		else if (printk_ratelimit())
+			pr_warning(FW_WARN GHES_PFX
+			"Invalid address in generic error data: %#llx\n",
+			mem_err->physical_addr);
+	}
+	if (sev == GHES_SEV_RECOVERABLE &&
+	    sec_sev == GHES_SEV_RECOVERABLE &&
+	    mem_err->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS) {
+		unsigned long pfn;
+		pfn = mem_err->physical_addr >> PAGE_SHIFT;
+		memory_failure_queue(pfn, 0, 0);
+	}
+#endif
+}
+
 static void ghes_do_proc(struct ghes *ghes,
 			 const struct acpi_hest_generic_status *estatus)
 {
@@ -428,15 +456,7 @@ static void ghes_do_proc(struct ghes *ghes,
 			apei_mce_report_mem_error(sev == GHES_SEV_CORRECTED,
 						  mem_err);
 #endif
-#ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
-			if (sev == GHES_SEV_RECOVERABLE &&
-			    sec_sev == GHES_SEV_RECOVERABLE &&
-			    mem_err->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS) {
-				unsigned long pfn;
-				pfn = mem_err->physical_addr >> PAGE_SHIFT;
-				memory_failure_queue(pfn, 0, 0);
-			}
-#endif
+			ghes_handle_memory_failure(gdata, sev);
 		}
 #ifdef CONFIG_ACPI_APEI_PCIEAER
 		else if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e0c8528..958e9efd 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1784,6 +1784,7 @@ enum mf_flags {
 	MF_COUNT_INCREASED = 1 << 0,
 	MF_ACTION_REQUIRED = 1 << 1,
 	MF_MUST_KILL = 1 << 2,
+	MF_SOFT_OFFLINE = 1 << 3,
 };
 extern int memory_failure(unsigned long pfn, int trapno, int flags);
 extern void memory_failure_queue(unsigned long pfn, int trapno, int flags);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ceb0c7f..0d6717e 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1286,7 +1286,10 @@ static void memory_failure_work_func(struct work_struct *work)
 		spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
 		if (!gotten)
 			break;
-		memory_failure(entry.pfn, entry.trapno, entry.flags);
+		if (entry.flags & MF_SOFT_OFFLINE)
+			soft_offline_page(pfn_to_page(entry.pfn), entry.flags);
+		else
+			memory_failure(entry.pfn, entry.trapno, entry.flags);
 	}
 }
 


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH 4] mce: acpi/apei: Add a sysctl to control page offlining on firmware report
  2013-07-01 15:38 [PATCH v3 0/3] Firmware first mode for corrected errors Naveen N. Rao
                   ` (3 preceding siblings ...)
  2013-07-01 20:08 ` [PATCH v3 0/3] Firmware first mode for corrected errors Borislav Petkov
@ 2013-07-02 12:54 ` Naveen N. Rao
  2013-07-03 14:46   ` Borislav Petkov
  4 siblings, 1 reply; 21+ messages in thread
From: Naveen N. Rao @ 2013-07-02 12:54 UTC (permalink / raw)
  To: tony.luck, bp; +Cc: ananth, masbock, lcm, linux-kernel, linux-acpi, ying.huang

I am adding another patch here to disable page offlining in case the firmware
starts acting up.

Thanks,
Naveen

--

Add a sysctl memory_failure_soft_offline to control what is done on receipt of
firmware ghes notification for a corrected error. By default, kernel tries
to soft-offline the page immediately. If set to 0, no action is taken.

Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 Documentation/sysctl/vm.txt |   12 ++++++++++++
 include/linux/mm.h          |    1 +
 kernel/sysctl.c             |    9 +++++++++
 mm/memory-failure.c         |   10 +++++++---
 4 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index dcc75a9..6d0fcba 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -375,6 +375,18 @@ Enable memory failure recovery (when supported by the platform)
 
 ==============================================================
 
+memory_failure_soft_offline
+
+Control soft-offlining of pages on receipt of appropriate firmware error
+report through GHES. Note that this does not affect user-space initiated
+soft-offlining.
+
+1: Attempt soft-offlining.
+
+0: No action.
+
+==============================================================
+
 min_free_kbytes:
 
 This is used to force the Linux VM to keep a minimum number
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 958e9efd..2c16ca4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1791,6 +1791,7 @@ extern void memory_failure_queue(unsigned long pfn, int trapno, int flags);
 extern int unpoison_memory(unsigned long pfn);
 extern int sysctl_memory_failure_early_kill;
 extern int sysctl_memory_failure_recovery;
+extern int sysctl_memory_failure_soft_offline;
 extern void shake_page(struct page *p, int access);
 extern atomic_long_t num_poisoned_pages;
 extern int soft_offline_page(struct page *page, int flags);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index b0a1f99..cc4b794 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1427,6 +1427,15 @@ static struct ctl_table vm_table[] = {
 		.extra1		= &zero,
 		.extra2		= &one,
 	},
+	{
+		.procname	= "memory_failure_soft_offline",
+		.data		= &sysctl_memory_failure_soft_offline,
+		.maxlen		= sizeof(sysctl_memory_failure_soft_offline),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &one,
+	},
 #endif
 	{
 		.procname	= "user_reserve_kbytes",
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 0d6717e..ec4851c 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -61,6 +61,8 @@ int sysctl_memory_failure_early_kill __read_mostly = 0;
 
 int sysctl_memory_failure_recovery __read_mostly = 1;
 
+int sysctl_memory_failure_soft_offline __read_mostly = 1;
+
 atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
 #if defined(CONFIG_HWPOISON_INJECT) || defined(CONFIG_HWPOISON_INJECT_MODULE)
@@ -1286,9 +1288,11 @@ static void memory_failure_work_func(struct work_struct *work)
 		spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
 		if (!gotten)
 			break;
-		if (entry.flags & MF_SOFT_OFFLINE)
-			soft_offline_page(pfn_to_page(entry.pfn), entry.flags);
-		else
+		if (entry.flags & MF_SOFT_OFFLINE) {
+			if (sysctl_memory_failure_soft_offline)
+				soft_offline_page(pfn_to_page(entry.pfn),
+						entry.flags);
+		} else
 			memory_failure(entry.pfn, entry.trapno, entry.flags);
 	}
 }


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 3/3] mce, acpi/apei: Soft-offline a page on firmware GHES notification
  2013-07-02 11:32     ` Naveen N. Rao
@ 2013-07-03 14:44       ` Borislav Petkov
  2013-07-03 15:40         ` Naveen N. Rao
  0 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2013-07-03 14:44 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: tony.luck, ananth, masbock, lcm, linux-kernel, linux-acpi, ying.huang

On Tue, Jul 02, 2013 at 05:02:48PM +0530, Naveen N. Rao wrote:
> Here is the updated patch. I also added printk_ratelimit() in line with the
> rest of the GHES code.
> 
> Thanks,
> Naveen
> 
> --
> If the firmware indicates in GHES error data entry that the error threshold
> has exceeded for a corrected error event, then we try to soft-offline the
> page. This could be called in interrupt context, so we queue this up similar
> to how we handle memory failure scenarios.
> 
> 
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
> ---
>  drivers/acpi/apei/ghes.c |   38 +++++++++++++++++++++++++++++---------
>  include/linux/mm.h       |    1 +
>  mm/memory-failure.c      |    5 ++++-
>  3 files changed, 34 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index fcd7d91..74ef688 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -409,6 +409,34 @@ static void ghes_clear_estatus(struct ghes *ghes)
>  	ghes->flags &= ~GHES_TO_CLEAR;
>  }
>  
> +static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int sev)
> +{
> +#ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
> +	int sec_sev = ghes_severity(gdata->error_severity);
> +	struct cper_sec_mem_err *mem_err;
> +	mem_err = (struct cper_sec_mem_err *)(gdata+1);

A newline here please. Also, spaces around '+'.

> +	if (sec_sev == GHES_SEV_CORRECTED &&
> +	    (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED) &&
> +	    (mem_err->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS)) {
> +		unsigned long pfn;

This pfn is defined twice, move it up to the beginning of the function.

> +		pfn = mem_err->physical_addr >> PAGE_SHIFT;
> +		if (pfn_valid(pfn))
> +			memory_failure_queue(pfn, 0, MF_SOFT_OFFLINE);
> +		else if (printk_ratelimit())
> +			pr_warning(FW_WARN GHES_PFX

WARNING: Prefer printk_ratelimited or pr_<level>_ratelimited to printk_ratelimit
#35: FILE: drivers/acpi/apei/ghes.c:425:
+               else if (printk_ratelimit())

Please run your patches through checkpatch.pl first.

This requested change will even simplify the code (ontop of your patch):

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 74ef6882bca9..87e11d468f6b 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -422,10 +422,10 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
 		pfn = mem_err->physical_addr >> PAGE_SHIFT;
 		if (pfn_valid(pfn))
 			memory_failure_queue(pfn, 0, MF_SOFT_OFFLINE);
-		else if (printk_ratelimit())
-			pr_warning(FW_WARN GHES_PFX
-			"Invalid address in generic error data: %#llx\n",
-			mem_err->physical_addr);
+		else
+			pr_warn_ratelimited(FW_WARN GHES_PFX
+					    "Invalid address in generic error data: %#llx\n",
+					    mem_err->physical_addr);
 	}
 	if (sev == GHES_SEV_RECOVERABLE &&
 	    sec_sev == GHES_SEV_RECOVERABLE &&
---



> +			"Invalid address in generic error data: %#llx\n",
> +			mem_err->physical_addr);
> +	}
> +	if (sev == GHES_SEV_RECOVERABLE &&
> +	    sec_sev == GHES_SEV_RECOVERABLE &&
> +	    mem_err->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS) {
> +		unsigned long pfn;
> +		pfn = mem_err->physical_addr >> PAGE_SHIFT;
> +		memory_failure_queue(pfn, 0, 0);
> +	}
> +#endif
> +}
> +
>  static void ghes_do_proc(struct ghes *ghes,
>  			 const struct acpi_hest_generic_status *estatus)
>  {
> @@ -428,15 +456,7 @@ static void ghes_do_proc(struct ghes *ghes,
>  			apei_mce_report_mem_error(sev == GHES_SEV_CORRECTED,
>  						  mem_err);
>  #endif
> -#ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
> -			if (sev == GHES_SEV_RECOVERABLE &&
> -			    sec_sev == GHES_SEV_RECOVERABLE &&
> -			    mem_err->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS) {
> -				unsigned long pfn;
> -				pfn = mem_err->physical_addr >> PAGE_SHIFT;
> -				memory_failure_queue(pfn, 0, 0);
> -			}
> -#endif
> +			ghes_handle_memory_failure(gdata, sev);
>  		}
>  #ifdef CONFIG_ACPI_APEI_PCIEAER
>  		else if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index e0c8528..958e9efd 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1784,6 +1784,7 @@ enum mf_flags {
>  	MF_COUNT_INCREASED = 1 << 0,
>  	MF_ACTION_REQUIRED = 1 << 1,
>  	MF_MUST_KILL = 1 << 2,
> +	MF_SOFT_OFFLINE = 1 << 3,
>  };
>  extern int memory_failure(unsigned long pfn, int trapno, int flags);
>  extern void memory_failure_queue(unsigned long pfn, int trapno, int flags);
> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> index ceb0c7f..0d6717e 100644
> --- a/mm/memory-failure.c
> +++ b/mm/memory-failure.c
> @@ -1286,7 +1286,10 @@ static void memory_failure_work_func(struct work_struct *work)
>  		spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
>  		if (!gotten)
>  			break;
> -		memory_failure(entry.pfn, entry.trapno, entry.flags);
> +		if (entry.flags & MF_SOFT_OFFLINE)
> +			soft_offline_page(pfn_to_page(entry.pfn), entry.flags);
> +		else
> +			memory_failure(entry.pfn, entry.trapno, entry.flags);

The rest looks ok to me.

I'm guessing this has been tested by injecting errors...?

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH 4] mce: acpi/apei: Add a sysctl to control page offlining on firmware report
  2013-07-02 12:54 ` [PATCH 4] mce: acpi/apei: Add a sysctl to control page offlining on firmware report Naveen N. Rao
@ 2013-07-03 14:46   ` Borislav Petkov
  2013-07-03 15:46     ` Naveen N. Rao
  0 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2013-07-03 14:46 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: tony.luck, ananth, masbock, lcm, linux-kernel, linux-acpi, ying.huang

On Tue, Jul 02, 2013 at 06:24:00PM +0530, Naveen N. Rao wrote:
> I am adding another patch here to disable page offlining in case the firmware
> starts acting up.
> 
> Thanks,
> Naveen
> 
> --
> 
> Add a sysctl memory_failure_soft_offline to control what is done on receipt of
> firmware ghes notification for a corrected error. By default, kernel tries
> to soft-offline the page immediately. If set to 0, no action is taken.

What is the rationale for that? Are we adding it just in case, as a
chicken bit or do you have a specific case?

If the second, we'd love to hear about it in the commit message. :)

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 3/3] mce, acpi/apei: Soft-offline a page on firmware GHES notification
  2013-07-03 14:44       ` Borislav Petkov
@ 2013-07-03 15:40         ` Naveen N. Rao
  2013-07-08 19:00           ` Tony Luck
  0 siblings, 1 reply; 21+ messages in thread
From: Naveen N. Rao @ 2013-07-03 15:40 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: tony.luck, ananth, masbock, lcm, linux-kernel, linux-acpi, ying.huang

On 07/03/2013 08:14 PM, Borislav Petkov wrote:
> On Tue, Jul 02, 2013 at 05:02:48PM +0530, Naveen N. Rao wrote:
>> Here is the updated patch. I also added printk_ratelimit() in line with the
>> rest of the GHES code.
>>
>> Thanks,
>> Naveen
>>
>> --
>> If the firmware indicates in GHES error data entry that the error threshold
>> has exceeded for a corrected error event, then we try to soft-offline the
>> page. This could be called in interrupt context, so we queue this up similar
>> to how we handle memory failure scenarios.
>>
>>
>> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
>> ---
>>   drivers/acpi/apei/ghes.c |   38 +++++++++++++++++++++++++++++---------
>>   include/linux/mm.h       |    1 +
>>   mm/memory-failure.c      |    5 ++++-
>>   3 files changed, 34 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
>> index fcd7d91..74ef688 100644
>> --- a/drivers/acpi/apei/ghes.c
>> +++ b/drivers/acpi/apei/ghes.c
>> @@ -409,6 +409,34 @@ static void ghes_clear_estatus(struct ghes *ghes)
>>   	ghes->flags &= ~GHES_TO_CLEAR;
>>   }
>>
>> +static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int sev)
>> +{
>> +#ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
>> +	int sec_sev = ghes_severity(gdata->error_severity);
>> +	struct cper_sec_mem_err *mem_err;
>> +	mem_err = (struct cper_sec_mem_err *)(gdata+1);
>
> A newline here please. Also, spaces around '+'.

This was borrowed from existing code in ghes_do_proc(), but yes, let me 
make this change.

>
>> +	if (sec_sev == GHES_SEV_CORRECTED &&
>> +	    (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED) &&
>> +	    (mem_err->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS)) {
>> +		unsigned long pfn;
>
> This pfn is defined twice, move it up to the beginning of the function.

Ok.

>
>> +		pfn = mem_err->physical_addr >> PAGE_SHIFT;
>> +		if (pfn_valid(pfn))
>> +			memory_failure_queue(pfn, 0, MF_SOFT_OFFLINE);
>> +		else if (printk_ratelimit())
>> +			pr_warning(FW_WARN GHES_PFX
>
> WARNING: Prefer printk_ratelimited or pr_<level>_ratelimited to printk_ratelimit
> #35: FILE: drivers/acpi/apei/ghes.c:425:
> +               else if (printk_ratelimit())
>
> Please run your patches through checkpatch.pl first.

I did run checkpatch.pl, but chose to ignore this warning. ghes.c uses 
printk_ratelimit() [hence "in line with the rest of the ghes code" in 
patch description] and I felt using it is better in this scenario given 
there will be other messages being printed by the rest of the APEI code.
So, rate-limiting these messages globally seems better rather than doing 
locally using pr_warn_ratelimited()

>
> This requested change will even simplify the code (ontop of your patch):
>
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 74ef6882bca9..87e11d468f6b 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -422,10 +422,10 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
>   		pfn = mem_err->physical_addr >> PAGE_SHIFT;
>   		if (pfn_valid(pfn))
>   			memory_failure_queue(pfn, 0, MF_SOFT_OFFLINE);
> -		else if (printk_ratelimit())
> -			pr_warning(FW_WARN GHES_PFX
> -			"Invalid address in generic error data: %#llx\n",
> -			mem_err->physical_addr);
> +		else
> +			pr_warn_ratelimited(FW_WARN GHES_PFX
> +					    "Invalid address in generic error data: %#llx\n",
> +					    mem_err->physical_addr);
>   	}
>   	if (sev == GHES_SEV_RECOVERABLE &&
>   	    sec_sev == GHES_SEV_RECOVERABLE &&
> ---
>
>
>
>> +			"Invalid address in generic error data: %#llx\n",
>> +			mem_err->physical_addr);
>> +	}
>> +	if (sev == GHES_SEV_RECOVERABLE &&
>> +	    sec_sev == GHES_SEV_RECOVERABLE &&
>> +	    mem_err->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS) {
>> +		unsigned long pfn;
>> +		pfn = mem_err->physical_addr >> PAGE_SHIFT;
>> +		memory_failure_queue(pfn, 0, 0);
>> +	}
>> +#endif
>> +}
>> +
>>   static void ghes_do_proc(struct ghes *ghes,
>>   			 const struct acpi_hest_generic_status *estatus)
>>   {
>> @@ -428,15 +456,7 @@ static void ghes_do_proc(struct ghes *ghes,
>>   			apei_mce_report_mem_error(sev == GHES_SEV_CORRECTED,
>>   						  mem_err);
>>   #endif
>> -#ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
>> -			if (sev == GHES_SEV_RECOVERABLE &&
>> -			    sec_sev == GHES_SEV_RECOVERABLE &&
>> -			    mem_err->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS) {
>> -				unsigned long pfn;
>> -				pfn = mem_err->physical_addr >> PAGE_SHIFT;
>> -				memory_failure_queue(pfn, 0, 0);
>> -			}
>> -#endif
>> +			ghes_handle_memory_failure(gdata, sev);
>>   		}
>>   #ifdef CONFIG_ACPI_APEI_PCIEAER
>>   		else if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index e0c8528..958e9efd 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -1784,6 +1784,7 @@ enum mf_flags {
>>   	MF_COUNT_INCREASED = 1 << 0,
>>   	MF_ACTION_REQUIRED = 1 << 1,
>>   	MF_MUST_KILL = 1 << 2,
>> +	MF_SOFT_OFFLINE = 1 << 3,
>>   };
>>   extern int memory_failure(unsigned long pfn, int trapno, int flags);
>>   extern void memory_failure_queue(unsigned long pfn, int trapno, int flags);
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index ceb0c7f..0d6717e 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1286,7 +1286,10 @@ static void memory_failure_work_func(struct work_struct *work)
>>   		spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
>>   		if (!gotten)
>>   			break;
>> -		memory_failure(entry.pfn, entry.trapno, entry.flags);
>> +		if (entry.flags & MF_SOFT_OFFLINE)
>> +			soft_offline_page(pfn_to_page(entry.pfn), entry.flags);
>> +		else
>> +			memory_failure(entry.pfn, entry.trapno, entry.flags);
>
> The rest looks ok to me.
>
> I'm guessing this has been tested by injecting errors...?

Yes, at least partially to ensure this works ;)

Thanks,
Naveen


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4] mce: acpi/apei: Add a sysctl to control page offlining on firmware report
  2013-07-03 14:46   ` Borislav Petkov
@ 2013-07-03 15:46     ` Naveen N. Rao
  2013-07-08 20:26       ` Luck, Tony
  0 siblings, 1 reply; 21+ messages in thread
From: Naveen N. Rao @ 2013-07-03 15:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: tony.luck, ananth, masbock, lcm, linux-kernel, linux-acpi, ying.huang

On 07/03/2013 08:16 PM, Borislav Petkov wrote:
> On Tue, Jul 02, 2013 at 06:24:00PM +0530, Naveen N. Rao wrote:
>> I am adding another patch here to disable page offlining in case the firmware
>> starts acting up.
>>
>> Thanks,
>> Naveen
>>
>> --
>>
>> Add a sysctl memory_failure_soft_offline to control what is done on receipt of
>> firmware ghes notification for a corrected error. By default, kernel tries
>> to soft-offline the page immediately. If set to 0, no action is taken.
>
> What is the rationale for that? Are we adding it just in case, as a
> chicken bit or do you have a specific case?
>
> If the second, we'd love to hear about it in the commit message. :)

Nope, this is a just-in-case thing. I think you or Tony asked to have 
this in a previous discussion so that we're covered if firmware starts 
acting up. Other than that, I'm ok if this is left out.


Thanks,
Naveen


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 3/3] mce, acpi/apei: Soft-offline a page on firmware GHES notification
  2013-07-03 15:40         ` Naveen N. Rao
@ 2013-07-08 19:00           ` Tony Luck
  2013-07-10  9:27             ` Naveen N. Rao
  0 siblings, 1 reply; 21+ messages in thread
From: Tony Luck @ 2013-07-08 19:00 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Borislav Petkov, ananth, masbock, lcm, Linux Kernel Mailing List,
	linux-acpi, Huang Ying

On Wed, Jul 3, 2013 at 8:40 AM, Naveen N. Rao
<naveen.n.rao@linux.vnet.ibm.com> wrote:

>>> +#ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
>>> +       int sec_sev = ghes_severity(gdata->error_severity);
>>> +       struct cper_sec_mem_err *mem_err;
>>> +       mem_err = (struct cper_sec_mem_err *)(gdata+1);
>>
>>
>> A newline here please. Also, spaces around '+'.

I was off on vacation last week - looks like you got lots done without me :-)

I have parts 1 & 2 applied to an internal tree. Looks like parts 3 & 4 need a
few final polishes to get an Ack from Boris.

-Tony

^ permalink raw reply	[flat|nested] 21+ messages in thread

* RE: [PATCH 4] mce: acpi/apei: Add a sysctl to control page offlining on firmware report
  2013-07-03 15:46     ` Naveen N. Rao
@ 2013-07-08 20:26       ` Luck, Tony
  2013-07-10  9:17         ` Naveen N. Rao
  0 siblings, 1 reply; 21+ messages in thread
From: Luck, Tony @ 2013-07-08 20:26 UTC (permalink / raw)
  To: Naveen N. Rao, Borislav Petkov
  Cc: ananth, masbock, lcm, linux-kernel, linux-acpi, Huang, Ying

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 913 bytes --]

> Nope, this is a just-in-case thing. I think you or Tony asked to have 
> this in a previous discussion so that we're covered if firmware starts 
> acting up. Other than that, I'm ok if this is left out.

I'm struggling to think of a case where this would help.  It implies that
we are on a running system, and we somehow notice that the BIOS is
telling us to take some pages offline - and that we know better than the
BIOS that we'd like to just ignore any more such messages from the BIOS.

But we still leave the BIOS in charge of logging the errors and keeping
track of the thresholds.

I'm happy with just the acpi=nocmcff to avoid a BIOS that does weird
stuff.  Or do you think we might still have to deal with a string of APEI
messages?

-Tony

ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 4] mce: acpi/apei: Add a sysctl to control page offlining on firmware report
  2013-07-08 20:26       ` Luck, Tony
@ 2013-07-10  9:17         ` Naveen N. Rao
  0 siblings, 0 replies; 21+ messages in thread
From: Naveen N. Rao @ 2013-07-10  9:17 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Borislav Petkov, ananth, masbock, lcm, linux-kernel, linux-acpi,
	Huang, Ying

On 07/09/2013 01:56 AM, Luck, Tony wrote:
> I'm happy with just the acpi=nocmcff to avoid a BIOS that does weird
> stuff.  Or do you think we might still have to deal with a string of APEI
> messages?

Agreed - and I don't think this patch can help with a string of APEI 
messages either. So yes, I think we can leave this out for now.

Thanks,
Naveen


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 3/3] mce, acpi/apei: Soft-offline a page on firmware GHES notification
  2013-07-08 19:00           ` Tony Luck
@ 2013-07-10  9:27             ` Naveen N. Rao
  2013-07-10  9:38               ` Borislav Petkov
  0 siblings, 1 reply; 21+ messages in thread
From: Naveen N. Rao @ 2013-07-10  9:27 UTC (permalink / raw)
  To: tony.luck, bp; +Cc: ananth, masbock, lcm, linux-kernel, linux-acpi, ying.huang

On 07/09/2013 12:30 AM, Tony Luck wrote:
> I was off on vacation last week - looks like you got lots done without me
> :-)
> 
> I have parts 1 & 2 applied to an internal tree. Looks like parts 3 & 4 need
> a few final polishes to get an Ack from Boris.

Cool :)
Here is the updated patch for part 3 incorporating Boris's comments.

Thanks,
Naveen
--

If the firmware indicates in GHES error data entry that the error threshold
has exceeded for a corrected error event, then we try to soft-offline the
page. This could be called in interrupt context, so we queue this up similar
to how we handle memory failure scenarios.


Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
---
 drivers/acpi/apei/ghes.c |   38 +++++++++++++++++++++++++++++---------
 include/linux/mm.h       |    1 +
 mm/memory-failure.c      |    5 ++++-
 3 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index fcd7d91..ba99d32 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -409,6 +409,34 @@ static void ghes_clear_estatus(struct ghes *ghes)
 	ghes->flags &= ~GHES_TO_CLEAR;
 }
 
+static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int sev)
+{
+#ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
+	unsigned long pfn;
+	int sec_sev = ghes_severity(gdata->error_severity);
+	struct cper_sec_mem_err *mem_err;
+	mem_err = (struct cper_sec_mem_err *)(gdata + 1);
+
+	if (sec_sev == GHES_SEV_CORRECTED &&
+	    (gdata->flags & CPER_SEC_ERROR_THRESHOLD_EXCEEDED) &&
+	    (mem_err->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS)) {
+		pfn = mem_err->physical_addr >> PAGE_SHIFT;
+		if (pfn_valid(pfn))
+			memory_failure_queue(pfn, 0, MF_SOFT_OFFLINE);
+		else if (printk_ratelimit())
+			pr_warning(FW_WARN GHES_PFX
+			"Invalid address in generic error data: %#llx\n",
+			mem_err->physical_addr);
+	}
+	if (sev == GHES_SEV_RECOVERABLE &&
+	    sec_sev == GHES_SEV_RECOVERABLE &&
+	    mem_err->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS) {
+		pfn = mem_err->physical_addr >> PAGE_SHIFT;
+		memory_failure_queue(pfn, 0, 0);
+	}
+#endif
+}
+
 static void ghes_do_proc(struct ghes *ghes,
 			 const struct acpi_hest_generic_status *estatus)
 {
@@ -428,15 +456,7 @@ static void ghes_do_proc(struct ghes *ghes,
 			apei_mce_report_mem_error(sev == GHES_SEV_CORRECTED,
 						  mem_err);
 #endif
-#ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
-			if (sev == GHES_SEV_RECOVERABLE &&
-			    sec_sev == GHES_SEV_RECOVERABLE &&
-			    mem_err->validation_bits & CPER_MEM_VALID_PHYSICAL_ADDRESS) {
-				unsigned long pfn;
-				pfn = mem_err->physical_addr >> PAGE_SHIFT;
-				memory_failure_queue(pfn, 0, 0);
-			}
-#endif
+			ghes_handle_memory_failure(gdata, sev);
 		}
 #ifdef CONFIG_ACPI_APEI_PCIEAER
 		else if (!uuid_le_cmp(*(uuid_le *)gdata->section_type,
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e0c8528..958e9efd 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1784,6 +1784,7 @@ enum mf_flags {
 	MF_COUNT_INCREASED = 1 << 0,
 	MF_ACTION_REQUIRED = 1 << 1,
 	MF_MUST_KILL = 1 << 2,
+	MF_SOFT_OFFLINE = 1 << 3,
 };
 extern int memory_failure(unsigned long pfn, int trapno, int flags);
 extern void memory_failure_queue(unsigned long pfn, int trapno, int flags);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index ceb0c7f..0d6717e 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1286,7 +1286,10 @@ static void memory_failure_work_func(struct work_struct *work)
 		spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
 		if (!gotten)
 			break;
-		memory_failure(entry.pfn, entry.trapno, entry.flags);
+		if (entry.flags & MF_SOFT_OFFLINE)
+			soft_offline_page(pfn_to_page(entry.pfn), entry.flags);
+		else
+			memory_failure(entry.pfn, entry.trapno, entry.flags);
 	}
 }
 


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 3/3] mce, acpi/apei: Soft-offline a page on firmware GHES notification
  2013-07-10  9:27             ` Naveen N. Rao
@ 2013-07-10  9:38               ` Borislav Petkov
  2013-07-10 20:05                 ` Tony Luck
  0 siblings, 1 reply; 21+ messages in thread
From: Borislav Petkov @ 2013-07-10  9:38 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: tony.luck, ananth, masbock, lcm, linux-kernel, linux-acpi, ying.huang

On Wed, Jul 10, 2013 at 02:57:01PM +0530, Naveen N. Rao wrote:
> On 07/09/2013 12:30 AM, Tony Luck wrote:
> > I was off on vacation last week - looks like you got lots done without me
> > :-)
> > 
> > I have parts 1 & 2 applied to an internal tree. Looks like parts 3 & 4 need
> > a few final polishes to get an Ack from Boris.
> 
> Cool :)
> Here is the updated patch for part 3 incorporating Boris's comments.
> 
> Thanks,
> Naveen
> --
> 
> If the firmware indicates in GHES error data entry that the error threshold
> has exceeded for a corrected error event, then we try to soft-offline the
> page. This could be called in interrupt context, so we queue this up similar
> to how we handle memory failure scenarios.
> 
> 
> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>

Acked-by: Borislav Petkov <bp@suse.de>

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v3 3/3] mce, acpi/apei: Soft-offline a page on firmware GHES notification
  2013-07-10  9:38               ` Borislav Petkov
@ 2013-07-10 20:05                 ` Tony Luck
  0 siblings, 0 replies; 21+ messages in thread
From: Tony Luck @ 2013-07-10 20:05 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Naveen N. Rao, ananth, masbock, lcm, Linux Kernel Mailing List,
	linux-acpi, Huang Ying

>> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
>
> Acked-by: Borislav Petkov <bp@suse.de>

Applied-by: Tony Luck :-)

Naveen: Thanks for having this idea, implementing it, and sticking
with it through the review process.

Once 3.11-rc1 is out I'll ask Ingo to pull this series to the tip tree
... and then on to 3.12

-Tony

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2013-07-10 20:05 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-01 15:38 [PATCH v3 0/3] Firmware first mode for corrected errors Naveen N. Rao
2013-07-01 15:38 ` [PATCH v3 1/3] mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMC Naveen N. Rao
2013-07-01 22:45   ` Borislav Petkov
2013-07-01 15:38 ` [PATCH v3 2/3] mce: acpi/apei: Add a boot option to disable ff mode for corrected errors Naveen N. Rao
2013-07-01 22:49   ` Borislav Petkov
2013-07-01 15:38 ` [PATCH v3 3/3] mce, acpi/apei: Soft-offline a page on firmware GHES notification Naveen N. Rao
2013-07-01 23:08   ` Borislav Petkov
2013-07-02 11:05     ` Naveen N. Rao
2013-07-02 11:32     ` Naveen N. Rao
2013-07-03 14:44       ` Borislav Petkov
2013-07-03 15:40         ` Naveen N. Rao
2013-07-08 19:00           ` Tony Luck
2013-07-10  9:27             ` Naveen N. Rao
2013-07-10  9:38               ` Borislav Petkov
2013-07-10 20:05                 ` Tony Luck
2013-07-01 20:08 ` [PATCH v3 0/3] Firmware first mode for corrected errors Borislav Petkov
2013-07-02 12:54 ` [PATCH 4] mce: acpi/apei: Add a sysctl to control page offlining on firmware report Naveen N. Rao
2013-07-03 14:46   ` Borislav Petkov
2013-07-03 15:46     ` Naveen N. Rao
2013-07-08 20:26       ` Luck, Tony
2013-07-10  9:17         ` Naveen N. Rao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).