[PATCH 0/3] ACPI / APEI: Kick the memory_failure() queue for synchronous errors

* [PATCH 0/3] ACPI / APEI: Kick the memory_failure() queue for synchronous errors
@ 2020-02-28 17:48 ` James Morse
  0 siblings, 0 replies; 16+ messages in thread
From: James Morse @ 2020-02-28 17:48 UTC (permalink / raw)
  To: linux-mm, linux-acpi, linux-arm-kernel
  Cc: Andrew Morton, Naoya Horiguchi, Rafael Wysocki, Len Brown,
	Tony Luck, Borislav Petkov, Catalin Marinas, Will Deacon,
	Mark Rutland, Tyler Baicar, Xie XiuQi, James Morse

Hello!

These are the remaining patches from the SDEI series[0] that fix
a race between memory_failure() and user-space re-triggering the error
in ghes.c.

ghes_handle_memory_failure() calls memory_failure_queue() from
IRQ context to schedule memory_failure()'s work, as it needs to sleep.
Once the GHES machinery returns from the IRQ, it may return to user-space
before memory_failure() runs.

If the error that kicked all this off is specific to user-space, e.g. a
load from corrupted memory, we may find ourselves taking the error
again. If the user-space task is scheduled out, and memory_failure() runs,
the same user-space task may be scheduled in on another CPU, which could
also take the same error.

These lead to exaggerated error counters, which may cause some threshold
to be reached early.

This can happen with any error that causes a Synchronous External Abort
on arm64. I can't see why the same wouldn't happen with a machine-check
handled firmware first on x86.

This series adds a memory_failure_queue_kick() helper to
memory-failure.c, and calls it as task-work before returning to
user-space.
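
To make the race concrete, here is a minimal user-space sketch in C. It is
a toy model, not kernel code: every toy_* name is made up for illustration,
and it assumes the worst case where the deferred work never gets to run
unless it is kicked before the task resumes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; none of these names exist in the kernel. */
struct toy_page {
	bool poisoned;	/* an access to this page reports an error */
	bool handled;	/* the deferred handler has dealt with the page */
};

struct toy_cpu {
	struct toy_page *pending;	/* one-slot stand-in for the per-CPU queue */
	unsigned int error_count;	/* stand-in for an error counter */
};

/* "IRQ context": count the error and defer the real work. */
static void toy_queue_failure(struct toy_cpu *cpu, struct toy_page *page)
{
	cpu->error_count++;
	cpu->pending = page;
}

/* "Process context": run whatever was deferred on this CPU. */
static void toy_queue_kick(struct toy_cpu *cpu)
{
	if (cpu->pending) {
		cpu->pending->handled = true;
		cpu->pending = NULL;
	}
}

/*
 * The task retries its load up to max_tries times. Each retry of an
 * unhandled poisoned page takes the error again. Returns the number of
 * errors that were counted.
 */
unsigned int toy_run_task(struct toy_cpu *cpu, struct toy_page *page,
			  bool kick_before_return, unsigned int max_tries)
{
	unsigned int i;

	assert(cpu && page);

	for (i = 0; i < max_tries; i++) {
		if (!page->poisoned || page->handled)
			break;	/* no further error is taken */

		toy_queue_failure(cpu, page);
		if (kick_before_return)
			toy_queue_kick(cpu);
	}

	return cpu->error_count;
}
```

In this model, calling toy_run_task() with kick_before_return false and
max_tries 5 counts the same error five times, while kicking before the
return counts it once.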

Currently arm64 papers over this problem by ignoring ghes_notify_sea()'s
return code as it knows there is still work to do. arm64 generates its
own signal to user-space, which means the first task to discover an
error will always be killed, even if the error was later handled
(which is no improvement on the no-RAS behaviour).

As a final piece, arm64 can try to process the irq work queued by
ghes_notify_sea() while it's still in the external abort handler. A successful
return value here now means the memory_failure() work will be done before we
return to user-space, so we no longer need to generate our own signal.
This lets the original task survive the error if memory_failure() can
recover the corrupted memory.

Based on v5.6-rc2. I'm afraid it touches three different trees.
$subject says ACPI as that is where the bulk of the diffstat is.

This series may conflict in arm64 with a series from Mark Rutland to
clean up the daif/PMR toggling.

This would be v9 of these patches, but after a year I figure I should
start the numbering again. I've dropped any collected tags.

Known issues:
 * arm64's apei_claim_sea() may unwittingly re-enable debug if it takes
   an external-abort from debug context. Patch 3 makes this worse
   instead of fixing it. The fix would make use of helpers from Mark R's
   series.

Thanks,

James

[0] https://lore.kernel.org/linux-arm-kernel/20190129184902.102850-1-james.morse@arm.com/
[1] https://lore.kernel.org/linux-acpi/1506516620-20033-3-git-send-email-xiexiuqi@huawei.com/

James Morse (3):
  mm/memory-failure: Add memory_failure_queue_kick()
  ACPI / APEI: Kick the memory_failure() queue for synchronous errors
  arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work

 arch/arm64/kernel/acpi.c | 25 +++++++++++++++
 arch/arm64/mm/fault.c    | 12 ++++---
 drivers/acpi/apei/ghes.c | 68 +++++++++++++++++++++++++++++++++-------
 include/acpi/ghes.h      |  3 ++
 include/linux/mm.h       |  1 +
 mm/memory-failure.c      | 15 ++++++++-
 6 files changed, 107 insertions(+), 17 deletions(-)

-- 
2.24.1

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/3] mm/memory-failure: Add memory_failure_queue_kick()
  2020-02-28 17:48 ` James Morse
@ 2020-02-28 17:48   ` James Morse
  -1 siblings, 0 replies; 16+ messages in thread
From: James Morse @ 2020-02-28 17:48 UTC (permalink / raw)
  To: linux-mm, linux-acpi, linux-arm-kernel
  Cc: Andrew Morton, Naoya Horiguchi, Rafael Wysocki, Len Brown,
	Tony Luck, Borislav Petkov, Catalin Marinas, Will Deacon,
	Mark Rutland, Tyler Baicar, Xie XiuQi, James Morse

The GHES code calls memory_failure_queue() from IRQ context to schedule
work on the current CPU so that memory_failure() can sleep.

For synchronous memory errors the arch code needs to know any signals
that memory_failure() will trigger are pending before it returns to
user-space, possibly when exiting from the IRQ.

Add a helper to kick the memory failure queue, to ensure the scheduled
work has happened. This has to be called from process context, so the
caller may have been migrated from the original CPU. Pass the CPU the
work was queued on.

Change memory_failure_work_func() to permit being called on the 'wrong'
cpu.
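
As an illustration of why deriving the per-CPU structure from the work
pointer makes the function safe on any CPU, here is a small user-space
sketch. The demo_* names and the array standing in for per-CPU data are
assumptions made for this example; only the container_of() idea is taken
from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Same idea as the kernel's container_of(), without its type checking. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define DEMO_NR_CPUS 4

struct demo_work {
	int pending;
};

struct demo_mf_cpu {
	int fifo_len;			/* stand-in for the kfifo contents */
	struct demo_work work;		/* embedded, like work_struct */
};

/* Stand-in for the per-CPU variable: one slot per CPU. */
static struct demo_mf_cpu demo_mf_cpus[DEMO_NR_CPUS];

/*
 * Drain the queue that owns 'work'. The owner is found from the pointer
 * itself, so it does not matter which CPU runs this function.
 * Returns the index of the CPU whose queue was drained.
 */
static int demo_work_func(struct demo_work *work)
{
	struct demo_mf_cpu *mf_cpu;

	mf_cpu = demo_container_of(work, struct demo_mf_cpu, work);
	mf_cpu->fifo_len = 0;
	work->pending = 0;

	return (int)(mf_cpu - demo_mf_cpus);
}

/* Process the work that was queued on 'cpu', whichever CPU we run on. */
int demo_queue_kick(int cpu)
{
	assert(cpu >= 0 && cpu < DEMO_NR_CPUS);

	return demo_work_func(&demo_mf_cpus[cpu].work);
}
```

A lookup based on "the CPU I am running on now" would drain the wrong
queue after a migration; here demo_queue_kick(2) always drains slot 2.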

Signed-off-by: James Morse <james.morse@arm.com>
---
 include/linux/mm.h  |  1 +
 mm/memory-failure.c | 15 ++++++++++++++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 52269e56c514..389460de3ee6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2806,6 +2806,7 @@ enum mf_flags {
 };
 extern int memory_failure(unsigned long pfn, int flags);
 extern void memory_failure_queue(unsigned long pfn, int flags);
+extern void memory_failure_queue_kick(int cpu);
 extern int unpoison_memory(unsigned long pfn);
 extern int get_hwpoison_page(struct page *page);
 #define put_hwpoison_page(page)	put_page(page)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 41c634f45d45..afdf1fd5ef9c 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1468,7 +1468,7 @@ static void memory_failure_work_func(struct work_struct *work)
 	unsigned long proc_flags;
 	int gotten;
 
-	mf_cpu = this_cpu_ptr(&memory_failure_cpu);
+	mf_cpu = container_of(work, struct memory_failure_cpu, work);
 	for (;;) {
 		spin_lock_irqsave(&mf_cpu->lock, proc_flags);
 		gotten = kfifo_get(&mf_cpu->fifo, &entry);
@@ -1482,6 +1482,19 @@ static void memory_failure_work_func(struct work_struct *work)
 	}
 }
 
+/*
+ * Process memory_failure work queued on the specified CPU.
+ * Used to avoid return-to-userspace racing with the memory_failure workqueue.
+ */
+void memory_failure_queue_kick(int cpu)
+{
+	struct memory_failure_cpu *mf_cpu;
+
+	mf_cpu = &per_cpu(memory_failure_cpu, cpu);
+	cancel_work_sync(&mf_cpu->work);
+	memory_failure_work_func(&mf_cpu->work);
+}
+
 static int __init memory_failure_init(void)
 {
 	struct memory_failure_cpu *mf_cpu;
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/3] ACPI / APEI: Kick the memory_failure() queue for synchronous errors
  2020-02-28 17:48 ` James Morse
@ 2020-02-28 17:48   ` James Morse
  -1 siblings, 0 replies; 16+ messages in thread
From: James Morse @ 2020-02-28 17:48 UTC (permalink / raw)
  To: linux-mm, linux-acpi, linux-arm-kernel
  Cc: Andrew Morton, Naoya Horiguchi, Rafael Wysocki, Len Brown,
	Tony Luck, Borislav Petkov, Catalin Marinas, Will Deacon,
	Mark Rutland, Tyler Baicar, Xie XiuQi, James Morse

memory_failure() offlines or repairs pages of memory that have been
discovered to be corrupt. These may be detected by an external
component (e.g. the memory controller) and notified via an IRQ.
In this case the work is queued, as not all of memory_failure()'s work
can happen in IRQ context.

If the error was detected as a result of user-space accessing a
corrupt memory location the CPU may take an abort instead. On arm64
this is a 'synchronous external abort', and on a firmware-first
system it is replayed using NOTIFY_SEA.

This notification has NMI-like properties (it can interrupt
IRQ-masked code), so the memory_failure() work is queued. If we
return to user-space before the queued memory_failure() work is
processed, we will take the fault again. This loop may cause platform
firmware to exceed some threshold and reboot when Linux could have
recovered from this error.

For NMI-like notifications, keep track of whether memory_failure() work
was queued, and make task_work pending to flush out the queue.
To save memory allocations, the task_work is allocated as part of
the ghes_estatus_node, and free()ing it back to the pool is deferred.
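
The deferred-free idea can be sketched in user-space C as follows. This is
an assumption-laden illustration: the sketch_* names are hypothetical,
malloc/free stand in for the estatus pool, and the callback is returned to
the caller instead of being handed to task_work_add().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for struct callback_head. */
struct sketch_callback {
	void (*func)(struct sketch_callback *head);
};

/* The callback lives inside the node, so queueing it allocates nothing. */
struct sketch_node {
	int payload;
	int queued_cpu;
	struct sketch_callback task_work;
};

static int sketch_live_nodes;		/* allocated and not yet freed */
static int sketch_last_kicked_cpu = -1;

struct sketch_node *sketch_node_alloc(int payload)
{
	struct sketch_node *node = calloc(1, sizeof(*node));

	if (node) {
		node->payload = payload;
		sketch_live_nodes++;
	}
	return node;
}

static void sketch_node_free(struct sketch_node *node)
{
	free(node);
	sketch_live_nodes--;
}

/* Runs later: flush the queue for the recorded CPU, then free the node. */
static void sketch_kick(struct sketch_callback *head)
{
	struct sketch_node *node;

	node = sketch_container_of(head, struct sketch_node, task_work);
	sketch_last_kicked_cpu = node->queued_cpu;
	sketch_node_free(node);
}

/*
 * Finish processing a node. If work was queued, keep the node alive and
 * return its embedded callback for the caller to run later. Otherwise
 * free the node now and return NULL.
 */
struct sketch_callback *sketch_process(struct sketch_node *node,
				       bool work_queued, int cpu)
{
	assert(node);

	if (work_queued) {
		node->task_work.func = sketch_kick;
		node->queued_cpu = cpu;
		return &node->task_work;
	}

	sketch_node_free(node);
	return NULL;
}
```

The node must not be touched by the caller once its callback has run, as
the callback is what returns the memory.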

Signed-off-by: James Morse <james.morse@arm.com>
---
 drivers/acpi/apei/ghes.c | 68 +++++++++++++++++++++++++++++++++-------
 include/acpi/ghes.h      |  3 ++
 2 files changed, 60 insertions(+), 11 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 103acbbfcf9a..c91c9ec55750 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -40,6 +40,7 @@
 #include <linux/sched/clock.h>
 #include <linux/uuid.h>
 #include <linux/ras.h>
+#include <linux/task_work.h>
 
 #include <acpi/actbl1.h>
 #include <acpi/ghes.h>
@@ -414,23 +415,47 @@ static void ghes_clear_estatus(struct ghes *ghes,
 		ghes_ack_error(ghes->generic_v2);
 }
 
-static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int sev)
+/*
+ * Called as task_work before returning to user-space.
+ * Ensure any queued work has been done before we return to the context that
+ * triggered the notification.
+ */
+static void ghes_kick_task_work(struct callback_head *head)
+{
+	struct acpi_hest_generic_status *estatus;
+	struct ghes_estatus_node *estatus_node;
+	u32 node_len;
+
+	estatus_node = container_of(head, struct ghes_estatus_node, task_work);
+	if (IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))
+		memory_failure_queue_kick(estatus_node->task_work_cpu);
+
+	estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
+	node_len = GHES_ESTATUS_NODE_LEN(cper_estatus_len(estatus));
+	gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node, node_len);
+}
+
+static bool ghes_handle_memory_failure(struct ghes *ghes,
+				       struct acpi_hest_generic_data *gdata,
+				       int sev)
 {
-#ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
 	unsigned long pfn;
 	int flags = -1;
 	int sec_sev = ghes_severity(gdata->error_severity);
 	struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
 
+	if (!IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))
+		return false;
+
 	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
-		return;
+		return false;
 
 	pfn = mem_err->physical_addr >> PAGE_SHIFT;
 	if (!pfn_valid(pfn)) {
 		pr_warn_ratelimited(FW_WARN GHES_PFX
 		"Invalid address in generic error data: %#llx\n",
 		mem_err->physical_addr);
-		return;
+		return false;
 	}
 
 	/* iff following two events can be handled properly by now */
@@ -440,9 +465,12 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
 	if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
 		flags = 0;
 
-	if (flags != -1)
+	if (flags != -1) {
 		memory_failure_queue(pfn, flags);
-#endif
+		return true;
+	}
+
+	return false;
 }
 
 /*
@@ -490,7 +518,7 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
 #endif
 }
 
-static void ghes_do_proc(struct ghes *ghes,
+static bool ghes_do_proc(struct ghes *ghes,
 			 const struct acpi_hest_generic_status *estatus)
 {
 	int sev, sec_sev;
@@ -498,6 +526,7 @@ static void ghes_do_proc(struct ghes *ghes,
 	guid_t *sec_type;
 	const guid_t *fru_id = &guid_null;
 	char *fru_text = "";
+	bool queued = false;
 
 	sev = ghes_severity(estatus->error_severity);
 	apei_estatus_for_each_section(estatus, gdata) {
@@ -515,7 +544,7 @@ static void ghes_do_proc(struct ghes *ghes,
 			ghes_edac_report_mem_error(sev, mem_err);
 
 			arch_apei_report_mem_error(sev, mem_err);
-			ghes_handle_memory_failure(gdata, sev);
+			queued = ghes_handle_memory_failure(ghes, gdata, sev);
 		}
 		else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
 			ghes_handle_aer(gdata);
@@ -532,6 +561,8 @@ static void ghes_do_proc(struct ghes *ghes,
 					       gdata->error_data_length);
 		}
 	}
+
+	return queued;
 }
 
 static void __ghes_print_estatus(const char *pfx,
@@ -827,7 +858,9 @@ static void ghes_proc_in_irq(struct irq_work *irq_work)
 	struct ghes_estatus_node *estatus_node;
 	struct acpi_hest_generic *generic;
 	struct acpi_hest_generic_status *estatus;
+	bool task_work_pending;
 	u32 len, node_len;
+	int ret;
 
 	llnode = llist_del_all(&ghes_estatus_llist);
 	/*
@@ -842,14 +875,26 @@ static void ghes_proc_in_irq(struct irq_work *irq_work)
 		estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
 		len = cper_estatus_len(estatus);
 		node_len = GHES_ESTATUS_NODE_LEN(len);
-		ghes_do_proc(estatus_node->ghes, estatus);
+		task_work_pending = ghes_do_proc(estatus_node->ghes, estatus);
 		if (!ghes_estatus_cached(estatus)) {
 			generic = estatus_node->generic;
 			if (ghes_print_estatus(NULL, generic, estatus))
 				ghes_estatus_cache_add(generic, estatus);
 		}
-		gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node,
-			      node_len);
+
+		if (task_work_pending && current->mm != &init_mm) {
+			estatus_node->task_work.func = ghes_kick_task_work;
+			estatus_node->task_work_cpu = smp_processor_id();
+			ret = task_work_add(current, &estatus_node->task_work,
+					    true);
+			if (ret)
+				estatus_node->task_work.func = NULL;
+		}
+
+		if (!estatus_node->task_work.func)
+			gen_pool_free(ghes_estatus_pool,
+				      (unsigned long)estatus_node, node_len);
+
 		llnode = next;
 	}
 }
@@ -909,6 +954,7 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes,
 
 	estatus_node->ghes = ghes;
 	estatus_node->generic = ghes->generic;
+	estatus_node->task_work.func = NULL;
 	estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
 
 	if (__ghes_read_estatus(estatus, buf_paddr, fixmap_idx, len)) {
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index e3f1cddb4ac8..517a5231cc1b 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -33,6 +33,9 @@ struct ghes_estatus_node {
 	struct llist_node llnode;
 	struct acpi_hest_generic *generic;
 	struct ghes *ghes;
+
+	int task_work_cpu;
+	struct callback_head task_work;
 };
 
 struct ghes_estatus_cache {
-- 
2.24.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/3] arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work
  2020-02-28 17:48 ` James Morse
@ 2020-02-28 17:48   ` James Morse
  -1 siblings, 0 replies; 16+ messages in thread
From: James Morse @ 2020-02-28 17:48 UTC (permalink / raw)
  To: linux-mm, linux-acpi, linux-arm-kernel
  Cc: Andrew Morton, Naoya Horiguchi, Rafael Wysocki, Len Brown,
	Tony Luck, Borislav Petkov, Catalin Marinas, Will Deacon,
	Mark Rutland, Tyler Baicar, Xie XiuQi, James Morse

APEI is unable to do all of its error handling work in nmi-context, so
it defers non-fatal work onto the irq_work queue. arch_irq_work_raise()
sends an IPI to the calling cpu, but this is not guaranteed to be taken
before returning to user-space.

Unless the exception interrupted a context with irqs-masked,
irq_work_run() can run immediately. Otherwise return -EINPROGRESS to
indicate ghes_notify_sea() found some work to do, but it hasn't
finished yet.

With this, apei_claim_sea() returning '0' means this external-abort was
also notification of a firmware-first RAS error, and that APEI has
processed the CPER records.
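
The return-value contract described above can be summarised with a small
user-space model. The model_* names and the boolean inputs are assumptions
for illustration; the real code derives them from ghes_notify_sea(), the
saved registers, and user_mode(), and may return other error codes.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Simplified model of the claim step:
 *   -ENOENT       the abort was not a firmware-first notification
 *   -EINPROGRESS  work was queued, but irqs were masked so it has not run
 *   0             work was queued and has been processed
 */
int model_claim_sea(bool firmware_first, bool irqs_were_enabled)
{
	int err;

	if (!firmware_first)
		err = -ENOENT;
	else if (!irqs_were_enabled)
		err = -EINPROGRESS;
	else
		err = 0;	/* the irq work could be run immediately */

	assert(err == 0 || err == -ENOENT || err == -EINPROGRESS);
	return err;
}

/*
 * Model of the caller: the architecture only skips its own signal when
 * the abort came from user-space and the claim fully succeeded.
 */
bool model_needs_arch_signal(bool from_user, bool firmware_first,
			     bool irqs_were_enabled)
{
	if (from_user &&
	    model_claim_sea(firmware_first, irqs_were_enabled) == 0)
		return false;

	return true;
}
```

Only the "claimed and completed" case suppresses the signal; both error
returns fall through to the existing behaviour.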

Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since $last_year:
 * Dropped all the tags ... it's been a year.
 * Added user_mode() test in do_sea() and expanded the comment.
 * Don't depend on daif value for return_to_irqs_enabled because of pNMI.
 * pr_warn() should be ratelimited.
---
 arch/arm64/kernel/acpi.c | 25 +++++++++++++++++++++++++
 arch/arm64/mm/fault.c    | 12 +++++++-----
 2 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
index a100483b47c4..46ec402e97ed 100644
--- a/arch/arm64/kernel/acpi.c
+++ b/arch/arm64/kernel/acpi.c
@@ -19,6 +19,7 @@
 #include <linux/init.h>
 #include <linux/irq.h>
 #include <linux/irqdomain.h>
+#include <linux/irq_work.h>
 #include <linux/memblock.h>
 #include <linux/of_fdt.h>
 #include <linux/smp.h>
@@ -269,6 +270,7 @@ pgprot_t __acpi_get_mem_attribute(phys_addr_t addr)
 int apei_claim_sea(struct pt_regs *regs)
 {
 	int err = -ENOENT;
+	bool return_to_irqs_enabled;
 	unsigned long current_flags;
 
 	if (!IS_ENABLED(CONFIG_ACPI_APEI_GHES))
@@ -276,6 +278,12 @@ int apei_claim_sea(struct pt_regs *regs)
 
 	current_flags = local_daif_save_flags();
 
+	/* current_flags isn't useful here as daif doesn't tell us about pNMI */
+	return_to_irqs_enabled = !irqs_disabled_flags(arch_local_save_flags());
+
+	if (regs)
+		return_to_irqs_enabled = interrupts_enabled(regs);
+
 	/*
 	 * SEA can interrupt SError, mask it and describe this as an NMI so
 	 * that APEI defers the handling.
@@ -284,6 +292,23 @@ int apei_claim_sea(struct pt_regs *regs)
 	nmi_enter();
 	err = ghes_notify_sea();
 	nmi_exit();
+
+	/*
+	 * APEI NMI-like notifications are deferred to irq_work. Unless
+	 * we interrupted irqs-masked code, we can do that now.
+	 */
+	if (!err) {
+		if (return_to_irqs_enabled) {
+			local_daif_restore(DAIF_PROCCTX_NOIRQ);
+			__irq_enter();
+			irq_work_run();
+			__irq_exit();
+		} else {
+			pr_warn_ratelimited("APEI work queued but not completed\n");
+			err = -EINPROGRESS;
+		}
+	}
+
 	local_daif_restore(current_flags);
 
 	return err;
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 85566d32958f..cefeb34580da 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -645,11 +645,13 @@ static int do_sea(unsigned long addr, unsigned int esr, struct pt_regs *regs)
 
 	inf = esr_to_fault_info(esr);
 
-	/*
-	 * Return value ignored as we rely on signal merging.
-	 * Future patches will make this more robust.
-	 */
-	apei_claim_sea(regs);
+	if (user_mode(regs) && apei_claim_sea(regs) == 0) {
+		/*
+		 * APEI claimed this as a firmware-first notification.
+		 * Some processing deferred to task_work before ret_to_user().
+		 */
+		return 0;
+	}
 
 	if (esr & ESR_ELx_FnV)
 		siaddr = NULL;
-- 
2.24.1
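
For readers following the return codes, here is a minimal user-space
sketch of the contract the hunks above give apei_claim_sea() and
do_sea(). It is not kernel code: struct sea_model and both model_*
functions are hypothetical names invented for illustration, and the
sketch assumes an unclaimed abort surfaces as -ENOENT, the initial value
of err in the hunk.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for kernel state; not a kernel structure. */
struct sea_model {
	bool ghes_claimed;      /* models ghes_notify_sea() returning 0 */
	bool irqs_were_enabled; /* models return_to_irqs_enabled */
	int irq_work_runs;      /* counts modelled irq_work_run() calls */
};

/* Mirrors the control flow the patch adds after nmi_exit(). */
static int model_apei_claim_sea(struct sea_model *m)
{
	int err;

	assert(m);
	err = m->ghes_claimed ? 0 : -ENOENT;

	if (!err) {
		if (m->irqs_were_enabled)
			m->irq_work_runs++;  /* deferred work completes now */
		else
			err = -EINPROGRESS;  /* queued, but cannot run yet */
	}

	return err;
}

/* Mirrors the new test in do_sea(): true means no signal is generated. */
static bool model_do_sea_handled(bool user_mode, struct sea_model *m)
{
	return user_mode && model_apei_claim_sea(m) == 0;
}
```

In this model only a claimed abort taken from user mode with irqs
unmasked is treated as fully handled; every other combination falls
through to the existing signal path.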


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/3] ACPI / APEI: Kick the memory_failure() queue for synchronous errors
  2020-02-28 17:48   ` James Morse
  (?)
@ 2020-03-09 17:07     ` Tyler Baicar OS
  -1 siblings, 0 replies; 16+ messages in thread
From: Tyler Baicar OS @ 2020-03-09 17:07 UTC (permalink / raw)
  To: James Morse, linux-mm, linux-acpi, linux-arm-kernel
  Cc: Mark Rutland, Tony Luck, Xie XiuQi, Catalin Marinas,
	Rafael Wysocki, Tyler Baicar, Borislav Petkov, Andrew Morton,
	Will Deacon, Naoya Horiguchi, Len Brown

Hello James,

On Fri, Feb 28, 2020 at 12:49 PM James Morse <james.morse@arm.com> wrote:
>
> -static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int sev)

> +static bool ghes_handle_memory_failure(struct ghes *ghes,
> +                                      struct acpi_hest_generic_data *gdata,
> +                                      int sev)

It doesn't look like ghes needs to be added as a parameter to this
function, unless I'm missing something :)

>  {
> -#ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
>         unsigned long pfn;
>         int flags = -1;
>         int sec_sev = ghes_severity(gdata->error_severity);
>         struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
>
> +       if (!IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))
> +               return false;
> +
>         if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
> -               return;
> +               return false;
>
>         pfn = mem_err->physical_addr >> PAGE_SHIFT;
>         if (!pfn_valid(pfn)) {
>                 pr_warn_ratelimited(FW_WARN GHES_PFX
>                 "Invalid address in generic error data: %#llx\n",
>                 mem_err->physical_addr);
> -               return;
> +               return false;
>         }
>
>         /* iff following two events can be handled properly by now */
> @@ -440,9 +465,12 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
>         if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
>                 flags = 0;
>
> -       if (flags != -1)
> +       if (flags != -1) {
>                 memory_failure_queue(pfn, flags);
> -#endif
> +               return true;
> +       }
> +
> +       return false;
>  }
>

This series looks good to me overall. I'm going to pull it and give it a spin.

Thanks,
Tyler

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/3] ACPI / APEI: Kick the memory_failure() queue for synchronous errors
  2020-02-28 17:48 ` James Morse
  (?)
@ 2020-03-20 19:19   ` Tyler Baicar OS
  -1 siblings, 0 replies; 16+ messages in thread
From: Tyler Baicar OS @ 2020-03-20 19:19 UTC (permalink / raw)
  To: James Morse, linux-mm, linux-acpi, linux-arm-kernel
  Cc: Mark Rutland, Tony Luck, Xie XiuQi, Catalin Marinas,
	Rafael Wysocki, Tyler Baicar, Borislav Petkov, Andrew Morton,
	Will Deacon, Naoya Horiguchi, Len Brown

Hello James,

I think my one comment on patch 2 is valid, right? But for this series:

Tested-by: Tyler Baicar <baicar@os.amperecomputing.com>

Thanks,
Tyler

On Fri, Feb 28, 2020 at 12:48 PM James Morse <james.morse@arm.com> wrote:
>
> Hello!
>
> These are the remaining patches from the SDEI series[0] that fix
> a race between memory_failure() and user-space re-triggering the error
> in ghes.c.
>
>
> ghes_handle_memory_failure() calls memory_failure_queue() from
> IRQ context to schedule memory_failure()'s work as it needs to sleep.
> Once the GHES machinery returns from the IRQ, it may return to user-space
> before memory_failure() runs.
>
> If the error that kicked all this off is specific to user-space, e.g. a
> load from corrupted memory, we may find ourselves taking the error
> again. If the user-space task is scheduled out, and memory_failure() runs,
> the same user-space task may be scheduled in on another CPU, which could
> also take the same error.
>
> These lead to exaggerated error counters, which may cause some threshold
> to be reached early.
>
> This can happen with any error that causes a Synchronous External Abort
> on arm64. I can't see why the same wouldn't happen with a machine-check
> handled firmware first on x86.
>
>
> This series adds a memory_failure_queue_kick() helper to
> memory-failure.c, and calls it as task-work before returning to
> user-space.
>
>
> Currently arm64 papers over this problem by ignoring ghes_notify_sea()'s
> return code as it knows there is still work to do. arm64 generates its
> own signal to user-space, which means the first task to discover an
> error will always be killed, even if the error was later handled.
> (which is no improvement on the no-RAS behaviour)
>
> As a final piece, arm64 can try to process the irq work queued by
> ghes_notify_sea() while it's still in the external abort handler. A successful
> return value here now means the memory_failure() work will be done before we
> return to user-space, so we no longer need to generate our own signal.
> This lets the original task survive the error if memory_failure() can
> recover the corrupted memory.
>
> Based on v5.6-rc2. I'm afraid it touches three different trees.
> $subject says ACPI as that is where the bulk of the diffstat is.
>
> This series may conflict in arm64 with a series from Mark Rutland to
> clean up the daif/PMR toggling.
>
>
> This would be v9 of these patches, but after a year I figure I should
> start the numbering again. I've dropped any collected tags.
>
> Known issues:
>  * arm64's apei_claim_sea() may unwittingly re-enable debug if it takes
>    an external-abort from debug context. Patch 3 makes this worse
>    instead of fixing it. The fix would make use of helpers from Mark R's
>    series.
>
>
> Thanks,
>
> James
>
>
> [0] https://lore.kernel.org/linux-arm-kernel/20190129184902.102850-1-james.morse@arm.com/
> [1] https://lore.kernel.org/linux-acpi/1506516620-20033-3-git-send-email-xiexiuqi@huawei.com/
>
> James Morse (3):
>   mm/memory-failure: Add memory_failure_queue_kick()
>   ACPI / APEI: Kick the memory_failure() queue for synchronous errors
>   arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work
>
>  arch/arm64/kernel/acpi.c | 25 +++++++++++++++
>  arch/arm64/mm/fault.c    | 12 ++++---
>  drivers/acpi/apei/ghes.c | 68 +++++++++++++++++++++++++++++++++-------
>  include/acpi/ghes.h      |  3 ++
>  include/linux/mm.h       |  1 +
>  mm/memory-failure.c      | 15 ++++++++-
>  6 files changed, 107 insertions(+), 17 deletions(-)
>
> --
> 2.24.1
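
The ordering problem described in the quoted cover letter can be
modelled in user-space as a queue that is filled from one context and
must be drained before control goes back to the faulting task. This is
only a sketch under stated assumptions: struct mf_queue_model, the
model_* functions and MODEL_QUEUE_LEN are hypothetical names, and the
real memory_failure_queue() and memory_failure_queue_kick() are not
reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_QUEUE_LEN 8 /* arbitrary size chosen for the sketch */

/* Hypothetical model of a per-CPU queue of poisoned page frames. */
struct mf_queue_model {
	unsigned long pending[MODEL_QUEUE_LEN]; /* queued pfns */
	size_t count;                           /* entries still waiting */
	size_t handled;                         /* entries processed */
};

/* Models queueing from IRQ context: record the pfn, do no real work. */
static bool model_queue(struct mf_queue_model *q, unsigned long pfn)
{
	assert(q);
	if (q->count == MODEL_QUEUE_LEN)
		return false;
	q->pending[q->count++] = pfn;
	return true;
}

/* Models the kick: drain everything now, in a context that may sleep. */
static void model_queue_kick(struct mf_queue_model *q)
{
	assert(q);
	while (q->count > 0) {
		q->count--;
		q->handled++; /* stands in for the memory_failure() call */
	}
}

/* Models the return to user-space: safe only once nothing is pending. */
static bool model_safe_to_return(const struct mf_queue_model *q)
{
	assert(q);
	return q->count == 0;
}
```

Without the kick, the model returns to the task while an entry is still
pending, which corresponds to the window in which the same error can be
taken again.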

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/3] arm64: acpi: Make apei_claim_sea() synchronise with APEI's irq work
  2020-02-28 17:48   ` James Morse
@ 2020-03-24 16:41     ` Catalin Marinas
  -1 siblings, 0 replies; 16+ messages in thread
From: Catalin Marinas @ 2020-03-24 16:41 UTC (permalink / raw)
  To: James Morse
  Cc: linux-mm, linux-acpi, linux-arm-kernel, Andrew Morton,
	Naoya Horiguchi, Rafael Wysocki, Len Brown, Tony Luck,
	Borislav Petkov, Will Deacon, Mark Rutland, Tyler Baicar,
	Xie XiuQi

On Fri, Feb 28, 2020 at 05:48:17PM +0000, James Morse wrote:
> APEI is unable to do all of its error handling work in nmi-context, so
> it defers non-fatal work onto the irq_work queue. arch_irq_work_raise()
> sends an IPI to the calling cpu, but this is not guaranteed to be taken
> before returning to user-space.
> 
> Unless the exception interrupted a context with irqs-masked,
> irq_work_run() can run immediately. Otherwise return -EINPROGRESS to
> indicate ghes_notify_sea() found some work to do, but it hasn't
> finished yet.
> 
> With this apei_claim_sea() returning '0' means this external-abort was
> also notification of a firmware-first RAS error, and that APEI has
> processed the CPER records.
> 
> Signed-off-by: James Morse <james.morse@arm.com>
> ---
> Changes since $last_year:
>  * Dropped all the tags ... it's been a year.

I think this patch hasn't changed much since, so my ack still stands.

Acked-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread
