From: Xie XiuQi <xiexiuqi@huawei.com>
To: <catalin.marinas@arm.com>, <will@kernel.org>,
	<james.morse@arm.com>, <rafael@kernel.org>, <tony.luck@intel.com>,
	<robert.moore@intel.com>, <bp@alien8.de>, <devel@acpica.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<linux-acpi@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Cc: <tanxiaofei@huawei.com>, <wangxiongfeng2@huawei.com>,
	<lvying6@huawei.com>, <naoya.horiguchi@nec.com>,
	<wangkefeng.wang@huawei.com>
Subject: [PATCH v3 3/4] arm64: ghes: handle the case when memory_failure recovery failed
Date: Tue, 6 Dec 2022 00:00:42 +0800
Message-ID: <20221205160043.57465-4-xiexiuqi@huawei.com>
In-Reply-To: <20221205160043.57465-1-xiexiuqi@huawei.com>

memory_failure() may not always recover successfully. In the synchronous
external data abort case, a failed memory_failure() recovery must be handled.

To make that possible, memory_failure_queue_kick() now returns the first
non-zero result of memory_failure() for the entries it drains. When that
result indicates failure, the common helper function
arch_apei_do_recovery_failed() is invoked. On arm64, it just sends a
SIGBUS.

Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
---
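For reviewers who want to see the return-value plumbing in isolation, here
is a minimal user-space model of it. It is only a sketch, not kernel code:
every model_* name is hypothetical, the queue is a plain ring buffer with
no locking, and model_recover() merely stands in for memory_failure(). It
shows the drain loop keeping the first non-zero result and the caller
invoking a failure hook, which plays the role of
arch_apei_do_recovery_failed().

```c
/* User-space model only. All model_* names are hypothetical, not kernel API. */
#include <assert.h>
#include <stddef.h>

#define MODEL_QUEUE_CAP 8

struct model_entry {
	unsigned long pfn;
	int soft_offline;	/* stands in for the MF_SOFT_OFFLINE flag */
};

struct model_queue {
	struct model_entry entries[MODEL_QUEUE_CAP];
	size_t head, len;
};

typedef int (*model_recover_fn)(unsigned long pfn);

/* Counts how often the failure hook ran (assumption: one hook call per kick). */
static int model_recovery_failed_calls;

static void model_queue_push(struct model_queue *q, unsigned long pfn,
			     int soft_offline)
{
	struct model_entry *slot;

	assert(q->len < MODEL_QUEUE_CAP);
	slot = &q->entries[(q->head + q->len) % MODEL_QUEUE_CAP];
	slot->pfn = pfn;
	slot->soft_offline = soft_offline;
	q->len++;
}

/* Returns 1 and fills *out if an entry was dequeued, 0 if the queue is empty. */
static int model_queue_pop(struct model_queue *q, struct model_entry *out)
{
	if (q->len == 0)
		return 0;
	*out = q->entries[q->head];
	q->head = (q->head + 1) % MODEL_QUEUE_CAP;
	q->len--;
	return 1;
}

/* Stand-in for memory_failure(): pretend odd pfns cannot be recovered. */
static int model_recover(unsigned long pfn)
{
	return (pfn & 1) ? -(int)pfn : 0;
}

/* Mirrors the drain loop: process every entry, remember the first failure. */
static int model_drain(struct model_queue *q, model_recover_fn recover)
{
	struct model_entry entry;
	int ret = 0, result;

	while (model_queue_pop(q, &entry)) {
		if (entry.soft_offline)
			continue;	/* the patch ignores the soft-offline result too */
		result = recover(entry.pfn);
		if (ret == 0 && result != 0)
			ret = result;
	}
	return ret;
}

/* Plays the role of arch_apei_do_recovery_failed(). */
static void model_recovery_failed(void)
{
	model_recovery_failed_calls++;
}

/* Mirrors the caller: drain the queue, run the hook if anything failed. */
static int model_kick(struct model_queue *q, model_recover_fn recover)
{
	int ret = model_drain(q, recover);

	if (ret)
		model_recovery_failed();
	return ret;
}
```

Queueing pfns 2, 3 and 5 and then calling model_kick() drains all three
entries, returns the result for pfn 3 (the first failure), and runs the
hook once. The later failure for pfn 5 is still processed, but its code is
dropped.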
 drivers/acpi/apei/ghes.c |  3 ++-
 include/linux/mm.h       |  2 +-
 mm/memory-failure.c      | 24 +++++++++++++++++-------
 3 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index ba0631c54c52..ddc4da603215 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -435,7 +435,8 @@ static void ghes_kick_task_work(struct callback_head *head)
 
 	estatus_node = container_of(head, struct ghes_estatus_node, task_work);
 	if (IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))
-		memory_failure_queue_kick(estatus_node->task_work_cpu);
+		if (memory_failure_queue_kick(estatus_node->task_work_cpu))
+			arch_apei_do_recovery_failed();
 
 	estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
 	node_len = GHES_ESTATUS_NODE_LEN(cper_estatus_len(estatus));
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 974ccca609d2..126d1395c208 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3290,7 +3290,7 @@ int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index,
 		      unsigned long count, int mf_flags);
 extern int memory_failure(unsigned long pfn, int flags);
 extern void memory_failure_queue(unsigned long pfn, int flags);
-extern void memory_failure_queue_kick(int cpu);
+extern int memory_failure_queue_kick(int cpu);
 extern int unpoison_memory(unsigned long pfn);
 extern int sysctl_memory_failure_early_kill;
 extern int sysctl_memory_failure_recovery;
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index bead6bccc7f2..b9398f67264a 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2240,12 +2240,12 @@ void memory_failure_queue(unsigned long pfn, int flags)
 }
 EXPORT_SYMBOL_GPL(memory_failure_queue);
 
-static void memory_failure_work_func(struct work_struct *work)
+static int __memory_failure_work_func(struct work_struct *work)
 {
 	struct memory_failure_cpu *mf_cpu;
 	struct memory_failure_entry entry = { 0, };
 	unsigned long proc_flags;
-	int gotten;
+	int gotten, ret = 0, result;
 
 	mf_cpu = container_of(work, struct memory_failure_cpu, work);
 	for (;;) {
@@ -2254,24 +2254,34 @@ static void memory_failure_work_func(struct work_struct *work)
 		spin_unlock_irqrestore(&mf_cpu->lock, proc_flags);
 		if (!gotten)
 			break;
-		if (entry.flags & MF_SOFT_OFFLINE)
+		if (entry.flags & MF_SOFT_OFFLINE) {
 			soft_offline_page(entry.pfn, entry.flags);
-		else
-			memory_failure(entry.pfn, entry.flags);
+		} else {
+			result = memory_failure(entry.pfn, entry.flags);
+			if (ret == 0 && result != 0)
+				ret = result;
+		}
 	}
+
+	return ret;
+}
+
+static void memory_failure_work_func(struct work_struct *work)
+{
+	__memory_failure_work_func(work);
 }
 
 /*
  * Process memory_failure work queued on the specified CPU.
  * Used to avoid return-to-userspace racing with the memory_failure workqueue.
  */
-void memory_failure_queue_kick(int cpu)
+int memory_failure_queue_kick(int cpu)
 {
 	struct memory_failure_cpu *mf_cpu;
 
 	mf_cpu = &per_cpu(memory_failure_cpu, cpu);
 	cancel_work_sync(&mf_cpu->work);
-	memory_failure_work_func(&mf_cpu->work);
+	return __memory_failure_work_func(&mf_cpu->work);
 }
 
 static int __init memory_failure_init(void)
-- 
2.20.1


Thread overview:
2022-12-05 16:00 [PATCH v3 0/4] arm64: improve handle synchronous External Data Abort Xie XiuQi
2022-12-05 16:00 ` [PATCH v3 1/4] ACPI: APEI: include missing acpi/apei.h Xie XiuQi
2022-12-05 16:00 ` [PATCH v3 2/4] arm64: ghes: fix error unhandling in synchronous External Data Abort Xie XiuQi
2022-12-05 16:00 ` [PATCH v3 3/4] arm64: ghes: handle the case when memory_failure recovery failed Xie XiuQi [this message]
2022-12-05 16:00 ` [PATCH v3 4/4] arm64: ghes: pass MF_ACTION_REQUIRED to memory_failure when sea Xie XiuQi
2022-12-10 13:35 ` [PATCH v3 0/4] arm64: improve handle synchronous External Data Abort Shuai Xue
