From: James Morse
To: linux-mm@kvack.org, linux-acpi@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org
Cc: Andrew Morton, Naoya Horiguchi, Rafael Wysocki, Len Brown, Tony Luck,
	Borislav Petkov, Catalin Marinas, Will Deacon, Mark Rutland,
	Tyler Baicar, Xie XiuQi, James Morse
Subject: [PATCH 2/3] ACPI / APEI: Kick the memory_failure() queue for synchronous errors
Date: Fri, 28 Feb 2020 17:48:16 +0000
Message-Id: <20200228174817.74278-3-james.morse@arm.com>
In-Reply-To: <20200228174817.74278-1-james.morse@arm.com>
References: <20200228174817.74278-1-james.morse@arm.com>

memory_failure() offlines or repairs pages of memory that have been
discovered to be corrupt. These may be detected by an external
component (e.g. the memory controller) and notified via an IRQ.
In this case the work is queued, as not all of memory_failure()'s work
can happen in IRQ context.

If the error was detected as a result of user-space accessing a
corrupt memory location, the CPU may take an abort instead. On arm64
this is a 'synchronous external abort', and on a firmware-first system
it is replayed using NOTIFY_SEA.

This notification has NMI-like properties (it can interrupt IRQ-masked
code), so the memory_failure() work is queued. If we return to
user-space before the queued memory_failure() work is processed, we
will take the fault again.
This loop may cause platform firmware to exceed some threshold and
reboot when Linux could have recovered from this error.

For NMI-like notifications keep track of whether memory_failure() work
was queued, and make task_work pending to flush out the queue.
To save memory allocations, the task_work is allocated as part of the
ghes_estatus_node, and free()ing it back to the pool is deferred.

Signed-off-by: James Morse
---
 drivers/acpi/apei/ghes.c | 68 +++++++++++++++++++++++++++++++++-------
 include/acpi/ghes.h      |  3 ++
 2 files changed, 60 insertions(+), 11 deletions(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 103acbbfcf9a..c91c9ec55750 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -40,6 +40,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -414,23 +415,47 @@ static void ghes_clear_estatus(struct ghes *ghes,
 		ghes_ack_error(ghes->generic_v2);
 }
 
-static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int sev)
+/*
+ * Called as task_work before returning to user-space.
+ * Ensure any queued work has been done before we return to the context that
+ * triggered the notification.
+ */
+static void ghes_kick_task_work(struct callback_head *head)
+{
+	struct acpi_hest_generic_status *estatus;
+	struct ghes_estatus_node *estatus_node;
+	u32 node_len;
+
+	estatus_node = container_of(head, struct ghes_estatus_node, task_work);
+	if (IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))
+		memory_failure_queue_kick(estatus_node->task_work_cpu);
+
+	estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
+	node_len = GHES_ESTATUS_NODE_LEN(cper_estatus_len(estatus));
+	gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node, node_len);
+}
+
+static bool ghes_handle_memory_failure(struct ghes *ghes,
+				       struct acpi_hest_generic_data *gdata,
+				       int sev)
 {
-#ifdef CONFIG_ACPI_APEI_MEMORY_FAILURE
 	unsigned long pfn;
 	int flags = -1;
 	int sec_sev = ghes_severity(gdata->error_severity);
 	struct cper_sec_mem_err *mem_err = acpi_hest_get_payload(gdata);
 
+	if (!IS_ENABLED(CONFIG_ACPI_APEI_MEMORY_FAILURE))
+		return false;
+
 	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
-		return;
+		return false;
 
 	pfn = mem_err->physical_addr >> PAGE_SHIFT;
 	if (!pfn_valid(pfn)) {
 		pr_warn_ratelimited(FW_WARN GHES_PFX
 		"Invalid address in generic error data: %#llx\n",
 		mem_err->physical_addr);
-		return;
+		return false;
 	}
 
 	/* iff following two events can be handled properly by now */
@@ -440,9 +465,12 @@ static void ghes_handle_memory_failure(struct acpi_hest_generic_data *gdata, int
 	if (sev == GHES_SEV_RECOVERABLE && sec_sev == GHES_SEV_RECOVERABLE)
 		flags = 0;
 
-	if (flags != -1)
+	if (flags != -1) {
 		memory_failure_queue(pfn, flags);
-#endif
+		return true;
+	}
+
+	return false;
 }
 
 /*
@@ -490,7 +518,7 @@ static void ghes_handle_aer(struct acpi_hest_generic_data *gdata)
 #endif
 }
 
-static void ghes_do_proc(struct ghes *ghes,
+static bool ghes_do_proc(struct ghes *ghes,
 			 const struct acpi_hest_generic_status *estatus)
 {
 	int sev, sec_sev;
@@ -498,6 +526,7 @@ static void ghes_do_proc(struct ghes *ghes,
 	guid_t *sec_type;
 	const guid_t *fru_id = &guid_null;
 	char *fru_text = "";
+	bool queued = false;
 
 	sev = ghes_severity(estatus->error_severity);
 	apei_estatus_for_each_section(estatus, gdata) {
@@ -515,7 +544,7 @@ static void ghes_do_proc(struct ghes *ghes,
 			ghes_edac_report_mem_error(sev, mem_err);
 
 			arch_apei_report_mem_error(sev, mem_err);
-			ghes_handle_memory_failure(gdata, sev);
+			queued = ghes_handle_memory_failure(ghes, gdata, sev);
 		}
 		else if (guid_equal(sec_type, &CPER_SEC_PCIE)) {
 			ghes_handle_aer(gdata);
@@ -532,6 +561,8 @@ static void ghes_do_proc(struct ghes *ghes,
 					       gdata->error_data_length);
 		}
 	}
+
+	return queued;
 }
 
 static void __ghes_print_estatus(const char *pfx,
@@ -827,7 +858,9 @@ static void ghes_proc_in_irq(struct irq_work *irq_work)
 	struct ghes_estatus_node *estatus_node;
 	struct acpi_hest_generic *generic;
 	struct acpi_hest_generic_status *estatus;
+	bool task_work_pending;
 	u32 len, node_len;
+	int ret;
 
 	llnode = llist_del_all(&ghes_estatus_llist);
 	/*
@@ -842,14 +875,26 @@ static void ghes_proc_in_irq(struct irq_work *irq_work)
 		estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
 		len = cper_estatus_len(estatus);
 		node_len = GHES_ESTATUS_NODE_LEN(len);
-		ghes_do_proc(estatus_node->ghes, estatus);
+		task_work_pending = ghes_do_proc(estatus_node->ghes, estatus);
 		if (!ghes_estatus_cached(estatus)) {
 			generic = estatus_node->generic;
 			if (ghes_print_estatus(NULL, generic, estatus))
 				ghes_estatus_cache_add(generic, estatus);
 		}
-		gen_pool_free(ghes_estatus_pool, (unsigned long)estatus_node,
-			      node_len);
+
+		if (task_work_pending && current->mm != &init_mm) {
+			estatus_node->task_work.func = ghes_kick_task_work;
+			estatus_node->task_work_cpu = smp_processor_id();
+			ret = task_work_add(current, &estatus_node->task_work,
+					    true);
+			if (ret)
+				estatus_node->task_work.func = NULL;
+		}
+
+		if (!estatus_node->task_work.func)
+			gen_pool_free(ghes_estatus_pool,
+				      (unsigned long)estatus_node, node_len);
+
 		llnode = next;
 	}
 }
@@ -909,6 +954,7 @@ static int ghes_in_nmi_queue_one_entry(struct ghes *ghes,
 
 	estatus_node->ghes = ghes;
 	estatus_node->generic = ghes->generic;
+	estatus_node->task_work.func = NULL;
 	estatus = GHES_ESTATUS_FROM_NODE(estatus_node);
 
 	if (__ghes_read_estatus(estatus, buf_paddr, fixmap_idx, len)) {
diff --git a/include/acpi/ghes.h b/include/acpi/ghes.h
index e3f1cddb4ac8..517a5231cc1b 100644
--- a/include/acpi/ghes.h
+++ b/include/acpi/ghes.h
@@ -33,6 +33,9 @@ struct ghes_estatus_node {
 	struct llist_node llnode;
 	struct acpi_hest_generic *generic;
 	struct ghes *ghes;
+
+	int task_work_cpu;
+	struct callback_head task_work;
 };
 
 struct ghes_estatus_cache {
-- 
2.24.1