From: Lv Ying
Subject: [RFC PATCH v1 2/2] ACPI: APEI: fix reboot caused by synchronous error loop because of memory_failure() failed
Date: Wed, 7 Dec 2022 17:39:35 +0800
Message-ID: <20221207093935.1972530-3-lvying6@huawei.com>
In-Reply-To: <20221207093935.1972530-1-lvying6@huawei.com>
References: <20221207093935.1972530-1-lvying6@huawei.com>

When a synchronous error is detected as a result of user space accessing
a corrupt memory location, the CPU may take an abort. On arm64 this is a
'synchronous external abort', which can be notified by SEA.

If memory_failure() fails, returning to user space triggers the SEA
again. Such a loop may cause platform firmware to exceed some threshold
and reboot, even though Linux could have recovered from the error.

Not every memory_failure() failure causes a reboot: the
VM_FAULT_HWPOISON[_LARGE] handling in the arm64 page fault path sends
SIGBUS to the accessing process, which terminates the loop.

However, if the process maps the faulty page and memory_failure()
returns abnormally before try_to_unmap(), for example because the faulty
page is a KSM page, the page remains mapped. In this case arm64 cannot
rely on the page fault path to terminate the loop.

Check the result of memory_failure() in task_work before returning to
user space. If memory_failure() failed, send SIGBUS to the current
process to avoid the SEA loop.

Signed-off-by: Lv Ying
---
 mm/memory-failure.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3b6ac3694b8d..07ec7b62f330 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2255,7 +2255,7 @@ static void __memory_failure_work_func(struct work_struct *work, bool sync)
 	struct memory_failure_cpu *mf_cpu;
 	struct memory_failure_entry entry = { 0, };
 	unsigned long proc_flags;
-	int gotten;
+	int gotten, ret;
 
 	mf_cpu = container_of(work, struct memory_failure_cpu, work);
 	for (;;) {
@@ -2266,7 +2266,16 @@ static void __memory_failure_work_func(struct work_struct *work, bool sync)
 			break;
 		if (entry.flags & MF_SOFT_OFFLINE)
 			soft_offline_page(entry.pfn, entry.flags);
-		else if (!sync || (entry.flags & MF_ACTION_REQUIRED))
+		else if (sync) {
+			if (entry.flags & MF_ACTION_REQUIRED) {
+				ret = memory_failure(entry.pfn, entry.flags);
+				if (ret == -EHWPOISON || ret == -EOPNOTSUPP)
+					return;
+
+				pr_err("Memory error not recovered");
+				force_sig(SIGBUS);
+			}
+		} else
 			memory_failure(entry.pfn, entry.flags);
 	}
 }
-- 
2.36.1
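
For readers who want to follow the branch structure without the kernel tree at hand, the per-entry decision in the hunk above can be modelled in user space. This is only a sketch of the control flow as posted, not kernel code: sketch_classify(), enum sketch_outcome and the SKETCH_* flag bits are hypothetical names invented for this illustration, and the flag values are not the kernel's. Note that, as the hunk is written, any return value other than -EHWPOISON or -EOPNOTSUPP, including 0, reaches force_sig(SIGBUS).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* Linux value; only used if the libc does not define it */
#endif

/* Illustrative flag bits only; the kernel's real values are in enum mf_flags. */
#define SKETCH_MF_ACTION_REQUIRED	(1u << 0)
#define SKETCH_MF_SOFT_OFFLINE		(1u << 1)

/* What the patched loop does with one queued entry (hypothetical names). */
enum sketch_outcome {
	SKETCH_SOFT_OFFLINE,	/* soft_offline_page() is called */
	SKETCH_SKIPPED,		/* sync path without MF_ACTION_REQUIRED: nothing is called */
	SKETCH_HANDLED_ASYNC,	/* memory_failure() is called, result ignored */
	SKETCH_STOP,		/* sync path returns early, no signal is sent */
	SKETCH_SIGBUS,		/* sync path logs an error and sends SIGBUS */
};

/*
 * Model of the branch structure in the patched __memory_failure_work_func().
 * mf_ret stands in for the value memory_failure() would return; it is only
 * consulted on the sync + MF_ACTION_REQUIRED path, as in the patch.
 */
static enum sketch_outcome sketch_classify(unsigned int flags, bool sync, int mf_ret)
{
	/* Assumption of this sketch: 0 on success, negative errno on failure. */
	assert(mf_ret <= 0);

	if (flags & SKETCH_MF_SOFT_OFFLINE)
		return SKETCH_SOFT_OFFLINE;
	if (!sync)
		return SKETCH_HANDLED_ASYNC;
	if (!(flags & SKETCH_MF_ACTION_REQUIRED))
		return SKETCH_SKIPPED;
	if (mf_ret == -EHWPOISON || mf_ret == -EOPNOTSUPP)
		return SKETCH_STOP;
	return SKETCH_SIGBUS;
}
```

Calling sketch_classify(SKETCH_MF_ACTION_REQUIRED, true, -EBUSY) yields SKETCH_SIGBUS, which corresponds to the case the commit message describes: memory_failure() bailed out before unmapping the page, so the task is signalled instead of being allowed to re-trigger the SEA.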