From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 24 Feb 2021 15:16:19 +0800
From: Aili Yao
Subject: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned
Message-ID: <20210224151619.67c29731@alex-virtual-machine>
Sender: owner-linux-mm@kvack.org

When a page is already poisoned, another memory_failure() call on the
same page currently returns 0, meaning OK. For nested memory MCE
handling, this behavior can lead to a serious problem. Example:

1. LMCE is enabled, and two processes A and B run on different cores X
   and Y respectively and access the same page. The page becomes
   corrupted; when process A accesses it, an MCE is raised on core X and
   error handling gets underway.

2. B then accesses the page and triggers another MCE on core Y. Core Y
   also starts error handling, sees TestSetPageHWPoison() return true,
   and memory_failure() returns 0.

3. kill_me_maybe() checks the return value:

	static void kill_me_maybe(struct callback_head *cb)
	{
		...
		if (!memory_failure(p->mce_addr >> PAGE_SHIFT, flags) &&
		    !(p->mce_kflags & MCE_IN_KERNEL_COPYIN)) {
			set_mce_nospec(p->mce_addr >> PAGE_SHIFT, p->mce_whole_page);
			sync_core();
			return;
		}
		...
	}

4. Error handling for B ends, and nothing may happen if kill-early is
   not set. The wrong data may then go into effect.

Other callers that care about the return value of memory_failure()
should check why they want to process a memory error that has already
been processed, so returning an error in this case seems reasonable.

In kill_me_maybe(), log the fact that the memory error may not have been
recovered, and kill the related process.

Signed-off-by: Aili Yao
---
 arch/x86/kernel/cpu/mce/core.c | 2 ++
 mm/memory-failure.c            | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index e133ce1e562b..db4afc5bf15a 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1259,6 +1259,8 @@ static void kill_me_maybe(struct callback_head *cb)
 	}
 
 	if (p->mce_vaddr != (void __user *)-1l) {
+		pr_err("Memory error may not recovered: %#lx: Sending SIGBUS to %s:%d due to hardware memory corruption\n",
+			p->mce_addr >> PAGE_SHIFT, p->comm, p->pid);
 		force_sig_mceerr(BUS_MCEERR_AR, p->mce_vaddr, PAGE_SHIFT);
 	} else {
 		pr_err("Memory error not recovered");
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index e9481632fcd1..06f006174b8c 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1224,7 +1224,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
 	if (TestSetPageHWPoison(head)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 		       pfn);
-		return 0;
+		return -EBUSY;
 	}
 
 	num_poisoned_pages_inc();
@@ -1420,7 +1420,7 @@ int memory_failure(unsigned long pfn, int flags)
 	if (TestSetPageHWPoison(p)) {
 		pr_err("Memory failure: %#lx: already hardware poisoned\n",
 			pfn);
-		return 0;
+		return -EBUSY;
 	}
 
 	orig_head = hpage = compound_head(p);
-- 
2.25.1
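
For illustration only, the race described in the commit message can be
modelled in user space. This is a minimal sketch, not kernel code: every
sketch_* name is hypothetical, only the poison flag of a page is
modelled, the first report is assumed to recover successfully, and the
MCE_IN_KERNEL_COPYIN and mce_vaddr checks in kill_me_maybe() are left
out. The busy_ret parameter stands for the value returned for an
already-poisoned page: 0 before this patch, -EBUSY after it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; only the poison flag is modelled. */
struct sketch_page {
	bool hwpoison;
};

/* Models TestSetPageHWPoison(): set the flag and return its previous value. */
static bool sketch_test_set_poison(struct sketch_page *pg)
{
	bool was_set = pg->hwpoison;

	pg->hwpoison = true;
	return was_set;
}

/*
 * Models only the early exit of memory_failure(). busy_ret is what an
 * already-poisoned page yields: 0 before the patch, -EBUSY after it.
 */
static int sketch_memory_failure(struct sketch_page *pg, int busy_ret)
{
	assert(pg != NULL);

	if (sketch_test_set_poison(pg))
		return busy_ret;

	return 0;	/* assumption: the first report is handled successfully */
}

/*
 * Models the decision in kill_me_maybe(): returns true when the task
 * would receive SIGBUS, false when it would simply resume.
 */
static bool sketch_task_killed(struct sketch_page *pg, int busy_ret)
{
	if (!sketch_memory_failure(pg, busy_ret))
		return false;	/* treated as recovered, task resumes */

	return true;		/* not recovered, task is killed */
}
```

Calling sketch_task_killed() twice on the same page models processes A
and B. With busy_ret = 0 the second caller resumes as if the error had
been recovered; with busy_ret = -EBUSY the second caller is killed.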