From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 25 Feb 2021 11:43:29 +0800
From: Aili Yao
To: Oscar Salvador
Subject: Re: [PATCH] mm,hwpoison: return -EBUSY when page already poisoned
Message-ID: <20210225114329.4e1a41c6@alex-virtual-machine>
In-Reply-To: <20210224103105.GA16368@linux>
References: <20210224151619.67c29731@alex-virtual-machine> <20210224103105.GA16368@linux>

On Wed, 24 Feb 2021 11:31:55 +0100
Oscar Salvador wrote:

> I have some questions:
>
> > 1.When LMCE is enabled, and there are two processes A && B running on
> > different core X && Y separately, which will access one same page, then
> > the page corrupted when process A access it, a MCE will be raised to
> > core X and the error process is just underway.
>
> When !LMCE, that is not a problem because new MCE needs to wait for the ongoing MCE?

I am not sure whether this case can happen when !LMCE. When I realized this
place might be an issue, I tried to reproduce it, and my configuration has
LMCE enabled.

> > 2.Then B access the page and trigger another MCE to core Y, it will also
> > do error process, it will see TestSetPageHWPoison be true, and 0 is
> > returned.
>
> For non-nested calls, that is no problem because the page will be taken out
> of business(unmapped from the processes), right? So no more MCE are possible.

Yes, I think that after the recovery job is finished, other processes that
still access the page will hit a page fault and an error will be returned.

> > 3.The kill_me_maybe will check the return:
> >
> > static void kill_me_maybe(struct callback_head *cb)
> > {
> >
> > 	if (!memory_failure(p->mce_addr >> PAGE_SHIFT, flags) &&
> > 	    !(p->mce_kflags & MCE_IN_KERNEL_COPYIN)) {
> > 		set_mce_nospec(p->mce_addr >> PAGE_SHIFT,
>
> So, IIUC, in case of a LMCE nested call, the second MCE will reach here.
> set_mce_nospec() will mark the underlying page as either not mapped or uncached.

Calling set_mce_nospec() is not appropriate while the recovery job is still
in flight. In my test this function failed.

> Should not have memory_failure()->hwpoison_user_mappings() unmapped the page
> from both process A and B? Or this is in case the ongoing MCE(process A) has
> not still unmapped anything, so process B can still access this page.

What I care about is the case where process B triggers the error again after
process A; I don't know how it returns and proceeds.

> So with your change, process B will be sent a SIGBUS, while process A is still
> handling the MCE, right?

Right!

> > 				p->mce_whole_page);
> > 		sync_core();
> > 		return;
> > 	}
> >
> > }
> >
> > 4. The error process for B will end, and may nothing happened if
> > kill-early is not set, We may let the wrong data go into effect.
> >
> > For other cases which care the return value of memory_failure() should
> > check why they want to process a memory error which have already been
> > processed. This behavior seems reasonable.
> >
> > In kill_me_maybe, log the fact about the memory may not recovered, and
> > we will kill the related process.
> >
> > Signed-off-by: Aili Yao
> > ---
> >  arch/x86/kernel/cpu/mce/core.c | 2 ++
> >  mm/memory-failure.c            | 4 ++--
> >  2 files changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> > index e133ce1e562b..db4afc5bf15a 100644
> > --- a/arch/x86/kernel/cpu/mce/core.c
> > +++ b/arch/x86/kernel/cpu/mce/core.c
> > @@ -1259,6 +1259,8 @@ static void kill_me_maybe(struct callback_head *cb)
> >  	}
> >
> >  	if (p->mce_vaddr != (void __user *)-1l) {
> > +		pr_err("Memory error may not recovered: %#lx: Sending SIGBUS to %s:%d due to hardware memory corruption\n",
> > +			p->mce_addr >> PAGE_SHIFT, p->comm, p->pid);
> >  		force_sig_mceerr(BUS_MCEERR_AR, p->mce_vaddr, PAGE_SHIFT);
> >  	} else {
> >  		pr_err("Memory error not recovered");
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index e9481632fcd1..06f006174b8c 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1224,7 +1224,7 @@ static int memory_failure_hugetlb(unsigned long pfn, int flags)
> >  	if (TestSetPageHWPoison(head)) {
> >  		pr_err("Memory failure: %#lx: already hardware poisoned\n",
> >  		       pfn);
> > -		return 0;
> > +		return -EBUSY;
>
> As David said, madvise_inject_error() will start returning -EBUSY now in case
> we madvise(MADV_HWPOISON) on an already hwpoisoned page.
>
> AFAICS, memory_failure() can return 0, -Eerrors, and MF_XXX.
> Would it make sense to unify that? That way we could declare error codes that
> make some sense (like MF_ALREADY_HWPOISONED).
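
To make the idea of unified codes a bit more concrete, here is a minimal
userspace sketch of how one result type could still be mapped onto the
0 / negative convention that callers check today. Only the name
MF_ALREADY_HWPOISONED comes from the suggestion above. The enum, its other
two values and the helper are hypothetical, and picking -EIO for the generic
failure is just an assumption for illustration; none of this is kernel code.

```c
#include <errno.h>

/*
 * Hypothetical unified result type. Only MF_ALREADY_HWPOISONED is a name
 * taken from this thread; the other identifiers are made up for the sketch.
 */
enum mf_sketch_result {
	MF_SKETCH_RECOVERED = 0,	/* error handled, page taken out of use */
	MF_SKETCH_FAILED,		/* handling was attempted but failed */
	MF_ALREADY_HWPOISONED,		/* someone else already poisoned the page */
};

/*
 * Map the sketch result onto "0 on success, negative errno otherwise",
 * so callers that only test for 0 or < 0 keep working.
 */
int mf_sketch_to_errno(enum mf_sketch_result res)
{
	switch (res) {
	case MF_SKETCH_RECOVERED:
		return 0;
	case MF_ALREADY_HWPOISONED:
		return -EBUSY;	/* same value the patch above returns */
	case MF_SKETCH_FAILED:
	default:
		return -EIO;	/* assumption: generic failure code */
	}
}
```

With such a mapping, a caller that needs the distinction could switch on the
result itself, while callers that only test for 0 or a negative value would
see the same behaviour as with the patch.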

@David:

I checked the code again and found a few places that care about the exact
return value:

1: drivers/base/memory.c:483:	ret = memory_failure(pfn, 0);

This is for hard page offline. I see this code in mcelog:

static void offline_action(struct mempage *mp, u64 addr)
{
	if (offline <= OFFLINE_ACCOUNT)
		return;
	Lprintf("Offlining page %llx\n", addr);
	if (memory_offline(addr) < 0) {
		Lprintf("Offlining page %llx failed: %s\n", addr, strerror(errno));
		mp->offlined = PAGE_OFFLINE_FAILED;
	} else
		mp->offlined = PAGE_OFFLINE;
}

I think returning a negative value would be more appropriate here? The related
killing function may not have been performed, so we can't call it a successful
operation.

2: mm/hwpoison-inject.c:51:	return memory_failure(pfn, 0);
   mm/madvise.c:910:		ret = memory_failure(pfn, MF_COUNT_INCREASED);

These two cases are mainly for error injection. I checked the test code, and
mostly it only cares whether the value is 0 or < 0. I ran the related tests;
normally they work well, but under stress testing I do sometimes hit failing
cases along with the -EBUSY return. I will dig more.

Other places only care whether the return value is 0, or just ignore it.

Hi naoya, what's your opinion on this possible issue? I need your input!

Thanks
Aili Yao
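
P.S. To make the nested case easier to discuss, below is a minimal userspace
sketch of the flow described in this thread: two reports for the same page,
where the second one sees the poison flag already set. All sketch_* names are
hypothetical stand-ins for TestSetPageHWPoison(), memory_failure() and the
check in kill_me_maybe(). It is single-threaded, assumes the first report
always recovers, and leaves out the MCE_IN_KERNEL_COPYIN condition, so it only
illustrates the return-value logic and not the real race.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define SKETCH_NR_PAGES 16

/* One "poisoned" flag per page frame; a stand-in for the PG_hwpoison bit. */
static bool sketch_poisoned[SKETCH_NR_PAGES];

/* Stand-in for TestSetPageHWPoison(): set the flag, return its old value. */
bool sketch_test_set_poison(unsigned long pfn)
{
	bool old;

	assert(pfn < SKETCH_NR_PAGES);
	old = sketch_poisoned[pfn];
	sketch_poisoned[pfn] = true;
	return old;
}

/*
 * Stand-in for memory_failure() with the proposed change applied:
 * a page that is already poisoned yields -EBUSY instead of 0.
 */
int sketch_memory_failure(unsigned long pfn)
{
	if (sketch_test_set_poison(pfn))
		return -EBUSY;	/* another report got here first */
	return 0;		/* assumption: recovery succeeded */
}

/*
 * Stand-in for the decision in kill_me_maybe(): a 0 return means the
 * task may continue, anything else means it would be sent SIGBUS.
 */
bool sketch_task_gets_sigbus(unsigned long pfn)
{
	if (!sketch_memory_failure(pfn))
		return false;	/* recovered, no signal */
	return true;		/* not recovered (or already poisoned) */
}
```

Calling sketch_task_gets_sigbus() twice for the same pfn models process A and
then process B: the first call returns false, the second returns true. With
the old "return 0" behaviour the second call would also have returned false,
which is the situation where B might carry on with wrong data.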