Date: Tue, 6 Apr 2021 09:04:44 +0800
From: Aili Yao
To: HORIGUCHI NAOYA (堀口　直也), "Luck, Tony"
CC: Oscar Salvador, david@redhat.com, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, yangfeng1@kingsoft.com, sunhao2@kingsoft.com
Subject: Re: [PATCH v3] mm,hwpoison: return -EHWPOISON when page already poisoned
Message-ID: <20210406090444.2a69b9e2@alex-virtual-machine>
In-Reply-To: <20210405135017.GA6504@hori.linux.bs1.fc.nec.co.jp>

On Mon, 5 Apr 2021 13:50:18 +0000
HORIGUCHI NAOYA(堀口　直也) wrote:

> On Fri, Apr 02, 2021 at 03:11:20PM +0000, Luck, Tony wrote:
> > >> Combined with my "mutex" patch (to get rid of races where 2nd process returns
> > >> early, but first process is still looking for mappings to unmap and tasks
> > >> to signal) this patch moves forward a bit. But I think it needs an
> > >> additional change here in kill_me_maybe() to just "return" if there is an
> > >> EHWPOISON return from memory_failure()
> > >>
> > > Got this, Thanks for your reply!
> > > I will dig into this!
> >
> > One problem with this approach is when the first task to find poison
> > fails to complete actions. Then the poison pages are not unmapped,
> > and just returning from kill_me_maybe() gets into a loop :-(
>
> Yes, that's the pain point. We need to send SIGBUS to the current process in
> the "already hardware poisoned" case of memory_failure(). SIGBUS should
> contain the error virtual address, but unfortunately walking the page table
> or using p->mce_vaddr is not always reliable now.
>
> So as a second-best approach, we can extend the "walking page table"
> approach such that we walk over the whole virtual address space to make sure
> that the number of entries pointing to the error page is exactly 1.
> If that's the case, then we can confidently send SIGBUS with it. If we find
> multiple entries pointing to the error page, then we give up guessing and
> send a normal SIGBUS to the current process. That's not worse than now,
> and I think we need to wait in the hope that the virtual address will be
> available in the MCE handler.
>
> Anyway I'll try to write a patch for this.

Yeah, the previous patch didn't address the multiple virtual address issue.
If there is a way to fix that, that would be great!

-- 
Thanks!
    Aili Yao
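
For reference, the "exactly one entry" check described in the quoted text could be modeled in user space roughly as below. This is only an illustrative sketch under stated assumptions, not kernel code and not the patch under discussion: struct model_pte, enum lookup_result and find_unique_vaddr() are hypothetical names, and a flat array stands in for a real page-table walk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for one page-table entry (not a kernel type). */
struct model_pte {
    uintptr_t vaddr; /* page-aligned virtual address */
    uint64_t  pfn;   /* page frame number mapped at vaddr */
};

/* Hypothetical result codes for the lookup. */
enum lookup_result {
    LOOKUP_NONE,      /* no entry maps the poisoned frame */
    LOOKUP_UNIQUE,    /* exactly one entry: the address can be reported */
    LOOKUP_AMBIGUOUS  /* several entries: give up guessing */
};

/*
 * Walk every entry of the modeled address space and count the ones that
 * point at bad_pfn. *vaddr_out is written only when the count is exactly 1.
 */
static enum lookup_result
find_unique_vaddr(const struct model_pte *table, size_t n,
                  uint64_t bad_pfn, uintptr_t *vaddr_out)
{
    size_t hits = 0;
    uintptr_t first = 0;

    assert(table != NULL || n == 0);
    assert(vaddr_out != NULL);

    for (size_t i = 0; i < n; i++) {
        if (table[i].pfn != bad_pfn)
            continue;
        if (hits == 0)
            first = table[i].vaddr; /* remember the first match */
        hits++;
    }

    if (hits == 1) {
        *vaddr_out = first;
        return LOOKUP_UNIQUE;
    }
    return hits == 0 ? LOOKUP_NONE : LOOKUP_AMBIGUOUS;
}
```

In this model, LOOKUP_UNIQUE corresponds to sending SIGBUS with the address filled in, and LOOKUP_AMBIGUOUS to falling back to a SIGBUS without a trustworthy address. How the real walk and signal delivery would be done in memory_failure() is left open here.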