In-Reply-To: <20210420154730.GA577592@agluck-desk2.amr.corp.intel.com>
From: Jue Wang
Date: Tue, 20 Apr 2021 09:30:52 -0700
Subject: Re: [PATCH v1 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address
To: "Luck, Tony"
Cc: Naoya Horiguchi, Andrew Morton, Borislav Petkov,
    david@redhat.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    luto@kernel.org, HORIGUCHI NAOYA(堀口 直也), Oscar Salvador,
    yaoaili@kingsoft.com

On Tue, Apr 20, 2021 at 8:48 AM Luck, Tony wrote:
>
> On Mon, Apr 19, 2021 at 07:03:01PM -0700, Jue Wang wrote:
> > On Tue, 13 Apr 2021 07:43:20 +0900, Naoya Horiguchi wrote:
> >
> > > This patch suggests to do page table walk to find the error virtual
> > > address. If we find multiple virtual addresses in walking, we now can't
> > > determine which one is correct, so we fall back to sending SIGBUS in
> > > kill_me_maybe() without error info as we do now. This corner case needs
> > > to be solved in the future.
> >
> > Instead of walking the page tables, I wonder what about the following idea:
> >
> > When failing to get vaddr, memory_failure just ensures the mapping is removed
> > and an hwpoisoned swap pte is put in place; or the original page is flagged with
> > PG_HWPOISONED and kept in the radix tree (e.g., for SHMEM THP).
>
> To remove the mapping, you need to know the virtual address :-)

I meant in this case (racing to access the same poisoned pages), the
page mapping should have been removed by and the hwpoison swap pte
installed by the winner thread?

Other racing threads can rely on the subsequent #PFs to get the
correct SIGBUS with accurate vaddr semantics?
Or is the goal to "give back correct SIGBUS with accurate vaddr on
_the first MCE on ANY threads_"? I wonder if that goal is absolutely
necessary and can be relaxed a little to take into account subsequent
#PFs.

>
> Well, I did try a patch that removed *all* user mappings (switched CR3 to
> swapper_pgdir) and returned to user. Then have the resulting page fault
> report the address. But that didn't work very well.

Curious what didn't work well in this case? :-)

>
> >
> > NOTE: no SIGBUS is sent to user space.
> >
> > Then do_machine_check just returns to user space to resume execution, the
> > re-execution will result in a #PF and should land to the exact page fault
> > handling code that generates a SIGBUS with the precise vaddr info:
>
> That's how SRAO (and other races) are supposed to work.

Hmm, I wonder why it doesn't apply to this race.

>
> -Tony