Subject: Re: [PATCH] mm: clean up hwpoison page cache page in fault path
From: Miaohe Lin
To: Rik van Riel
CC: Andrew Morton, Mel Gorman, Johannes Weiner, Matthew Wilcox, linux-kernel
Date: Sat, 12 Feb 2022 11:10:47 +0800
In-Reply-To: <20220211170557.7964a301@imladris.surriel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2022/2/12 6:05, Rik van Riel wrote:
> Sometimes the page offlining code can leave behind a hwpoisoned clean
> page cache page. This can lead to programs being killed over and over

Yep, __soft_offline_page tries invalidate_inode_page first as a lightweight approach.

> and over again as they fault in the hwpoisoned page, get killed, and
> then get re-spawned by whatever wanted to run them.
>
> This is particularly embarrassing when the page was offlined due to
> having too many corrected memory errors. Now we are killing tasks
> due to them trying to access memory that probably isn't even corrupted.
>
> This problem can be avoided by invalidating the page from the page
> fault handler, which already has a branch for dealing with these
> kinds of pages. With this patch we simply pretend the page fault
> was successful if the page was invalidated, return to userspace,
> incur another page fault, read in the file from disk (to a new
> memory page), and then everything works again.
>
> Signed-off-by: Rik van Riel

Good catch! This looks good to me. Thanks.

Reviewed-by: Miaohe Lin

>
> diff --git a/mm/memory.c b/mm/memory.c
> index c125c4969913..2300358e268c 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3871,11 +3871,16 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
>  		return ret;
>
>  	if (unlikely(PageHWPoison(vmf->page))) {
> -		if (ret & VM_FAULT_LOCKED)
> +		int poisonret = VM_FAULT_HWPOISON;
> +		if (ret & VM_FAULT_LOCKED) {
> +			/* Retry if a clean page was removed from the cache. */
> +			if (invalidate_inode_page(vmf->page))
> +				poisonret = 0;
>  			unlock_page(vmf->page);
> +		}
>  		put_page(vmf->page);
>  		vmf->page = NULL;
> -		return VM_FAULT_HWPOISON;
> +		return poisonret;
>  	}
>
>  	if (unlikely(!(ret & VM_FAULT_LOCKED)))
>
>
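For readers following the decision the quoted patch adds to the hwpoison branch of __do_fault(), here is a minimal userspace model of it. This is a sketch, not kernel code: struct model_page, model_invalidate() and model_hwpoison_branch() are hypothetical names, the fault-code values are illustrative stand-ins, and model_invalidate() keeps only the dirty, writeback and mapped checks of invalidate_inode_page() while ignoring the real page-cache removal and locking.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the kernel's fault codes; the numeric
 * values are illustrative only.
 */
typedef unsigned int vm_fault_t;
#define VM_FAULT_HWPOISON 0x0010u
#define VM_FAULT_LOCKED   0x0200u

/* Hypothetical userspace model of a page-cache page. */
struct model_page {
	bool hwpoison;  /* what PageHWPoison() would report */
	bool dirty;     /* dirty data cannot simply be dropped */
	bool writeback; /* page is being written back */
	bool mapped;    /* page is mapped into some address space */
	bool in_cache;  /* page is still in the page cache */
};

/*
 * Simplified model of invalidate_inode_page(): only a clean page that
 * is not under writeback and not mapped can be dropped. Returns 1 if
 * the page was removed from the cache, 0 otherwise.
 */
static int model_invalidate(struct model_page *p)
{
	if (p->dirty || p->writeback || p->mapped)
		return 0;
	p->in_cache = false;
	return 1;
}

/*
 * Mirrors the hwpoison branch of __do_fault() as changed by the quoted
 * patch. 'ret' is what the vma's ->fault handler returned.
 */
static vm_fault_t model_hwpoison_branch(struct model_page *p, vm_fault_t ret)
{
	vm_fault_t poisonret = VM_FAULT_HWPOISON;

	assert(p->hwpoison); /* this branch is only reached for poisoned pages */

	if (ret & VM_FAULT_LOCKED) {
		/* Retry if a clean page was removed from the cache. */
		if (model_invalidate(p))
			poisonret = 0;
		/* the real code calls unlock_page() here */
	}
	/* the real code then calls put_page() and clears vmf->page */
	return poisonret;
}
```

Under these assumptions, a clean, unmapped poisoned page handed back with VM_FAULT_LOCKED yields 0 and leaves the cache, so the next fault would read a fresh copy from disk; a dirty page, or one returned without VM_FAULT_LOCKED, still yields VM_FAULT_HWPOISON, as before the patch.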