From: Rik van Riel
To: Andrew Morton
Cc: linux-kernel@vger.kernel.org, kernel-team@fb.com, linux-mm@kvack.org, Miaohe Lin, Mel Gorman, Johannes Weiner, Matthew Wilcox, Naoya Horiguchi
Date: Mon, 14 Feb 2022 20:37:26 -0500
Subject: Re: [PATCH v2] mm: clean up hwpoison page cache page in fault path
Message-ID: <6f70cc26ccc92d099f1080e4c57ab44709bafd68.camel@surriel.com>
In-Reply-To: <20220214152407.67e0d7dd1a532252c9dd203e@linux-foundation.org>
References: <20220212213740.423efcea@imladris.surriel.com> <20220214152407.67e0d7dd1a532252c9dd203e@linux-foundation.org>

On Mon, 2022-02-14 at 15:24 -0800, Andrew Morton wrote:
>
> > Subject: [PATCH v2] mm: clean up hwpoison page cache page in fault path
>
> At first scan I thought this was a code cleanup.
>
> I think I'll do s/clean up/invalidate/.
>

OK, that sounds good.

> On Sat, 12 Feb 2022 21:37:40 -0500 Rik van Riel wrote:
>
> > Sometimes the page offlining code can leave behind a hwpoisoned
> > clean page cache page.
>
> Is this correct behaviour?

It is not desirable, and the soft page offlining code tries to
invalidate the page, but I don't think overhauling the way we lock
and refcount page cache pages just to make offlining them more
reliable would be worthwhile, when we already have a branch in
the page fault handler to deal with these pages, anyway.

> > This can lead to programs being killed over and over and over
> > again as they fault in the hwpoisoned page, get killed, and then
> > get re-spawned by whatever wanted to run them.
> >
> > This is particularly embarrassing when the page was offlined due to
> > having too many corrected memory errors. Now we are killing tasks
> > due to them trying to access memory that probably isn't even
> > corrupted.
> >
> > This problem can be avoided by invalidating the page from the page
> > fault handler, which already has a branch for dealing with these
> > kinds of pages. With this patch we simply pretend the page fault
> > was successful if the page was invalidated, return to userspace,
> > incur another page fault, read in the file from disk (to a new
> > memory page), and then everything works again.
>
> Is this worth a cc:stable?

Maybe. I don't know how far back this issue goes...

-- 
All Rights Reversed.
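The retry behaviour described in the quoted patch description (invalidate the clean poisoned page, report the fault as handled, fault again, and read the data back from disk into a fresh page) can be illustrated with a toy userspace model. This is a sketch in Python, not kernel code: PageCache, handle_fault, access, make_cache and the FAULT_* constants are hypothetical names invented for illustration, only loosely inspired by the kernel's VM_FAULT_* return codes, and the "disk" is just a dict.

```python
from dataclasses import dataclass, field

# Hypothetical fault results for this toy model only; loosely inspired by
# the kernel's VM_FAULT_* codes, not their real values or semantics.
FAULT_OK = "ok"              # page mapped, the access proceeds
FAULT_HWPOISON = "hwpoison"  # task is killed (SIGBUS in the real kernel)
FAULT_RETRY = "retry"        # report success; the access simply faults again


@dataclass
class Page:
    data: bytes
    hwpoison: bool = False
    dirty: bool = False


@dataclass
class PageCache:
    disk: dict                                 # file offset -> data on "disk"
    cache: dict = field(default_factory=dict)  # file offset -> cached Page
    disk_reads: int = 0

    def lookup_or_read(self, offset):
        page = self.cache.get(offset)
        if page is None:
            # Cache miss: read the file contents into a brand new page.
            self.disk_reads += 1
            page = Page(self.disk[offset])
            self.cache[offset] = page
        return page

    def invalidate(self, offset):
        # Only a clean page may be dropped: its data still exists on disk.
        if self.cache[offset].dirty:
            return False
        del self.cache[offset]
        return True


def handle_fault(pc, offset, invalidate_poisoned):
    page = pc.lookup_or_read(offset)
    if page.hwpoison:
        # With the change: drop a clean poisoned page and pretend success.
        if invalidate_poisoned and pc.invalidate(offset):
            return FAULT_RETRY
        return FAULT_HWPOISON
    return FAULT_OK


def access(pc, offset, invalidate_poisoned, max_faults=3):
    """Fault until the access succeeds or the task is killed."""
    results = []
    for _ in range(max_faults):
        result = handle_fault(pc, offset, invalidate_poisoned)
        results.append(result)
        if result != FAULT_RETRY:
            break
    return results


def make_cache():
    pc = PageCache(disk={0: b"hello"})
    pc.lookup_or_read(0)         # the file page is cached
    pc.cache[0].hwpoison = True  # offlining left a poisoned clean page behind
    return pc


old = make_cache()
print(access(old, 0, invalidate_poisoned=False))  # ['hwpoison']
print(access(old, 0, invalidate_poisoned=False))  # ['hwpoison'] (re-spawned task dies too)

new = make_cache()
print(access(new, 0, invalidate_poisoned=True))   # ['retry', 'ok']
print(new.disk_reads)                             # 2
```

Without invalidation every new instance of the task hits the same poisoned page and is killed; with it, the first fault drops the page and the retried fault reads the data into a new page. In this model a dirty poisoned page cannot be dropped, so that case still ends in FAULT_HWPOISON, mirroring the description's focus on clean page cache pages.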