Subject: Re: [PATCH v2] mm: clean up hwpoison page cache page in fault path
From: Rik van Riel
To: Oscar Salvador
Cc: linux-kernel@vger.kernel.org, kernel-team@fb.com, linux-mm@kvack.org,
    Miaohe Lin, Andrew Morton, Mel Gorman, Johannes Weiner, Matthew Wilcox
Date: Tue, 15 Feb 2022 10:04:57 -0500
References: <20220212213740.423efcea@imladris.surriel.com>

On Tue, 2022-02-15 at 13:51 +0100, Oscar Salvador wrote:
> On Sat, Feb 12, 2022 at 09:37:40PM -0500, Rik van Riel wrote:
> > Sometimes the page offlining code can leave behind a hwpoisoned clean
> > page cache page. This can lead to programs being killed over and over
> > and over again as they fault in the hwpoisoned page, get killed, and
> > then get re-spawned by whatever wanted to run them.
>
> Hi Rik,
>
> Do you know how that exactly happens? We should not be really leaving
> anything behind, and soft-offline (not hard) code works with the premise
> of only poisoning a page in case it was contained, so I am wondering
> what is going on here.
>
> In-use pagecache pages are migrated away, and the actual page is
> contained, and for clean ones, we already do the invalidate_inode_page()
> and then contain it in case we succeed.

I do not know the exact failure case, since I have never caught
a system in the act of leaking one of these pages. I just know
I have seen this issue on systems where the "soft_offline: %#lx:
invalidated\n" printk was the only offline method leaving any
message in the kernel log.

However, there are a few code paths through the soft offlining
code path that don't seem to have any printks, so I am not sure
exactly where things went wrong. I only really found the aftermath,
and tested this patch by loading it as a kernel live patch module
on some of those systems.

-- 
All Rights Reversed.
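
For anyone wanting to check their own systems for the same symptom, here is a minimal sketch of scanning a saved kernel log for the "soft_offline: %#lx: invalidated" message quoted above. The function name `invalidated_pfns` and the idea of feeding it captured log text are assumptions for illustration only; nothing like this is part of the patch, and finding the message does not by itself show that a page was leaked.

```python
import re

# Matches the printk format quoted in the mail:
#   "soft_offline: %#lx: invalidated"
# %#lx prints the PFN in hex with a 0x prefix (a zero value prints as "0").
INVALIDATED_RE = re.compile(
    r"soft_offline: ((?:0x)?[0-9a-fA-F]+): invalidated"
)


def invalidated_pfns(log_text):
    """Return the PFNs (as ints) of all 'invalidated' soft-offline messages.

    log_text is assumed to be kernel log output captured by the caller,
    for example the saved output of dmesg.
    """
    return [int(m.group(1), 16) for m in INVALIDATED_RE.finditer(log_text)]
```

The PFNs it returns are only a starting point for looking at which pages went through the invalidate path on a given machine.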