Date: Mon, 16 Aug 2021 10:56:46 -0700
From: "Luck, Tony"
To: Naoya Horiguchi
Cc: HORIGUCHI NAOYA(堀口 直也), Naoya Horiguchi, Oscar Salvador,
 Muchun Song, Mike Kravetz, "linux-mm@kvack.org", Andrew Morton,
 Michal Hocko, "linux-kernel@vger.kernel.org"
Subject: Re: [PATCH v6 1/2] mm,hwpoison: fix race with hugetlb page allocation
Message-ID: <20210816175646.GA1600630@agluck-desk2.amr.corp.intel.com>
References: <20210603233632.2964832-1-nao.horiguchi@gmail.com>
 <20210603233632.2964832-2-nao.horiguchi@gmail.com>
 <20210812042813.GA1576603@agluck-desk2.amr.corp.intel.com>
 <20210812090303.GA153531@hori.linux.bs1.fc.nec.co.jp>
 <20210812152548.GA1579021@agluck-desk2.amr.corp.intel.com>
 <20210813062951.GA203438@hori.linux.bs1.fc.nec.co.jp>
 <96d4fd8b75e44a6c970e4d9530980f21@intel.com>
 <20210816171207.GA2239284@u2004>
In-Reply-To: <20210816171207.GA2239284@u2004>

On Tue, Aug 17, 2021 at 02:12:07AM +0900, Naoya Horiguchi wrote:
> This dump indicates that HWPoisonHandlable() returned false due to
> the lack of PG_lru flag. In older code before 5.13, get_any_page() does
> retry with shake_page(), but does not since 5.13, which seems to me
> the root cause of the issue. So my suggestion is to call shake_page()
> when HWPoisonHandlable() is false.
>
> Could you try checking that the following diff fixes the issue?
> I could still have better fix (like inserting shake_page() to other
> retry paths in get_any_page()), but the below is the minimum one.

Tried it ... and it works! Injected and recovered from a thousand
errors without seeing any problems.

-Tony

P.S. Somewhere in the mail system your patch arrived with TABs
changed to spaces. Here's what I applied to v5.14-rc6 (hopefully
with TABs preserved) ... just in case anyone else is following along
with this thread and wants to try some tests.

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index eefd823deb67..aa6592540f17 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1146,7 +1146,7 @@ static int __get_hwpoison_page(struct page *page)
 	 * unexpected races caused by taking a page refcount.
 	 */
 	if (!HWPoisonHandlable(head))
-		return 0;
+		return -EBUSY;
 
 	if (PageTransHuge(head)) {
 		/*
@@ -1199,9 +1199,14 @@ static int get_any_page(struct page *p, unsigned long flags)
 			}
 			goto out;
 		} else if (ret == -EBUSY) {
-			/* We raced with freeing huge page to buddy, retry. */
-			if (pass++ < 3)
+			/*
+			 * We raced with (possibly temporary) unhandlable
+			 * page, retry.
+			 */
+			if (pass++ < 3) {
+				shake_page(p, 1);
 				goto try_again;
+			}
 			goto out;
 		}
 	}