From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-6.8 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH,
	DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,
	SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no
	autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 67A2BC433DF
	for ; Fri, 16 Oct 2020 02:44:17 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by mail.kernel.org (Postfix) with ESMTP id D766A208E4
	for ; Fri, 16 Oct 2020 02:44:16 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=pass (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="WAoL6mTi"
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D766A208E4
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix)
	id 60639940019; Thu, 15 Oct 2020 22:44:16 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 53FB4940007; Thu, 15 Oct 2020 22:44:16 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 42ED4940019; Thu, 15 Oct 2020 22:44:16 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from forelay.hostedemail.com (smtprelay0073.hostedemail.com [216.40.44.73])
	by kanga.kvack.org (Postfix) with ESMTP id 0C181940007
	for ; Thu, 15 Oct 2020 22:44:16 -0400 (EDT)
Received: from smtpin07.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251])
	by forelay03.hostedemail.com (Postfix) with ESMTP id AE7238249980
	for ; Fri, 16 Oct 2020 02:44:15 +0000 (UTC)
X-FDA: 77376244470.07.edge46_261456c27219
Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251])
	by smtpin07.hostedemail.com (Postfix) with ESMTP id 955041803F9A2
	for ; Fri, 16 Oct 2020 02:44:15 +0000 (UTC)
X-HE-Tag: edge46_261456c27219
X-Filterd-Recvd-Size: 3669
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by imf31.hostedemail.com (Postfix) with ESMTP
	for ; Fri, 16 Oct 2020 02:44:15 +0000 (UTC)
Received: from localhost.localdomain (c-73-231-172-41.hsd1.ca.comcast.net [73.231.172.41])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by mail.kernel.org (Postfix) with ESMTPSA id 72FEF20878;
	Fri, 16 Oct 2020 02:44:13 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=default; t=1602816254;
	bh=v3Pr3yERnK3TDef0FPAWPMAJIRreiDyEWJ/wBrRKKcE=;
	h=Date:From:To:Subject:In-Reply-To:From;
	b=WAoL6mTiMBV5f+IM/icCiVinRPgPaw8kIuyKyMcwnEcKWGV873ErCiDcMfTuTlu6L
	 pCRDvqWZ243Z5syhFP2s7GJKZkjcgWlBzG84MOODCOD4ngqAy7yyDqEA62SYBEGEFo
	 HBp9/NFifL3GRaA503L1qQpdPRHJG1RfL8QnHauI=
Date: Thu, 15 Oct 2020 19:44:12 -0700
From: Andrew Morton
To: akpm@linux-foundation.org, aneesh.kumar@linux.ibm.com,
	aneesh.kumar@linux.vnet.ibm.com, aris@ruivo.org, cai@lca.pw,
	dave.hansen@intel.com, david@redhat.com, linux-mm@kvack.org,
	mhocko@kernel.org, mike.kravetz@oracle.com, mm-commits@vger.kernel.org,
	naoya.horiguchi@nec.com, osalvador@suse.com, tony.luck@intel.com,
	torvalds@linux-foundation.org, zeil@yandex-team.ru
Subject: [patch 053/156] mm,hwpoison: double-check page count in __get_any_page()
Message-ID: <20201016024412.3mTi2uR8k%akpm@linux-foundation.org>
In-Reply-To: <20201015192732.f448da14e9854c7cb7299956@linux-foundation.org>
User-Agent: s-nail v14.8.16
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

From: Naoya Horiguchi
Subject: mm,hwpoison: double-check page count in __get_any_page()

Soft offlining could fail with EIO due to the race
condition with hugepage migration.  This issue became visible due to the
change in the previous patch, which makes the soft offline handler take the
page refcount on its own.  We have no way to directly pin a zero-refcount
page, and a page considered to have zero refcount could be allocated just
after the first check.

This patch adds a second check to detect the race and gives us a chance to
handle it more reliably.

Link: https://lkml.kernel.org/r/20200922135650.1634-14-osalvador@suse.de
Signed-off-by: Naoya Horiguchi
Reported-by: Qian Cai
Cc: "Aneesh Kumar K.V"
Cc: Aneesh Kumar K.V
Cc: Aristeu Rozanski
Cc: Dave Hansen
Cc: David Hildenbrand
Cc: Dmitry Yakunin
Cc: Michal Hocko
Cc: Mike Kravetz
Cc: Oscar Salvador
Cc: Tony Luck
Signed-off-by: Andrew Morton
---

 mm/memory-failure.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/mm/memory-failure.c~mmhwpoison-double-check-page-count-in-__get_any_page
+++ a/mm/memory-failure.c
@@ -1707,6 +1707,9 @@ static int __get_any_page(struct page *p
 		} else if (is_free_buddy_page(p)) {
 			pr_info("%s: %#lx free buddy page\n", __func__, pfn);
 			ret = 0;
+		} else if (page_count(p)) {
+			/* raced with allocation */
+			ret = -EBUSY;
 		} else {
 			pr_info("%s: %#lx: unknown zero refcount page type %lx\n",
 				__func__, pfn, p->flags);
@@ -1723,6 +1726,9 @@ static int get_any_page(struct page *pag
 {
 	int ret = __get_any_page(page, pfn, flags);
 
+	if (ret == -EBUSY)
+		ret = __get_any_page(page, pfn, flags);
+
 	if (ret == 1 && !PageHuge(page) &&
 	    !PageLRU(page) && !__PageMovable(page)) {
 		/*
_