From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 15 Oct 2020 20:07:25 -0700
From: Andrew Morton
To: akpm@linux-foundation.org, aneesh.kumar@linux.ibm.com,
 aneesh.kumar@linux.vnet.ibm.com, aris@ruivo.org, cai@lca.pw,
 dave.hansen@intel.com, david@redhat.com, mhocko@kernel.org,
 mike.kravetz@oracle.com, mm-commits@vger.kernel.org,
 naoya.horiguchi@nec.com, osalvador@suse.com, tony.luck@intel.com,
 torvalds@linux-foundation.org, zeil@yandex-team.ru
Subject: [patch 053/156] mm,hwpoison: double-check page count in __get_any_page()
Message-ID: <20201016030725.zk1DTEVQR%akpm@linux-foundation.org>
In-Reply-To: <20201015194043.84cda0c1d6ca2a6847f2384a@linux-foundation.org>
User-Agent: s-nail v14.8.16
Precedence: bulk
Reply-To: linux-kernel@vger.kernel.org
X-Mailing-List: mm-commits@vger.kernel.org

From: Naoya Horiguchi
Subject: mm,hwpoison: double-check page count in __get_any_page()

Soft offlining could fail with EIO due to a race with hugepage
migration.  This issue became visible after the previous patch, which
makes the soft offline handler take the page refcount on its own.  There
is no way to directly pin a zero-refcount page, and a page considered to
have a zero refcount could be allocated just after the first check.  This
patch adds a second check to detect the race and gives us a chance to
handle it more reliably.

Link: https://lkml.kernel.org/r/20200922135650.1634-14-osalvador@suse.de
Signed-off-by: Naoya Horiguchi
Reported-by: Qian Cai
Cc: "Aneesh Kumar K.V"
Cc: Aneesh Kumar K.V
Cc: Aristeu Rozanski
Cc: Dave Hansen
Cc: David Hildenbrand
Cc: Dmitry Yakunin
Cc: Michal Hocko
Cc: Mike Kravetz
Cc: Oscar Salvador
Cc: Tony Luck
Signed-off-by: Andrew Morton
---

 mm/memory-failure.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/mm/memory-failure.c~mmhwpoison-double-check-page-count-in-__get_any_page
+++ a/mm/memory-failure.c
@@ -1707,6 +1707,9 @@ static int __get_any_page(struct page *p
 		} else if (is_free_buddy_page(p)) {
 			pr_info("%s: %#lx free buddy page\n", __func__, pfn);
 			ret = 0;
+		} else if (page_count(p)) {
+			/* raced with allocation */
+			ret = -EBUSY;
 		} else {
 			pr_info("%s: %#lx: unknown zero refcount page type %lx\n",
 				__func__, pfn, p->flags);
@@ -1723,6 +1726,9 @@ static int get_any_page(struct page *pag
 {
 	int ret = __get_any_page(page, pfn, flags);
 
+	if (ret == -EBUSY)
+		ret = __get_any_page(page, pfn, flags);
+
 	if (ret == 1 && !PageHuge(page) &&
 	    !PageLRU(page) && !__PageMovable(page)) {
 		/*
_
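
For readers who want to see the control flow in isolation, here is a
minimal userspace sketch of the "check, re-check, retry once" pattern
the changelog describes.  It is not kernel code and does not use the
kernel's types or helpers: struct fake_page, probe_once(),
probe_with_retry(), race_hook and simulate_allocation() are all
hypothetical names invented for illustration, and the race is simulated
with a one-shot callback rather than real concurrency.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not the kernel type. */
struct fake_page {
	int refcount;                          /* plays the role of page_count() */
	bool free_buddy;                       /* plays the role of is_free_buddy_page() */
	void (*race_hook)(struct fake_page *); /* simulated concurrent allocator */
};

/* Simulated allocator: the page leaves the free list and gains a reference. */
static void simulate_allocation(struct fake_page *p)
{
	p->free_buddy = false;
	p->refcount = 1;
}

/*
 * One probe of the page.
 * Returns 1 if a reference was taken, 0 if the page is free,
 * -EBUSY if the page was allocated between the two checks,
 * and -EIO for an unrecognised zero-refcount page.
 */
static int probe_once(struct fake_page *p)
{
	/* First check: pin the page if it is already in use. */
	if (p->refcount > 0) {
		p->refcount++;
		return 1;
	}

	/* Window in which a real allocation could race in (fires at most once). */
	if (p->race_hook) {
		void (*hook)(struct fake_page *) = p->race_hook;

		p->race_hook = NULL;
		hook(p);
	}

	if (p->free_buddy)
		return 0;

	/* Second check: the refcount is no longer zero, so we raced. */
	if (p->refcount > 0)
		return -EBUSY;

	return -EIO;
}

/* Caller side: retry exactly once when the probe reports a race. */
static int probe_with_retry(struct fake_page *p)
{
	int ret = probe_once(p);

	if (ret == -EBUSY)
		ret = probe_once(p);

	return ret;
}
```

In this sketch the retry succeeds because the page now has a non-zero
refcount, so the first check in the second probe can pin it.  Without
the second check, the same sequence would fall through to the -EIO
branch.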