From: Naoya Horiguchi
To: linux-mm@kvack.org
Cc: Andrew Morton, David Hildenbrand, Mike Kravetz, Miaohe Lin,
	Liu Shixin, Yang Shi, Oscar Salvador, Muchun Song,
	Naoya Horiguchi, linux-kernel@vger.kernel.org
Subject: [PATCH v1 4/5] mm, hwpoison: skip raw hwpoison page in freeing 1GB hugepage
Date: Thu, 2 Jun 2022 14:06:30 +0900
Message-Id: <20220602050631.771414-5-naoya.horiguchi@linux.dev>
In-Reply-To: <20220602050631.771414-1-naoya.horiguchi@linux.dev>
References: <20220602050631.771414-1-naoya.horiguchi@linux.dev>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Naoya Horiguchi

Currently, if memory_failure() (modified to remove blocking code) is
called on a page in a 1GB hugepage, memory error handling returns
failure and the raw error page is left in an undesirable state.
The impact is small in production systems (just a single leaked 4kB
page), but it limits test efficiency because unpoison doesn't work for
such a page. As a result, we can no longer create a 1GB hugepage on a
1GB physical address range that contains such hwpoison pages, which
could be an issue when testing on small systems.

When a hwpoison page in a 1GB hugepage is handled, it's caught by the
PageHWPoison check in free_pages_prepare() because the hugepage is
broken down into raw pages, so the error page is freed with order 0:

        if (unlikely(PageHWPoison(page)) && !order) {
                ...
                return false;
        }

Then, the page is not sent to buddy and the page refcount is left at 0.

Originally this check was meant to trigger when the error page is freed
from page_handle_poison() (which is called from soft-offline), but now
we are opening another path that reaches it, so the callers of
__page_handle_poison() need to handle this case by treating the return
value 0 as success. With that, the page refcount for hwpoison is
properly incremented and unpoison works.

Signed-off-by: Naoya Horiguchi
---
 mm/memory-failure.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index f149a7864c81..babeb34f7477 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1043,7 +1043,6 @@ static int me_huge_page(struct page_state *ps, struct page *p)
 		res = truncate_error_page(hpage, page_to_pfn(p), mapping);
 		unlock_page(hpage);
 	} else {
-		res = MF_FAILED;
 		unlock_page(hpage);
 		/*
 		 * migration entry prevents later access on error anonymous
@@ -1051,9 +1050,11 @@ static int me_huge_page(struct page_state *ps, struct page *p)
 		 * save healthy subpages.
 		 */
 		put_page(hpage);
-		if (__page_handle_poison(p) > 0) {
+		if (__page_handle_poison(p) >= 0) {
 			page_ref_inc(p);
 			res = MF_RECOVERED;
+		} else {
+			res = MF_FAILED;
 		}
 	}
 
@@ -1601,9 +1602,11 @@ static int try_memory_failure_hugetlb(unsigned long pfn, int flags, int *hugetlb
 	 */
 	if (res == 0) {
 		unlock_page(head);
-		if (__page_handle_poison(p) > 0) {
+		if (__page_handle_poison(p) >= 0) {
 			page_ref_inc(p);
 			res = MF_RECOVERED;
+		} else {
+			res = MF_FAILED;
 		}
 		action_result(pfn, MF_MSG_FREE_HUGE, res);
 		return res == MF_RECOVERED ? 0 : -EBUSY;
-- 
2.25.1
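
For illustration only, the caller-side change in the two hunks above can
be modelled in user space. This is a rough sketch, not kernel code:
struct page_model, enum mf_result_model, handle_before() and
handle_after() are hypothetical names invented here, and poison_ret
stands in for whatever __page_handle_poison() returned. The sketch
assumes, as the changelog implies, that a negative value means failure
and that 0 is the case where the page was not sent to buddy.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's MF_FAILED / MF_RECOVERED. */
enum mf_result_model { MODEL_MF_FAILED, MODEL_MF_RECOVERED };

/* Hypothetical model of the one field we care about: the page refcount. */
struct page_model {
	int refcount;
};

/* Caller logic before the patch: only a positive return is success. */
enum mf_result_model handle_before(struct page_model *p, int poison_ret)
{
	if (poison_ret > 0) {
		p->refcount++;		/* models page_ref_inc(p) */
		return MODEL_MF_RECOVERED;
	}
	return MODEL_MF_FAILED;
}

/* Caller logic after the patch: 0 is also treated as success. */
enum mf_result_model handle_after(struct page_model *p, int poison_ret)
{
	if (poison_ret >= 0) {
		p->refcount++;		/* models page_ref_inc(p) */
		return MODEL_MF_RECOVERED;
	}
	return MODEL_MF_FAILED;
}

/* True only for the return value whose handling the patch changes. */
bool behaviour_differs(int poison_ret)
{
	struct page_model a = { 0 }, b = { 0 };

	return handle_before(&a, poison_ret) != handle_after(&b, poison_ret);
}
```

In this model the two versions disagree only for a return value of 0:
the old logic leaves the refcount at 0 and reports failure, while the
new logic takes the hwpoison reference and reports recovery. Negative
values such as -EBUSY still fail in both.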