From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-11.8 required=3.0 tests=BAYES_00, CHARSET_FARAWAY_HEADER,DKIM_SIGNED,DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,MENTIONS_GIT_HOSTING,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D8C1DC433E0 for ; Fri, 17 Jul 2020 06:56:01 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 8B22C2071A for ; Fri, 17 Jul 2020 06:56:01 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=necglobal.onmicrosoft.com header.i=@necglobal.onmicrosoft.com header.b="pf04KLMm" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727096AbgGQGz7 (ORCPT ); Fri, 17 Jul 2020 02:55:59 -0400 Received: from mail-eopbgr1400059.outbound.protection.outlook.com ([40.107.140.59]:26016 "EHLO JPN01-TY1-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1726166AbgGQGz7 (ORCPT ); Fri, 17 Jul 2020 02:55:59 -0400 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=k4/KNgqQ62iyu+RFAUkvwdoSvIpSyZLsoLj5UQ44oh7ZpCD5MFVPDUYVn+169Y5Bbnc292FO5i8AIOk0uhOIyVGdf+PnhuwX75M1WV66oNDQjUy6BoNBfIO08gPmr3Iu0nEGYYVqu++psfAPn/oIwCGkSoswuqTyr7+IGBaU9qQzvCyj3gpYsDBku3iCDSE3GRV3a7qY91UuIa59Yu97+Tk3kbjCsngz9BeZfnO4ckVkA+lMBMqOzZrBLI1BLcIz60R1Kf/Y+8EPCDHBMWqeheCokGm6BO8B9o6D0GP22IAWW3ubMyV0lT3sSWx/1ICX4Q3A2c1F+nlVTrjjzHyWBw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=Sew+sj7R60iMtB0vRGd4Ck2Nt7KLbq8hjZbXmm+WHrY=; b=XztWdHk8kwOaa9SI7b7Ys4iJRO1ju8A4QGVaX/GNzZoC9/fcmkEerGyVg0UcoFb76m2zLpJ3UrDYmH7izvrU1DBheYKsBBOnghDEyVytvAgcuFc5jCD/37pU3k2AsFh7TaRdIowWnK3aR93Y0IYvTFUMxxIabfje8S75DINfDUO1KXJEbb+D5k8QZ+8XauUYr/wG3MurRes0Ixn4UTutLiBKtaHtbmWfRPG/J1atTHLcz2PkI1FcgobMqPm/FrghZG0P1ovljjz6LluD12a38PQhDss94pw0DOFqSbyBT8jSbRIvjhA3Xo3s89O3UdOXu9GjR2zR0ALBJmi2VSWm8A== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=nec.com; dmarc=pass action=none header.from=nec.com; dkim=pass header.d=nec.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=necglobal.onmicrosoft.com; s=selector1-necglobal-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=Sew+sj7R60iMtB0vRGd4Ck2Nt7KLbq8hjZbXmm+WHrY=; b=pf04KLMm1vPb8wBeBV70uDOqPTcayJAXbZIiqJKpKZwT1yEmEkV7F5IJzpP2cLUjxVIXHSu2YRT1joZJJ6cVmUrDekMvPc6SAomG7abS56tsZ4fOlkpePAmLkSCkLa4GIxJbOcd4UEStRWWkengxuPtYQwqpjjbV4cSi7J2tpyc= Received: from TY2PR01MB3210.jpnprd01.prod.outlook.com (2603:1096:404:74::14) by TYAPR01MB2767.jpnprd01.prod.outlook.com (2603:1096:404:85::17) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3174.20; Fri, 17 Jul 2020 06:55:53 +0000 Received: from TY2PR01MB3210.jpnprd01.prod.outlook.com ([fe80::99a9:dbe4:15e2:3c15]) by TY2PR01MB3210.jpnprd01.prod.outlook.com ([fe80::99a9:dbe4:15e2:3c15%4]) with mapi id 15.20.3174.025; Fri, 17 Jul 2020 06:55:53 +0000 From: =?iso-2022-jp?B?SE9SSUdVQ0hJIE5BT1lBKBskQktZOH0hIUQ+TGkbKEIp?= To: Oscar Salvador CC: "akpm@linux-foundation.org" , "mhocko@suse.com" , "linux-mm@kvack.org" , "mike.kravetz@oracle.com" , "david@redhat.com" , "aneesh.kumar@linux.vnet.ibm.com" , "linux-kernel@vger.kernel.org" , Oscar Salvador Subject: Re: [PATCH v4 12/15] mm,hwpoison: Rework soft offline for in-use pages Thread-Topic: [PATCH v4 12/15] mm,hwpoison: Rework soft offline for in-use pages Thread-Index: AQHWW24L3MUp+TUYEUyOQLgUEXJef6kLV8IA Date: Fri, 17 Jul 2020 06:55:52 +0000 Message-ID: <20200717065552.GA20384@hori.linux.bs1.fc.nec.co.jp> References: <20200716123810.25292-1-osalvador@suse.de> <20200716123810.25292-13-osalvador@suse.de> In-Reply-To: <20200716123810.25292-13-osalvador@suse.de> Accept-Language: ja-JP, en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: suse.de; dkim=none (message not signed) header.d=none;suse.de; dmarc=none action=none header.from=nec.com; x-originating-ip: [165.225.110.205] x-ms-publictraffictype: Email x-ms-office365-filtering-correlation-id: ba0e7171-4eb5-443a-09f7-08d82a1e7278 x-ms-traffictypediagnostic: TYAPR01MB2767: x-ms-exchange-transport-forked: True x-microsoft-antispam-prvs: x-ms-oob-tlc-oobclassifiers: OLM:10000; x-ms-exchange-senderadcheck: 1 x-microsoft-antispam: BCL:0; x-microsoft-antispam-message-info: jBJjmUt7pMpLNx0u8snOJQrpLRFa5VI9fG+dXGaAIYm4zIqBlppYgFBTuRvHPdiTLQSV4oNW8a98ydsUvfuWGYSOrCEptvg99tMP3GKGGhOTMN8vGCEJEzaeh/HxTIxg6s5Q0soNpXZu+P7tmx/d9VXql3EUr56jMim+hG3Ie12ZBI1N//dyp+PAYYCD/U+IZNk/GwOrv3p8aXF40u5nBuQz9ddfAZkSnUM3/8ViqKgYPt2gPnUp7IJTUJHcnQySdwHvgcLsUYuL69u4XfABmIZWXw4BMawf/fuf/w2MrD1iwCzcRHAh8X9e3fXA/j8cTNneSZIzBJ2KTnbRzYWSVMA4vkZzuC3kUy4ITAMK/M6WjWM+06ZzuT7Kg6MXn/Oq0NqlifROctUTe08wWCj1jp87yohok7oZWS/MopuEoH9Js232XDWa8F0Jm6RHoHj9HTdCp1JhQ6MFwsltZmE4nQ== x-forefront-antispam-report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:TY2PR01MB3210.jpnprd01.prod.outlook.com;PTR:;CAT:NONE;SFTY:;SFS:(4636009)(346002)(39860400002)(366004)(136003)(376002)(396003)(6916009)(30864003)(66446008)(6506007)(64756008)(4326008)(66946007)(2906002)(26005)(33656002)(76116006)(66556008)(66476007)(186003)(85182001)(83380400001)(55236004)(478600001)(6512007)(71200400001)(86362001)(54906003)(5660300002)(8936002)(316002)(1076003)(8676002)(9686003)(6486002)(142933001);DIR:OUT;SFP:1101; x-ms-exchange-antispam-messagedata: 8pJS1XyZ3ukRZZRWInXYWGSeXuRMVUwdBOEAWmBm02b4SAmhYmPQOBWo/dQuRF7E3IiyG9bJbCEF9SkVHHBjsJcv6gFq1CVpXga0/M6YjAntjdPHj+HAvqzYBnd1lhHaFYB2/aP99zVXLihlYnBpbg6LNQdK6IdLr1fA6L5/AW7qeJD0nISb7raCDvGQBgX+ihdxsCdIWLNYxxTKy6ofdl3MoBGr5vFbePeCGdz9/8h0dFebClDnrlRoiiTcymPG0ynl9F37X2drB3nq5AyyAz7yGejF04qpJCHyQhvSn1JD8JpGQSiuuRKCZ+mZ+UyfYg6av4b21hKqw3AIV+kGke5jwenl3cYvnLc7SyDK7HrEEYLYPthNUg4QjHuHBMmgjLp4s8EycVQE3ouuRLR0rwY9Q6QONh9RQkBgeyjZVAxawwzE7PPdVtt51GMF9o8VhKvugbcwIkntSU02pKaZoCLnVxliCYyeXU6eCmu/Zs5BYJoEkthyTk+LOEJYmqn3 Content-Type: text/plain; charset="iso-2022-jp" Content-ID: <7AC0E0D1AA659147837FA6AEE3DE1154@jpnprd01.prod.outlook.com> Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: nec.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-AuthSource: TY2PR01MB3210.jpnprd01.prod.outlook.com X-MS-Exchange-CrossTenant-Network-Message-Id: ba0e7171-4eb5-443a-09f7-08d82a1e7278 X-MS-Exchange-CrossTenant-originalarrivaltime: 17 Jul 2020 06:55:52.9500 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: e67df547-9d0d-4f4d-9161-51c6ed1f7d11 X-MS-Exchange-CrossTenant-mailboxtype: HOSTED X-MS-Exchange-CrossTenant-userprincipalname: GNe+d/T2JIoqBrL8KO5+eS82RQmB70awNN6ILxTxqpiCGoMb78oDY0cTzYOjYnq8J15XDghw2H4EhabrahujZQ== X-MS-Exchange-Transport-CrossTenantHeadersStamped: TYAPR01MB2767 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu, Jul 16, 2020 at 02:38:06PM +0200, Oscar Salvador wrote: > This patch changes the way we set and handle in-use poisoned pages. > Until now, poisoned pages were released to the buddy allocator, trusting > that the checks that take place prior to deliver the page to its end > user would act as a safe net and would skip that page. >=20 > This has proved to be wrong, as we got some pfn walkers out there, like > compaction, that all they care is the page to be PageBuddy and be in a > freelist. > Although this might not be the only user, having poisoned pages > in the buddy allocator seems a bad idea as we should only have > free pages that are ready and meant to be used as such. >=20 > Before explaining the taken approach, let us break down the kind > of pages we can soft offline. >=20 > - Anonymous THP (after the split, they end up being 4K pages) > - Hugetlb > - Order-0 pages (that can be either migrated or invalited) >=20 > * Normal pages (order-0 and anon-THP) >=20 > - If they are clean and unmapped page cache pages, we invalidate > then by means of invalidate_inode_page(). > - If they are mapped/dirty, we do the isolate-and-migrate dance. >=20 > Either way, do not call put_page directly from those paths. > Instead, we keep the page and send it to page_set_poison to perform the > right handling. >=20 > Among other things, page_set_poison() sets the HWPoison flag and does t= he last > put_page. > This call to put_page is mainly to be able to call __page_cache_release= , > since this function is not exported. >=20 > Down the chain, we placed a check for HWPoison page in > free_pages_prepare, that just skips any poisoned page, so those pages > do not end up either in a pcplist or in buddy-freelist. >=20 > After that, we set the refcount on the page to 1 and we increment > the poisoned pages counter. >=20 > We could do as we do for free pages: > 1) wait until the page hits buddy's freelists > 2) take it off > 3) flag it >=20 > The problem is that we could race with an allocation, so by the time we > want to take the page off the buddy, the page is already allocated, so = we > cannot soft-offline it. > This is not fatal of course, but if it is better if we can close the ra= ce > as does not require a lot of code. >=20 > * Hugetlb pages >=20 > - We isolate-and-migrate them >=20 > There is no magic in here, we just isolate and migrate them. > A new set of internal functions have been made to flag a hugetlb page a= s > poisoned (SetPageHugePoisoned(), PageHugePoisoned(), ClearPageHugePoiso= ned()) > This allows us to flag the page when we migrate it, back in > move_hugetlb_state(). > > Later on we check whether the page is poisoned in __free_huge_page, > and we bail out in that case before sending the page to e.g: active > free list. > This gives us full control of the page, and we can handle it > page_handle_poison(). >=20 > In other words, we do not allow migrated hugepages to get back to the > freelists. >=20 > Since now the page has no user and has been migrated, we can call > dissolve_free_huge_page, which will end up calling update_and_free_page= . > In update_and_free_page(), we check for the page to be poisoned. > If it so, we handle it as we handle gigantic pages, i.e: we break down > the page in order-0 pages and free them one by one. > Doing so, allows us for free_pages_prepare to skip poisoned pages. >=20 > Because of the way we handle now in-use pages, we no longer need the > put-as-isolation-migratetype dance, that was guarding for poisoned pages > to end up in pcplists. I ran Quan Cai's test program (https://github.com/cailca/linux-mm) on a small (4GB memory) VM, and weiredly found that (1) the target hugepages are not always dissolved and (2) dissovled hugetpages are still counted in "HugePages_Total:". See below: $ ./random 1 - start: migrate_huge_offline - use NUMA nodes 0,1. - mmap and free 8388608 bytes hugepages on node 0 - mmap and free 8388608 bytes hugepages on node 1 madvise: Cannot allocate memory $ cat /proc/meminfo MemTotal: 4026772 kB MemFree: 976300 kB MemAvailable: 892840 kB Buffers: 20936 kB Cached: 99768 kB SwapCached: 5904 kB Active: 84332 kB Inactive: 116328 kB Active(anon): 27944 kB Inactive(anon): 68524 kB Active(file): 56388 kB Inactive(file): 47804 kB Unevictable: 7532 kB Mlocked: 0 kB SwapTotal: 2621436 kB SwapFree: 2609844 kB Dirty: 56 kB Writeback: 0 kB AnonPages: 81764 kB Mapped: 54348 kB Shmem: 8948 kB KReclaimable: 22744 kB Slab: 52056 kB SReclaimable: 22744 kB SUnreclaim: 29312 kB KernelStack: 3888 kB PageTables: 2804 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 3260612 kB Committed_AS: 828196 kB VmallocTotal: 34359738367 kB VmallocUsed: 19260 kB VmallocChunk: 0 kB Percpu: 5120 kB HardwareCorrupted: 5368 kB AnonHugePages: 18432 kB ShmemHugePages: 0 kB ShmemPmdMapped: 0 kB FileHugePages: 0 kB FilePmdMapped: 0 kB CmaTotal: 0 kB CmaFree: 0 kB HugePages_Total: 1342 // still counted as hugetlb pages. HugePages_Free: 0 // all hugepage are still allocated (or le= aked?) HugePages_Rsvd: 0 HugePages_Surp: 762 // some are counted in surplus. Hugepagesize: 2048 kB Hugetlb: 2748416 kB DirectMap4k: 112480 kB DirectMap2M: 4081664 kB $ page-types -b hwpoison flags page-count MB symbolic-flags = long-symbolic-flags 0x0000000000080008 421 1 ___U_______________X______= _________________ uptodate,hwpoison 0x00000000000a8018 1 0 ___UD__________H_G_X______= _________________ uptodate,dirty,compound_head,huge,hwpoison 0x00000000000a801c 920 3 __RUD__________H_G_X______= _________________ referenced,uptodate,dirty,compound_head,huge,hwpoiso= n total 1342 5 This means that some hugepages are dissolved, but the others not, maybe which is not desirable. I'll dig this more later but just let me share at first. A few minor comment below ... >=20 > Signed-off-by: Oscar Salvador > Signed-off-by: Naoya Horiguchi > --- > include/linux/page-flags.h | 5 ---- > mm/hugetlb.c | 60 +++++++++++++++++++++++++++++++++----- > mm/memory-failure.c | 53 +++++++++++++-------------------- > mm/migrate.c | 11 ++----- > mm/page_alloc.c | 38 +++++++----------------- > 5 files changed, 86 insertions(+), 81 deletions(-) >=20 > diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h > index 01baf6d426ff..2ac8bfa0cf20 100644 > --- a/include/linux/page-flags.h > +++ b/include/linux/page-flags.h > @@ -426,13 +426,8 @@ PAGEFLAG(HWPoison, hwpoison, PF_ANY) > TESTSCFLAG(HWPoison, hwpoison, PF_ANY) > #define __PG_HWPOISON (1UL << PG_hwpoison) > extern bool take_page_off_buddy(struct page *page); > -extern bool set_hwpoison_free_buddy_page(struct page *page); > #else > PAGEFLAG_FALSE(HWPoison) > -static inline bool set_hwpoison_free_buddy_page(struct page *page) > -{ > - return 0; > -} > #define __PG_HWPOISON 0 > #endif > =20 > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 7badb01d15e3..1c6397936512 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -29,6 +29,7 @@ > #include > #include > #include > +#include > =20 > #include > #include > @@ -1209,9 +1210,26 @@ static int hstate_next_node_to_free(struct hstate = *h, nodemask_t *nodes_allowed) > ((node =3D hstate_next_node_to_free(hs, mask)) || 1); \ > nr_nodes--) > =20 > -#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE > -static void destroy_compound_gigantic_page(struct page *page, > - unsigned int order) > +static inline bool PageHugePoisoned(struct page *page) > +{ > + if (!PageHuge(page)) > + return false; > + > + return (unsigned long)page[3].mapping =3D=3D -1U; > +} > + > +static inline void SetPageHugePoisoned(struct page *page) > +{ > + page[3].mapping =3D (void *)-1U; > +} > + > +static inline void ClearPageHugePoisoned(struct page *page) > +{ > + page[3].mapping =3D NULL; > +} > + > +static void destroy_compound_gigantic_page(struct hstate *h, struct page= *page, > + unsigned int order) > { > int i; > int nr_pages =3D 1 << order; > @@ -1222,14 +1240,19 @@ static void destroy_compound_gigantic_page(struct= page *page, > atomic_set(compound_pincount_ptr(page), 0); > =20 > for (i =3D 1; i < nr_pages; i++, p =3D mem_map_next(p, page, i)) { > + if (!hstate_is_gigantic(h)) > + p->mapping =3D NULL; > clear_compound_head(p); > set_page_refcounted(p); > } > =20 > + if (PageHugePoisoned(page)) > + ClearPageHugePoisoned(page); > set_compound_order(page, 0); > __ClearPageHead(page); > } > =20 > +#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE > static void free_gigantic_page(struct page *page, unsigned int order) > { > /* > @@ -1284,13 +1307,16 @@ static struct page *alloc_gigantic_page(struct hs= tate *h, gfp_t gfp_mask, > return NULL; > } > static inline void free_gigantic_page(struct page *page, unsigned int or= der) { } > -static inline void destroy_compound_gigantic_page(struct page *page, > - unsigned int order) { } > +static inline void destroy_compound_gigantic_page(struct hstate *h, > + struct page *page, > + unsigned int order) { } > #endif > =20 > static void update_and_free_page(struct hstate *h, struct page *page) > { > int i; > + bool poisoned =3D PageHugePoisoned(page); > + unsigned int order =3D huge_page_order(h); > =20 > if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported()) > return; > @@ -1313,11 +1339,21 @@ static void update_and_free_page(struct hstate *h= , struct page *page) > * we might block in free_gigantic_page(). > */ > spin_unlock(&hugetlb_lock); > - destroy_compound_gigantic_page(page, huge_page_order(h)); > - free_gigantic_page(page, huge_page_order(h)); > + destroy_compound_gigantic_page(h, page, order); > + free_gigantic_page(page, order); > spin_lock(&hugetlb_lock); > } else { > - __free_pages(page, huge_page_order(h)); > + if (unlikely(poisoned)) { > + /* > + * If the hugepage is poisoned, do as we do for > + * gigantic pages and free the pages as order-0. > + * free_pages_prepare will skip over the poisoned ones. > + */ > + destroy_compound_gigantic_page(h, page, order); This function is for gigantic page from its name, so shouldn't be called for non-gigantic huge page. Maybe renaming it and/or introducing some inner function layer to factor out common part would be better. > + free_contig_range(page_to_pfn(page), 1 << order); > + } else { > + __free_pages(page, huge_page_order(h)); > + } > } > } > =20 > @@ -1427,6 +1463,11 @@ static void __free_huge_page(struct page *page) > if (restore_reserve) > h->resv_huge_pages++; > =20 > + if (PageHugePoisoned(page)) { > + spin_unlock(&hugetlb_lock); > + return; > + } > + > if (PageHugeTemporary(page)) { > list_del(&page->lru); > ClearPageHugeTemporary(page); > @@ -5642,6 +5683,9 @@ void move_hugetlb_state(struct page *oldpage, struc= t page *newpage, int reason) > hugetlb_cgroup_migrate(oldpage, newpage); > set_page_owner_migrate_reason(newpage, reason); > =20 > + if (reason =3D=3D MR_MEMORY_FAILURE) > + SetPageHugePoisoned(oldpage); > + > /* > * transfer temporary state of the new huge page. This is > * reverse to other transitions because the newpage is going to > diff --git a/mm/memory-failure.c b/mm/memory-failure.c > index caf012d34607..c0ebab4eed4c 100644 > --- a/mm/memory-failure.c > +++ b/mm/memory-failure.c > @@ -65,9 +65,17 @@ int sysctl_memory_failure_recovery __read_mostly =3D 1= ; > =20 > atomic_long_t num_poisoned_pages __read_mostly =3D ATOMIC_LONG_INIT(0); > =20 > -static void page_handle_poison(struct page *page) > +static void page_handle_poison(struct page *page, bool release, bool set= _flag, > + bool huge_flag) > { > - SetPageHWPoison(page); > + if (set_flag) > + SetPageHWPoison(page); > + > + if (huge_flag) > + dissolve_free_huge_page(page); > + else if (release) > + put_page(page); > + Indentation seems to be broken, you can run checkpatch.pl to find details. Thanks, Naoya Horiguchi > page_ref_inc(page); > num_poisoned_pages_inc(); > } > @@ -1717,7 +1725,7 @@ static int get_any_page(struct page *page, unsigned= long pfn) > =20 > static int soft_offline_huge_page(struct page *page) > { > - int ret; > + int ret =3D -EBUSY; > unsigned long pfn =3D page_to_pfn(page); > struct page *hpage =3D compound_head(page); > LIST_HEAD(pagelist); > @@ -1757,19 +1765,12 @@ static int soft_offline_huge_page(struct page *pa= ge) > ret =3D -EIO; > } else { > /* > - * We set PG_hwpoison only when the migration source hugepage > - * was successfully dissolved, because otherwise hwpoisoned > - * hugepage remains on free hugepage list, then userspace will > - * find it as SIGBUS by allocation failure. That's not expected > - * in soft-offlining. > + * At this point the page cannot be in-use since we do not > + * let the page to go back to hugetlb freelists. > + * In that case we just need to dissolve it. > + * page_handle_poison will take care of it. > */ > - ret =3D dissolve_free_huge_page(page); > - if (!ret) { > - if (set_hwpoison_free_buddy_page(page)) > - num_poisoned_pages_inc(); > - else > - ret =3D -EBUSY; > - } > + page_handle_poison(page, true, true, true); > } > return ret; > } > @@ -1804,10 +1805,8 @@ static int __soft_offline_page(struct page *page) > * would need to fix isolation locking first. > */ > if (ret =3D=3D 1) { > - put_page(page); > pr_info("soft_offline: %#lx: invalidated\n", pfn); > - SetPageHWPoison(page); > - num_poisoned_pages_inc(); > + page_handle_poison(page, true, true, false); > return 0; > } > =20 > @@ -1838,7 +1837,9 @@ static int __soft_offline_page(struct page *page) > list_add(&page->lru, &pagelist); > ret =3D migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL, > MIGRATE_SYNC, MR_MEMORY_FAILURE); > - if (ret) { > + if (!ret) { > + page_handle_poison(page, true, true, false); > + } else { > if (!list_empty(&pagelist)) > putback_movable_pages(&pagelist); > =20 > @@ -1857,37 +1858,25 @@ static int __soft_offline_page(struct page *page) > static int soft_offline_in_use_page(struct page *page) > { > int ret; > - int mt; > struct page *hpage =3D compound_head(page); > =20 > if (!PageHuge(page) && PageTransHuge(hpage)) > if (try_to_split_thp_page(page, "soft offline") < 0) > return -EBUSY; > =20 > - /* > - * Setting MIGRATE_ISOLATE here ensures that the page will be linked > - * to free list immediately (not via pcplist) when released after > - * successful page migration. Otherwise we can't guarantee that the > - * page is really free after put_page() returns, so > - * set_hwpoison_free_buddy_page() highly likely fails. > - */ > - mt =3D get_pageblock_migratetype(page); > - set_pageblock_migratetype(page, MIGRATE_ISOLATE); > if (PageHuge(page)) > ret =3D soft_offline_huge_page(page); > else > ret =3D __soft_offline_page(page); > - set_pageblock_migratetype(page, mt); > return ret; > } > =20 > static int soft_offline_free_page(struct page *page) > { > int rc =3D -EBUSY; > - int rc =3D dissolve_free_huge_page(page); > =20 > if (!dissolve_free_huge_page(page) && take_page_off_buddy(page)) { > - page_handle_poison(page); > + page_handle_poison(page, false, true, false); > rc =3D 0; > } > =20 > diff --git a/mm/migrate.c b/mm/migrate.c > index 75c10d81e833..a68d81d0ae6e 100644 > --- a/mm/migrate.c > +++ b/mm/migrate.c > @@ -1222,16 +1222,11 @@ static int unmap_and_move(new_page_t get_new_page= , > * we want to retry. > */ > if (rc =3D=3D MIGRATEPAGE_SUCCESS) { > - put_page(page); > - if (reason =3D=3D MR_MEMORY_FAILURE) { > + if (reason !=3D MR_MEMORY_FAILURE) > /* > - * Set PG_HWPoison on just freed page > - * intentionally. Although it's rather weird, > - * it's how HWPoison flag works at the moment. > + * We handle poisoned pages in page_handle_poison. > */ > - if (set_hwpoison_free_buddy_page(page)) > - num_poisoned_pages_inc(); > - } > + put_page(page); > } else { > if (rc !=3D -EAGAIN) { > if (likely(!__PageMovable(page))) { > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 4fa0e0887c07..11df51fc2718 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1175,6 +1175,16 @@ static __always_inline bool free_pages_prepare(str= uct page *page, > =20 > trace_mm_page_free(page, order); > =20 > + if (unlikely(PageHWPoison(page)) && !order) { > + /* > + * Untie memcg state and reset page's owner > + */ > + if (memcg_kmem_enabled() && PageKmemcg(page)) > + __memcg_kmem_uncharge_page(page, order); > + reset_page_owner(page, order); > + return false; > + } > + > /* > * Check tail pages before head page information is cleared to > * avoid checking PageCompound for order-0 pages. > @@ -8844,32 +8854,4 @@ bool take_page_off_buddy(struct page *page) > spin_unlock_irqrestore(&zone->lock, flags); > return ret; > } > - > -/* > - * Set PG_hwpoison flag if a given page is confirmed to be a free page. = This > - * test is performed under the zone lock to prevent a race against page > - * allocation. > - */ > -bool set_hwpoison_free_buddy_page(struct page *page) > -{ > - struct zone *zone =3D page_zone(page); > - unsigned long pfn =3D page_to_pfn(page); > - unsigned long flags; > - unsigned int order; > - bool hwpoisoned =3D false; > - > - spin_lock_irqsave(&zone->lock, flags); > - for (order =3D 0; order < MAX_ORDER; order++) { > - struct page *page_head =3D page - (pfn & ((1 << order) - 1)); > - > - if (PageBuddy(page_head) && page_order(page_head) >=3D order) { > - if (!TestSetPageHWPoison(page)) > - hwpoisoned =3D true; > - break; > - } > - } > - spin_unlock_irqrestore(&zone->lock, flags); > - > - return hwpoisoned; > -} > #endif > --=20 > 2.26.2 > =