From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <43df1462-ba61-a91a-9c55-f0dc4e1b80c8@linux.alibaba.com>
Date: Mon, 30 Jan 2023 15:47:56 +0800
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.6.1
Subject: Re: [PATCH v2 1/1] mm: Always release pages to the buddy allocator in memblock_free_late().
From: Xu Yu <xuyu@linux.alibaba.com>
To: baolin.wang@linux.alibaba.com, Mike Rapoport, linux-mm@kvack.org
References: <010101857bbc3a41-173240b3-9064-42ef-93f3-482081126ec2-000000@us-west-2.amazonses.com> <20230105041650.1485-1-dev@aaront.org> <010001858025fc22-e619988e-c0a5-4545-bd93-783890b9ad14-000000@email.amazonses.com>
In-Reply-To: <010001858025fc22-e619988e-c0a5-4545-bd93-783890b9ad14-000000@email.amazonses.com>
Content-Language: en-US
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Sender: owner-linux-mm@kvack.org
Precedence: bulk

Sorry for sending out the email accidentally, please ignore it.

Actually, I was trying this fix in my environment.

On 1/30/23 3:40 PM, Xu Yu wrote:
> From: Aaron Thompson
>
> If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, memblock_free_pages()
> only releases pages to the buddy allocator if they are not in the
> deferred range. This is correct for free pages (as defined by
> for_each_free_mem_pfn_range_in_zone()) because free pages in the
> deferred range will be initialized and released as part of the deferred
> init process. memblock_free_pages() is called by memblock_free_late(),
> which is used to free reserved ranges after memblock_free_all() has
> run. All pages in reserved ranges have been initialized at that point,
> and accordingly, those pages are not touched by the deferred init
> process. This means that currently, if the pages that
> memblock_free_late() intends to release are in the deferred range, they
> will never be released to the buddy allocator. They will forever be
> reserved.
>
> In addition, memblock_free_pages() calls kmsan_memblock_free_pages(),
> which is also correct for free pages but is not correct for reserved
> pages. KMSAN metadata for reserved pages is initialized by
> kmsan_init_shadow(), which runs shortly before memblock_free_all().
>
> For both of these reasons, memblock_free_pages() should only be called
> for free pages, and memblock_free_late() should call __free_pages_core()
> directly instead.
>
> One case where this issue can occur in the wild is EFI boot on
> x86_64. The x86 EFI code reserves all EFI boot services memory ranges
> via memblock_reserve() and frees them later via memblock_free_late()
> (efi_reserve_boot_services() and efi_free_boot_services(),
> respectively). If any of those ranges happen to fall within the deferred
> init range, the pages will not be released and that memory will be
> unavailable.
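
As a rough illustration of the failure mode described above, here is a
userspace toy model, not kernel code. Every name in it (toy_zone,
toy_free_checked(), toy_free_direct(), toy_free_late()) is hypothetical,
and the "deferred range" is assumed to be a plain pfn threshold, which
simplifies what the kernel actually checks. It only shows that a free
path which skips deferred-range pages leaks any reserved page freed late
from that range, while a direct release does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct toy_zone {
	unsigned long first_deferred_pfn; /* pfns >= this are left to deferred init */
	unsigned long managed;            /* pages handed to the toy "buddy allocator" */
};

/* Models the pre-patch path: pages in the deferred range are skipped. */
static bool toy_free_checked(struct toy_zone *z, unsigned long pfn)
{
	if (pfn >= z->first_deferred_pfn)
		return false; /* assumed to be released later by deferred init */
	z->managed++;
	return true;
}

/* Models the patched path: the page is released unconditionally. */
static void toy_free_direct(struct toy_zone *z, unsigned long pfn)
{
	(void)pfn;
	z->managed++;
}

/*
 * Free the reserved range [start, end) "late". Returns how many pages
 * were skipped, i.e. would stay reserved forever in this model.
 */
static unsigned long toy_free_late(struct toy_zone *z, unsigned long start,
				   unsigned long end, bool patched)
{
	unsigned long leaked = 0;

	assert(start <= end);
	for (unsigned long pfn = start; pfn < end; pfn++) {
		if (patched)
			toy_free_direct(z, pfn);
		else if (!toy_free_checked(z, pfn))
			leaked++;
	}
	return leaked;
}
```

With first_deferred_pfn set to 100, freeing pfns 90..109 through the
unpatched path skips the 10 pages at or above the threshold, while the
patched path releases all 20.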
>
> For example, on an Amazon EC2 t3.micro VM (1 GB) booting via EFI:
>
> v6.2-rc2:
>   # grep -E 'Node|spanned|present|managed' /proc/zoneinfo
>   Node 0, zone      DMA
>           spanned  4095
>           present  3999
>           managed  3840
>   Node 0, zone    DMA32
>           spanned  246652
>           present  245868
>           managed  178867
>
> v6.2-rc2 + patch:
>   # grep -E 'Node|spanned|present|managed' /proc/zoneinfo
>   Node 0, zone      DMA
>           spanned  4095
>           present  3999
>           managed  3840
>   Node 0, zone    DMA32
>           spanned  246652
>           present  245868
>           managed  222816
>
> Fixes: 3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
> Signed-off-by: Aaron Thompson
> ---
>  mm/memblock.c | 8 +++++++-
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 511d4783dcf1..fc3d8fbd2060 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1640,7 +1640,13 @@ void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
>  	end = PFN_DOWN(base + size);
>  
>  	for (; cursor < end; cursor++) {
> -		memblock_free_pages(pfn_to_page(cursor), cursor, 0);
> +		/*
> +		 * Reserved pages are always initialized by the end of
> +		 * memblock_free_all() (by memmap_init() and, if deferred
> +		 * initialization is enabled, memmap_init_reserved_pages()), so
> +		 * these pages can be released directly to the buddy allocator.
> +		 */
> +		__free_pages_core(pfn_to_page(cursor), 0);
>  		totalram_pages_inc();
>  	}
>  }

-- 
Thanks,
Yu
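
For reference, the DMA32 "managed" figures quoted above can be turned
into a memory size with a small calculation. This is only a sketch: the
helper names are hypothetical, and it assumes 4 KiB base pages (the
usual x86_64 configuration), which the quoted output does not state.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages; not stated in the quoted zoneinfo output. */
#define TOY_PAGE_SIZE_KIB 4u

/* Pages gained between two "managed" counters read from /proc/zoneinfo. */
static uint64_t managed_delta_pages(uint64_t before, uint64_t after)
{
	assert(after >= before);
	return after - before;
}

/* Convert a page count to KiB under the page-size assumption above. */
static uint64_t pages_to_kib(uint64_t pages)
{
	return pages * TOY_PAGE_SIZE_KIB;
}
```

Calling managed_delta_pages(178867, 222816) gives 43949 pages, and
pages_to_kib(43949) gives 175796 KiB, so under that assumption the patch
returns roughly 171 MiB to the buddy allocator on that 1 GB VM.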