From: David Rientjes <rientjes@google.com>
To: Aaron Thompson <dev@aaront.org>
Cc: linux-mm@kvack.org, Mike Rapoport <rppt@kernel.org>,
"H. Peter Anvin" <hpa@zytor.com>,
Alexander Potapenko <glider@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Andy Shevchenko <andy@infradead.org>,
Ard Biesheuvel <ardb@kernel.org>, Borislav Petkov <bp@alien8.de>,
Darren Hart <dvhart@infradead.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
Dmitry Vyukov <dvyukov@google.com>,
Ingo Molnar <mingo@redhat.com>, Marco Elver <elver@google.com>,
Thomas Gleixner <tglx@linutronix.de>,
kasan-dev@googlegroups.com, linux-efi@vger.kernel.org,
linux-kernel@vger.kernel.org,
platform-driver-x86@vger.kernel.org, x86@kernel.org
Subject: Re: [PATCH 0/1] Pages not released from memblock to the buddy allocator
Date: Wed, 4 Jan 2023 17:43:51 -0800 (PST) [thread overview]
Message-ID: <30478b4a-870b-bf48-76d0-a236a40e7674@google.com> (raw)
In-Reply-To: <010101857bbc3a41-173240b3-9064-42ef-93f3-482081126ec2-000000@us-west-2.amazonses.com>
On Wed, 4 Jan 2023, Aaron Thompson wrote:
> Hi all,
>
> (I've CC'ed the KMSAN and x86 EFI maintainers as an FYI; the only code change
> I'm proposing is in memblock.)
>
> I've run into a case where pages are not released from memblock to the buddy
> allocator. If deferred struct page init is enabled, and memblock_free_late() is
> called before page_alloc_init_late() has run, and the pages being freed are in
> the deferred init range, then the pages are never released. memblock_free_late()
> calls memblock_free_pages() which only releases the pages if they are not in the
> deferred range. That is correct for free pages because they will be initialized
> and released by page_alloc_init_late(), but memblock_free_late() is dealing with
> reserved pages. If memblock_free_late() doesn't release those pages, they will
> forever be reserved. All reserved pages were initialized by memblock_free_all(),
> so I believe the fix is to simply have memblock_free_late() call
> __free_pages_core() directly instead of memblock_free_pages().
>
> In addition, there was a recent change (3c20650982609 "init: kmsan: call KMSAN
> initialization routines") that added a call to kmsan_memblock_free_pages() in
> memblock_free_pages(). It looks to me like it would also be incorrect to make
> that call in the memblock_free_late() case, because the KMSAN metadata was
> already initialized for all reserved pages by kmsan_init_shadow(), which runs
> before memblock_free_all(). Having memblock_free_late() call __free_pages_core()
> directly also fixes this issue.
>
> I encountered this issue when I tried to switch some x86_64 VMs I was running
> from BIOS boot to EFI boot. The x86 EFI code reserves all EFI boot services
> ranges via memblock_reserve() (part of setup_arch()), and it frees them later
> via memblock_free_late() (part of efi_enter_virtual_mode()). The EFI
> implementation of the VM I was attempting this on, an Amazon EC2 t3.micro
> instance, maps north of 170 MB in boot services ranges that happen to fall in
> the deferred init range. I certainly noticed when that much memory went missing
> on a 1 GB VM.
>
> I've tested the patch on EC2 instances, qemu/KVM VMs with OVMF, and some real
> x86_64 EFI systems, and they all look good to me. However, the physical systems
> that I have don't actually trigger this issue because they all have more than 4
> GB of RAM, so their deferred init range starts above 4 GB (it's always in the
> highest zone and ZONE_DMA32 ends at 4 GB) while their EFI boot services mappings
> are below 4 GB.
>
> Deferred struct page init can't be enabled on x86_32 so those systems are
> unaffected. I haven't found any other code paths that would trigger this issue,
> though I can't promise that there aren't any. I did run with this patch on an
> arm64 VM as a sanity check, but memblock=debug didn't show any calls to
> memblock_free_late() so that system was unaffected as well.
>
> I am guessing that this change should also go the stable kernels but it may not
> apply cleanly (__free_pages_core() was __free_pages_boot_core() and
> memblock_free_pages() was __free_pages_bootmem() when this issue was first
> introduced). I haven't gone through that process before so please let me know if
> I can help with that.
>
> This is the end result on an EC2 t3.micro instance booting via EFI:
>
> v6.2-rc2:
> # grep -E 'Node|spanned|present|managed' /proc/zoneinfo
> Node 0, zone DMA
> spanned 4095
> present 3999
> managed 3840
> Node 0, zone DMA32
> spanned 246652
> present 245868
> managed 178867
>
> v6.2-rc2 + patch:
> # grep -E 'Node|spanned|present|managed' /proc/zoneinfo
> Node 0, zone DMA
> spanned 4095
> present 3999
> managed 3840
> Node 0, zone DMA32
> spanned 246652
> present 245868
> managed 222816
>
The above before + after seems useful information to include in the commit
description of the change.
next prev parent reply other threads:[~2023-01-05 1:44 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-04 7:43 [PATCH 0/1] Pages not released from memblock to the buddy allocator Aaron Thompson
2023-01-05 1:43 ` David Rientjes [this message]
2023-01-05 4:17 ` [PATCH v2 " Aaron Thompson
[not found] ` <20230105041650.1485-1-dev@aaront.org>
2023-01-05 4:17 ` [PATCH v2 1/1] mm: Always release pages to the buddy allocator in memblock_free_late() Aaron Thompson
2023-01-30 7:40 ` Xu Yu
2023-01-05 10:48 ` Ingo Molnar
2023-01-06 2:02 ` Aaron Thompson
2023-01-06 3:12 ` Ingo Molnar
2023-01-30 7:47 ` Xu Yu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=30478b4a-870b-bf48-76d0-a236a40e7674@google.com \
--to=rientjes@google.com \
--cc=akpm@linux-foundation.org \
--cc=andy@infradead.org \
--cc=ardb@kernel.org \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=dev@aaront.org \
--cc=dvhart@infradead.org \
--cc=dvyukov@google.com \
--cc=elver@google.com \
--cc=glider@google.com \
--cc=hpa@zytor.com \
--cc=kasan-dev@googlegroups.com \
--cc=linux-efi@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mingo@redhat.com \
--cc=platform-driver-x86@vger.kernel.org \
--cc=rppt@kernel.org \
--cc=tglx@linutronix.de \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.