From: David Hildenbrand <david@redhat.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Michal Hocko <mhocko@kernel.org>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Vlastimil Babka <vbabka@suse.cz>,
Pavel Tatashin <pasha.tatashin@soleen.com>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v4 1/5] mm,memory_hotplug: Allocate memmap from the added memory range
Date: Tue, 16 Mar 2021 18:45:17 +0100 [thread overview]
Message-ID: <a03fcbb3-5b77-8671-6376-13c360f5ae25@redhat.com> (raw)
In-Reply-To: <a2bf7b25-1e7a-bb6b-2fcd-08a4f4636ed5@redhat.com>

On 16.03.21 17:46, David Hildenbrand wrote:
> On 15.03.21 11:22, Oscar Salvador wrote:
>> On Thu, Mar 11, 2021 at 08:06:53PM +0100, David Hildenbrand wrote:
>>> This looks essentially good to me, except some parts in
>>> mhp_supports_memmap_on_memory()
>>>
>>>> +bool mhp_supports_memmap_on_memory(unsigned long size)
>>>> +{
>>>> + unsigned long pageblock_size = PFN_PHYS(pageblock_nr_pages);
>>>> + unsigned long remaining_mem = size - PMD_SIZE;
>>
>> Hi David, thanks for the review!
>>
>>> This looks weird. I think what you want to test is that
>>>
>>>
>>> a) "nr_vmemmap_pages * sizeof(struct page)" spans complete PMDs (IOW, we
>>> won't map too much via the altmap when populating a PMD in the vmemmap)
>>>
>>> b) "remaining = size - nr_vmemmap_pages * sizeof(struct page)" spans
>>> complete pageblock.
>>
>> We do not know the nr_vmemmap_pages at this point in time, although it is
>> easy to pre-compute.
>>
>> For every section we populate, we use PMD_SIZE. So, PMD_SIZE/PAGE_SIZE gives
>> the nr_vmemmap_pages that are used for populating a single section.
>
> I find that cross reference to vmemmap code a little hard to digest.
> I would have assumed that we don't have to care about PMDs in this
> code here at all. The vmemmap population code should handle that.
>
> I think I already mentioned that somewhere, I think it should be like this:
>
> a) vmemmap code should *never* populate more memory than requested for
> a single memory section when we are populating from the altmap.
> If that cannot be guaranteed for PMDs, then we have to fallback
> to populating base pages. Populating PMDs from an altmap with
> sizeof(struct page) != 64 is highly dangerous.
>
> Assume we have sizeof(struct page) == 56. A 128 MiB section
> spans 32768 pages - we need 32768 * sizeof(struct page)
> space for the vmemmap.
>
> With sizeof(struct page) == 64 we *can* use exactly one PMD. With
> sizeof(struct page) == 56 we need 448 individual (full!) pages for
> the vmemmap.
>
> IOW, we don't care how vmemmap code will do the mapping.
> vmemmap code has to get it right. IMHO, asserting it in
> this code is wrong.
>
>
> b) In this code, we really should only care about what
> memory onlining/offlining code can or can't do.
> We really only care that
>
> 1) size == memory_block_size_bytes()
> 2) remaining_size != 0
> 3) IS_ALIGNED(remaining_size, pageblock_size);
>
>
> I think a) is a logical consequence of b) for x86-64,
> whereby the pageblock size corresponds to PMD, so we
> might not have to care about a) right now.
>
> See below for my take.
>
>
>>
>> But let me explain the reasoning I use in the current code:
>>
>> I will enumerate the assumptions that must hold true in order to support the
>> feature together with their check:
>>
>> - We span a single memory block
>>
>> size == memory_block_size_bytes()
>>
>> - The vmemmap pages span a complete PMD and no more than a PMD.
>>
>> !(PMD_SIZE % sizeof(struct page))
>>
>> - The vmemmap pages and the pages exposed to the buddy have to cover full
>> pageblocks
>>
>> remaining_mem = size - PMD_SIZE;
>> IS_ALIGNED(remaining_mem, pageblock_size)
>>
>> Although this check only covers the range without the vmemmap pages, one could
>> argue that since we use only PMD_SIZE at a time, and PMD_SIZE is pageblock
>> aligned, the vmemmap range is pageblock aligned as well.
>>
>> Now, I see how this might be confusing and rather incomplete.
>> So I guess a better and more clear way to write it would be:
>>
>> bool mhp_supports_memmap_on_memory(unsigned long size)
>> {
>> unsigned long nr_vmemmap_pages = PMD_SIZE / PAGE_SIZE;
>> unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
>> unsigned long remaining_size = size - vmemmap_size;
>>
>> return memmap_on_memory &&
>> IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
>> size == memory_block_size_bytes() &&
>> !(PMD_SIZE % vmemmap_size) &&
>> IS_ALIGNED(vmemmap_size, pageblock_size) &&
>> remaining_size &&
>> IS_ALIGNED(remaining_size, pageblock_size);
>> }
>>
>> Note that above check is only for a single section, but if assumptions hold true
>> for a single section, it will for many as well.
>
> Okay, please document the statement about single sections, that's
> important to understand what's happening.
>
> My take would be
>
> bool mhp_supports_memmap_on_memory(unsigned long size)
> {
> /*
> * Note: We calculate for a single memory section. The calculation
> * implicitly covers memory blocks that span multiple sections.
> */
> unsigned long nr_vmemmap_pages = SECTION_SIZE / PAGE_SIZE;
> unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
> unsigned long remaining_size = size - vmemmap_size;
>
> /*
> * Note: vmemmap code has to make sure to not populate more memory
> * via the altmap than absolutely necessary for a single section.
> * This always holds when we allocate pageblock-sized chunks from
> * the altmap, as we require pageblock alignment here.
> *
> * TODO, other doc
> */
> return memmap_on_memory &&
> IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
> size == memory_block_size_bytes() &&
> remaining_size &&
> IS_ALIGNED(remaining_size, pageblock_size);
Oh, I forgot, we can also drop the check for "remaining_size" in that
code. If we ever have PAGE_SIZE == sizeof(struct page) we'll be in
bigger trouble :D
--
Thanks,
David / dhildenb