From: Oscar Salvador <osalvador@suse.de>
To: David Hildenbrand <david@redhat.com>
Cc: mhocko@kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, vbabka@suse.cz, pasha.tatashin@soleen.com
Subject: Re: [RFC PATCH v3 2/4] mm,memory_hotplug: Allocate memmap from the added memory range
Date: Wed, 9 Dec 2020 11:32:26 +0100 [thread overview]
Message-ID: <20201209103226.GC30892@linux> (raw)
In-Reply-To: <0e53ae49-6b0a-0714-df0f-fd6183061752@redhat.com>
On Wed, Dec 02, 2020 at 11:05:53AM +0100, David Hildenbrand wrote:
> If you take a look at generic_online_page() there are some things that
> won't be done for our vmemmap pages
>
> 1. kernel_map_pages(page, 1 << order, 1);
>
> We're accessing these pages already when initializing the memmap. We
> might have to explicitly map these vmemmap pages at some point. Might
> require some thought. Did you test with debug pagealloc?
I always try to run with all debug stuff enabled, but I definitely
did not enable debug_pagealloc.
I will have a look at it.
> 2. totalram_pages_add(1UL << order);
>
> We should add/remove the vmemmap pages manually from totalram I guess.
Yes, we should. That was a clear oversight.
> Also, I wonder if we should add these pages to the zone managed page
> count. Not quite sure yet what the benefit would be and if we care at
> all. (it's a change in behavior when hotplugging memory, though, that's
> why I am poining it out)
I would rather not since they are not "manageable" per se.
The same we do not account the pages used for bootmem stuff into zone->managed.
> > Hot-remove:
> >
> > If the range was using memmap on memory (aka vmemmap pages),
> > we construct an altmap structure so free_hugepage_table does
> > the right thing and calls vmem_altmap_free instead of
> > free_pagetable.
> >
>
> You might want to update the comment in include/linux/mmzone.h for
> ZONE_MOVABLE, adding this special case when onlining such memory to the
> MOVABLE zone. The pages are movable for offlining only, not for allocations.
Sure, let us document that.
> > -int create_memory_block_devices(unsigned long start, unsigned long size)
> > +int create_memory_block_devices(unsigned long start, unsigned long size,
> > + unsigned long vmemmap_pages)
> > {
> > const unsigned long start_block_id = pfn_to_block_id(PFN_DOWN(start));
> > unsigned long end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
> > @@ -645,9 +649,10 @@ int create_memory_block_devices(unsigned long start, unsigned long size)
> > return -EINVAL;
> >
> > for (block_id = start_block_id; block_id != end_block_id; block_id++) {
> > - ret = init_memory_block(block_id, MEM_OFFLINE);
> > + ret = init_memory_block(block_id, MEM_OFFLINE, vmemmap_pages);
> > if (ret)
> > break;
> > + vmemmap_pages = 0;
>
> We should even bail out if someone would have vmemmap_pages set when
> creating more than one memory block device.
That cannot really happen as we would have bailed out earlier prior to
call create_memory_block_devices, by the checks in mhp_supports_memmap_on_memory.
> > - for (pfn = start_pfn; pfn < end_pfn; pfn += MAX_ORDER_NR_PAGES)
> > - (*online_page_callback)(pfn_to_page(pfn), MAX_ORDER - 1);
> > + for (pfn = buddy_start_pfn; pfn < end_pfn; pfn += (1 << order)) {
> > + order = MAX_ORDER - 1;
> > + while (pfn & ((1 << order) - 1))
> > + order--;
> > + (*online_page_callback)(pfn_to_page(pfn), order);
> > + }
>
> I think it would be nicer to handle the unaligned part in a separate loop.
No strong opinion here, I can certainly do it.
> > /* mark all involved sections as online */
> > online_mem_sections(start_pfn, end_pfn);
> > @@ -678,6 +684,7 @@ static void __meminit resize_pgdat_range(struct pglist_data *pgdat, unsigned lon
> > pgdat->node_spanned_pages = max(start_pfn + nr_pages, old_end_pfn) - pgdat->node_start_pfn;
> >
> > }
> > +
>
> Unrelated change.
Ups ^^
> > +static int get_memblock_vmemmap_pages_cb(struct memory_block *mem, void *arg)
> > +{
> > + unsigned long *nr_vmemmap_pages = (unsigned long *)arg;
> > + int ret = !mem->nr_vmemmap_pages;
> > +
> > + if (!ret)
> > + *nr_vmemmap_pages += mem->nr_vmemmap_pages;
> > + return ret;
>
> When removing in the same granularity than adding, all we care about it
> the first memory block.
>
>
> unsigned long *nr_vmemmap_pages = (unsigned long *)arg;
>
> *nr_vmemmap_pages = mem->nr_vmemmap_pages;
> /* We only care about the first memory block. */
> return 1;
Yeah, cleaner, thanks
> > + /*
> > + * Prepare a vmem_altmap struct if we used it at hot-add, so
> > + * remove_pmd_table->free_hugepage_table does the right thing.
> > + */
> > + (void)walk_memory_blocks(start, size, &nr_vmemmap_pages,
> > + get_memblock_vmemmap_pages_cb);
> > + if (nr_vmemmap_pages) {
> > + mhp_altmap.alloc = nr_vmemmap_pages;
> > + altmap = &mhp_altmap;
> > + }
> > +
>
> If someone would remove_memory() in a different granularity than
> add_memory(), this would no longer work. How can we catch that
> efficiently? Or at least document that this is not supported with
> memmap_on_memory).
Well, we can check whether the size spans more than a single memory block.
And if it does, we can print a warning with pr_warn and refuse to remove
the memory.
We could document that "memory_hotplug.memmap_on_memory" is meant to operate
with ranges that span a single memory block.
And I guess that we could place that documentation under [1].
[1] Documentation/admin-guide/kernel-parameters.txt
Thanks for the review David
--
Oscar Salvador
SUSE L3
next prev parent reply other threads:[~2020-12-09 10:32 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-12-01 11:51 [RFC PATCH v3 0/4] Allocate memmap from hotadded memory (per device) Oscar Salvador
2020-12-01 11:51 ` [RFC PATCH v3 1/4] mm,memory_hotplug: Add mhp_supports_memmap_on_memory Oscar Salvador
2020-12-02 9:36 ` David Hildenbrand
2020-12-09 9:36 ` Oscar Salvador
2020-12-09 9:40 ` David Hildenbrand
2020-12-09 9:43 ` Oscar Salvador
2020-12-01 11:51 ` [RFC PATCH v3 2/4] mm,memory_hotplug: Allocate memmap from the added memory range Oscar Salvador
2020-12-02 10:05 ` David Hildenbrand
2020-12-09 10:32 ` Oscar Salvador [this message]
2020-12-09 11:51 ` Oscar Salvador
2020-12-01 11:51 ` [RFC PATCH v3 3/4] mm,memory_hotplug: Enable MHP_MEMMAP_ON_MEMORY when supported Oscar Salvador
2020-12-02 9:37 ` David Hildenbrand
2020-12-09 9:38 ` Oscar Salvador
2020-12-01 11:51 ` [RFC PATCH v3 4/4] mm,memory_hotplug: Add mhp_memmap_on_memory boot option Oscar Salvador
2020-12-02 9:42 ` David Hildenbrand
2020-12-09 10:02 ` Oscar Salvador
2020-12-09 10:04 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201209103226.GC30892@linux \
--to=osalvador@suse.de \
--cc=david@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=pasha.tatashin@soleen.com \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).