From: Oscar Salvador <osalvador@suse.de>
To: David Hildenbrand <david@redhat.com>
Cc: mhocko@kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, vbabka@suse.cz, pasha.tatashin@soleen.com
Subject: Re: [RFC PATCH v3 2/4] mm,memory_hotplug: Allocate memmap from the added memory range
Date: Wed, 9 Dec 2020 11:32:26 +0100
Message-ID: <20201209103226.GC30892@linux>
In-Reply-To: <0e53ae49-6b0a-0714-df0f-fd6183061752@redhat.com>

On Wed, Dec 02, 2020 at 11:05:53AM +0100, David Hildenbrand wrote:
> If you take a look at generic_online_page() there are some things that
> won't be done for our vmemmap pages
> 
> 1. kernel_map_pages(page, 1 << order, 1);
> 
> We're accessing these pages already when initializing the memmap. We
> might have to explicitly map these vmemmap pages at some point. Might
> require some thought. Did you test with debug pagealloc?

I always try to run with all debug stuff enabled, but I definitely
did not enable debug_pagealloc.
I will have a look at it.

> 2. totalram_pages_add(1UL << order);
> 
> We should add/remove the vmemmap pages manually from totalram I guess.

Yes, we should. That was a clear oversight.
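E.g. something like the following could do it; the helper name and the
hook placement here are made up for illustration, not part of the patch:

	/*
	 * Illustrative sketch only: vmemmap_adjust_totalram() is a
	 * hypothetical helper, called when the memory block that hosts
	 * the vmemmap pages goes online/offline.
	 */
	static void vmemmap_adjust_totalram(struct memory_block *mem, bool online)
	{
		long nr = mem->nr_vmemmap_pages;

		totalram_pages_add(online ? nr : -nr);
	}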

> Also, I wonder if we should add these pages to the zone managed page
> count. Not quite sure yet what the benefit would be and if we care at
> all. (it's a change in behavior when hotplugging memory, though, that's
> why I am pointing it out)

I would rather not, since they are not "manageable" per se.
In the same way, we do not account the pages used for bootmem into zone->managed.

> > Hot-remove:
> > 
> >  If the range was using memmap on memory (aka vmemmap pages),
> >  we construct an altmap structure so free_hugepage_table does
> >  the right thing and calls vmem_altmap_free instead of
> >  free_pagetable.
> > 
> 
> You might want to update the comment in include/linux/mmzone.h for
> ZONE_MOVABLE, adding this special case when onlining such memory to the
> MOVABLE zone. The pages are movable for offlining only, not for allocations.

Sure, let us document that.
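Maybe something along these lines for the ZONE_MOVABLE comment in
include/linux/mmzone.h (wording is just a first draft):

	/*
	 * Draft wording: with memory_hotplug.memmap_on_memory, the
	 * vmemmap pages that back a hotplugged range are placed in
	 * ZONE_MOVABLE as well. They are not movable for allocation
	 * purposes, but they are freed together with the memory block
	 * on offlining, so they do not block memory offlining.
	 */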

> > -int create_memory_block_devices(unsigned long start, unsigned long size)
> > +int create_memory_block_devices(unsigned long start, unsigned long size,
> > +				unsigned long vmemmap_pages)
> >  {
> >  	const unsigned long start_block_id = pfn_to_block_id(PFN_DOWN(start));
> >  	unsigned long end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
> > @@ -645,9 +649,10 @@ int create_memory_block_devices(unsigned long start, unsigned long size)
> >  		return -EINVAL;
> >  
> >  	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
> > -		ret = init_memory_block(block_id, MEM_OFFLINE);
> > +		ret = init_memory_block(block_id, MEM_OFFLINE, vmemmap_pages);
> >  		if (ret)
> >  			break;
> > +		vmemmap_pages = 0;
> 
> We should even bail out if someone would have vmemmap_pages set when
> creating more than one memory block device.

That cannot really happen, as we would have bailed out earlier, prior to
calling create_memory_block_devices, thanks to the checks in mhp_supports_memmap_on_memory.
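For context, those checks boil down to something like this (a simplified
sketch, not the verbatim patch):

	/*
	 * Simplified sketch: the feature is only used when the hot-added
	 * range covers exactly one memory block, the vmemmap can be
	 * PMD-mapped, and the remainder is pageblock-aligned. Hence
	 * create_memory_block_devices() never sees vmemmap_pages != 0
	 * for more than one block.
	 */
	static bool mhp_supports_memmap_on_memory(unsigned long size)
	{
		unsigned long nr_pages = size >> PAGE_SHIFT;
		unsigned long vmemmap_size = nr_pages * sizeof(struct page);

		return size == memory_block_size_bytes() &&
		       IS_ALIGNED(vmemmap_size, PMD_SIZE) &&
		       IS_ALIGNED(size - vmemmap_size, pageblock_nr_pages << PAGE_SHIFT);
	}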

> > -	for (pfn = start_pfn; pfn < end_pfn; pfn += MAX_ORDER_NR_PAGES)
> > -		(*online_page_callback)(pfn_to_page(pfn), MAX_ORDER - 1);
> > +	for (pfn = buddy_start_pfn; pfn < end_pfn; pfn += (1 << order)) {
> > +		order = MAX_ORDER - 1;
> > +		while (pfn & ((1 << order) - 1))
> > +			order--;
> > +		(*online_page_callback)(pfn_to_page(pfn), order);
> > +	}
> 
> I think it would be nicer to handle the unaligned part in a separate loop.

No strong opinion here, I can certainly do it.
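E.g. something like this untested sketch, assuming end_pfn is MAX_ORDER
aligned (sections are):

	unsigned long pfn = buddy_start_pfn;
	int order;

	/* Head: smallest aligned chunks until pfn is MAX_ORDER - 1 aligned. */
	while (!IS_ALIGNED(pfn, MAX_ORDER_NR_PAGES) && pfn < end_pfn) {
		order = min_t(int, MAX_ORDER - 1, __ffs(pfn));
		(*online_page_callback)(pfn_to_page(pfn), order);
		pfn += 1UL << order;
	}

	/* Bulk: everything left is MAX_ORDER - 1 aligned. */
	for (; pfn < end_pfn; pfn += MAX_ORDER_NR_PAGES)
		(*online_page_callback)(pfn_to_page(pfn), MAX_ORDER - 1);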

> >  	/* mark all involved sections as online */
> >  	online_mem_sections(start_pfn, end_pfn);
> > @@ -678,6 +684,7 @@ static void __meminit resize_pgdat_range(struct pglist_data *pgdat, unsigned lon
> >  	pgdat->node_spanned_pages = max(start_pfn + nr_pages, old_end_pfn) - pgdat->node_start_pfn;
> >  
> >  }
> > +
> 
> Unrelated change.

Oops ^^

> > +static int get_memblock_vmemmap_pages_cb(struct memory_block *mem, void *arg)
> > +{
> > +	unsigned long *nr_vmemmap_pages = (unsigned long *)arg;
> > +	int ret = !mem->nr_vmemmap_pages;
> > +
> > +	if (!ret)
> > +		*nr_vmemmap_pages += mem->nr_vmemmap_pages;
> > +	return ret;
> 
> When removing at the same granularity as adding, all we care about is
> the first memory block.
> 
> 
> unsigned long *nr_vmemmap_pages = (unsigned long *)arg;
> 
> *nr_vmemmap_pages = mem->nr_vmemmap_pages;
> /* We only care about the first memory block. */
> return 1;

Yeah, cleaner, thanks
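For completeness, the whole callback would then read (following your
suggestion, untested):

	/*
	 * Sketch of the simplified callback: when add/remove happen at
	 * the same granularity, only the first memory block can carry
	 * vmemmap pages.
	 */
	static int get_memblock_vmemmap_pages_cb(struct memory_block *mem, void *arg)
	{
		unsigned long *nr_vmemmap_pages = (unsigned long *)arg;

		*nr_vmemmap_pages = mem->nr_vmemmap_pages;
		/* Returning non-zero stops the walk at the first block. */
		return 1;
	}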

> > +	/*
> > +	 * Prepare a vmem_altmap struct if we used it at hot-add, so
> > +	 * remove_pmd_table->free_hugepage_table does the right thing.
> > +	 */
> > +	(void)walk_memory_blocks(start, size, &nr_vmemmap_pages,
> > +	     get_memblock_vmemmap_pages_cb);
> > +	if (nr_vmemmap_pages) {
> > +		mhp_altmap.alloc = nr_vmemmap_pages;
> > +		altmap = &mhp_altmap;
> > +	}
> > +
> 
> If someone were to remove_memory() at a different granularity than
> add_memory(), this would no longer work. How can we catch that
> efficiently? Or at least document that this is not supported with
> memmap_on_memory.

Well, we can check whether the size spans more than a single memory block,
and if it does, print a warning with pr_warn() and refuse to remove the
memory.
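Something like this, right after the walk (completely untested):

	/*
	 * Sketch: once walk_memory_blocks() has filled in
	 * nr_vmemmap_pages, refuse removals that do not match the
	 * hot-add granularity of a single memory block.
	 */
	if (nr_vmemmap_pages && size != memory_block_size_bytes()) {
		pr_warn("Refusing to remove %#lx-%#lx: memmap_on_memory requires single memory block granularity\n",
			start, start + size - 1);
		return -EINVAL;
	}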

We could document that "memory_hotplug.memmap_on_memory" is meant to operate
with ranges that span a single memory block.
And I guess that we could place that documentation under [1].

[1] Documentation/admin-guide/kernel-parameters.txt
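A first draft of that entry could read something like (wording up for
discussion):

	memory_hotplug.memmap_on_memory
			[KNL] When enabled, the memmap of a hotplugged
			memory range is allocated from the range itself.
			With this option, memory can only be hot-added
			and hot-removed in single memory block
			granularity.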

Thanks for the review, David.

-- 
Oscar Salvador
SUSE L3


