From: David Hildenbrand <david@redhat.com>
To: Oscar Salvador <osalvador@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v7 4/8] mm,memory_hotplug: Allocate memmap from the added memory range
Date: Thu, 15 Apr 2021 13:19:59 +0200
Message-ID: <54bed4d3-631f-7d30-aa2c-f8dd2f2c6804@redhat.com>
In-Reply-To: <20210408121804.10440-5-osalvador@suse.de>

On 08.04.21 14:18, Oscar Salvador wrote:
> Physical memory hotadd has to allocate a memmap (struct page array) for
> the newly added memory section. Currently, alloc_pages_node() is used
> for those allocations.
> 
> This has some disadvantages:
>   a) existing memory is consumed for that purpose
>      (eg: ~2MB per 128MB memory section on x86_64)
>   b) if the whole node is movable then we have off-node struct pages,
>      which has performance drawbacks.
>   c) it might be that there are no PMD_ALIGNED chunks, so the memmap
>      array gets populated with base pages.
> 
> This can be improved when CONFIG_SPARSEMEM_VMEMMAP is enabled.
> 
> Vmemmap page tables can map arbitrary memory.
> That means that we can simply use the beginning of each memory section and
> map struct pages there.
> struct pages which back the allocated space then just need to be treated
> carefully.
> 
> Implementation-wise we will reuse the vmem_altmap infrastructure to override
> the default allocator used by __populate_section_memmap.
> Part of the implementation also relies on memory_block structure gaining
> a new field which specifies the number of vmemmap_pages at the beginning.
> This patch also introduces the following functions:
> 
>   - vmemmap_init_space: Initializes vmemmap pages by calling move_pfn_range_to_zone(),
> 		       calls kasan_add_zero_shadow() for the vmemmap range and marks
> 		       online as many sections as the vmemmap pages fully span.
>   - vmemmap_adjust_pages: Adds/subtracts vmemmap_pages to/from the node and zone
> 			 present_pages.
>   - vmemmap_deinit_space: Undoes what vmemmap_init_space does.
> 
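
To make the description a bit more concrete: vmemmap_adjust_pages() presumably
ends up as a thin wrapper around the adjust_present_page_count() helper
factored out earlier in this series. A minimal sketch (the signature here is
guessed from the description above):

static void vmemmap_adjust_pages(unsigned long pfn, long nr_pages)
{
	struct zone *zone = page_zone(pfn_to_page(pfn));

	/* Signed nr_pages, so callers can both add and subtract. */
	adjust_present_page_count(zone, nr_pages);
}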

This description is a bit out of sync with what the code does, and the function names don't really express what is being done :) I'll try to come up with better names below.

It is worth mentioning that the real "mess" is that we want offline_pages() to properly handle zone->present_pages going to 0. Therefore, we want to manually mess with the present page count.
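
For context, offline_pages() already contains the teardown path for a zone
that ends up with no present pages. Paraphrased from mm/memory_hotplug.c
(details vary by kernel version):

	/* Tail of offline_pages(), once all pages have been removed: */
	adjust_present_page_count(zone, -nr_pages);

	if (!populated_zone(zone)) {
		/* Zone is now empty: reset pcp lists, rebuild zonelists. */
		zone_pcp_reset(zone);
		build_all_zonelists(NULL);
	}

So the vmemmap pages must no longer be accounted as present by the time that
check runs.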


> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>   drivers/base/memory.c          |  64 ++++++++++++++--
>   include/linux/memory.h         |   8 +-
>   include/linux/memory_hotplug.h |  13 ++++
>   include/linux/memremap.h       |   2 +-
>   include/linux/mmzone.h         |   7 +-
>   mm/Kconfig                     |   5 ++
>   mm/memory_hotplug.c            | 162 ++++++++++++++++++++++++++++++++++++++++-
>   mm/sparse.c                    |   2 -
>   8 files changed, 247 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index f209925a5d4e..a5e536a3e9a4 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -173,16 +173,65 @@ static int memory_block_online(struct memory_block *mem)
>   {
>   	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
>   	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
> +	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
> +	int ret;
> +
> +	/*
> +	 * Although vmemmap pages have a different lifecycle than the pages
> +	 * they describe (they remain until the memory is unplugged), doing
> +	 * their initialization and accounting at the hot-{online,offline}
> +	 * stage simplifies things a lot.
> +	 */

I suggest detecting the zone in here and just passing it down to online_pages().
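
Something like the following, reusing the existing zone_for_pfn_range()
helper. mhp_get_target_zone() is just a name I'm making up for the sketches
below; it would most likely also need the pfn range, not only the nid and
online_type:

static struct zone *mhp_get_target_zone(unsigned long start_pfn,
					unsigned long nr_pages,
					int nid, int online_type)
{
	/* zone_for_pfn_range() already encodes the online-type policy. */
	return zone_for_pfn_range(online_type, nid, start_pfn, nr_pages);
}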

> +	if (nr_vmemmap_pages) {
> +		ret = vmemmap_init_space(start_pfn, nr_vmemmap_pages, mem->nid,
> +					 mem->online_type);
> +		if (ret)
> +			return ret;
> +	}
>   
> -	return online_pages(start_pfn, nr_pages, mem->online_type, mem->nid);
> +	ret = online_pages(start_pfn + nr_vmemmap_pages,
> +			   nr_pages - nr_vmemmap_pages, mem->online_type,
> +			   mem->nid);
> +
> +	/*
> +	 * Undo the work if online_pages() fails.
> +	 */
> +	if (ret && nr_vmemmap_pages) {
> +		vmemmap_adjust_pages(start_pfn, -nr_vmemmap_pages);
> +		vmemmap_deinit_space(start_pfn, nr_vmemmap_pages);
> +	}
> +
> +	return ret;
>   }

My take would be doing the present page adjustment after onlining succeeded:

static int memory_block_online(struct memory_block *mem)
{
	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
	struct zone *zone;
	int ret;

	zone = mhp_get_target_zone(mem->nid, mem->online_type);

	if (nr_vmemmap_pages) {
		ret = mhp_init_memmap_on_memory(start_pfn, nr_vmemmap_pages, zone);
		if (ret)
			return ret;
	}

	ret = online_pages(start_pfn + nr_vmemmap_pages, nr_pages - nr_vmemmap_pages, zone);
	if (ret) {
		if (nr_vmemmap_pages)
			mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
		return ret;
	}

	/*
	 * Account once onlining succeeded. If the zone was unpopulated,
	 * it is now already properly populated.
	 */
	if (nr_vmemmap_pages)
		adjust_present_page_count(zone, nr_vmemmap_pages);
	return 0;
}
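
mhp_init_memmap_on_memory() / mhp_deinit_memmap_on_memory() would be the
renamed vmemmap_init_space() / vmemmap_deinit_space(), minus the present
page accounting. Roughly, based on the operations the patch description
lists (a sketch, not tested):

int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages,
			      struct zone *zone)
{
	unsigned long end_pfn = pfn + nr_pages;
	int ret;

	ret = kasan_add_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages));
	if (ret)
		return ret;

	move_pfn_range_to_zone(zone, pfn, nr_pages, NULL, MIGRATE_UNMOVABLE);

	/* Mark all sections fully spanned by the vmemmap pages online. */
	if (nr_pages >= PAGES_PER_SECTION)
		online_mem_sections(pfn, ALIGN_DOWN(end_pfn, PAGES_PER_SECTION));

	return 0;
}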

And the opposite:

static int memory_block_offline(struct memory_block *mem)
{
	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
	unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
	unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
	struct zone *zone;
	int ret;

	zone = page_zone(pfn_to_page(start_pfn));

	/*
	 * Unaccount before offlining, such that unpopulated zones can
	 * properly be torn down in offline_pages().
	 */
	if (nr_vmemmap_pages)
		adjust_present_page_count(zone, -nr_vmemmap_pages);

	ret = offline_pages(start_pfn + nr_vmemmap_pages, nr_pages - nr_vmemmap_pages);
	if (ret) {
		if (nr_vmemmap_pages)
			adjust_present_page_count(zone, +nr_vmemmap_pages);
		return ret;
	}

	if (nr_vmemmap_pages)
		mhp_deinit_memmap_on_memory(start_pfn, nr_vmemmap_pages);
	return 0;
}
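
And the matching deinit, under the same assumptions:

void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages)
{
	unsigned long end_pfn = pfn + nr_pages;

	/* Offline the sections that were marked online during init. */
	if (nr_pages >= PAGES_PER_SECTION)
		offline_mem_sections(pfn, ALIGN_DOWN(end_pfn, PAGES_PER_SECTION));

	kasan_remove_zero_shadow(__va(PFN_PHYS(pfn)), PFN_PHYS(nr_pages));
	remove_pfn_range_from_zone(page_zone(pfn_to_page(pfn)), pfn, nr_pages);
}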

Having to do the present page adjustment manually is not completely nice,
but it's easier than having to manually mess with zones becoming
populated/unpopulated outside of online_pages()/offline_pages().


-- 
Thanks,

David / dhildenb

