From: Michal Hocko <mhocko@suse.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@redhat.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v10 4/8] mm,memory_hotplug: Allocate memmap from the added memory range
Date: Wed, 21 Apr 2021 13:37:22 +0200
Message-ID: <YIAOcrolk359gnOC@dhcp22.suse.cz>
In-Reply-To: <20210421102701.25051-5-osalvador@suse.de>

On Wed 21-04-21 12:26:57, Oscar Salvador wrote:
> Physical memory hotadd has to allocate a memmap (struct page array) for
> the newly added memory section. Currently, alloc_pages_node() is used
> for those allocations.
> 
> This has some disadvantages:
>  a) existing memory is consumed for that purpose
>     (e.g. ~2MB per 128MB memory section on x86_64).
>     This can even lead to extreme cases where the system goes OOM because
>     the physically hotplugged memory depletes the available memory before
>     it is onlined.
>  b) if the whole node is movable then we have off-node struct pages,
>     which has performance drawbacks.
>  c) it might be that there are no PMD_ALIGNED chunks, so the memmap array
>     gets populated with base pages.
> 
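For reference, the ~2MB in a) is straightforward arithmetic, assuming
the usual 64-byte struct page and 4KiB base pages on x86_64:

	128MB section / 4KiB pages  = 32768 struct pages
	32768 * 64B per struct page = 2MiB of memmap per section
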
> This can be improved when CONFIG_SPARSEMEM_VMEMMAP is enabled.
> 
> Vmemmap page tables can map arbitrary memory. That means that we can
> reserve a part of the physically hotadded memory to back vmemmap page
> tables. This implementation uses the beginning of the hotplugged memory
> for that purpose.
> 
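FWIW, the mechanics of the reservation are driven by a struct
vmem_altmap. A minimal sketch of the add_memory_resource() side (field
setup as I remember it from the patch, so treat the details as
illustrative rather than verbatim):

	struct vmem_altmap mhp_altmap = {};

	if (mhp_flags & MHP_MEMMAP_ON_MEMORY) {
		/*
		 * Describe the beginning of the hotplugged range as a
		 * pool to allocate the memmap from; vmemmap_populate()
		 * then takes pages from there rather than from the
		 * page allocator.
		 */
		mhp_altmap.base_pfn = PHYS_PFN(start);
		mhp_altmap.free = PHYS_PFN(size);
		params.altmap = &mhp_altmap;
	}
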
> There are some non-obvious things to consider though. Vmemmap
> pages are allocated/freed during the memory hotplug events
> (add_memory_resource(), try_remove_memory()) when the memory is
> added/removed. This means that the reserved physical range is not online
> although it is used. The most obvious side effect is that pfn_to_online_page()
> returns NULL for those pfns. The current design expects that this
> should be OK as the hotplugged memory is considered garbage until it
> is onlined. For example, hibernation wouldn't save the content of those
> vmemmaps into the image, so it wouldn't be restored on resume, but this
> should be OK as there is no real content to recover anyway, while the
> metadata is reachable from other data structures (e.g. vmemmap page tables).
> 
> The reserved space is therefore (de)initialized during the {on,off}line
> events (mhp_{de}init_memmap_on_memory). That is done by extracting page
> allocator independent initialization from the regular onlining path.
> The primary reason to handle the reserved space outside of {on,off}line_pages
> is to make each initialization specific to the purpose rather than
> special-casing them in a single function.
> 
> As per above, the functions that are introduced are:
> 
>  - mhp_init_memmap_on_memory:
> 		       Initializes vmemmap pages by calling move_pfn_range_to_zone(),
> 		       calls kasan_add_zero_shadow(), and onlines as many sections
> 		       as vmemmap pages fully span.
>  - mhp_deinit_memmap_on_memory:
> 		       Offlines as many sections as vmemmap pages fully span,
> 		       removes the range from the zone by remove_pfn_range_from_zone(),
> 		       and calls kasan_remove_zero_shadow() for the range.
> 
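If I read it correctly, the init side boils down to something like the
following (a sketch of the flow described above, not the verbatim
patch; the exact signature may differ):

	int mhp_init_memmap_on_memory(unsigned long pfn,
				      unsigned long nr_pages,
				      struct zone *zone)
	{
		unsigned long end_pfn = pfn + nr_pages;
		int ret;

		ret = kasan_add_zero_shadow(__va(PFN_PHYS(pfn)),
					    PFN_PHYS(nr_pages));
		if (ret)
			return ret;

		/* Initialize the struct pages of the vmemmap pages. */
		move_pfn_range_to_zone(zone, pfn, nr_pages, NULL,
				       MIGRATE_UNMOVABLE);

		/* Only fully spanned sections are marked online here. */
		if (nr_pages >= PAGES_PER_SECTION)
			online_mem_sections(pfn, ALIGN_DOWN(end_pfn,
						PAGES_PER_SECTION));

		return ret;
	}

with mhp_deinit_memmap_on_memory() undoing the same steps in reverse
order.
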
> The new function memory_block_online() calls mhp_init_memmap_on_memory() before
> doing the actual online_pages(). Should online_pages() fail, we clean up
> by calling mhp_deinit_memmap_on_memory().
> Adjusting of present_pages is done at the end, once we know that
> online_pages() succeeded.
> 
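So the block onlining path ends up looking roughly like this (a
sketch; the zone lookup and the exact online_pages() arguments are
from memory and may not match the patch one to one):

	static int memory_block_online(struct memory_block *mem)
	{
		unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
		unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
		unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
		struct zone *zone;
		int ret;

		zone = zone_for_pfn_range(mem->online_type, mem->nid,
					  start_pfn, nr_pages);

		/* Initialize the self-hosted memmap before the rest. */
		if (nr_vmemmap_pages) {
			ret = mhp_init_memmap_on_memory(start_pfn,
						nr_vmemmap_pages, zone);
			if (ret)
				return ret;
		}

		ret = online_pages(start_pfn + nr_vmemmap_pages,
				   nr_pages - nr_vmemmap_pages,
				   mem->online_type, mem->nid);
		if (ret) {
			/* Undo the vmemmap initialization from above. */
			if (nr_vmemmap_pages)
				mhp_deinit_memmap_on_memory(start_pfn,
							nr_vmemmap_pages);
			return ret;
		}

		/* Account vmemmap pages only once onlining succeeded. */
		if (nr_vmemmap_pages)
			adjust_present_page_count(zone, nr_vmemmap_pages);

		return ret;
	}
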
> On offline, memory_block_offline() needs to unaccount vmemmap pages from
> present_pages before calling offline_pages().
> This is necessary because offline_pages() tears down some structures based
> on whether the node or the zone becomes empty.
> If offline_pages() fails, we account back vmemmap pages.
> If it succeeds, we call mhp_deinit_memmap_on_memory().
> 
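And the mirror image on the offline side, with the unaccount-first
ordering described above (again just a sketch):

	static int memory_block_offline(struct memory_block *mem)
	{
		unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
		unsigned long nr_pages = PAGES_PER_SECTION * sections_per_block;
		unsigned long nr_vmemmap_pages = mem->nr_vmemmap_pages;
		struct zone *zone = page_zone(pfn_to_page(start_pfn));
		int ret;

		/*
		 * Unaccount up front so offline_pages() sees an empty
		 * zone/node if this block was the last populated one.
		 */
		if (nr_vmemmap_pages)
			adjust_present_page_count(zone, -nr_vmemmap_pages);

		ret = offline_pages(start_pfn + nr_vmemmap_pages,
				    nr_pages - nr_vmemmap_pages);
		if (ret) {
			/* Failed; account the vmemmap pages back. */
			if (nr_vmemmap_pages)
				adjust_present_page_count(zone,
							  nr_vmemmap_pages);
			return ret;
		}

		if (nr_vmemmap_pages)
			mhp_deinit_memmap_on_memory(start_pfn,
						    nr_vmemmap_pages);

		return ret;
	}
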
> Hot-remove:
> 
>  We need to be careful when removing memory, as adding and
>  removing memory needs to be done with the same granularity.
>  To check that this assumption is not violated, we check the
>  memory range we want to remove and if a) any memory block has
>  vmemmap pages and b) the range spans more than a single memory
>  block, we scream out loud and refuse to proceed.
> 
>  If all is good and the range was using memmap on memory (aka vmemmap pages),
>  we construct an altmap structure so free_hugepage_table does the right
>  thing and calls vmem_altmap_free instead of free_pagetable.
> 
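I.e. something along these lines in try_remove_memory() (a sketch of
the check plus the altmap construction; helper names as in the patch,
to the best of my recollection):

	if (memmap_on_memory) {
		nr_vmemmap_pages = walk_memory_blocks(start, size, NULL,
						get_nr_vmemmap_pages_cb);
		if (nr_vmemmap_pages) {
			/* Same granularity as the add: one memory block. */
			if (size != memory_block_size_bytes()) {
				pr_warn("Refuse to remove %#llx - %#llx, wrong granularity\n",
					start, start + size);
				return -EINVAL;
			}

			/*
			 * With ->alloc set, free_hugepage_table() ends
			 * up in vmem_altmap_free() rather than
			 * free_pagetable().
			 */
			mhp_altmap.alloc = nr_vmemmap_pages;
			altmap = &mhp_altmap;
		}
	}
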
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Reviewed-by: David Hildenbrand <david@redhat.com>

Thanks for updating the changelog.

Acked-by: Michal Hocko <mhocko@suse.com>

[...]
> @@ -648,9 +650,16 @@ static void online_pages_range(unsigned long start_pfn, unsigned long nr_pages)
>  	 * decide to not expose all pages to the buddy (e.g., expose them
>  	 * later). We account all pages as being online and belonging to this
>  	 * zone ("present").
> +	 * When using memmap_on_memory, the range might not be aligned to
> +	 * MAX_ORDER_NR_PAGES - 1, but pageblock aligned. __ffs() will detect
> +	 * this and the first chunk to online will be pageblock_nr_pages.
>  	 */
> -	for (pfn = start_pfn; pfn < end_pfn; pfn += MAX_ORDER_NR_PAGES)
> -		(*online_page_callback)(pfn_to_page(pfn), MAX_ORDER - 1);
> +	for (pfn = start_pfn; pfn < end_pfn;) {
> +		int order = min(MAX_ORDER - 1UL, __ffs(pfn));
> +
> +		(*online_page_callback)(pfn_to_page(pfn), order);
> +		pfn += (1UL << order);
> +	}
>  
>  	/* mark all involved sections as online */
>  	online_mem_sections(start_pfn, end_pfn);

You have dropped the check for the overflow beyond end_pfn and this made
me wonder whether that is safe in general. It took me a while to realize
that end_pfn is always going to be MAX_ORDER - 1 aligned due to section
constraints (hopefully no surprises on some arches). Early init code is
in a much more complicated situation because the early memory maps can
have many oddities.

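To spell it out for myself with x86_64 defaults (PAGES_PER_SECTION =
32768, MAX_ORDER - 1 = 10 so MAX_ORDER_NR_PAGES = 1024,
pageblock_nr_pages = 512): a block starting at pfn 0x8000 with the
first 512 pfns eaten by the vmemmap gives start_pfn = 0x8200 and
end_pfn = 0x10000, and the loop goes

	pfn 0x8200: __ffs() = 9,  order 9      -> step  512 -> 0x8400
	pfn 0x8400: __ffs() = 10, order 10     -> step 1024 -> 0x8800
	pfn 0x8800: __ffs() = 11, capped to 10 -> step 1024 -> ...

Every step can only increase the alignment of pfn, and end_pfn is
section aligned, hence a multiple of MAX_ORDER_NR_PAGES, so the loop
can never step past it.
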
... just thinking out loud in case I need to look that up again in the
future...
-- 
Michal Hocko
SUSE Labs
