From: Vlastimil Babka <vbabka@suse.cz>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Borislav Petkov <bp@alien8.de>, Andy Lutomirski <luto@kernel.org>,
	Sean Christopherson <seanjc@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Joerg Roedel <jroedel@suse.de>, Ard Biesheuvel <ardb@kernel.org>,
	Andi Kleen <ak@linux.intel.com>,
	Kuppuswamy Sathyanarayanan
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	David Rientjes <rientjes@google.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Ingo Molnar <mingo@redhat.com>,
	Dario Faggioli <dfaggioli@suse.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Mike Rapoport <rppt@kernel.org>,
	David Hildenbrand <david@redhat.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	marcelo.cerri@canonical.com, tim.gardner@canonical.com,
	khalid.elmously@canonical.com, philip.cox@canonical.com,
	aarcange@redhat.com, peterx@redhat.com, x86@kernel.org,
	linux-mm@kvack.org, linux-coco@lists.linux.dev,
	linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
	Mike Rapoport <rppt@linux.ibm.com>
Subject: Re: [PATCHv8 02/14] mm: Add support for unaccepted memory
Date: Thu, 12 Jan 2023 12:59:06 +0100
Message-ID: <2b6c77bd-bead-7bfb-bf07-63e9ca837c58@suse.cz>
In-Reply-To: <20221224164639.pb3hrvbxtlodgm5e@box.shutemov.name>

On 12/24/22 17:46, Kirill A. Shutemov wrote:
> On Fri, Dec 09, 2022 at 11:23:50PM +0100, Vlastimil Babka wrote:
>> On 12/9/22 20:26, Kirill A. Shutemov wrote:
>> >> >  #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
>> >> >  			/*
>> >> >  			 * Watermark failed for this zone, but see if we can
>> >> > @@ -4299,6 +4411,9 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
>> >> >  
>> >> >  			return page;
>> >> >  		} else {
>> >> > +			if (try_to_accept_memory(zone))
>> >> > +				goto try_this_zone;
>> >> 
>> >> On the other hand, here we failed the full rmqueue(), including the
>> >> potentially fragmenting fallbacks, so I'm worried that before we finally
>> >> fail all of that and resort to accepting more memory, we already fragmented
>> >> the already accepted memory, more than necessary.
>> > 
>> > I'm not sure I follow. We accept memory in pageblock chunks. Do we want to
>> > allocate from a free pageblock if we have other memory to tap from? It
>> > doesn't make sense to me.
>> 
>> The fragmentation avoidance based on migratetype does work with pageblock
>> granularity, so yeah, if you accept a single pageblock worth of memory and
>> then (through __rmqueue_fallback()) end up serving both movable and
>> unmovable allocations from it, the whole fragmentation avoidance mechanism
>> is defeated and you end up with unmovable allocations (e.g. page tables)
>> scattered over many pageblocks and inability to allocate any huge pages.
>> 
>> >> So one way to prevent that would be to move the acceptance into rmqueue() to
>> >> happen before __rmqueue_fallback(), which I originally had in mind and maybe
>> >> suggested that previously.
>> > 
>> > I guess it should be pretty straightforward to fail __rmqueue_fallback()
>> > if there's non-empty unaccepted_pages list and steer to
>> > try_to_accept_memory() this way.
>> 
>> That could be a way indeed. We do have ALLOC_NOFRAGMENT which could be
>> possible to employ here.
>> But maybe the zone_watermark_fast() modification would be simpler yet
>> sufficient. It makes sense to me that we'd try to keep a high watermark
>> worth of pre-accepted memory. zone_watermark_fast() would fail at low
>> watermark, so we could try accepting (high-low) at a time instead of single
>> pageblock.
> 
> Looks like we already have __zone_watermark_unusable_free(), which seems to
> match the use case rather closely. We only need to switch unaccepted memory to
> per-zone accounting.

Could work. I'd still suggest also making try_to_accept_memory() accept
up to the high watermark, not a single pageblock.
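
A rough userspace model of that idea, as a sketch only and not kernel code:
struct zone_model and its fields, usable_free(), watermark_ok() and
accept_up_to_high() are hypothetical names made up for illustration. It
assumes, as in the patch, that unaccepted pages are counted as free pages and
therefore have to be subtracted before the watermark comparison. It ignores
lowmem reserves, allocation order and locking.

```c
#include <stdbool.h>

/* Hypothetical userspace model of a zone; not the kernel's struct zone. */
struct zone_model {
	long free_pages;       /* free pages, including not-yet-accepted ones */
	long unaccepted_pages; /* per-zone unaccepted count (NR_UNACCEPTED) */
	long wmark_low;        /* low watermark, in pages */
	long wmark_high;       /* high watermark, in pages */
	long pageblock_pages;  /* pages per pageblock, the acceptance unit */
};

/* Free pages that can be handed out right now. */
static long usable_free(const struct zone_model *z)
{
	return z->free_pages - z->unaccepted_pages;
}

/* Simplified watermark check: usable free memory must exceed the mark. */
static bool watermark_ok(const struct zone_model *z, long mark)
{
	return usable_free(z) > mark;
}

/*
 * Accept whole pageblocks until the high watermark is met or no full
 * pageblock of unaccepted memory is left. Returns the pages accepted.
 * Accepting moves pages from "unaccepted" to "usable"; the free count
 * itself does not change in this model.
 */
static long accept_up_to_high(struct zone_model *z)
{
	long accepted = 0;

	while (z->unaccepted_pages >= z->pageblock_pages &&
	       !watermark_ok(z, z->wmark_high)) {
		z->unaccepted_pages -= z->pageblock_pages;
		accepted += z->pageblock_pages;
	}
	return accepted;
}
```

With made-up numbers (10000 free pages of which 9000 are unaccepted, low
watermark 1500, high watermark 2000, 512-page pageblocks), the low watermark
check fails at 1000 usable pages, and the loop accepts two pageblocks in one
go rather than one pageblock per failed allocation.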

> The fixup below is supposed to do the trick, but I'm not sure how to test
> fragmentation avoidance properly.
> 
> Any suggestions?

Haven't done that for years, maybe Mel knows better. But from what I
remember, I'd compare /proc/pagetypeinfo with and without memory accepting,
and collect the mm_page_alloc_extfrag tracepoint. If more of these events
happen with memory accepting enabled, that's bad. Ideally do this with a
workload that stresses both userspace (movable) allocations and kernel
allocations. Again, Mel might have suggestions for an mmtests configuration?
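
For the /proc/pagetypeinfo comparison, one possible helper is sketched below.
It sums the free pages on a single per-migratetype line so that two snapshots
can be compared type by type. The function name pagetypeinfo_line_pages() is
hypothetical, and the line layout it expects ("Node N, zone Z, type T"
followed by one free-block count per order, possibly prefixed with '>') is an
assumption that should be checked against the kernel under test.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse one "Node N, zone Z, type T  c0 c1 c2 ..." line, where cK is the
 * number of free blocks of order K. Copies the migratetype name into
 * 'type' and returns the total number of free pages on the line, or -1
 * if the line does not have the expected layout.
 */
static long pagetypeinfo_line_pages(const char *line, char *type,
				    size_t type_len)
{
	int node, consumed = 0;
	char zone[32], mt[32];
	unsigned int order = 0;
	long total = 0;
	const char *p;

	if (strncmp(line, "Node", 4) != 0)
		return -1;
	if (sscanf(line, "Node %d, zone %31[^,], type %31s%n",
		   &node, zone, mt, &consumed) != 3)
		return -1;

	snprintf(type, type_len, "%s", mt);

	p = line + consumed;
	for (;;) {
		char *end;
		unsigned long count;

		/* Skip padding and the '>' marker used for capped counts. */
		while (*p == ' ' || *p == '>')
			p++;
		count = strtoul(p, &end, 10);
		if (end == p)
			break;	/* no more numbers on this line */
		total += (long)(count << order);
		order++;
		p = end;
	}
	return total;
}
```

Feeding every line of a "before" and an "after" snapshot through this and
comparing the per-type totals gives a coarse view of how free memory is
spread across migratetypes. Reading the file may require root.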

> 
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index ca6f0590be21..1bd2d245edee 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -483,7 +483,7 @@ static ssize_t node_read_meminfo(struct device *dev,
>  #endif
>  #ifdef CONFIG_UNACCEPTED_MEMORY
>  			     ,
> -			     nid, K(node_page_state(pgdat, NR_UNACCEPTED))
> +			     nid, K(sum_zone_node_page_state(nid, NR_UNACCEPTED))
>  #endif
>  			    );
>  	len += hugetlb_report_node_meminfo(buf, len, nid);
> diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
> index 789b77c7b6df..e9c05b4c457c 100644
> --- a/fs/proc/meminfo.c
> +++ b/fs/proc/meminfo.c
> @@ -157,7 +157,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
>  
>  #ifdef CONFIG_UNACCEPTED_MEMORY
>  	show_val_kb(m, "Unaccepted:     ",
> -		    global_node_page_state(NR_UNACCEPTED));
> +		    global_zone_page_state(NR_UNACCEPTED));
>  #endif
>  
>  	hugetlb_report_meminfo(m);
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 9c762e8175fc..8b5800cd4424 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -152,6 +152,9 @@ enum zone_stat_item {
>  	NR_ZSPAGES,		/* allocated in zsmalloc */
>  #endif
>  	NR_FREE_CMA_PAGES,
> +#ifdef CONFIG_UNACCEPTED_MEMORY
> +	NR_UNACCEPTED,
> +#endif
>  	NR_VM_ZONE_STAT_ITEMS };
>  
>  enum node_stat_item {
> @@ -198,9 +201,6 @@ enum node_stat_item {
>  	NR_FOLL_PIN_ACQUIRED,	/* via: pin_user_page(), gup flag: FOLL_PIN */
>  	NR_FOLL_PIN_RELEASED,	/* pages returned via unpin_user_page() */
>  	NR_KERNEL_STACK_KB,	/* measured in KiB */
> -#ifdef CONFIG_UNACCEPTED_MEMORY
> -	NR_UNACCEPTED,
> -#endif
>  #if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
>  	NR_KERNEL_SCS_KB,	/* measured in KiB */
>  #endif
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index e80e8d398863..404b267332a9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1779,7 +1779,7 @@ static bool try_to_accept_memory(struct zone *zone)
>  
>  	migratetype = get_pfnblock_migratetype(page, page_to_pfn(page));
>  	__mod_zone_freepage_state(zone, -1 << order, migratetype);
> -	__mod_node_page_state(page_pgdat(page), NR_UNACCEPTED, -1 << order);
> +	__mod_zone_page_state(zone, NR_UNACCEPTED, -1 << order);
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  
>  	if (last)
> @@ -1808,7 +1808,7 @@ static void __free_unaccepted(struct page *page, unsigned int order)
>  	migratetype = get_pfnblock_migratetype(page, page_to_pfn(page));
>  	list_add_tail(&page->lru, &zone->unaccepted_pages);
>  	__mod_zone_freepage_state(zone, 1 << order, migratetype);
> -	__mod_node_page_state(page_pgdat(page), NR_UNACCEPTED, 1 << order);
> +	__mod_zone_page_state(zone, NR_UNACCEPTED, 1 << order);
>  	spin_unlock_irqrestore(&zone->lock, flags);
>  
>  	if (first)
> @@ -4074,6 +4074,9 @@ static inline long __zone_watermark_unusable_free(struct zone *z,
>  	if (!(alloc_flags & ALLOC_CMA))
>  		unusable_free += zone_page_state(z, NR_FREE_CMA_PAGES);
>  #endif
> +#ifdef CONFIG_UNACCEPTED_MEMORY
> +	unusable_free += zone_page_state(z, NR_UNACCEPTED);
> +#endif
>  
>  	return unusable_free;
>  }


