From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Borislav Petkov <bp@alien8.de>, Andy Lutomirski <luto@kernel.org>,
	Sean Christopherson <seanjc@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Joerg Roedel <jroedel@suse.de>, Ard Biesheuvel <ardb@kernel.org>,
	Andi Kleen <ak@linux.intel.com>,
	Kuppuswamy Sathyanarayanan
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	David Rientjes <rientjes@google.com>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Ingo Molnar <mingo@redhat.com>,
	Dario Faggioli <dfaggioli@suse.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Mike Rapoport <rppt@kernel.org>,
	David Hildenbrand <david@redhat.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	marcelo.cerri@canonical.com, tim.gardner@canonical.com,
	khalid.elmously@canonical.com, philip.cox@canonical.com,
	aarcange@redhat.com, peterx@redhat.com, x86@kernel.org,
	linux-mm@kvack.org, linux-coco@lists.linux.dev,
	linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
	Mike Rapoport <rppt@linux.ibm.com>
Subject: Re: [PATCHv8 02/14] mm: Add support for unaccepted memory
Date: Sat, 24 Dec 2022 19:46:39 +0300
Message-ID: <20221224164639.pb3hrvbxtlodgm5e@box.shutemov.name>
In-Reply-To: <3ab6ea38-5a9b-af4f-3c94-b75dce682bc1@suse.cz>

On Fri, Dec 09, 2022 at 11:23:50PM +0100, Vlastimil Babka wrote:
> On 12/9/22 20:26, Kirill A. Shutemov wrote:
> >> >  #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
> >> >  			/*
> >> >  			 * Watermark failed for this zone, but see if we can
> >> > @@ -4299,6 +4411,9 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
> >> >  
> >> >  			return page;
> >> >  		} else {
> >> > +			if (try_to_accept_memory(zone))
> >> > +				goto try_this_zone;
> >> 
> >> On the other hand, here we failed the full rmqueue(), including the
> >> potentially fragmenting fallbacks, so I'm worried that before we finally
> >> fail all of that and resort to accepting more memory, we already fragmented
> >> the already accepted memory, more than necessary.
> > 
> > I'm not sure I follow. We accept memory in pageblock chunks. Do we want to
> > allocate from a free pageblock if we have other memory to tap from? It
> > doesn't make sense to me.
> 
> The fragmentation avoidance based on migratetype does work with pageblock
> granularity, so yeah, if you accept a single pageblock worth of memory and
> then (through __rmqueue_fallback()) end up serving both movable and
> unmovable allocations from it, the whole fragmentation avoidance mechanism
> is defeated and you end up with unmovable allocations (e.g. page tables)
> scattered over many pageblocks and inability to allocate any huge pages.
> 
> >> So one way to prevent would be to move the acceptance into rmqueue() to
> >> happen before __rmqueue_fallback(), which I originally had in mind and maybe
> >> suggested that previously.
> > 
> > I guess it should be pretty straight forward to fail __rmqueue_fallback()
> > if there's non-empty unaccepted_pages list and steer to
> > try_to_accept_memory() this way.
> 
> That could be a way indeed. We do have ALLOC_NOFRAGMENT which could be
> possible to employ here.
> But maybe the zone_watermark_fast() modification would be simpler yet
> sufficient. It makes sense to me that we'd try to keep a high watermark
> worth of pre-accepted memory. zone_watermark_fast() would fail at low
> watermark, so we could try accepting (high-low) at a time instead of single
> pageblock.

Looks like we already have __zone_watermark_unusable_free(), which seems to
match the use case rather closely. We only need to switch unaccepted memory to
per-zone accounting.
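
To illustrate the idea, here is a minimal userspace model, not kernel code.
It assumes that unaccepted pages are counted in the zone's free page counter
(as __free_unaccepted() does in this series) and that the watermark check
subtracts them as unusable. struct toy_zone and the toy_*() helpers are made
up for this sketch; the highatomic reserve, CMA and lowmem_reserve handling of
the real check are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the per-zone counters; not the kernel's struct zone. */
struct toy_zone {
	long nr_free;		/* models NR_FREE_PAGES, includes unaccepted pages */
	long nr_unaccepted;	/* models the per-zone NR_UNACCEPTED counter */
	long low_wmark;		/* models the low watermark, in pages */
};

/* Pages counted as free that cannot be handed out right now. */
static long toy_unusable_free(const struct toy_zone *z, unsigned int order)
{
	long unusable = (1L << order) - 1;

	/* The proposed change: unaccepted memory is not usable yet. */
	unusable += z->nr_unaccepted;

	return unusable;
}

/* Simplified watermark check: usable free pages must exceed the mark. */
static bool toy_watermark_ok(const struct toy_zone *z, unsigned int order)
{
	return z->nr_free - toy_unusable_free(z, order) > z->low_wmark;
}

/* Accepting memory makes pages usable; the free counter does not change. */
static void toy_accept(struct toy_zone *z, long nr_pages)
{
	assert(nr_pages >= 0 && nr_pages <= z->nr_unaccepted);
	z->nr_unaccepted -= nr_pages;
}
```

In this model a zone with 1000 free pages, 900 of them unaccepted, and a low
watermark of 200 fails the check until enough memory has been accepted. That
is the point at which the allocator would be steered to try_to_accept_memory()
instead of falling back to other migratetypes.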

The fixup below is supposed to do the trick, but I'm not sure how to test
fragmentation avoidance properly.

Any suggestions?

diff --git a/drivers/base/node.c b/drivers/base/node.c
index ca6f0590be21..1bd2d245edee 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -483,7 +483,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 #endif
 #ifdef CONFIG_UNACCEPTED_MEMORY
 			     ,
-			     nid, K(node_page_state(pgdat, NR_UNACCEPTED))
+			     nid, K(sum_zone_node_page_state(nid, NR_UNACCEPTED))
 #endif
 			    );
 	len += hugetlb_report_node_meminfo(buf, len, nid);
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 789b77c7b6df..e9c05b4c457c 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -157,7 +157,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 
 #ifdef CONFIG_UNACCEPTED_MEMORY
 	show_val_kb(m, "Unaccepted:     ",
-		    global_node_page_state(NR_UNACCEPTED));
+		    global_zone_page_state(NR_UNACCEPTED));
 #endif
 
 	hugetlb_report_meminfo(m);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 9c762e8175fc..8b5800cd4424 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -152,6 +152,9 @@ enum zone_stat_item {
 	NR_ZSPAGES,		/* allocated in zsmalloc */
 #endif
 	NR_FREE_CMA_PAGES,
+#ifdef CONFIG_UNACCEPTED_MEMORY
+	NR_UNACCEPTED,
+#endif
 	NR_VM_ZONE_STAT_ITEMS };
 
 enum node_stat_item {
@@ -198,9 +201,6 @@ enum node_stat_item {
 	NR_FOLL_PIN_ACQUIRED,	/* via: pin_user_page(), gup flag: FOLL_PIN */
 	NR_FOLL_PIN_RELEASED,	/* pages returned via unpin_user_page() */
 	NR_KERNEL_STACK_KB,	/* measured in KiB */
-#ifdef CONFIG_UNACCEPTED_MEMORY
-	NR_UNACCEPTED,
-#endif
 #if IS_ENABLED(CONFIG_SHADOW_CALL_STACK)
 	NR_KERNEL_SCS_KB,	/* measured in KiB */
 #endif
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index e80e8d398863..404b267332a9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1779,7 +1779,7 @@ static bool try_to_accept_memory(struct zone *zone)
 
 	migratetype = get_pfnblock_migratetype(page, page_to_pfn(page));
 	__mod_zone_freepage_state(zone, -1 << order, migratetype);
-	__mod_node_page_state(page_pgdat(page), NR_UNACCEPTED, -1 << order);
+	__mod_zone_page_state(zone, NR_UNACCEPTED, -1 << order);
 	spin_unlock_irqrestore(&zone->lock, flags);
 
 	if (last)
@@ -1808,7 +1808,7 @@ static void __free_unaccepted(struct page *page, unsigned int order)
 	migratetype = get_pfnblock_migratetype(page, page_to_pfn(page));
 	list_add_tail(&page->lru, &zone->unaccepted_pages);
 	__mod_zone_freepage_state(zone, 1 << order, migratetype);
-	__mod_node_page_state(page_pgdat(page), NR_UNACCEPTED, 1 << order);
+	__mod_zone_page_state(zone, NR_UNACCEPTED, 1 << order);
 	spin_unlock_irqrestore(&zone->lock, flags);
 
 	if (first)
@@ -4074,6 +4074,9 @@ static inline long __zone_watermark_unusable_free(struct zone *z,
 	if (!(alloc_flags & ALLOC_CMA))
 		unusable_free += zone_page_state(z, NR_FREE_CMA_PAGES);
 #endif
+#ifdef CONFIG_UNACCEPTED_MEMORY
+	unusable_free += zone_page_state(z, NR_UNACCEPTED);
+#endif
 
 	return unusable_free;
 }
-- 
  Kiryl Shutsemau / Kirill A. Shutemov
