All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Dave Hansen <dave.hansen@intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Borislav Petkov <bp@alien8.de>, Andy Lutomirski <luto@kernel.org>,
	Sean Christopherson <seanjc@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Joerg Roedel <jroedel@suse.de>, Ard Biesheuvel <ardb@kernel.org>,
	Andi Kleen <ak@linux.intel.com>,
	Kuppuswamy Sathyanarayanan
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	David Rientjes <rientjes@google.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Tom Lendacky <thomas.lendacky@amd.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Ingo Molnar <mingo@redhat.com>,
	Varad Gautam <varad.gautam@suse.com>,
	Dario Faggioli <dfaggioli@suse.com>,
	Brijesh Singh <brijesh.singh@amd.com>,
	Mike Rapoport <rppt@kernel.org>,
	David Hildenbrand <david@redhat.com>,
	x86@kernel.org, linux-mm@kvack.org, linux-coco@lists.linux.dev,
	linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
	Mike Rapoport <rppt@linux.ibm.com>
Subject: Re: [PATCHv4 1/8] mm: Add support for unaccepted memory
Date: Sat, 9 Apr 2022 18:54:23 +0300	[thread overview]
Message-ID: <20220409155423.iv2arckmvavvpegt@box.shutemov.name> (raw)
In-Reply-To: <767c2100-c171-1fd3-6a92-0af2e4bf3067@intel.com>

On Fri, Apr 08, 2022 at 11:55:43AM -0700, Dave Hansen wrote:
> On 4/5/22 16:43, Kirill A. Shutemov wrote:
> > UEFI Specification version 2.9 introduces the concept of memory
> > acceptance. Some Virtual Machine platforms, such as Intel TDX or AMD
> > SEV-SNP, requiring memory to be accepted before it can be used by the
> 
> 		^ require

Heh. That's wording form the spec.

> > guest. Accepting happens via a protocol specific for the Virtual Machine
> > platform.
> 
> 							^ s/for/to
> 
> > Accepting memory is costly and it makes VMM allocate memory for the
> > accepted guest physical address range. It's better to postpone memory
> > acceptance until memory is needed. It lowers boot time and reduces
> > memory overhead.
> > 
> > Support of such memory requires a few changes in core-mm code:
> > 
> >   - memblock has to accept memory on allocation;
> > 
> >   - page allocator has to accept memory on the first allocation of the
> >     page;
> > 
> > Memblock change is trivial.
> > 
> > The page allocator is modified to accept pages on the first allocation.
> > PageUnaccepted() is used to indicate that the page requires acceptance.
> 
> Does this consume an actual page flag or is it aliased?

It is encoded as a page type in mapcount of unallocated memory. It is not
aliased with PageOffline() as I did before.

I will mention that it is a new page type.

> > Kernel only needs to accept memory once after boot, so during the boot
> > and warm up phase there will be a lot of memory acceptance. After things
> > are settled down the only price of the feature if couple of checks for
> > PageUnaccepted() in allocate and free paths. The check refers a hot
> 
> 							       ^ to
> 
> ...
> > + /*
> > +  * PageUnaccepted() indicates that the page has to be "accepted" before it can
> > +  * be used. Page allocator has to call accept_page() before returning the page
> > +  * to the caller.
> > +  */
> 
> Let's talk about "used" with a bit more detail.  Maybe:
> 
> /*
>  * PageUnaccepted() indicates that the page has to be "accepted" before
>  * it can be read or written.  The page allocator must to call
>  * accept_page() before touching the page or returning it to the caller.
>  */

I guess s/must to call/must call/, right?

> ...
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 2db95780e003..53f4aa1c92a7 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -121,6 +121,12 @@ typedef int __bitwise fpi_t;
> >   */
> >  #define FPI_SKIP_KASAN_POISON	((__force fpi_t)BIT(2))
> >  
> > +/*
> > + * Check if the page needs to be marked as PageUnaccepted().
> > + * Used for the new pages added to the buddy allocator for the first time.
> > + */
> > +#define FPI_UNACCEPTED		((__force fpi_t)BIT(3))
> > +
> >  /* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
> >  static DEFINE_MUTEX(pcp_batch_high_lock);
> >  #define MIN_PERCPU_PAGELIST_HIGH_FRACTION (8)
> > @@ -1023,6 +1029,26 @@ buddy_merge_likely(unsigned long pfn, unsigned long buddy_pfn,
> >  	return page_is_buddy(higher_page, higher_buddy, order + 1);
> >  }
> >  
> > +static void accept_page(struct page *page, unsigned int order)
> > +{
> > +	phys_addr_t start = page_to_phys(page);
> > +	int i;
> > +
> > +	accept_memory(start, start + (PAGE_SIZE << order));
> > +
> > +	for (i = 0; i < (1 << order); i++) {
> > +		if (PageUnaccepted(page + i))
> > +			__ClearPageUnaccepted(page + i);
> > +	}
> > +}
> 
> It's probably worth a comment somewhere that this can be really slow.
> 
> > +static bool page_is_unaccepted(struct page *page, unsigned int order)
> > +{
> > +	phys_addr_t start = page_to_phys(page);
> > +
> > +	return memory_is_unaccepted(start, start + (PAGE_SIZE << order));
> > +}
> > +
> >  /*
> >   * Freeing function for a buddy system allocator.
> >   *
> > @@ -1058,6 +1084,7 @@ static inline void __free_one_page(struct page *page,
> >  	unsigned long combined_pfn;
> >  	struct page *buddy;
> >  	bool to_tail;
> > +	bool unaccepted = PageUnaccepted(page);
> >  
> >  	VM_BUG_ON(!zone_is_initialized(zone));
> >  	VM_BUG_ON_PAGE(page->flags & PAGE_FLAGS_CHECK_AT_PREP, page);
> > @@ -1089,6 +1116,11 @@ static inline void __free_one_page(struct page *page,
> >  			clear_page_guard(zone, buddy, order, migratetype);
> >  		else
> >  			del_page_from_free_list(buddy, zone, order);
> > +
> > +		/* Mark page unaccepted if any of merged pages were unaccepted */
> > +		if (PageUnaccepted(buddy))
> > +			unaccepted = true;
> 
> Naming nit: following the logic with a double-negative like !unaccepted
> is a bit hard.  Would this be more readable if it were:
> 
> 	bool page_needs_acceptance = PageUnaccepted(page);
> 
> and then the code below...
> 
> >  		combined_pfn = buddy_pfn & pfn;
> >  		page = page + (combined_pfn - pfn);
> >  		pfn = combined_pfn;
> > @@ -1124,6 +1156,17 @@ static inline void __free_one_page(struct page *page,
> >  done_merging:
> >  	set_buddy_order(page, order);
> >  
> > +	/*
> > +	 * Check if the page needs to be marked as PageUnaccepted().
> > +	 * Used for the new pages added to the buddy allocator for the first
> > +	 * time.
> > +	 */
> > +	if (!unaccepted && (fpi_flags & FPI_UNACCEPTED))
> > +		unaccepted = page_is_unaccepted(page, order);
> 
> 	if (page_needs_acceptance && (fpi_flags & FPI_UNACCEPTED))
> 		page_needs_acceptance = page_is_unaccepted(page, order);
> 
> > +	if (unaccepted)
> > +		__SetPageUnaccepted(page);
> 
> This is getting hard for me to follow.
> 
> There are:
> 1. Pages that come in here with PageUnaccepted()==1
> 2. Pages that come in here with PageUnaccepted()==0, but a buddy that
>    was PageUnaccepted()==1
> 
> In either of those cases, the bitmap will be consulted to see if the
> page is *truly* unaccepted or not.  But, I'm struggling to figure out
> how a page could end up in one of those scenarios and *not* be
> page_is_unaccepted().
> 
> There are three pieces of information that come in:
> 1. PageUnaccepted(page)
> 2. PageUnaccepted(buddies[])
> 3. the bitmap

1 and 2 are the same conceptionally: merged-in pieces of the resulting page.

> 
> and one piece of information going out:
> 
> PageUnaccepted(page);
> 
> I think I need a more coherent description of how those four things fit
> together.

The page gets marked as PageUnaccepted() if any of merged-in pages is
PageUnaccepted().

For new pages, just being added to buddy allocator, consult
page_is_unaccepted(). FPI_UNACCEPTED indicates that the page is new and
page_is_unaccepted() check is required.

Avoid calling page_is_unaccepted() if it is known that the page needs
acceptance anyway. It can be costly.

Is it good enough explanation?

FPI_UNACCEPTED is not a good name. Any help with a better one?
FPI_CHECK_UNACCEPTED?

> >  	if (fpi_flags & FPI_TO_TAIL)
> >  		to_tail = true;
> >  	else if (is_shuffle_order(order))
> > @@ -1149,7 +1192,8 @@ static inline void __free_one_page(struct page *page,
> >  static inline bool page_expected_state(struct page *page,
> >  					unsigned long check_flags)
> >  {
> > -	if (unlikely(atomic_read(&page->_mapcount) != -1))
> > +	if (unlikely(atomic_read(&page->_mapcount) != -1) &&
> > +	    !PageUnaccepted(page))
> >  		return false;
> 
> That probably deserves a comment, and maybe its own if() statement.

Own if does not work. PageUnaccepted() is encoded in _mapcount.

What about this:

	/*
	 * page->_mapcount is expected to be -1.
	 *
	 * There is an exception for PageUnaccepted(). The page type can be set
	 * for pages on free list. Page types are encoded in _mapcount.
	 *
	 * PageUnaccepted() will get cleared in post_alloc_hook().
	 */
	if (unlikely((atomic_read(&page->_mapcount) | PG_unaccepted) != -1))
		return false;

?

> >  	if (unlikely((unsigned long)page->mapping |
> > @@ -1654,7 +1698,8 @@ void __free_pages_core(struct page *page, unsigned int order)
> >  	 * Bypass PCP and place fresh pages right to the tail, primarily
> >  	 * relevant for memory onlining.
> >  	 */
> > -	__free_pages_ok(page, order, FPI_TO_TAIL | FPI_SKIP_KASAN_POISON);
> > +	__free_pages_ok(page, order,
> > +			FPI_TO_TAIL | FPI_SKIP_KASAN_POISON | FPI_UNACCEPTED);
> >  }
> >  
> >  #ifdef CONFIG_NUMA
> > @@ -1807,6 +1852,7 @@ static void __init deferred_free_range(unsigned long pfn,
> >  		return;
> >  	}
> >  
> > +	accept_memory(pfn << PAGE_SHIFT, (pfn + nr_pages) << PAGE_SHIFT);
> >  	for (i = 0; i < nr_pages; i++, page++, pfn++) {
> >  		if ((pfn & (pageblock_nr_pages - 1)) == 0)
> >  			set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> 
> Comment, please.  I assume doing the slow accept up front is OK here
> because this is in the deferred path.  But, it would be nice to know for
> sure.

It is acceptance of smaller than page block upfront. I will add a comment.

> 
> > @@ -2266,6 +2312,10 @@ static inline void expand(struct zone *zone, struct page *page,
> >  		if (set_page_guard(zone, &page[size], high, migratetype))
> >  			continue;
> >  
> > +		/* Transfer PageUnaccepted() to the newly split pages */
> > +		if (PageUnaccepted(page))
> > +			__SetPageUnaccepted(&page[size]);
> 
> We don't want to just accept the page here, right?  Because we're
> holding the zone lock?  Maybe we should mention that:
> 
> 		/*
> 		 * Transfer PageUnaccepted() to the newly split pages so
> 		 * they can be accepted after dropping the zone lock.
> 		 */

Okay.

> >  		add_to_free_list(&page[size], zone, high, migratetype);
> >  		set_buddy_order(&page[size], high);
> >  	}
> > @@ -2396,6 +2446,9 @@ inline void post_alloc_hook(struct page *page, unsigned int order,
> >  	 */
> >  	kernel_unpoison_pages(page, 1 << order);
> >  
> > +	if (PageUnaccepted(page))
> > +		accept_page(page, order);
> > +
> >  	/*
> >  	 * As memory initialization might be integrated into KASAN,
> >  	 * KASAN unpoisoning and memory initializion code must be
> 
> Is accepted memory guaranteed to be zeroed?  Do we want to skip the
> __GFP_ZERO behavior later in this function?  Or is that just a silly
> over-optimization?

For TDX, it is true that the memory gets cleared on acceptance, but I
don't we can say the same for any possible implementation.

I would rather leave __GFP_ZERO for peace of mind. Clearing the cache-hot
page for the second time shouldn't be a big deal comparing to acceptance
cost.

-- 
 Kirill A. Shutemov

  reply	other threads:[~2022-04-09 15:52 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-05 23:43 [PATCHv4 0/8] mm, x86/cc: Implement support for unaccepted memory Kirill A. Shutemov
2022-04-05 23:43 ` [PATCHv4 1/8] mm: Add " Kirill A. Shutemov
2022-04-08 18:55   ` Dave Hansen
2022-04-09 15:54     ` Kirill A. Shutemov [this message]
2022-04-11  6:38       ` Dave Hansen
2022-04-11 10:07         ` Mike Rapoport
2022-04-13 11:40           ` Kirill A. Shutemov
2022-04-13 14:48             ` Mike Rapoport
2022-04-13 15:15               ` Kirill A. Shutemov
2022-04-13 20:06                 ` Mike Rapoport
2022-04-11  8:47       ` David Hildenbrand
2022-04-08 19:04   ` David Hildenbrand
2022-04-08 19:11   ` Dave Hansen
2022-04-09 17:52     ` Kirill A. Shutemov
2022-04-11  6:41       ` Dave Hansen
2022-04-11 15:55         ` Borislav Petkov
2022-04-11 16:27           ` Dave Hansen
2022-04-11 18:55             ` Tom Lendacky
2022-04-12  8:15     ` David Hildenbrand
2022-04-12 16:08       ` Dave Hansen
2022-04-13 10:36         ` David Hildenbrand
2022-04-13 11:30           ` Kirill A. Shutemov
2022-04-13 11:32             ` David Hildenbrand
2022-04-13 15:36             ` Dave Hansen
2022-04-13 16:07               ` David Hildenbrand
2022-04-13 16:13                 ` Dave Hansen
2022-04-13 16:24               ` Kirill A. Shutemov
2022-04-13 14:39           ` Mike Rapoport
2022-04-05 23:43 ` [PATCHv4 2/8] efi/x86: Get full memory map in allocate_e820() Kirill A. Shutemov
2022-04-13  9:59   ` Borislav Petkov
2022-04-13 11:45     ` Kirill A. Shutemov
2022-04-05 23:43 ` [PATCHv4 3/8] efi/x86: Implement support for unaccepted memory Kirill A. Shutemov
2022-04-08 17:26   ` Dave Hansen
2022-04-09 19:41     ` Kirill A. Shutemov
2022-04-14 15:55     ` Borislav Petkov
2022-04-15 22:24   ` Borislav Petkov
2022-04-18 15:55     ` Kirill A. Shutemov
2022-04-18 16:38       ` Borislav Petkov
2022-04-18 20:24         ` Kirill A. Shutemov
2022-04-18 21:01           ` Borislav Petkov
2022-04-18 23:50             ` Kirill A. Shutemov
2022-04-19  7:39               ` Borislav Petkov
2022-04-19 15:30                 ` Kirill A. Shutemov
2022-04-19 16:38                   ` Dave Hansen
2022-04-19 19:23                   ` Borislav Petkov
2022-04-21 12:26                 ` Borislav Petkov
2022-04-22  0:21                 ` Kirill A. Shutemov
2022-04-22  9:30                   ` Borislav Petkov
2022-04-22 13:26                     ` Kirill A. Shutemov
2022-04-05 23:43 ` [PATCHv4 4/8] x86/boot/compressed: Handle " Kirill A. Shutemov
2022-04-08 17:57   ` Dave Hansen
2022-04-09 20:20     ` Kirill A. Shutemov
2022-04-11  6:49       ` Dave Hansen
2022-04-05 23:43 ` [PATCHv4 5/8] x86/mm: Reserve unaccepted memory bitmap Kirill A. Shutemov
2022-04-08 18:08   ` Dave Hansen
2022-04-09 20:43     ` Kirill A. Shutemov
2022-04-05 23:43 ` [PATCHv4 6/8] x86/mm: Provide helpers for unaccepted memory Kirill A. Shutemov
2022-04-08 18:15   ` Dave Hansen
2022-04-08 19:21   ` Dave Hansen
2022-04-13 16:08     ` Kirill A. Shutemov
2022-04-05 23:43 ` [PATCHv4 7/8] x86/tdx: Unaccepted memory support Kirill A. Shutemov
2022-04-08 18:28   ` Dave Hansen
2022-04-05 23:43 ` [PATCHv4 8/8] mm/vmstat: Add counter for memory accepting Kirill A. Shutemov
2022-04-12  8:18   ` David Hildenbrand
2022-04-08 17:02 ` [PATCHv4 0/8] mm, x86/cc: Implement support for unaccepted memory Dave Hansen
2022-04-09 23:44   ` Kirill A. Shutemov
2022-04-21 12:29     ` Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220409155423.iv2arckmvavvpegt@box.shutemov.name \
    --to=kirill@shutemov.name \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=ardb@kernel.org \
    --cc=bp@alien8.de \
    --cc=brijesh.singh@amd.com \
    --cc=dave.hansen@intel.com \
    --cc=david@redhat.com \
    --cc=dfaggioli@suse.com \
    --cc=jroedel@suse.de \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-coco@lists.linux.dev \
    --cc=linux-efi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rientjes@google.com \
    --cc=rppt@kernel.org \
    --cc=rppt@linux.ibm.com \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=seanjc@google.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.lendacky@amd.com \
    --cc=varad.gautam@suse.com \
    --cc=vbabka@suse.cz \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.