From: "Huang, Ying" <ying.huang@intel.com>
To: Khalid Aziz <khalid.aziz@oracle.com>
Cc: akpm@linux-foundation.org, willy@infradead.org,
	steven.sistare@oracle.com, david@redhat.com,
	mgorman@techsingularity.net, baolin.wang@linux.alibaba.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Khalid Aziz <khalid@kernel.org>
Subject: Re: [PATCH v4] mm, compaction: Skip all non-migratable pages during scan
Date: Mon, 29 May 2023 11:01:40 +0800	[thread overview]
Message-ID: <87ttvvx2ln.fsf@yhuang6-desk2.ccr.corp.intel.com> (raw)
In-Reply-To: <20230525191507.160076-1-khalid.aziz@oracle.com> (Khalid Aziz's message of "Thu, 25 May 2023 13:15:07 -0600")

Khalid Aziz <khalid.aziz@oracle.com> writes:

> Pages pinned in memory through extra refcounts can not be migrated.
> Currently, as isolate_migratepages_block() scans pages for
> compaction, it skips any pinned anonymous pages. All non-migratable
> pages should be skipped, not just the pinned anonymous pages.
> This patch adds a check for extra refcounts on a page to determine
> whether the page can be migrated.  This was seen as a real issue on
> a customer workload where a large number of pages were pinned by
> vfio on the host and any attempt to allocate hugepages resulted in a
> significant amount of CPU time spent either in direct compaction or
> in kcompactd repeatedly scanning vfio-pinned pages that can not be
> migrated. These are the changes in the relevant stats with this
> patch for a test run of this scenario:
>
> 				Before			After
> compact_migrate_scanned		329,798,858		370,984,387
> compact_free_scanned		40,478,406		25,843,262
> compact_isolated		135,470,452		777,235
> pgmigrate_success		544,255			507,325
> pgmigrate_fail			134,616,282		47
> kcompactd CPU time		5:12.81			0:12.28
>
> Before the patch, a large number of pages were isolated but most of
> them failed to migrate.
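
Put differently, the pgmigrate numbers above correspond to a migration
failure rate dropping from roughly 99.6% to under 0.01%.  A trivial
userspace check of the arithmetic (plain C, nothing kernel-specific):

```c
/*
 * Quick sanity check of the pgmigrate counters quoted above.
 * Failure rate in percent, from pgmigrate_success and pgmigrate_fail.
 */
static double fail_rate_pct(double ok, double fail)
{
	return fail / (ok + fail) * 100.0;
}

/*
 * Before: fail_rate_pct(544255, 134616282)  -> ~99.6%
 * After:  fail_rate_pct(507325, 47)         -> ~0.009%
 */
```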
>
> Signed-off-by: Khalid Aziz <khalid.aziz@oracle.com>
> Suggested-by: Steve Sistare <steven.sistare@oracle.com>
> Cc: Khalid Aziz <khalid@kernel.org>
> ---
> v4:
> 	- Use existing folio_expected_refs() function (Suggested
> 	  by Huang, Ying)
> 	- Use folio functions
> 	- Take into account contig allocations when checking for
> 	  long term pinning and skip pages in ZONE_MOVABLE and
> 	  MIGRATE_CMA type pages (Suggested by David Hildenbrand)
> 	- Use folio version of total_mapcount() instead of
> 	  page_mapcount() (Suggested by Baolin Wang)
>
> v3:
> 	- Account for extra ref added by get_page_unless_zero() earlier
> 	  in isolate_migratepages_block() (Suggested by Huang, Ying)
> 	- Clean up computation of extra refs to be consistent 
> 	  (Suggested by Huang, Ying)
>
> v2:
> 	- Update comments in the code (Suggested by Andrew)
> 	- Use PagePrivate() instead of page_has_private() (Suggested
> 	  by Matthew)
> 	- Pass mapping to page_has_extrarefs() (Suggested by Matthew)
> 	- Use page_ref_count() (Suggested by Matthew)
> 	- Rename is_pinned_page() to reflect its function more
> 	  accurately (Suggested by Matthew)
>
>  include/linux/migrate.h | 16 +++++++++++++++
>  mm/compaction.c         | 44 +++++++++++++++++++++++++++++++++++++----
>  mm/migrate.c            | 14 -------------
>  3 files changed, 56 insertions(+), 18 deletions(-)
>
> diff --git a/include/linux/migrate.h b/include/linux/migrate.h
> index 6241a1596a75..4f59e15eae99 100644
> --- a/include/linux/migrate.h
> +++ b/include/linux/migrate.h
> @@ -141,6 +141,22 @@ const struct movable_operations *page_movable_ops(struct page *page)
>  		((unsigned long)page->mapping - PAGE_MAPPING_MOVABLE);
>  }
>  
> +static inline
> +int folio_expected_refs(struct address_space *mapping,
> +		struct folio *folio)

I don't think it's necessary to make this function inline.  It isn't
called in a hot path.

> +{
> +	int refs = 1;
> +
> +	if (!mapping)
> +		return refs;
> +
> +	refs += folio_nr_pages(folio);
> +	if (folio_test_private(folio))
> +		refs++;
> +
> +	return refs;
> +}
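
For readers following along, the accounting works out to: one reference
held by the caller (here, the one taken by get_page_unless_zero() in the
isolation path), plus one per page held by the page cache when there is a
mapping, plus one more if fs-private data is attached.  A standalone
userspace model (names are illustrative, not the kernel types):

```c
#include <stdbool.h>

/*
 * Userspace model of the accounting in folio_expected_refs() above.
 * The kernel version operates on struct folio and struct address_space.
 */
struct model_folio {
	bool has_mapping;	/* folio->mapping != NULL */
	bool has_private;	/* folio_test_private() */
	int nr_pages;		/* folio_nr_pages() */
};

static int model_expected_refs(const struct model_folio *f)
{
	int refs = 1;		/* reference held by the caller */

	if (!f->has_mapping)
		return refs;

	refs += f->nr_pages;	/* page cache holds one ref per page */
	if (f->has_private)
		refs++;		/* fs-private data, e.g. buffer heads */

	return refs;
}
```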
> +
>  #ifdef CONFIG_NUMA_BALANCING
>  int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
>  			   int node);
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 5a9501e0ae01..b548e05f0349 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -764,6 +764,42 @@ static bool too_many_isolated(pg_data_t *pgdat)
>  	return too_many;
>  }
>  
> +/*
> + * Check if this base page should be skipped from isolation because
> + * it has extra refcounts that will prevent it from being migrated.
> + * This code is inspired by similar code in migrate_vma_check_page(),
> + * can_split_folio() and folio_migrate_mapping()
> + */
> +static inline bool page_has_extra_refs(struct page *page,
> +					struct address_space *mapping)
> +{
> +	unsigned long extra_refs;

s/extra_refs/expected_refs/
?

> +	struct folio *folio;
> +
> +	/*
> +	 * Skip this check for pages in ZONE_MOVABLE or MIGRATE_CMA
> +	 * pages that can not be long term pinned
> +	 */
> +	if (is_zone_movable_page(page) || is_migrate_cma_page(page))
> +		return false;

I suggest moving these two checks out to the call site, before this
function is called.  Or change the name of the function.

> +
> +	folio = page_folio(page);
> +
> +	/*
> +	 * caller holds a ref already from get_page_unless_zero()
> +	 * which is accounted for in folio_expected_refs()
> +	 */
> +	extra_refs = folio_expected_refs(mapping, folio);
> +
> +	/*
> +	 * This is an admittedly racy check but good enough to determine
> +	 * if a page is pinned and can not be migrated
> +	 */
> +	if ((folio_ref_count(folio) - extra_refs) > folio_mapcount(folio))
> +		return true;
> +	return false;
> +}
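
The check itself is just arithmetic on three counters.  A userspace
sketch of the comparison (illustrative names; the kernel works on
struct folio):

```c
#include <stdbool.h>

/*
 * Userspace model of the racy check above: a folio is presumed pinned
 * (and therefore unmigratable) when it carries more references than
 * the expected count plus its page-table mappings can explain.
 */
static bool model_has_extra_refs(int ref_count, int expected_refs,
				 int mapcount)
{
	return (ref_count - expected_refs) > mapcount;
}

/*
 * Example (single file-backed page, mapped once): expected_refs = 2
 * (caller + page cache) and mapcount = 1, so ref_count = 3 passes,
 * while one extra pin (ref_count = 4) flags the page.
 */
```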
> +
>  /**
>   * isolate_migratepages_block() - isolate all migrate-able pages within
>   *				  a single pageblock
> @@ -992,12 +1028,12 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  			goto isolate_fail;
>  
>  		/*
> -		 * Migration will fail if an anonymous page is pinned in memory,
> -		 * so avoid taking lru_lock and isolating it unnecessarily in an
> -		 * admittedly racy check.
> +		 * Migration will fail if a page has extra refcounts
> +		 * from long term pinning preventing it from migrating,
> +		 * so avoid taking lru_lock and isolating it unnecessarily.
>  		 */
>  		mapping = page_mapping(page);
> -		if (!mapping && (page_count(page) - 1) > total_mapcount(page))
> +		if (!cc->alloc_contig && page_has_extra_refs(page, mapping))
>  			goto isolate_fail_put;
>  
>  		/*
> diff --git a/mm/migrate.c b/mm/migrate.c
> index db3f154446af..a2f3e5834996 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -385,20 +385,6 @@ void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
>  }
>  #endif
>  
> -static int folio_expected_refs(struct address_space *mapping,
> -		struct folio *folio)
> -{
> -	int refs = 1;
> -	if (!mapping)
> -		return refs;
> -
> -	refs += folio_nr_pages(folio);
> -	if (folio_test_private(folio))
> -		refs++;
> -
> -	return refs;
> -}
> -
>  /*
>   * Replace the page in the mapping.
>   *

Best Regards,
Huang, Ying


Thread overview: 20+ messages
2023-05-25 19:15 [PATCH v4] mm, compaction: Skip all non-migratable pages during scan Khalid Aziz
2023-05-25 19:58 ` Matthew Wilcox
2023-05-25 20:15   ` Steven Sistare
2023-05-25 20:45     ` Matthew Wilcox
2023-05-25 21:31       ` Matthew Wilcox
2023-05-26 15:44         ` Khalid Aziz
2023-05-26 16:44           ` Matthew Wilcox
2023-05-26 16:46             ` David Hildenbrand
2023-05-26 18:46               ` Matthew Wilcox
2023-05-26 18:50                 ` David Hildenbrand
2023-05-27  2:11                   ` John Hubbard
2023-05-27  3:18                     ` Matthew Wilcox
2023-05-28 23:49                       ` John Hubbard
2023-05-29  0:31                         ` Matthew Wilcox
2023-05-29  9:25                           ` David Hildenbrand
2023-05-30 15:42                             ` Khalid Aziz
2023-06-09 22:11                               ` Andrew Morton
2023-06-09 23:28                                 ` Khalid Aziz
2023-05-25 20:41   ` Khalid Aziz
2023-05-29  3:01 ` Huang, Ying [this message]
