Re: [PATCH 2/8] migrate_pages: separate hugetlb folios migration

From: Alistair Popple <apopple@nvidia.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Zi Yan <ziy@nvidia.com>, Yang Shi <shy828301@gmail.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	Oscar Salvador <osalvador@suse.de>,
	Matthew Wilcox <willy@infradead.org>,
	Bharata B Rao <bharata@amd.com>, haoxin <xhao@linux.alibaba.com>
Subject: Re: [PATCH 2/8] migrate_pages: separate hugetlb folios migration
Date: Thu, 05 Jan 2023 17:43:05 +1100
Message-ID: <877cy1scg5.fsf@nvidia.com>
In-Reply-To: <87pmbtedfp.fsf@yhuang6-desk2.ccr.corp.intel.com>

"Huang, Ying" <ying.huang@intel.com> writes:

> Alistair Popple <apopple@nvidia.com> writes:
>
>> Huang Ying <ying.huang@intel.com> writes:
>>
>>> This is a preparation patch to batch the folio unmapping and moving
>>> for the non-hugetlb folios.  Based on that we can batch the TLB
>>> shootdown during the folio migration and make it possible to use some
>>> hardware accelerator for the folio copying.
>>>
>>> In this patch the hugetlb folios and non-hugetlb folios migration is
>>> separated in migrate_pages() to make it easy to change the non-hugetlb
>>> folios migration implementation.
>>>
>>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>>> Cc: Zi Yan <ziy@nvidia.com>
>>> Cc: Yang Shi <shy828301@gmail.com>
>>> Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
>>> Cc: Oscar Salvador <osalvador@suse.de>
>>> Cc: Matthew Wilcox <willy@infradead.org>
>>> Cc: Bharata B Rao <bharata@amd.com>
>>> Cc: Alistair Popple <apopple@nvidia.com>
>>> Cc: haoxin <xhao@linux.alibaba.com>
>>> ---
>>>  mm/migrate.c | 114 ++++++++++++++++++++++++++++++++++++++++++++-------
>>>  1 file changed, 99 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index ec9263a33d38..bdbe73fe2eb7 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1404,6 +1404,87 @@ struct migrate_pages_stats {
>>>  	int nr_thp_split;
>>>  };
>>>  
>>> +static int migrate_hugetlbs(struct list_head *from, new_page_t get_new_page,
>>> +			    free_page_t put_new_page, unsigned long private,
>>> +			    enum migrate_mode mode, int reason,
>>> +			    struct migrate_pages_stats *stats,
>>> +			    struct list_head *ret_folios)
>>> +{
>>> +	int retry = 1;
>>> +	int nr_failed = 0;
>>> +	int nr_retry_pages = 0;
>>> +	int pass = 0;
>>> +	struct folio *folio, *folio2;
>>> +	int rc = 0, nr_pages;
>>> +
>>> +	for (pass = 0; pass < 10 && retry; pass++) {
>>> +		retry = 0;
>>> +		nr_retry_pages = 0;
>>> +
>>> +		list_for_each_entry_safe(folio, folio2, from, lru) {
>>> +			if (!folio_test_hugetlb(folio))
>>> +				continue;
>>> +
>>> +			nr_pages = folio_nr_pages(folio);
>>> +
>>> +			cond_resched();
>>> +
>>> +			rc = unmap_and_move_huge_page(get_new_page,
>>> +						      put_new_page, private,
>>> +						      &folio->page, pass > 2, mode,
>>> +						      reason, ret_folios);
>>> +			/*
>>> +			 * The rules are:
>>> +			 *	Success: hugetlb folio will be put back
>>> +			 *	-EAGAIN: stay on the from list
>>> +			 *	-ENOMEM: stay on the from list
>>> +			 *	-ENOSYS: stay on the from list
>>> +			 *	Other errno: put on ret_folios list
>>> +			 */
>>> +			switch(rc) {
>>> +			case -ENOSYS:
>>> +				/* Hugetlb migration is unsupported */
>>> +				nr_failed++;
>>> +				stats->nr_failed_pages += nr_pages;
>>> +				list_move_tail(&folio->lru, ret_folios);
>>> +				break;
>>> +			case -ENOMEM:
>>> +				/*
>>> +				 * When memory is low, don't bother to try to migrate
>>> +				 * other folios, just exit.
>>> +				 */
>>> +				nr_failed++;
>>
>> This currently isn't relevant for -ENOMEM and I think it would be
>> clearer if it was dropped.
>
> OK.
>
>>> +				stats->nr_failed_pages += nr_pages;
>>
>> Makes sense not to continue migration with low memory, but shouldn't we
>> add the remaining unmigrated hugetlb folios to stats->nr_failed_pages as
>> well? I.e. don't we still have to continue the iteration to find and
>> account for these?
>
> I think nr_failed_pages only counts tried pages.  IIUC, it's the
> original behavior and behavior for non-hugetlb pages too.

Hmm, I agree this seems to be the original behavior, but that behavior
seems arbitrary and wrong IMHO. The page failed to migrate, so it should
be counted as failed. The fact that we didn't even try seems irrelevant.

Indeed it looks like this was introduced because it was confusing to see
no failures even though migrate_pages() was called - see dfef2ef4027b
("mm, migrate: increment fail count on ENOMEM").

But that seems inconsistent - why count this one folio as failed because
of the allocation failure while other folios which would also likely
cause allocation failures don't get counted? Fixing it is probably
outside the scope of this series so I won't insist, but it would be nice
as it could still lead to confusion in some scenarios.
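
To make the inconsistency concrete, here is a rough userspace sketch of
the two accounting policies. It is not kernel code: struct model_folio
and model_failed_pages() are made up for illustration, and the retry
passes and the other errnos are deliberately left out.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct folio: only what this sketch needs. */
struct model_folio {
	int nr_pages;	/* what folio_nr_pages() would return */
	int rc;		/* result a migration attempt would give: 0 or -errno */
};

/*
 * Return the number of pages accounted as failed.
 *
 * count_untried == 0: only folios that were actually tried are counted,
 * so the walk stops at the folio that hit -ENOMEM (current behavior).
 * count_untried != 0: folios left untried after the -ENOMEM are counted
 * as failed as well.
 */
static int model_failed_pages(const struct model_folio *folios, size_t n,
			      int count_untried)
{
	int nr_failed_pages = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (folios[i].rc == 0)
			continue;	/* migrated fine */

		nr_failed_pages += folios[i].nr_pages;
		if (folios[i].rc != -ENOMEM)
			continue;	/* other failure: keep going */

		/* -ENOMEM: stop trying, optionally account the rest. */
		if (count_untried)
			for (i++; i < n; i++)
				nr_failed_pages += folios[i].nr_pages;
		break;
	}

	return nr_failed_pages;
}
```

With four 512-page folios where the second one hits -ENOMEM, the first
policy reports 512 failed pages and the second reports 1536, even though
in both cases three of the four folios were left unmigrated.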

[...]

>>> @@ -1462,30 +1549,28 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>  		nr_retry_pages = 0;
>>>  
>>>  		list_for_each_entry_safe(folio, folio2, from, lru) {
>>> +			if (folio_test_hugetlb(folio)) {
>>
>> How do we hit this case? Shouldn't migrate_hugetlbs() have already moved
>> any hugetlb folios off the from list?
>
> Retried hugetlb folios will be kept in from list.

Couldn't migrate_hugetlbs() remove the hugetlb folios that are still
failing from the from list on its final pass? That seems cleaner to me.
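
Something along these lines is what I have in mind, written as a
userspace model rather than a patch. The names (struct sketch_folio,
sketch_migrate_hugetlbs(), the SKETCH_* values) are hypothetical, the
lists are modelled as a per-folio tag instead of list_heads, and every
failed attempt is treated as -EAGAIN.

```c
#include <stdbool.h>
#include <stddef.h>

/* Which list a folio currently sits on in this model. */
enum sketch_list { SKETCH_FROM, SKETCH_RET_FOLIOS, SKETCH_MIGRATED };

/* Hypothetical stand-in for struct folio. */
struct sketch_folio {
	bool hugetlb;		/* what folio_test_hugetlb() would return */
	int attempts_needed;	/* attempts until migration succeeds */
	enum sketch_list on;
};

#define SKETCH_MAX_PASSES 10	/* mirrors "pass < 10" in the patch */

static void sketch_migrate_hugetlbs(struct sketch_folio *folios, size_t n)
{
	bool retry = true;
	size_t i;
	int pass;

	for (pass = 0; pass < SKETCH_MAX_PASSES && retry; pass++) {
		retry = false;

		for (i = 0; i < n; i++) {
			struct sketch_folio *f = &folios[i];

			if (!f->hugetlb || f->on != SKETCH_FROM)
				continue;

			if (--f->attempts_needed <= 0)
				f->on = SKETCH_MIGRATED;
			else
				retry = true;	/* models -EAGAIN */
		}
	}

	/*
	 * Suggested cleanup: any hugetlb folio still on the from list has
	 * exhausted its retries, so move it off here instead of making
	 * the caller filter it out later.
	 */
	for (i = 0; i < n; i++)
		if (folios[i].hugetlb && folios[i].on == SKETCH_FROM)
			folios[i].on = SKETCH_RET_FOLIOS;
}
```

With that, the from list holds only non-hugetlb folios once the function
returns, so the caller's loop would not need a folio_test_hugetlb() check.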

>>> +				list_move_tail(&folio->lru, &ret_folios);
>>> +				continue;
>>> +			}
>>> +
>>>  			/*
>>>  			 * Large folio statistics is based on the source large
>>>  			 * folio. Capture required information that might get
>>>  			 * lost during migration.
>>>  			 */
>>> -			is_large = folio_test_large(folio) && !folio_test_hugetlb(folio);
>>> +			is_large = folio_test_large(folio);
>>>  			is_thp = is_large && folio_test_pmd_mappable(folio);
>>>  			nr_pages = folio_nr_pages(folio);
>>> +
>>>  			cond_resched();
>>>  
>>> -			if (folio_test_hugetlb(folio))
>>> -				rc = unmap_and_move_huge_page(get_new_page,
>>> -						put_new_page, private,
>>> -						&folio->page, pass > 2, mode,
>>> -						reason,
>>> -						&ret_folios);
>>> -			else
>>> -				rc = unmap_and_move(get_new_page, put_new_page,
>>> -						private, folio, pass > 2, mode,
>>> -						reason, &ret_folios);
>>> +			rc = unmap_and_move(get_new_page, put_new_page,
>>> +					    private, folio, pass > 2, mode,
>>> +					    reason, &ret_folios);
>>>  			/*
>>>  			 * The rules are:
>>> -			 *	Success: non hugetlb folio will be freed, hugetlb
>>> -			 *		 folio will be put back
>>> +			 *	Success: folio will be freed
>>>  			 *	-EAGAIN: stay on the from list
>>>  			 *	-ENOMEM: stay on the from list
>>>  			 *	-ENOSYS: stay on the from list
>>> @@ -1512,7 +1597,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
>>>  						stats.nr_thp_split += is_thp;
>>>  						break;
>>>  					}
>>> -				/* Hugetlb migration is unsupported */
>>>  				} else if (!no_split_folio_counting) {
>>>  					nr_failed++;
>>>  				}
>
> Best Regards,
> Huang, Ying